Exploring weird maths with code

Sometimes, while reading an innocuous-seeming article, I stumble across an aside that makes me sit bolt upright and mutter something incredulous. Asides like this one:

A counterintuitive property of coin-tossing: If Alice tosses a coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two heads in a row, then on average, Alice will require four tosses while Bob will require six tosses (try this at home!), even though head-tail and head-head have an equal chance of appearing after two coin tosses.
— Wired

This was a surprise! The four possible outcomes of two tosses are equally likely, so it seems weird that a heads-tails outcome would take longer to reach than a heads-heads. Weird enough to try it at home – at least by programming. Let's write some Ruby and see if we get the same result. (I recommend opening irb and exploring these examples for yourself if you want to fully understand them.)

Checking some assumptions

First of all, let's agree to toss a coin by picking a random symbol from an array¹:

def coin_toss
  %i(heads tails).sample #=> :heads or :tails. 
end

And let's confirm that this is close enough to 50/50, by counting the result of tossing a coin 100,000 times:

results = {heads: 0, tails: 0}
1000000.times { results[coin_toss] += 1 }

puts "After 100000 tosses we saw #{results[:heads]} heads and #{results[:tails]} tails."

.sample chooses an element at random, so the result will be a little different each time. I ran this program 10 times, and got these results:

After 100000 tosses we saw 50131 heads and 49869 tails.
After 100000 tosses we saw 49845 heads and 50155 tails.
After 100000 tosses we saw 50094 heads and 49906 tails.
After 100000 tosses we saw 49672 heads and 50328 tails.
After 100000 tosses we saw 50062 heads and 49938 tails.
After 100000 tosses we saw 50046 heads and 49954 tails.
After 100000 tosses we saw 50003 heads and 49997 tails.
After 100000 tosses we saw 50094 heads and 49906 tails.
After 100000 tosses we saw 50124 heads and 49876 tails.
After 100000 tosses we saw 49838 heads and 50162 tails.

I think these results look OK, but the next thing I tried was busting out some statistics and checking the standard deviation. You can think of it as a measure of how closely-clustered our results are – we'd expect to get a low standard deviation if .sample is fair. Calculating the standard deviation is a little bit complicated, so I used the descriptive_statistics gem to make it easier. Let's calculate the standard deviation of the number of heads in each run:

require 'descriptive_statistics'
[50131, 49845, 50094, 49672, 50062, 50046, 50003, 50094, 50124, 49838].standard_deviation #=> 146.014

But is 146.014 low or not? I have no idea! This is where my statistics knowledge runs out. For now, let's presume our eyeballs are correct and our coin tosses are fair.

Back to the question

If we can toss a coin fairly, we can return to our original question: how many tosses, on average, does it take to reach a given combination?

We'll need a target combination, we need to toss at least twice, and we want to toss until we hit the target:

target = [:heads, :heads]
tosses = [coin_toss, coin_toss]

until tosses.last(2) == target
  tosses << coin_toss
end

I ran this in irb and I got [:tails, :tails, :heads, :tails, :heads, :heads]. It works! Let's turn this into a method so we can reuse it:

def tosses_until(target)
  tosses = [coin_toss, coin_toss]
  until tosses.last(2) == target
    tosses << coin_toss
  end
  tosses
end

Running the experiment repeatedly will make our result more reliable. If something weird happens once it could be a fluke, but you can't fluke something thousands of times. We could use the .times method again, and build up an array of results like we built the array of tosses:

experiments = []
100000.times { experiments << tosses_until([:heads, :heads]) }

Or we can make this shorter by using Ruby's .map method. .map applies a method to every element in a list. It's normally used to modify an existing list:

["cat", "dog", "avocado"].map { |t| t.upcase } #=> ["CAT", "DOG", "AVOCADO"]
(1..4).map { |n| n * 3 } #=> [3, 6, 9, 12]

But it doesn't matter if we throw the original elements away instead. You can try this in the console, but beware! It's going to print out all 100,000 results.

experiments = (0..100000).map { tosses_until([:heads, :heads]) }

It's not really relevant to our experiment, but I wondered what the shortest and longest sequence until our target was. You might expect that we can use experiments.min and experiments.max to find out:

experiments.min #=> [:heads, :heads]
experiments.min.length #=> 2
experiments.max #=> [:tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :heads, :tails, :heads, :tails, :heads, :heads]
experiments.max.length #=> 21

But that's not quite right² for the maximum case. It looks right, though – a handy reminder that verifying data by eye can lead you astray. Instead, we need to use .max_by to explicitly look at the length of the array:

experiments.max_by { |e| e.length }

This pattern – calling a method on the value passed into the block – is common, so Ruby provides a shorthand for this:

experiments.max_by(&:length) #=> [:heads, :tails, :tails, :tails, :heads, :tails, :tails, :tails, :heads, :tails, :tails, :tails, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :heads, :tails, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :heads, :heads]
experiments.max_by(&:length).length #=> 61

Let's put all this together in one place, and add some output about our results:

def coin_toss
  %i(heads tails).sample #=> :heads or :tails. 
end

def tosses_until(target)
  tosses = [coin_toss, coin_toss]
  until tosses.last(2) == target
    tosses << coin_toss
  end
  tosses
end

experiments = (0..100000).map { tosses_until([:heads, :heads]) }
average_toss_count = experiments.reduce(0) { |sum, n| sum + n.length } / experiments.length.to_f # We'll talk about this line below.

puts "Our shortest sequence was #{experiments.min_by(&:length)}"
puts "Our longest sequence was #{experiments.max_by(&:length)}"
puts "On average, we had to toss #{average_toss_count} times before (heads, heads) came up."

.reduce is a close cousin of .map. .map does something to every element in a list; .reduce takes two elements from a list and boils them down into one. It does that repeatedly to produce a final value:

[1, 2].reduce { |a, b| a + b } #=> 3
[1, 2, 3].reduce { |a, b| a + b } #=> 6: [1, 2, 3] → [3, 3] → 6.
[1, 2, 3, 4].reduce { |a, b| a + b } #=> 10: [1, 2, 3, 4] → [3, 3, 4] → [6, 4] → 10.

You can also give .reduce a starting value, which is what we did in our program:

[1, 2].reduce(10) { |sum, a| sum + a } #=> 13: 10 + 1 = 11 then 11 + 2 = 13.
[1, 2, 3].reduce(10) { |total, a| total + (a * 2) } #=> 22.

We started our toss count at 0, then added the length of each run to that total. Finally, we divided it by the total number of runs to get an average. The .to_f on the end converts the length to a floating point number, because we'd like to see the decimal places in the result.

9 / 2 #=> 4; really "4 remainder 1", but Ruby throws the remainder away
9 / 2.to_f #=> 4.5

Simplifying our code

This works, but is more complicated than it needs to be. Our goal was to find out how many tosses, on average, it takes to hit our target – we don't care about the sequence of tosses to get there. Let's change our tosses_until method to return the number of tosses instead of the sequence itself:

def tosses_until(target)
  tosses = [coin_toss, coin_toss]
  until tosses.last(2) == target
    tosses << coin_toss
  end
  tosses.length
end

This lets us make our trial run code simpler. We could build an array of the sequence counts, then add it up:

experiments = (0..100000).map { tosses_until([:heads, :heads]) }
average_toss_count = experiments.reduce(&:+) / experiments.length.to_f

We could skip the array entirely, and just maintain a total:

total_experiments = 100000
total_tosses = 0
total_experiments.times { total_tosses += tosses_until([:heads, :heads]) }
average_toss_count = total_tosses / total_experiments.to_f

Or we could use reduce again:

total_experiments = 100000
total_tosses = (0..total_experiments).reduce(0) { |sum, _| tosses_until([:heads, heads]) }
average_toss_count = total_tosses / total_experiments.to_f

The "best" version is a matter of taste, but personally I prefer the first version. It uses more memory, but that doesn't matter in experiments like these. It's the shortest code, we can find the longest run of tosses, and it's reasonably clear how it works once you get your head around .reduce.

Let's put the first version into a method that runs the experiment and reports the outcome for a given target:

def coin_toss
  %i(heads tails).sample
end

def tosses_until(target)
  tosses = [coin_toss, coin_toss]
  until tosses.last(2) == target
    tosses << coin_toss
  end
  tosses.length
end

def average_toss_count(target, num_experiments)
  experiments = (0..num_experiments).map { tosses_until(target) }
  average_toss_count = experiments.reduce(&:+) / experiments.length.to_f

  # sprintf formats the average so it prints to two decimal places only.
  puts "On average, we had to toss #{sprintf('%.2f', average_toss_count)} times before #{target.inspect} came up. Our longest run was #{experiments.max} tosses."
end

The other cases

Now we have all the building blocks to run the experiment for each of the four possible outcomes:

targets = [[:heads, :heads], [:heads, :tails], [:tails, :heads], [:tails, :tails]]
targets.each { |target| average_toss_count(target, 100000) }

Which produces:

On average, we had to toss 5.98 times before [:heads, :heads] came up. Our longest run was 52 tosses.
On average, we had to toss 4.00 times before [:heads, :tails] came up. Our longest run was 22 tosses.
On average, we had to toss 3.99 times before [:tails, :heads] came up. Our longest run was 20 tosses.
On average, we had to toss 6.00 times before [:tails, :tails] came up. Our longest run was 55 tosses.

Sure enough, it takes longer on average to hit [:heads, :heads] or [:tails, :tails] than [:heads, :tails] or [:tails, :heads], even though each outcome has an equal probability. It's still weird, but now I'm satisfied it's true.

Why does this happen?

Let's go back to Alice and Bob, who are targeting [:heads, :tails] and [:heads, :heads] respectively:

Player	Target
Alice	H T
Bob	H H

Let's presume they both win their first toss – they both get a result they're looking for:

Player	Target	Result 1
Alice	H T	H
Bob	H H	H

Then, presume they lose their second toss:

Player	Target	Result 1	Result 2
Alice	H T	H	H
Bob	H H	H	T

There's now a major difference between the two players: Alice can hit her target on toss 3, but Bob can't until toss 4. Bob must start over after losing on toss 2; Alice's loss can be part of a win if she gets a tails on turn 3.

Exercises

If you'd like to explore this some more, here's some suggestions for things to try:

Change the program so it runs the experiment a million times instead of 100,000.
If we toss three coins, there's eight possible outcomes. How long does it take, on average, to hit each combination? Are there some sequences that take longer than others?
We left our proof of a fair coin toss at "Yeah, that looks OK." Can you do better? How would you satisfy yourself that it's producing fair results?

%i() is Ruby shorthand that generates an array of symbols. %i(foo bar baz) means the same as [:foo, :bar, :baz]. ↩
But why doesn't this work? When we call .min, Ruby uses the <=> comparison operator to find the smallest value in the list. experiments is an array of arrays; <=> for arrays calls <=> on each of the elements of the list in turn until it finds a difference. In this case, our list elements are symbols. Symbols get converted to strings before comparison, and "heads" < "tails" because "h" < "t". So the upshot of this is that experiments.max returns the result with the longest initial streak of tails.

Yes, I had to look this up in the documentation. ↩