Sometimes, while reading an innocuousseeming article, I stumble across an aside that makes me sit bolt upright and mutter something incredulous. Asides like this one:
A counterintuitive property of cointossing: If Alice tosses a coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two heads in a row, then on average, Alice will require four tosses while Bob will require six tosses (try this at home!), even though headtail and headhead have an equal chance of appearing after two coin tosses.
— Wired
This was a surprise! The four possible outcomes of two tosses are equally likely, so it seems weird that a headstails outcome would take longer to reach than a headsheads. Weird enough to try it at home – at least by programming. Let's write some Ruby and see if we get the same result. (I recommend opening irb
and exploring these examples for yourself if you want to fully understand them.)
Checking some assumptions
First of all, let's agree to toss a coin by picking a random symbol from an array^{1}:
def coin_toss
%i(heads tails).sample #=> :heads or :tails.
end
And let's confirm that this is close enough to 50/50, by counting the result of tossing a coin 100,000 times:
results = {heads: 0, tails: 0}
1000000.times { results[coin_toss] += 1 }
puts "After 100000 tosses we saw #{results[:heads]} heads and #{results[:tails]} tails."
.sample
chooses an element at random, so the result will be a little different each time. I ran this program 10 times, and got these results:
After 100000 tosses we saw 50131 heads and 49869 tails.
After 100000 tosses we saw 49845 heads and 50155 tails.
After 100000 tosses we saw 50094 heads and 49906 tails.
After 100000 tosses we saw 49672 heads and 50328 tails.
After 100000 tosses we saw 50062 heads and 49938 tails.
After 100000 tosses we saw 50046 heads and 49954 tails.
After 100000 tosses we saw 50003 heads and 49997 tails.
After 100000 tosses we saw 50094 heads and 49906 tails.
After 100000 tosses we saw 50124 heads and 49876 tails.
After 100000 tosses we saw 49838 heads and 50162 tails.
I think these results look OK, but the next thing I tried was busting out some statistics and checking the standard deviation. You can think of it as a measure of how closelyclustered our results are – we'd expect to get a low standard deviation if .sample
is fair. Calculating the standard deviation is a little bit complicated, so I used the descriptive_statistics
gem to make it easier. Let's calculate the standard deviation of the number of heads in each run:
require 'descriptive_statistics'
[50131, 49845, 50094, 49672, 50062, 50046, 50003, 50094, 50124, 49838].standard_deviation #=> 146.014
But is 146.014 low or not? I have no idea! This is where my statistics knowledge runs out. For now, let's presume our eyeballs are correct and our coin tosses are fair.
Back to the question
If we can toss a coin fairly, we can return to our original question: how many tosses, on average, does it take to reach a given combination?
We'll need a target combination, we need to toss at least twice, and we want to toss until we hit the target:
target = [:heads, :heads]
tosses = [coin_toss, coin_toss]
until tosses.last(2) == target
tosses << coin_toss
end
I ran this in irb
and I got [:tails, :tails, :heads, :tails, :heads, :heads]
. It works! Let's turn this into a method so we can reuse it:
def tosses_until(target)
tosses = [coin_toss, coin_toss]
until tosses.last(2) == target
tosses << coin_toss
end
tosses
end
Running the experiment repeatedly will make our result more reliable. If something weird happens once it could be a fluke, but you can't fluke something thousands of times. We could use the .times
method again, and build up an array of results like we built the array of tosses:
experiments = []
100000.times { experiments << tosses_until([:heads, :heads]) }
Or we can make this shorter by using Ruby's .map
method. .map
applies a method to every element in a list. It's normally used to modify an existing list:
["cat", "dog", "avocado"].map { t t.upcase } #=> ["CAT", "DOG", "AVOCADO"]
(1..4).map { n n * 3 } #=> [3, 6, 9, 12]
But it doesn't matter if we throw the original elements away instead. You can try this in the console, but beware! It's going to print out all 100,000 results.
experiments = (0..100000).map { tosses_until([:heads, :heads]) }
It's not really relevant to our experiment, but I wondered what the shortest and longest sequence until our target was. You might expect that we can use experiments.min
and experiments.max
to find out:
experiments.min #=> [:heads, :heads]
experiments.min.length #=> 2
experiments.max #=> [:tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :heads, :tails, :heads, :tails, :heads, :heads]
experiments.max.length #=> 21
But that's not quite right^{2} for the maximum case. It looks right, though – a handy reminder that verifying data by eye can lead you astray. Instead, we need to use .max_by
to explicitly look at the length of the array:
experiments.max_by { e e.length }
This pattern – calling a method on the value passed into the block – is common, so Ruby provides a shorthand for this:
experiments.max_by(&:length) #=> [:heads, :tails, :tails, :tails, :heads, :tails, :tails, :tails, :heads, :tails, :tails, :tails, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :heads, :tails, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :heads, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :tails, :heads, :heads]
experiments.max_by(&:length).length #=> 61
Let's put all this together in one place, and add some output about our results:
def coin_toss
%i(heads tails).sample #=> :heads or :tails.
end
def tosses_until(target)
tosses = [coin_toss, coin_toss]
until tosses.last(2) == target
tosses << coin_toss
end
tosses
end
experiments = (0..100000).map { tosses_until([:heads, :heads]) }
average_toss_count = experiments.reduce(0) { sum, n sum + n.length } / experiments.length.to_f # We'll talk about this line below.
puts "Our shortest sequence was #{experiments.min_by(&:length)}"
puts "Our longest sequence was #{experiments.max_by(&:length)}"
puts "On average, we had to toss #{average_toss_count} times before (heads, heads) came up."
.reduce
is a close cousin of .map
. .map
does something to every element in a list; .reduce
takes two elements from a list and boils them down into one. It does that repeatedly to produce a final value:
[1, 2].reduce { a, b a + b } #=> 3
[1, 2, 3].reduce { a, b a + b } #=> 6: [1, 2, 3] → [3, 3] → 6.
[1, 2, 3, 4].reduce { a, b a + b } #=> 10: [1, 2, 3, 4] → [3, 3, 4] → [6, 4] → 10.
You can also give .reduce
a starting value, which is what we did in our program:
[1, 2].reduce(10) { sum, a sum + a } #=> 13: 10 + 1 = 11 then 11 + 2 = 13.
[1, 2, 3].reduce(10) { total, a total + (a * 2) } #=> 22.
We started our toss count at 0, then added the length of each run to that total. Finally, we divided it by the total number of runs to get an average. The .to_f
on the end converts the length to a floating point number, because we'd like to see the decimal places in the result.
9 / 2 #=> 4; really "4 remainder 1", but Ruby throws the remainder away
9 / 2.to_f #=> 4.5
Simplifying our code
This works, but is more complicated than it needs to be. Our goal was to find out how many tosses, on average, it takes to hit our target – we don't care about the sequence of tosses to get there. Let's change our tosses_until
method to return the number of tosses instead of the sequence itself:
def tosses_until(target)
tosses = [coin_toss, coin_toss]
until tosses.last(2) == target
tosses << coin_toss
end
tosses.length
end
This lets us make our trial run code simpler. We could build an array of the sequence counts, then add it up:
experiments = (0..100000).map { tosses_until([:heads, :heads]) }
average_toss_count = experiments.reduce(&:+) / experiments.length.to_f
We could skip the array entirely, and just maintain a total:
total_experiments = 100000
total_tosses = 0
total_experiments.times { total_tosses += tosses_until([:heads, :heads]) }
average_toss_count = total_tosses / total_experiments.to_f
Or we could use reduce
again:
total_experiments = 100000
total_tosses = (0..total_experiments).reduce(0) { sum, _ tosses_until([:heads, heads]) }
average_toss_count = total_tosses / total_experiments.to_f
The "best" version is a matter of taste, but personally I prefer the first version. It uses more memory, but that doesn't matter in experiments like these. It's the shortest code, we can find the longest run of tosses, and it's reasonably clear how it works once you get your head around .reduce
.
Let's put the first version into a method that runs the experiment and reports the outcome for a given target:
def coin_toss
%i(heads tails).sample
end
def tosses_until(target)
tosses = [coin_toss, coin_toss]
until tosses.last(2) == target
tosses << coin_toss
end
tosses.length
end
def average_toss_count(target, num_experiments)
experiments = (0..num_experiments).map { tosses_until(target) }
average_toss_count = experiments.reduce(&:+) / experiments.length.to_f
# sprintf formats the average so it prints to two decimal places only.
puts "On average, we had to toss #{sprintf('%.2f', average_toss_count)} times before #{target.inspect} came up. Our longest run was #{experiments.max} tosses."
end
The other cases
Now we have all the building blocks to run the experiment for each of the four possible outcomes:
targets = [[:heads, :heads], [:heads, :tails], [:tails, :heads], [:tails, :tails]]
targets.each { target average_toss_count(target, 100000) }
Which produces:
On average, we had to toss 5.98 times before [:heads, :heads] came up. Our longest run was 52 tosses.
On average, we had to toss 4.00 times before [:heads, :tails] came up. Our longest run was 22 tosses.
On average, we had to toss 3.99 times before [:tails, :heads] came up. Our longest run was 20 tosses.
On average, we had to toss 6.00 times before [:tails, :tails] came up. Our longest run was 55 tosses.
Sure enough, it takes longer on average to hit [:heads, :heads]
or [:tails, :tails]
than [:heads, :tails]
or [:tails, :heads]
, even though each outcome has an equal probability. It's still weird, but now I'm satisfied it's true.
Why does this happen?
Let's go back to Alice and Bob, who are targeting [:heads, :tails]
and [:heads, :heads]
respectively:
Player  Target 

Alice  H T 
Bob  H H 
Let's presume they both win their first toss – they both get a result they're looking for:
Player  Target  Result 1 

Alice  H T  H 
Bob  H H  H 
Then, presume they lose their second toss:
Player  Target  Result 1  Result 2 

Alice  H T  H  H 
Bob  H H  H  T 
There's now a major difference between the two players: Alice can hit her target on toss 3, but Bob can't until toss 4. Bob must start over after losing on toss 2; Alice's loss can be part of a win if she gets a tails on turn 3.
Exercises
If you'd like to explore this some more, here's some suggestions for things to try:
 Change the program so it runs the experiment a million times instead of 100,000.
 If we toss three coins, there's eight possible outcomes. How long does it take, on average, to hit each combination? Are there some sequences that take longer than others?
 We left our proof of a fair coin toss at "Yeah, that looks OK." Can you do better? How would you satisfy yourself that it's producing fair results?

%i()
is Ruby shorthand that generates an array of symbols.%i(foo bar baz)
means the same as[:foo, :bar, :baz]
. ↩ 
But why doesn't this work? When we call
.min
, Ruby uses the<=>
comparison operator to find the smallest value in the list.experiments
is an array of arrays;<=>
for arrays calls<=>
on each of the elements of the list in turn until it finds a difference. In this case, our list elements are symbols. Symbols get converted to strings before comparison, and"heads" < "tails"
because"h" < "t"
. So the upshot of this is thatexperiments.max
returns the result with the longest initial streak of tails.Yes, I had to look this up in the documentation. ↩