URL: https://fivethirtyeight.com/features/santa-needs-some-help-with-math/
Santa Claus is getting up there in age, and his memory has begun to falter. (After all, why do you think he keeps a list?) It’s gotten so bad that this year Santa forgot what order to put the reindeer in. Obviously, he remembers that Rudolph goes first because of the red organic light bulb in the middle of his face, but big guy just can’t remember what to do with the other eight.
If he doesn’t get the right order, the aerodynamics of his sleigh will be all wrong and he won’t be able to get all of his deliveries done in time. Yes, Santa has Moneyballed Christmas Eve. Luckily, the reindeer know where they should each be, but since they’re just animals they can only grunt in approval if they are put in the right spot.
Determined to get it right, Santa first creates a list of the reindeer in some random order. He then goes to the first position and harnesses each reindeer one by one, starting at the top of his list. When a reindeer grunts, Santa leaves it in that correct position, moves onto the next position, and works down that same list once again.
If harnessing a reindeer into any spot takes one minute, how long on average would it take Santa to get the correct reindeer placement?
Extra credit: Is there a strategy that Santa could use that does better?
sim <- crossing(trial = 1:1e6,
Position = 1:8) %>%
mutate(Attempts = case_when(
rbinom(n(), 1, 1 / (9 - Position)) == 1 ~ 1,
rbinom(n(), 1, 1 / (8 - Position)) == 1 ~ 2,
rbinom(n(), 1, 1 / (7 - Position)) == 1 ~ 3,
rbinom(n(), 1, 1 / (6 - Position)) == 1 ~ 4,
rbinom(n(), 1, 1 / (5 - Position)) == 1 ~ 5,
rbinom(n(), 1, 1 / (4 - Position)) == 1 ~ 6,
rbinom(n(), 1, 1 / (3 - Position)) == 1 ~ 7,
TRUE ~ 8) # This should only happen on first position
)
sim %>%
mutate(Reindeer_Left_Start = 9 - Position,
Test = Reindeer_Left_Start < Attempts) %>%
summarise(sum(Test))
## # A tibble: 1 x 1
## `sum(Test)`
## <int>
## 1 0
sim %>%
group_by(trial) %>%
summarise(Total_Minutes = sum (Attempts)) %>%
summarise(Min_Time = min(Total_Minutes),
Max_Time = max(Total_Minutes),
Median_Time = median(Total_Minutes),
Avg_Time = mean(Total_Minutes))
## # A tibble: 1 x 4
## Min_Time Max_Time Median_Time Avg_Time
## <dbl> <dbl> <dbl> <dbl>
## 1 8 36 22 22.0
Approximately 22 minutes
Extra credit: Is there a strategy that Santa could use that does better?
From Steven Pratt, the best way to spread Christmas cheer is singing loud for all to hear:
In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.
During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.
Help Mathy out: How large is Santa’s playlist?
sim2 <- crossing(trial = 1:100,
playlist = seq(from = 1000, to = 10000, by = 100),
songs_played = 100) %>%
rowwise() %>%
mutate(times_same_song_played = sum(duplicated(
sample(playlist, 100, replace = TRUE))),
same_song_played = ifelse(times_same_song_played > 0, TRUE, FALSE)
)
sim2
## Source: local data frame [9,100 x 5]
## Groups: <by row>
##
## # A tibble: 9,100 x 5
## trial playlist songs_played times_same_song_played same_song_played
## <int> <dbl> <dbl> <int> <lgl>
## 1 1 1000 100 4 TRUE
## 2 1 1100 100 5 TRUE
## 3 1 1200 100 3 TRUE
## 4 1 1300 100 6 TRUE
## 5 1 1400 100 3 TRUE
## 6 1 1500 100 4 TRUE
## 7 1 1600 100 3 TRUE
## 8 1 1700 100 4 TRUE
## 9 1 1800 100 3 TRUE
## 10 1 1900 100 2 TRUE
## # ... with 9,090 more rows
###Test Simluation & review results
test.sim2 <- sim2 %>%
group_by(playlist) %>%
summarise(percent_times_same_song_played = scales::percent(sum(same_song_played) / max(trial)))
test.sim2
## # A tibble: 91 x 2
## playlist percent_times_same_song_played
## <dbl> <chr>
## 1 1000 100%
## 2 1100 98%
## 3 1200 99%
## 4 1300 97%
## 5 1400 98%
## 6 1500 94%
## 7 1600 94%
## 8 1700 93%
## 9 1800 96%
## 10 1900 94%
## # ... with 81 more rows
sim3 <- crossing(trial = 1:100000,
playlist = seq(from = 6500, to = 7500, by = 25),
songs_played = 100) %>%
rowwise() %>%
mutate(times_same_song_played = sum(duplicated(
sample(playlist, 100, replace = TRUE))),
same_song_played = ifelse(times_same_song_played > 0, TRUE, FALSE)
)
sim3 %>%
group_by(playlist) %>%
summarise(percent_times_same_song_played = sum(same_song_played) / max(trial))
## # A tibble: 41 x 2
## playlist percent_times_same_song_played
## <dbl> <dbl>
## 1 6500 0.537
## 2 6525 0.535
## 3 6550 0.532
## 4 6575 0.529
## 5 6600 0.531
## 6 6625 0.528
## 7 6650 0.525
## 8 6675 0.527
## 9 6700 0.526
## 10 6725 0.522
## # ... with 31 more rows
sim3 %>%
group_by(playlist) %>%
summarise(percent_times_same_song_played = sum(same_song_played) / max(trial)) %>%
ggplot(aes(x = playlist, y = percent_times_same_song_played)) +
geom_line() +
geom_hline(yintercept = .5, color = "red") +
xlab('Songs on Playlist') +
ylab('Percent Change of a Repeat Song')