Riddler Express: Santa needs some help with math

URL: https://fivethirtyeight.com/features/santa-needs-some-help-with-math/

Santa Claus is getting up there in age, and his memory has begun to falter. (After all, why do you think he keeps a list?) It’s gotten so bad that this year Santa forgot what order to put the reindeer in. Obviously, he remembers that Rudolph goes first because of the red organic light bulb in the middle of his face, but big guy just can’t remember what to do with the other eight.

If he doesn’t get the right order, the aerodynamics of his sleigh will be all wrong and he won’t be able to get all of his deliveries done in time. Yes, Santa has Moneyballed Christmas Eve. Luckily, the reindeer know where they should each be, but since they’re just animals they can only grunt in approval if they are put in the right spot.

Determined to get it right, Santa first creates a list of the reindeer in some random order. He then goes to the first position and harnesses each reindeer one by one, starting at the top of his list. When a reindeer grunts, Santa leaves it in that correct position, moves onto the next position, and works down that same list once again.

If harnessing a reindeer into any spot takes one minute, how long on average would it take Santa to get the correct reindeer placement?

Extra credit: Is there a strategy that Santa could use that does better?

Set up simulation

I started with just 100 trials to test my set up, then upped it to 1,000,000 once I proved it out.
I debated how to set this up - I needed to adjust the probabilty for the first (and subsequent positions) baseed on remaing reindeer.
I thnk this proves out and adjusts appropiately -

sim <- crossing(trial = 1:1e6,
               Position = 1:8) %>%
        mutate(Attempts = case_when(
                 rbinom(n(), 1, 1 / (9 - Position)) == 1 ~ 1,
                 rbinom(n(), 1, 1 / (8 - Position)) == 1 ~ 2,
                 rbinom(n(), 1, 1 / (7 - Position)) == 1 ~ 3,
                 rbinom(n(), 1, 1 / (6 - Position)) == 1 ~ 4,
                 rbinom(n(), 1, 1 / (5 - Position)) == 1 ~ 5,
                 rbinom(n(), 1, 1 / (4 - Position)) == 1 ~ 6,
                 rbinom(n(), 1, 1 / (3 - Position)) == 1 ~ 7,
                 TRUE ~ 8) # This should only happen on first position
               )

Test Simulation

I visually checked and it made sense, but I wanted a quick sanity check. There should never be more Attempts then Reindeer_left.

sim %>%
  mutate(Reindeer_Left_Start = 9 - Position,
         Test = Reindeer_Left_Start < Attempts) %>% 
  summarise(sum(Test))

## # A tibble: 1 x 1
##   `sum(Test)`
##         <int>
## 1           0

Group by Trial for summary stats

Since it is one minute per attempt, I can use attempts = minutes.

sim %>% 
  group_by(trial) %>% 
  summarise(Total_Minutes = sum (Attempts)) %>% 
  summarise(Min_Time = min(Total_Minutes),
            Max_Time = max(Total_Minutes),
            Median_Time = median(Total_Minutes),
            Avg_Time = mean(Total_Minutes))

## # A tibble: 1 x 4
##   Min_Time Max_Time Median_Time Avg_Time
##      <dbl>    <dbl>       <dbl>    <dbl>
## 1        8       36          22     22.0

Riddler Express Answer:

Approximately 22 minutes

Extra Credit:

Extra credit: Is there a strategy that Santa could use that does better?

Riddler Classic: Santa needs some help with math

From Steven Pratt, the best way to spread Christmas cheer is singing loud for all to hear:

In Santa’s workshop, elves make toys during a shift each day. On the overhead radio, Christmas music plays, with a program randomly selecting songs from a large playlist.

During any given shift, the elves hear 100 songs. A cranky elf named Cranky has taken to throwing snowballs at everyone if he hears the same song twice. This has happened during about half of the shifts. One day, a mathematically inclined elf named Mathy tires of Cranky’s sodden outbursts. So Mathy decides to use what he knows to figure out how large Santa’s playlist actually is.

Help Mathy out: How large is Santa’s playlist?

Set up Simulation

sim2 <- crossing(trial = 1:100,
                 playlist = seq(from = 1000, to = 10000, by = 100),
                 songs_played = 100) %>% 
        rowwise() %>% 
        mutate(times_same_song_played = sum(duplicated(
                                            sample(playlist, 100, replace = TRUE))),
               same_song_played = ifelse(times_same_song_played > 0, TRUE, FALSE)
               )
sim2

## Source: local data frame [9,100 x 5]
## Groups: <by row>
## 
## # A tibble: 9,100 x 5
##    trial playlist songs_played times_same_song_played same_song_played
##    <int>    <dbl>        <dbl>                  <int> <lgl>           
##  1     1     1000          100                      4 TRUE            
##  2     1     1100          100                      5 TRUE            
##  3     1     1200          100                      3 TRUE            
##  4     1     1300          100                      6 TRUE            
##  5     1     1400          100                      3 TRUE            
##  6     1     1500          100                      4 TRUE            
##  7     1     1600          100                      3 TRUE            
##  8     1     1700          100                      4 TRUE            
##  9     1     1800          100                      3 TRUE            
## 10     1     1900          100                      2 TRUE            
## # ... with 9,090 more rows

###Test Simluation & review results

test.sim2 <- sim2 %>% 
  group_by(playlist) %>% 
  summarise(percent_times_same_song_played = scales::percent(sum(same_song_played) / max(trial))) 

test.sim2

## # A tibble: 91 x 2
##    playlist percent_times_same_song_played
##       <dbl> <chr>                         
##  1     1000 100%                          
##  2     1100 98%                           
##  3     1200 99%                           
##  4     1300 97%                           
##  5     1400 98%                           
##  6     1500 94%                           
##  7     1600 94%                           
##  8     1700 93%                           
##  9     1800 96%                           
## 10     1900 94%                           
## # ... with 81 more rows

This seems to make sense. I am going to adjust my playlist length and rerun to narrow in on the 50% occurence of a repeated song, which appear to be near 7,000 range.

Adjustments to Simulation

sim3 <- crossing(trial = 1:100000,
                 playlist = seq(from = 6500, to = 7500, by = 25),
                 songs_played = 100) %>% 
        rowwise() %>% 
        mutate(times_same_song_played = sum(duplicated(
                                            sample(playlist, 100, replace = TRUE))),
               same_song_played = ifelse(times_same_song_played > 0, TRUE, FALSE)
               )

sim3 %>% 
  group_by(playlist) %>% 
  summarise(percent_times_same_song_played = sum(same_song_played) / max(trial))

## # A tibble: 41 x 2
##    playlist percent_times_same_song_played
##       <dbl>                          <dbl>
##  1     6500                          0.537
##  2     6525                          0.535
##  3     6550                          0.532
##  4     6575                          0.529
##  5     6600                          0.531
##  6     6625                          0.528
##  7     6650                          0.525
##  8     6675                          0.527
##  9     6700                          0.526
## 10     6725                          0.522
## # ... with 31 more rows

sim3 %>% 
  group_by(playlist) %>% 
  summarise(percent_times_same_song_played = sum(same_song_played) / max(trial)) %>% 
  ggplot(aes(x = playlist, y = percent_times_same_song_played)) +
  geom_line() +
  geom_hline(yintercept = .5, color = "red") +
  xlab('Songs on Playlist') +
  ylab('Percent Change of a Repeat Song')

Riddler Classic Answer:

Approximately 7,200 based on a quick review
I’m worried my logic was flawed on this one

Santa needs some help

Matt Leary

December 23, 2018