Part 1

Here are the data we are working with.

We create a grid of 10001 points along \([0, 1]\).
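Grid approximation works by evaluating the prior and the likelihood at each grid point and normalising their product. A minimal Python sketch of the idea (the data values `k` and `n` here are hypothetical placeholders, since the actual data are not shown in this excerpt):

```python
import math

# Hypothetical data: k successes out of n trials (placeholders -- the
# actual data for this exercise are not shown in the excerpt).
k, n = 6, 9

# Grid of 10001 evenly spaced points on [0, 1].
grid = [i / 10000 for i in range(10001)]

# Flat prior and binomial likelihood at each grid point.
prior = [1.0] * len(grid)
likelihood = [math.comb(n, k) * p**k * (1 - p)**(n - k) for p in grid]

# Unnormalised posterior, then normalise so it sums to 1.
unnorm = [li * pr for li, pr in zip(likelihood, prior)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]
```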

The posterior probabilities are obtained by multiplying the prior by the likelihood at each grid point and normalising.

We can draw many samples from the posterior and then calculate the quantiles of the sample.

# A tibble: 1 x 5
   q005   q25   q50   q75  q995
  <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.234 0.449 0.531 0.614 0.808
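The sample-then-quantile step can be sketched in Python as follows (again with hypothetical data, since the real data are not shown; `random.choices` draws from the grid weighted by the unnormalised posterior):

```python
import random
import statistics

random.seed(1)

# Rebuild a toy posterior on the grid (hypothetical data: 6 successes in 9 trials).
k, n = 6, 9
grid = [i / 10000 for i in range(10001)]
weights = [p**k * (1 - p)**(n - k) for p in grid]

# Draw many samples from the grid, weighted by the (unnormalised) posterior.
samples = random.choices(grid, weights=weights, k=100_000)

# statistics.quantiles with n=200 returns the 0.5%, 1.0%, ..., 99.5% cut points,
# so we can pick out the same quantiles as in the table above.
qs = statistics.quantiles(samples, n=200)
q005, q25, q50, q75, q995 = qs[0], qs[49], qs[99], qs[149], qs[198]
```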

Part 2

The only change here is specifying that the prior is 0 below 0.5.
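In the grid setup this amounts to one changed line: a step prior with zero mass below 0.5. A sketch, still using hypothetical data:

```python
# Step prior: zero mass below 0.5, flat above.
k, n = 6, 9  # hypothetical data, as before
grid = [i / 10000 for i in range(10001)]
prior = [0.0 if p < 0.5 else 1.0 for p in grid]

unnorm = [pr * p**k * (1 - p)**(n - k) for pr, p in zip(prior, grid)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]
# Every value drawn from this posterior is now >= 0.5.
```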

The lower bound for the quantiles is now higher.

# A tibble: 1 x 5
   q005   q25   q50   q75  q995
  <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.501 0.547 0.596 0.654 0.830

We can calculate the probability that the parameter lies within 0.05 of the true value by sampling from each posterior and averaging.

# A tibble: 2 x 2
  exercise prob_p_near_true
  <chr>               <dbl>
1 1                   0.130
2 2                   0.219
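The comparison can be sketched like this; the true value and the data are assumptions here (the excerpt does not state them; Part 3 below suggests a true value near 0.7):

```python
import random

random.seed(1)

p_true = 0.7  # assumed true value (not stated in the excerpt)
k, n = 6, 9   # hypothetical data, as before
grid = [i / 10000 for i in range(10001)]

def prob_near(prior):
    """Fraction of posterior samples within 0.05 of the assumed true value."""
    weights = [pr * p**k * (1 - p)**(n - k) for pr, p in zip(prior, grid)]
    samples = random.choices(grid, weights=weights, k=100_000)
    return sum(abs(s - p_true) < 0.05 for s in samples) / len(samples)

flat = [1.0] * len(grid)                        # prior from exercise 1
step = [0.0 if p < 0.5 else 1.0 for p in grid]  # prior from exercise 2
```

With the toy data above, the step prior concentrates its mass above 0.5, so `prob_near(step)` comes out larger than `prob_near(flat)`, matching the direction of the table.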

Here we see that the prior from this exercise has more probability mass around the true value.

Part 3

An easy way is to simulate the calculation for different values of \(N\), using our best guess for the true value we are trying to estimate. Note that we run multiple iterations for each value of \(N\).

# A tibble: 5 x 5
      N  q005   q50  q995  width
  <dbl> <dbl> <dbl> <dbl>  <dbl>
1    10 0.247 0.589 0.875 0.629 
2   100 0.570 0.695 0.801 0.231 
3  1000 0.659 0.697 0.734 0.0743
4  3000 0.676 0.698 0.719 0.0431
5 10000 0.685 0.697 0.709 0.0237

Thus it seems we need around 3000 trials for the central 99% interval to have a width below 0.05.
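This simulation can be sketched as follows: for each \(N\), draw simulated data at the assumed true value, compute the posterior on a grid (on the log scale, to avoid underflow at large \(N\)), and measure the width of the central 99% interval.

```python
import math
import random
import statistics

random.seed(1)
p_true = 0.7  # best guess at the true value (assumed)

def width99(N, iters=5, draws=20_000):
    """Average width of the central 99% posterior interval at sample size N."""
    grid = [i / 1000 for i in range(1, 1000)]  # interior points only, so log() is safe
    widths = []
    for _ in range(iters):
        # Simulate N Bernoulli trials at the assumed true value.
        k = sum(random.random() < p_true for _ in range(N))
        # Log-likelihood on the grid; subtract the max before exponentiating
        # so large N does not underflow to zero.
        logw = [k * math.log(p) + (N - k) * math.log(1 - p) for p in grid]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        samples = random.choices(grid, weights=w, k=draws)
        qs = statistics.quantiles(samples, n=200)  # 0.5% ... 99.5% cut points
        widths.append(qs[198] - qs[0])
    return sum(widths) / len(widths)
```

As in the table, the width shrinks roughly like \(1/\sqrt{N}\).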

Alternatively, we can use a normal approximation to get a rough idea of how large \(N\) must be. The standard deviation of the estimate is

\[ \sqrt{p_{true} (1 - p_{true}) / N}. \]

The central 99% interval of our estimate spans roughly plus or minus 3 standard deviations, so we need

\[ 6\sqrt{p_{true} (1 - p_{true}) / N} < 0.05 \]

which means

\[ N > (p_{true} (1 - p_{true})) / (0.05 / 6)^2. \]

Thus, \(N\) will be somewhere around

[1] 3024
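Plugging in an assumed true value of 0.7 reproduces this bound:

```python
# Normal-approximation bound: N > p(1 - p) / (0.05 / 6)^2,
# with p = 0.7 as the assumed true value.
p_true = 0.7
N_min = p_true * (1 - p_true) / (0.05 / 6) ** 2
print(round(N_min))  # 3024
```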