Populations and Samples

The probability, \(p\), that a randomly selected member of a population has a particular attribute is known as the population proportion. The population proportion is usually difficult to obtain because it is often impractical to survey an entire population.
Thus, in statistics, we use a sample proportion, \(\hat{P}\), to approximate the true population proportion. The sample proportion is the proportion of members of a randomly selected sample (a subset of the population) that have the trait; equivalently, it is the probability that a member chosen at random from that sample has the trait.
In short, \(p\) is the true probability and is fixed, whereas \(\hat{P}\) is an estimate of it and varies from sample to sample.
\[p = \frac{\text{Number in population with trait}}{\text{Population size}}\] \[\hat{P} = \frac{\text{Number in sample with trait}}{\text{Sample size}}\]
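To make the difference between \(p\) and \(\hat{P}\) concrete, the short Python sketch below builds a hypothetical population of 10 000 people, 3 000 of whom have the trait (so \(p = 0.3\)), and then draws several random samples of size 50. The population size, the counts and the helper name `proportion_with_trait` are illustrative assumptions, not data from this chapter.

```python
import random

def proportion_with_trait(group):
    """Number with the trait divided by group size (True means 'has the trait')."""
    return sum(group) / len(group)

# Hypothetical population: 3000 of 10000 members have the trait.
population = [True] * 3000 + [False] * 7000
p = proportion_with_trait(population)  # fixed value: 0.3

rng = random.Random(2024)  # seeded so the run is repeatable
# Each sample of 50 gives its own estimate p_hat; these vary between samples.
p_hats = [proportion_with_trait(rng.sample(population, 50)) for _ in range(5)]

print("p =", p)
print("p_hat values:", p_hats)
```

Every run prints the same \(p\), while the five \(\hat{P}\) values generally differ from one another and from 0.3.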

Sampling Proportions

In the discrete probability chapter we looked at the number of times an event occurs, such as the number of successes in \(n\) trials. Very little changes in this chapter, except that we are now interested in the proportion of trials in which the event occurs. For example, in the discrete probability chapter \(X\) may have represented the number of red balls drawn from a bag. In this chapter, \(\hat{P}\) would instead represent the proportion of the drawn balls that are red. That is, if \(n\) balls are drawn, \[\hat{P} = \frac{X}{n}\]


\[ \text{Example 13.1: 80\% of the balls in a bag are red.}\\ \text{Create a sampling distribution table of the proportion of red balls selected if two balls are drawn with replacement.}\\ \text{ }\\ \text{Let } \hat{P} \text{ be the proportion of red balls selected}\\ \begin{aligned} \varepsilon &= \{0,\frac{1}{2},1\}\\ \text{ }\\ \Pr(\hat{P} = 1) &= 0.80 \cdot 0.80 \\ &= 0.64\\ \Pr(\hat{P} = \frac{1}{2}) &= \binom{2}{1} \cdot 0.8 \cdot 0.2\\ &= 0.32\\ \Pr(\hat{P} = 0) &= 0.2 \cdot 0.2 \\ &=0.04\\ \end{aligned}\\ \]

\[ \begin{array}{c|ccc} \hat{p} & 0 & \frac{1}{2} & 1 \\ \hline \Pr(\hat{P}=\hat{p}) & 0.04 & 0.32 & 0.64 \end{array} \]

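Compare this with the corresponding question in the discrete probability chapter. The probabilities are the same as those for \(X\), the number of red balls drawn; only the values change, since each value \(x\) of \(X\) becomes the proportion \(\frac{x}{2}\). Your knowledge from the discrete probability and binomial distribution chapters will help you solve all questions like this.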
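Because \(\hat{P} = \frac{X}{n}\) with \(X \sim \text{Bin}(n, p)\), a sampling distribution table can be produced directly from binomial probabilities: each value \(x\) of \(X\) becomes the value \(\frac{x}{n}\) of \(\hat{P}\) with the same probability. A minimal Python sketch, assuming independent draws as in Example 13.1 (the function name `sample_proportion_distribution` is our own):

```python
from fractions import Fraction
from math import comb

def sample_proportion_distribution(n, p):
    """Map each possible p_hat = x/n to Pr(X = x) for X ~ Bin(n, p)."""
    return {
        Fraction(x, n): comb(n, x) * p**x * (1 - p)**(n - x)
        for x in range(n + 1)
    }

# Example 13.1: two balls drawn, 80% of the balls are red.
dist = sample_proportion_distribution(2, 0.8)
for p_hat, prob in dist.items():
    print(f"p_hat = {p_hat}: {prob:.2f}")
# prints, one per line: p_hat = 0: 0.04, p_hat = 1/2: 0.32, p_hat = 1: 0.64
```

Exact fractions are used for the \(\hat{P}\) values so that entries such as \(\frac{1}{2}\) display exactly rather than as rounded decimals.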

Mean, Variance and Standard Deviation

Since \(\hat{P} = \frac{X}{n}\), where \(X \sim \text{Bin}(n, p)\), the mean and variance formulae for the binomial distribution give the mean, variance and standard deviation of the sample proportion:
\[ \begin{aligned} E(\hat{P}) &= p \\ Var(\hat{P}) &= \frac{p(1-p)}{n} \\ sd(\hat{P}) &= \sqrt{Var(\hat{P})} \end{aligned} \]
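Each formula follows in one step from the binomial results \(E(X) = np\) and \(Var(X) = np(1-p)\), together with the rules \(E(aX) = aE(X)\) and \(Var(aX) = a^2 Var(X)\) for a constant \(a\), here \(a = \frac{1}{n}\):

\[ \begin{aligned} E(\hat{P}) &= E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p \\ Var(\hat{P}) &= Var\left(\frac{X}{n}\right) = \frac{1}{n^2}Var(X) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n} \\ sd(\hat{P}) &= \sqrt{\frac{p(1-p)}{n}} \end{aligned} \]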

Approximating the Distribution

For the binomial distribution, we mentioned that if \(np\) and \(n(1-p)\) are both greater than 5, then the binomial distribution can be reasonably approximated by the normal distribution. Similarly, when a sufficiently large sample is selected from a sufficiently large population, the distribution of the sample proportion \(\hat{P}\) can be approximated by a normal distribution with mean \(p\) and standard deviation \(\sqrt{\frac{p(1-p)}{n}}\). Only use a normal approximation if the question asks you to; in all other cases, use the binomial distribution to calculate an exact probability.
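As a sketch of how the rule of thumb might be checked before approximating, the Python function below (a hypothetical helper, `normal_approx_for_p_hat`) assumes the common condition that \(np\) and \(n(1-p)\) both exceed 5 and, if it holds, returns the mean and standard deviation of the approximating normal distribution for \(\hat{P}\).

```python
from math import sqrt

def normal_approx_for_p_hat(n, p, threshold=5):
    """Return (mean, sd) of the approximating normal for p_hat = X/n,
    or None if np > threshold and n(1 - p) > threshold do not both hold."""
    if n * p <= threshold or n * (1 - p) <= threshold:
        return None
    return p, sqrt(p * (1 - p) / n)

print(normal_approx_for_p_hat(600, 1 / 6))  # 600 rolls of a fair die
print(normal_approx_for_p_hat(2, 0.8))      # None: only two draws
```

The threshold is a parameter because textbooks differ on the exact cut-off; 5 matches the rule quoted in this section.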

Example 13.2: Use the normal approximation to the binomial distribution to find the approximate probability, correct to 4 decimal places, that in the next 600 rolls of a fair die, the proportion of sixes rolled will be less than 0.2.

\[ \text{Let } \hat{P} \text{ be the proportion of sixes rolled}\\ \begin{aligned} E(\hat{P}) &= p\\ &= \frac{1}{6}\\ sd(\hat{P}) &= \sqrt{\frac{p(1-p)}{n}}\\ &= \sqrt{\frac{\frac{1}{6}(1-\frac{1}{6})}{600}}\\ &= \sqrt{\frac{\frac{5}{36}}{600}}\\ &= \sqrt{\frac{1}{4320}}\\ \text{ }\\ \end{aligned}\\ \text{Using a normal approximation: }\\ \hat{P} \sim N(\frac{1}{6}, (\sqrt{\frac{1}{4320}})^2)\\ \begin{aligned} \Pr(\hat{P}<0.2) &\approx normCdf(0, 0.2, \frac{1}{6}, \sqrt{\frac{1}{4320}})\\ &= 0.9858\\ \end{aligned}\\ \]
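The working can be checked in Python with the standard library's `statistics.NormalDist`, whose `cdf` method plays the role of the calculator's `normCdf`. The last lines also compute the exact binomial probability for comparison; this is a sketch for checking answers, not a replacement for the working shown above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 600, 1 / 6
approx = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Mirrors normCdf(0, 0.2, 1/6, sqrt(1/4320)): area between 0 and 0.2.
normal_prob = approx.cdf(0.2) - approx.cdf(0)

# Exact binomial for comparison: p_hat < 0.2 means X < 120, i.e. X <= 119.
exact_prob = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(120))

print(f"normal approximation: {normal_prob:.4f}")  # 0.9858
print(f"exact binomial:       {exact_prob:.4f}")
```

The two values are close but not identical, which is why an exact binomial calculation is preferred unless the question asks for the approximation.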

Confidence Intervals and Margins of Error

Many of the questions in this chapter have given you a value for \(p\), but in reality the population proportion is generally unknown. (What is the point of sampling if we already know the true proportion of people in a population with a particular trait?)

The value of the sample proportion, \(\hat{P}\), can be used as a point estimate (proxy) of the population proportion, \(p\). However, because the sample proportion varies from sample to sample, it is better to account for the margin of error, \(M\), by using a confidence interval, \((\hat{P} - M, \hat{P} + M)\), which gives a range of values within which \(p\) is likely to lie. As expected, the margin of error decreases as the sample size increases: since the standard deviation of \(\hat{P}\) is proportional to \(\frac{1}{\sqrt{n}}\), quadrupling the sample size halves the margin of error (for the same \(\hat{P}\) and confidence level).
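In general, the margin of error for a \(C\%\) confidence interval is \(M = k\sigma\), where \(Z \sim N(0,1)\), \(\Pr(-k < Z < k) = \frac{C}{100}\), and the standard deviation is estimated using \(\hat{P}\) in place of the unknown \(p\): \(\sigma = \sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\). For example, a 95% confidence interval uses \(k \approx 1.96\).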
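The margin-of-error formula can be sketched in Python using `statistics.NormalDist`, whose `inv_cdf` method plays the role of the calculator's `invNorm`. The function name `margin_of_error` is our own, and the loop uses an illustrative \(\hat{P} = 0.25\) to show how \(M\) shrinks as \(n\) grows.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """M = k * sqrt(p_hat(1 - p_hat)/n), where Pr(-k < Z < k) = confidence/100."""
    # Pr(Z < k) = (1 + C/100)/2, because each tail holds half of the remaining area.
    k = NormalDist().inv_cdf((1 + confidence / 100) / 2)
    return k * sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling n halves M (same p_hat and confidence level).
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.25, n, 95), 4))
```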

It is important to understand what a confidence interval actually represents. It is not correct to say that there is a \(C\%\) chance that the population proportion, \(p\), lies within a particular calculated confidence interval. What is true is that if confidence intervals were calculated in the same way for many different samples, we would expect about \(C\%\) of those intervals to contain the population proportion. Remember, the population proportion is a fixed number: for a given confidence interval, \(p\) is either in it or not in it.
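This long-run interpretation can be illustrated by simulation. The sketch below assumes a hypothetical true proportion \(p = 0.25\), repeatedly draws samples of size 100, builds a 90% interval from each sample using \(\hat{P}\) in the standard deviation, and reports the fraction of intervals that contain \(p\). The function name `coverage` and the chosen values are illustrative only. Because this interval is itself based on a normal approximation, the printed fraction should be near 0.9 but need not equal it.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(p, n, confidence, trials, seed=0):
    """Fraction of simulated C% intervals (p_hat - M, p_hat + M) that contain p."""
    rng = random.Random(seed)  # seeded so results are repeatable
    k = NormalDist().inv_cdf((1 + confidence / 100) / 2)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))  # one sample: X ~ Bin(n, p)
        p_hat = x / n
        m = k * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - m < p < p_hat + m:
            hits += 1
    return hits / trials

# Hypothetical true p = 0.25, samples of 100, 90% intervals.
print(coverage(0.25, 100, 90, trials=2000))
```

Any single interval from this simulation either contains 0.25 or it does not; only the proportion of intervals that do is close to 90%.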

\[ \text{Example 13.3: Find a 90\% confidence interval, correct to four decimal places, for the proportion, }p, \\ \text{of people who like pineapple on pizza, given that 25 out of 100 randomly selected people like it.}\\ \text{ }\\ Z \sim N(0,1)\\ \begin{aligned} \Pr(-k < Z < k) &= 0.9\\ \Pr(Z < -k) &= \frac{1-0.9}{2}\\ &= 0.05\\ -k &= invNorm(0.05, 0, 1)\\ -k &= -1.6448\ldots\\ k &= 1.6448\ldots\\ \end{aligned} \text{ }\\ \text{Let } \hat{P} \text{ be the proportion of people in the sample who like it} \\ \begin{aligned} \hat{P} &= \frac{25}{100}\\ \hat{P} &= \frac{1}{4}\\ \text{ }\\ sd(\hat{P}) &\approx \sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\\ &= \sqrt{\frac{\frac{1}{4}(1-\frac{1}{4})}{100}}\\ &= \sqrt{\frac{\frac{1}{4} \cdot \frac{3}{4}}{100}}\\ &= \sqrt{\frac{\frac{3}{16}}{100}}\\ &= \sqrt{\frac{3}{1600}}\\ &= \frac{\sqrt{3}}{40}\\ \text{ }\\ M &= k \sigma\\ &= 1.6448\ldots \times \frac{\sqrt{3}}{40}\\ &= 0.0712\ldots\\ \text{ }\\ 90 \% \, CI &= (\hat{P} - M, \hat{P} + M)\\ &= (0.25 - 0.0712\ldots, 0.25 + 0.0712\ldots)\\ &= (0.1788, 0.3212)\\ \end{aligned} \]
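The steps of Example 13.3 can be reproduced in Python as a check on the hand working. This is a sketch using `statistics.NormalDist().inv_cdf` in place of `invNorm`; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

x, n, confidence = 25, 100, 90  # data from Example 13.3
p_hat = x / n

# k satisfies Pr(-k < Z < k) = 0.9, i.e. Pr(Z < -k) = 0.05.
k = -NormalDist().inv_cdf((1 - confidence / 100) / 2)
sd = sqrt(p_hat * (1 - p_hat) / n)  # uses p_hat, since p is unknown
m = k * sd

lower, upper = p_hat - m, p_hat + m
print(f"k = {k:.4f}, M = {m:.4f}")            # k = 1.6449, M = 0.0712
print(f"90% CI: ({lower:.4f}, {upper:.4f})")  # 90% CI: (0.1788, 0.3212)
```

Rounding is left until the final line, matching the advice to carry full calculator precision through intermediate steps.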