All exercises taken from the OpenIntro Statistics textbook, \(4^{th}\) edition, Chapter 3.
This one is all about conditional and compound probabilities and could be represented as a tree diagram (if you find those useful).
\[\begin{array}{l} P(support~|~college) = \frac{P(support~and~college)}{P(college)}\\ \phantom{P(support~|~college)} = \frac{0.1961}{0.1961 + 0.2068}\\ \phantom{P(support~|~college)} = 0.49 \end{array}\]
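If you want to check the arithmetic numerically, here is a minimal Python sketch. The variable names are my own, and the two joint probabilities are simply the values that appear in the fraction above.

```python
# Joint probabilities taken from the worked equation above.
p_support_and_college = 0.1961
p_other_and_college = 0.2068  # the second term in the denominator above

# P(college) is the sum of the joint probabilities that involve college.
p_college = p_support_and_college + p_other_and_college

# Definition of conditional probability:
# P(support | college) = P(support and college) / P(college)
p_support_given_college = p_support_and_college / p_college

print(round(p_support_given_college, 2))  # 0.49
```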
Once you have one person’s birthday, the probability that the second person has the same birthday is:
\[P(first~two~share~birthday) = \frac{1}{365} = 0.0027\]
This one is more challenging! There are many possible approaches, but I find it easiest to think about the probability that none of the three share a birthday in the following way: start with the probability that the first two don’t share a birthday, followed by the probability that the next person doesn’t share a birthday either. This makes it possible to apply the general multiplication rule:
\[\begin{array}{l} P(at~least~two~share~birthday) = 1-P(none~of~three~share~birthday)\\ \phantom{P(at~least~two~share~birthday)}=1-P(first~two~don't~share) \times P(third~doesn't~share~either)\\ \phantom{P(at~least~two~share~birthday)}=1-(\frac{364}{365}) \times (\frac{363}{365})\\ \phantom{P(at~least~two~share~birthday)}=0.0082 \end{array}\]
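The same complement-and-multiply logic extends to any group size. Below is a minimal Python sketch; the function name `p_at_least_two_share` is hypothetical, and it assumes 365 equally likely birthdays (no leap years) that are independent across people.

```python
from fractions import Fraction


def p_at_least_two_share(n_people, days=365):
    """P(at least two of n_people share a birthday).

    Assumes every day is equally likely and birthdays are independent.
    """
    p_none_share = Fraction(1)
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_none_share *= Fraction(days - k, days)
    # Complement rule: "at least two share" is the opposite of "none share".
    return 1 - p_none_share


print(round(float(p_at_least_two_share(2)), 4))  # 0.0027
print(round(float(p_at_least_two_share(3)), 4))  # 0.0082
```

Using exact fractions avoids rounding until the final step, which makes it easy to compare against the hand calculation.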
First, the average fee per passenger (let’s call that \(\bar{F}\)) is the expected value of the fee, taken over the three possible fee levels (determined by the number of bags checked). This works out to the sum of the fee at each level (let’s write this \(f_{b}\)) times the probability (here, the proportion) of passengers checking that number of bags (call that \(P_{b}\)). We can now put that in slightly more formal notation and work out the arithmetic:
\[\begin{array}{l} \bar{F} = E(Fee~per~passenger) = \sum_{b=0}^2{(f_{b}\times P_{b})}\\ \phantom{\bar{F} } = \$0(0.54) + \$25(0.34) + \$60(0.12)\\ \phantom{\bar{F} } = \$0 + \$8.5 + \$7.2 = \$15.70 \end{array}\]
To calculate the standard deviation of the fee per passenger, we need to find the square root of the variance. To find the variance, we need to find the squared deviation (the squared difference from the expected value) at each fee level, multiply those squared deviations by the probability (again, the proportion of passengers) of the respective fee levels, and then sum them up. Here’s what that looks like:
\[\begin{array}{l|r r} \text{Bags} & (F - E(F))^2 = & \text{Squared deviation}\\ \hline 0 & (0-15.70)^2 = & 246.49 \\ 1 & (25-15.70)^2 = & 86.49 \\ 2 & (60-15.70)^2 = & 1962.49 \end{array}\]
\[\begin{array}{l| r r} \text{Bags} & (F - E(F))^2\times P(F) = & \text{Weighted squared deviation}\\ \hline 0 & 246.49 \times 0.54 =& 133.10\\ 1 & 86.49 \times 0.34 =& 29.41\\ 2 & 1962.49 \times 0.12 =& 235.50 \end{array}\]
I sum that last column of values to find the variance (traditionally written \(\sigma^2\), the Greek letter sigma squared):
\[{\sigma_{\bar{F}}}^2 = \$133.10 + \$29.41 + \$235.50 = \$398.01\]
And take the square root to find the standard deviation (traditionally written \(\sigma\), the Greek letter sigma):
\[ \sigma_{\bar{F}} = \sqrt{\$398.01} = \$19.95 \]
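The whole calculation for a single passenger can be reproduced in a few lines. This is a minimal Python sketch using the fee levels and proportions from the exercise; the variable names are my own.

```python
import math

fees = [0, 25, 60]          # dollars charged for 0, 1, or 2 checked bags
probs = [0.54, 0.34, 0.12]  # proportion of passengers at each fee level

# Expected value: each fee weighted by its probability.
expected_fee = sum(f * p for f, p in zip(fees, probs))

# Variance: squared deviations from the expected value, weighted by probability.
variance = sum((f - expected_fee) ** 2 * p for f, p in zip(fees, probs))

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(f"{expected_fee:.2f} {variance:.2f} {sd:.2f}")  # 15.70 398.01 19.95
```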
To calculate this using the tools introduced in the chapter, we’ll need to assume independence between the baggage choices of individual passengers. This is probably wrong, but maybe not catastrophic for the precision of our estimate? Who knows.
Once we assume independence between passengers, we can calculate the expected total revenue (let’s call that \(E(revenue)\)) by summing the individual expected revenue over the 120 passengers. We can calculate the corresponding standard deviation of the total revenue by summing the individual variances and then taking the square root of that sum. Plug and chug using the values we calculated in Part a of this exercise to find the answers:
\[\begin{array}{r r r} E(revenue) =& 120 \times \$15.70 =& \$1{,}884\\ {\sigma_{E(revenue)}}^2 =& 120 \times \$398.01 =& \$47{,}761.20\\ {\sigma_{E(revenue)}}\phantom{^2} =& \sqrt{\$47{,}761.20} =& \$218.54 \end{array}\]
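Here is the same plug-and-chug step as a minimal Python sketch. The variable names are my own, the per-passenger mean and variance are the Part a values, and the variance line is only valid under the independence assumption discussed above.

```python
import math

n_passengers = 120
fee_mean = 15.70       # expected fee per passenger (Part a)
fee_variance = 398.01  # variance of the fee per passenger (Part a)

# Expected values always add, whether or not passengers are independent.
total_mean = n_passengers * fee_mean

# Variances add only if passengers' baggage choices are independent.
total_variance = n_passengers * fee_variance
total_sd = math.sqrt(total_variance)

print(f"{total_mean:.2f} {total_variance:.2f} {total_sd:.2f}")  # 1884.00 47761.20 218.54
```

Note that the standard deviation grows with the square root of the number of passengers, not with the number itself: 218.54 is about \(\sqrt{120}\) times the per-passenger value, not 120 times.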
The distribution is right skewed, with a median somewhere around \$35,000 to \$50,000. There’s a long tail out to the right (high positive values).
By the addition rule:
\[P(Income <\$50k) = 2.2 + 4.7 + 15.8 + 18.3 + 21.2 = 62.2\%\]
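If we assume that income and gender are independent, the multiplication rule for independent events gives:
\[P(Income <\$50k~and~female) = P(Income <\$50k) \times P(female) = 0.622 \times 0.41 = 0.255\]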