From: aaronshaw
Date: Thu, 5 Nov 2020 18:27:16 +0000 (-0600)
Subject: ch7b
X-Git-Url: https://code.communitydata.science/stats_class_2020.git/commitdiff_plain/18fa58b01b5debbe23d96b30fc08432674d5ad88?ds=sidebyside;hp=674fe432955a88fd5e2ad794624367b237f2e384

ch7b
---

diff --git a/os_exercises/ch7b_exercises_solutions.html b/os_exercises/ch7b_exercises_solutions.html
new file mode 100644
index 0000000..74dd6df
--- /dev/null
+++ b/os_exercises/ch7b_exercises_solutions.html
@@ -0,0 +1,1679 @@

Chapter 7 Textbook exercises (part b)

All exercises taken from the OpenIntro Statistics textbook, \(4^{th}\) edition, Chapter 7.


7.42 Work hours and education

(a) Hypotheses:

\(H_0:\) The mean hours worked for the groups are all equal.

\[\mu_{<~hs} = \mu_{hs} = \mu_{jc} = \mu_{ba} = \mu_{grad} \]

\(H_A:\) The mean hours worked vary by education level. In other words, at least one group mean differs from the others.

(b) Conditions and assumptions necessary for unbiased ANOVA estimates include:

Independent observations, normal(ish) distributions, and constant(ish) variance. The problem doesn’t say much about the sample to help evaluate the independence of the observations, but the sample is definitely less than 10% of the population and is likely a fairly good approximation of a random sample (thereby satisfying the rule of thumb). Judging from the boxplots, the distributions all look fairly normal. The standard deviations are also similar. We’ll assume that the conditions are met for the purposes of the test.

(c) Working across the rows of the table, we can fill in the blanks:

  • The degrees of freedom for degree \(= 5-1 = 4\)
  • The Sum of Squares between degree levels \(= 501.54 \times 4 = 2006.16\)
  • The F value \(= Mean~Sq~degree / Mean~Sq~residuals = 501.54 / 229.12 = 2.189\)
  • The degrees of freedom for Residuals \(= 1171-4 = 1167\)
  • Mean Square Residuals (Error) \(= 267382/1167 = 229.12\)
  • Total degrees of freedom \(=1172 - 1 = 1171\)
  • Total Sum of squares \(=2006.16+267382 = 269388.16\)

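The fill-in arithmetic for the ANOVA table can be checked with a few lines of base R. This is a minimal sketch, not part of the original solution: the variable names are hypothetical, and the inputs are the values quoted in the answer (five education groups, 1,172 respondents, a Mean Sq for degree of 501.54, and a residual Sum Sq of 267,382).

```r
# Values quoted in the solution (variable names are illustrative)
k           <- 5        # number of education groups
n           <- 1172     # total number of respondents
ms_degree   <- 501.54   # Mean Sq for degree
ss_residual <- 267382   # Sum Sq for residuals

df_degree   <- k - 1                    # between-group degrees of freedom
df_total    <- n - 1                    # total degrees of freedom
df_residual <- df_total - df_degree     # residual degrees of freedom

ss_degree   <- ms_degree * df_degree    # Sum Sq = Mean Sq * df
ms_residual <- ss_residual / df_residual
f_value     <- ms_degree / ms_residual  # F = Mean Sq degree / Mean Sq residuals
ss_total    <- ss_degree + ss_residual

# Collect the filled-in cells in one named vector
round(c(df_degree = df_degree, df_residual = df_residual, df_total = df_total,
        ss_degree = ss_degree, ms_residual = ms_residual,
        f_value = f_value, ss_total = ss_total), 3)
```

Note that the F statistic is a ratio of the two mean squares, so the sums of squares only enter through the division by their degrees of freedom.
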
(d) According to the ANOVA results, we cannot reject the null hypothesis at the \(\alpha = 0.05\) level. The data do not provide convincing evidence that the mean number of hours worked per week differs across education levels.
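
To connect the F statistic to the decision in (d), the upper-tail probability of the F distribution can be computed with `pf()` from base R. This is a minimal sketch, assuming the F value and degrees of freedom worked out in (c); the variable names are illustrative.

```r
f_value     <- 2.189   # F statistic from part (c)
df_degree   <- 4       # between-group degrees of freedom
df_residual <- 1167    # residual degrees of freedom
alpha       <- 0.05    # significance level

# Upper-tail area: probability of an F at least this large if H_0 is true
p_value <- pf(f_value, df1 = df_degree, df2 = df_residual, lower.tail = FALSE)

# TRUE would mean rejecting H_0; here the p-value exceeds alpha
reject_null <- p_value < alpha
p_value
reject_null
```

Setting `lower.tail = FALSE` matters here, because the ANOVA F test is one-sided: only large values of F count as evidence against \(H_0\).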

7.44 Child care hours

(a) \(H_0\): The average number of child care hours is the same for all attainment levels: \(\mu_{College}~=~\mu_{TechOrVoc}~=~\mu_{UMS}~=~\mu_{LMS}~=~\mu_{PS}\)

\(H_A\): At least one pair of means is different.

(b) Since \(p~>~0.05\), the results fail to reject \(H_0\). The data do not provide convincing evidence of a difference in the average number of hours spent on child care across educational attainment levels.

7.46 True/False ANOVA questions

(a) False. The ANOVA procedure does not evaluate pairwise comparisons, only the overall variation across groups.

(b) True. Otherwise, the F-value would not be large enough to reject the null hypothesis.

(c) False. It is possible that none of the pairwise comparisons will be significantly different even if the ANOVA rejects the null.

(d) Assuming this question is about the Bonferroni correction, False. The correction does not divide \(\alpha\) by the number of groups, but rather by the number of pairwise tests. In this case, 4 groups yield \({4 \choose 2} = 6\) pairs, meaning that the corrected significance level is \(\alpha = 0.05/6 \approx 0.0083\). Other corrections exist even though they were not discussed in the book (and the Reinhart reading), and they may use other values for \(\alpha\) or other procedures.
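
The Bonferroni arithmetic in (d) can be reproduced in base R. This is a sketch under the assumptions stated in the answer (4 groups, overall \(\alpha = 0.05\)); the p-values passed to `p.adjust()` are made up purely for illustration and do not come from any exercise.

```r
k     <- 4      # number of groups
alpha <- 0.05   # overall significance level

n_pairs          <- choose(k, 2)     # number of pairwise comparisons
alpha_bonferroni <- alpha / n_pairs  # per-comparison threshold

# Equivalent approach: inflate the p-values instead of shrinking alpha.
# These six p-values are hypothetical, not taken from any exercise.
p_raw      <- c(0.004, 0.020, 0.030, 0.200, 0.450, 0.800)
p_adjusted <- p.adjust(p_raw, method = "bonferroni")

n_pairs
round(alpha_bonferroni, 4)
p_adjusted < alpha   # same decisions as p_raw < alpha_bonferroni
```

Comparing raw p-values against the reduced threshold and comparing adjusted p-values against the original \(\alpha\) lead to the same decisions; `p.adjust()` simply multiplies each p-value by the number of tests and caps the result at 1.
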
diff --git a/os_exercises/ch7b_exercises_solutions.pdf b/os_exercises/ch7b_exercises_solutions.pdf
new file mode 100644
index 0000000..f5bb071
Binary files /dev/null and b/os_exercises/ch7b_exercises_solutions.pdf differ
diff --git a/os_exercises/ch7b_exercises_solutions.rmd b/os_exercises/ch7b_exercises_solutions.rmd
new file mode 100644
index 0000000..f3cc95c
--- /dev/null
+++ b/os_exercises/ch7b_exercises_solutions.rmd
@@ -0,0 +1,71 @@
+---
+title: "Chapter 7 Textbook exercises (part b)"
+subtitle: "Solutions to even-numbered questions  \nStatistics and statistical programming  \nNorthwestern University  \nMTS
+  525"
+author: "Aaron Shaw"
+date: "November 5, 2020"
+output:
+  html_document:
+    toc: yes
+    toc_depth: 3
+    toc_float:
+      collapsed: false
+      smooth_scroll: true
+    theme: readable
+  pdf_document:
+    toc: no
+    toc_depth: '3'
+    latex_engine: xelatex
+header-includes:
+  - \newcommand{\lt}{<}
+  - \newcommand{\gt}{>}
+  - \renewcommand{\leq}{≤}
+  - \usepackage{lmodern}
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+
+```
+
+
+All exercises taken from the *OpenIntro Statistics* textbook, $4^{th}$ edition, Chapter 7.
+
+# 7.42 Work hours and education
+(a) Hypotheses:
+
+$H_0:$ The mean hours worked for the groups are all equal.
+
+$$\mu_{\lt~hs} = \mu_{hs} = \mu_{jc} = \mu_{ba} = \mu_{grad} $$
+$H_A:$ The mean hours worked vary by education level. In other words, at least one group mean differs from the others.
+
+(b) Conditions and assumptions necessary for unbiased ANOVA estimates include:
+
+Independent observations, normal(ish) distributions, and constant(ish) variance. The problem doesn't say much about the sample to help evaluate the independence of the observations, but the sample is definitely less than 10% of the population and is likely a fairly good approximation of a random sample (thereby satisfying the rule of thumb). Judging from the boxplots, the distributions all look fairly normal. The standard deviations are also similar. We'll assume that the conditions are met for the purposes of the test.
+
+(c) Working across the rows of the table, we can fill in the blanks:
+
+* The degrees of freedom for degree $= 5-1 = 4$
+* The Sum of Squares between degree levels $= 501.54 \times 4 = 2006.16$
+* The F value $= Mean~Sq~degree / Mean~Sq~residuals = 501.54 / 229.12 = 2.189$
+* The degrees of freedom for Residuals $= 1171-4 = 1167$
+* Mean Square Residuals (Error) $= 267382/1167 = 229.12$
+* Total degrees of freedom $=1172 - 1 = 1171$
+* Total Sum of squares $=2006.16+267382 = 269388.16$
+
+(d) According to the ANOVA results, we cannot reject the null hypothesis at the $\alpha = 0.05$ level. The data do not provide convincing evidence that the mean number of hours worked per week differs across education levels.
+
+# 7.44 Child care hours
+
+(a) 
+$H_0$: The average number of child care hours is the same for all attainment levels: $\mu_{College}~=~\mu_{TechOrVoc}~=~\mu_{UMS}~=~\mu_{LMS}~=~\mu_{PS}$
+$H_A$: At least one pair of means is different.
+
+(b) Since $p~\gt~0.05$, the results fail to reject $H_0$. The data do not provide convincing evidence of a difference in the average number of hours spent on child care across educational attainment levels.
+
+# 7.46 True/False ANOVA questions
+
+(a) False. The ANOVA procedure does not evaluate pairwise comparisons, only the overall variation across groups.
+(b) True. Otherwise, the F-value would not be large enough to reject the null hypothesis.
+(c) False. It is possible that none of the pairwise comparisons will be significantly different even if the ANOVA rejects the null.
+(d) Assuming this question is about the Bonferroni correction, False. The correction does not divide $\alpha$ by the number of groups, but rather by the number of pairwise tests. In this case, 4 groups yield ${4 \choose 2} = 6$ pairs, meaning that the corrected significance level is $\alpha = 0.05/6 \approx 0.0083$. Other corrections exist even though they were not discussed in the book (and the Reinhart reading), and they may use other values for $\alpha$ or other procedures.
\ No newline at end of file