From: aaronshaw Date: Thu, 2 May 2019 02:31:36 +0000 (-0500) Subject: initial commit of week 5 pset solutions X-Git-Url: https://code.communitydata.science/stats_class_2019.git/commitdiff_plain/ff24cba0648615296d49a623cb08a03b90454f17 initial commit of week 5 pset solutions --- diff --git a/problem_sets/week_05/ps5-worked_solution.Rmd b/problem_sets/week_05/ps5-worked_solution.Rmd new file mode 100644 index 0000000..658fded --- /dev/null +++ b/problem_sets/week_05/ps5-worked_solution.Rmd @@ -0,0 +1,281 @@ +--- +title: 'Week 5 problem set: Worked solutions' +subtitle: "Statistics and statistical programming \nNorthwestern University \nMTS 525" +author: "Aaron Shaw" +date: "May 2, 2019" +output: html_document +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +## PC0 + +Load the dataset. As usual, you'll need to edit this block of code to point at the appropriate file location to make it work on your machine. +```{r} +pop <- read.delim(file="~/Documents/Teaching/2019/stats/data/week_05/population.tsv") +``` + +## PC1 + +```{r} +mean(pop$x, na.rm=T) + +## Insert code for your group as needed: +w3 <- read.csv(file="~/Documents/Teaching/2019/stats/data/week_03/group_03.csv", row.names=NULL) + +mean(w3$x, na.rm=T) +``` +Given the structure of the full dataset, it's also easy to calculate all of the group means: +```{r} +s.means <- tapply(pop$x, pop$group, mean, na.rm=T) +s.means +``` +We will discuss the relationship of the individual group means to population mean in class. + +## PC2 + +I'll do this two ways. First, just plugging the values into the formula for the standard error, I can then add/subtract twice the standard from the mean to find the 95% CI. +```{r} +se <- sd(w3$x, na.rm=T) / sqrt(length(w3$x)) +mean(w3$x, na.rm=T)-(2*se) +mean(w3$x, na.rm=T)+(2*se) +``` + +Now, I'll write a more general function to calculate confidence intervals. Note that I create an 'alpha' argument with a default value of 0.05. I then divide alpha by 2. Can you explain why this division step is necessary? + +```{r} +ci <- function (x, alpha=0.05) { + x <- x[!is.na(x)] + probs <- c(alpha/2, 1-alpha/2) + critical.values <- qnorm(probs, mean=0, sd=1) + se <- sd(x)/sqrt(length(x)) + mean(x) + critical.values * se +} + +## Here I run the function on the group 3 data +ci(w3$x) +``` + +Again, it's easy to estimate this for every group: + +```{r} +group.confints <- tapply(pop$x, pop$group, ci) +group.confints +``` +## PC3 + +We'll discuss this one in class. + +## PC4 + +Comparing the distributions (first for the single comparison) + +```{r} +hist(pop$x) +hist(w3$x) + +summary(pop$x) +sd(pop$x, na.rm=T) + +summary(w3$x) +sd(w3$x, na.rm=T) +``` + +And here's code to do it for each group. A ggplot2 "faceted" histogram will look great and makes for concise code. + +```{r} + +library(ggplot2) +ggplot(pop, aes(x)) + geom_histogram() + facet_wrap(. ~ group) + +tapply(pop$x, pop$group, summary) + +tapply(pop$x, pop$group, sd, na.rm=T) +``` +They all look a little bit different from each other and from the population distribution. We'll discuss these differences in class. + +## PC5 + +```{r} +s.means <- tapply(pop$x, pop$group, mean, na.rm=T) +s.means + +sd(s.means) + +## My standard error from one of the groups above: +se +``` +We will discuss the relationship of these values in class. + +## PC 6 + +### (a) +Since there's going to be some randomness in the next few commands, I'll set a seed value to ensure the results are consistent on other laptops. +```{r} +set.seed(20190501) +pop.unif <- sample(seq(0, 9), 10000, replace=TRUE) +``` + +### (b) + +```{r} +mean(pop.unif) +hist(pop.unif) +``` + +### (c) + +```{r} +hist( + sapply( rep(1, 100), function (x) { + mean(sample(pop.unif, 2))} + ) + ) + +``` + +## (d) +Same commands as (c) just changing the sample size +```{r} +hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 10))})) + +hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 100))})) +``` + +## PC7 + +We'll discuss this in class. + +# Statistical Questions + +## SQ1 — 5.22 + +(a) We want to calculate the following: + +$$ \overline{x}_{diff} \pm t^*_{df}\frac{s_{diff}}{\sqrt{n}}$$ + +Plug in the corresponding values and the Z-score for a 95% confidence interval (1.98): + +$$-0.545 \pm 1.98 \times \frac{8.887}{\sqrt{200}} $$ +I'll let R take it from here: + +```{r} +-0.545 - (1.98 * (8.887/sqrt(200))) +-0.545 + (1.98 * (8.887/sqrt(200))) +``` + +(b) With 95% confidence, the average difference between reading and writing test scores falls between -1.79 and 0.70. + +(c) No, because the interval includes/crosses zero. + +## SQ2 — 5.28 + +We want to test the following hypotheses: +$$H_0~:~\mu_{0.99}=\mu_{1}$$ +$$H_A~:~\mu_{0.99}\neq\mu_{1}$$ +To do so, we can use a two-sample t-test to compare the two sample means. The conditions we'd like to satisfy are independence and normality. Re: independence, the samples are random and not terribly large (presumably less than 10% of all the diamonds of each carat rating on earth), so we should be good to go. Re: normality, visual inspection of the histograms presented in the textbook suggests that what skew may be present in either distribution is not extreme. + +Given that the conditions are met, here's how you could construct the test statistic $T$: + +$$T~=~\frac{Point~estimate~-~Null}{Standard~error_{difference}}$$ + +Plugging in formulas from chapter 5 of the textbook this looks like: + +$$T~=~ \frac{(\overline{x}_1-\overline{x}_2)-(0)}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}$$ +Now, plug in values from the table: + +$$T~=~ \frac{(44.51-56.81)-(0)}{\sqrt{\frac{13.32^2}{23}+\frac{16.13^2}{23}}}$$ +Work that out and you should have $T~=~-2.82$. The degrees of freedom are estimated by the smaller of $n_1-1$ or $n_2-1$ (which are equal in this case), so $df~=~22$. Consulting the table of T-statistics from the back of the book, we find: +$$p_{value}=P(T_{22} \gt 2.82) = 0.01$$ +Assuming we're okay with a fale positive rate of $p\leq0.05$, this provides support for the alternative hypothesis and we can reject the null of no difference between the average standardized prices of 0.99 and 1 carat diamonds. + +## SQ3 — 5.30 + +I want to calculate the following: + +$$(\overline{x_1}-\overline{x_2})~\pm~t^*_{df}\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}$$ +Since the critical values for the distribution of the test statistic $T$ is available in R using the `qt()` function, I can use a similar procedure here as in my `ci()` function from earlier in the problem set to calculate the value of $t_{df}$ (in this case, $t_{22}$). Note that if you didn't use the `qt()` part of this you could have looked up the critical value for a two-tailed test in the t distribution table at the back of the textbook (or elsewhere). + +```{r} + +diff.means <- 44.51-56.81 + +t.star <- qt(0.025, df=22) + +se <- sqrt( ((13.32^2)/23) + ((16.13^2)/23) ) + +diff.means - (t.star*se) +diff.means + (t.star*se) +``` + +In words, I am 95% confident that the average difference between the standardized prices of 0.99 carat diamond and a 1 carat diamond falls between \$3.27 and \$21.33 (the 1 carat diamond costs more). + +## SQ4 — 5.48 + +(a) Hypotheses: + +$H_0:$ The mean hours worked for the groups are all equal. + +$$\mu_{\lt~hs} = \mu_{hs} = \mu_{jc} = \mu_{ba} = \mu_{grad} $$ +$H_A:$ The mean hours worked vary by education level. In other words, the means are not equal. + +(b) Conditions and assumptions necessary for unbiased ANOVA estimates include: + +Independent observations, normal(ish) distributions, and constant(ish) variance. The problem doesn't say much about the sample to help evaluate the independence of the observations, but it's definitely less than 10% of the population and is likely a fairly good approximation of a random sample (thereby satisfying the rule of thumb). From the boxplots the distributions all look fairly normal. The standard deviations are also similar. We'll assume that the conditions are met for the purposes of the test. + +(c) Working across the blanks in the table by row: + +* The degrees of freedom for degree $= 5-1 = 4$ +* The Sum of Squares between degree levels $= 501.54 \times 4 = 2006.16$ +* The F value $= Sum~Sq~degree / Mean~Sq~residuals = 501.54 / 229.12 = 2.189$ +* The degrees of freedom for Residuals $= 1171-4 = 1167$ +* Mean Square Residuals (Error) $= 267382/1167 = 229.12$ +* Total degrees of freedom $=1172 - 1 = 1171$ +* Total Sum of squares $=2006.16+267382 = 269388.16$ + +(d) According to the ANOVA results, we cannot reject the null hypothesis at a $p \leq 0.05$ level, suggesting that the mean hours worked may be equal across education levels. + +## SQ5 — 5.52 + +(a) False. The ANOVA procedure does not evaluate the pairwise comparisons, but the overall variation across groups. +(b) True, otherwise the F-value will not be large enough to reject the null hypothesis. +(c) False. It is possible that none of the pairwise comparisons will be significantly different even if the ANOVA rejects the null. +(d) Assuming this question is about the Bonferroni correction, False. The correction does not divide $\alpha$ by the number of groups, but rather the number of pairwise tests. In this case, 4 groups yields ${4}\choose{2} = 6$ pairs, meaning that the corrected value for $\alpha = 0.05/6 = 0.0083$. Other corrections exist even though they were not discussed in the book and they may choose other values for $\alpha$. + +## SQ6 — Reinhart + +We'll discuss this one as a group. Personally, I find the focus on p-values somewhat thought-stopping and would prefer to see researchers and publications embrace reporting standards that yield more intuitive, flexible, and meaningful understanding. Confidence intervals seem to do this more effectively than p-values and, in that respect, I like confidence intervals quite a lot! + +# Empirical Paper Questions + +## EQ1 — RQ4 questions + +(a) For RQ4, the units of analysis are the 109 respondents. The credibility index (a group of five survey items evaluating how credible the respondents perceived the blogs they read to be) was split at the mean and these high/low groups were the independent (grouping) variable in the ANOVA. The six perceived relationship management factors (survey items) were the dependent variables for the ANOVA. + +(b) The analysis tests whether the mean responses on the survey items composing each of the different relationship management factors were equal across the high/low blog credibility groups. The alternative hypotheses are whether there are differences between the groups for any of the perceived relationship management dimensions. + +(c) None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibiliy. + +(d) It is (usually) a bit hard to say much from a null result! See the answer to (c) above. + +### EQ5 — RQ5 questions + +(a) Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable. + +(b) The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypotheses are whether there are differences between the groups for perceptions of the organization being in crisis. + +(c) The results of the ANOVA reject the null, suggesting support for the alternative hypothesis that participants reporting low credibility blogs reported different (higher) scores on the crisis index. + +(d) I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics about the various measures to help evaluate whether they meet the assumptions for identifying the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the specific credibility of the specific blogs read by the study participants? The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels. + +### EQ6 — RQ6 questions + +(a) Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable. + +(b) This set of ANOVAs test whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies. + +(c) The results of the ANOVAs are mixed, rejecting the null hypothesis of no difference for two of the relaionship management dimensions (task sharing and responsiveness/customer service), but failing to do so for the other four dimensions. For the two dimensions in question, higher prevalence of each strategy appeared to reduce the perception of crisis. + +(d) Again, the differences reported are compelling insofar as they indicate larger varation across the groups in terms of perceived crisis than would be expected in a world where no difference existed. That said, in addition to all the issues mentioned above for EQ5 part (d), this set of tests also raisesw issues of multiple comparisons. Not only is it useful to consider multiple comparisons in the context of a single ANOVA, but when running many ANOVAs across many grouping variables. If the authors really wanted to test whether the specific PR strategies mentioned here altered perceptions of organizational crisis across different types of blogs, they should/could have incorporated those variations into their experimental design much more directly. As such, the results remain suggestive that some of the observed relationships may exist, but can provide only weak support for even those that reject the null hypotheses in statistical tests. \ No newline at end of file diff --git a/problem_sets/week_05/ps5-worked_solution.html b/problem_sets/week_05/ps5-worked_solution.html new file mode 100644 index 0000000..d4f103d --- /dev/null +++ b/problem_sets/week_05/ps5-worked_solution.html @@ -0,0 +1,640 @@ + + + + + + + + + + + + + + +Week 5 problem set: Worked solutions + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +
+

PC0

+

Load the dataset. As usual, you’ll need to edit this block of code to point at the appropriate file location to make it work on your machine.

+
pop <- read.delim(file="~/Documents/Teaching/2019/stats/data/week_05/population.tsv")
+
+
+

PC1

+
mean(pop$x, na.rm=T)
+
## [1] 2.598453
+
## Insert code for your group as needed:
+w3 <- read.csv(file="~/Documents/Teaching/2019/stats/data/week_03/group_03.csv", row.names=NULL)
+
+mean(w3$x, na.rm=T)
+
## [1] 2.759947
+

Given the structure of the full dataset, it’s also easy to calculate all of the group means:

+
s.means <- tapply(pop$x, pop$group, mean, na.rm=T)
+s.means
+
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08 
+## 2.964142 3.163984 2.759947 2.303034 2.290273 2.387994 2.424987 2.400375 
+## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16 
+## 2.887230 2.892782 2.376018 2.456387 2.489604 2.572719 2.786722 2.535294 
+## group_17 group_18 group_19 group_20 
+## 2.592676 2.354645 3.016203 2.314035
+

We will discuss the relationship of the individual group means to population mean in class.

+
+
+

PC2

+

I’ll do this two ways. First, just plugging the values into the formula for the standard error, I can then add/subtract twice the standard from the mean to find the 95% CI.

+
se <- sd(w3$x, na.rm=T) / sqrt(length(w3$x))
+mean(w3$x, na.rm=T)-(2*se)
+
## [1] 2.245946
+
mean(w3$x, na.rm=T)+(2*se)
+
## [1] 3.273947
+

Now, I’ll write a more general function to calculate confidence intervals. Note that I create an ‘alpha’ argument with a default value of 0.05. I then divide alpha by 2. Can you explain why this division step is necessary?

+
ci <- function (x, alpha=0.05) {
+    x <- x[!is.na(x)]
+    probs <- c(alpha/2, 1-alpha/2)
+    critical.values <- qnorm(probs, mean=0, sd=1)
+    se <- sd(x)/sqrt(length(x))
+    mean(x) + critical.values * se
+}
+
+## Here I run the function on the group 3 data
+ci(w3$x)
+
## [1] 2.243150 3.276744
+

Again, it’s easy to estimate this for every group:

+
group.confints <- tapply(pop$x, pop$group, ci)
+group.confints
+
## $group_01
+## [1] 2.469464 3.458819
+## 
+## $group_02
+## [1] 2.629990 3.697978
+## 
+## $group_03
+## [1] 2.243150 3.276744
+## 
+## $group_04
+## [1] 1.846621 2.759447
+## 
+## $group_05
+## [1] 1.791081 2.789465
+## 
+## $group_06
+## [1] 1.965324 2.810665
+## 
+## $group_07
+## [1] 1.997502 2.852473
+## 
+## $group_08
+## [1] 1.985440 2.815311
+## 
+## $group_09
+## [1] 2.340745 3.433714
+## 
+## $group_10
+## [1] 2.400087 3.385476
+## 
+## $group_11
+## [1] 1.936974 2.815062
+## 
+## $group_12
+## [1] 2.042326 2.870447
+## 
+## $group_13
+## [1] 2.044957 2.934251
+## 
+## $group_14
+## [1] 2.164345 2.981092
+## 
+## $group_15
+## [1] 2.342309 3.231136
+## 
+## $group_16
+## [1] 2.064519 3.006068
+## 
+## $group_17
+## [1] 2.178324 3.007028
+## 
+## $group_18
+## [1] 1.889973 2.819317
+## 
+## $group_19
+## [1] 2.555803 3.476604
+## 
+## $group_20
+## [1] 1.918150 2.709919
+
+
+

PC3

+

We’ll discuss this one in class.

+
+
+

PC4

+

Comparing the distributions (first for the single comparison)

+
hist(pop$x)
+

+
hist(w3$x)
+

+
summary(pop$x)
+
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+##  0.00078  0.60682  2.08238  2.59845  4.03440 11.21858      100
+
sd(pop$x, na.rm=T)
+
## [1] 2.291625
+
summary(w3$x)
+
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.001016  0.657997  2.266437  2.759947  4.089892 10.425881         5
+
sd(w3$x, na.rm=T)
+
## [1] 2.570002
+

And here’s code to do it for each group. A ggplot2 “faceted” histogram will look great and makes for concise code.

+
library(ggplot2)
+
## Registered S3 methods overwritten by 'ggplot2':
+##   method         from 
+##   [.quosures     rlang
+##   c.quosures     rlang
+##   print.quosures rlang
+
ggplot(pop, aes(x)) + geom_histogram() + facet_wrap(. ~ group)
+
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+
## Warning: Removed 100 rows containing non-finite values (stat_bin).
+

+
tapply(pop$x, pop$group, summary)
+
## $group_01
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.000836 0.623552 2.480939 2.964142 4.869991 7.996040        5 
+## 
+## $group_02
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.04996 0.98868 2.25888 3.16398 4.58581 9.64321       5 
+## 
+## $group_03
+##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.001016  0.657997  2.266437  2.759947  4.089892 10.425881         5 
+## 
+## $group_04
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.02008 0.44755 1.57473 2.30303 3.47370 9.87610       5 
+## 
+## $group_05
+##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.000778  0.375376  1.387920  2.290273  3.839179 10.158698         5 
+## 
+## $group_06
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.007768 0.517054 2.038364 2.387994 3.766346 8.899269        5 
+## 
+## $group_07
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.00432 0.51095 2.19910 2.42499 3.78105 9.35393       5 
+## 
+## $group_08
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.001657 0.691977 2.059659 2.400375 3.534149 9.170684        5 
+## 
+## $group_09
+##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.007371  0.474792  2.503350  2.887230  4.194422 11.142319         5 
+## 
+## $group_10
+##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.001859  0.872912  2.503538  2.892782  4.366617 10.468560         5 
+## 
+## $group_11
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.01069 0.65365 1.71980 2.37602 4.08116 9.70812       5 
+## 
+## $group_12
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.008528 0.540260 2.099041 2.456387 3.518414 7.945303        5 
+## 
+## $group_13
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.006755 0.547588 2.174567 2.489604 3.748151 9.907302        5 
+## 
+## $group_14
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.001154 0.905256 2.262228 2.572719 4.189871 7.278206        5 
+## 
+## $group_15
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.01192 0.92555 2.49983 2.78672 4.24195 9.16923       5 
+## 
+## $group_16
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+##  0.01626  0.67461  1.78174  2.53529  3.92996 10.49848        5 
+## 
+## $group_17
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.01861 0.69614 2.39607 2.59268 3.89538 8.20067       5 
+## 
+## $group_18
+##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
+##  0.001341  0.419682  1.786937  2.354645  3.397951 11.218583         5 
+## 
+## $group_19
+##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
+## 0.04522 1.28244 2.30096 3.01620 4.38815 9.91588       5 
+## 
+## $group_20
+##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
+## 0.007343 0.620991 1.879363 2.314035 3.588999 8.116529        5
+
tapply(pop$x, pop$group, sd, na.rm=T)
+
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08 
+## 2.460005 2.655523 2.570002 2.269715 2.482454 2.101918 2.125861 2.063453 
+## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16 
+## 2.717638 2.450142 2.183341 2.059100 2.211206 2.030818 2.210043 2.341134 
+## group_17 group_18 group_19 group_20 
+## 2.060548 2.310790 2.289548 1.968713
+

They all look a little bit different from each other and from the population distribution. We’ll discuss these differences in class.

+
+
+

PC5

+
s.means <- tapply(pop$x, pop$group, mean, na.rm=T)
+s.means
+
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08 
+## 2.964142 3.163984 2.759947 2.303034 2.290273 2.387994 2.424987 2.400375 
+## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16 
+## 2.887230 2.892782 2.376018 2.456387 2.489604 2.572719 2.786722 2.535294 
+## group_17 group_18 group_19 group_20 
+## 2.592676 2.354645 3.016203 2.314035
+
sd(s.means)
+
## [1] 0.2696987
+
## My standard error from one of the groups above:
+se
+
## [1] 0.2570002
+

We will discuss the relationship of these values in class.

+
+
+

PC 6

+
+

(a)

+

Since there’s going to be some randomness in the next few commands, I’ll set a seed value to ensure the results are consistent on other laptops.

+
set.seed(20190501)
+pop.unif <- sample(seq(0, 9), 10000, replace=TRUE)
+
+
+

(b)

+
mean(pop.unif)
+
## [1] 4.5284
+
hist(pop.unif)
+

+
+
+

(c)

+
hist(
+  sapply( rep(1, 100), function (x) { 
+    mean(sample(pop.unif, 2))}
+    )
+  )
+

+
+
+
+

(d)

+

Same commands as (c) just changing the sample size

+
hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 10))}))
+

+
hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 100))}))
+

+
+
+

PC7

+

We’ll discuss this in class.

+
+
+

Statistical Questions

+
+

SQ1 — 5.22

+
    +
  1. We want to calculate the following:
  2. +
+

\[ \overline{x}_{diff} \pm t^*_{df}\frac{s_{diff}}{\sqrt{n}}\]

+

Plug in the corresponding values and the Z-score for a 95% confidence interval (1.98):

+

\[-0.545 \pm 1.98 \times \frac{8.887}{\sqrt{200}} \] I’ll let R take it from here:

+
-0.545 - (1.98 * (8.887/sqrt(200)))
+
## [1] -1.789243
+
-0.545 + (1.98 * (8.887/sqrt(200)))
+
## [1] 0.6992435
+
    +
  1. With 95% confidence, the average difference between reading and writing test scores falls between -1.79 and 0.70.

  2. +
  3. No, because the interval includes/crosses zero.

  4. +
+
+
+

SQ2 — 5.28

+

We want to test the following hypotheses: \[H_0~:~\mu_{0.99}=\mu_{1}\] \[H_A~:~\mu_{0.99}\neq\mu_{1}\] To do so, we can use a two-sample t-test to compare the two sample means. The conditions we’d like to satisfy are independence and normality. Re: independence, the samples are random and not terribly large (presumably less than 10% of all the diamonds of each carat rating on earth), so we should be good to go. Re: normality, visual inspection of the histograms presented in the textbook suggests that what skew may be present in either distribution is not extreme.

+

Given that the conditions are met, here’s how you could construct the test statistic \(T\):

+

\[T~=~\frac{Point~estimate~-~Null}{Standard~error_{difference}}\]

+

Plugging in formulas from chapter 5 of the textbook this looks like:

+

\[T~=~ \frac{(\overline{x}_1-\overline{x}_2)-(0)}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\] Now, plug in values from the table:

+

\[T~=~ \frac{(44.51-56.81)-(0)}{\sqrt{\frac{13.32^2}{23}+\frac{16.13^2}{23}}}\] Work that out and you should have \(T~=~-2.82\). The degrees of freedom are estimated by the smaller of \(n_1-1\) or \(n_2-1\) (which are equal in this case), so \(df~=~22\). Consulting the table of T-statistics from the back of the book, we find:
+\[p_{value}=P(T_{22} \gt 2.82) = 0.01\] Assuming we’re okay with a fale positive rate of \(p\leq0.05\), this provides support for the alternative hypothesis and we can reject the null of no difference between the average standardized prices of 0.99 and 1 carat diamonds.

+
+
+

SQ3 — 5.30

+

I want to calculate the following:

+

\[(\overline{x_1}-\overline{x_2})~\pm~t^*_{df}\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\] Since the critical values for the distribution of the test statistic \(T\) is available in R using the qt() function, I can use a similar procedure here as in my ci() function from earlier in the problem set to calculate the value of \(t_{df}\) (in this case, \(t_{22}\)). Note that if you didn’t use the qt() part of this you could have looked up the critical value for a two-tailed test in the t distribution table at the back of the textbook (or elsewhere).

+
diff.means <- 44.51-56.81
+
+t.star <- qt(0.025, df=22)
+
+se <- sqrt( ((13.32^2)/23) + ((16.13^2)/23) )
+
+diff.means - (t.star*se)
+
## [1] -3.254
+
diff.means + (t.star*se)
+
## [1] -21.346
+

In words, I am 95% confident that the average difference between the standardized prices of 0.99 carat diamond and a 1 carat diamond falls between $3.27 and $21.33 (the 1 carat diamond costs more).

+
+
+

SQ4 — 5.48

+
    +
  1. Hypotheses:
  2. +
+

\(H_0:\) The mean hours worked for the groups are all equal.

+

\[\mu_{\lt~hs} = \mu_{hs} = \mu_{jc} = \mu_{ba} = \mu_{grad} \] \(H_A:\) The mean hours worked vary by education level. In other words, the means are not equal.

+
    +
  1. Conditions and assumptions necessary for unbiased ANOVA estimates include:
  2. +
+

Independent observations, normal(ish) distributions, and constant(ish) variance. The problem doesn’t say much about the sample to help evaluate the independence of the observations, but it’s definitely less than 10% of the population and is likely a fairly good approximation of a random sample (thereby satisfying the rule of thumb). From the boxplots the distributions all look fairly normal. The standard deviations are also similar. We’ll assume that the conditions are met for the purposes of the test.

+
    +
  1. Working across the blanks in the table by row:
  2. +
+
    +
  • The degrees of freedom for degree \(= 5-1 = 4\)
    +
  • +
  • The Sum of Squares between degree levels \(= 501.54 \times 4 = 2006.16\)
    +
  • +
  • The F value \(= Sum~Sq~degree / Mean~Sq~residuals = 501.54 / 229.12 = 2.189\)
    +
  • +
  • The degrees of freedom for Residuals \(= 1171-4 = 1167\)
    +
  • +
  • Mean Square Residuals (Error) \(= 267382/1167 = 229.12\)
    +
  • +
  • Total degrees of freedom \(=1172 - 1 = 1171\)
    +
  • +
  • Total Sum of squares \(=2006.16+267382 = 269388.16\)
  • +
+
    +
  1. According to the ANOVA results, we cannot reject the null hypothesis at a \(p \leq 0.05\) level, suggesting that the mean hours worked may be equal across education levels.
  2. +
+
+
+

SQ5 — 5.52

+
    +
  1. False. The ANOVA procedure does not evaluate the pairwise comparisons, but the overall variation across groups.
    +
  2. +
  3. True, otherwise the F-value will not be large enough to reject the null hypothesis.
    +
  4. +
  5. False. It is possible that none of the pairwise comparisons will be significantly different even if the ANOVA rejects the null.
    +
  6. +
  7. Assuming this question is about the Bonferroni correction, False. The correction does not divide \(\alpha\) by the number of groups, but rather the number of pairwise tests. In this case, 4 groups yields \({4}\choose{2} = 6\) pairs, meaning that the corrected value for \(\alpha = 0.05/6 = 0.0083\). Other corrections exist even though they were not discussed in the book and they may choose other values for \(\alpha\).
  8. +
+
+
+

SQ6 — Reinhart

+

We’ll discuss this one as a group. Personally, I find the focus on p-values somewhat thought-stopping and would prefer to see researchers and publications embrace reporting standards that yield more intuitive, flexible, and meaningful understanding. Confidence intervals seem to do this more effectively than p-values and, in that respect, I like confidence intervals quite a lot!

+
+
+
+

Empirical Paper Questions

+
+

EQ1 — RQ4 questions

+
    +
  1. For RQ4, the units of analysis are the 109 respondents. The credibility index (a group of five survey items evaluating how credible the respondents perceived the blogs they read to be) was split at the mean and these high/low groups were the independent (grouping) variable in the ANOVA. The six perceived relationship management factors (survey items) were the dependent variables for the ANOVA.

  2. +
  3. The analysis tests whether the mean responses on the survey items composing each of the different relationship management factors were equal across the high/low blog credibility groups. The alternative hypotheses are whether there are differences between the groups for any of the perceived relationship management dimensions.

  4. +
  5. None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibiliy.

  6. +
  7. It is (usually) a bit hard to say much from a null result! See the answer to (c) above.

  8. +
+
+

EQ5 — RQ5 questions

+
    +
  1. Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable.

  2. +
  3. The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypotheses are whether there are differences between the groups for perceptions of the organization being in crisis.

  4. +
  5. The results of the ANOVA reject the null, suggesting support for the alternative hypothesis that participants reporting low credibility blogs reported different (higher) scores on the crisis index.

  6. +
  7. I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics about the various measures to help evaluate whether they meet the assumptions for identifying the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the specific credibility of the specific blogs read by the study participants? The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels.

  8. +
+
+
+

EQ6 — RQ6 questions

+
    +
  1. Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable.

  2. +
  3. This set of ANOVAs test whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies.

  4. +
  5. The results of the ANOVAs are mixed, rejecting the null hypothesis of no difference for two of the relaionship management dimensions (task sharing and responsiveness/customer service), but failing to do so for the other four dimensions. For the two dimensions in question, higher prevalence of each strategy appeared to reduce the perception of crisis.

  6. +
  7. Again, the differences reported are compelling insofar as they indicate larger varation across the groups in terms of perceived crisis than would be expected in a world where no difference existed. That said, in addition to all the issues mentioned above for EQ5 part (d), this set of tests also raisesw issues of multiple comparisons. Not only is it useful to consider multiple comparisons in the context of a single ANOVA, but when running many ANOVAs across many grouping variables. If the authors really wanted to test whether the specific PR strategies mentioned here altered perceptions of organizational crisis across different types of blogs, they should/could have incorporated those variations into their experimental design much more directly. As such, the results remain suggestive that some of the observed relationships may exist, but can provide only weak support for even those that reject the null hypotheses in statistical tests.

  8. +
+
+
+
+ + + + +
+ + + + + + + +