psets/pset6-worked_solution.rmd

   1 ---
   2 title: "Problem set 6: Worked solutions"
   3 subtitle: "Statistics and statistical programming  \nNorthwestern University  \nMTS 525"
   4 author: "Aaron Shaw"
   5 date: "November 9, 2020"
   6 output:
   7   html_document:
   8     toc: yes
   9     toc_depth: 3
  10     toc_float:
  11       collapsed: true
  12       smooth_scroll: true
  13     theme: readable
  14   pdf_document:
  15     toc: yes
  16     toc_depth: '3'
  17   header-includes:
  18   - \newcommand{\lt}{<}
  19   - \newcommand{\gt}{>}
  20 ---
  21
  22 ```{r setup, include=FALSE}
  23 knitr::opts_chunk$set(echo = TRUE, message=FALSE, tidy='styler')
  24 ```
  25
  26 # Programming challenges
  27
  28 ### PC1 Download, import, and reshape the data
  29 First I can import the data. My preferred method for working with .xls files is (whenever possible) to open them up and save them as .csv or .tsv files. I've done that here and uploaded a copy of the resulting file to the course website to make this example reproducible. I'll use the tidyverse `read_csv()` command to import the data.
  30
  31 ```{r}
  32 library(tidyverse)
  33
  34 raw_df = read_csv("https://communitydata.science/~ads/teaching/2020/stats/data/week_09/owan03.csv")
  35 raw_df
  36 ```
  37
  38 Now, let's organize the data. We want this in "long" format where the survival times in weeks for each of the three dosage levels are all arranged in a single column and another column labels the corresponding dosage level. You might note that having variable information captured in the column headers is a pretty common situation, so learning how to do this kind of dataset manipulation is a very useful skill!
  39
  40 First, I'll rename the original column names so that the information about dosage is explicitly labeled.
  41
  42 ```{r}
  43
  44 colnames(raw_df) <- c("None", "Low", "Medium","High")
  45 ```
  46 Now, I'll use the very handy `pivot_longer()` function from the `tidyverse` library to do the rearranging.
  47 ```{r}
  48 df <- raw_df %>%
  49   pivot_longer(
  50     cols = everything(),
  51     names_to = "dose",
  52     values_to = "weeks_alive")
  53
  54 df
  55 ```
  56 That's looking good! I should probably convert the "dose" variable into a factor while I'm at it. I'll be super explicit about the levels here just to make sure nothing gets lost along the way...
  57 ```{r}
  58 df$dose <- factor(df$dose, levels=c("None", "Low", "Medium", "High"))
  59 ```
  60 Last, but not least, the NA values from my original dataframe aren't really doing me any good here, so I can drop them:
  61 ```{r}
  62 df <- df[complete.cases(df),]
  63 ```
  64
  65 ### PC2 Summarize the data
  66
  67 Summary statistics overall:
  68 ```{r}
  69 summary(df$weeks_alive)
  70 sd(df$weeks_alive)
  71 boxplot(df$weeks_alive)
  72 ```
  73
  74
  75 Now summary statistics within the different groups. First, I'll collect some basic info using the `tapply()` function:
  76 ```{r}
  77 tapply(df$weeks_alive, df$dose, summary)
  78 ```
  79
  80 Here's an alternative way to do this using tidyverse `group_by()` and `summarize_all()` functions:
  81
  82 ```{r}
  83 df %>%
  84   group_by(dose) %>%
  85   summarize(
  86             n = n(),
  87             min = min(weeks_alive),
  88             avg = round(mean(weeks_alive),2),
  89             max = max(weeks_alive),
  90             sd = round(sd(weeks_alive),2)
  91   )
  92
  93 ```
  94
  95 When it comes to visualizations, we definitely want to use ggplot. We have lots of options for what to do.
  96
  97 ```{r}
  98 library(ggplot2)
  99
 100 # Histograms
 101 h_plot <- ggplot(data=df, aes(x=weeks_alive, fill = dose)) + geom_histogram(position = 'dodge', bins = 5)
 102 h_plot
 103 ```
 104
 105 In this case, faceted histograms is probably better
 106
 107 ```{r}
 108 h_facet <- ggplot(data=df, aes(x=weeks_alive)) + geom_histogram(bins = 5) + facet_grid(~dose)
 109 h_facet
 110 ```
 111 This might be easier to compare if we flip the coordinates (rotate the axes):
 112 ```{r}
 113 h_facet+coord_flip()
 114 ```
 115
 116 Density plots might be nice? Note that setting a value <1 for "alpha" makes the fill more transparent
 117 ```{r}
 118 d_plot <- ggplot(data=df, aes(x=weeks_alive, fill = dose)) + geom_density(alpha = .2)
 119 d_plot
 120 ```
 121
 122 Aaron finds boxplots quite clarifying and added a call to `scale_fill_brewer()` here to use the "Brewer" color palettes:
 123 ```{r}
 124 box_plot <- ggplot(data=df, aes(y=weeks_alive,  x = dose, fill=dose)) + geom_boxplot() + scale_fill_brewer()
 125 box_plot
 126 ```
 127
 128 Some people like "ridgeline" plots too:
 129
 130 ```{r}
 131 # install.packages('ggridges')
 132 library(ggridges)
 133
 134 ridge_plot <- ggplot(data=df, aes(x=weeks_alive, y = dose)) + geom_density_ridges(jittered_points = T, fill = 'orange')
 135 ridge_plot
 136
 137 # add a fancy minimalist theme to make it prettier:
 138 ridge_plot + theme_minimal()
 139
 140 ```
 141
 142
 143 #### SQ1 Discuss the descriptive results
 144
 145 Overall, the survival times range from $30-100$ weeks, with an average of about $75.5$ weeks and a standard deviation of $21.4$ weeks. The distribution of outcomes is slightly left-skewed.
 146
 147 Comparisons of survival times across the different dosage levels reveals that the average survival time is highest in the "None" dosage level ($\mu \approx 91$ weeks) and lowest in the "High" dosage level ($\mu \approx 65$ weeks). The spread of the data is also much greater in the "Medium and "High" dosage level groups, suggesting a wider range of outcomes in those conditions. Based on the "ridgeline" plots, it appears that this expanded spread might have something to do with a bi-modal distribution in both conditions.
 148
 149 #### SQ2 State hypotheses
 150 For the ANOVA:
 151
 152 $H_0:~\mu_{none}~=~\mu_{low}~=~\mu_{medium}~=~\mu_{high}$
 153 $H_A:$ Means are not all equal.
 154
 155 For the t-tests of none vs. any:
 156 $H_0:~\mu_{none}~=~\mu_{any}$
 157 $H_A:~\mu_{none}~\neq~\mu_{any}$
 158
 159 For the t-tests of none vs. high:
 160 $H_0:~\mu_{none}~=~\mu_{high}$
 161 $H_A:~\mu_{none}~\neq~\mu_{high}$
 162
 163 #### SQ3 Address assumptions
 164
 165 An ANOVA assumes independence, normality, and equal variance. A two sample t-test assumes independence and normality. The assumption of equal variance seems like a stretch (check out those standard deviations within conditions!). Normality is also a bit of a hard sell within groups or overall, but ultimately not so bad. Independence of the samples seems just fine since this was an experiment and we have no reason to anticipate dependence between the different conditions.
 166
 167 Despite the issues with unequal variance and normality, most analysts would march ahead with the analysis despite these violations of assumptions. We can discuss how you might think and talk about this more in class.
 168
 169 ### PC3  ANOVA analysis
 170
 171 Remember that we have to call `summary()` to see the ANOVA table:
 172
 173 ```{r}
 174 summary(aov(weeks_alive ~ dose, data = df))
 175 ```
 176 #### SQ4 Interpret ANOVA results
 177
 178 This provides evidence in support of the alternative hypothesis that the group means are not equal ($p\lt0.05$). It seems that the dosage of Red Dye #40 likely has a relationship with how many weeks the mice lived.
 179
 180 ### PC4 Differences in means
 181
 182 T-test between None and Any, and between None and High.
 183
 184 ```{r}
 185 t.test(df$weeks_alive[df$dose == 'None'], # Samples with no dose
 186        df$weeks_alive[df$dose != 'None'] # Samples with any dose
 187        )
 188 ```
 189 Or, using formula notation
 190 ```{r}
 191 t.test(weeks_alive ~ dose == 'None', data = df)
 192 ```
 193
 194 T-test between None and High
 195
 196 ```{r}
 197 t.test(df$weeks_alive[df$dose == 'None'], # Samples with no dose
 198        df$weeks_alive[df$dose == 'High'] # Samples with high dose
 199        )
 200 ```
 201
 202 Formula notation for this is a bit tricker. I would create a temporary dataframe...
 203 ```{r}
 204 df.tmp <- subset(df, subset= dose=="None" | dose=="High")
 205 t.test(weeks_alive ~ dose, data = df.tmp)
 206
 207 ```
 208
 209 #### SQ5 Interpret t-test results
 210 These t-tests both support the idea that receiving a dose of RD40 reduces lifespan.
 211
 212
 213 #### SQ6 Multiple comparisons
 214 We might not completely trust these p-values, since we are doing multiple comparisons ($\binom{4}{2}=6$ comparisons, to be precise). One option is to do a Bonferroni correction, where we only cnsider things significant if $\alpha < .05/m$ where $m$ is the number of tests (so our new critical p-value would be $0.0083$).
 215
 216 However, as discussed in the Reinhart book, the Bonferroni correction is usually more conservative than it needs to be, and there are other approaches; for example, Benjamini-Hochberg correction which I would calculate calling `pairwise.t.test()` in R like so:
 217
 218 ```{r}
 219 with(df,
 220 pairwise.t.test(weeks_alive, dose, p.adjust = "BH")
 221 )
 222 ```
 223 In this particular study, I would suggest reporting adjusted p-values for all of the tests (and I would consider including the "raw" p-values in a footnote or appendix or something).
 224
 225 # Empirical paper questions
 226
 227 ## EQ1 — RQ4 questions
 228
 229 (a) For RQ4, the units of analysis are the 109 respondents. The credibility index (a group of five survey items evaluating how credible the respondents perceived the blogs they read to be) was split at the mean and these high/low groups were the independent (grouping) variable in the ANOVA. The six perceived relationship management factors (survey items) were the dependent variables for the ANOVA.
 230
 231 (b) The analysis tests whether the mean responses on the survey items composing each of the different relationship management factors were equal across the high/low blog credibility groups. The alternative hypotheses are whether there are differences between the groups for any of the perceived relationship management dimensions.
 232
 233 (c) None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibiliy.
 234
 235 (d) It is (usually) a bit hard to say much from a null result! See the answer to (c) above.
 236
 237 ## EQ2 — RQ5 questions
 238
 239 (a) Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable.
 240
 241 (b) The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypotheses are whether there are differences between the groups for perceptions of the organization being in crisis.
 242
 243 (c) The results of the ANOVA reject the null, suggesting support for the alternative hypothesis that participants reporting low credibility blogs reported different (higher) scores on the crisis index.
 244
 245 (d) I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics about the various measures to help evaluate whether they meet the assumptions for identifying the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the specific credibility of the specific blogs read by the study participants? The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels.
 246
 247 ## EQ3 — RQ6 questions
 248
 249 (a) Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable.
 250
 251 (b) This set of ANOVAs test whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies.
 252
 253 (c) The results of the ANOVAs are mixed, rejecting the null hypothesis of no difference for two of the relaionship management dimensions (task sharing and responsiveness/customer service), but failing to do so for the other four dimensions. For the two dimensions in question, higher prevalence of each strategy appeared to reduce the perception of crisis.
 254
 255 (d) Again, the differences reported are compelling insofar as they indicate larger varation across the groups in terms of perceived crisis than would be expected in a world where no difference existed. That said, in addition to all the issues mentioned above for EQ5 part (d), this set of tests also raisesw issues of multiple comparisons. Not only is it useful to consider multiple comparisons in the context of a single ANOVA, but when running many ANOVAs across many grouping variables. If the authors really wanted to test whether the specific PR strategies mentioned here altered perceptions of organizational crisis across different types of blogs, they should/could have incorporated those variations into their experimental design much more directly. As such, the results remain suggestive that some of the observed relationships may exist, but can provide only weak support for even those that reject the null hypotheses in statistical tests.