First I can import the data. My preferred method for working with .xls files is (whenever possible) to open them up and save them as .csv or .tsv files. I’ve done that here and uploaded a copy of the resulting file to the course website to make this example reproducible. I’ll use the tidyverse read_csv()
command to import the data.
library(tidyverse)
raw_df <- read_csv("https://communitydata.science/~ads/teaching/2020/stats/data/week_09/owan03.csv")
raw_df
## # A tibble: 11 x 4
## X1 X2 X3 X4
## <dbl> <dbl> <dbl> <dbl>
## 1 70 49 30 34
## 2 77 60 37 36
## 3 83 63 56 48
## 4 87 67 65 48
## 5 92 70 76 65
## 6 93 74 83 91
## 7 100 77 87 98
## 8 102 80 90 102
## 9 102 89 94 NA
## 10 103 NA 97 NA
## 11 96 NA NA NA
Now, let’s organize the data. We want this in “long” format where the survival times in weeks for each of the three dosage levels are all arranged in a single column and another column labels the corresponding dosage level. You might note that having variable information captured in the column headers is a pretty common situation, so learning how to do this kind of dataset manipulation is a very useful skill!
First, I’ll rename the original column names so that the information about dosage is explicitly labeled.
colnames(raw_df) <- c("None", "Low", "Medium", "High")
Now, I’ll use the very handy pivot_longer()
function from the tidyverse
library to do the rearranging.
df <- raw_df %>%
pivot_longer(
cols = everything(),
names_to = "dose",
values_to = "weeks_alive"
)
df
## # A tibble: 44 x 2
## dose weeks_alive
## <chr> <dbl>
## 1 None 70
## 2 Low 49
## 3 Medium 30
## 4 High 34
## 5 None 77
## 6 Low 60
## 7 Medium 37
## 8 High 36
## 9 None 83
## 10 Low 63
## # … with 34 more rows
That’s looking good! I should probably convert the “dose” variable into a factor while I’m at it. I’ll be super explicit about the levels here just to make sure nothing gets lost along the way…
df$dose <- factor(df$dose, levels = c("None", "Low", "Medium", "High"))
Last, but not least, the NA values from my original dataframe aren’t really doing me any good here, so I can drop them:
df <- df[complete.cases(df), ]
Summary statistics overall:
summary(df$weeks_alive)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.00 63.50 78.50 75.55 92.75 103.00
sd(df$weeks_alive)
## [1] 21.42832
boxplot(df$weeks_alive)
Now summary statistics within the different groups. First, I’ll collect some basic info using the tapply()
function:
tapply(df$weeks_alive, df$dose, summary)
## $None
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 70.00 85.00 93.00 91.36 101.00 103.00
##
## $Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 49.00 63.00 70.00 69.89 77.00 89.00
##
## $Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.00 58.25 79.50 71.50 89.25 97.00
##
## $High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.00 45.00 56.50 65.25 92.75 102.00
Here’s an alternative way to do this using tidyverse group_by()
and summarize_all()
functions:
df %>%
group_by(dose) %>%
summarize(
n = n(),
min = min(weeks_alive),
avg = round(mean(weeks_alive), 2),
max = max(weeks_alive),
sd = round(sd(weeks_alive), 2)
)
## # A tibble: 4 x 6
## dose n min avg max sd
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 None 11 70 91.4 103 11.0
## 2 Low 9 49 69.9 89 11.9
## 3 Medium 10 30 71.5 97 23.8
## 4 High 8 34 65.2 102 28.1
When it comes to visualizations, we definitely want to use ggplot. We have lots of options for what to do.
library(ggplot2)
# Histograms
h_plot <- ggplot(data = df, aes(x = weeks_alive, fill = dose)) +
geom_histogram(position = "dodge", bins = 5)
h_plot
In this case, faceted histograms is probably better
h_facet <- ggplot(data = df, aes(x = weeks_alive)) +
geom_histogram(bins = 5) +
facet_grid(~dose)
h_facet
This might be easier to compare if we flip the coordinates (rotate the axes):
h_facet + coord_flip()
Density plots might be nice? Note that setting a value <1 for “alpha” makes the fill more transparent
d_plot <- ggplot(data = df, aes(x = weeks_alive, fill = dose)) +
geom_density(alpha = .2)
d_plot
Aaron finds boxplots quite clarifying and added a call to scale_fill_brewer()
here to use the “Brewer” color palettes:
box_plot <- ggplot(data = df, aes(y = weeks_alive, x = dose, fill = dose)) +
geom_boxplot() +
scale_fill_brewer()
box_plot
Some people like “ridgeline” plots too:
# install.packages('ggridges')
library(ggridges)
ridge_plot <- ggplot(data = df, aes(x = weeks_alive, y = dose)) +
geom_density_ridges(jittered_points = T, fill = "orange")
ridge_plot
# add a fancy minimalist theme to make it prettier:
ridge_plot + theme_minimal()
Overall, the survival times range from \(30-100\) weeks, with an average of about \(75.5\) weeks and a standard deviation of \(21.4\) weeks. The distribution of outcomes is slightly left-skewed.
Comparisons of survival times across the different dosage levels reveals that the average survival time is highest in the “None” dosage level (\(\mu \approx 91\) weeks) and lowest in the “High” dosage level (\(\mu \approx 65\) weeks). The spread of the data is also much greater in the “Medium and”High" dosage level groups, suggesting a wider range of outcomes in those conditions. Based on the “ridgeline” plots, it appears that this expanded spread might have something to do with a bi-modal distribution in both conditions.
For the ANOVA:
\(H_0:~\mu_{none}~=~\mu_{low}~=~\mu_{medium}~=~\mu_{high}\)
\(H_A:\) Means are not all equal.
For the t-tests of none vs. any:
\(H_0:~\mu_{none}~=~\mu_{any}\)
\(H_A:~\mu_{none}~\neq~\mu_{any}\)
For the t-tests of none vs. high:
\(H_0:~\mu_{none}~=~\mu_{high}\)
\(H_A:~\mu_{none}~\neq~\mu_{high}\)
An ANOVA assumes independence, normality, and equal variance. A two sample t-test assumes independence and normality. The assumption of equal variance seems like a stretch (check out those standard deviations within conditions!). Normality is also a bit of a hard sell within groups or overall, but ultimately not so bad. Independence of the samples seems just fine since this was an experiment and we have no reason to anticipate dependence between the different conditions.
Despite the issues with unequal variance and normality, most analysts would march ahead with the analysis despite these violations of assumptions. We can discuss how you might think and talk about this more in class.
Remember that we have to call summary()
to see the ANOVA table:
summary(aov(weeks_alive ~ dose, data = df))
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 3 4052 1350.7 3.55 0.0245 *
## Residuals 34 12937 380.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This provides evidence in support of the alternative hypothesis that the group means are not equal (\(p<0.05\)). It seems that the dosage of Red Dye #40 likely has a relationship with how many weeks the mice lived.
T-test between None and Any, and between None and High.
t.test(
df$weeks_alive[df$dose == "None"], # Samples with no dose
df$weeks_alive[df$dose != "None"] # Samples with any dose
)
##
## Welch Two Sample t-test
##
## data: df$weeks_alive[df$dose == "None"] and df$weeks_alive[df$dose != "None"]
## t = 4.2065, df = 33.732, p-value = 0.0001806
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 11.49879 33.00626
## sample estimates:
## mean of x mean of y
## 91.36364 69.11111
Or, using formula notation
t.test(weeks_alive ~ dose == "None", data = df)
##
## Welch Two Sample t-test
##
## data: weeks_alive by dose == "None"
## t = -4.2065, df = 33.732, p-value = 0.0001806
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -33.00626 -11.49879
## sample estimates:
## mean in group FALSE mean in group TRUE
## 69.11111 91.36364
T-test between None and High
t.test(
df$weeks_alive[df$dose == "None"], # Samples with no dose
df$weeks_alive[df$dose == "High"] # Samples with high dose
)
##
## Welch Two Sample t-test
##
## data: df$weeks_alive[df$dose == "None"] and df$weeks_alive[df$dose == "High"]
## t = 2.4958, df = 8.5799, p-value = 0.0353
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.266399 49.960874
## sample estimates:
## mean of x mean of y
## 91.36364 65.25000
Formula notation for this is a bit tricker. I would create a temporary dataframe…
df.tmp <- subset(df, subset = dose == "None" | dose == "High")
t.test(weeks_alive ~ dose, data = df.tmp)
##
## Welch Two Sample t-test
##
## data: weeks_alive by dose
## t = 2.4958, df = 8.5799, p-value = 0.0353
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.266399 49.960874
## sample estimates:
## mean in group None mean in group High
## 91.36364 65.25000
These t-tests both support the idea that receiving a dose of RD40 reduces lifespan.
We might not completely trust these p-values, since we are doing multiple comparisons (\(\binom{4}{2}=6\) comparisons, to be precise). One option is to do a Bonferroni correction, where we only cnsider things significant if \(\alpha < .05/m\) where \(m\) is the number of tests (so our new critical p-value would be \(0.0083\)).
However, as discussed in the Reinhart book, the Bonferroni correction is usually more conservative than it needs to be, and there are other approaches; for example, Benjamini-Hochberg correction which I would calculate calling pairwise.t.test()
in R like so:
with(
df,
pairwise.t.test(weeks_alive, dose, p.adjust = "BH")
)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: weeks_alive and dose
##
## None Low Medium
## Low 0.052 - -
## Medium 0.052 0.858 -
## High 0.041 0.753 0.753
##
## P value adjustment method: BH
In this particular study, I would suggest reporting adjusted p-values for all of the tests (and I would consider including the “raw” p-values in a footnote or appendix or something).
For RQ4, the units of analysis are the 109 respondents. The credibility index (a group of five survey items evaluating how credible the respondents perceived the blogs they read to be) was split at the mean and these high/low groups were the independent (grouping) variable in the ANOVA. The six perceived relationship management factors (survey items) were the dependent variables for the ANOVA.
The analysis tests whether the mean responses on the survey items composing each of the different relationship management factors were equal across the high/low blog credibility groups. The alternative hypotheses are whether there are differences between the groups for any of the perceived relationship management dimensions.
None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibiliy.
It is (usually) a bit hard to say much from a null result! See the answer to (c) above.
Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable.
The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypotheses are whether there are differences between the groups for perceptions of the organization being in crisis.
The results of the ANOVA reject the null, suggesting support for the alternative hypothesis that participants reporting low credibility blogs reported different (higher) scores on the crisis index.
I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics about the various measures to help evaluate whether they meet the assumptions for identifying the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the specific credibility of the specific blogs read by the study participants? The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels.
Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable.
This set of ANOVAs test whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies.
The results of the ANOVAs are mixed, rejecting the null hypothesis of no difference for two of the relaionship management dimensions (task sharing and responsiveness/customer service), but failing to do so for the other four dimensions. For the two dimensions in question, higher prevalence of each strategy appeared to reduce the perception of crisis.
Again, the differences reported are compelling insofar as they indicate larger varation across the groups in terms of perceived crisis than would be expected in a world where no difference existed. That said, in addition to all the issues mentioned above for EQ5 part (d), this set of tests also raisesw issues of multiple comparisons. Not only is it useful to consider multiple comparisons in the context of a single ANOVA, but when running many ANOVAs across many grouping variables. If the authors really wanted to test whether the specific PR strategies mentioned here altered perceptions of organizational crisis across different types of blogs, they should/could have incorporated those variations into their experimental design much more directly. As such, the results remain suggestive that some of the observed relationships may exist, but can provide only weak support for even those that reject the null hypotheses in statistical tests.