Load the dataset. As usual, you’ll need to edit this block of code to point at the appropriate file location to make it work on your machine.
pop <- read.delim(file="~/Documents/Teaching/2019/stats/data/week_05/population.tsv")
mean(pop$x, na.rm=T)
## [1] 2.598453
## Insert code for your group as needed:
w3 <- read.csv(file="~/Documents/Teaching/2019/stats/data/week_03/group_03.csv", row.names=NULL)
mean(w3$x, na.rm=T)
## [1] 2.759947
Given the structure of the full dataset, it’s also easy to calculate all of the group means:
s.means <- tapply(pop$x, pop$group, mean, na.rm=T)
s.means
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08
## 2.964142 3.163984 2.759947 2.303034 2.290273 2.387994 2.424987 2.400375
## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16
## 2.887230 2.892782 2.376018 2.456387 2.489604 2.572719 2.786722 2.535294
## group_17 group_18 group_19 group_20
## 2.592676 2.354645 3.016203 2.314035
We will discuss the relationship of the individual group means to the population mean in class. Basically, we can treat each group as a sample from the population, so the collection of group means approximates the sampling distribution of the mean.
I’ll do this two ways. First, just plugging the values from the group sample into the formula for the standard error, I can then add/subtract twice the standard error from the mean to approximate the 95% CI.
se <- sd(w3$x, na.rm=T) / sqrt(length(w3$x[!is.na(w3$x)]))
mean(w3$x, na.rm=T)-(2*se)
## [1] 2.232594
mean(w3$x, na.rm=T)+(2*se)
## [1] 3.2873
Now, I’ll write a more general function to calculate confidence intervals. Note that I create an ‘alpha’ argument with a default value of 0.05. I then divide alpha by 2. Can you explain why this division step is necessary?
ci <- function (x, alpha=0.05) {
x <- x[!is.na(x)]
probs <- c(alpha/2, 1-alpha/2)
critical.values <- qnorm(probs, mean=0, sd=1)
se <- sd(x)/sqrt(length(x))
mean(x) + critical.values * se
}
## Here I run the function on the group 3 data
ci(w3$x)
## [1] 2.243150 3.276744
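To see why the division by 2 matters, compare the two-tailed and one-tailed critical values directly (a quick sketch using the standard normal):

```r
# alpha/2 puts half of the error probability in each tail of the distribution
alpha <- 0.05
qnorm(c(alpha / 2, 1 - alpha / 2))  # two-tailed: about -1.96 and 1.96
qnorm(1 - alpha)                    # one-tailed: about 1.64
```

Using ±qnorm(1 - alpha) would produce an interval covering only 90% of the distribution, since 5% of the probability would be excluded from each tail.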
Again, it’s easy to estimate this for every group:
group.confints <- tapply(pop$x, pop$group, ci)
group.confints
## $group_01
## [1] 2.469464 3.458819
##
## $group_02
## [1] 2.629990 3.697978
##
## $group_03
## [1] 2.243150 3.276744
##
## $group_04
## [1] 1.846621 2.759447
##
## $group_05
## [1] 1.791081 2.789465
##
## $group_06
## [1] 1.965324 2.810665
##
## $group_07
## [1] 1.997502 2.852473
##
## $group_08
## [1] 1.985440 2.815311
##
## $group_09
## [1] 2.340745 3.433714
##
## $group_10
## [1] 2.400087 3.385476
##
## $group_11
## [1] 1.936974 2.815062
##
## $group_12
## [1] 2.042326 2.870447
##
## $group_13
## [1] 2.044957 2.934251
##
## $group_14
## [1] 2.164345 2.981092
##
## $group_15
## [1] 2.342309 3.231136
##
## $group_16
## [1] 2.064519 3.006068
##
## $group_17
## [1] 2.178324 3.007028
##
## $group_18
## [1] 1.889973 2.819317
##
## $group_19
## [1] 2.555803 3.476604
##
## $group_20
## [1] 1.918150 2.709919
We’ll discuss this one in class too. Since the samples are (random) samples, we should not be surprised that their individual group means differ from the population mean. We should also not be surprised if the 95% CI for the population mean estimated from at least one of the samples fails to include the true population mean. With a 95% confidence interval, we would expect to be wrong about 1 in 20 times on average!
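A quick simulation can make that 1-in-20 failure rate concrete. This is a sketch with made-up population parameters (roughly matched to the dataset’s mean and standard deviation), not the actual course data:

```r
set.seed(42)
# Hypothetical population; mean and sd chosen to resemble the dataset above
pop.sim <- rnorm(10000, mean = 2.6, sd = 2.3)
# Draw 1000 samples of size 95; record whether each 95% CI misses the true mean
miss <- replicate(1000, {
  s <- sample(pop.sim, 95)
  se.s <- sd(s) / sqrt(length(s))
  interval <- mean(s) + c(-1.96, 1.96) * se.s
  mean(pop.sim) < interval[1] || mean(pop.sim) > interval[2]
})
mean(miss)  # proportion of intervals that miss; typically close to 0.05
```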
Comparing the distributions, first for the single group versus the population:
hist(pop$x)
hist(w3$x)
summary(pop$x)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00078 0.60682 2.08238 2.59845 4.03440 11.21858 100
sd(pop$x, na.rm=T)
## [1] 2.291625
summary(w3$x)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001016 0.657997 2.266437 2.759947 4.089892 10.425881 5
sd(w3$x, na.rm=T)
## [1] 2.570002
And here’s code to do it for each group. A ggplot2 “faceted” histogram makes for concise code and a readable figure.
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
ggplot(pop, aes(x)) + geom_histogram() + facet_wrap(. ~ group)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 100 rows containing non-finite values (stat_bin).
tapply(pop$x, pop$group, summary)
## $group_01
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000836 0.623552 2.480939 2.964142 4.869991 7.996040 5
##
## $group_02
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.04996 0.98868 2.25888 3.16398 4.58581 9.64321 5
##
## $group_03
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001016 0.657997 2.266437 2.759947 4.089892 10.425881 5
##
## $group_04
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.02008 0.44755 1.57473 2.30303 3.47370 9.87610 5
##
## $group_05
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000778 0.375376 1.387920 2.290273 3.839179 10.158698 5
##
## $group_06
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.007768 0.517054 2.038364 2.387994 3.766346 8.899269 5
##
## $group_07
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00432 0.51095 2.19910 2.42499 3.78105 9.35393 5
##
## $group_08
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001657 0.691977 2.059659 2.400375 3.534149 9.170684 5
##
## $group_09
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.007371 0.474792 2.503350 2.887230 4.194422 11.142319 5
##
## $group_10
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001859 0.872912 2.503538 2.892782 4.366617 10.468560 5
##
## $group_11
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.01069 0.65365 1.71980 2.37602 4.08116 9.70812 5
##
## $group_12
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.008528 0.540260 2.099041 2.456387 3.518414 7.945303 5
##
## $group_13
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.006755 0.547588 2.174567 2.489604 3.748151 9.907302 5
##
## $group_14
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001154 0.905256 2.262228 2.572719 4.189871 7.278206 5
##
## $group_15
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.01192 0.92555 2.49983 2.78672 4.24195 9.16923 5
##
## $group_16
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.01626 0.67461 1.78174 2.53529 3.92996 10.49848 5
##
## $group_17
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.01861 0.69614 2.39607 2.59268 3.89538 8.20067 5
##
## $group_18
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.001341 0.419682 1.786937 2.354645 3.397951 11.218583 5
##
## $group_19
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.04522 1.28244 2.30096 3.01620 4.38815 9.91588 5
##
## $group_20
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.007343 0.620991 1.879363 2.314035 3.588999 8.116529 5
tapply(pop$x, pop$group, sd, na.rm=T)
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08
## 2.460005 2.655523 2.570002 2.269715 2.482454 2.101918 2.125861 2.063453
## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16
## 2.717638 2.450142 2.183341 2.059100 2.211206 2.030818 2.210043 2.341134
## group_17 group_18 group_19 group_20
## 2.060548 2.310790 2.289548 1.968713
They all look a little bit different from each other and from the population distribution. We’ll discuss these differences in class. Again, none of this should be shocking given the relationship of the samples to the population.
s.means <- tapply(pop$x, pop$group, mean, na.rm=T)
s.means
## group_01 group_02 group_03 group_04 group_05 group_06 group_07 group_08
## 2.964142 3.163984 2.759947 2.303034 2.290273 2.387994 2.424987 2.400375
## group_09 group_10 group_11 group_12 group_13 group_14 group_15 group_16
## 2.887230 2.892782 2.376018 2.456387 2.489604 2.572719 2.786722 2.535294
## group_17 group_18 group_19 group_20
## 2.592676 2.354645 3.016203 2.314035
sd(s.means)
## [1] 0.2696987
## My standard error from one of the groups above:
se
## [1] 0.2636766
We will discuss the relationship of these values in class. As mentioned earlier, the distribution of sample means drawn from the population is the sampling distribution. The standard error of the mean estimated from any one of the individual groups/samples should be a good approximation of (but not necessarily equal to!) the standard deviation of the sampling distribution of the means.
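Here is a small simulation sketch (using an arbitrary log-normal population, not the course data) showing that the standard deviation of many sample means tracks the analytic standard error \(\sigma/\sqrt{n}\):

```r
set.seed(11)
# Hypothetical skewed population (parameters chosen for illustration)
pop.sim <- rlnorm(10000, meanlog = 0.5, sdlog = 1)
n <- 100
# Empirical sampling distribution: means of 2000 samples of size n
sample.means <- replicate(2000, mean(sample(pop.sim, n)))
sd(sample.means)       # sd of the sampling distribution of the mean
sd(pop.sim) / sqrt(n)  # analytic standard error; should be similar
```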
Since the next few commands involve randomness, I’ll set a seed value to ensure the results are reproducible on other machines.
set.seed(20190501)
pop.unif <- sample(seq(0, 9), 10000, replace=TRUE)
mean(pop.unif)
## [1] 4.5284
hist(pop.unif)
hist(
sapply( rep(1, 100), function (x) {
mean(sample(pop.unif, 2))}
)
)
Same commands as in (c), just changing the sample size:
hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 10))}))
hist(sapply(rep(1, 100), function (x) { mean(sample(pop.unif, 100))}))
We’ll discuss this in class. Notable things you might observe include that the sampling distribution of the means approaches normality as the sample size increases, whether the population we draw from is uniform, log-normal, or just about any other distribution. This is an illustration of some aspects of the central limit theorem. It is also an illustration of the t-distribution (the basis for the t-tests that you learned about this week).
\[ \overline{x}_{diff} \pm t^*_{df}\frac{s_{diff}}{\sqrt{n}}\]
Plug in the corresponding values and the critical value for a 95% confidence interval (here \(t^*_{199} \approx 1.98\)):
\[-0.545 \pm 1.98 \times \frac{8.887}{\sqrt{200}} \] I’ll let R take it from here:
-0.545 - (1.98 * (8.887/sqrt(200)))
## [1] -1.789243
-0.545 + (1.98 * (8.887/sqrt(200)))
## [1] 0.6992435
With 95% confidence, the average difference between reading and writing test scores falls between -1.79 and 0.70.
No. Because the interval includes/crosses zero, we cannot conclude that the average scores differ.
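As a cross-check on the arithmetic above, the same interval can be computed with qt() rather than the rounded critical value:

```r
# Values from the problem: mean difference, sd of differences, sample size
xbar.diff <- -0.545
s.diff <- 8.887
n <- 200
t.star <- qt(0.975, df = n - 1)  # about 1.97, close to the rounded 1.98
xbar.diff + c(-1, 1) * t.star * s.diff / sqrt(n)
```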
We want to test the following hypotheses: \[H_0~:~\mu_{0.99}=\mu_{1}\] \[H_A~:~\mu_{0.99}\neq\mu_{1}\] To do so, we can use a two-sample t-test to compare the two sample means. The conditions we’d like to satisfy are independence and normality. Re: independence, the samples are random and not terribly large (presumably less than 10% of all the diamonds of each carat rating on earth), so we should be good to go. Re: normality, visual inspection of the histograms presented in the textbook suggests that what skew may be present in either distribution is not extreme.
Given that the conditions are met, here’s how you could construct the test statistic \(T\):
\[T~=~\frac{Point~estimate~-~Null}{Standard~error_{difference}}\]
Plugging in formulas from chapter 5 of the textbook this looks like:
\[T~=~ \frac{(\overline{x}_1-\overline{x}_2)-(0)}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\] Now, plug in values from the table:
\[T~=~ \frac{(44.51-56.81)-(0)}{\sqrt{\frac{13.32^2}{23}+\frac{16.13^2}{23}}}\] Work that out and you should have \(T~=~-2.82\). The degrees of freedom are estimated by the smaller of \(n_1-1\) or \(n_2-1\) (which are equal in this case), so \(df~=~22\). Consulting the table of T-statistics from the back of the book, we find:
\[p_{value}=P(T_{22} > 2.82) = 0.01\] Assuming we’re okay with a false positive rate of \(p\leq0.05\), this provides support for the alternative hypothesis, and we can reject the null of no difference between the average standardized prices of 0.99 and 1 carat diamonds.
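You can verify the arithmetic in R, using pt() for the tail probability:

```r
# Values from the textbook table for the two diamond groups
x1 <- 44.51; s1 <- 13.32; n1 <- 23
x2 <- 56.81; s2 <- 16.13; n2 <- 23
se.diff <- sqrt(s1^2 / n1 + s2^2 / n2)
t.stat <- (x1 - x2) / se.diff
t.stat                         # approximately -2.82
2 * pt(-abs(t.stat), df = 22)  # two-sided p-value, near 0.01
```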
I want to calculate the following:
\[(\overline{x_1}-\overline{x_2})~\pm~t^*_{df}\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\] Since the critical values for the distribution of the test statistic \(T\) are available in R via the qt()
function, I can use a procedure similar to my ci()
function from earlier in the problem set to calculate the value of \(t^*_{df}\) (in this case, \(t^*_{22}\)). If you didn’t use qt(),
you could instead have looked up the critical value for a two-tailed test in the t distribution table at the back of the textbook (or elsewhere).
diff.means <- 44.51-56.81
t.star <- qt(0.025, df=22)
se <- sqrt( ((13.32^2)/23) + ((16.13^2)/23) )
diff.means - (t.star*se)
## [1] -3.254
diff.means + (t.star*se)
## [1] -21.346
In words, I am 95% confident that the average difference between the standardized prices of a 0.99 carat diamond and a 1 carat diamond falls between $3.25 and $21.35 (the 1 carat diamond costs more).
\(H_0:\) The mean hours worked for the groups are all equal.
\[\mu_{\lt~hs} = \mu_{hs} = \mu_{jc} = \mu_{ba} = \mu_{grad} \] \(H_A:\) The mean hours worked vary by education level. In other words, the means are not equal.
Independent observations, normal(ish) distributions, and constant(ish) variance. The problem doesn’t say much about the sample to help evaluate the independence of the observations, but it’s definitely less than 10% of the population and is likely a fairly good approximation of a random sample (thereby satisfying the rule of thumb). From the boxplots the distributions all look fairly normal. The standard deviations are also similar. We’ll assume that the conditions are met for the purposes of the test.
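The problem’s dataset isn’t included here, but for reference, this is the shape an ANOVA like this takes in R, with simulated stand-in data (all names and values hypothetical):

```r
set.seed(7)
# Hypothetical data: hours worked by education level (5 groups of 30)
edu <- factor(rep(c("lt_hs", "hs", "jc", "ba", "grad"), each = 30))
hours <- rnorm(150, mean = 40, sd = 12)
# F test of the null that all group means are equal
summary(aov(hours ~ edu))
```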
We’ll discuss this one as a group. Personally, I find the focus on p-values somewhat thought-stopping and would prefer to see researchers and publications embrace reporting standards that yield more intuitive, flexible, and meaningful understanding. Confidence intervals seem to do this more effectively than p-values and, in that respect, I like confidence intervals quite a lot!
For RQ4, the units of analysis are the 109 respondents. The credibility index (a group of five survey items evaluating how credible the respondents perceived the blogs they read to be) was split at the mean and these high/low groups were the independent (grouping) variable in the ANOVA. The six perceived relationship management factors (survey items) were the dependent variables for the ANOVA.
The analysis tests whether the mean responses on the survey items composing each of the different relationship management factors were equal across the high/low blog credibility groups. The alternative hypotheses are whether there are differences between the groups for any of the perceived relationship management dimensions.
None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibility.
It is (usually) a bit hard to say much from a null result! See the answer to (c) above.
Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable.
The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypotheses are whether there are differences between the groups for perceptions of the organization being in crisis.
The results of the ANOVA reject the null, suggesting support for the alternative hypothesis that participants reporting low credibility blogs reported different (higher) scores on the crisis index.
I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics for the various measures to help evaluate whether they meet the assumptions of the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the credibility of the particular blogs read by the study participants. The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels.
Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable.
This set of ANOVAs test whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies.
The results of the ANOVAs are mixed, rejecting the null hypothesis of no difference for two of the relationship management dimensions (task sharing and responsiveness/customer service), but failing to do so for the other four dimensions. For the two dimensions in question, higher prevalence of each strategy appeared to reduce the perception of crisis.
Again, the differences reported are compelling insofar as they indicate larger variation across the groups in terms of perceived crisis than would be expected in a world where no difference existed. That said, in addition to all the issues mentioned above for RQ5 part (d), this set of tests also raises issues of multiple comparisons. It is useful to consider multiple comparisons not only in the context of a single ANOVA, but also when running many ANOVAs across many grouping variables. If the authors really wanted to test whether the specific PR strategies mentioned here altered perceptions of organizational crisis across different types of blogs, they should/could have incorporated those variations into their experimental design much more directly. As such, the results remain suggestive that some of the observed relationships may exist, but can provide only weak support for even those that reject the null hypotheses in statistical tests.