X-Git-Url: https://code.communitydata.science/stats_class_2019.git/blobdiff_plain/ff24cba0648615296d49a623cb08a03b90454f17..15a75e0555c577856bf7d01910c2e9861ae4d640:/problem_sets/week_05/ps5-worked_solution.html

diff --git a/problem_sets/week_05/ps5-worked_solution.html b/problem_sets/week_05/ps5-worked_solution.html
index d4f103d..007b07d 100644
--- a/problem_sets/week_05/ps5-worked_solution.html
+++ b/problem_sets/week_05/ps5-worked_solution.html
@@ -222,16 +222,16 @@ s.means
 ## 2.887230 2.892782 2.376018 2.456387 2.489604 2.572719 2.786722 2.535294
 ## group_17 group_18 group_19 group_20
 ## 2.592676 2.354645 3.016203 2.314035
-We will discuss the relationship of the individual group means to population mean in class.
+We will discuss the relationship of the individual group means to the population mean in class. Basically, we can think of each group as a sample, so the collection of group means gives a (small) picture of the sampling distribution of the sample mean.
I’ll do this two ways. First, just plugging the values into the formula for the standard error, I can then add/subtract twice the standard error from the mean to find the 95% CI.
-se <- sd(w3$x, na.rm=T) / sqrt(length(w3$x))
+I’ll do this two ways. First, just plugging the values from the group sample into the formula for the standard error, I can then add/subtract twice the standard error from the mean to find the 95% CI.
+se <- sd(w3$x, na.rm=T) / sqrt(length(w3$x[!is.na(w3$x)]))
mean(w3$x, na.rm=T)-(2*se)
-## [1] 2.245946
+## [1] 2.232594
mean(w3$x, na.rm=T)+(2*se)
-## [1] 3.273947
+## [1] 3.2873
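The change to the `se` line matters because `length()` counts missing values, while `sd(..., na.rm=T)` ignores them. A minimal sketch of the difference follows, using a made-up vector `x` (hypothetical, not the `w3$x` data):

```r
# Hypothetical vector with missing values (not the course data)
x <- c(2.1, 3.4, NA, 1.8, 2.9, NA, 3.3, 2.5)

n.all <- length(x)        # counts the NAs too
n.obs <- sum(!is.na(x))   # counts only the observed values

# Dividing by sqrt(n.all) understates the standard error
se.wrong <- sd(x, na.rm = TRUE) / sqrt(n.all)
se.right <- sd(x, na.rm = TRUE) / sqrt(n.obs)

# Approximate 95% CI: mean plus or minus two standard errors
ci.approx <- mean(x, na.rm = TRUE) + c(-2, 2) * se.right
ci.approx
```

Using the too-large `n` shrinks the standard error, which is consistent with the corrected interval in this diff being slightly wider than the old one.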
Now, I’ll write a more general function to calculate confidence intervals. Note that I create an “alpha” argument with a default value of 0.05. I then divide alpha by 2. Can you explain why this division step is necessary?
ci <- function (x, alpha=0.05) {
x <- x[!is.na(x)]
@@ -309,7 +309,7 @@ group.confints
We’ll discuss this one in class.
+We’ll discuss this one in class too. Since the groups are (random) samples, we should not be surprised that their individual group means are different from the population mean. We should also not be surprised that the 95% CI for the population mean estimated from at least one of the samples does not include the true population mean. Since our confidence interval is 95%, we would expect to be wrong about 1 time in 20 on average!
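The "wrong about 1 time in 20" claim can be checked by simulation. The sketch below is only an illustration under stated assumptions: `ci.t` is a hypothetical stand-in helper (not the `ci()` function from this problem set), and the normal population with mean 2.5 and the sample size of 20 are arbitrary choices.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical t-based confidence interval helper (a stand-in for illustration)
ci.t <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))
  # alpha is split across the two tails of the distribution, hence alpha / 2
  mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = length(x) - 1) * se
}

true.mean <- 2.5  # assumed population mean for the simulation

# Draw many samples and record whether each interval contains the true mean
covered <- replicate(2000, {
  s <- rnorm(20, mean = true.mean, sd = 1)
  bounds <- ci.t(s)
  bounds[1] <= true.mean && true.mean <= bounds[2]
})

mean(covered)  # share of intervals containing the true mean; should be near 0.95
```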
They all look a little bit different from each other and from the population distribution. We’ll discuss these differences in class.
+They all look a little bit different from each other and from the population distribution. We’ll discuss these differences in class. Again, none of this should be shocking given the relationship of the samples to the population.
## [1] 0.2696987
## My standard error from one of the groups above:
se
-## [1] 0.2570002
-We will discuss the relationship of these values in class.
+## [1] 0.2636766
+We will discuss the relationship of these values in class. As mentioned earlier, the distribution of sample means drawn from the population is the sampling distribution. The standard error of the mean estimated from any of the individual groups/samples should be a good approximation of (but not necessarily equal to!) the standard deviation of the sampling distribution of the means.
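To see how close the two quantities tend to be, here is a small simulation. It is a sketch on assumed data: the uniform `population`, the sample size `n`, and the number of replicates are arbitrary and are not taken from the problem set.

```r
set.seed(1)

# Assumed population: 10,000 draws from a uniform distribution (illustrative only)
population <- runif(10000, min = 0, max = 5)
n <- 50

# Empirical sampling distribution: the means of many samples of size n
sample.means <- replicate(5000, mean(sample(population, n)))
sd.sampling <- sd(sample.means)

# Standard error estimated from one single sample
one.sample <- sample(population, n)
se.one <- sd(one.sample) / sqrt(n)

c(sd.sampling = sd.sampling, se.one = se.one)
```

The two numbers should be close but not identical, because `se.one` is itself an estimate that changes from sample to sample.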
We’ll discuss this in class.
+We’ll discuss this in class. Notable things you might observe include that the sampling distribution of the means approaches normality as the sample size grows, whether the population we draw from is uniform, log-normal, or really just about any other distribution. This is an illustration of some aspects of the central limit theorem. It is also an illustration of the t-distribution (the basis for the t-tests that you learned about this week).
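One way to see the central limit theorem at work is to compare the skewness of sample means at two sample sizes. This is a rough sketch: `skew` is a hypothetical helper written here because base R has no built-in skewness function, and the sample sizes 5 and 200 are arbitrary.

```r
set.seed(7)

# Hypothetical helper: a simple moment-based skewness estimate
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Means of log-normal samples for a small and a larger sample size
means.n5   <- replicate(5000, mean(rlnorm(5)))
means.n200 <- replicate(5000, mean(rlnorm(200)))

# Skewness near 0 suggests a roughly symmetric, more normal-looking distribution
c(n5 = skew(means.n5), n200 = skew(means.n200))
```

The means of the larger samples should be much less skewed, even though the underlying log-normal population is strongly right-skewed.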
None of the ANOVA tests rejected the null hypothesis of no difference. In other words, there was no evidence that perceptions of relationship management dimensions varied across individuals perceiving blogs as low or high credibility.
It is (usually) a bit hard to say much from a null result! See the answer to (c) above.
Again, the units are the 109 respondents and the partitioned (low/high) credibility index serves as the independent (grouping) variable. The crisis index is the dependent variable.
The ANOVA tests whether average assessments of perceived crisis in the organization in question were equal by whether participants perceived the blogs to be low/high credibility. The alternative hypothesis is that the groups differ in their perceptions of the organization being in crisis.
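For readers who want to see the shape of such a test in R, here is a sketch on simulated stand-in data. The 109 respondents come from the description above; the 55/54 split, the group means, and the variable names `credibility` and `crisis` are assumptions made up for illustration and are not the study's data.

```r
set.seed(2019)

# Simulated stand-in data: 109 respondents in low/high credibility groups
# (all values are made up for illustration)
d <- data.frame(
  credibility = factor(rep(c("low", "high"), times = c(55, 54))),
  crisis = c(rnorm(55, mean = 3.0, sd = 1), rnorm(54, mean = 3.5, sd = 1))
)

# One-way ANOVA: does mean perceived crisis differ by credibility group?
fit <- aov(crisis ~ credibility, data = d)
summary(fit)

# With only two groups, the F statistic equals the squared pooled-variance t statistic
f.stat <- summary(fit)[[1]][["F value"]][1]
t.stat <- t.test(crisis ~ credibility, data = d, var.equal = TRUE)$statistic
all.equal(f.stat, unname(t.stat)^2)
```

With a two-level grouping variable like this one, the ANOVA and an equal-variance t-test answer the same question.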
I find the reported differences compelling, but would like more information in order to determine more specific takeaways. For example, I would like to see descriptive statistics about the various measures to help evaluate whether they meet the assumptions of the ANOVA. Survey indices like this are a bit slippery insofar as they can seem to yield results when the differences are really artifacts of the measurements and how they are coded. I am also a bit concerned that the questions seemed to ask about blog credibility in general rather than the specific credibility of the specific blogs read by the study participants. The presumed relationship between credibility and the assignment to the blogs in question is not confirmed empirically, meaning that the differences in perceptions of organizational crisis might be more related to baseline attitudes than to anything specific about the treatment conditions in the experiment. I would also like to know more about the conditional means and standard errors in order to evaluate whether the pairwise average perceptions of organizational crisis vary across perceived credibility levels.
Analogous to RQ5 except that the (six) different dimensions of relationship management separated into high/low categories served as the independent (grouping) variables in the ANOVA. Perceptions of organizational crisis remained the dependent variable.
This set of ANOVAs tests whether assessments of perceived organizational crisis were equal or varied depending on the prevalence of specific relationship management strategies.