revising interpretation in part III to clarify points about odds ratios

diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd
index 8c4680fe5536270a66c5976131191029e280b25d..b8210e0fd0bb9bf6cd9fa50bb4c903b1b800f22d 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -266,9 +266,9 @@ summary(fit)
  
  Interesting. Looks like adjusting for these other variables in a regression setting allows us to uncover some different results. 
  
-Onwards to generating more interpretable results.
+Onwards to generating more interpretable results. You might recall that the big problem with interpreting logistic regression is that the results are given to you in "log-odds." Not only is it difficult to have intuitions about odds, but intuitions about the natural logarithms of odds are just intractable (for most of us). 
  
-First, odds-ratios instead of "raw" log-odds coefficients:
+To make things easier, the typical first step is to calculate odds-ratios instead of log-odds. This is done by exponentiating the coefficients (as well as the corresponding 95\% confidence intervals):
  
  ```{r}
  ## Odds ratios (exponentiated log-odds!)
@@ -276,6 +276,8 @@ exp(coef(fit))
  exp(confint(fit))
  ```
  
+You can use these to construct statements about the change in odds of the dependent variable flipping from 0 to 1 (or `FALSE` to `TRUE`) predicted by a 1-unit change in the corresponding predictor (where an odds ratio of 1 corresponds to unchanged odds). We'll interpret the `obamaTRUE` odds ratio below.
+
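  To see why exponentiating works, here is a short sketch of the algebra. The symbols ($p$ for the probability that the outcome is 1, $\beta_0$ and $\beta_1$ for the intercept and one coefficient, $x_1$ for the corresponding predictor) are my own notation for illustration, not anything produced by the code above. The model is linear in the log-odds:
  
  $$
  \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots
  $$
  
  Holding the other predictors fixed, a 1-unit increase in $x_1$ adds $\beta_1$ to the log-odds, so the odds get multiplied by $e^{\beta_1}$:
  
  $$
  \frac{\text{odds}(x_1 + 1)}{\text{odds}(x_1)} = \frac{e^{\beta_0 + \beta_1 (x_1 + 1) + \cdots}}{e^{\beta_0 + \beta_1 x_1 + \cdots}} = e^{\beta_1}
  $$
  
  That ratio is the odds ratio: values above 1 mean higher odds, values below 1 mean lower odds, and exactly 1 means no change.
  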
  Now, model-predicted probabilities for prototypical observations. Recall that it's necessary to create synthetic ("fake"), hypothetical individuals to generate predicted probabilities like these. In this case, I'll create two versions of each fake kid: one assigned to the treatment condition and one assigned to the control. Then I'll use the `predict()` function to generate fitted values for each of the fake kids.
  
  ```{r}
@@ -325,8 +327,10 @@ stargazer(m2012, m2014, m2015, column.labels = c("2012", "2014", "2015"), type="
  
  ## Interpret and discuss
  
-Well, for starters, the model providing a "pooled" estimate of treatment effects while adjusting for age, gender, and study year suggests that the point estimate is "marginally" statistically significant ($p <0.1$) indicating some evidence that the data support the alternative hypothesis (being shown a picture of Michelle Obama causes trick-or-treaters to be more likely to pick up fruit than the control condition). In more concrete terms, the trick-or-treaters shown the Obama picture were, on average, about 25\% more likely to pick up fruit than those exposed to the control (95\% CI:  $-4\%~-~+66\%$). In even more concrete terms, the estimated probability that a 9 year-old girl in 2015 and a 7 year-old boy in 2012 would take fruit increase about 17\% and 19\% respectively on average (from 29\% to 34\% in the case of the 9 year-old and from 21\% to 25\% in the case of the 7 year-old). These findings are sort of remarkable given the simplicity of the intervention and the fairly strong norm that Halloween is all about candy.
+Well, for starters, the model providing a "pooled" estimate of treatment effects while adjusting for age, gender, and study year suggests that the point estimate is "marginally" statistically significant ($p < 0.1$), indicating some evidence that the data support the alternative hypothesis (being shown a picture of Michelle Obama causes trick-or-treaters to be more likely to pick up fruit than the control condition). In more concrete terms, the odds of picking up fruit were, on average, about 26\% higher for the trick-or-treaters shown the Obama picture than for those exposed to the control (95\% CI: $-4\%$ to $+66\%$).[^1] In even more concrete terms, the estimated probabilities that a 9 year-old girl in 2015 and a 7 year-old boy in 2012 would take fruit increase, in relative terms, by about 17\% and 19\% respectively (from 29\% to 34\% in the case of the 9 year-old and from 21\% to 25\% in the case of the 7 year-old). These findings are sort of remarkable given the simplicity of the intervention and the fairly strong norm that Halloween is all about candy.
+
+[^1]: Remember when I said we would use those odds ratios to interpret the parameter on `obamaTRUE`? Here we are. The parameter value is approximately 1.26, which means that the odds of picking fruit are, on average, 1.26 times as large for a trick-or-treater exposed to the picture of Michelle Obama versus a trick-or-treater in the control condition. In other words, the odds go up by about 26\% ($= \frac{1.26-1}{1}$).
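  
  A quick arithmetic check may help connect the odds ratio to the predicted probabilities. This is only a sketch that uses the rounded probabilities quoted in the interpretation above (29\% to 34\% and 21\% to 25\%). The object names and the `prob_to_odds()` helper are my own inventions for illustration.
  
  ```{r}
  ## Rounded predicted probabilities quoted in the interpretation above
  ## (control vs. treatment for each hypothetical kid)
  p_control <- c(girl9_2015 = 0.29, boy7_2012 = 0.21)
  p_treat   <- c(girl9_2015 = 0.34, boy7_2012 = 0.25)
  
  ## Hypothetical helper (not from the original): probability -> odds
  prob_to_odds <- function(p) p / (1 - p)
  
  ## Ratio of the odds under treatment to the odds under control
  odds_ratio <- prob_to_odds(p_treat) / prob_to_odds(p_control)
  
  ## Relative change in the probability itself
  rel_change_prob <- p_treat / p_control - 1
  
  round(odds_ratio, 2)
  round(rel_change_prob, 2)
  ```
  
  Both odds ratios come out close to 1.26 (1.26 and 1.25, with the small gap due to rounding of the inputs), while the relative changes in probability are 0.17 and 0.19. The same odds ratio translates into different changes in probability depending on the baseline, which is why the two numbers should not be used interchangeably.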
  
-All of that said, the t-test results from Problem set 7 and the "unpooled" results reported in the sub-group analysis point to some potential concerns and limitations. For starters, the fact that the experiment was run iteratively over multiple years and that the sample size grew each year raises some concerns that the study design may not have anticipated the small effect sizes eventually observed and/or was adapted on the fly. This would undermine confidence in some of the test statistics and procedures. Furthermore, because the experiment occurred in sequential years, there's a very real possibility that the significance of a picture of Michelle Obama shifted during that time period and/or the house in question developed a reputation for being "that weird place where they show you pictures of Michelle Obama and offer you fruit." Whatever the case, my confidence in the findings here is not so great and I have some lingering suspicions that the results might not replicate.
+All of that said, the t-test results from Problem set 5 and the "unpooled" results reported in the sub-group analysis point to some potential concerns and limitations. For starters, the fact that the experiment was run iteratively over multiple years and that the sample size grew each year raises some concerns that the study design may not have anticipated the small effect sizes eventually observed and/or was adapted on the fly. This would undermine confidence in some of the test statistics and procedures. Furthermore, because the experiment occurred in sequential years, there's a very real possibility that the significance of a picture of Michelle Obama shifted during that time period and/or the house in question developed a reputation for being "that weird place where they show you pictures of Michelle Obama and offer you fruit." Whatever the case, my confidence in the findings here is not so great and I have some lingering suspicions that the results might not replicate.
  
  On a more nuanced/advanced statistical note, I also have some concerns about the standard errors. This goes beyond the content of our course, but basically, a randomized controlled trial introduces clustering into the data by-design (you can think of it as analogous to the observations coming from the treatment "cluster" and the control "cluster"). In this regard, the normal standard error formulas can be biased. Luckily, there's a fix for this: compute "robust" standard errors as a result and re-calculate the corresponding confidence intervals. Indeed, robust standard errors are often considered to be the best choice even when you don't know about potential latent clustering or heteroskedastic error structures in your data. [This short pdf](https://oes.gsa.gov/assets/files/calculating-standard-errors-guidance.pdf) provides a little more explanation, citations, as well as example R code for how you might calculate robust standard errors.
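  
  For a rough idea of what that could look like in practice, here is a minimal sketch. It assumes the `sandwich` and `lmtest` packages are installed, and it runs on simulated stand-in data rather than the course dataset. Every column name other than `obama` is hypothetical, and the HC2 estimator is one common choice, not necessarily the right one for every design.
  
  ```{r}
  ## Assumes the sandwich and lmtest packages are installed
  library(sandwich)
  library(lmtest)
  
  ## Simulated stand-in data (NOT the course dataset); column names other
  ## than `obama` are hypothetical
  set.seed(42)
  n <- 500
  d <- data.frame(
    obama = rep(c(TRUE, FALSE), length.out = n),
    age   = sample(4:14, n, replace = TRUE),
    male  = sample(c(TRUE, FALSE), n, replace = TRUE)
  )
  d$fruit <- rbinom(n, 1, plogis(-1 + 0.2 * d$obama - 0.02 * d$age))
  
  fit_sim <- glm(fruit ~ obama + age + male, data = d, family = binomial("logit"))
  
  ## Heteroskedasticity-robust (HC2) variance-covariance matrix
  vc_robust <- vcovHC(fit_sim, type = "HC2")
  
  ## Coefficient tests and 95% CIs on the log-odds scale, using robust SEs
  coeftest(fit_sim, vcov. = vc_robust)
  ci_robust <- coefci(fit_sim, vcov. = vc_robust, level = 0.95)
  
  ## Exponentiate to get odds ratios with robust confidence intervals
  exp(cbind(OR = coef(fit_sim), ci_robust))
  ```
  
  Note that `coefci()` builds Wald-type intervals from the supplied variance-covariance matrix, so these will not exactly match the output of `confint()` on a `glm` object even before the robust adjustment.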
 \ No newline at end of file