updating explanation of part I to address reasons for dropped observations

[stats_class_2020.git] / psets / pset8-worked_solution.rmd
diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd

index b8210e0fd0bb9bf6cd9fa50bb4c903b1b800f22d..7901f54dffd2cc3ad6e6791667629be4c984bc59 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -124,6 +124,20 @@ summary(
  
  What do you know. That was it. The difference in $R^2$ is huge! 
  
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions in which Mario Kart was sold as part of a bundle with other games. You can confirm this by looking at the `title` field of the original dataset with the following code:
+
+```{r}
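+library(openintro)  # provides the mariokart dataset
+library(dplyr)      # provides %>%, filter(), and select()
+data(mariokart)
+mariokart %>% 
+  filter(total_pr > 100) %>%   # keep only the auctions above $100 (the two outliers)
+  select(id, total_pr, title)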
+```
+
+
+What do you make of the textbook authors' decision to drop these two observations? Can you make a case for or against doing so? What seems like the right decision here, and what is the best way to handle this kind of situation in general?
+
  ## Interpret some results  
  
  The issues above notwithstanding, we can march ahead and interpret the results of the original model that I fit. Here are some general comments and some specifically focused on the `cond_new` and `stock_photo` variables: