X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/37924e44ba85abcf6fa154c46d9d686c926aa4f3..c36ea661e7d087328543688db2ccdc22e12ededd:/psets/pset8-worked_solution.html
diff --git a/psets/pset8-worked_solution.html b/psets/pset8-worked_solution.html
index b2f7735..2ea1670 100644
--- a/psets/pset8-worked_solution.html
+++ b/psets/pset8-worked_solution.html
@@ -1723,6 +1723,18 @@ summary(model)
## Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
## F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16

What do you know. That was it. The difference in \(R^2\) is huge!

+

A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario Kart game was sold as part of a bundle with other games. You can confirm this by looking at the title field of the original dataset with the following block of code:

+
data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)
+
## # A tibble: 2 x 3
+##             id total_pr title                                                   
+##          <dbl>    <dbl> <fct>                                                   
+## 1 110439174663     327. "Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart "  
+## 2 130335427560     118. "10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, et…
+
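To see how much these two bundle auctions matter, you can fit the same model with and without them and compare the \(R^2\) values directly. The block below is only a sketch: it assumes the `openintro` and `dplyr` packages are installed, and the model formula is a guess (four predictors, consistent with the "4 and 136 DF" reported above), since the original model call is not shown in this excerpt.

```r
library(openintro)  # provides the mariokart dataset
library(dplyr)

data(mariokart)

# Assumed formula: four predictors, chosen to be consistent with the
# F-statistic degrees of freedom above. Swap in the model you actually fit.
f <- total_pr ~ cond + stock_photo + duration + wheels

# Keep only auctions at or under $100, i.e. drop the two bundle auctions
mariokart_trimmed <- mariokart %>%
  filter(total_pr <= 100)

fit_all     <- lm(f, data = mariokart)
fit_trimmed <- lm(f, data = mariokart_trimmed)

# Compare R-squared for the full and trimmed samples
c(all     = summary(fit_all)$r.squared,
  trimmed = summary(fit_trimmed)$r.squared)
```

Reporting both fits, rather than only the trimmed one, is one way to be transparent about how sensitive the results are to the two dropped observations.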

What do you make of the textbook authors’ decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?

Interpret some results