X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/37924e44ba85abcf6fa154c46d9d686c926aa4f3..refs/heads/master:/psets/pset8-worked_solution.html?ds=sidebyside

diff --git a/psets/pset8-worked_solution.html b/psets/pset8-worked_solution.html
index b2f7735..2ea1670 100644
--- a/psets/pset8-worked_solution.html
+++ b/psets/pset8-worked_solution.html
@@ -1723,6 +1723,18 @@ summary(model)
## Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
## F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16
What do you know. That was it. The difference in \(R^2\) is huge!
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario Kart game was sold as part of a bundle with other games. You can confirm this by inspecting the title field in the original dataset with the following block of code:
data(mariokart)
+
+mariokart %>%
+ filter(total_pr > 100) %>%
+ select(id, total_pr, title)
+## # A tibble: 2 x 3
##             id total_pr title
##          <dbl>    <dbl> <fct>
## 1 110439174663     327. "Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart "
## 2 130335427560     118. "10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, et…
+What do you make of the textbook authors’ decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?