X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/37924e44ba85abcf6fa154c46d9d686c926aa4f3..HEAD:/psets/pset8-worked_solution.rmd

diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd
index b8210e0..7901f54 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -124,6 +124,20 @@ summary(
 
 What do you know. That was it. The difference in $R^2$ is huge!
 
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario kart game was being sold as part of a bundle along with other games. You can look this up in the `title` field from the original dataset using the following block of code:
+
+```{r}
+data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)
+
+```
+
+
+What do you make of the textbook authors' decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?
+
 ## Interpret some results
 
 The issues above notwithstanding, we can march ahead and interpret the results of the original model that I fit. Here are some general comments and some specifically focused on the `cond_new` and `stock_photo` variables: