From: aaronshaw
Date: Wed, 25 Nov 2020 04:19:47 +0000 (-0600)
Subject: updating explanation of part I to address reasons for dropped observations
X-Git-Url: https://code.communitydata.science/stats_class_2020.git/commitdiff_plain/c36ea661e7d087328543688db2ccdc22e12ededd?ds=sidebyside

updating explanation of part I to address reasons for dropped observations
---

diff --git a/psets/pset8-worked_solution.html b/psets/pset8-worked_solution.html
index b2f7735..2ea1670 100644
--- a/psets/pset8-worked_solution.html
+++ b/psets/pset8-worked_solution.html
@@ -1723,6 +1723,18 @@ summary(model)
 ## Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
 ## F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16

What do you know. That was it. The difference in \(R^2\) is huge!

+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario Kart game was being sold as part of a bundle along with other games. You can look this up in the title field from the original dataset using the following block of code:

+data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)
+
+## # A tibble: 2 x 3
+##             id total_pr title                                                   
+##          <dbl>    <dbl> <fct>                                                   
+## 1 110439174663     327. "Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart "  
+## 2 130335427560     118. "10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, et…
+What do you make of the textbook authors’ decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?
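
One way to weigh that decision is to fit the same model with and without the two bundle auctions and compare the results. The block below is only a sketch: it assumes the `mariokart` data from the `openintro` package and an illustrative formula using `cond`, `stock_photo`, `duration`, and `wheels`, which may differ from the model fit earlier in this solution (for example, if that model uses a recoded `cond_new` variable). The names `f`, `fit_all`, and `fit_trimmed` are hypothetical.

```r
library(openintro)
library(dplyr)

data(mariokart)

# Assumed formula for illustration; the model fit earlier may be specified differently
f <- total_pr ~ cond + stock_photo + duration + wheels

# Fit once on every auction and once without the auctions priced above $100
fit_all <- lm(f, data = mariokart)
fit_trimmed <- lm(f, data = filter(mariokart, total_pr <= 100))

# Compare how much variance each fit explains
c(all = summary(fit_all)$r.squared,
  trimmed = summary(fit_trimmed)$r.squared)

# Compare the coefficients side by side to see which estimates the outliers move
cbind(all = coef(fit_all), trimmed = coef(fit_trimmed))
```

Looking at both fits makes it easier to argue either side: if the coefficients barely change, dropping the bundles matters little; if they shift a lot, the decision deserves an explicit justification in the write-up.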

Interpret some results

diff --git a/psets/pset8-worked_solution.pdf b/psets/pset8-worked_solution.pdf
index eacef57..6a5fba8 100644
Binary files a/psets/pset8-worked_solution.pdf and b/psets/pset8-worked_solution.pdf differ
diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd
index b8210e0..7901f54 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -124,6 +124,20 @@ summary(
 
 What do you know. That was it. The difference in $R^2$ is huge!
 
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario Kart game was being sold as part of a bundle along with other games. You can look this up in the `title` field from the original dataset using the following block of code:
+
+```{r}
+data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)
+
+```
+
+
+What do you make of the textbook authors' decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?
+
 ## Interpret some results
 
 The issues above notwithstanding, we can march ahead and interpret the results of the original model that I fit. Here are some general comments and some specifically focused on the `cond_new` and `stock_photo` variables: