From: aaronshaw <aaron.d.shaw@gmail.com>
Date: Wed, 25 Nov 2020 04:19:47 +0000 (-0600)
Subject: updating explanation of part I to address reasons for dropped observations
X-Git-Url: https://code.communitydata.science/stats_class_2020.git/commitdiff_plain/c36ea661e7d087328543688db2ccdc22e12ededd?ds=sidebyside;hp=37924e44ba85abcf6fa154c46d9d686c926aa4f3

updating explanation of part I to address reasons for dropped observations
---

diff --git a/psets/pset8-worked_solution.html b/psets/pset8-worked_solution.html
index b2f7735..2ea1670 100644
--- a/psets/pset8-worked_solution.html
+++ b/psets/pset8-worked_solution.html
@@ -1723,6 +1723,18 @@ summary(model)</code></pre>
 ## Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
 ## F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16</code></pre>
 <p>What do you know. That was it. The difference in <span class="math inline">\(R^2\)</span> is huge!</p>
+<p>A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario kart game was being sold as part of a bundle along with other games. You can look this up in the <code>title</code> field from the original dataset using the following block of code:</p>
+<pre class="r"><code>data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)</code></pre>
+<pre><code>## # A tibble: 2 x 3
+##             id total_pr title
+##          <dbl>    <dbl> <fct>
+## 1 110439174663     327. "Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart "
+## 2 130335427560     118. "10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, et…</code></pre>
+<p>What do you make of the textbook authors’ decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?</p>
 </div>
 <div id="interpret-some-results" class="section level2">
 <h2>Interpret some results</h2>
diff --git a/psets/pset8-worked_solution.pdf b/psets/pset8-worked_solution.pdf
index eacef57..6a5fba8 100644
Binary files a/psets/pset8-worked_solution.pdf and b/psets/pset8-worked_solution.pdf differ
diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd
index b8210e0..7901f54 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -124,6 +124,20 @@ summary(
 
 What do you know. That was it. The difference in $R^2$ is huge!
 
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario kart game was being sold as part of a bundle along with other games. You can look this up in the `title` field from the original dataset using the following block of code:
+
+```{r}
+data(mariokart)
+
+mariokart %>%
+  filter(total_pr > 100) %>%
+  select(id, total_pr, title)
+
+```
+
+
+What do you make of the textbook authors' decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?
+
 ## Interpret some results
 
 The issues above notwithstanding, we can march ahead and interpret the results of the original model that I fit. Here are some general comments and some specifically focused on the `cond_new` and `stock_photo` variables: