updating explanation of part I to address reasons for dropped observations

author aaronshaw <aaron.d.shaw@gmail.com>

Wed, 25 Nov 2020 04:19:47 +0000 (22:19 -0600)

committer aaronshaw <aaron.d.shaw@gmail.com>

Wed, 25 Nov 2020 04:19:47 +0000 (22:19 -0600)
diff --git a/psets/pset8-worked_solution.html b/psets/pset8-worked_solution.html
index b2f7735689e2574d856c7096ddcbe447ece0bb04..2ea1670ae836933cccec6f4546d77cfc05b2b884 100644
--- a/psets/pset8-worked_solution.html
+++ b/psets/pset8-worked_solution.html
@@ -1723,6 +1723,18 @@ summary(model)</code></pre>
  ## Multiple R-squared:  0.719,  Adjusted R-squared:  0.7108 
  ## F-statistic: 87.01 on 4 and 136 DF,  p-value: &lt; 2.2e-16</code></pre>
  <p>What do you know. That was it. The difference in <span class="math inline">\(R^2\)</span> is huge!</p>
+<p>A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario kart game was being sold as part of a bundle along with other games. You can look this up in the <code>title</code> field from the original dataset using the following block of code:</p>
+<pre class="r"><code>data(mariokart)
+
+mariokart %&gt;%
+  filter(total_pr &gt; 100) %&gt;%
+  select(id, total_pr, title)</code></pre>
+<pre><code>## # A tibble: 2 x 3
+##             id total_pr title                                                   
+##          &lt;dbl&gt;    &lt;dbl&gt; &lt;fct&gt;                                                   
+## 1 110439174663     327. &quot;Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart &quot;  
+## 2 130335427560     118. &quot;10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, et…</code></pre>
+<p>What do you make of the textbook authors’ decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?</p>
  </div>
  <div id="interpret-some-results" class="section level2">
  <h2>Interpret some results</h2>
diff --git a/psets/pset8-worked_solution.pdf b/psets/pset8-worked_solution.pdf
index eacef5755c58ac695a0625eba43edf62618216e2..6a5fba833c12ab2f3e4e578c5432ba6e96e76c52 100644
Binary files a/psets/pset8-worked_solution.pdf and b/psets/pset8-worked_solution.pdf differ
diff --git a/psets/pset8-worked_solution.rmd b/psets/pset8-worked_solution.rmd
index b8210e0fd0bb9bf6cd9fa50bb4c903b1b800f22d..7901f54dffd2cc3ad6e6791667629be4c984bc59 100644
--- a/psets/pset8-worked_solution.rmd
+++ b/psets/pset8-worked_solution.rmd
@@ -124,6 +124,20 @@ summary(
  
  What do you know. That was it. The difference in $R^2$ is huge! 
  
+A little further digging (by Nick Vincent) revealed that these two outliers come from auctions where the Mario kart game was being sold as part of a bundle along with other games. You can look this up in the `title` field from the original dataset using the following block of code:
+
+```{r}
+data(mariokart)
+
+mariokart %>% 
+  filter(total_pr > 100) %>% 
+  select(id, total_pr, title)
+
+```
+
+
+What do you make of the textbook authors' decision to drop the observations? Can you make a case for/against doing so? What seems like the right decision and the best way to handle this kind of situation?
+
  ## Interpret some results  
  
  The issues above notwithstanding, we can march ahead and interpret the results of the original model that I fit. Here are some general comments and some specifically focused on the `cond_new` and `stock_photo` variables: