X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/858c9ba53122dcf1459725e3e2ece720df1d5139..1bc769ae3e0682427ea09f21cc263dddd459d009:/os_exercises/ch9_exercises_solutions.html diff --git a/os_exercises/ch9_exercises_solutions.html b/os_exercises/ch9_exercises_solutions.html new file mode 100644 index 0000000..d4b89aa --- /dev/null +++ b/os_exercises/ch9_exercises_solutions.html @@ -0,0 +1,1716 @@ + + + + +
+All exercises taken from the OpenIntro Statistics textbook, \(4^{th}\) edition, Chapter 9.
+\[\widehat{days} = 18.93 - (9.11\times{ethnicity}) + (3.10 \times sex) + (2.15 \times learner~status)\]
+18.93 - (9.11*0) + (3.10 * 1) + (2.15 * 1)
+## [1] 24.18
+The observed outcome for this student (\(y_i\)) was \(2\) days absent. So the residual is \(2-24.18=-22.18\).
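The prediction and residual can also be computed together in one step. This is a minimal sketch that reuses the coefficients from the fitted equation above; the names `coefs`, `student`, `predicted`, `observed`, and `residual` are introduced here for illustration only, and the indicator values mirror the ones plugged in above.

```r
# Coefficients copied from the fitted equation above
coefs <- c(intercept = 18.93, ethnicity = -9.11, sex = 3.10, learner = 2.15)

# Illustrative name: indicator values for this student
# (ethnicity = 0, sex = 1, learner status = 1, as plugged in above)
student <- c(intercept = 1, ethnicity = 0, sex = 1, learner = 1)

predicted <- sum(coefs * student)  # fitted days absent
observed  <- 2                     # days absent given in the exercise
residual  <- observed - predicted  # residual = observed - predicted

round(c(predicted = predicted, residual = residual), 2)
```

Keeping the coefficients in a named vector makes it easy to swap in a different student's indicator values without retyping the equation.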
+\[R^2 = 1-\frac{\sigma^2_e}{\sigma^2_y}= 1-\frac{240.57}{264.17} = 0.0893\] \[R^2_{adj} = 1-\frac{ \frac{\sigma^2_e}{(n-p-1)} }{\frac{\sigma^2_y}{n-1}}= 1-\frac{ \frac{240.57}{146-3-1}} {\frac{264.17}{146-1}} = 0.0701\]
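The two formulas above can be checked directly in R. The sketch below only restates the arithmetic with the values given in the exercise; the variable names (`var_resid`, `var_outcome`, `n`, `p`) are assumptions made for readability.

```r
var_resid   <- 240.57  # variance of the residuals
var_outcome <- 264.17  # variance of the outcome (days absent)
n <- 146               # number of observations
p <- 3                 # number of predictors in the model

r2     <- 1 - var_resid / var_outcome
r2_adj <- 1 - (var_resid / (n - p - 1)) / (var_outcome / (n - 1))

round(c(r2 = r2, r2_adj = r2_adj), 4)
```

The adjusted value is smaller because it penalizes the model for the degrees of freedom used by its three predictors.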
+The damaged O-rings almost all occurred at the lower launch-time temperatures, with the lowest launch temperature accounting for nearly half of the total number of damaged O-rings.
The model suggests that lower launch-time temperatures are associated with a higher probability of O-ring damage. The coefficient of the "Temperature" term is negative, with a standard error that is small relative to the estimate. It is statistically significant with a p-value near 0 (\(H_0:~\beta_{temp} = 0\), \(H_A:~\beta_{temp}\neq 0\)), indicating that the data provide evidence that the coefficient differs from 0. Exponentiating the coefficient (see the R code below) shows that a one-degree Fahrenheit increase in launch-time temperature is associated with odds of O-ring damage about 81% as large, that is, roughly 19% lower. In other words, the model indicates that higher launch temperatures are associated with reduced odds of O-ring damage.
exp(-.2162)
+## [1] 0.8055742
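The exponentiated coefficient can be restated as a percent change in the odds, and the same logic extends to larger temperature differences because log-odds add. The following is a sketch under the assumption that the temperature effect is linear on the log-odds scale (as the fitted model specifies); the variable names are illustrative.

```r
beta_temp <- -0.2162                    # temperature coefficient from the model

odds_ratio    <- exp(beta_temp)         # odds multiplier per one-degree increase
pct_change    <- (odds_ratio - 1) * 100 # same effect as a percent change in odds
odds_ratio_10 <- exp(10 * beta_temp)    # odds multiplier for a ten-degree increase

round(c(odds_ratio = odds_ratio,
        pct_change = pct_change,
        odds_ratio_10 = odds_ratio_10), 3)
```

Note that these are changes in odds, not in probability; the change in probability depends on the starting temperature.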
+I wrote a function, probs, that takes a launch temperature, computes the model's fitted log-odds, and runs them through the inverse logistic function to return a probability (see the textbook for some of the algebraic details here). I can test my function on some of the example model-estimated probabilities provided in the textbook:
+probs <- function(x){
+ odds.hat <- exp(11.663-(0.2162*x) ) # fitted odds of damage at temperature x
+ pred <- odds.hat / (1 + odds.hat) # inverse logit: odds -> probability
+ return(round(pred, 3))
+}
+
+## examples
+probs(57)
+## [1] 0.341
+probs(65)
+## [1] 0.084
+Both of those look good, so now I can plug in the values the problem asks me to solve for:
+vals <- c(51, 53, 55)
+
+probs(vals)
+## [1] 0.654 0.551 0.443
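As a cross-check, base R already provides the inverse logit as `plogis()` in the stats package, so the same probabilities can be computed without writing out the algebra. This is an alternative sketch rather than the approach used above; `probs_plogis` is a name made up for this example.

```r
# Alternative to the hand-written inverse logit: stats::plogis()
probs_plogis <- function(temp) {
  log_odds <- 11.663 - 0.2162 * temp  # fitted log-odds at this temperature
  round(plogis(log_odds), 3)          # plogis(q) = exp(q) / (1 + exp(q))
}

probs_plogis(c(51, 53, 55))
```

Both versions should agree to three decimal places; `plogis()` is also numerically safer for very large or very small log-odds.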
+Note that the question asks for a "smooth curve" fit to the dots. There are many ways to do this in ggplot. I demonstrate one here using geom_smooth() that fits a quadratic function (\(y = \beta_0 + \beta_1 x + \beta_2 x^2\)) to the points. You might experiment with the different options for geom_smooth() or, for a simpler solution, just try geom_line() (with no arguments) instead.
+temp <- seq(51, 71, by=2) # This creates a vector from 51 to 71, counting by twos
+
+preds <- data.frame( # I'll want the data frame for ggplot
+ temp,
+ pred.probability = probs(temp) # store the probabilities as another vector
+)
+
+library(ggplot2)
+
+ggplot(data=preds, aes(x=temp, y=pred.probability)) +
+ geom_point(color="purple") + # Plot the points
+ geom_smooth(color="orange", # Add a smooth line
+ method="glm", # Create a line fit to the data
+ formula = y ~ poly(x, 2), # Using this formula
+ se=FALSE) # Don't show standard errors
+
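Another way to get a smooth curve is to evaluate the fitted logistic function itself on a dense grid of temperatures, rather than fitting a quadratic to the eleven points. The sketch below assumes the same coefficients as the model above; `temp_grid` and `curve_df` are names introduced for illustration, and the plotting step only runs if ggplot2 is installed.

```r
# Evaluate the fitted logistic curve on a fine grid of temperatures
temp_grid <- seq(51, 71, by = 0.1)
curve_df <- data.frame(
  temp = temp_grid,
  pred.probability = plogis(11.663 - 0.2162 * temp_grid)  # inverse logit
)

# Draw the curve as a line (skipped when ggplot2 is not available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(curve_df, ggplot2::aes(x = temp, y = pred.probability)) +
    ggplot2::geom_line(color = "orange")
  print(p)
}
```

Because the grid is dense, a plain line through these points follows the model's own curve instead of a quadratic approximation of it.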
+Furthermore, if the model treats each O-ring as an independent trial, about 50% of the observed failures occurred in a single mission: the mission with the lowest observed launch-time temperature. As a result, this one mission with its one launch temperature could drive the model results disproportionately (it generates observations that exert "high leverage" on the model fit to the data). Without knowing ahead of time that temperature was a likely explanation (as compared against any of the countless other details of that one mission), it's hard to see how NASA analysts should necessarily have drawn this conclusion on the basis of evidence like this.
+