All exercises taken from the OpenIntro Statistics textbook, \(4^{th}\) edition, Chapter 9.
\[\widehat{days} = 18.93 - (9.11\times{ethnicity}) + (3.10 \times sex) + (2.15 \times learner~status)\]
18.93 - (9.11*0) + (3.10 * 1) + (2.15 * 1)
## [1] 24.18
The observed outcome for this student (\(y_i\)) was \(2\) days absent. So the residual is \(2-24.18=-22.18\).
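The same residual arithmetic, checked in R:

2 - 24.18 # observed minus predicted

## [1] -22.18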
\[R^2 = 1-\frac{\sigma^2_e}{\sigma^2_y}= 1-\frac{240.57}{264.17} = 0.0893\] \[R^2_{adj} = 1-\frac{ \frac{\sigma^2_e}{(n-p-1)} }{\frac{\sigma^2_y}{n-1}}= 1-\frac{ \frac{240.57}{146-3-1}} {\frac{264.17}{146-1}} = 0.0701\]
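These calculations can be double-checked in R, plugging in the residual variance, outcome variance, and sample size reported in the exercise:

var_e <- 240.57 # variance of the residuals
var_y <- 264.17 # variance of the outcome (days absent)
n <- 146        # observations
p <- 3          # predictors
round(1 - var_e / var_y, 4) # R-squared

## [1] 0.0893

round(1 - (var_e / (n - p - 1)) / (var_y / (n - 1)), 4) # adjusted R-squared

## [1] 0.0701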
The damaged O-rings almost all occurred at the lower launch-time temperatures, with the lowest launch temperature accounting for nearly half of the total number of damaged O-rings.
The model suggests that lower launch-time temperatures are associated with a higher probability of O-ring damage. The coefficient on the “Temperature” term is negative with a (proportionally speaking) very small standard error. It is statistically significant with a p-value near 0 (\(H_0:~\beta_{temp} = 0\), \(H_A:~\beta_{temp}\neq 0\)), indicating that the data provide evidence that the coefficient differs from 0. By exponentiating the coefficient (see the R code below), we see that a one-degree Fahrenheit increase in launch-time temperature is associated with odds of O-ring damage that are about 81% as large. In other words, the model indicates that higher launch temperatures are associated with reduced odds of O-ring damage.
exp(-.2162)
## [1] 0.8055742
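For context, coefficients like these come from a logistic regression fit with glm(). A minimal sketch of such a fit, assuming a hypothetical data frame orings with one row per mission and columns temperature, damaged, and undamaged (counts of damaged and undamaged O-rings); these names are assumptions, not objects supplied by the textbook:

# Sketch only: `orings` and its columns are hypothetical names
fit <- glm(cbind(damaged, undamaged) ~ temperature,
           family = binomial, data = orings)
summary(fit)                  # coefficient estimates and standard errors
exp(coef(fit)["temperature"]) # odds ratio per one-degree increase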
I write a function, probs, that takes a launch temperature, computes the model’s fitted value (the log-odds), and runs it through the inverse logistic function to return a probability (see the textbook for some of the algebraic details here). I can test my function on some of the example model-estimated probabilities provided in the textbook:

probs <- function(x){
  p.hat <- exp(11.663 - (0.2162 * x)) # exponentiated log-odds, i.e. the odds
  pred <- p.hat / (1 + p.hat)         # inverse logit: odds / (1 + odds)
  return(round(pred, 3))
}
## examples
probs(57)
## [1] 0.341
probs(65)
## [1] 0.084
Both of those look good, so now I can plug in the values the problem asks me to solve for:
vals <- c(51, 53, 55)
probs(vals)
## [1] 0.654 0.551 0.443
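As an aside, base R’s plogis() is the inverse-logit function, so the same probabilities can be computed directly from the linear predictor:

round(plogis(11.663 - 0.2162 * vals), 3) # same inverse-logit transform

## [1] 0.654 0.551 0.443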
Note that the question asks for a “smooth curve” fit to the dots. There are many ways to do this in ggplot. I demonstrate one here using geom_smooth() to fit a quadratic function (\(y = \beta_0 + \beta_1 x + \beta_2 x^2\)) to the points. You might experiment with the different options for geom_smooth() or, for a simpler solution, just try geom_line() (with no arguments) instead.
temp = seq(51, 71, by=2) # This creates a vector from 51 to 71, counting by twos
preds <- data.frame( # I'll want the data frame for ggplot
temp,
pred.probability = probs(temp) # store the probabilities as another vector
)
library(ggplot2)
ggplot(data=preds, aes(x=temp, y=pred.probability)) +
geom_point(color="purple") + # Plot the points
geom_smooth(color="orange", # Add a smooth line
method="glm", # Create a line fit to the data
formula = y ~ poly(x, 2), # Using this formula
se=FALSE) # Don't show standard errors
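The simpler geom_line() alternative mentioned above, reusing the same preds data frame, would look like this:

ggplot(data=preds, aes(x=temp, y=pred.probability)) +
  geom_point(color="purple") + # plot the points
  geom_line(color="orange")    # connect them with straight segments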
Furthermore, while the model treats each O-ring as an independent trial, about 50% of the observed failures occurred in a single mission, the one with the lowest observed launch-time temperature. The result is that this one mission, with its one launch temperature, could drive the model results disproportionately (it generates observations that exert “high leverage” on the model fit to the data). Without knowing ahead of time that temperature was a likely explanation (as opposed to any of the countless other details of that one mission), it’s hard to see how NASA analysts necessarily should have drawn this conclusion from evidence like this.