assessment/interactive_assessment.rmd

   1 ---
   2 title: "Interactive Self-Assessment"
   3 subtitle: "Fall 2020 MTS 525 / COMMST 395 Statistics and Statistical Programming"
   4 output: learnr::tutorial
   5 runtime: shiny_prerendered
   6 ---
   7
   8
   9 ```{r setup, include=FALSE}
  10 library(learnr)
  11 library(tidyverse)
  12
  13 knitr::opts_chunk$set(echo = FALSE, tidy=TRUE)
  14
  15 t <- Sys.time()
  16 question_filename <- paste("question_submission_", t, ".csv", sep="")
  17 code_filename <- paste("code_", t, ".csv", sep="")
  18
  19 #df <- data.frame(label=c('test'), question=c('asd'), answer=c('asd'), correct=c(TRUE), stringsAsFactors=FALSE)
  20 df <- data.frame()
  21
  22 tutorial_event_recorder <- function(tutorial_id, tutorial_version, user_id,
  23                                     event, data) {
  24   # quiz question answered
  25   if (event == "question_submission"){
  26     # nick exasperatedly believes this is the correct way to index the result of strsplit... [[1]][[1]]
  27     data$category <- strsplit(data$label, '_')[[1]][[1]]
  28
  29     df <<- rbind(df, data, stringsAsFactors=FALSE)
  30     #write.table(data, question_filename, append=TRUE, sep=",", row.names=TRUE, col.names=FALSE)
  31     write.table(df, question_filename, append=FALSE, sep=",", row.names=TRUE, col.names=TRUE)
  32
  33   }
  34   # code
  35   if (event == "exercise_submitted"){
  36     write.table(data, code_filename, append=TRUE, sep=",", row.names=TRUE, col.names=FALSE)
  37   }
  38
  39 }
  40 options(tutorial.event_recorder = tutorial_event_recorder)
  41 ```
  42
  43
  44
  45 ## Overview
  46
  47 This document contains R Markdown code for an *Interactive Self Assessment*. Through this assessment, both students and the teaching team can check in on learning progress.
  48
  49 The Self Assessment is broken into seven sections, described below. You can navigate throughout the document using the left-hand column. In general, completing this assessment should take about 60 minutes.
  50
  51 * Overview: you are here.
  52 * Section 1, Warmup Exercises. Contains warm-ups to help you become familiar with the interactive `learnr` environment (learnr is the R package that this assessment relies on). 1 coding question, 2 multiple choice questions. Time estimate: 5 min.
  53 * Section 2, Debugging and Reading R Code. Contains a series of questions that will require you to work with existing R code. 3 coding questions, 4 multiple choice questions. Time estimate: 15 minutes.
  54 * Section 3, Statistics Concepts and Definitions. 12 multiple choice questions about statistics concepts and definitions. Time estimate: 15 minutes.
  55 * Section 4, Distributions. 3 multiple choice questions that involve some minor calculations. Time estimate: 5 minutes.
  56 * Section 5, Computing Probabilities. 6 multiple choice questions that involve calculating probabilities of events. These calculations are more involved than those in Section 4. Time estimate: 15 minutes.
  57 * Section 6, Helpful Formulas. Contains some helpful formulas that may be useful for Sections 3-5.
  58 * Section 7, Answer Report. Time estimate: 5 minutes. Here, you can use R code (some of which is prepopulated for you) to analyze (or visualize) your performance on the assessment. This provides a way for you to (1) practice exploratory analysis in R with data you created yourself (by answering questions) and (2) get immediate feedback about your performance.
  59
  60 Note that you can clear your answers to **all** questions by clicking "Start Over" in the left-hand sidebar, but doing so erases all progress in the document. *Use with caution* (if at all)!
  61
  62 ## Section 1, Warm-up Exercises
  63
  64 This section contains quick warm-up questions, so you can become familiar with how `learnr` works and what to expect from this activity.
  65
  66 ### Code Chunk Warm-up
  67
  68 To get familiar with how code chunks work in `learnr`, let's write R code required to add two numbers: 1234 and 5678 (and the answer is 6912).
  69
  70 The code chunk below is editable and is "pre-populated" with an unfinished function definition. The goal is to add arguments and fill in the body of the function. When finished, you can run the code chunk and it should produce the answer.
  71
  72 If you click "Run Code", you should see the answer below the chunk. That answer will persist as you navigate around this doc.
  73
  74 You can clear your answers by clicking "Start Over" in the top-left of the chunk. You can also clear **all** your answers by clicking "Start Over" in the left-hand sidebar, but doing that erases all progress in the document. *Use with caution!*
  75
  76 ```{r WarmUp_1, exercise=TRUE, exercise.lines=10}
  77 add <- function() {
  78
  79 }
  80 x = 1234
  81 y = 5678
  82 add(x,y)
  83 ```
  84
  85 ```{r WarmUp_1-solution}
  86 add <- function(value1, value2) {
  87   return(value1 + value2)
  88 }
  89
  90 x = 1234
  91 y = 5678
  92
  93 add(x,y)
  94 ```
  95
  96 ### Multiple Choice Question Warmup
  97 The question below shows how the multiple choice answering and feedback works.
  98 ```{r WarmUp_2}
  99 quiz(
 100   question("Select the answer choice that will return `TRUE` in R.",
 101     answer("1 == 1", message="Good work! Feedback appears here.", correct=TRUE),
 102     answer("1 == 0", message="Not quite! Feedback appears here."),
 103     allow_retry = TRUE
 104   )
 105 )
 106 ```
 107
 108
 109 ## Useful Formulas
 110 Sample Mean (sample statistic):
 111 $\bar{x}=\frac{\sum_{i=1}^n x_i}{n}$
 112
 113 Standard deviation:
 114 $s=\sqrt{\frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}}$
 115
 116 Variance:
 117 $var = s^2$
 118
 119 Useful probability axioms:
 120
 121 Complement:
 122 $\mbox{Pr}(A^c)=1-\mbox{Pr}(A)$
 123
 124 Probability of two *independent* events both happening:
 125 Pr(A and B) = Pr(A) $\times$ Pr(B)
 126
 127 Probability of one of two events happening:
 128 Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
 129
 130 Conditional probability:
 131 $\mbox{Pr}(A|B)=\frac{\mbox{Pr(A and B)}}{\mbox{Pr(B)}}$
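
As a quick numeric check of the rules above, here is a minimal R sketch. The probabilities `p_a` and `p_b` are made-up values for two hypothetical independent events; they are not taken from any question in this assessment.

```r
# Hypothetical probabilities for two independent events A and B (illustrative only)
p_a <- 0.3
p_b <- 0.5

# Complement: Pr(A^c) = 1 - Pr(A)
p_not_a <- 1 - p_a

# Independent events: Pr(A and B) = Pr(A) * Pr(B)
p_a_and_b <- p_a * p_b

# Either event: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
p_a_or_b <- p_a + p_b - p_a_and_b

# Conditional probability: Pr(A | B) = Pr(A and B) / Pr(B)
p_a_given_b <- p_a_and_b / p_b

c(not_a = p_not_a, a_and_b = p_a_and_b, a_or_b = p_a_or_b, a_given_b = p_a_given_b)
```

Because A and B are assumed independent in this sketch, `p_a_given_b` comes out equal to `p_a`. When two events are not independent, the two values differ.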
 132
 133 Population mean (population statistic):
 134 $\mu = \sum_{i=1}^{n}x_i\mbox{Pr}(x_i)$
 135
 136 Z-score:
 137 $z=\frac{x-\mu}{\sigma}$
 138
 139 Standard errors:
 140
 141 $SE=\frac{\sigma}{\sqrt{n}}$
 142
 143 $SE_{proportion}=\sqrt{\frac{p(1-p)}{n}}$
 144
 145 Identifying outliers using the Interquartile Range (IQR):
 146 $Q_1 - 1.5 \times IQR, \quad Q_3 + 1.5 \times IQR$
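
To see how the descriptive formulas above map onto R, here is a small sketch. The vector `x` and the proportion `p` are hypothetical values chosen for illustration. The sample mean and standard deviation stand in for $\mu$ and $\sigma$, which is an assumption you would make only when the population values are unknown.

```r
# Hypothetical sample (illustrative values only)
x <- c(2, 4, 4, 5, 7, 9, 25)
n <- length(x)

x_bar <- sum(x) / n                          # sample mean, same as mean(x)
s     <- sqrt(sum((x - x_bar)^2) / (n - 1))  # sample standard deviation, same as sd(x)
s2    <- s^2                                 # variance, same as var(x)

# Z-scores, using the sample mean and standard deviation in place of mu and sigma
z <- (x - x_bar) / s

# Standard error of the mean, estimated with s in place of sigma
se <- s / sqrt(n)

# Standard error of a proportion, for a hypothetical p and the same n
p <- 0.4
se_prop <- sqrt(p * (1 - p) / n)

# IQR fences; quantile() uses R's default quartile method here
q     <- unname(quantile(x, c(0.25, 0.75)))
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as outliers
x[x < lower | x > upper]
```

Other quartile conventions (for example, those used in some textbooks) can give slightly different fences on small samples, so hand calculations may not match `quantile()` exactly.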
 147
 148
 149 ## Section 2: Writing and Debugging R Code
 150
 151 ### Debugging a Function
 152 Below, you'll see code to define a function that is *supposed* to perform a transformation on a vector. The problem is that it doesn't work right now.
 153
 154 In theory, the function will take a numeric vector as input (let's call it $x$) and scale the values so they lie between zero and one. (This is sometimes called min-max [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling), and is sometimes used for machine learning.)
 155
 156 The way it *should* do this is by first subtracting the minimum value of $x$ from each element of $x$. Then, the function will divide each element by the difference between the maximum value of $x$ and the minimum value of $x$.
 157
 158 As written now, however, the function does not work! There are at least three issues you will need to fix to get it working. Once you fix them, you should be able to confirm that your function works with the pre-populated example (with the correct output provided). You might also be able to make this code more "elegant" (or alternatively, improve the comments and variable names as you see fit).
 159
 160 Bonus: how might we update this function to scale between any "floor" and "ceiling" value?
 161
 162 ```{r R_debug1, exercise=TRUE}
 163 zeroToOneRescaler <- function() {
 164   # the minimum value
 165   minval <- min(x)
 166   # let's "shift" our vector by subtracting the minimum value of x from each element
 167   shifted <- x - minval
 168
 169   # let's find the difference between max val and min val
 170   difference <- min(x) - max(x)
 171
 172   scaled <- shifted / difference
 173   scaled
 174 }
 175
 176 test_vector = c(1,2,3,4,5)
 177 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
 178 zeroToOneRescaler(test_vector)
 179 ```
 180
 181 ```{r R_debug1-solution}
 182 zeroToOneRescaler <- function(x) {
 183   shifted <- x - min(x)
 184   difference = max(x) - min(x)
 185   return(shifted / difference)
 186 }
 187
 188 test_vector = c(1,2,3,4,5)
 189 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
 190 zeroToOneRescaler(test_vector)
 191 ```
 192
 193 ```{r R_debug1-response}
 194 quiz(
 195   question("Were you able to solve the debugging question? (this question is for feedback purposes)",
 196     answer("Yes", message="Nice work!", correct = TRUE),
 197     answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team.")
 198   )
 199 )
 200 ```
 201
 202
 203 The following commented chunk has at least five (annoying) bugs. Can you uncomment the code, fix all the bugs, and get this chunk to run? These are drawn from real experiences from your TA!
 204 ```{r R_debug2, exercise=TRUE}
 205 # ps2 <- readcsv(file = url(
 206 #   " https://communitydata.science/~ads/teaching/2020/stats/data/week_04/group_03.csv"), row.names = NULL
 207 # )
 208 #
 209 # ps2$y[is.na(ps2$y)] <- 0
 210 # "ps2$'My First New Column' <- ps2$y * -1"
 211 # ps2$'My Second New Column" <- ps2$y + ps2$'My First New Column'
 212 #
 213 # summary(ps2$'My Second New Column']
 214 ```
 215
 216 ```{r R_debug2-solution}
 217 ps2 <- read.csv(file = url("https://communitydata.science/~ads/teaching/2020/stats/data/week_04/group_03.csv"), row.names = NULL)
 218 ps2$y[is.na(ps2$y)] <- 0
 219 ps2$'My First New Column' <- ps2$y * -1
 220 ps2$'My Second New Column' <- ps2$y + ps2$'My First New Column'
 221 summary(ps2$'My Second New Column')
 222 ```
 223
 224 ```{r R_debug2-response}
 225 quiz(
 226   question("Were you able to solve the above debugging question? (this question is for feedback purposes)",
 227            answer("Yes", message="Nice work!", correct = TRUE),
 228            answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
 229            allow_retry = TRUE
 230   )
 231 )
 232 ```
 233
 234 ### Updating a visualization
 235 Imagine you've created a histogram to visualize some data from your research (below, we'll use R's built-in "PlantGrowth" dataset). You show your collaborator a histogram of this data made with base R, and they express some concerns about your plot's aesthetics. Replace the base-R histogram with a `ggplot2` histogram that also includes a density plot overlaid on it (maybe in a bright, contrasting color like red).
 236
 237 ```{r R_ggplot, exercise=TRUE}
 238 data("PlantGrowth")
 239 hist(PlantGrowth$weight)
 240 ```
 241
 242 ```{r R_ggplot-solution}
 243 library(ggplot2)
 244
 245 ggplot(PlantGrowth, aes(weight, after_stat(density))) + geom_histogram() + geom_density(color = "red")
 246 ```
 247
 248 Bonus: How would you find more information about the source of this dataset?
 249
 250 ```{r R_ggplot-response}
 251 quiz(
 252   question("Were you able to solve the above plotting question? (this question is for feedback purposes)",
 253            answer("Yes", message="Nice work!", correct = TRUE),
 254            answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
 255            allow_retry = TRUE
 256   )
 257 )
 258 ```
 259
 260 ### Interpret a dataframe
 261 ```{r R_columns-setup, exercise=TRUE}
 262 data <- mtcars
 263 data$mpgGreaterThan20 <- data$mpg > 20
 264 data$gear <- as.factor(data$gear)
 265 data$mpgRounded <- round(data$mpg)
 266 ```
 267
 268 The below questions relate to the `data` data.frame defined above, which is a modified version of the classic `mtcars`.
 269
 270 For all answers, assume the above code chunk *has completely run*, i.e. assume all modifications described above were made.
 271 ```{r R_columns}
 272 quiz(
 273   question("Which of the following best describes the `mpg` variable?",
 274     answer("Numeric, continuous", correct=TRUE),
 275     answer("Numeric, discrete"),
 276     answer("Categorical, dichotomous"),
 277     answer("Categorical, ordinal"),
 278     answer("Categorical")
 279   ),
 280   question("Which of the following best describes the `mpgGreaterThan20` variable?",
 281     answer("Numeric, continuous"),
 282     answer("Numeric, discrete"),
 283     answer("Categorical, dichotomous", correct=TRUE),
 284     answer("Categorical, ordinal"),
 285     answer("Categorical")
 286   ),
 287   question("Which of the following best describes the `mpgRounded` variable?",
 288     answer("Numeric, continuous"),
 289     answer("Numeric, discrete", correct=TRUE),
 290     answer("Categorical, dichotomous"),
 291     answer("Categorical, ordinal"),
 292     answer("Categorical")
 293   ),
 294   question("Which of the following best describes the `gear` variable?",
 295     answer("Numeric, continuous"),
 296     answer("Numeric, discrete"),
 297     answer("Categorical, dichotomous"),
 298     answer("Categorical, ordinal", correct=TRUE),
 299     answer("Categorical")
 300   )
 301 )
 302 ```
 303
 304 ## Section 3, Statistics Concepts and Definitions
 305 The following is a series of short multiple choice questions. These questions focus on definitions, and should not require performing any computations or writing any code.
 306 ```{r StatsConcepts_lightninground}
 307 m1 <- ""
 308 wolf <- "Think of the 'Boy who cried wolf', with a null hypothesis that no wolf exists. First the boy claims the alternative hypothesis: there is a wolf. The villagers believe this, and reject the correct null hypothesis. Second, the villagers make an error by not believing the boy when he presents a correct alternative hypothesis."
 309
 310 quiz(
 311   question("A hypothesis is typically written in terms of a:",
 312     answer("p-value."),
 313     answer("population statistic.", correct = TRUE),
 314     answer("sample statistic.")
 315   ),
 316   question("A sampling distribution is:",
 317     answer("critical to report in your papers."),
 318     answer("theoretically helpful, but rarely available to researchers in practice.", correct = TRUE),
 319     answer("practically useful, but relies on assumptions that are rarely met.")
 320   ),
 321   question("Z-scores tell us about a value in terms of:",
 322     answer("mean and standard deviation.", correct = TRUE),
 323     answer("sample size and sampling strategy."),
 324     answer("if an effect is causal or not.")
 325   ),
 326   question("A distribution that is right-skewed has a long tail to the:",
 327     answer("right.", correct = TRUE),
 328     answer("left.")
 329   ),
 330   question("A normal distribution can be characterized with only this many parameters:",
 331     answer("1."),
 332     answer("2.", correct = TRUE),
 333     answer("3.")
 334   ),
 335   question("When we calculate a standard error, we look to understand",
 336     answer("the spread of our observed data based on the spread of the population distribution."),
 337     answer("the spread of the population distribution based on the spread of our observed data.", correct = TRUE),
 338     answer("whether or not our result is causal.")
 339   ),
 340 #  question("When we calculate standard error, we calculate",
 341 #    answer("using a different formula for every type of variable."),
 342 #    answer("the sample standard error, which is an estimate of the population standard error.", correct = TRUE),
 343 #    answer("whether or not our result is causal.")
 344 #  ),
 345   question("P values tell us about",
 346     answer("the probability of observing the outcome."),
 347     answer("the world in which the null hypothesis is true.", correct = TRUE),
 348     answer("the world in which the null hypothesis is false."),
 349     answer("the probability that a difference is due to chance.")
 350 #    answer("the world in which our data describe a causal effect.")
 351   ),
 352   question("P values are",
 353     answer("a conditional probability.", correct = TRUE),
 354     answer("completely misleading."),
 355     answer("an indication of the strength of an association"),
 356     answer("most useful when our data has a normal distribution.")
 357   ),
 358   question("A type 1 error occurs when",
 359     answer("we reject a correct null hypothesis (i.e. false positive).", correct = TRUE, message=wolf),
 360     answer("we accept a correct null hypothesis.", message=wolf),
 361     answer("we accept an incorrect null hypothesis (i.e. false negative).", message=wolf)
 362   ),
 363   question("Before we assume independence of two random samples, it can be useful to check whether",
 364     answer("they are correlated."),
 365     answer("both samples include over 90% of the population."),
 366     answer("both samples include less than 10% of the population.", correct = TRUE)
 367   )
 368 )
 369 ```
 370
 371 ```{r StatsConcepts_sampling}
 372 quiz(
 373   question("A political scientist is interested in the effect of teaching style on standardized test performance.
 374 She plans to use a sample of 30 classes evenly spread among Communication, Computer Science, and Business to conduct her analysis. What type of sampling strategy should she use to ensure that
 375 classes are selected from each discipline equally? Assume a limited research budget.",
 376     answer("A simple random sample"),
 377     answer("A cluster random sample"),
 378     answer("A stratified random sample", correct=TRUE),
 379     answer("A snowball sample")
 380   )
 381 )
 382 ```
 383
 384 ## Section 4: Distributions
 385 The following questions are in the style of pen-and-paper statistics class exam questions. This section includes three questions about distributions. These questions involve some minor calculations.
 386
 387 ### Percentiles and the Normal Distribution
 388 For the following question, you may want to use this "scratch paper" code chunk.
 389 ```{r Distributions_quartile-scratch, exercise=TRUE}
 390
 391 ```
 392
 393 ```{r Distributions_quartile}
 394 quiz(
 395   question("Heights of boys in a high school are approximately normally distributed with a mean of 175 cm
 396 and a standard deviation of 5 cm. What value most likely corresponds to the first quartile of heights?",
 397     answer("25 cm"),
 398     answer("167.3 cm"),
 399     answer("171.7 cm", correct=TRUE),
 400     answer("173.5 cm"),
 401     answer("178.3 cm")
 402   )
 403 )
 404 ```
 405
 406
 407 ### Outliers and Skew
 408 Suppose we are reading a paper which reports the following about a column of a dataset:
 409
 410 Minimum value is 0.00125 and Maximum Value is 2.1100.
 411
 412 Mean is 0.41100 and median is 0.27800.
 413
 414 1st quartile is 0.13000 and 3rd quartile is 0.56200.
 415
 416 ```{r Distributions_summary-scratch, exercise=TRUE}
 417
 418 ```
 419
 420 ```{r Distributions_summary}
 421 m1 <- "Under R's default setting, outliers are values that are either greater than the upper bound $Q_3 + 1.5\\times IQR$ OR less than the lower bound $Q_1 - 1.5\\times IQR$. Here, $IQR = 0.562-0.130=0.432$. The upper bound $= 0.562 + 1.5\\times (0.432) = 1.21$. The lower bound is $0.13 - 1.5\\times (0.432) = -0.518$. We see that the maximum value is 2.11, greater than the upper bound. Thus, there is at least one outlier in this sample."
 422
 423 m2 <- "There is at least one outlier on the right, whereas there is none on the left. $|Q_3-Q_2| > |Q_2-Q_1|$, so the whisker for this box plot would be longer on the right-hand side. The mean is larger than the median."
 424 quiz(
 425   question("Are there outliers (in terms of IQR) in this sample?",
 426     answer("Yes", correct = TRUE, message=m1),
 427     answer("No", message=m1)
 428   ),
 429   question("Based on these summary statistics, we might expect the skew of the distribution to be:",
 430     answer("left-skewed", message=m2),
 431     answer("right-skewed", message=m2, correct=TRUE),
 432     answer("symmetric", message=m2)
 433   )
 434 )
 435 ```
 436
 437
 438 ## Section 5, Computing Probabilities
 439 For each of the below questions, you will need to calculate some probabilities by hand.
 440 You may want to use this "scratch paper" code chunk (possibly in conjunction with actual paper).
 441
 442 ```{r Probabilities-scratch, exercise=TRUE}
 443
 444 ```
 445
 446 ```{r Probabilities_probs}
 447 m1 <- "$P(\\text{Coffee} \\cap \\text{Milk}) = P(\\text{Coffee})\\cdot P(\\text{Milk}) = 0.5 \\cdot (1-0.1)  = 0.45$"
 448
 449 m2 <- "Let H be the event of hypertension and M be the event of being male. We see here that $P(H) = 0.15$ whereas $P(H|M) = 0.18$. Since $P(H) \\neq P(H|M)$, hypertension is not independent of sex."
 450
 451 m3 <- "$P(HIV \\cap HCV) = P(HIV|HCV)\\cdot P(HCV) = 0.1\\cdot 0.02 = 0.002$"
 452
 453 quiz(
 454   question("Suppose in a population, half prefer coffee to tea, and assume that 10 percent of the population prefers no milk in their coffee or tea. If coffee vs. tea preference and milk use are independent, what fraction of the population both prefers coffee and puts milk in their coffee?",
 455     answer("40%", message=m1),
 456     answer("45%", correct = TRUE, message=m1),
 457     answer("50%", message=m1),
 458     answer("55%", message=m1)
 459   ),
 460   question("In the general population, about 15 percent of adults between 25 and 40 years of age are hypertensive.  Suppose that among males of this age, hypertension occurs about 18 percent of the time. Is hypertension independent of sex? ",
 461     answer("Yes", message=m2),
 462     answer("No.", correct=TRUE, message=m2)
 463   ),
 464   question("Co-infection with HIV and hepatitis C (HCV) occurs when a patient has both diseases, and is on the rise in some countries. Assume that in a given country, only about 2% of the population has HCV,  but 25% of the population with HIV have HCV.  Assume as well that 10% of the population with HCV have HIV.  What is the probability that a randomly chosen member of the population has both HIV and HCV?",
 465     answer("0.001", message=m3),
 466     answer("0.01", message=m3),
 467     answer("0.002", correct=TRUE, message=m3),
 468     answer("0.02", message=m3)
 469   )
 470   #question("What might you search for (in Google, your notes, the OpenIntro PDF, etc.) to help with this question?",
 471   #  answer("t test"),
 472   #  answer("laws of probability", correct=TRUE),
 473   #  answer("linear regression"),
 474   #  answer("R debugging")
 475   #)
 476 )
 477 ```
 478
 479 ### Biostats Example
 480 This question is adapted from a biostats midterm exam.
 481 In the past (2015, to be specific), the US Preventive Services
 482 Task Force recommended that women under the age of 50 should
 483 not get routine mammogram screening for breast cancer.  The Task Force
 484 argued that for a woman with a positive mammogram (one suggesting the
 485 presence of breast cancer), the chance that she has breast cancer was
 486 too low to justify a surgical biopsy.
 487
 488 Suppose the data below describe a cohort of 100,000 women ages 40 to
 489 49 in whom mammogram screening and breast cancer behave just like the
 490 larger population.  For instance, in this table, the 3,333 women with
 491 breast cancer represent a rate of 1 in 30 women with undiagnosed
 492 cancer. The numbers in the table are realistic for US women in this
 493 age category.
 494
 495 |                    | Positive test result | Negative test result |
 496 |--------------------|---------------------:|---------------------:|
 497 | Have breast cancer |                3,296 |                   37 |
 498 | Do not have breast cancer |         8,313 |               88,354 |
 499
 500
 501 First, compute the "margins" of the above contingency table.
 502 * Row margins: How many total women have breast cancer? How many total women do not have breast cancer?
 503 * Column margins: How many total positive tests? How many total negative tests?
 504 ```{r Probabilities_mammogram-chunk, exercise=TRUE}
 505
 506 ```
 507
 508 ```{r Probabilities_mammogram}
 509 m1 <- "
 510 $\\Pr(\\textrm{Test}^+ \\cap \\textrm{Cancer}) = 3,296/100,000$
 511
 512 $\\Pr(\\textrm{Cancer}) = 3,333/100,000$
 513
 514 $\\Pr(\\textrm{Test}^+|\\textrm{Cancer}) =$ \
 515 $\\dfrac{\\Pr(\\textrm{Test}^+ \\cap \\textrm{Cancer})}{\\Pr(\\textrm{Cancer})} =$\
 516 $\\dfrac{3,296}{3,333} = 0.989$"
 517
 518 m2 <- "
 519 $\\Pr(\\textrm{Cancer}|\\textrm{Test}^+) =$
 520
 521 $\\dfrac{\\Pr(\\textrm{Cancer} \\cap \\textrm{Test}^+)}
 522      {\\Pr(\\textrm{Test}^+)}=$
 523
 524
 525  $\\dfrac{3,296}{11,609} = 0.284$"
 526
 527 quiz(
 528   question("Based on this data, what is the probability that a woman has a positive test given that she has cancer?",
 529     answer("98.9%", correct = TRUE, message=m1),
 530     answer("99.9%",message=m1),
 531     answer("89.9%",message=m1),
 532     answer("88.9%",message=m1)
 533   ),
 534   question("Based on this data, what is the probability that a woman who receives a positive test has cancer?",
 535     answer("28.4%", correct = TRUE,message=m2),
 536     answer("10.3%",message=m2),
 537     answer("50.7%",message=m2),
 538     answer("97.9%",message=m2)
 539   ),
 540   question("Is the Task Force correct to claim that there is a low probability that a woman aged 40-49 who tests positive has breast cancer?",
 541     answer("Yes", correct=TRUE),
 542     answer("No")
 543   )
 544 )
 545 ```
 546
 547
 548
 549
 550
 551
 552 ## Answer Report
 553 Finally, let's generate a report that summarizes your answers to this evaluation.
 554
 555 Answers are written to a file that looks like this: `question_submission_{CURRENT TIME}.csv`.
 556
 557 Take note of this csv file: this is what you will submit to Canvas.
 558
 559 Your answers are also saved in RStudio's global environment as a variable called `df`. Run the code chunk below to see what `df` looks like.
 560
 561 ```{r report1, exercise=TRUE}
 562 df
 563 ```
 564
 565
 566 To check your proportion of correct answers (multiply by 100 for a percentage):
 567 ```{r report2, exercise=TRUE}
 568 mean(df$correct)
 569 ```
 570
 571 To check your proportion of correct answers by section:
 572
 573 ```{r report3, exercise=TRUE}
 574 df %>% group_by(category) %>% summarize(avg = mean(correct))
 575 ```
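
If you would like to visualize your results, the sketch below shows one possible approach using base R. It runs on `mock_df`, a hypothetical stand-in with made-up rows; only the column names `category` and `correct` mirror the real `df`. To use it on your own answers, replace `mock_df` with `df`.

```r
# Hypothetical stand-in for `df`; the real one is built from your own submissions
mock_df <- data.frame(
  category = c("WarmUp", "R", "R", "StatsConcepts", "StatsConcepts", "Probabilities"),
  correct  = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Proportion correct per section (a base R alternative to group_by/summarize)
by_section <- tapply(mock_df$correct, mock_df$category, mean)

# Bar chart of proportion correct, one bar per section
barplot(by_section, ylim = c(0, 1), ylab = "Proportion correct", las = 2)
```

A bar chart is only one option. With few questions per section, the proportions are coarse, so reading the raw `df` alongside the plot is likely to be more informative.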