X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/19a89ad915d31db03613061e2fadfbcb9a13fd3c..c13bce81edb9862158e4da1becb092faa0c1acb1:/assessment/interactive_assessment.rmd?ds=sidebyside

diff --git a/assessment/interactive_assessment.rmd b/assessment/interactive_assessment.rmd
index 57c1691..a1e3d8c 100644
--- a/assessment/interactive_assessment.rmd
+++ b/assessment/interactive_assessment.rmd
@@ -18,8 +18,6 @@ code_filename <- paste("code_", t, ".csv", sep="")
 #df <- data.frame(label=c('test'), question=c('asd'), answer=c('asd'), correct=c(TRUE), stringsAsFactors=FALSE)
 df <- data.frame()
-print('*')
-print(df)
 tutorial_event_recorder <- function(tutorial_id, tutorial_version, user_id,
                                     event, data) {
@@ -27,8 +25,7 @@ tutorial_event_recorder <- function(tutorial_id, tutorial_version, user_id,
   if (event == "question_submission"){
     # nick exasperatedly believes this is the correct way to index the result of strsplit... [[1]][[1]]
     data$category <- strsplit(data$label, '_')[[1]][[1]]
-    #print(data)
-
+
     df <<- rbind(df, data, stringsAsFactors=FALSE)
     #write.table(data, question_filename, append=TRUE, sep=",", row.names=TRUE, col.names=FALSE)
     write.table(df, question_filename, append=FALSE, sep=",", row.names=TRUE, col.names=TRUE)
@@ -47,13 +44,24 @@ options(tutorial.event_recorder = tutorial_event_recorder)
 ## Overview
-TODO add a short description. State the number of questions that will be asked. Include expectations about time commitment. Explain the idea of the solution report
+This is document contains R Markdown code for an *Interactive Self Assessment*. By completing this assessment, both students and the teaching can check in on learning progress.
+
+The Self Assessment is broken into six sections, described below. You can navigate throughout the document using the left-hand column. In general, completely this assessment should take about 60 minutes.
+
+* Overview: you are here.
+* Section 1, Warmup Exercises. Contains warm-ups to help you become familiar with the interactive `learnr` environment (learnr is the R package that this assessment relies on). 1 coding question, 2 multiple choice questions. Time estimate: 5 min.
+* Section 2, Debugging and Reading R Code. Contains a series of questions that will require you to work with existing R code. 3 coding questions, 4 multiple choice questions. Time estimate: 15 minutes.
+* Section 3, Statistics Concepts and Definitions. 12 multiple choice questions about statistics concepts and definitions. Time estimate: 15 minutes.
+* Section 4, Distributions. 3 multiples choice questions that involve some minor calculations. Time estimate: 5 minutes.
+* Section 5, Computing Probabilities. 6 multiple choice questions that involve calulating probabilities of events. These calculations are more involved than Section 4. Time estimate: 15 minutes.
+* Section 6, Helpful Formulas. Contains some helpful formulas that may be useful for Sections 3-5.
+* Section 7, Answer Report. Time estimate: 5 minutes. Here, you can use R code (some of which is prepopulated for you) to analyze (or visualize) your performance on the assessment. This provides a way for you to (1) practice exploratory analyses R with data you created yourself (by answering questions) and (2) get immediate feedback about your performance.
 Note that you can clear **all** your answers to *all* questions by clicking "Start Over" in the left-hand sidebar, but doing that basically erases all progress in the document and your answers to any questions will be deleted. *Use with caution* (if at all)!
-## Section 1: Warmup exercises
+## Section 1, Warm-up Exercises
-TODO add a short description of this section.
+This section contains quick warm-up questions, so you can become familiar with how `learnr` works and what to expect from this activity.
 ### Code Chunk Warm-up
@@ -65,7 +73,7 @@ If you click "Run Code", you should see the answer below the chunk. That answer
 You can clear your answers by clicking "Start Over" in the top-left of the chunk. You can also clear **all** your answers by clicking "Start Over" in the left-hand sidebar, but doing that basically erases all progress in the document *Use with caution!*
-```{r warmup_1, exercise=TRUE, exercise.lines=10}
+```{r WarmUp_1, exercise=TRUE, exercise.lines=10}
 add <- function() {
 }
@@ -74,7 +82,7 @@ y = 5678
 add(x,y)
 ```
-```{r warmup_1-solution}
+```{r WarmUp_1-solution}
 add <- function(value1, value2) {
   return(value1 + value2)
 }
@@ -86,22 +94,26 @@ add(x,y)
 ```
 ### Multiple Choice Question Warmup
-The question below shows how the multiple choice answering and "feedback" works.
-```{r warmup_2}
+The question below shows how the multiple choice answering and feedback works.
+```{r WarmUp_2}
 quiz(
   question("Select the answer choice that will return `TRUE` in R.",
     answer("1 == 1", message="Good work! Feedback appears here.", correct=TRUE),
-    answer("1 == 0", message="Not quite! Feedback appears here.")
+    answer("1 == 0", message="Not quite! Feedback appears here."),
+    allow_retry = TRUE
   )
 )
 ```
+
+## Section 2: Writing and Debugging R Code
+
 ### Debugging a Function
 Below, you'll see code to define a function that is *supposed* to perform a transformation on a vector. The problem is that it doesn't work right now.
-In theory, the function will take a numeric vector as input (let's call it $x$) and scale the values so they lie between zero and one. [^1] The way it *should* do this is by first subtracting the minimum value of $x$ from each element of $x$. Then, the function will divide each element by the difference between the maximum value of $x$ and the minimum value of $x$.
+In theory, the function will take a numeric vector as input (let's call it $x$) and scale the values so they lie between zero and one. (This is sometimes called min-max [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling), and is sometimes used for machine learning.)
-[^1]: This is sometimes called min-max [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling), and is sometimes used for machine learning.
+The way it *should* do this is by first subtracting the minimum value of $x$ from each element of $x$. Then, the function will divide each element by the difference between the maximum value of $x$ and the minimum value of $x$.
 As written now, however, the function does not work! There are at least three issues you will need to fix to get it working. Once you fix them, you should be able to confirm that your function works with the pre-populated example (with the correct output provided). You might also be able to make this code more "elegant" (or alternatively, improve the comments and variable names as you see fit).
@@ -122,8 +134,8 @@ zeroToOneRescaler <- function() {
 }
 test_vector = c(1,2,3,4,5)
-zeroToOneRescaler(test_vector)
 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
+zeroToOneRescaler(test_vector)
 ```
 ```{r R_debug1-solution}
@@ -134,15 +146,15 @@ zeroToOneRescaler <- function(x) {
 }
 test_vector = c(1,2,3,4,5)
-zeroToOneRescaler(test_vector)
 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
+zeroToOneRescaler(test_vector)
 ```
 ```{r R_debug1-response}
 quiz(
   question("Were you able to solve the debugging question? (this question is for feedback purposes)",
     answer("Yes", message="Nice work!", correct = TRUE),
-    answer("No", message="")
+    answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team.")
   )
 )
 ```
@@ -173,7 +185,8 @@ summary(ps2$'My Second New Column')
 quiz(
   question("Were you able to solve the above debugging question? (this question is for feedback purposes)",
     answer("Yes", message="Nice work!", correct = TRUE),
-    answer("No", message="")
+    answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
+    allow_retry = TRUE
   )
 )
 ```
@@ -194,6 +207,15 @@ ggplot(PlantGrowth, aes(weight, after_stat(density))) + geom_histogram() + geom_
 Bonus: How would you find more information about the source of this dataset?
+```{r R_ggplot-response}
+quiz(
+  question("Were you able to solve the above plotting question? (this question is for feedback purposes)",
+    answer("Yes", message="Nice work!", correct = TRUE),
+    answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
+    allow_retry = TRUE
+  )
+)
+```
 ### Interpret a dataframe
 ```{r R_columns-setup, exercise=TRUE}
@@ -239,9 +261,9 @@ quiz(
 )
 ```
-## Section 2
+## Section 3, Statistics Concepts and Definitions
 The following is a series of short multiple choice questions. These questions focus on definitions, and should not require performing any computations or writing any code.
-```{r Stats_lightninground}
+```{r StatsConcepts_lightninground}
 m1 <- ""
 wolf <- "Think of the 'Boy who cried wolf', with a null hypothesis that no wolf exists. First the boy claims the alternative hypothesis: there is a wolf. The villagers believe this, and reject the correct null hypothesis. Second, the villagers make an error by not believing the boy when he presents a correct alternative hypothesis."
@@ -261,16 +283,16 @@ quiz(
     answer("if an effect is causal or not.")
   ),
   question("A distribution that is right-skewed has a long tail to the:",
-    answer("right", correct = TRUE),
-    answer("left")
+    answer("right.", correct = TRUE),
+    answer("left.")
   ),
   question("A normal distribution can be characterized with only this many parameters:",
-    answer("1"),
-    answer("2", correct = TRUE),
-    answer("3")
+    answer("1."),
+    answer("2.", correct = TRUE),
+    answer("3.")
   ),
   question("When we calculate standard error, we calculate",
-    answer("using a different formula for every type of variable."),
+    answer("it using a different formula for every type of variable."),
     answer("the sample standard error, which is an estimate of the population standard error.", correct = TRUE),
     answer("whether or not our result is causal.")
   ),
@@ -282,7 +304,7 @@ quiz(
   question("P values tell us about",
     answer("the world in which our null hypothesis is true.", correct = TRUE),
     answer("the world in which our null hypothesis is false."),
-    answer("the world in which our data describe a causal effect")
+    answer("the world in which our data describe a causal effect.")
   ),
   question("P values are",
     answer("a conditional probability.", correct = TRUE),
@@ -301,20 +323,11 @@ quiz(
 )
 ```
-## Section 3
-
-### About this Section
-
-The following questions are in the style of pen-and-paper statistics class exam questions. There a few sections that you may want or need to run some R code; there are a variety of empty "scratch paper" code chunks for this purpose. Note that this document contains a section with helpful formulas, which you can navigate to via the leftmost column.
-
-### Sampling
-
-```{r Stats_sampling}
+```{r StatsConcepts_sampling}
 quiz(
-  question("A political scientist is interested in the effect of government type on economic development.
-She wants to use a sample of 30 countries evenly represented among the Americas, Europe,
-Asia, and Africa to conduct her analysis. What type of study should she use to ensure that
-countries are selected from each region of the world? Assume a limitied research budget.",
+  question("A political scientist is interested in the effect of teaching style type on standardized test performance
+She wants to use a sample of 30 classes evenly represented among the Communication, Computer Science, and Business to conduct her analysis. What type of study should she use to ensure that
+classes are selected from each region of the world? Assume a limited research budget.",
     answer("Observational - simple random sample"),
     answer("Observational - cluster"),
     answer("Observational - stratifed", correct=TRUE),
@@ -323,12 +336,16 @@ countries are selected from each region of the world? Assume a limitied research
 )
 ```
+## Section 4: Distributions
+The following questions are in the style of pen-and-paper statistics class exam questions. This section includes three questions about distributions. These questions involve some minor calculations.
+
+### Percentiles and the Normal Distribution
 For the following question, you may want to use this "scratch paper" code chunk.
-```{r Stats_quartile-scratch, exercise=TRUE}
+```{r Distributions_quartile-scratch, exercise=TRUE}
 ```
-```{r Stats_quartile}
+```{r Distributions_quartile}
 quiz(
   question("Heights of boys in a high school are approximately normally distributed with mean of 175 cm standard deviation of 5 cm. What is the first quartile of heights?",
@@ -351,14 +368,18 @@ Mean is 0.41100 and median is 0.27800.
 1st quartile is 0.13000 and 3rd quartile is 0.56200.
-```{r Stats_summary}
+```{r Distributions_summary-scratch, exercise=TRUE}
+
+```
+
+```{r Distributions_summary}
 m1 <- "Under R's default setting, outliers are values that are either greater than the upper bound $Q_3 + 1.5\\times IQR$ OR less than the lower bound $Q_1 - 1.5\\times IQR$. Here, $IQR = 0.562-0.130=0.432$. The upper bound $= 0.562 + 1.5\\times (0.432) = 1.21$. The lower bound is $0.13 - 1.5\\times (0.432) = -0.518$. We see that the maximum value is 2.11, greater than the upper bound. Thus, there is at least one outlier in this sample."
 m2 <- "There is at least one outlier on the right, whereas there is none on the left. $|Q_3-Q_2| > |Q_2-Q_1|$, so the whisker for this box plot would be longer on the right-hand side. The mean is larger than the median."
 quiz(
   question("Are there outliers (in terms of IQR) in this sample?",
     answer("Yes", correct = TRUE, message=m1),
-    answer("No", message="asd")
+    answer("No", message=m1)
   ),
   question("Based on these summary statistics, we might expect the skew of the distribution to be:",
     answer("left-skewed", message=m2),
@@ -369,15 +390,15 @@ quiz(
 ```
-### Computing Probabilities
+## Sections 5, Computing Probabilities
 For each of the below questions, you will need to calculate some probabilities by hand. You may want to use this "scratch paper" code chunk (possibly in conjunction with actual paper).
-```{r Stats_probs-scratch, exercise=TRUE}
+```{r Probabilities-scratch, exercise=TRUE}
 ```
-```{r Stats_probs}
+```{r Probabilities_probs}
 m1 <- "$P(\\text{Coffee} \\cap \\text{No Milk}) = P(\\text{Coffee})\\cdot P(\\text{No Milk}) = 0.5 \\cdot (1-0.1) = 0.45$"
 m2 <- "Let H be the event of hypertension, M be event of being a male. We see here that $P(H) = 0.15$ whereas $P(H|M) = 0.18$. Since $P(H) \\neq P(H|M)$, then hypertension is not independent of sex."
@@ -395,28 +416,22 @@ quiz(
     answer("Yes", message=m2),
     answer("No.", correct=TRUE, message=m2)
   ),
-  question("What might you search for (in Google, your notes, the OpenIntro PDF, etc.) to help with this question?",
-    answer("t test"),
-    answer("laws of probability", correct=TRUE),
-    answer("linear regression"),
-    answer("R debugging")
-  ),
   question("Co-infection with HIV and hepatitis C (HCV) occurs when a patient has both diseases, and is on the rise in some countries. Assume that in a given country, only about 2% of the population has HCV, but 25% of the population with HIV have HCV. Assume as well that 10% of the population with HCV have HIV. What is the probability that a randomly chosen member of the population has both HIV and HCV?",
     answer("0.001", message=m3),
     answer("0.01", message=m3),
     answer("0.002", correct=TRUE, message=m3),
     answer("0.02", message=m3)
-  ),
-  question("What might you search for (in Google, your notes, the OpenIntro PDF, etc.) to help with this question?",
-    answer("t test"),
-    answer("laws of probability", correct=TRUE),
-    answer("linear regression"),
-    answer("R debugging")
   )
+  #question("What might you search for (in Google, your notes, the OpenIntro PDF, etc.) to help with this question?",
+  #  answer("t test"),
+  #  answer("laws of probability", correct=TRUE),
+  #  answer("linear regression"),
+  #  answer("R debugging")
+  #)
 )
 ```
-### Calculating Probabilities: A Biostats Example
+### Biostats Example
 This question is adapted from a biostats midterm exam.
 In the past (2015, to be specific), the US Preventive Services Task Force recommended that women under the age of 50 should
@@ -439,11 +454,11 @@ Does not Have Breast Cancer: 8,313 Positive Test Results and 88,354 negative tes
 First, compute the "margins" of the above contingency table. Row margins: How many total women have breast cancer? How many total women do not have breast cancer? Column margins: How many total positive test? How many total negative tests?
-```{r Stats_mammogram-chunk, exercise=TRUE}
+```{r Probabilities_mammogram-chunk, exercise=TRUE}
 ```
-```{r Stats_mammogram}
+```{r Probabilities_mammogram}
 m1 <- "
 $\\Pr(\\textrm{Test}^+ \\cap \\textrm{Cancer}) = 3,296$
@@ -521,16 +536,24 @@ $Q_1 - 1.5 \times IQR, \quad Q_3 + 1.5 \times IQR$
 ## Answer Report
 Finally, let's generate a report that summarizes your answers to this evaluation.
-Answers are written to a file that looks like this: `question_submission-{CURRENT TIME}.csv`. We can actually quickly analyze them.
+Answers are written to a file that looks like this: `question_submission-{CURRENT TIME}.csv`.
+
+Take note of this csv file: this is what you will submit to Canvas.
+
+They're also saved in R Studio's global environment as a variable called `df`. Run the below code chunk to see what `df` looks like.
 ```{r report1, exercise=TRUE}
 df
 ```
+
+To check your percentage of correct answers:
 ```{r report2, exercise=TRUE}
 mean(df$correct)
 ```
+To check your percentage of correct answers by section:
+
 ```{r report3, exercise=TRUE}
 df %>% group_by(category) %>% summarize(avg = mean(correct))
 ```
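
The report chunks in this diff depend on the `category` column that the event recorder derives from each question label with `strsplit`, which is why the chunk labels are renamed to prefixes such as `WarmUp_` and `StatsConcepts_`. The following is a minimal sketch of that relationship, not the tutorial's own code: the data frame contents are invented for illustration, and base R's `aggregate` stands in for the `dplyr` pipeline so the example has no package dependencies.

```r
# Hypothetical submissions: these labels and outcomes are invented for illustration.
df <- data.frame(
  label = c("WarmUp_2", "StatsConcepts_lightninground",
            "StatsConcepts_sampling", "Probabilities_probs"),
  correct = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per input string.
# The recorder presumably handles one label per event, so [[1]][[1]] is enough
# there; for a whole column, vapply() takes the first piece of every label.
df$category <- vapply(strsplit(df$label, "_"),
                      function(parts) parts[[1]], character(1))

# Share of correct answers overall, then per category (the label prefix).
overall <- mean(df$correct)
by_section <- aggregate(correct ~ category, data = df, FUN = mean)

print(overall)
print(by_section)
```

Because the category is simply the text before the first underscore, a chunk label without a section prefix would end up grouped under its own one-off category in the per-section summary.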