X-Git-Url: https://code.communitydata.science/stats_class_2020.git/blobdiff_plain/68275101f1d127a83e2c6fd076c9929c6d3f4dd4..c0a584cb21431f64a48de4cf3043e8e07b63ec7d:/psets/pset3-worked_solution.html?ds=sidebyside
diff --git a/psets/pset3-worked_solution.html b/psets/pset3-worked_solution.html
index 0970f3e..0d9a589 100644
--- a/psets/pset3-worked_solution.html
+++ b/psets/pset3-worked_solution.html
@@ -2181,6 +2181,8 @@ sapply(groups, gen_subgroup_prop)
Again, many possible things worth mentioning here, so I'll provide a few that stand out to me.
- The generalizability of an analysis focused on a single state during one six-year period is limited.
+- Working with a random \(1\%\) sample of the full dataset means that our results here could diverge in unpredictable ways from those we would find in an analysis of the full population of traffic stops. That said, even this \(1\%\) sample is quite large, and once you've read OpenIntro §5 you'll have some tools to estimate standard errors and confidence intervals around the various results from this analysis (see the sketch after this list).
+
- The data seem very prone to measurement errors of various kinds. In particular, I suspect the race/ethnicity classifications provided by officers are subject to some biases that are hard to identify and might also shift over time/region. The prevalence of missing values during the first two years of the dataset illustrates one aspect of this and may impact estimates of raw counts and proportions.
- While the comparisons across racial/ethnic groups and between the traffic stop/search rates and baseline population proportions illustrate a number of suggestive patterns, conclusive interpretation or attribution of those patterns to any specific cause or causes is quite difficult in the absence of additional information or assumptions. For one example, see my comments regarding statistical independence and the possible explanations in SQ2 above.
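
To make the confidence-interval point above concrete, here is a minimal sketch in R of a normal-approximation interval for a sample proportion, the kind of quantity this analysis estimates from the \(1\%\) sample. The data frame `stops` and the variable `subject_race` used in the usage comments are hypothetical placeholders, not names taken from the actual solution code.

```r
## Normal-approximation (Wald) confidence interval for a sample proportion,
## along the lines of OpenIntro §5. This is a sketch under assumed variable
## names, not part of the original worked solution.

prop_ci <- function(successes, n, conf = 0.95) {
  p_hat <- successes / n                      # sample proportion
  se <- sqrt(p_hat * (1 - p_hat) / n)         # standard error of the proportion
  z <- qnorm(1 - (1 - conf) / 2)              # critical value (about 1.96 for 95%)
  c(estimate = p_hat,
    lower = p_hat - z * se,
    upper = p_hat + z * se)
}

## Hypothetical usage with the 1% sample:
## n_group <- sum(stops$subject_race == "black", na.rm = TRUE)
## n_total <- sum(!is.na(stops$subject_race))
## prop_ci(n_group, n_total)
```

Because the \(1\%\) sample still contains many observations, intervals computed this way will typically be narrow, but they make explicit how much the sample-based proportions might differ from the full-population values.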