From: Benjamin Mako Hill <mako@atdot.cc>
Date: Tue, 7 Oct 2025 22:37:08 +0000 (-0700)
Subject: change assessment code for BSOC 2024
X-Git-Url: https://code.communitydata.science/coldcallbot-discord.git/commitdiff_plain/HEAD?hp=0d3bd05ff2f414fe0030f40f610561677afaffb2

change assessment code for BSOC 2024
---
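Reviewer note (not part of the commit): the 4.0-scale mapping that this
patch introduces in compute_final_case_grades.R can be sanity-checked
with the short sketch below. It is a Python restatement, not code from
the repository. `to_four_point` and the constant names are hypothetical,
and the sketch assumes the same 0.7-to-4.0 range spread over the top 50
percentage points that `gpa.point.value` encodes.

```python
# Hypothetical Python restatement of the grade mapping in
# compute_final_case_grades.R; all names here are illustrative only.

# Percentage points per 1.0 on the GPA scale (mirrors gpa.point.value).
GPA_POINT_VALUE = 50 / (4 - 0.7)

# Per-question scores (mirrors question.grades in the R script).
QUESTION_GRADES = {
    "GOOD": 100,
    "SATISFACTORY": 100 - GPA_POINT_VALUE,
    "POOR": 100 - 2 * GPA_POINT_VALUE,
    "NO MEANINGFUL ANSWER": 0,
}


def to_four_point(part_grade):
    """Map a 0-100 participation grade onto the 4.0 scale."""
    # Offset chosen so that a part grade of 100 lands exactly on 4.0.
    offset = (100 / GPA_POINT_VALUE) - 4
    return round(part_grade / GPA_POINT_VALUE - offset, 2)


if __name__ == "__main__":
    for label, score in QUESTION_GRADES.items():
        print(label, to_four_point(score))
```

Under these assumptions a part grade of 100 maps to 4.0 and a part grade
of 50 maps to 0.7, so each step from GOOD to SATISFACTORY to POOR costs
one full grade point.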

diff --git a/README b/README
index da52a4d..733a2e8 100644
--- a/README
+++ b/README
@@ -1,14 +1,27 @@
 Setting up the Discord Bot
 ======================================
 
-I run the Discord boy from my laptop. It requires the discord Python
+I run the Discord bot from my laptop. It requires the discord Python
 module available in PyPi and installable like:
 
     $ pip3 install discord
 
-I don't have details on how I set up my own Discord bot and/or invited
-it to my server but I hope you'll add to this file as you do this and
-figure out what needs to happen.
+Setting up the Bot
+=====================================
+
+The documentation for the `discord` python package
+(https://discordpy.readthedocs.io/en/latest/discord.html) does a good
+job explaining how to set up a Discord bot with your server. Follow
+the steps there, with one important exception:
+
+1. On the "Bot" tab in the discord application configuration page you
+need to enable both "Privileged Gateway Intents."  This allows the bot
+to see who is present and active in the channel.
+
+Finally, you need to copy your bot's token (also found on the "Bot" tab)
+into coldcallbot.py. Pass it as the argument to `ccb.run()`.
+
+
 
 Using the Cold Call Bot
 ======================================
@@ -31,7 +44,7 @@ Daily Process
 
 You need to start the bot from the laptop each day. I do that by:
 
-  $ ./coldcallboy.py
+  $ ./coldcallbot.py
 
 The bot will run in the terminal, print out data as it works including
 detailed weights as it goes, and it will record data into files in the
diff --git a/README_daily b/README_daily
new file mode 100644
index 0000000..ea666b9
--- /dev/null
+++ b/README_daily
@@ -0,0 +1,127 @@
+I keep my entire data directory in git and I'd recommend that you do
+too. Just make sure you don't commit and publish student records into
+the public git repository. I usually just keep a separate branch for
+classes.
+
+Daily Process
+================================
+
+1. Open your terminal (on Windows, this will likely be powershell in anaconda)
+
+2. Change into the directory with the coldcall scripts.
+
+3. Download new data with: `python download_student_info.py`
+
+   This will download the latest version of absence data into `data/optout_poll_data.tsv` as well as the student information into `data/student_information.tsv`.
+
+   If you notice any changes you need to make (e.g., duplicate preferred names, incorrectly entered absences, etc.), edit the Google sheets and then run the download again with the same script.
+
+4. When you're ready, run the main script in the same directory: `python coldcallbot-manual.py`
+   
+   This will both:
+   
+   - Output a paper list in the terminal. I often redirect this to a file like: `python coldcallbot-manual.py > data/paper_call_list-2024-09-26.txt` or similar.
+   - Create the computed call list in the `data/` folder.
+
+During the case discussion, I take notes on student answers on paper
+(typically I only note down non-"GOOD" answers) and then add these to
+the sheet immediately after class.
+
+After class each day, you need to open up "call_list-YYYY-MM-DD.tsv"
+and edit the two columns in which you store the results of the
+case. The first column, `answered`, records whether the person
+responded and answered the question. This is almost always TRUE but
+would be FALSE if the student were missing (e.g., they were present in
+the room but away from their computer and unresponsive).
+
+The assessment column should be "GOOD", "SATISFACTORY", "POOR", "NO
+MEANINGFUL ANSWER" or "ABSENT" but you can use whatever values make
+sense and we can work with them when it comes to grading. Just make
+sure you are consistent!
+
+Details on my rubric are here:
+
+https://wiki.communitydata.science/User:Benjamin_Mako_Hill/Assessment#Rubric_for_case_discussion_answers
+
+
+Assessment and Tracking
+======================================
+
+These scripts rely on a file in this repository called
+`data/student_information.tsv` which I have set to be downloaded
+automatically from a Google form using the download script.
+
+I don't expect that these will necessarily work without
+modification. It's a good idea to go line-by-line through these to
+make sure they are doing what *you* want and that you agree with the
+assessment logic built into this.
+
+For reference, that file has the following column labels (this is the
+full header, in order):
+
+    Timestamp
+    Your UW student number
+    Name you'd like to go by in class
+    Your Wikipedia username
+    Your username on the class Discord server
+    Preferred pronouns
+    Anything else you'd like me to know?
+
+The scripts in this directory are meant to be run or sourced *from*
+the data directory. As in:
+
+    $ cd ../data
+    $ R --no-save < ../assessment_and_tracking/track_participation.R
+
+There are three files in that directory:
+
+track_enrolled.R:
+
+    This file keeps track of who is in Discord, who is enrolled for
+    the class, etc. This helps me remove people from the
+    student_information.tsv spreadsheet who have dropped the
+    class, deal with users who change their Discord name, and other
+    things that the scripts can't deal with automatically.
+
+    This all needs to be dealt with manually, one way or
+    another. Sometimes by modifying the script, sometimes by modifying
+    the files in the data/ directory.
+
+    This requires an additional file called
+    `myuw-COM_482_A_autumn_2020_students.csv` which is just the saved
+    CSV from https://my.uw.edu which includes the full class list. I
+    download this one manually.
+
+track_participation.R:
+
+    This file generates histograms and other basic information about
+    the distribution of participation and absences. I've typically run
+    this weekly after a few weeks of the class and share these images
+    with students at least once or twice in the quarter.
+
+    This file is also sourced by compute_final_case_grades.R.
+
+compute_final_case_grades.R:
+
+    You can find a narrative summary of my assessment process here:
+
+    https://wiki.communitydata.science/User:Benjamin_Mako_Hill/Assessment#Overall_case_discussion_grade
+
+    This also requires the registration file (something like
+    `myuw-COM_482_A_autumn_2020_students.csv`) which is described
+    above.
+
+    To run this script, you will need to create the following subdirectories:
+
+    data/case_grades
+    data/case_grades/student_reports
+
+
+One final note: A bunch of things in these scripts assume a UW 4.0
+grade scale. I don't think it should be hard to map these onto some
+other scale, but that's an exercise I'll leave up to those that want
+to do this.
+
+   
+ 5. After class, update the call list in the data folder to remove lines for any call that didn't happen (or that you don't want to count) and update the assessments.
+ 
diff --git a/assessment_and_tracking/compute_final_case_grades.R b/assessment_and_tracking/compute_final_case_grades.R
index 60a60f3..e355052 100644
--- a/assessment_and_tracking/compute_final_case_grades.R
+++ b/assessment_and_tracking/compute_final_case_grades.R
@@ -1,67 +1,100 @@
 ## load in the data
 #################################
+myuw <- read.csv("../data/2024_autumn_COMMLD_570_A_joint_students.csv", stringsAsFactors=FALSE)
 
-case.sessions  <- 15
-myuw <- read.csv("myuw-COM_482_A_autumn_2020_students.csv", stringsAsFactors=FALSE)
-
-## class-level variables
-question.grades <- c("GOOD"=100, "FAIR"=100-(50/3.3), "BAD"=100-(50/(3.3)*2))
-missed.question.penalty <- (50/3.3) * 0.2 ## 1/5 of a full point on the GPA scale
-
+current.dir <- getwd()
 source("../assessment_and_tracking/track_participation.R")
-setwd("case_grades")
+setwd(current.dir)
+
+rownames(d) <- d$unique.name
+call.list$timestamp <- as.Date(call.list$timestamp)
 
-rownames(d) <- d$discord.name
+## class-level variables
+gpa.point.value <- 50/(4 - 0.7)
+## question.grades <- c("GOOD"=100, "FAIR"=100-gpa.point.value, "BAD"=100-(gpa.point.value*2))
+question.grades <- c("GOOD"=100, "SATISFACTORY"=100-gpa.point.value, "POOR"=100-(gpa.point.value*2), "NO MEANINGFUL ANSWER"=0)
+missed.question.penalty <- gpa.point.value * 0.2 ## 1/5 of a full point on the GPA scale
+
+## inspect and set the absence threshold
+ggplot(d) + aes(x=absences) + geom_histogram(binwidth=1, fill="white",color="black")
+absence.threshold <- median(d$absences)
+
+## inspect and set the questions cutoff
+## questions.cutoff <- median(d$num.calls)
+## median(d$num.calls)
+## questions.cutoff <- nrow(call.list) / nrow(d) ## TODO talk about this
+## this is the 95th percentile based on simulation in simulation.R
+questions.cutoff <- 15
 
 ## show the distribution of assessments
-table(call.list.full$assessment)
-prop.table(table(call.list.full$assessment))
+table(call.list$assessment)
+prop.table(table(call.list$assessment))
+
 table(call.list.full$answered)
 prop.table(table(call.list.full$answered))
 
-total.questions.asked <- nrow(call.list.full)
+total.questions.asked <- nrow(call.list)
+
+## find out how many questions folks were present/absent for.
+## 
+## NOTE: this is currently only for informational purposes and is NOT
+## being used to compute grades in any way.
+########################################################################
+calls.per.day <- data.frame(day=as.Date(names(table(call.list$timestamp))),
+                            questions.asked=as.numeric(table(call.list$timestamp)))
+
+## function to return the numbers of calls present for or zero if they
+## were absent
+calls.for.student.day <- function (day, student.id) {
+    if (any(absence$unique.name == student.id & absence$date.absent == day)) {
+        return(0)
+    } else {
+        return(calls.per.day$questions.asked[calls.per.day$day == day])
+    }
+}
+
+compute.questions.present.for.student <- function (student.id) {
+    sum(unlist(lapply(unique(calls.per.day$day), calls.for.student.day, student.id)))
+}
 
 ## create new column with number of questions present
-d$prop.asked <- d$num.calls / d$num.present
+d$q.present <- unlist(lapply(d$unique.name, compute.questions.present.for.student))
+d$prop.asked <- d$num.calls / d$q.present
 
 ## generate statistics using these new variables
 prop.asks.quantiles <- quantile(d$prop.asked, probs=seq(0,1, 0.01))
 prop.asks.quantiles <- prop.asks.quantiles[!duplicated(prop.asks.quantiles)]
 
-## this is generating broken stuff but it's not used for anything
-d$prop.asked.quant <- cut(d$prop.asked, breaks=prop.asks.quantiles,
-    labels=names(prop.asks.quantiles)[1:(length(prop.asks.quantiles)-1)])
+d$prop.asked.quant <- cut(d$prop.asked, right=FALSE, breaks=c(prop.asks.quantiles, 1),
+    labels=names(prop.asks.quantiles)[1:(length(prop.asks.quantiles))])
 
 ## generate grades
-##########################################################
-
-d$part.grade <- NA
+########################################################################
 
 ## print the median number of questions for (a) everybody and (b)
 ## people that have been present 75% of the time
-median(d$num.calls[d$days.absent < 0.25*case.sessions])
 median(d$num.calls)
 
-questions.cutoff <- median(d$num.calls)
-
 ## helper function to generate average grade minus number of missing
-gen.part.grade <- function (x.discord.name) {
-    q.scores <- question.grades[call.list$assessment[call.list$discord.name == x.discord.name]]
+gen.part.grade <- function (x.unique.name) {
+    q.scores <- question.grades[call.list$assessment[call.list$unique.name == x.unique.name]]
+    print(q.scores)
     base.score <- mean(q.scores, na.rm=TRUE)
 
     ## number of missing days
-    missing.days <- nrow(missing.in.class[missing.in.class$discord.name == x.discord.name,])
+    missing.in.class.days <- nrow(missing.in.class[missing.in.class$unique.name == x.unique.name,])
 
     ## return the final score
-    data.frame(discord.name=x.discord.name,
-               part.grade=(base.score - missing.days * missed.question.penalty))
+    data.frame(unique.name=x.unique.name,
+               base.grade=base.score,
+               missing.in.class.days=missing.in.class.days)
 }
 
-tmp <- do.call("rbind", lapply(d$discord.name[d$num.calls >= questions.cutoff], gen.part.grade))
-
-d[as.character(tmp$discord.name), "part.grade"] <- tmp$part.grade
-
-## next handle the folks *under* the median
+## create the base grades which do NOT include missing questions
+tmp <- do.call("rbind", lapply(d$unique.name, gen.part.grade))
+d <- merge(d, tmp)
+rownames(d) <- d$unique.name
+d$part.grade <- d$base.grade
 
 ## first we handle the zeros
 ## step 1: first double check the people who have zeros and ensure that they didn't "just" get unlucky"
@@ -70,50 +103,46 @@ d[d$num.calls == 0,]
 ## set those people to 0 :(
 d$part.grade[d$num.calls == 0] <- 0
 
-## step 2 is to handle folks who got unlucky in the normal way
-tmp <- do.call("rbind", lapply(d$discord.name[is.na(d$part.grade) & d$prop.asked <= median(d$prop.asked)], gen.part.grade))
-d[as.character(tmp$discord.name), "part.grade"] <- tmp$part.grade
+## step 2: identify the people who were not asked "enough"
+## questions but were unlucky/lucky
 
-## the people who are left are lucky and still undercounted so we'll penalize them
-d[is.na(d$part.grade),]
-penalized.discord.names <- d$discord.name[is.na(d$part.grade)]
+## first, this just prints out the people who were not called
+## simply because they got unlucky
+d[d$num.calls < questions.cutoff & d$absences < absence.threshold,]
 
-## generate the baseline participation grades as per the process above
-tmp <- do.call("rbind", lapply(penalized.discord.names, gen.part.grade))
-d[as.character(tmp$discord.name), "part.grade"] <- tmp$part.grade
+## these are the people who were not called enough and were not simply
+## unlucky (i.e., they were not in class very often)
+penalized.unique.names <- d$unique.name[d$num.calls < questions.cutoff & d$absences > absence.threshold]
+d[d$unique.name %in% penalized.unique.names,]
 
 ## now add "zeros" for every questions that is below the normal
-d[as.character(penalized.discord.names),"part.grade"] <- ((
-    (questions.cutoff - d[as.character(penalized.discord.names),"num.calls"] * 0) +
-    (d[as.character(penalized.discord.names),"num.calls"] * d[as.character(penalized.discord.names),"part.grade"]) )
+d[as.character(penalized.unique.names),"part.grade"] <- (
+    (d[as.character(penalized.unique.names),"num.calls"] * d[as.character(penalized.unique.names),"part.grade"])
     / questions.cutoff)
 
-d[as.character(penalized.discord.names),]
-
-## map part grades back to 4.0 letter scale and points
-d$part.4point <-round((d$part.grade / (50/3.3)) - 2.6, 2)
+d[as.character(penalized.unique.names),]
 
-d[sort.list(d$prop.asked), c("discord.name", "num.calls", "num.present",
-                             "prop.asked", "prop.asked.quant", "part.grade", "part.4point",
-                             "days.absent")]
+## apply the penalty for number of days we called on them and they were gone
+d$part.grade <- d$part.grade - d$missing.in.class.days * missed.question.penalty
 
-d[sort.list(d$part.4point), c("discord.name", "num.calls", "num.present",
-                             "prop.asked", "prop.asked.quant", "part.grade", "part.4point",
-                             "days.absent")]
+## TODO ensure this is right. i think it is
+## map part grades back to 4.0 letter scale and points
+d$part.4point <- round((d$part.grade / gpa.point.value) - ((100 / gpa.point.value) - 4), 2)
 
+d[sort.list(d$part.4point, decreasing=TRUE),
+  c("unique.name", "short.name", "num.calls", "absences", "part.4point")]
 
-## writing out data
-quantile(d$num.calls, probs=(0:100*0.01))
+## writing out data to CSV
 d.print <- merge(d, myuw[,c("StudentNo", "FirstName", "LastName", "UWNetID")],
-           by.x="student.num", by.y="StudentNo")
-write.csv(d.print, file="final_participation_grades.csv")
+                 by.x="unique.name", by.y="StudentNo")
+write.csv(d.print, file="../data/final_participation_grades.csv")
 
 library(rmarkdown)
 
-for (x.discord.name in d$discord.name) {
-    render(input="../../assessment_and_tracking/student_report_template.Rmd",
+for (id in d$unique.name) {
+    render(input="student_report_template.Rmd",
            output_format="html_document",
-           output_file=paste("../data/case_grades/student_reports/",
-                             d.print$UWNetID[d.print$discord.name == x.discord.name],
+           output_file=paste("../data/case_grades/",
+                             d.print$unique.name[d.print$unique.name == id],
                              sep=""))
 }
diff --git a/assessment_and_tracking/simulation.R b/assessment_and_tracking/simulation.R
new file mode 100644
index 0000000..7134bef
--- /dev/null
+++ b/assessment_and_tracking/simulation.R
@@ -0,0 +1,24 @@
+weight.fac <- 2
+num.calls <- 373
+num.students <- 76
+
+gen.calls.per.students <- function (x) {
+    raw.weights <<- rep(1, num.students)
+    names(raw.weights) <- seq(1, num.students)
+
+    table(sapply(1:num.calls, function (i) {
+        probs <- raw.weights / sum(raw.weights)
+        selected <- sample(names(raw.weights), 1, prob=probs)
+        ## update the raw.weights
+        raw.weights[selected] <<- raw.weights[selected] / weight.fac
+                                        #print(raw.weights)
+        return(selected)
+    }))
+}
+
+
+simulated.call.list <- unlist(lapply(1:1000, gen.calls.per.students))
+hist(simulated.call.list)
+
+quantile(simulated.call.list, probs=seq(0,1,by=0.01))
+quantile(simulated.call.list, probs=0.05)
diff --git a/assessment_and_tracking/student_report_template.Rmd b/assessment_and_tracking/student_report_template.Rmd
index a0b2145..866b1e0 100644
--- a/assessment_and_tracking/student_report_template.Rmd
+++ b/assessment_and_tracking/student_report_template.Rmd
@@ -1,22 +1,19 @@
-**Student Name:** `r paste(d.print[d.print$discord.name == x.discord.name, c("FirstName", "LastName")])`
+**Student Name:** `r paste(d.print[d.print$unique.name == id, c("LastName", "FirstName")])` (`r id`)
 
-**Discord Name:** `r d.print[d.print$discord.name == x.discord.name, c("discord.name")]`
+**Participation grade:** `r d.print$part.4point[d.print$unique.name == id]`
 
-**Participation grade:** `r d.print$part.4point[d.print$discord.name == x.discord.name]`
+**Questions asked:** `r d.print[d.print$unique.name == id, "num.calls"]`
 
-**Questions asked:** `r d.print[d$discord.name == x.discord.name, "prev.questions"]`
+**Days Absent:** `r d.print[d.print$unique.name == id, "absences"]` / `r length(unique(as.Date(unique(call.list$timestamp))))`
 
-**Days Absent:** `r d.print[d.print$discord.name == x.discord.name, "days.absent"]` / `r case.sessions`
+**Missing in class days:** `r d.print[d.print$unique.name == id, "missing.in.class.days"]` (base grade lowered by 0.2 per day)
 
 **List of questions:**
 
 ```{r echo=FALSE}
-call.list[call.list$discord.name == x.discord.name,]
+call.list[call.list$unique.name == id,]
 ```
 
-**Luckiness:** `r d.print[d.print$discord.name == x.discord.name, "prop.asked.quant"]`
-
-If you a student has a luckiness over 50% that means that they were helped by the weighting of the system and/or got lucky. We did not penalize *any* students with a luckiness under 50% for absences.
 
 
 
diff --git a/assessment_and_tracking/track_enrolled.R b/assessment_and_tracking/track_enrolled.R
index 563384d..06e4aba 100644
--- a/assessment_and_tracking/track_enrolled.R
+++ b/assessment_and_tracking/track_enrolled.R
@@ -1,5 +1,5 @@
-myuw <- read.csv("myuw-COM_482_A_autumn_2020_students.csv")
-gs <- read.delim("student_information.tsv")
+myuw <- read.csv("data/2024_autumn_COM_481_A_students.csv")
+gs <- read.delim("data/student_information.tsv")
 
 ## these are students who dropped the class (should be empty)
 gs[!gs$Your.UW.student.number %in% myuw$StudentNo,]
@@ -7,17 +7,23 @@ gs[!gs$Your.UW.student.number %in% myuw$StudentNo,]
 ## these are students who are in the class but didn't reply to the form
 myuw[!myuw$StudentNo %in% gs$Your.UW.student.number,]
 
+roster.merged <- merge(myuw, gs, by.x="StudentNo", by.y="Your.UW.student.number", all.x=TRUE, all.y=FALSE)
+
+roster.merged[,c("StudentNo", "Email", "FirstName", "LastName", "Your.username.on.the.class.Discord.server",  "checked.off.on.discord")][!roster.merged$StudentNo %in% gs$Your.UW.student.number,]
+## these are students who are in the class but didn't reply to the form
+
+
 ## read all the folks who have been called and see who is missing from
 ## the google sheet
 
-call.list <- unlist(lapply(list.files(".", pattern="^attendance-.*tsv$"), function (x) {
-    d <- read.delim(x)
-    strsplit(d[[2]], ",")
-})
-)
-present <- unique(call.list)
-present[!present %in% gs[["Your.username.on.the.class.Discord.server"]]]
+## call.list <- unlist(lapply(list.files(".", pattern="^attendance-.*tsv$"), function (x) {
+##    d <- read.delim(x)
+##    strsplit(d[[2]], ",")
+## })
+## )
+## present <- unique(call.list)
+## present[!present %in% gs[["Your.username.on.the.class.Discord.server"]]]
 
 ## and never attended class..
-gs[["Your.username.on.the.class.Discord.server"]][!gs[["Your.username.on.the.class.Discord.server"]] %in% present]
+## gs[["Your.username.on.the.class.Discord.server"]][!gs[["Your.username.on.the.class.Discord.server"]] %in% present]
 
diff --git a/assessment_and_tracking/track_participation.R b/assessment_and_tracking/track_participation.R
index 9a51084..71d3256 100644
--- a/assessment_and_tracking/track_participation.R
+++ b/assessment_and_tracking/track_participation.R
@@ -1,109 +1,129 @@
-library(ggplot2)
+setwd("../data/")
+
 library(data.table)
 
-gs <- read.delim("student_information.tsv")
-d <- gs[,c(2,5)]
-colnames(d) <- c("student.num", "discord.name")
+################################################
+## LOAD call_list TSV data
+################################################
+
+call.list <- do.call("rbind", lapply(list.files(".", pattern="^call_list-.*tsv$"), function (x) {read.delim(x, stringsAsFactors=FALSE)[,1:5]}))
 
-call.list <- do.call("rbind", lapply(list.files(".", pattern="^call_list-.*tsv$"), function (x) {read.delim(x)[,1:4]}))
 colnames(call.list) <- gsub("_", ".", colnames(call.list))
+colnames(call.list)[1] <- "unique.name"
+colnames(call.list)[2] <- "preferred.name"
 
-call.list$day <- as.Date(call.list$timestamp)
+table(call.list$unique.name[call.list$answered])
 
 ## drop calls where the person wasn't present
 call.list.full <- call.list
 call.list[!call.list$answered,]
 call.list <- call.list[call.list$answered,]
 
-call.counts <- data.frame(table(call.list$discord.name))
-colnames(call.counts) <- c("discord.name", "num.calls")
+## show the distribution of assessments
+prop.table(table(call.list$assessment))
 
-d <- merge(d, call.counts, all.x=TRUE, all.y=TRUE, by="discord.name"); d
+call.counts <- data.frame(table(call.list$unique.name))
+colnames(call.counts) <- c("unique.name", "num.calls")
 
-## set anything that's missing to zero
-d$num.calls[is.na(d$num.calls)] <- 0
-      
-attendance <- unlist(lapply(list.files(".", pattern="^attendance-.*tsv$"), function (x) {d <- read.delim(x); strsplit(d[[2]], ",")}))
-
-file.to.attendance.list <- function (x) {
-    tmp <- read.delim(x)
-    d.out <- data.frame(discord.name=unlist(strsplit(tmp[[2]], ",")))
-    d.out$day <- rep(as.Date(tmp[[1]][1]), nrow(d.out))
-    return(d.out)
-}
+## create list of folks who are missing in class w/o reporting it
+absence.data.cols <- c("unique.name", "date.absent", "reported")
 
-attendance <- do.call("rbind",
-                      lapply(list.files(".", pattern="^attendance-.*tsv$"),
-                             file.to.attendance.list))
+missing.in.class <- call.list.full[!call.list.full$answered,
+                                   c("unique.name", "timestamp")]
+missing.in.class$date.absent <- as.Date(missing.in.class$timestamp)
+missing.in.class$reported <- rep(FALSE, nrow(missing.in.class))
+missing.in.class <- missing.in.class[,absence.data.cols]
+missing.in.class <- unique(missing.in.class)
 
-## create list of folks who are missing in class 
-missing.in.class  <- call.list.full[is.na(call.list.full$answered) |
-                                    (!is.na(call.list.full$answered) & !call.list.full$answered),
-                                    c("discord.name", "day")]
+################################################
+## LOAD absence data TSV data
+################################################
 
-missing.in.class <- unique(missing.in.class)
+absence.google <- read.delim("optout_poll_data.tsv")
+colnames(absence.google) <- c("timestamp", "unique.name", "date.absent")
+absence.google$date.absent <- as.Date(absence.google$date.absent, format="%m/%d/%Y")
+absence.google$reported <- TRUE
+absence.google <- absence.google[,absence.data.cols]
+absence.google <- unique(absence.google)
 
-setDT(attendance)
-setkey(attendance, discord.name, day)
-setDT(missing.in.class)
-setkey(missing.in.class, discord.name, day)
+## combine the two absence lists and then create a unique subset
+absence <- rbind(missing.in.class[,absence.data.cols],
+                 absence.google[,absence.data.cols])
 
-## drop presence for people on missing days
-attendance[missing.in.class,]
-attendance <- as.data.frame(attendance[!missing.in.class,])
+## these are people that show up in both lists (i.e., probably they
+## submitted too late) but it's worth verifying before we penalize
+## them. i'd actually remove them from the absence sheet to suppress
+## this error
+absence[duplicated(absence[,1:2]),]
+absence <- absence[!duplicated(absence[,1:2]),]
 
-attendance.counts <- data.frame(table(attendance$discord.name))
-colnames(attendance.counts) <- c("discord.name", "num.present")
+## print total questions asked and absences
+absence.count <- data.frame(table(unique(absence[,c("unique.name", "date.absent")])[,"unique.name"]))
+colnames(absence.count) <- c("unique.name", "absences")
 
-d <- merge(d, attendance.counts,
-           all.x=TRUE, all.y=TRUE,
-           by="discord.name")
 
-days.list <- lapply(unique(attendance$day), function (day) {
-    day.total <- table(call.list.full$day == day)[["TRUE"]]
-    lapply(d$discord.name, function (discord.name) {
-        num.present <- nrow(attendance[attendance$day == day & attendance$discord.name == discord.name,])
-        if (num.present/day.total > 1) {print(day)}
-        data.frame(discord.name=discord.name,
-                   days.present=(num.present/day.total))
-    })
-})
+## load up the full class list
+gs <- read.delim("student_information.tsv")
+d <- gs[,c("Your.UW.student.number", "Name.you.d.like.to.go.by.in.class")]
+colnames(d) <- c("unique.name", "short.name")
 
-days.tmp <- do.call("rbind", lapply(days.list, function (x) do.call("rbind", x)))
+## merge in the call counts
+d <- merge(d, call.counts, all.x=TRUE, all.y=FALSE, by="unique.name")
+d <- merge(d, absence.count, by="unique.name", all.x=TRUE, all.y=FALSE)
 
-days.tbl <- tapply(days.tmp$days.present, days.tmp$discord.name, sum)
+d
 
-attendance.days <- data.frame(discord.name=names(days.tbl),
-                              days.present=days.tbl,
-                              days.absent=length(list.files(".", pattern="^attendance-.*tsv$"))-days.tbl)
+## set anything that's missing to zero
+d$num.calls[is.na(d$num.calls)] <- 0
+d$absences[is.na(d$absences)] <- 0
+
+################################################
+## list people who have been absent often or called on a lot
+################################################
+
+
+## list students sorted in terms of (a) absences and (b) prev questions
+d[sort.list(d$absences),]
+
+d[sort.list(d$num.calls, decreasing=TRUE),]
 
-d <- merge(d, attendance.days,
-           all.x=TRUE, all.y=TRUE, by="discord.name")
+################################################
+## build visualizations
+################################################
 
-d[sort.list(d$days.absent), c("discord.name", "num.calls", "days.absent")]
 
-## make some visualizations of whose here/not here
-#######################################################
+library(ggplot2)
+
+color.gradient <- scales::seq_gradient_pal("yellow", "magenta", "Lab")(seq(0,1,length.out=range(d$absences)[2]+1))
+
+table(d$num.calls, d$absences)
 
-png("questions_absence_histogram_combined.png", units="px", width=800, height=600)
+png("questions_absence_histogram_combined.png", units="px", width=600, height=400)
 
 ggplot(d) +
-    aes(x=as.factor(num.calls), fill=days.absent, group=days.absent) +
+    aes(x=as.factor(num.calls), fill=as.factor(absences)) +
     geom_bar(color="black") +
-    scale_x_discrete("Number of questions asked") +
+    stat_count() +
+    scale_x_discrete("Number of questions answered") +
     scale_y_continuous("Number of students") +
-    scale_fill_continuous("Days absent", low="red", high="blue")+
+    ##scale_fill_brewer("Absences", palette="Blues") +
+    scale_fill_manual("Opt-outs", values=color.gradient) +
     theme_bw()
 
 dev.off()
 
-png("questions_absenses_boxplots.png", units="px", width=800, height=600)
+absence.labeller <- function (df) {
+    lapply(df, function (x) { paste("Absences:", x) })
+}
 
-ggplot(data=d) +
-    aes(x=as.factor(num.calls), y=days.absent) +
-    geom_boxplot() +
-    scale_x_discrete("Number of questions asked") +
-    scale_y_continuous("Days absent")
+## png("questions_absence_histogram_facets.png", units="px", width=600, height=400)
 
-dev.off()
+## ggplot(d) +
+##     aes(x=as.factor(num.calls)) +
+##     geom_bar() +
+##     stat_count() +
+##     scale_x_discrete("Number of questions answered") +
+##     scale_y_continuous("Number of students") +
+##     theme_bw() +
+##     facet_wrap(.~absences, ncol=5, labeller="absence.labeller")
 
diff --git a/coldcall.py b/coldcall.py
index 1905844..3fb79d6 100644
--- a/coldcall.py
+++ b/coldcall.py
@@ -8,19 +8,30 @@ from csv import DictReader
 
 import os.path
 import re
-import discord
+import json
 
 class ColdCall():
-    def __init__ (self):
+    def __init__ (self, record_attendance=True):
+        with open("configuration.json") as config_file:
+            config = json.loads(config_file.read())
+
         self.today = str(datetime.date(datetime.now()))
+
         # how much less likely should it be that a student is called upon?
-        self.weight = 2 
+        self.weight = 2
+        self.record_attendance = record_attendance
 
         # filenames
-        self.__fn_studentinfo = "data/student_information.tsv"
-        self.__fn_daily_calllist = f"data/call_list-{self.today}.tsv"
-        self.__fn_daily_attendance = f"data/attendance-{self.today}.tsv"
+        self.__fn_studentinfo = config["student_info_file"]
+        self.__fn_daily_calllist = config["daily_calllist_file"].format(date=self.today)
+        self.__fn_daily_attendance = config["daily_attendance"].format(date=self.today)
 
+        self.unique_row = config["unique_name_rowname"]
+        if "preferred_name_rowname" in config:
+            self.preferred_row = config["preferred_name_rowname"]
+        else:
+            self.preferred_row = None
+        
     def __load_prev_questions(self):
         previous_questions = defaultdict(int)
 
@@ -29,25 +40,29 @@ class ColdCall():
                 with open(f"./data/{fn}", 'r') as f:
                     for row in DictReader(f, delimiter="\t"):
                         if not row["answered"] == "FALSE":
-                            previous_questions[row["discord_name"]] += 1
+                            previous_questions[row[self.unique_row]] += 1
 
         return previous_questions
-    
-    def __get_preferred_name(self, selected_student):
-        # translate the discord name into the preferred students name,
-        # if possible, otherwise return the discord name
+
+    def get_preferred_names(self):
+        # build a dict mapping each student's unique name to the
+        # preferred name listed in the student information file
 
         preferred_names = {}
         with open(self.__fn_studentinfo, 'r') as f:
             for row in DictReader(f, delimiter="\t"):
-                preferred_names[row["Your username on the class Discord server"]] = row["Name you'd like to go by in class"]
+                preferred_names[row[self.unique_row]] = row[self.preferred_row]
 
+        return preferred_names
+        
+    def __get_preferred_name(self, selected_student):
+        preferred_names = self.get_preferred_names()
         if selected_student in preferred_names:
             return preferred_names[selected_student]
         else:
             return None
 
-    def __select_student_from_list (self, students_present):
+    def select_student_from_list(self, students_present):
         prev_questions = self.__load_prev_questions()
         
         # created a weighted list by starting out with everybody 1
@@ -59,10 +74,10 @@ class ColdCall():
                 weights[s] = weights[s] / self.weight
 
         # choose one student from the weighted list
-        print(weights)
+        # print(weights) # DEBUG LINE
         return choices(list(weights.keys()), weights=list(weights.values()), k=1)[0]
 
-    def __record_attendance(self, students_present):
+    def record_attendance(self, students_present):
         # if it's the first one of the day, write it out
         if not os.path.exists(self.__fn_daily_attendance):
             with open(self.__fn_daily_attendance, "w") as f:
@@ -74,23 +89,28 @@ class ColdCall():
                              ",".join(students_present)]),
                   file=f)
 
-    def __record_coldcall(self, selected_student):
+    def record_coldcall(self, selected_student):
         # if it's the first one of the day, write it out
         if not os.path.exists(self.__fn_daily_calllist):
             with open(self.__fn_daily_calllist, "w") as f:
-                print("\t".join(["discord_name", "timestamp", "answered", "assessment"]), file=f)
+                print("\t".join([self.unique_row, self.preferred_row, "answered", "assessment", "timestamp"]), file=f)
+
+        preferred_name = self.__get_preferred_name(selected_student)
+        if preferred_name is None:
+            preferred_name = ""
 
         # open for appending the student
         with open(self.__fn_daily_calllist, "a") as f:
-            print("\t".join([selected_student, str(datetime.now()),
-                             "MISSING", "MISSING"]), file=f)
+            print("\t".join([selected_student, preferred_name,
+                             "MISSING", "MISSING", str(datetime.now())]), file=f)
 
     def coldcall(self, students_present):
-        selected_student = self.__select_student_from_list(students_present)
+        selected_student = self.select_student_from_list(students_present)
 
         # record the called-upon student in the right place
-        self.__record_attendance(students_present)
-        self.__record_coldcall(selected_student)
+        if self.record_attendance:
+            ColdCall.record_attendance(self, students_present)  # the boolean flag set in __init__ shadows the method on the instance
+        self.record_coldcall(selected_student)
 
         preferred_name = self.__get_preferred_name(selected_student)
         if preferred_name:
@@ -99,13 +119,3 @@ class ColdCall():
             coldcall_message = f"@{selected_student}, you're up!"
         return coldcall_message
 
-# cc = ColdCall()
- 
-# test_student_list = ["jordan", "Kristen Larrick", "Madison Heisterman", "Maria.Au20", "Laura (Alia) Levi", "Leona Aklipi", "anne", "emmaaitelli", "ashleylee", "allie_partridge", "Tiana_Cole", "Hamin", "Ella Qu", "Shizuka", "Ben Baird", "Kim Do", "Isaacm24", "Sam Bell", "Courtneylg"]
-# print(cc.coldcall(test_student_list))
-
-# test_student_list = ["jordan", "Kristen Larrick", "Mako"]
-# print(cc.coldcall(test_student_list))
-
-# test_student_list = ["jordan", "Kristen Larrick"]
-# print(cc.coldcall(test_student_list))
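The weighted draw in `select_student_from_list` can be illustrated in isolation. This is a minimal sketch, not the code in coldcall.py: `build_weights` and `select_student` are hypothetical helper names, and it assumes (the loop itself falls outside the hunks shown) that a student's weight is divided by `weight` once for each previously answered question.

```python
from random import choices


def build_weights(students_present, prev_questions, weight=2):
    """Start everybody at 1 and divide by `weight` once per earlier question.

    Assumption: one division per previously answered question.
    """
    weights = {}
    for student in students_present:
        weights[student] = 1.0 / (weight ** prev_questions.get(student, 0))
    return weights


def select_student(students_present, prev_questions, weight=2):
    """Draw one student, favoring those who have been called on less often."""
    weights = build_weights(students_present, prev_questions, weight)
    return choices(list(weights.keys()), weights=list(weights.values()), k=1)[0]


# Hypothetical roster: "a" answered twice before, "c" once, "b" never.
example_weights = build_weights(["a", "b", "c"], {"a": 2, "c": 1})
print(example_weights)
print(select_student(["a", "b", "c"], {"a": 2, "c": 1}))
```

With the default weight of 2, each earlier answer halves a student's chance relative to someone who has not been called on yet.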
diff --git a/coldcallbot-manual.py b/coldcallbot-manual.py
new file mode 100755
index 0000000..985ada0
--- /dev/null
+++ b/coldcallbot-manual.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+
+from coldcall import ColdCall
+from datetime import datetime
+from csv import DictReader
+from random import sample
+import json
+import argparse
+
+parser = argparse.ArgumentParser(description='run the coldcall bot manually to create a coldcall list')
+
+parser.add_argument('-n', '--num', dest="num_calls", default=100, const=100, type=int, nargs='?',
+                    help="how many students should be called")
+
+parser.add_argument('-s', '--shuffle', dest="shuffle_roster", action="store_true",
+                    help="select without replacement (i.e., call each person once with n equal to the group size)")
+
+args = parser.parse_args()
+
+current_time = datetime.today()
+with open("configuration.json") as config_file:
+    config = json.load(config_file)
+
+## create the coldcall object
+cc = ColdCall(record_attendance=False)
+
+def get_missing(d=current_time):
+    date_string = f'{d.month}/{d.day}/{d.year}'
+    with open(config["optout_file"], 'r') as f:
+        for row in DictReader(f, delimiter="\t"):
+            if row["Date of class session you will be absent"] == date_string:
+                yield row[config["unique_name_rowname"]]
+
+full_names = {}
+registered_students = []
+with open(config["roster_file"], 'r') as f:
+    for row in DictReader(f, delimiter=","):
+        student_no = row[config["roster_unique_rowname"]].strip()
+        registered_students.append(student_no)
+        full_names[student_no] = f"{row[config['roster_firstname_rowname']]} {row[config['roster_lastname_rowname']]}"
+# print("Registered:", registered_students) # useful for debug
+
+# get pronouns
+with open(config["student_info_file"], 'r') as f:
+    preferred_pronouns = {}
+    for row in DictReader(f, delimiter="\t"):
+        preferred_pronouns[row[config["unique_name_rowname"]]] = row["Preferred pronouns"]
+# print(preferred_pronouns)
+
+missing_today = list(get_missing(current_time))
+# print("Missing Today: ", missing_today)  # useful for debug
+
+preferred_names = cc.get_preferred_names()
+# print("Preferred names:", preferred_names)  # useful for debug
+
+students_present = [s for s in registered_students if s not in missing_today]
+# print("Students present:", students_present)  # useful for debug
+
+def print_selected(selected_student):
+    if "print_index" in globals():
+        global print_index
+    else:
+        global print_index
+        print_index = 1
+
+    try:
+        preferred_name = preferred_names[selected_student]
+    except KeyError:
+        preferred_name = "[unknown preferred name]"
+
+    if selected_student in preferred_pronouns:
+        pronouns = preferred_pronouns[selected_student]
+    else:
+        pronouns = "[unknown pronouns]"
+
+    print(f"{print_index}. {preferred_name} :: {pronouns} :: {full_names[selected_student]} :: {selected_student}")
+
+    cc.record_coldcall(selected_student)
+    print_index += 1 ## increase the index
+
+# if we're in shuffle mode
+shuffle = args.shuffle_roster
+
+print_index = 1
+
+if shuffle:
+    for selected_student in sample(students_present, len(students_present)):
+        print_selected(selected_student)
+else:
+    num_calls = args.num_calls
+
+    for i in range(num_calls):
+        selected_student = cc.select_student_from_list(students_present)
+        print_selected(selected_student)
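The opt-out lookup in `get_missing` compares the sheet's date column against a string built without zero padding, so the exact form of the date in the sheet matters. The sketch below reproduces that comparison on an in-memory TSV. The column names are taken from the script and configuration.json, while the student numbers, the sample rows, and the `date_key` and `missing_on` helper names are made up for illustration.

```python
import io
from csv import DictReader
from datetime import datetime

ID_COLUMN = "Your UW student number"
DATE_COLUMN = "Date of class session you will be absent"


def date_key(d):
    """Format a date the way get_missing does: month/day/year, no zero padding."""
    return f"{d.month}/{d.day}/{d.year}"


def missing_on(tsv_text, d):
    """Return the IDs whose opt-out date string equals date_key(d) exactly."""
    reader = DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[ID_COLUMN] for row in reader if row[DATE_COLUMN] == date_key(d)]


# Hypothetical opt-out data; the second row uses a zero-padded day.
sample_tsv = (
    f"{ID_COLUMN}\t{DATE_COLUMN}\n"
    "1111111\t10/7/2025\n"
    "2222222\t10/07/2025\n"
    "3333333\t10/9/2025\n"
)

print(date_key(datetime(2025, 10, 7)))
print(missing_on(sample_tsv, datetime(2025, 10, 7)))
```

Because the match is an exact string comparison, the zero-padded row is not treated as an absence for October 7. If the sheet's export format ever changes, parsing both sides into dates before comparing would be more robust.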
diff --git a/configuration.json b/configuration.json
new file mode 100644
index 0000000..4ea79ab
--- /dev/null
+++ b/configuration.json
@@ -0,0 +1,16 @@
+{ 
+    "roster_file" : "data/FIXME.csv",
+    "roster_unique_rowname" : "StudentNo",
+    "roster_firstname_rowname" : "FirstName",
+    "roster_lastname_rowname" : "LastName",
+    "student_info_file" : "data/student_information.tsv",
+    "student_info_gsheet_id" : "FIXME",
+    "student_info_gsheet_gid" : 99999999,
+    "optout_file" : "data/optout_poll_data.tsv",
+    "optout_gsheet_id" : "FIXME",
+    "optout_gsheet_gid" : 99999999,
+    "daily_calllist_file" : "data/call_list-{date}.tsv",
+    "daily_attendance" : "data/attendance-{date}.tsv",
+    "unique_name_rowname" : "Your UW student number", 
+    "preferred_name_rowname" : "Name you'd like to go by in class"
+}
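A quick sanity check of configuration.json can catch a missing key before class starts. This is a sketch under stated assumptions: `REQUIRED_KEYS` lists only the keys that `ColdCall.__init__` reads unconditionally, and `missing_keys`, `daily_filename`, and the incomplete `example_config` are hypothetical names introduced here.

```python
# Keys that ColdCall.__init__ looks up without a fallback.
REQUIRED_KEYS = [
    "student_info_file",
    "daily_calllist_file",
    "daily_attendance",
    "unique_name_rowname",
]


def missing_keys(config):
    """Return the required keys that are absent from the loaded config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]


def daily_filename(template, date):
    """Fill the {date} placeholder used by the daily file templates."""
    return template.format(date=date)


# Hypothetical, deliberately incomplete configuration.
example_config = {
    "student_info_file": "data/student_information.tsv",
    "daily_calllist_file": "data/call_list-{date}.tsv",
    "unique_name_rowname": "Your UW student number",
}

print(missing_keys(example_config))
print(daily_filename(example_config["daily_calllist_file"], "2025-10-07"))
```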
diff --git a/download_student_info.py b/download_student_info.py
new file mode 100755
index 0000000..8a448b9
--- /dev/null
+++ b/download_student_info.py
@@ -0,0 +1,15 @@
+#!/usr/bin/env python3
+
+import json
+import subprocess
+
+with open("configuration.json", 'r') as config_file:
+    config = json.load(config_file)
+
+base_url = 'https://docs.google.com/spreadsheets/d/{id}/export?gid={gid}&format=tsv'
+
+student_info_url = base_url.format(id=config["student_info_gsheet_id"], gid=config["student_info_gsheet_gid"])
+subprocess.run(["wget", student_info_url, "-O", config["student_info_file"]], check=True)
+
+optout_url = base_url.format(id=config["optout_gsheet_id"], gid=config["optout_gsheet_gid"])
+subprocess.run(["wget", optout_url, "-O", config["optout_file"]], check=True)
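On a machine without `wget`, the same export URL could be fetched with the standard library instead. The sketch below swaps in `urllib.request.urlretrieve` for the `wget` call. `build_export_url` and `download_sheet` are hypothetical helpers, and the sketch assumes the sheet can be exported without logging in, as the unauthenticated `wget` call suggests.

```python
from urllib.request import urlretrieve

BASE_URL = "https://docs.google.com/spreadsheets/d/{id}/export?gid={gid}&format=tsv"


def build_export_url(sheet_id, gid):
    """Build the TSV export URL for one tab (gid) of a Google Sheet."""
    return BASE_URL.format(id=sheet_id, gid=gid)


def download_sheet(sheet_id, gid, destination):
    """Fetch the TSV export and write it to `destination` (urllib, not wget)."""
    urlretrieve(build_export_url(sheet_id, gid), destination)


# Placeholder values copied from configuration.json; no download happens here.
print(build_export_url("FIXME", 99999999))
```

`download_sheet` is defined but not called above, since the placeholder ID would not resolve to a real sheet.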