covid19.git
7 months agoMerge pull request #20 from makoshark/master master gh-cdsc/master
Benjamin Mako Hill [Tue, 7 Apr 2020 21:45:05 +0000 (14:45 -0700)]
Merge pull request #20 from makoshark/master

bug fix: change to the correct working directory before running cron jobs

7 months agoMerge pull request #18 from CommunityDataScienceCollective/dsaez_submodule
Benjamin Mako Hill [Tue, 7 Apr 2020 21:43:26 +0000 (14:43 -0700)]
Merge pull request #18 from CommunityDataScienceCollective/dsaez_submodule

Add dsaez's submodule for crawling wikidata

7 months agoupdated script to ensure the correct working dir
Benjamin Mako Hill [Tue, 7 Apr 2020 21:39:58 +0000 (16:39 -0500)]
updated script to ensure the correct working dir
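
The cron jobs in this repository are shell scripts, but the same fix can be sketched in Python: resolve the directory that holds the script and change into it before touching any relative paths. The helper name `chdir_to_script_dir` is hypothetical and is not taken from the repository.

```python
import os
from pathlib import Path


def chdir_to_script_dir(script_path: str) -> Path:
    """Change the working directory to the directory containing script_path.

    Hypothetical helper: cron usually starts jobs in the user's home
    directory, so relative paths break unless the job changes directory first.
    """
    script_dir = Path(script_path).resolve().parent
    os.chdir(script_dir)
    return script_dir


# In a real script this would typically be called as:
#     chdir_to_script_dir(__file__)
```

Resolving the path first means the job behaves the same whether it is started through a symlink, from cron, or by hand from another directory.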

7 months agomade cronjobs executable
Benjamin Mako Hill [Sat, 4 Apr 2020 16:19:43 +0000 (11:19 -0500)]
made cronjobs executable

7 months agoMerge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O... dsaez_submodule
Nathan TeBlunthuis [Fri, 3 Apr 2020 22:34:16 +0000 (15:34 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into dsaez_submodule

7 months agoAdd dsaez's submodule for crawling wikidata
Nathan TeBlunthuis [Fri, 3 Apr 2020 22:24:56 +0000 (15:24 -0700)]
Add dsaez's submodule for crawling wikidata

7 months agoMerge pull request #17 from makoshark/master
Kaylea Champion [Thu, 2 Apr 2020 21:18:05 +0000 (14:18 -0700)]
Merge pull request #17 from makoshark/master

changes to support historical view data

7 months agoMerge pull request #16 from aaronshaw/master
Kaylea Champion [Thu, 2 Apr 2020 21:16:57 +0000 (14:16 -0700)]
Merge pull request #16 from aaronshaw/master

renames files, adds an example analysis w small sample of revisions data

7 months agorevisions to reflect updated example filename and clean comments in R code
aaronshaw [Thu, 2 Apr 2020 18:38:32 +0000 (13:38 -0500)]
revisions to reflect updated example filename and clean comments in R code

7 months agochanges to allow historical view data collection
Benjamin Mako Hill [Thu, 2 Apr 2020 18:28:34 +0000 (13:28 -0500)]
changes to allow historical view data collection

- fix bug where it would fail if the first essay had no view data
- add ability to override dates in the cron script

7 months agoupdated to just write a single log file for each day
Benjamin Mako Hill [Thu, 2 Apr 2020 17:48:19 +0000 (12:48 -0500)]
updated to just write a single log file for each day

7 months agoinitial commit of revisions analysis example with output files
aaronshaw [Thu, 2 Apr 2020 15:59:44 +0000 (10:59 -0500)]
initial commit of revisions analysis example with output files

7 months agoremoving outdated file names
aaronshaw [Thu, 2 Apr 2020 12:49:08 +0000 (07:49 -0500)]
removing outdated file names

8 months agoMerge pull request #15 from aaronshaw/master
Aaron Shaw [Thu, 2 Apr 2020 00:15:21 +0000 (19:15 -0500)]
Merge pull request #15 from aaronshaw/master

renaming example analysis directories

8 months agorenaming example analysis directories
aaronshaw [Thu, 2 Apr 2020 00:12:45 +0000 (19:12 -0500)]
renaming example analysis directories

8 months agoMerge pull request #12 from makoshark/master
groceryheist [Wed, 1 Apr 2020 23:36:56 +0000 (16:36 -0700)]
Merge pull request #12 from makoshark/master

substantial changes to wikipedia fetching code

8 months agoignore __pycache__
Benjamin Mako Hill [Wed, 1 Apr 2020 23:23:50 +0000 (18:23 -0500)]
ignore __pycache__

8 months agofix bug in previous commit
Benjamin Mako Hill [Wed, 1 Apr 2020 23:22:36 +0000 (18:22 -0500)]
fix bug in previous commit

forgot to import digobs module in the scraper script

8 months agocleaned up unnecessary files
Benjamin Mako Hill [Wed, 1 Apr 2020 23:21:41 +0000 (18:21 -0500)]
cleaned up unnecessary files

8 months agouse the type= feature in argparse
Benjamin Mako Hill [Wed, 1 Apr 2020 23:13:02 +0000 (18:13 -0500)]
use the type= feature in argparse

- integrated the type= feature in argparse in all three scripts
- removed some redundant code from the third file
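
A minimal sketch of what the `type=` feature does, assuming a date option and a numeric option. The option names `--query-date` and `--limit` and the helper `parse_day` are hypothetical; the actual scripts may define different arguments.

```python
import argparse
from datetime import date, datetime


def parse_day(value: str) -> date:
    """Convert a YYYYMMDD string into a date, or report a usage error."""
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid YYYYMMDD date: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical fetch script")
    # argparse calls parse_day on the raw string and stores the returned date,
    # so the rest of the script never handles unparsed text.
    parser.add_argument("--query-date", type=parse_day, default=None)
    # Built-in callables such as int work the same way.
    parser.add_argument("--limit", type=int, default=10)
    return parser


args = build_parser().parse_args(["--query-date", "20200401", "--limit", "5"])
print(args.query_date, args.limit)
```

Because conversion happens inside the parser, an invalid value produces a normal usage error instead of a traceback later in the script, which is one way this kind of change removes redundant validation code.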

8 months agoMerge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Benjamin Mako Hill [Wed, 1 Apr 2020 22:19:33 +0000 (17:19 -0500)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

8 months agochanges in response to code review by nate
Benjamin Mako Hill [Wed, 1 Apr 2020 22:16:34 +0000 (17:16 -0500)]
changes in response to code review by nate

- moved some common functions into files
- other smaller changes

8 months agoMerge pull request #14 from aaronshaw/aaronshaw-master
Aaron Shaw [Wed, 1 Apr 2020 21:58:02 +0000 (16:58 -0500)]
Merge pull request #14 from aaronshaw/aaronshaw-master

pointing at updated data url, adding explicit NA handling to factor, …

8 months agopointing at updated data url, adding explicit NA handling to factor, cutting unnecess...
aaronshaw [Wed, 1 Apr 2020 21:52:22 +0000 (16:52 -0500)]
pointing at updated data url, adding explicit NA handling to factor, cutting unnecessary call to ggplot2, and updated corresponding output from new data file. May not work while kibo urls are getting resolved

8 months agoMerge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Benjamin Mako Hill [Wed, 1 Apr 2020 21:42:16 +0000 (16:42 -0500)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

8 months agotweaks to revision export code
Benjamin Mako Hill [Wed, 1 Apr 2020 21:39:53 +0000 (16:39 -0500)]
tweaks to revision export code

- flags were not being exported (e.g., minor, anon)
- broke with hidden/deleted user names

8 months agofix bug in rev scraper script
Benjamin Mako Hill [Wed, 1 Apr 2020 20:49:28 +0000 (15:49 -0500)]
fix bug in rev scraper script

Bug was a break, added for debugging, that caused the script to only
work for the first article.

8 months agochange copy to move in cron scripts
Benjamin Mako Hill [Wed, 1 Apr 2020 20:49:02 +0000 (15:49 -0500)]
change copy to move in cron scripts

8 months agoMerge branch 'master' of github.com:makoshark/COVID-19_Digital_Observatory
Benjamin Mako Hill [Wed, 1 Apr 2020 20:18:50 +0000 (15:18 -0500)]
Merge branch 'master' of github.com:makoshark/COVID-19_Digital_Observatory

8 months agoadd two small shellscripts for automation
Benjamin Mako Hill [Wed, 1 Apr 2020 20:15:11 +0000 (15:15 -0500)]
add two small shellscripts for automation

- Added two bash scripts usable as cronjobs to automate the production
  of revisions and view data.

These commands automate the process of running code and copying material

8 months agoaddress confusion with date
Benjamin Mako Hill [Wed, 1 Apr 2020 20:14:05 +0000 (15:14 -0500)]
address confusion with date

The timestamps in files should be the day that the exports are done. For
the view data, the query date needs to be the day before but this
shouldn't be the timestamp we use in files, etc.
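
The distinction can be sketched as two separate dates derived from the same run. The function names and the filename pattern below are hypothetical and only illustrate the rule described above.

```python
from datetime import date, timedelta


def export_and_query_dates(today: date) -> tuple[date, date]:
    """Return (export_date, query_date) for a run happening on `today`.

    export_date stamps the output files; query_date is the day before,
    which is the day the view data is requested for.
    """
    export_date = today
    query_date = today - timedelta(days=1)
    return export_date, query_date


def output_filename(prefix: str, export_date: date) -> str:
    """Build a file name stamped with the export day (hypothetical pattern)."""
    return f"{prefix}-{export_date:%Y%m%d}.tsv"


export_date, query_date = export_and_query_dates(date(2020, 4, 1))
print(output_filename("views", export_date), "queries", query_date.isoformat())
```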

8 months agofix bugs with the date stamps
Benjamin Mako Hill [Wed, 1 Apr 2020 15:47:33 +0000 (10:47 -0500)]
fix bugs with the date stamps

8 months agoMerge pull request #11 from jdfoote/master
Aaron Shaw [Wed, 1 Apr 2020 15:41:02 +0000 (10:41 -0500)]
Merge pull request #11 from jdfoote/master

Adding a tidyverse example (with very verbose comments)

8 months agofixed typo in debug message
Benjamin Mako Hill [Wed, 1 Apr 2020 15:18:05 +0000 (08:18 -0700)]
fixed typo in debug message

8 months agoMerge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Benjamin Mako Hill [Wed, 1 Apr 2020 14:53:40 +0000 (07:53 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

8 months agoadded gitignore for wikipedia/data directory
Benjamin Mako Hill [Wed, 1 Apr 2020 14:52:15 +0000 (07:52 -0700)]
added gitignore for wikipedia/data directory

8 months agorenamed the wikipedia_views module to wikipedia
Benjamin Mako Hill [Wed, 1 Apr 2020 14:51:20 +0000 (07:51 -0700)]
renamed the wikipedia_views module to wikipedia

8 months agoadded initial version of revision-scraper
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:38 +0000 (07:42 -0700)]
added initial version of revision-scraper

Borrows much of the structure from the (patched) version of the
dailyview scraper.

8 months agofixed typo in description of view scraper
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:24 +0000 (07:42 -0700)]
fixed typo in description of view scraper

8 months agorenamed daily views to make it clear that it's just enwiki
Benjamin Mako Hill [Wed, 1 Apr 2020 14:29:01 +0000 (07:29 -0700)]
renamed daily views to make it clear that it's just enwiki

8 months agochanges to a bunch of the wikipedia view code
Benjamin Mako Hill [Wed, 1 Apr 2020 14:15:12 +0000 (07:15 -0700)]
changes to a bunch of the wikipedia view code

- Renamed the articles.txt to something more specific

Changes to both scripts:

- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
  default. Because we can only call logging.basicConfig() once, this
  ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
  produced which dataset.
- Caused programs to take files instead of directories as
  output (allows us to run programs more than once a day).
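
A rough sketch of the logging constraint mentioned in this list: `logging.basicConfig()` does nothing once the root logger already has handlers, so the destination has to be decided in one place before anything is logged. The helper name `init_logging` and its parameters are assumptions, not the repository's actual code.

```python
import logging
import sys
from typing import Optional


def init_logging(log_file: Optional[str] = None, level: int = logging.INFO) -> None:
    """Configure the root logger once: a file if given, otherwise stderr."""
    if log_file:
        logging.basicConfig(filename=log_file, filemode="a", level=level)
    else:
        # Default destination is stderr, leaving stdout free for data output.
        logging.basicConfig(stream=sys.stderr, level=level)


init_logging()
logging.info("starting export")
# A second call is silently ignored because handlers are already installed.
init_logging()
```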

Changes to the wikipedia_views/scripts/fetch_daily_views.py:

- Change the output so that it is a sequence of JSON dictionaries (one
  per line) as per the standard we agreed to and which is what
  Twitter, Github, and other dumps do. Previous behavior was to
  output a single JSON list object.
- A number of other small changes and tweaks throughout.
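
The one-dictionary-per-line format described above can be sketched as follows. The record fields and values are made-up placeholders, and the function names are hypothetical.

```python
import io
import json
from typing import Iterable, TextIO


def write_json_lines(records: Iterable[dict], out: TextIO) -> None:
    """Write each record as one JSON object on its own line."""
    for record in records:
        out.write(json.dumps(record) + "\n")


def read_json_lines(src: TextIO) -> list[dict]:
    """Parse a stream line by line; no need to load one large JSON list."""
    return [json.loads(line) for line in src if line.strip()]


# Placeholder records, not real view counts.
records = [
    {"article": "Example article A", "views": 1},
    {"article": "Example article B", "views": 2},
]
buffer = io.StringIO()
write_json_lines(records, buffer)
print(buffer.getvalue(), end="")
```

Compared with a single JSON list, this layout lets a consumer process or append records one line at a time.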

8 months agoAdding a tidyverse example (with very verbose comments)
Jeremy Foote [Wed, 1 Apr 2020 02:42:31 +0000 (22:42 -0400)]
Adding a tidyverse example (with very verbose comments)

8 months agoadd examples using the translations data
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:56:59 +0000 (16:56 -0700)]
add examples using the translations data

8 months agoadd documentation for the output files
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:22:30 +0000 (16:22 -0700)]
add documentation for the output files

8 months agocreate 'latest.csv' to link to the most recent output.
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:16:36 +0000 (16:16 -0700)]
create 'latest.csv' to link to the most recent output.
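
One way to keep a stable `latest.csv` name pointing at the newest dated file is a symbolic link that is swapped in place. This is a sketch under assumptions: the commit does not say whether a symlink or a copy is used, the helper name and dated file names are hypothetical, and the code assumes a POSIX filesystem.

```python
import os
from pathlib import Path


def point_latest_at(newest: Path, link_name: str = "latest.csv") -> Path:
    """Make `link_name` in the same directory a symlink to `newest`."""
    link = newest.parent / link_name
    tmp = newest.parent / (link_name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    # A relative target keeps the link valid if the directory is moved.
    tmp.symlink_to(newest.name)
    # Renaming over the old link avoids a moment where latest.csv is missing.
    os.replace(tmp, link)
    return link
```

Downstream code can then always read `latest.csv` without knowing the date stamp of the most recent run.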

8 months agoMerge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:43 +0000 (16:01 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

8 months agoupdate output
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:38 +0000 (16:01 -0700)]
update output

8 months agouse 'item' instead of 'entity'
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:30:08 +0000 (15:30 -0700)]
use 'item' instead of 'entity'

8 months agorename compile script
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:39 +0000 (15:27 -0700)]
rename compile script

8 months agoupdate compile script
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:21 +0000 (15:27 -0700)]
update compile script

8 months agoImprove README.md for keywords
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:25:51 +0000 (15:25 -0700)]
Improve README.md for keywords

8 months agorename 'transliterations' to 'keywords'
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:15:01 +0000 (15:15 -0700)]
rename 'transliterations' to 'keywords'

8 months agoUpdate README.md
Aaron Shaw [Tue, 31 Mar 2020 22:09:58 +0000 (17:09 -0500)]
Update README.md

linking to project pages more fully

8 months agoMerge pull request #10 from makoshark/master
Kaylea Champion [Tue, 31 Mar 2020 19:23:36 +0000 (12:23 -0700)]
Merge pull request #10 from makoshark/master

stop writing header to one-column list

8 months agostop writing header to one-column list
Benjamin Mako Hill [Tue, 31 Mar 2020 15:35:23 +0000 (08:35 -0700)]
stop writing header to one-column list

This feels like it's asking for trouble. Description of the contents
of the list is in the filename.

8 months agoreorganize file structure
Nathan TeBlunthuis [Mon, 30 Mar 2020 04:49:57 +0000 (21:49 -0700)]
reorganize file structure

- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv

8 months agomigrating to new directory structure
Kaylea Champion [Sun, 29 Mar 2020 18:42:01 +0000 (13:42 -0500)]
migrating to new directory structure

8 months agoMerge pull request #7 from kayleachampion/master
Kaylea Champion [Sun, 29 Mar 2020 18:39:32 +0000 (11:39 -0700)]
Merge pull request #7 from kayleachampion/master

cleanup with merge

8 months agoall march data
Kaylea Champion [Sun, 29 Mar 2020 07:19:54 +0000 (00:19 -0700)]
all march data

8 months agoadding a logs dir without adding my log files, assuming those don't
Kaylea Champion [Sun, 29 Mar 2020 06:50:04 +0000 (23:50 -0700)]
adding a logs dir without adding my log files, assuming those don't
belong in repo

8 months agonew version of this from scrape. no double quotes around articles any
Kaylea Champion [Sun, 29 Mar 2020 06:47:55 +0000 (23:47 -0700)]
new version of this from scrape. no double quotes around articles any
more

8 months agoadds a scraper to update the articles file
Kaylea Champion [Sun, 29 Mar 2020 06:46:48 +0000 (23:46 -0700)]
adds a scraper to update the articles file

8 months agoadds in new logging capability
Kaylea Champion [Sun, 29 Mar 2020 01:46:35 +0000 (18:46 -0700)]
adds in new logging capability

8 months agoMerge pull request #9 from aaronshaw/master
Aaron Shaw [Sun, 29 Mar 2020 01:42:40 +0000 (20:42 -0500)]
Merge pull request #9 from aaronshaw/master

minimal analysis example with pageview data

8 months agominimal analysis example with pageview data
aaronshaw [Sun, 29 Mar 2020 01:33:23 +0000 (20:33 -0500)]
minimal analysis example with pageview data

8 months agoMerge pull request #8 from aaronshaw/master
Aaron Shaw [Sat, 28 Mar 2020 22:38:20 +0000 (17:38 -0500)]
Merge pull request #8 from aaronshaw/master

Update to load data from github url and include 3/28 data in output

8 months agoregenerated following update to R src that creates this file
aaronshaw [Sat, 28 Mar 2020 22:31:36 +0000 (17:31 -0500)]
regenerated following update to R src that creates this file

8 months agoLoading data directly from github URL. Commenting out commands that assume cloned...
aaronshaw [Sat, 28 Mar 2020 22:30:37 +0000 (17:30 -0500)]
Loading data directly from github URL. Commenting out commands that assume cloned repository.

8 months agoMerge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID...
Kaylea Champion [Sat, 28 Mar 2020 21:46:00 +0000 (14:46 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

8 months agoMerge pull request #5 from kayleachampion/master
Kaylea Champion [Sat, 28 Mar 2020 21:17:21 +0000 (14:17 -0700)]
Merge pull request #5 from kayleachampion/master

view data

8 months agoMerge pull request #1 from CommunityDataScienceCollective/kaylea/master
Kaylea Champion [Sat, 28 Mar 2020 21:15:53 +0000 (14:15 -0700)]
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master

Some suggested changes.

8 months agoMerge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di... kaylea/master
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:13:46 +0000 (14:13 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

8 months agoMerge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di...
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

8 months agoMerge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di...
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

8 months agoRead the whole input file before making api calls
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

8 months agoRead the whole input file before making api calls
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

8 months agoMerge pull request #4 from CommunityDataScienceCollective/translations
groceryheist [Sat, 28 Mar 2020 21:07:04 +0000 (14:07 -0700)]
Merge pull request #4 from CommunityDataScienceCollective/translations

Transliterations: Use data from google trends and wikidata to find transliterations.

8 months agoUpdate transliteration results for 2020-03-28 translations
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:03:16 +0000 (14:03 -0700)]
Update transliteration results for 2020-03-28

- renamed results from yesterday into time stamped file

8 months agoRead entire input files before making api calls.
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:55:52 +0000 (13:55 -0700)]
Read entire input files before making api calls.

This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
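
The trade-off described here can be sketched as below: the input file is read fully and closed before any slow API work begins. The names `load_terms` and `process`, and the `call_api` callback, are hypothetical stand-ins for the real request code.

```python
from typing import Callable, List


def load_terms(path: str) -> List[str]:
    """Read every non-empty line into memory, then close the file."""
    with open(path, encoding="utf-8") as infile:
        return [line.strip() for line in infile if line.strip()]


def process(path: str, call_api: Callable[[str], object]) -> list:
    """Run call_api for each term after the input file has been closed."""
    terms = load_terms(path)  # the file handle is already released here
    # The slow network calls happen with no open file handle, at the cost of
    # holding the full list of terms in memory.
    return [call_api(term) for term in terms]
```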

8 months agoKeep better track of time.
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:49:19 +0000 (13:49 -0700)]
Keep better track of time.

- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting

8 months agoMerge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID...
Kaylea Champion [Sat, 28 Mar 2020 19:21:37 +0000 (12:21 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

updates my branch with all the master changes so far

8 months agotrialing new approach
Kaylea Champion [Sat, 28 Mar 2020 19:18:01 +0000 (12:18 -0700)]
trialing new approach

8 months agotrialing new approach
Kaylea Champion [Sat, 28 Mar 2020 19:17:45 +0000 (12:17 -0700)]
trialing new approach

8 months agotypo fix
Nathan TeBlunthuis [Sat, 28 Mar 2020 17:01:43 +0000 (10:01 -0700)]
typo fix

8 months agoMerge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Dig...
Nathan TeBlunthuis [Sat, 28 Mar 2020 16:58:43 +0000 (09:58 -0700)]
Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into translations

8 months agoMerge pull request #6 from aaronshaw/translations
Aaron Shaw [Sat, 28 Mar 2020 15:28:41 +0000 (10:28 -0500)]
Merge pull request #6 from aaronshaw/translations

minimal example in R

8 months agoa minimal example in R that outputs a table of top 5 related search terms per day...
aaronshaw [Sat, 28 Mar 2020 15:18:33 +0000 (10:18 -0500)]
a minimal example in R that outputs a table of top 5 related search terms per day per query

8 months agoA few suggestions for the python script:
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:27:02 +0000 (20:27 -0700)]
A few suggestions for the python script:

- using format strings (f-strings) is a nice way in python to build
strings using variables.
- you can read and process a file in one pass if you iterate over the
open file itself instead of reading it into a variable and then
looping
- i had to change your strip code when i stopped using csv reader
- my python linter and auto-formatter hate non-indented comments
- i added a few lines to print cases where we don't get Ok responses.
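
Two of the suggestions above, f-strings and iterating over the open file directly, can be sketched together. The base URL is a placeholder and `build_urls` is a hypothetical name; the script's real endpoint is not shown in this log.

```python
import io
from typing import Iterable, List
from urllib.parse import quote


def build_urls(lines: Iterable[str], base: str = "https://example.org/api") -> List[str]:
    """Build one request URL per article title, reading the input in one pass."""
    urls = []
    for line in lines:  # a file object yields one line per iteration
        article = line.strip()  # drop the trailing newline and stray spaces
        if not article:
            continue
        # The f-string interpolates variables directly into the string.
        urls.append(f"{base}/{quote(article, safe='')}/daily")
    return urls


# io.StringIO stands in for an open file here.
sample = io.StringIO("Coronavirus\n\nCOVID-19 pandemic\n")
for url in build_urls(sample):
    print(url)
```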

8 months agoReorganize wikipedia views subproject into subpackage.
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:13:11 +0000 (20:13 -0700)]
Reorganize wikipedia views subproject into subpackage.

8 months agoadd mwapi to requirements
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:05:07 +0000 (20:05 -0700)]
add mwapi to requirements

8 months agoall data
Kaylea Champion [Sat, 28 Mar 2020 01:24:19 +0000 (18:24 -0700)]
all data

8 months agocleaning out commented code
Kaylea Champion [Sat, 28 Mar 2020 01:19:22 +0000 (18:19 -0700)]
cleaning out commented code

8 months agoreorganizes comments
Kaylea Champion [Sat, 28 Mar 2020 01:17:39 +0000 (18:17 -0700)]
reorganizes comments

8 months agoinitial files
Kaylea Champion [Sat, 28 Mar 2020 01:10:13 +0000 (18:10 -0700)]
initial files

8 months agomakes TSV
Kaylea Champion [Sat, 28 Mar 2020 01:08:43 +0000 (18:08 -0700)]
makes TSV
makes JSON

8 months agomany bug fixes
Kaylea Champion [Sat, 28 Mar 2020 00:24:18 +0000 (17:24 -0700)]
many bug fixes

8 months agoadd output files from transliteration search using google trends
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:53:03 +0000 (16:53 -0700)]
add output files from transliteration search using google trends

8 months agoexpand wikidata search to get keywords from google trends
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:52:19 +0000 (16:52 -0700)]
expand wikidata search to get keywords from google trends

8 months agofor testing
Kaylea Champion [Fri, 27 Mar 2020 23:00:36 +0000 (16:00 -0700)]
for testing

8 months agoMerge pull request #3 from kayleachampion/master
Benjamin Mako Hill [Fri, 27 Mar 2020 21:47:15 +0000 (14:47 -0700)]
Merge pull request #3 from kayleachampion/master

adding in an article list
