covid19.git
Benjamin Mako Hill [Wed, 1 Apr 2020 15:18:05 +0000 (08:18 -0700)]
fixed typo in debug message

Benjamin Mako Hill [Wed, 1 Apr 2020 14:53:40 +0000 (07:53 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

Benjamin Mako Hill [Wed, 1 Apr 2020 14:52:15 +0000 (07:52 -0700)]
added gitignore for wikipedia/data directory

Benjamin Mako Hill [Wed, 1 Apr 2020 14:51:20 +0000 (07:51 -0700)]
renamed the wikipedia_views module to wikipedia

Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:38 +0000 (07:42 -0700)]
added initial version of revision-scraper

Borrows much of the structure from the (patched) version of the
dailyview scraper.

Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:24 +0000 (07:42 -0700)]
fixed typo in description of view scraper

Benjamin Mako Hill [Wed, 1 Apr 2020 14:29:01 +0000 (07:29 -0700)]
renamed daily views to make it clear that it's just enwiki

Benjamin Mako Hill [Wed, 1 Apr 2020 14:15:12 +0000 (07:15 -0700)]
changes to a bunch of the wikipedia view code

- Renamed the articles.txt to something more specific

Changes to both scripts:

- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
  default. Because we can only call logging.basicConfig() once, this
  ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
  produced which dataset.
- Caused programs to take files instead of directories as
  output (allows us to run programs more than once a day).
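A minimal sketch of the logging constraint mentioned above, assuming
Python's standard logging module. The helper name configure_logging and
its parameters are hypothetical and are not taken from the repository.
It illustrates why the configuration has to be decided in one place:
logging.basicConfig() does nothing once the root logger already has a
handler.

```python
import logging
import sys


def configure_logging(level=logging.INFO, logfile=None):
    """Configure the root logger once (hypothetical helper).

    Writes to stderr by default, or to `logfile` when one is given.
    """
    if logfile is not None:
        logging.basicConfig(filename=logfile, level=level)
    else:
        logging.basicConfig(stream=sys.stderr, level=level)
    return logging.getLogger()


if __name__ == "__main__":
    configure_logging(level=logging.DEBUG)
    # A second call is silently ignored because a handler is already
    # installed on the root logger.
    configure_logging(level=logging.ERROR)
    logging.debug("still logged at DEBUG level, on stderr")
```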

Changes to the wikipedia_views/scripts/fetch_daily_views.py:

- Changed the output to a sequence of JSON dictionaries (one
  per line), as per the standard we agreed to and as Twitter,
  GitHub, and other dumps do. The previous behavior was to
  output a single JSON list object.
- A number of other small changes and tweaks throughout.
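As a rough illustration of the one-dictionary-per-line output format
described above, the following sketch writes and reads such a file. The
function names and the example record fields are assumptions made for
illustration; they are not the names used in fetch_daily_views.py.

```python
import io
import json


def write_json_lines(records, outfile):
    """Write each record as one JSON dictionary per line."""
    for record in records:
        outfile.write(json.dumps(record) + "\n")


def read_json_lines(infile):
    """Read the records back, one JSON document per non-empty line."""
    return [json.loads(line) for line in infile if line.strip()]


if __name__ == "__main__":
    # Hypothetical records; the real field names may differ.
    buffer = io.StringIO()
    write_json_lines([{"article": "A", "views": 1}], buffer)
    buffer.seek(0)
    print(read_json_lines(buffer))
```

Compared with a single JSON list, this layout lets a consumer process
the file line by line without parsing the whole dump first.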

Nathan TeBlunthuis [Tue, 31 Mar 2020 23:56:59 +0000 (16:56 -0700)]
add examples using the translations data

Nathan TeBlunthuis [Tue, 31 Mar 2020 23:22:30 +0000 (16:22 -0700)]
add documentation for the output files

Nathan TeBlunthuis [Tue, 31 Mar 2020 23:16:36 +0000 (16:16 -0700)]
create 'latest.csv' to link to the most recent output.

Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:43 +0000 (16:01 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:38 +0000 (16:01 -0700)]
update output

Nathan TeBlunthuis [Tue, 31 Mar 2020 22:30:08 +0000 (15:30 -0700)]
use 'item' instead of 'entity'

Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:39 +0000 (15:27 -0700)]
rename compile script

Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:21 +0000 (15:27 -0700)]
update compile script

Nathan TeBlunthuis [Tue, 31 Mar 2020 22:25:51 +0000 (15:25 -0700)]
Improve README.md for keywords

Nathan TeBlunthuis [Tue, 31 Mar 2020 22:15:01 +0000 (15:15 -0700)]
rename 'transliterations' to 'keywords'

Aaron Shaw [Tue, 31 Mar 2020 22:09:58 +0000 (17:09 -0500)]
Update README.md

linking to project pages more fully

Kaylea Champion [Tue, 31 Mar 2020 19:23:36 +0000 (12:23 -0700)]
Merge pull request #10 from makoshark/master

stop writing header to one-column list

Benjamin Mako Hill [Tue, 31 Mar 2020 15:35:23 +0000 (08:35 -0700)]
stop writing header to one-column list

This feels like it's asking for trouble. Description of the contents
of the list is in the filename.

Nathan TeBlunthuis [Mon, 30 Mar 2020 04:49:57 +0000 (21:49 -0700)]
reorganize file structure

- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv

Kaylea Champion [Sun, 29 Mar 2020 18:42:01 +0000 (13:42 -0500)]
migrating to new directory structure

Kaylea Champion [Sun, 29 Mar 2020 18:39:32 +0000 (11:39 -0700)]
Merge pull request #7 from kayleachampion/master

cleanup with merge

Kaylea Champion [Sun, 29 Mar 2020 07:19:54 +0000 (00:19 -0700)]
all march data

Kaylea Champion [Sun, 29 Mar 2020 06:50:04 +0000 (23:50 -0700)]
adding a logs dir without adding my log files, assuming those don't
belong in repo

Kaylea Champion [Sun, 29 Mar 2020 06:47:55 +0000 (23:47 -0700)]
new version of this from scrape. no double quotes around articles any
more

Kaylea Champion [Sun, 29 Mar 2020 06:46:48 +0000 (23:46 -0700)]
adds a scraper to update the articles file

Kaylea Champion [Sun, 29 Mar 2020 01:46:35 +0000 (18:46 -0700)]
adds in new logging capability

Aaron Shaw [Sun, 29 Mar 2020 01:42:40 +0000 (20:42 -0500)]
Merge pull request #9 from aaronshaw/master

minimal analysis example with pageview data

aaronshaw [Sun, 29 Mar 2020 01:33:23 +0000 (20:33 -0500)]
minimal analysis example with pageview data

Aaron Shaw [Sat, 28 Mar 2020 22:38:20 +0000 (17:38 -0500)]
Merge pull request #8 from aaronshaw/master

Update to load data from github url and include 3/28 data in output

aaronshaw [Sat, 28 Mar 2020 22:31:36 +0000 (17:31 -0500)]
regenerated following update to R src that creates this file

aaronshaw [Sat, 28 Mar 2020 22:30:37 +0000 (17:30 -0500)]
Loading data directly from github URL. Commenting out commands that assume cloned repository.

Kaylea Champion [Sat, 28 Mar 2020 21:46:00 +0000 (14:46 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

Kaylea Champion [Sat, 28 Mar 2020 21:17:21 +0000 (14:17 -0700)]
Merge pull request #5 from kayleachampion/master

view data

Kaylea Champion [Sat, 28 Mar 2020 21:15:53 +0000 (14:15 -0700)]
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master

Some suggested changes.

[ref: kaylea/master]
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:13:46 +0000 (14:13 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

groceryheist [Sat, 28 Mar 2020 21:07:04 +0000 (14:07 -0700)]
Merge pull request #4 from CommunityDataScienceCollective/translations

Transliterations: Use data from google trends and wikidata to find transliterations.

[ref: translations]
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:03:16 +0000 (14:03 -0700)]
Update transliteration results for 2020-03-28

- renamed results from yesterday into time stamped file

Nathan TeBlunthuis [Sat, 28 Mar 2020 20:55:52 +0000 (13:55 -0700)]
Read entire input files before making api calls.

This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
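The trade-off described above can be sketched as follows: the input
file is read completely and closed before any requests are made, at the
cost of holding the whole list in memory. The names load_terms and
fetch_all, and the fetch callback, are hypothetical stand-ins; the
actual API calls in the repository are not shown here.

```python
def load_terms(path):
    """Read every non-empty line into memory and close the file."""
    with open(path, encoding="utf-8") as infile:
        return [line.strip() for line in infile if line.strip()]


def fetch_all(terms, fetch):
    """Call `fetch` (a stand-in for an API call) once per term.

    The input file is already closed by the time this runs.
    """
    return {term: fetch(term) for term in terms}
```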

Nathan TeBlunthuis [Sat, 28 Mar 2020 20:49:19 +0000 (13:49 -0700)]
Keep better track of time.

- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting

Kaylea Champion [Sat, 28 Mar 2020 19:21:37 +0000 (12:21 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

updates my branch with all the master changes so far

Kaylea Champion [Sat, 28 Mar 2020 19:18:01 +0000 (12:18 -0700)]
trialing new approach

Kaylea Champion [Sat, 28 Mar 2020 19:17:45 +0000 (12:17 -0700)]
trialing new approach

Nathan TeBlunthuis [Sat, 28 Mar 2020 17:01:43 +0000 (10:01 -0700)]
typo fix

Nathan TeBlunthuis [Sat, 28 Mar 2020 16:58:43 +0000 (09:58 -0700)]
Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into translations

Aaron Shaw [Sat, 28 Mar 2020 15:28:41 +0000 (10:28 -0500)]
Merge pull request #6 from aaronshaw/translations

minimal example in R

aaronshaw [Sat, 28 Mar 2020 15:18:33 +0000 (10:18 -0500)]
a minimal example in R that outputs a table of top 5 related search terms per day per query

Nathan TeBlunthuis [Sat, 28 Mar 2020 03:27:02 +0000 (20:27 -0700)]
A few suggestions for the python script:

- using format strings (f-strings) is a nice way in python to build
strings using variables.
- you can read and process a file in one pass if you iterate over the
open file itself instead of reading it into a variable and then
looping
- i had to change your strip code when i stopped using csv reader
- my python linter and auto-formatter hate non-indented comments
- i added a few lines to print cases where we don't get Ok responses.
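A short sketch of the first two suggestions above: building strings
with f-strings and iterating over the open file directly instead of
reading it into a variable first. The function name build_urls and the
base URL are hypothetical and only serve the illustration.

```python
import io


def build_urls(infile, base_url):
    """Build one URL per non-empty line in a single pass over the file."""
    urls = []
    for line in infile:  # iterate over the open file itself
        title = line.strip()
        if not title:
            continue
        urls.append(f"{base_url}/{title}")  # f-string interpolation
    return urls


if __name__ == "__main__":
    # io.StringIO stands in for an open articles file.
    articles = io.StringIO("Coronavirus\nPandemic\n")
    for url in build_urls(articles, "https://example.org/views"):
        print(url)
```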

Nathan TeBlunthuis [Sat, 28 Mar 2020 03:13:11 +0000 (20:13 -0700)]
Reorganize wikipedia views subproject into subpackage.

Nathan TeBlunthuis [Sat, 28 Mar 2020 03:05:07 +0000 (20:05 -0700)]
add mwapi to requirements

Kaylea Champion [Sat, 28 Mar 2020 01:24:19 +0000 (18:24 -0700)]
all data

Kaylea Champion [Sat, 28 Mar 2020 01:19:22 +0000 (18:19 -0700)]
cleaning out commented code

Kaylea Champion [Sat, 28 Mar 2020 01:17:39 +0000 (18:17 -0700)]
reorganizes comments

Kaylea Champion [Sat, 28 Mar 2020 01:10:13 +0000 (18:10 -0700)]
initial files

Kaylea Champion [Sat, 28 Mar 2020 01:08:43 +0000 (18:08 -0700)]
makes TSV
makes JSON

Kaylea Champion [Sat, 28 Mar 2020 00:24:18 +0000 (17:24 -0700)]
many bug fixes

Nathan TeBlunthuis [Fri, 27 Mar 2020 23:53:03 +0000 (16:53 -0700)]
add output files from transliteration search using google trends

Nathan TeBlunthuis [Fri, 27 Mar 2020 23:52:19 +0000 (16:52 -0700)]
expand wikidata search to get keywords from google trends

Kaylea Champion [Fri, 27 Mar 2020 23:00:36 +0000 (16:00 -0700)]
for testing

Benjamin Mako Hill [Fri, 27 Mar 2020 21:47:15 +0000 (14:47 -0700)]
Merge pull request #3 from kayleachampion/master

adding in an article list

Kaylea Champion [Fri, 27 Mar 2020 21:41:38 +0000 (14:41 -0700)]
new file -- list of article names

Nathan TeBlunthuis [Fri, 27 Mar 2020 17:55:24 +0000 (10:55 -0700)]
start keeping track of installation requirements

Nathan TeBlunthuis [Thu, 26 Mar 2020 18:14:55 +0000 (11:14 -0700)]
update output using limited base terms list

Nathan TeBlunthuis [Thu, 26 Mar 2020 18:13:23 +0000 (11:13 -0700)]
shell script to run the whole process

Nathan TeBlunthuis [Thu, 26 Mar 2020 17:24:31 +0000 (10:24 -0700)]
narrow base terms

groceryheist [Thu, 26 Mar 2020 17:23:27 +0000 (10:23 -0700)]
Merge pull request #2 from aaronshaw/patch-1

Update base_terms.txt

Aaron Shaw [Thu, 26 Mar 2020 01:28:20 +0000 (20:28 -0500)]
Update base_terms.txt

typo fix

Nathan TeBlunthuis [Wed, 25 Mar 2020 05:06:08 +0000 (22:06 -0700)]
Finish MVP for transliterations

code is reasonably well-written
checked that we get seemingly good data back
adding README
adding data

Nathan TeBlunthuis [Wed, 25 Mar 2020 01:04:22 +0000 (18:04 -0700)]
Untested code to get labels from wikidata in all languages.

Nathan TeBlunthuis [Tue, 24 Mar 2020 22:03:47 +0000 (15:03 -0700)]
Python code to find wikidata entities to translate.  Here we search the api for entities that have covid keywords.

Building system for finding translations from Wikidata.

groceryheist [Tue, 24 Mar 2020 21:31:36 +0000 (14:31 -0700)]
Merge pull request #1 from kayleachampion/patch-1

Update README.md

Kaylea Champion [Tue, 24 Mar 2020 21:28:17 +0000 (14:28 -0700)]
Update README.md

some language nicing and adding in immediacy goal

Kaylea Champion [Tue, 24 Mar 2020 21:25:43 +0000 (14:25 -0700)]
Update README.md

language nicing

Nathan TeBlunthuis [Tue, 24 Mar 2020 19:12:57 +0000 (12:12 -0700)]
add code of conduct and elaborate description

groceryheist [Tue, 24 Mar 2020 18:10:29 +0000 (11:10 -0700)]
Initial commit
