covid19.git
2 years ago: address confusion with date
Benjamin Mako Hill [Wed, 1 Apr 2020 20:14:05 +0000 (15:14 -0500)]
address confusion with date

The timestamps in files should be the day that the exports are done. For
the view data, the query date needs to be the day before but this
shouldn't be the timestamp we use in files, etc.
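
A minimal sketch of the date handling described above, assuming the export date is simply the run date and the view query covers the previous calendar day. The function name and the filename pattern are hypothetical and not taken from the repository.

```python
from datetime import date, timedelta


def export_and_query_dates(today=None):
    """Return (export_date, query_date) for one export run."""
    # The export date is the day the export runs; it is what goes in filenames.
    export_date = today or date.today()
    # View data is only complete for the previous day, so query that instead.
    query_date = export_date - timedelta(days=1)
    return export_date, query_date


export_date, query_date = export_and_query_dates(date(2020, 4, 1))
# Hypothetical filename pattern; the repository's real naming scheme may differ.
outfile = f"views_{export_date.strftime('%Y%m%d')}.tsv"
print(outfile, "queries", query_date.isoformat())
```

Keeping the two dates as separate values makes it harder to stamp a file with the query date by accident.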

2 years ago: fix bugs with the date stamps
Benjamin Mako Hill [Wed, 1 Apr 2020 15:47:33 +0000 (10:47 -0500)]
fix bugs with the date stamps

2 years ago: Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Benjamin Mako Hill [Wed, 1 Apr 2020 14:53:40 +0000 (07:53 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

2 years ago: added gitignore for wikipedia/data directory
Benjamin Mako Hill [Wed, 1 Apr 2020 14:52:15 +0000 (07:52 -0700)]
added gitignore for wikipedia/data directory

2 years ago: renamed the wikipedia_views module to wikipedia
Benjamin Mako Hill [Wed, 1 Apr 2020 14:51:20 +0000 (07:51 -0700)]
renamed the wikipedia_views module to wikipedia

2 years ago: added initial version of revision-scraper
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:38 +0000 (07:42 -0700)]
added initial version of revision-scraper

Borrows much of the structure from the (patched) version of the
dailyview scraper.

2 years ago: fixed typo in description of view scraper
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:24 +0000 (07:42 -0700)]
fixed typo in description of view scraper

2 years ago: renamed daily views to make it clear that it's just enwiki
Benjamin Mako Hill [Wed, 1 Apr 2020 14:29:01 +0000 (07:29 -0700)]
renamed daily views to make it clear that it's just enwiki

2 years ago: changes to a bunch of the wikipedia view code
Benjamin Mako Hill [Wed, 1 Apr 2020 14:15:12 +0000 (07:15 -0700)]
changes to a bunch of the wikipedia view code

- Renamed the articles.txt to something more specific

Changes to both scripts:

- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
  default. Because we can only call logging.basicConfig() once, this
  ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
  produced which dataset.
- Caused programs to take files instead of directories as
  output (allows us to run programs more than once a day).

Changes to wikipedia_views/scripts/fetch_daily_views.py:

- Changed the output so that it is a sequence of JSON dictionaries (one
  per line), as per the standard we agreed to and which is what
  Twitter, GitHub, and other dumps do. Previous behavior was to
  output a single JSON list object.
- A number of other small changes and tweaks throughout.
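
A rough sketch of two of the changes listed above: configuring logging once with stderr as the default, and writing one JSON dictionary per line. The function names and the sample records are made up for illustration and are not the scripts' actual code.

```python
import io
import json
import logging
import sys


def configure_logging(logfile=None, level=logging.INFO):
    """Set up logging once: to a file if one is given, otherwise to stderr."""
    # logging.basicConfig() does nothing if the root logger already has
    # handlers, so the destination has to be decided before the first call.
    if logfile:
        logging.basicConfig(filename=logfile, level=level)
    else:
        logging.basicConfig(stream=sys.stderr, level=level)


def write_json_lines(records, out):
    """Write each record as one JSON object per line."""
    for record in records:
        out.write(json.dumps(record) + "\n")


configure_logging()
# Made-up records for illustration only.
records = [{"article": "Example_A", "views": 10}, {"article": "Example_B", "views": 7}]
buf = io.StringIO()  # stands in for an output file opened for writing
write_json_lines(records, buf)
logging.info("wrote %d records", len(records))
```

With one object per line, a consumer can process the file line by line instead of parsing a single large JSON list.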

2 years ago: add examples using the translations data
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:56:59 +0000 (16:56 -0700)]
add examples using the translations data

2 years ago: add documentation for the output files
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:22:30 +0000 (16:22 -0700)]
add documentation for the output files

2 years ago: create 'latest.csv' to link to the most recent output.
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:16:36 +0000 (16:16 -0700)]
create 'latest.csv' to link to the most recent output.
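
One way such a link could be maintained, sketched under the assumption that 'latest.csv' is a symbolic link on a POSIX filesystem (the commit does not say how the link is made). The function name and the dated filename are hypothetical.

```python
import os
import tempfile


def update_latest_link(target, link_name="latest.csv"):
    """Point link_name (in the target's directory) at target, replacing any old link."""
    directory = os.path.dirname(os.path.abspath(target))
    link_path = os.path.join(directory, link_name)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # A relative link keeps working if the whole output directory is moved.
    os.symlink(os.path.basename(target), tmp_path)
    # Renaming over the old link swaps it in one step on POSIX systems.
    os.replace(tmp_path, link_path)
    return link_path


with tempfile.TemporaryDirectory() as outdir:
    newest = os.path.join(outdir, "output_2020-03-31.csv")  # hypothetical filename
    with open(newest, "w") as f:
        f.write("col\n1\n")
    link = update_latest_link(newest)
    resolved = os.readlink(link)
```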

2 years ago: Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_O...
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:43 +0000 (16:01 -0700)]
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory

2 years ago: update output
Nathan TeBlunthuis [Tue, 31 Mar 2020 23:01:38 +0000 (16:01 -0700)]
update output

2 years ago: use 'item' instead of 'entity'
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:30:08 +0000 (15:30 -0700)]
use 'item' instead of 'entity'

2 years ago: rename compile script
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:39 +0000 (15:27 -0700)]
rename compile script

2 years ago: update compile script
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:27:21 +0000 (15:27 -0700)]
update compile script

2 years ago: Improve README.md for keywords
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:25:51 +0000 (15:25 -0700)]
Improve README.md for keywords

2 years ago: rename 'transliterations' to 'keywords'
Nathan TeBlunthuis [Tue, 31 Mar 2020 22:15:01 +0000 (15:15 -0700)]
rename 'transliterations' to 'keywords'

2 years ago: Update README.md
Aaron Shaw [Tue, 31 Mar 2020 22:09:58 +0000 (17:09 -0500)]
Update README.md

linking to project pages more fully

2 years ago: Merge pull request #10 from makoshark/master
Kaylea Champion [Tue, 31 Mar 2020 19:23:36 +0000 (12:23 -0700)]
Merge pull request #10 from makoshark/master

stop writing header to one-column list

2 years ago: stop writing header to one-column list
Benjamin Mako Hill [Tue, 31 Mar 2020 15:35:23 +0000 (08:35 -0700)]
stop writing header to one-column list

This feels like it's asking for trouble. Description of the contents
of the list is in the filename.

2 years ago: reorganize file structure
Nathan TeBlunthuis [Mon, 30 Mar 2020 04:49:57 +0000 (21:49 -0700)]
reorganize file structure

- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv

2 years ago: migrating to new directory structure
Kaylea Champion [Sun, 29 Mar 2020 18:42:01 +0000 (13:42 -0500)]
migrating to new directory structure

2 years ago: Merge pull request #7 from kayleachampion/master
Kaylea Champion [Sun, 29 Mar 2020 18:39:32 +0000 (11:39 -0700)]
Merge pull request #7 from kayleachampion/master

cleanup with merge

2 years ago: all march data
Kaylea Champion [Sun, 29 Mar 2020 07:19:54 +0000 (00:19 -0700)]
all march data

2 years ago: adding a logs dir without adding my log files, assuming those don't
Kaylea Champion [Sun, 29 Mar 2020 06:50:04 +0000 (23:50 -0700)]
adding a logs dir without adding my log files, assuming those don't
belong in repo

2 years ago: new version of this from scrape. no double quotes around articles any
Kaylea Champion [Sun, 29 Mar 2020 06:47:55 +0000 (23:47 -0700)]
new version of this from scrape. no double quotes around articles any
more

2 years ago: adds a scraper to update the articles file
Kaylea Champion [Sun, 29 Mar 2020 06:46:48 +0000 (23:46 -0700)]
adds a scraper to update the articles file

2 years ago: adds in new logging capability
Kaylea Champion [Sun, 29 Mar 2020 01:46:35 +0000 (18:46 -0700)]
adds in new logging capability

2 years ago: Merge pull request #9 from aaronshaw/master
Aaron Shaw [Sun, 29 Mar 2020 01:42:40 +0000 (20:42 -0500)]
Merge pull request #9 from aaronshaw/master

minimal analysis example with pageview data

2 years ago: minimal analysis example with pageview data
aaronshaw [Sun, 29 Mar 2020 01:33:23 +0000 (20:33 -0500)]
minimal analysis example with pageview data

2 years ago: Merge pull request #8 from aaronshaw/master
Aaron Shaw [Sat, 28 Mar 2020 22:38:20 +0000 (17:38 -0500)]
Merge pull request #8 from aaronshaw/master

Update to load data from github url and include 3/28 data in output

2 years ago: regenerated following update to R src that creates this file
aaronshaw [Sat, 28 Mar 2020 22:31:36 +0000 (17:31 -0500)]
regenerated following update to R src that creates this file

2 years ago: Loading data directly from github URL. Commenting out commands that assume cloned...
aaronshaw [Sat, 28 Mar 2020 22:30:37 +0000 (17:30 -0500)]
Loading data directly from github URL. Commenting out commands that assume cloned repository.

2 years ago: Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID...
Kaylea Champion [Sat, 28 Mar 2020 21:46:00 +0000 (14:46 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

2 years ago: Merge pull request #5 from kayleachampion/master
Kaylea Champion [Sat, 28 Mar 2020 21:17:21 +0000 (14:17 -0700)]
Merge pull request #5 from kayleachampion/master

view data

2 years ago: Merge pull request #1 from CommunityDataScienceCollective/kaylea/master
Kaylea Champion [Sat, 28 Mar 2020 21:15:53 +0000 (14:15 -0700)]
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master

Some suggested changes.

2 years ago: Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di... [kaylea/master]
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:13:46 +0000 (14:13 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

2 years ago: Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di...
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

2 years ago: Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Di...
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master

2 years ago: Read the whole input file before making api calls
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

2 years ago: Read the whole input file before making api calls
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls

2 years ago: Merge pull request #4 from CommunityDataScienceCollective/translations
groceryheist [Sat, 28 Mar 2020 21:07:04 +0000 (14:07 -0700)]
Merge pull request #4 from CommunityDataScienceCollective/translations

Transliterations: Use data from google trends and wikidata to find transliterations.

2 years ago: Update transliteration results for 2020-03-28 [translations]
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:03:16 +0000 (14:03 -0700)]
Update transliteration results for 2020-03-28

- renamed results from yesterday into time stamped file

2 years ago: Read entire input files before making api calls.
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:55:52 +0000 (13:55 -0700)]
Read entire input files before making api calls.

This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
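
A small sketch of the pattern described above: read the whole input first, close the file, and only then start the slow calls. The `fetch` function is a hypothetical stand-in for a real API request, and the in-memory file stands in for an input file on disk.

```python
import io


def read_terms(handle):
    """Read every non-empty line up front so the file can be closed right away."""
    return [line.strip() for line in handle if line.strip()]


def fetch(term):
    """Stand-in for a real API call (hypothetical); just echoes the term."""
    return {"term": term, "status": "ok"}


# io.StringIO stands in for an input file opened with open(...).
with io.StringIO("coronavirus\ncovid-19\n\n") as infile:
    terms = read_terms(infile)
# The file is closed here, before any slow network calls begin.
results = [fetch(term) for term in terms]
```

The cost is that the full list of terms sits in memory, which is the trade-off the commit message mentions.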

2 years ago: Keep better track of time.
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:49:19 +0000 (13:49 -0700)]
Keep better track of time.

- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting
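
A hedged sketch of both points above: each row carries a timestamp, and the file is opened in append mode so earlier search terms are kept. The column layout, function name, and filename are assumptions for illustration only.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def append_terms(path, terms, now=None):
    """Append one timestamped row per term; earlier rows are kept."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    # Mode "a" appends to the file; mode "w" would overwrite it.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for term in terms:
            writer.writerow([term, stamp])


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "search_terms.csv")  # hypothetical filename
    fixed = datetime(2020, 3, 28, tzinfo=timezone.utc)
    append_terms(path, ["coronavirus"], now=fixed)
    append_terms(path, ["covid-19"], now=fixed)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
```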

2 years ago: Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID...
Kaylea Champion [Sat, 28 Mar 2020 19:21:37 +0000 (12:21 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory

updates my branch with all the master changes so far

2 years ago: trialing new approach
Kaylea Champion [Sat, 28 Mar 2020 19:18:01 +0000 (12:18 -0700)]
trialing new approach

2 years ago: trialing new approach
Kaylea Champion [Sat, 28 Mar 2020 19:17:45 +0000 (12:17 -0700)]
trialing new approach

2 years ago: typo fix
Nathan TeBlunthuis [Sat, 28 Mar 2020 17:01:43 +0000 (10:01 -0700)]
typo fix

2 years ago: Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Dig...
Nathan TeBlunthuis [Sat, 28 Mar 2020 16:58:43 +0000 (09:58 -0700)]
Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into translations

2 years ago: Merge pull request #6 from aaronshaw/translations
Aaron Shaw [Sat, 28 Mar 2020 15:28:41 +0000 (10:28 -0500)]
Merge pull request #6 from aaronshaw/translations

minimal example in R

2 years ago: a minimal example in R that outputs a table of top 5 related search terms per day...
aaronshaw [Sat, 28 Mar 2020 15:18:33 +0000 (10:18 -0500)]
a minimal example in R that outputs a table of top 5 related search terms per day per query

2 years ago: A few suggestions for the python script:
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:27:02 +0000 (20:27 -0700)]
A few suggestions for the python script:

- using format strings (f-strings) is a nice way in python to build
strings using variables.
- you can read and process a file in one pass if you iterate over the
open file itself instead of reading it into a variable and then
looping
- i had to change your strip code when i stopped using csv reader
- my python linter and auto-formatter hate non-indented comments
- i added a few lines to print cases where we don't get Ok responses.
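
The suggestions above could look roughly like this. It is a sketch, not the script from the repository: the base URL, function names, and sample statuses are made up, and an in-memory file stands in for the article list.

```python
import io


def build_urls(handle, base="https://example.org/api"):
    """Yield one URL per article title, reading the file in a single pass."""
    for line in handle:              # iterate over the open file directly
        title = line.strip()         # drop the trailing newline and stray spaces
        if title:
            yield f"{base}/{title}"  # f-string: variables are interpolated inline


def report_failures(responses):
    """Return messages for responses whose status is not 200."""
    return [f"{url} returned {status}" for url, status in responses if status != 200]


urls = list(build_urls(io.StringIO("Coronavirus\nPandemic\n")))
# Made-up statuses; a real script would take them from the HTTP responses.
messages = report_failures([(urls[0], 200), (urls[1], 404)])
for message in messages:
    print(message)
```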

2 years ago: Reorganize wikipedia views subproject into subpackage.
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:13:11 +0000 (20:13 -0700)]
Reorganize wikipedia views subproject into subpackage.

2 years ago: add mwapi to requirements
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:05:07 +0000 (20:05 -0700)]
add mwapi to requirements

2 years ago: all data
Kaylea Champion [Sat, 28 Mar 2020 01:24:19 +0000 (18:24 -0700)]
all data

2 years ago: cleaning out commented code
Kaylea Champion [Sat, 28 Mar 2020 01:19:22 +0000 (18:19 -0700)]
cleaning out commented code

2 years ago: reorganizes comments
Kaylea Champion [Sat, 28 Mar 2020 01:17:39 +0000 (18:17 -0700)]
reorganizes comments

2 years ago: initial files
Kaylea Champion [Sat, 28 Mar 2020 01:10:13 +0000 (18:10 -0700)]
initial files

2 years ago: makes TSV
Kaylea Champion [Sat, 28 Mar 2020 01:08:43 +0000 (18:08 -0700)]
makes TSV
makes JSON

2 years ago: many bug fixes
Kaylea Champion [Sat, 28 Mar 2020 00:24:18 +0000 (17:24 -0700)]
many bug fixes

2 years ago: add output files from transliteration search using google trends
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:53:03 +0000 (16:53 -0700)]
add output files from transliteration search using google trends

2 years ago: expand wikidata search to get keywords from google trends
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:52:19 +0000 (16:52 -0700)]
expand wikidata search to get keywords from google trends

2 years ago: for testing
Kaylea Champion [Fri, 27 Mar 2020 23:00:36 +0000 (16:00 -0700)]
for testing

2 years ago: Merge pull request #3 from kayleachampion/master
Benjamin Mako Hill [Fri, 27 Mar 2020 21:47:15 +0000 (14:47 -0700)]
Merge pull request #3 from kayleachampion/master

adding in an article list

2 years ago: new file -- list of article names
Kaylea Champion [Fri, 27 Mar 2020 21:41:38 +0000 (14:41 -0700)]
new file -- list of article names

2 years ago: start keeping track of installation requirements
Nathan TeBlunthuis [Fri, 27 Mar 2020 17:55:24 +0000 (10:55 -0700)]
start keeping track of installation requirements

2 years ago: update output using limited base terms list
Nathan TeBlunthuis [Thu, 26 Mar 2020 18:14:55 +0000 (11:14 -0700)]
update output using limited base terms list

2 years ago: shell script to run the whole process
Nathan TeBlunthuis [Thu, 26 Mar 2020 18:13:23 +0000 (11:13 -0700)]
shell script to run the whole process

2 years ago: narrow base terms
Nathan TeBlunthuis [Thu, 26 Mar 2020 17:24:31 +0000 (10:24 -0700)]
narrow base terms

2 years ago: Merge pull request #2 from aaronshaw/patch-1
groceryheist [Thu, 26 Mar 2020 17:23:27 +0000 (10:23 -0700)]
Merge pull request #2 from aaronshaw/patch-1

Update base_terms.txt

2 years ago: Update base_terms.txt
Aaron Shaw [Thu, 26 Mar 2020 01:28:20 +0000 (20:28 -0500)]
Update base_terms.txt

typo fix

2 years ago: Finish MVP for transliterations
Nathan TeBlunthuis [Wed, 25 Mar 2020 05:06:08 +0000 (22:06 -0700)]
Finish MVP for transliterations

code is reasonably well-written
checked that we get seemingly good data back
adding README
adding data

2 years ago: Untested code to get labels from wikidata in all languages.
Nathan TeBlunthuis [Wed, 25 Mar 2020 01:04:22 +0000 (18:04 -0700)]
Untested code to get labels from wikidata in all languages.

2 years ago: Python code to find wikidata entities to translate. Here we search the api for entit...
Nathan TeBlunthuis [Tue, 24 Mar 2020 22:03:47 +0000 (15:03 -0700)]
Python code to find wikidata entities to translate.  Here we search the api for entities that have covid keywords.

Building system for finding translations from Wikidata.

2 years ago: Merge pull request #1 from kayleachampion/patch-1
groceryheist [Tue, 24 Mar 2020 21:31:36 +0000 (14:31 -0700)]
Merge pull request #1 from kayleachampion/patch-1

Update README.md

2 years ago: Update README.md
Kaylea Champion [Tue, 24 Mar 2020 21:28:17 +0000 (14:28 -0700)]
Update README.md

some language nicing and adding in immediacy goal

2 years ago: Update README.md
Kaylea Champion [Tue, 24 Mar 2020 21:25:43 +0000 (14:25 -0700)]
Update README.md

language nicing

2 years ago: add code of conduct and elaborate description
Nathan TeBlunthuis [Tue, 24 Mar 2020 19:12:57 +0000 (12:12 -0700)]
add code of conduct and elaborate description

2 years ago: Initial commit
groceryheist [Tue, 24 Mar 2020 18:10:29 +0000 (11:10 -0700)]
Initial commit
