code.communitydata.science - covid19.git/log
Benjamin Mako Hill [Wed, 1 Apr 2020 14:52:15 +0000 (07:52 -0700)]
added gitignore for wikipedia/data directory
Benjamin Mako Hill [Wed, 1 Apr 2020 14:51:20 +0000 (07:51 -0700)]
renamed the wikipedia_views module to wikipedia
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:38 +0000 (07:42 -0700)]
added initial version of revision-scraper
Borrows much of the structure from the (patched) version of the
dailyview scraper.
Benjamin Mako Hill [Wed, 1 Apr 2020 14:42:24 +0000 (07:42 -0700)]
fixed typo in description of view scraper
Benjamin Mako Hill [Wed, 1 Apr 2020 14:29:01 +0000 (07:29 -0700)]
renamed daily views to make it clear that it's just enwiki
Benjamin Mako Hill [Wed, 1 Apr 2020 14:15:12 +0000 (07:15 -0700)]
changes to a bunch of the wikipedia view code
- Renamed the articles.txt to something more specific
Changes to both scripts:
- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
default. Because we can only call logging.basicConfig() once, this
ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
produced which dataset.
- Caused programs to take files instead of directories as
output (allows us to run programs more than once a day).
Changes to the wikipedia_views/scripts/fetch_daily_views.py:
- Changed the output to a sequence of JSON dictionaries (one
per line) as per the standard we agreed to and which is what
Twitter, Github, and other dumps do. Previous behavior was to
output a single JSON list object.
- A number of other small changes and tweaks throughout.
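The logging and output changes described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the names configure_logging, write_json_lines, and read_json_lines are hypothetical, and only the standard logging and json modules are assumed.

```python
import io
import json
import logging
import sys


def configure_logging(logfile=None, level=logging.INFO):
    """Configure the root logger in one place (hypothetical helper).

    logging.basicConfig() does nothing if the root logger already has
    handlers, so the stderr-or-file decision is made before the single call.
    """
    if logfile is None:
        logging.basicConfig(stream=sys.stderr, level=level)
    else:
        logging.basicConfig(filename=logfile, level=level)


def write_json_lines(records, out):
    """Write one JSON dictionary per line instead of a single JSON list."""
    for record in records:
        out.write(json.dumps(record) + "\n")


def read_json_lines(inp):
    """Read the records back, one JSON document per non-empty line."""
    return [json.loads(line) for line in inp if line.strip()]


if __name__ == "__main__":
    configure_logging()
    buffer = io.StringIO()
    write_json_lines([{"article": "Coronavirus", "views": 10}], buffer)
    logging.info("wrote %d characters", len(buffer.getvalue()))
```

One record per line means a consumer can process the file as a stream and a partially written file is still readable up to the last complete line.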
Benjamin Mako Hill [Tue, 31 Mar 2020 15:35:23 +0000 (08:35 -0700)]
stop writing header to one-column list
This feels like it's asking for trouble. Description of the contents
of the list is in the filename.
Nathan TeBlunthuis [Mon, 30 Mar 2020 04:49:57 +0000 (21:49 -0700)]
reorganize file structure
- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv
Kaylea Champion [Sun, 29 Mar 2020 18:42:01 +0000 (13:42 -0500)]
migrating to new directory structure
Kaylea Champion [Sun, 29 Mar 2020 18:39:32 +0000 (11:39 -0700)]
Merge pull request #7 from kayleachampion/master
cleanup with merge
Kaylea Champion [Sun, 29 Mar 2020 07:19:54 +0000 (00:19 -0700)]
all march data
Kaylea Champion [Sun, 29 Mar 2020 06:50:04 +0000 (23:50 -0700)]
adding a logs dir without adding my log files, assuming those don't
belong in repo
Kaylea Champion [Sun, 29 Mar 2020 06:47:55 +0000 (23:47 -0700)]
new version of this from scrape. no double quotes around articles any
more
Kaylea Champion [Sun, 29 Mar 2020 06:46:48 +0000 (23:46 -0700)]
adds a scraper to update the articles file
Kaylea Champion [Sun, 29 Mar 2020 01:46:35 +0000 (18:46 -0700)]
adds in new logging capability
Aaron Shaw [Sun, 29 Mar 2020 01:42:40 +0000 (20:42 -0500)]
Merge pull request #9 from aaronshaw/master
minimal analysis example with pageview data
aaronshaw [Sun, 29 Mar 2020 01:33:23 +0000 (20:33 -0500)]
minimal analysis example with pageview data
Aaron Shaw [Sat, 28 Mar 2020 22:38:20 +0000 (17:38 -0500)]
Merge pull request #8 from aaronshaw/master
Update to load data from github url and include 3/28 data in output
aaronshaw [Sat, 28 Mar 2020 22:31:36 +0000 (17:31 -0500)]
regenerated following update to R src that creates this file
aaronshaw [Sat, 28 Mar 2020 22:30:37 +0000 (17:30 -0500)]
Loading data directly from github URL. Commenting out commands that assume cloned repository.
Kaylea Champion [Sat, 28 Mar 2020 21:46:00 +0000 (14:46 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
Kaylea Champion [Sat, 28 Mar 2020 21:17:21 +0000 (14:17 -0700)]
Merge pull request #5 from kayleachampion/master
view data
Kaylea Champion [Sat, 28 Mar 2020 21:15:53 +0000 (14:15 -0700)]
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master
Some suggested changes.
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:13:46 +0000 (14:13 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:12:36 +0000 (14:12 -0700)]
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:09:28 +0000 (14:09 -0700)]
Read the whole input file before making api calls
groceryheist [Sat, 28 Mar 2020 21:07:04 +0000 (14:07 -0700)]
Merge pull request #4 from CommunityDataScienceCollective/translations
Transliterations: Use data from google trends and wikidata to find transliterations.
Nathan TeBlunthuis [Sat, 28 Mar 2020 21:03:16 +0000 (14:03 -0700)]
Update transliteration results for 2020-03-28
- renamed results from yesterday into time stamped file
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:55:52 +0000 (13:55 -0700)]
Read entire input files before making api calls.
This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
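The trade-off in this commit message can be sketched as below, assuming a plain-text input file with one search term per line. The names read_terms, lookup_all, and demo are hypothetical, and the lookup argument stands in for whatever API call the real script makes.

```python
import os
import tempfile


def read_terms(path):
    """Read every non-empty line into memory; the file is closed on return."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def lookup_all(path, lookup):
    """Read the whole input first, then make the (possibly slow) calls."""
    terms = read_terms(path)  # file handle is already released here
    return [lookup(term) for term in terms]


def demo():
    """Run lookup_all on a throwaway file, with str.upper as a fake API call."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("coronavirus\n\ncovid-19\n")
    try:
        return lookup_all(tmp.name, lambda term: term.upper())
    finally:
        os.remove(tmp.name)
```

The whole list is held in memory, which is the extra cost the message mentions; in exchange, the file is not kept open while network requests are in flight.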
Nathan TeBlunthuis [Sat, 28 Mar 2020 20:49:19 +0000 (13:49 -0700)]
Keep better track of time.
- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting
Kaylea Champion [Sat, 28 Mar 2020 19:21:37 +0000 (12:21 -0700)]
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
updates my branch with all the master changes so far
Kaylea Champion [Sat, 28 Mar 2020 19:18:01 +0000 (12:18 -0700)]
trialing new approach
Kaylea Champion [Sat, 28 Mar 2020 19:17:45 +0000 (12:17 -0700)]
trialing new approach
Nathan TeBlunthuis [Sat, 28 Mar 2020 17:01:43 +0000 (10:01 -0700)]
typo fix
Nathan TeBlunthuis [Sat, 28 Mar 2020 16:58:43 +0000 (09:58 -0700)]
Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into translations
Aaron Shaw [Sat, 28 Mar 2020 15:28:41 +0000 (10:28 -0500)]
Merge pull request #6 from aaronshaw/translations
minimal example in R
aaronshaw [Sat, 28 Mar 2020 15:18:33 +0000 (10:18 -0500)]
a minimal example in R that outputs a table of top 5 related search terms per day per query
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:27:02 +0000 (20:27 -0700)]
A few suggestions for the python script:
- using format strings (f-strings) is a nice way in python to build
strings using variables.
- you can read and process a file in one pass if you iterate over the
open file itself instead of reading it into a variable and then
looping
- i had to change your strip code when i stopped using csv reader
- my python linter and auto-formatter hate non-indented comments
- i added a few lines to print cases where we don't get Ok responses.
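The first two suggestions above (f-strings, and iterating over the open file itself) might look like the sketch below. It is illustrative only: build_messages and the message text are hypothetical, and io.StringIO stands in for a file opened with open(), since both yield one line per iteration.

```python
import io


def build_messages(lines):
    """Process the input in a single pass, one line at a time."""
    messages = []
    for line in lines:  # iterating a file object yields successive lines
        title = line.strip()
        if not title:
            continue  # skip blank lines
        # An f-string interpolates the variable directly into the text.
        messages.append(f"fetching views for {title}")
    return messages


if __name__ == "__main__":
    for message in build_messages(io.StringIO("Coronavirus\nSARS\n")):
        print(message)
```

With a real file the loop would sit inside a with open(...) block, so no intermediate list of lines is needed.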
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:13:11 +0000 (20:13 -0700)]
Reorganize wikipedia views subproject into subpackage.
Nathan TeBlunthuis [Sat, 28 Mar 2020 03:05:07 +0000 (20:05 -0700)]
add mwapi to requirements
Kaylea Champion [Sat, 28 Mar 2020 01:24:19 +0000 (18:24 -0700)]
all data
Kaylea Champion [Sat, 28 Mar 2020 01:19:22 +0000 (18:19 -0700)]
cleaning out commented code
Kaylea Champion [Sat, 28 Mar 2020 01:17:39 +0000 (18:17 -0700)]
reorganizes comments
Kaylea Champion [Sat, 28 Mar 2020 01:10:13 +0000 (18:10 -0700)]
initial files
Kaylea Champion [Sat, 28 Mar 2020 01:08:43 +0000 (18:08 -0700)]
makes TSV
makes JSON
Kaylea Champion [Sat, 28 Mar 2020 00:24:18 +0000 (17:24 -0700)]
many bug fixes
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:53:03 +0000 (16:53 -0700)]
add output files from transliteration search using google trends
Nathan TeBlunthuis [Fri, 27 Mar 2020 23:52:19 +0000 (16:52 -0700)]
expand wikidata search to get keywords from google trends
Kaylea Champion [Fri, 27 Mar 2020 23:00:36 +0000 (16:00 -0700)]
for testing
Benjamin Mako Hill [Fri, 27 Mar 2020 21:47:15 +0000 (14:47 -0700)]
Merge pull request #3 from kayleachampion/master
adding in an article list
Kaylea Champion [Fri, 27 Mar 2020 21:41:38 +0000 (14:41 -0700)]
new file -- list of article names
Nathan TeBlunthuis [Fri, 27 Mar 2020 17:55:24 +0000 (10:55 -0700)]
start keeping track of installation requirements
Nathan TeBlunthuis [Thu, 26 Mar 2020 18:14:55 +0000 (11:14 -0700)]
update output using limited base terms list
Nathan TeBlunthuis [Thu, 26 Mar 2020 18:13:23 +0000 (11:13 -0700)]
shell script to run the whole process
Nathan TeBlunthuis [Thu, 26 Mar 2020 17:24:31 +0000 (10:24 -0700)]
narrow base terms
groceryheist [Thu, 26 Mar 2020 17:23:27 +0000 (10:23 -0700)]
Merge pull request #2 from aaronshaw/patch-1
Update base_terms.txt
Aaron Shaw [Thu, 26 Mar 2020 01:28:20 +0000 (20:28 -0500)]
Update base_terms.txt
typo fix
Nathan TeBlunthuis [Wed, 25 Mar 2020 05:06:08 +0000 (22:06 -0700)]
Finish MVP for transliterations
code is reasonably well-written
checked that we get seemingly good data back
adding README
adding data
Nathan TeBlunthuis [Wed, 25 Mar 2020 01:04:22 +0000 (18:04 -0700)]
Untested code to get labels from wikidata in all languages.
Nathan TeBlunthuis [Tue, 24 Mar 2020 22:03:47 +0000 (15:03 -0700)]
Python code to find wikidata entities to translate. Here we search the api for entities that have covid keywords.
Building system for finding translations from Wikidata.
groceryheist [Tue, 24 Mar 2020 21:31:36 +0000 (14:31 -0700)]
Merge pull request #1 from kayleachampion/patch-1
Update README.md
Kaylea Champion [Tue, 24 Mar 2020 21:28:17 +0000 (14:28 -0700)]
Update README.md
some language nicing and adding in immediacy goal
Kaylea Champion [Tue, 24 Mar 2020 21:25:43 +0000 (14:25 -0700)]
Update README.md
language nicing
Nathan TeBlunthuis [Tue, 24 Mar 2020 19:12:57 +0000 (12:12 -0700)]
add code of conduct and elaborate description
groceryheist [Tue, 24 Mar 2020 18:10:29 +0000 (11:10 -0700)]
Initial commit
Community Data Science Collective