code.communitydata.science - mediawiki_dump_tools.git/log
mediawiki_dump_tools.git
5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into... user_level_wikiq
groceryheist [Fri, 31 Aug 2018 23:03:07 +0000 (16:03 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Fri, 31 Aug 2018 23:02:05 +0000 (16:02 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Fri, 31 Aug 2018 23:02:05 +0000 (16:02 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Fri, 31 Aug 2018 23:01:07 +0000 (16:01 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Fri, 31 Aug 2018 23:01:07 +0000 (16:01 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  add more variables and support for persistence
groceryheist [Fri, 31 Aug 2018 22:57:48 +0000 (15:57 -0700)]
add more variables and support for persistence

5 years ago  add more variables and support for persistence
groceryheist [Fri, 31 Aug 2018 22:57:48 +0000 (15:57 -0700)]
add more variables and support for persistence

5 years ago  add spark program for running group by users
groceryheist [Fri, 31 Aug 2018 20:40:22 +0000 (20:40 +0000)]
add spark program for running group by users

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Tue, 14 Aug 2018 21:44:37 +0000 (14:44 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  Use dask to parallelize and scale user level datasets
groceryheist [Tue, 14 Aug 2018 21:37:03 +0000 (14:37 -0700)]
Use dask to parallelize and scale user level datasets

5 years ago  Use dask to parallelize and scale user level datasets
groceryheist [Tue, 14 Aug 2018 21:37:03 +0000 (14:37 -0700)]
Use dask to parallelize and scale user level datasets

5 years ago  Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into...
groceryheist [Mon, 13 Aug 2018 04:34:12 +0000 (21:34 -0700)]
Merge branch 'user_level_wikiq' of code.communitydata.cc:mediawiki_dump_tools into user_level_wikiq

5 years ago  refactor wikiq to separate script from classes and functions. Code reuse in testing.
groceryheist [Mon, 13 Aug 2018 04:33:19 +0000 (21:33 -0700)]
refactor wikiq to separate script from classes and functions. Code reuse in testing.

5 years ago  move tests to test folder
groceryheist [Mon, 13 Aug 2018 01:05:59 +0000 (18:05 -0700)]
move tests to test folder

5 years ago  move tests to test folder
groceryheist [Mon, 13 Aug 2018 01:05:59 +0000 (18:05 -0700)]
move tests to test folder

5 years ago  Prefix page titles with namespace names. mediawiki-utils-migration
groceryheist [Tue, 10 Jul 2018 05:11:17 +0000 (22:11 -0700)]
Prefix page titles with namespace names.

5 years ago  migrate to mwxml. This completes the migration away from python-mediawiki-utilities...
groceryheist [Thu, 5 Jul 2018 08:16:00 +0000 (01:16 -0700)]
migrate to mwxml. This completes the migration away from python-mediawiki-utilities. Except for preserving legacy persistence behavior, we can safely use the nice updates from the mediawiki-utils project.

5 years ago  migrate to mwpersistence. this fixes many issues. We preserve legacy persistence...
groceryheist [Thu, 5 Jul 2018 02:06:07 +0000 (19:06 -0700)]
migrate to mwpersistence. this fixes many issues. We preserve legacy persistence behavior using the --persistence-legacy option.

5 years ago  migrate reverts to python-mwreverts
groceryheist [Wed, 4 Jul 2018 22:29:48 +0000 (15:29 -0700)]
migrate reverts to python-mwreverts

5 years ago  add note to readme about dependency on compression software
groceryheist [Wed, 4 Jul 2018 22:20:52 +0000 (15:20 -0700)]
add note to readme about dependency on compression software

5 years ago  add tests for wikipedia, malformed xml, bzip2, correct bz2 bug in wikiq.
groceryheist [Wed, 4 Jul 2018 22:08:30 +0000 (15:08 -0700)]
add tests for wikipedia, malformed xml, bzip2, correct bz2 bug in wikiq.

5 years ago  create baseline tests for xml dump processing
groceryheist [Wed, 4 Jul 2018 06:43:47 +0000 (23:43 -0700)]
create baseline tests for xml dump processing

5 years ago  a number of small updates and fixes
Benjamin Mako Hill [Thu, 17 May 2018 21:37:20 +0000 (14:37 -0700)]
a number of small updates and fixes

- fix regex for filename/filetype matches
- unload all files, not just ones that end with xml, in 7z archives
- fix bug that broke stdout
- minor cosmetic fixes
- updated mediawiki-utilities submodule to latest version

6 years ago  support 7z archives with multiple files. add urlencode parameter
groceryheist [Thu, 7 Dec 2017 23:10:56 +0000 (15:10 -0800)]
support 7z archives with multiple files. add urlencode parameter

7 years ago  fix code to work with bzip files
Benjamin Mako Hill [Tue, 7 Feb 2017 02:25:17 +0000 (18:25 -0800)]
fix code to work with bzip files

8 years ago  added list of compressed dump files to .gitignore
Benjamin Mako Hill [Thu, 23 Jul 2015 19:16:31 +0000 (12:16 -0700)]
added list of compressed dump files to .gitignore

8 years ago  added support to parse namespaces from title
Benjamin Mako Hill [Thu, 23 Jul 2015 19:12:20 +0000 (12:12 -0700)]
added support to parse namespaces from title

This is necessary for wikis (e.g., Wikia XML dumps) that do not include
namespace metadata as tags within each <page>.

8 years ago  added README file to document the submodule
Benjamin Mako Hill [Thu, 23 Jul 2015 02:55:08 +0000 (19:55 -0700)]
added README file to document the submodule

8 years ago  created new repository for wikiq with Mediawiki-Utilities as a submodule
Benjamin Mako Hill [Thu, 23 Jul 2015 02:44:52 +0000 (19:44 -0700)]
created new repository for wikiq with Mediawiki-Utilities as a submodule
