============================
Wikia user roles scraper
============================

This package provides a pair of Python scripts that obtain data on the roles of MediaWiki users from the Wikia API. It is maintained by Nate TeBlunthuis: nathante@uw.edu.

Usage
=======

The scripts read a list of wikis that have URLs and names. See ``example/wikiList.csv`` for an example wiki list. In this example the list is comma-separated and has a header.

The listusers API provides data on current bots and administrators. The logevents API provides historical data. Both are needed to identify bots or administrators over the entire history of a wiki.

The data can be parsed using the RCommunityData package found at code.communitydata.cc.

The shell scripts ``scrape_log.sh`` and ``scrape_list.sh`` provide examples of how to use the Python programs. The scripts detect and log errors caused by deleted wikis and other cases where the API data is unavailable.

userroles_from_listusers.py
--------------------------------

::

   usage: userroles_from_listusers.py [-h] [--no-header] [--nuke-old]
                                      [--sep SEP] [-i I]
                                      wikilist output

   Get user roles for Wikis from the Mediawiki list users API

   positional arguments:
     wikilist     path to the input file: a wiki list with wiki url filename
     output       path to put the logs we scrape e.g.
                  /com/projects/messagewalls/allusers/

   optional arguments:
     -h, --help   show this help message and exit
     --no-header  does the wikilist have no header?
     --nuke-old   remove old files
     --sep SEP    input table delimiter
     -i I         two 0-based indices for wiki and url in the csv, default=0,1

userroles_from_logevents.py
---------------------------------

::

   usage: userroles_from_logevents.py [-h] [--no-header] [--nuke-old]
                                      [--sep SEP] [-i I]
                                      [--blocks-output BLOCKS_OUTPUT]
                                      wikilist output

   Get user roles for Wikis from the Mediawiki list users API

   positional arguments:
     wikilist              path to the input file: a wiki list with wiki url
                           filename
     output                path to put the logs we scrape e.g.
                           /com/projects/messagewalls/allusers/

   optional arguments:
     -h, --help            show this help message and exit
     --no-header           does the wikilist have no header?
     --nuke-old            remove old files.
     --sep SEP             input table delimiter
     -i I                  two 0-based indices for wiki and url in the csv,
                           default=0,1
     --blocks-output BLOCKS_OUTPUT
                           Path to output block event logs. If empty, blocks
                           are ignored.

License
=========

Copyright (C) 2018 Nathan TeBlunthuis.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the file entitled "fdl-1.3.md".
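
Example: reading a wiki list
==============================

The following is a minimal sketch of how a wiki list like ``example/wikiList.csv`` can be read, assuming only what the usage text states: a delimited file (``--sep``), an optional header row (``--no-header``), and two 0-based column indices for the wiki name and URL (``-i``, default ``0,1``). The function ``read_wiki_list`` is hypothetical and is not the scripts' own code.

```python
import csv


def read_wiki_list(path, sep=",", header=True, indices=(0, 1)):
    """Return (wiki, url) pairs from a delimited wiki list.

    Hypothetical helper: its parameters mirror the documented --sep,
    --no-header and -i options, but it is not taken from the scripts.
    """
    wiki_idx, url_idx = indices
    with open(path, newline="") as f:
        rows = csv.reader(f, delimiter=sep)
        if header:
            next(rows, None)  # drop the header row (None if the file is empty)
        # Skip blank lines and keep only the wiki-name and url columns.
        return [(row[wiki_idx], row[url_idx]) for row in rows if row]
```

A command-line value such as ``-i 0,1`` could be converted to the ``indices`` tuple with ``tuple(int(i) for i in "0,1".split(","))``; passing ``header=False`` corresponds to ``--no-header``.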