============================
Wikia user roles scraper
============================
This package provides a pair of Python scripts that obtain data on the roles of MediaWiki users from the Wikia API. It is maintained by Nate TeBlunthuis: nathante@uw.edu.
The scripts read a list of wikis with their URLs and names. See example/wikiList.csv for an example wiki list; in this example the list is comma-separated and has a header. The listusers API provides data on current bots or administrators. The logevents API provides historical data. Both are needed to identify bots or administrators for the entire history of a wiki. The data can be parsed using the RCommunityData package found at code.communitydata.cc.
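As a rough illustration of the inputs described above, the sketch below reads a wiki list with Python's csv module and builds query parameters for the two kinds of API request. The function names (read_wiki_list, listusers_params, logevents_params), the column order, and the choice of the standard MediaWiki list=allusers and list=logevents query modules are assumptions made for illustration; the scripts in this package may use a different endpoint or different parameters.

```python
import csv
import io


def read_wiki_list(handle, sep=",", header=True, indices=(0, 1)):
    """Yield (wiki, url) pairs from an open wiki-list file.

    `indices` are the 0-based columns holding the wiki name and its URL.
    """
    reader = csv.reader(handle, delimiter=sep)
    if header:
        next(reader, None)  # skip the header row if there is one
    wiki_idx, url_idx = indices
    for row in reader:
        if not row:  # ignore blank lines
            continue
        yield row[wiki_idx], row[url_idx]


def listusers_params(group):
    """Parameters for a current-membership query (assumed: list=allusers)."""
    return {
        "action": "query",
        "list": "allusers",
        "augroup": group,  # e.g. "bot" or "sysop"
        "aulimit": "max",
        "format": "json",
    }


def logevents_params(log_type="rights"):
    """Parameters for a historical log query (assumed: list=logevents)."""
    return {
        "action": "query",
        "list": "logevents",
        "letype": log_type,  # "rights" entries record user group changes
        "lelimit": "max",
        "format": "json",
    }


# Hypothetical wiki list, not the contents of example/wikiList.csv.
sample = io.StringIO("wiki,url\nexamplewiki,http://example.wikia.com\n")
wikis = list(read_wiki_list(sample))
```

The current-membership query only sees who holds a role today, while the rights log records each change, which is why the two sources complement each other.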
The shell scripts scrape_log.sh and scrape_list.sh provide examples of how to use the Python programs.
The scripts detect and log errors caused by deleted wikis and other cases where the API data is unavailable.
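One way such failures can be detected is by checking whether a response parses as JSON and whether it carries the error object that the MediaWiki API uses to report problems. The sketch below is an assumption about how this could be done, not the logic used by the scripts; classify_response and its status labels are hypothetical.

```python
import json
import logging

logger = logging.getLogger("userroles_sketch")


def classify_response(wiki, status_code, body):
    """Return "ok", "http_error", "not_json" or "api_error" for one response.

    Problems are logged with the wiki name so failed wikis can be reviewed later.
    """
    if status_code != 200:
        logger.warning("%s: HTTP status %s", wiki, status_code)
        return "http_error"
    try:
        data = json.loads(body)
    except ValueError:
        # e.g. an HTML page served in place of API output for a deleted wiki
        logger.warning("%s: response is not JSON", wiki)
        return "not_json"
    if isinstance(data, dict) and "error" in data:
        error = data["error"]
        code = error.get("code") if isinstance(error, dict) else error
        logger.warning("%s: API error %s", wiki, code)
        return "api_error"
    return "ok"
```

Returning a label instead of raising lets a scraping loop record the failure and continue with the next wiki in the list.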
userroles_from_listusers.py
--------------------------------
    usage: userroles_from_listusers.py [-h] [--no-header] [--nuke-old] [--sep SEP]

    Get user roles for Wikis from the Mediawiki list users API

    wikilist      path to the input file: a wiki list with wiki url filename
    output        path to put the logs we scrape e.g.
                  /com/projects/messagewalls/allusers/

    -h, --help    show this help message and exit
    --no-header   does the wikilist have no header?
    --nuke-old    remove old files
    --sep SEP     input table delimiter
    -i I          <j,k> two 0-based indices for wiki and url in the csv,
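The options listed above could be declared with argparse roughly as follows. This is a reconstruction from the help text, not the script's source; the parse_indices helper, the shortened help strings, and the defaults chosen for --sep and -i are assumptions.

```python
import argparse


def parse_indices(text):
    """Turn a string such as "0,1" into a pair of 0-based column indices."""
    parts = text.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError("expected two indices, e.g. 0,1")
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError("indices must be integers")


def build_parser():
    parser = argparse.ArgumentParser(
        prog="userroles_from_listusers.py",
        description="Get user roles for Wikis from the Mediawiki list users API",
    )
    parser.add_argument("wikilist", help="path to the input file")
    parser.add_argument("output", help="path to put the logs we scrape")
    parser.add_argument("--no-header", action="store_true",
                        help="does the wikilist have no header?")
    parser.add_argument("--nuke-old", action="store_true",
                        help="remove old files")
    # Assumed default: a comma, matching example/wikiList.csv.
    parser.add_argument("--sep", default=",", help="input table delimiter")
    # Assumed default: wiki name in column 0, url in column 1.
    parser.add_argument("-i", type=parse_indices, default=(0, 1),
                        help="<j,k> two 0-based indices for wiki and url")
    return parser


# Example: a tab-separated list without a header, with the url column first.
args = build_parser().parse_args(
    ["--no-header", "--sep", "\t", "-i", "1,0", "wikis.tsv", "out/"]
)
```

Parsing the index pair in a type function means a malformed value is rejected by argparse with a usage message before any scraping starts.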
userroles_from_logevents.py
---------------------------------
    usage: userroles_from_logevents.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                       [-i I] [--blocks-output BLOCKS_OUTPUT]

    Get user roles for Wikis from the Mediawiki list users API

    wikilist      path to the input file: a wiki list with wiki url
    output        path to put the logs we scrape e.g.
                  /com/projects/messagewalls/allusers/

    -h, --help    show this help message and exit
    --no-header   does the wikilist have no header?
    --nuke-old    remove old files.
    --sep SEP     input table delimiter
    -i I          <j,k> two 0-based indices for wiki and url in the csv,
    --blocks-output BLOCKS_OUTPUT
                  Path to output block event logs. If empty, blocks are
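Since block events and rights events both come from the log, one plausible arrangement is to separate entries by their type field before writing them out. The sketch below assumes entries shaped like the items MediaWiki returns under query.logevents; split_log_events and the sample entries are hypothetical and not taken from the script.

```python
def split_log_events(events):
    """Separate log entries into (rights_events, block_events) by their type."""
    rights, blocks = [], []
    for event in events:
        kind = event.get("type")
        if kind == "rights":
            rights.append(event)
        elif kind == "block":
            blocks.append(event)
        # entries of any other type are ignored in this sketch
    return rights, blocks


# Made-up entries for illustration only.
sample_events = [
    {"type": "rights", "title": "User:Alice", "timestamp": "2015-01-01T00:00:00Z"},
    {"type": "block", "title": "User:Mallory", "timestamp": "2015-02-01T00:00:00Z"},
    {"type": "delete", "title": "Some page", "timestamp": "2015-03-01T00:00:00Z"},
]
rights_events, block_events = split_log_events(sample_events)
```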
Copyright (C) 2018 Nathan TeBlunthuis.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the file entitled "fdl-1.3.md".