This module is a collection of utilities for efficiently processing MediaWiki's
XML database dumps. There are two important concerns that this module intends
to address: *performance* and the *complexity* of streaming XML parsing.
Performance is a serious concern when processing large XML database dumps.
Regrettably, the Global Interpreter Lock prevents us from running threads on
multiple CPUs. This library provides :func:`map`, a function that applies a
dump-processing function to a set of dump files, using :class:`multiprocessing`
to distribute the work over multiple CPUs.
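The distribution pattern behind :func:`map` can be sketched with the standard
library alone. Everything below is illustrative, not this module's actual
implementation: the inline XML strings stand in for real dump files, and
``count_revisions`` is a hypothetical per-dump processing function.

```python
import multiprocessing
import xml.etree.ElementTree as ET

# Tiny inline XML documents standing in for real dump files on disk.
DUMPS = {
    "dump1.xml": "<page><revision/><revision/></page>",
    "dump2.xml": "<page><revision/></page>",
}

def count_revisions(name):
    # One unit of work: process a whole "dump file" in a worker process
    # and return a (name, result) pair.
    root = ET.fromstring(DUMPS[name])
    return name, len(root.findall("revision"))

if __name__ == "__main__":
    # Distribute one dump file per task across worker processes,
    # collecting results as they complete in any order.
    with multiprocessing.Pool(processes=2) as pool:
        totals = dict(pool.imap_unordered(count_revisions, DUMPS))
    print(totals)  # revision counts keyed by dump file name
```

Because each dump file is processed in a separate OS process, the Global
Interpreter Lock does not serialize the work.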
Streaming XML parsing is gross. XML dumps are (1) some site meta data, (2)
a collection of pages that contain (3) collections of revisions. This
module allows you to think about dump files in those terms and ignore the
fact that you're streaming XML. An :class:`Iterator` contains
site meta data and an iterator of :class:`Page`'s. A
:class:`Page` contains page meta data and an iterator of
:class:`Revision`'s. A :class:`Revision` contains revision meta data,
including a :class:`Contributor` (if one was specified in the XML).
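The nesting described above can be sketched with the standard library alone.
This is a self-contained illustration, not this module's API: ``DUMP`` is a
toy inline dump and ``iterate_pages`` is a hypothetical helper that mirrors
the Iterator → Page → Revision structure (the real classes additionally wrap
a streaming parser so a whole dump never has to fit in memory).

```python
import xml.etree.ElementTree as ET

# A toy dump: site meta data, then pages, each containing revisions.
DUMP = """
<mediawiki>
  <siteinfo><sitename>Example Wiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>
"""

def iterate_pages(xml_text):
    # Yield (title, revision_ids) pairs: an iterator of pages, where each
    # page carries its own iterator of revisions.
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        revisions = (int(rev.findtext("id")) for rev in page.iter("revision"))
        yield title, revisions

pages = [(title, list(revs)) for title, revs in iterate_pages(DUMP)]
print(pages)  # [('Foo', [1, 2]), ('Bar', [3])]
```

The consumer loops over pages and, within each page, over revisions, without
ever touching raw XML events.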
from .iteration import *
from .functions import file, open_file