code.communitydata.science - mediawiki_dump_tools.git/commitdiff
added counting functionality to regex code
author Benjamin Mako Hill <mako@atdot.cc>
Sat, 29 Apr 2023 18:40:03 +0000 (11:40 -0700)
committer Benjamin Mako Hill <mako@atdot.cc>
Sat, 29 Apr 2023 18:40:03 +0000 (11:40 -0700)
The regex code has historically returned the actual matched patterns and the
named capture groups within regexes.  When trying to count common and/or large
patterns, this leads to very large outputs.

I've added two new options, -RPc and -CPc, that cause wikiq to return counts
of each pattern (0 when there are no matches). The options apply to all
comment or revision patterns. I considered interfaces that would make it
possible to count some patterns but not others, but concluded this would be
too complicated an interface.
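
As a rough illustration of the counting behavior described above, here is a
minimal Python sketch. It is not the wikiq implementation: the function name
`count_matches`, the labels, and the sample patterns are all hypothetical, and
it only assumes that each pattern is a labeled regular expression applied to a
piece of text.

```python
import re


def count_matches(patterns, text):
    """Map each (hypothetical) pattern label to its number of matches in text.

    Patterns with no matches are reported as 0 instead of being omitted.
    """
    return {
        label: sum(1 for _ in re.finditer(regex, text))
        for label, regex in patterns.items()
    }


# Hypothetical example patterns, chosen only for illustration.
example_patterns = {
    "revert": r"\brevert(?:ed|s)?\b",
    "url": r"https?://\S+",
}

counts = count_matches(example_patterns, "revert vandalism; reverted again")
print(counts)
```

Returning one integer per pattern keeps the output size fixed no matter how
often or how long the matches are, which is the motivation given above.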

This code should be checked before it's merged.


No differences found
