code.communitydata.science - mediawiki_dump_tools.git/commit
added counting functionality to regex code
author     Benjamin Mako Hill <mako@atdot.cc>
           Sat, 29 Apr 2023 18:40:03 +0000 (11:40 -0700)
committer  Benjamin Mako Hill <mako@atdot.cc>
           Sat, 29 Apr 2023 18:40:03 +0000 (11:40 -0700)
commit     2ff4d6061399c22eb539f1fd609e7046aa44dba9
tree       d4c9e2c808524558114fe6232c79628238e2fb26
parent     4729371d5ab057b5d028d5ca9b7aeafd7bc40478

The regex code has historically returned the actual text matched by each
pattern, along with the values of any named capture groups. When trying to
count common and/or large patterns, this leads to very large outputs.
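To make the size problem concrete, here is a minimal Python sketch of the
match-returning behavior, assuming each pattern is a compiled `re` pattern
with a label. The function name `extract_matches` and the output layout are
hypothetical illustrations, not wikiq's actual code. Every occurrence of a
pattern adds one entry for the full match and one per named group, so a
common pattern produces long lists.

```python
import re


def extract_matches(label, pattern, text):
    """Hypothetical sketch: return every full match plus each named group's values.

    The output grows with the number of matches in the text.
    """
    matches = list(pattern.finditer(text))
    # One list entry per occurrence of the whole pattern.
    result = {label: [m.group(0) for m in matches]}
    # One extra list per named capture group, again one entry per occurrence.
    for group_name in pattern.groupindex:
        result[f"{label}_{group_name}"] = [m.group(group_name) for m in matches]
    return result


link_pattern = re.compile(r"\[\[(?P<target>[^\]|]+)\]\]")
print(extract_matches("links", link_pattern, "See [[Foo]] and [[Bar]]."))
# {'links': ['[[Foo]]', '[[Bar]]'], 'links_target': ['Foo', 'Bar']}
```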

I've added two new options, -RPc and -CPc, that cause wikiq to return counts
of each pattern instead (0 when there are no matches). The options apply to
all comment or revision patterns. I considered interfaces that would make it
possible to count some patterns but not others, but concluded that this would
be too complicated an interface.
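The all-or-nothing design can be sketched as a single boolean that switches
every pattern from returning matches to returning a count. This is a
hypothetical Python illustration: `apply_patterns` and `count_only` are
invented names standing in for the effect of -RPc / -CPc, and the way wikiq
actually wires those options is not shown here.

```python
import re


def apply_patterns(patterns, text, count_only=False):
    """Hypothetical sketch: apply every labeled regex to text.

    When count_only is True (standing in for -RPc or -CPc), every pattern
    yields an integer count, 0 if it never matches. Otherwise every pattern
    yields its list of matched strings. The flag covers all patterns at
    once; there is no per-pattern choice.
    """
    results = {}
    for label, pattern in patterns.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        results[label] = len(matches) if count_only else matches
    return results


patterns = {
    "links": re.compile(r"\[\[[^\]]+\]\]"),
    "templates": re.compile(r"\{\{[^}]+\}\}"),
}
text = "See [[Foo]] and [[Bar]]."
print(apply_patterns(patterns, text, count_only=True))  # {'links': 2, 'templates': 0}
print(apply_patterns(patterns, text))  # {'links': ['[[Foo]]', '[[Bar]]'], 'templates': []}
```

With counting enabled, each output column holds one small integer per
revision or comment, regardless of how often a pattern occurs.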

This code should be checked before it's merged.
wikiq

Community Data Science Collective