code.communitydata.science - cdsc_reddit.git/log
cdsc_reddit.git
4 years ago update the reddit comment dumps
Nate E TeBlunthuis [Fri, 3 Jul 2020 17:41:13 +0000 (10:41 -0700)]

4 years ago Don't clobber old dumps so that we can just download the new ones.
Nate E TeBlunthuis [Fri, 3 Jul 2020 17:40:43 +0000 (10:40 -0700)]

4 years ago script for getting submissions dumps from pushshift.
Nate E TeBlunthuis [Fri, 3 Jul 2020 00:40:17 +0000 (17:40 -0700)]

4 years ago Extract variables from pushshift comment to parquet
Nate E TeBlunthuis [Thu, 2 Jul 2020 21:06:36 +0000 (14:06 -0700)]
A Spark script.
