code.communitydata.science - cdsc_reddit.git/commit
Extract variables from pushshift comment to parquet
author Nate E TeBlunthuis <nathante@n2347.hyak.local>
Thu, 2 Jul 2020 21:06:36 +0000 (14:06 -0700)
committer Nate E TeBlunthuis <nathante@mox2.hyak.local>
Thu, 2 Jul 2020 21:35:55 +0000 (14:35 -0700)
commit 64e9408a65a5b40c4a62a904017dec14c80cc77f
tree 6efdd2e913d9645ff4283f29e578b83436b897fe
Extract variables from pushshift comment to parquet

A Spark script.
reddit_bz2_2parquet.py [new file with mode: 0755]
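
The commit page does not show the script body, so the following is only a rough sketch of the extraction step the commit title describes, not the contents of reddit_bz2_2parquet.py. It uses the Python standard library instead of Spark, and it stops before the Parquet write. The field list `FIELDS` and the helper names `extract_comment` and `read_comments` are hypothetical, chosen for illustration; the commit does not say which variables are extracted.

```python
import bz2
import json
from typing import Iterator

# Hypothetical subset of comment fields; the commit does not list
# which variables the real script keeps.
FIELDS = ("id", "author", "subreddit", "created_utc", "score", "body")


def extract_comment(line: str) -> dict:
    """Parse one JSON-encoded comment and keep only FIELDS.

    Keys missing from the record are returned as None so that every
    row has the same columns.
    """
    record = json.loads(line)
    return {name: record.get(name) for name in FIELDS}


def read_comments(path: str) -> Iterator[dict]:
    """Yield extracted rows from a bz2-compressed, newline-delimited JSON file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield extract_comment(line)
```

In a Spark job the same per-line function could be mapped over the lines of the input files, with the resulting rows written out as Parquet; that part is omitted here because the page gives no detail about it.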
