code.communitydata.science - cdsc_reddit.git/shortlog
cdsc_reddit.git
2020-07-07  Nate E TeBlunthuis  Build comments dataset similarly to submissions and...
2020-07-07  Nate E TeBlunthuis  update .gitignore
2020-07-07  Nate E TeBlunthuis  Script for example of streaming pyarrow.
2020-07-07  Nate E TeBlunthuis  Script to demonstrate reading parquet.
2020-07-07  Nate E TeBlunthuis  Check the shas when we download dumps
2020-07-07  Nate E TeBlunthuis  Script to run both parts of submissions_2_parquet.sh
2020-07-07  Nate E TeBlunthuis  Cache before sorting so we don't extract twice.
2020-07-07  Nate E TeBlunthuis  Move the spark part of submissions_2_parquet to a separ...
2020-07-06  Nate E TeBlunthuis  Fix whitespace at top of file.
2020-07-06  Nate E TeBlunthuis  Secondary sort for the by_author dataset should be...
2020-07-06  Nate E TeBlunthuis  Create a second dataset sorted by author.
2020-07-06  Nate E TeBlunthuis  Create parquet datasets of reddit submissions from...
2020-07-03  Nate E TeBlunthuis  Rename spark script to reflect that it is for comments.
2020-07-03  Nate E TeBlunthuis  update .gitignore
2020-07-03  Nate E TeBlunthuis  bugfix in retrieving old data and rename file.
2020-07-03  Nate E TeBlunthuis  Script for checking shas for submissions.
2020-07-03  Nate E TeBlunthuis  Bugfix: use timestamp types
2020-07-03  Nate E TeBlunthuis  update the reddit comment dumps
2020-07-03  Nate E TeBlunthuis  Don't clobber old dumps so that we can just download...
2020-07-03  Nate E TeBlunthuis  script for getting submissions dumps from pushshift.
2020-07-02  Nate E TeBlunthuis  Extract variables from pushshift comment to parquet
