code.communitydata.science - cdsc_reddit.git/tree
improve tokenizer.
-rw-r--r-- 75 .gitignore
-rw-r--r-- 921 check_comments_shas.py
-rwxr-xr-x 1052 check_submission_shas.py
-rwxr-xr-x 327 comments_2_parquet.sh
-rwxr-xr-x 3349 comments_2_parquet_part1.py
-rwxr-xr-x 1390 comments_2_parquet_part2.py
drwxr-xr-x - examples
-rw-r--r-- 1508 helper.py
-rwxr-xr-x 459 pull_pushshift_comments.sh
-rwxr-xr-x 759 pull_pushshift_submissions.sh
-rw-r--r-- 503 sort_tf_comments.py
-rw-r--r-- 329 submissions_2_parquet.sh
-rwxr-xr-x 4007 submissions_2_parquet_part1.py
-rw-r--r-- 1814 submissions_2_parquet_part2.py
-rw-r--r-- 2674 tf_reddit_comments.py

Community Data Science Collective