cdsc_reddit.git
11 days ago  Nathan TeBlunthuis  refactor visualization code.  [master]
11 days ago  Nathan TeBlunthuis  Merge remote-tracking branch 'refs/remotes/origin/maste...  [synced/master]
11 days ago  Nathan TeBlunthuis  git-annex in nathante@nate-x1:~/cdsc_reddit
11 days ago  Nate E TeBlunthuis  git-annex in nathante@mox2.hyak.local:/gscratch/comdata...
11 days ago  Nate E TeBlunthuis  Update code for clustering + tsne.
11 days ago  Nate E TeBlunthuis  Update code for building similarity matrices.
2020-11-12   Nate E TeBlunthuis  bugfix in completing tfidf similarity matrices.
2020-11-12   Nate E TeBlunthuis  increase learning rate.
2020-11-12   Nate E TeBlunthuis  increase iterations, perplexity, and early_exaggeration
2020-11-12   Nate E TeBlunthuis  increase learning rate
2020-11-12   Nate E TeBlunthuis  Fix bug in tsne.
2020-11-12   Nate E TeBlunthuis  git-annex in nathante@mox2.hyak.local:/gscratch/comdata...
2020-11-12   Nate E TeBlunthuis  split fitting and plotting tsne.
2020-11-12   Nathan TeBlunthuis  Add file to plot related subreddits using tsne.
2020-11-10   Nate E TeBlunthuis  Bugfix (typo)
2020-11-10   Nate E TeBlunthuis  Reuse code for term and author cosine similarity.
2020-11-10   Nate E TeBlunthuis  Refactor tfidf code for code reuse.
2020-11-10   Nate E TeBlunthuis  rename 'idf' files to 'tfidf'
2020-11-10   Nate E TeBlunthuis  Improvements to idf code
2020-11-02   Nate E TeBlunthuis  Merge branch 'master' of code:cdsc_reddit
2020-11-02   Nate E TeBlunthuis  add term_cosine_similarity.py
2020-11-02   Nathan TeBlunthuis  Add Cosine similarities to README.md
2020-11-02   Nathan TeBlunthuis  Update Readme.
2020-11-02   Nathan TeBlunthuis  Merge branch 'master' of code:cdsc_reddit into master
2020-11-02   Nathan TeBlunthuis  Create README.md
2020-10-03   Nate E TeBlunthuis  Update reddit comments data with daily dumps.
2020-08-23   Nate E TeBlunthuis  Compute IDF for terms and authors.
2020-08-12   Nate E TeBlunthuis  Update submissions to parse using the backfill queue.
2020-08-11   Nate E TeBlunthuis  bugfix in checking submission shas
2020-08-10   Nate E TeBlunthuis  Use multiword expressions in tf.
2020-08-10   Nate E TeBlunthuis  Finish generating multiword expressions.
2020-08-09   Nate E TeBlunthuis  Bugfix
2020-08-09   Nate E TeBlunthuis  Use groupby - joins instead of windows
2020-08-04   Nate E TeBlunthuis  rename tf_comments part 2.
2020-08-04   Nate E TeBlunthuis  rename tf_reddit_comments.py step1.
2020-08-04   Nate E TeBlunthuis  Improve tokenization following data. Generate author...
2020-08-04   Nate E TeBlunthuis  improve tokenizer.
2020-08-04   Nate E TeBlunthuis  TF reddit comments.
2020-08-04   Nate E TeBlunthuis  code to sort tf
2020-07-10   Nate E TeBlunthuis  remove is_submitter field from submissions which doesn...
2020-07-08   Nate E TeBlunthuis  Bugfixes in scripts.
2020-07-07   Nate E TeBlunthuis  clean up comments in streaming example.
2020-07-07   Nate E TeBlunthuis  update .gitignore
2020-07-07   Nate E TeBlunthuis  update examples with working streaming
2020-07-07   Nate E TeBlunthuis  Build comments dataset similarly to submissions and...
2020-07-07   Nate E TeBlunthuis  update .gitignore
2020-07-07   Nate E TeBlunthuis  Script for example of streaming pyarrow.
2020-07-07   Nate E TeBlunthuis  Script to demonstrate reading parquet.
2020-07-07   Nate E TeBlunthuis  Check the shas when we download dumps
2020-07-07   Nate E TeBlunthuis  Script to run both parts of submissions_2_parquet.sh
2020-07-07   Nate E TeBlunthuis  Cache before sorting so we don't extract twice.
2020-07-07   Nate E TeBlunthuis  Move the spark part of submissions_2_parquet to a separ...
2020-07-06   Nate E TeBlunthuis  Fix whitespace at top of file.
2020-07-06   Nate E TeBlunthuis  Secondary sort for the by_author dataset should be...
2020-07-06   Nate E TeBlunthuis  Create a second dataset sorted by author.
2020-07-06   Nate E TeBlunthuis  Create parquet datasets of reddit submissions from...
2020-07-03   Nate E TeBlunthuis  Rename spark script to reflect that it is for comments.
2020-07-03   Nate E TeBlunthuis  update .gitignore
2020-07-03   Nate E TeBlunthuis  bugfix in retrieving old data and rename file.
2020-07-03   Nate E TeBlunthuis  Script for checking shas for submissions.
2020-07-03   Nate E TeBlunthuis  Bugfix: use timestamp types
2020-07-03   Nate E TeBlunthuis  update the reddit comment dumps
2020-07-03   Nate E TeBlunthuis  Don't clobber old dumps so that we can just download...
2020-07-03   Nate E TeBlunthuis  script for getting submissions dumps from pushshift.
2020-07-02   Nate E TeBlunthuis  Extract variables from pushshift comment to parquet

Community Data Science Collective