code.communitydata.science - cdsc_reddit.git/commitdiff
Rename spark script to reflect that it is for comments.
author Nate E TeBlunthuis <nathante@n2347.hyak.local>
Fri, 3 Jul 2020 21:00:36 +0000 (14:00 -0700)
committer Nate E TeBlunthuis <nathante@n2347.hyak.local>
Fri, 3 Jul 2020 21:00:36 +0000 (14:00 -0700)
comments_2_parquet.py [moved from reddit_bz2_2parquet.py with 98% similarity]

similarity index 98%
rename from reddit_bz2_2parquet.py
rename to comments_2_parquet.py
index 93c3d45c9ed59cb6b7345bb7fdbb4d5f59bf762a..3042f58efae5e4d0545fb7731fccdea0499c23ed 100755 (executable)
@@ -6,7 +6,7 @@ from pyspark.sql.types import *
 from pyspark import SparkConf, SparkContext
 from pyspark.sql import SparkSession, SQLContext
 
-conf = SparkConf().setAppName("Reddit to bz2")
+conf = SparkConf().setAppName("Reddit comments to parquet")
 conf = conf.set('spark.sql.crossJoin.enabled',"true")
 
 spark = SparkSession.builder.getOrCreate()
