code.communitydata.science - cdsc_reddit.git/blobdiff - comments_2_parquet.py
Rename spark script to reflect that it is for comments.
[cdsc_reddit.git] / comments_2_parquet.py
similarity index 98%
rename from reddit_bz2_2parquet.py
rename to comments_2_parquet.py
index 93c3d45c9ed59cb6b7345bb7fdbb4d5f59bf762a..3042f58efae5e4d0545fb7731fccdea0499c23ed 100755 (executable)
@@ -6,7 +6,7 @@ from pyspark.sql.types import *
 from pyspark import SparkConf, SparkContext
 from pyspark.sql import SparkSession, SQLContext
 
-conf = SparkConf().setAppName("Reddit to bz2")
+conf = SparkConf().setAppName("Reddit comments to parquet")
 conf = conf.set('spark.sql.crossJoin.enabled',"true")
 
 spark = SparkSession.builder.getOrCreate()
