code.communitydata.science - mediawiki_dump_tools.git/blobdiff - wikiq_users/run_wikiq_users_cluster.sh
add spark program for running group by users
diff --git a/wikiq_users/run_wikiq_users_cluster.sh b/wikiq_users/run_wikiq_users_cluster.sh
new file mode 100755 (executable)
index 0000000..beca0f9
--- /dev/null
@@ -0,0 +1,2 @@
+#!/usr/bin/env bash
+spark-submit --master  spark://n0649:18899 wikiq_users_spark.py --output-format parquet  -i "/com/output/wikiq-enwiki-20180301/enwiki-20180301-pages-meta-history*.tsv" -o  "/com/output/wikiq-users-enwiki-20180301-parquet/" --num-partitions 500
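
The added script hard-codes the master URL, the input glob, the output directory, and the partition count. As a rough sketch only (not part of this commit), the same invocation could be assembled from overridable variables. The variable names (`MASTER`, `INPUT_GLOB`, `OUTPUT_DIR`, `NUM_PARTITIONS`) and the `DRY_RUN` switch are illustrative assumptions. The flags passed to `wikiq_users_spark.py` are exactly those shown in the diff. The glob stays quoted so the shell passes it through unexpanded, as in the original.

```shell
#!/usr/bin/env bash
# Sketch: the spark-submit call from the diff, with its values pulled into
# variables. All variable names and DRY_RUN are hypothetical additions.
set -euo pipefail

# Defaults mirror the values hard-coded in the committed script.
MASTER="${MASTER:-spark://n0649:18899}"
INPUT_GLOB="${INPUT_GLOB:-/com/output/wikiq-enwiki-20180301/enwiki-20180301-pages-meta-history*.tsv}"
OUTPUT_DIR="${OUTPUT_DIR:-/com/output/wikiq-users-enwiki-20180301-parquet/}"
NUM_PARTITIONS="${NUM_PARTITIONS:-500}"

# Build the command as an array so each argument stays a single word
# and the quoted glob is not expanded by the shell.
build_cmd() {
  CMD=(spark-submit --master "$MASTER" wikiq_users_spark.py
       --output-format parquet
       -i "$INPUT_GLOB" -o "$OUTPUT_DIR"
       --num-partitions "$NUM_PARTITIONS")
}

build_cmd

# DRY_RUN defaults to 1 in this sketch: print one argument per line
# instead of submitting. Set DRY_RUN=0 to actually run spark-submit.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "${CMD[@]}"
else
  "${CMD[@]}"
fi
```

For example, `NUM_PARTITIONS=200 DRY_RUN=0 ./run_wikiq_users_cluster.sh` would submit the job with a different partition count while leaving the other defaults untouched.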
