Kernel: Python 3 (system-wide)

PySpark on CoCalc

Run Spark locally to learn about its API...

import sys
sys.version
'3.6.9 (default, Apr 18 2020, 01:56:04) \n[GCC 8.4.0]'
import os, sys
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['SPARK_HOME'] = '/ext/spark/default'
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-1.8.0-openjdk-amd64'
sys.path.insert(0, os.environ['SPARK_HOME'] + '/python')
import pyspark
pyspark.__version__
'2.4.5'
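The paths set above are specific to the CoCalc environment. As a hedged sketch, a small helper can report which of the configured variables do not point at an existing directory before `pyspark` is imported. The name `missing_dirs` is hypothetical and not part of the original notebook.

```python
import os

def missing_dirs(names, env=os.environ):
    """Return the names whose value is unset or is not an existing directory."""
    # `missing_dirs` is an illustrative helper, not a PySpark function.
    return [n for n in names if not os.path.isdir(env.get(n, ""))]

# SPARK_HOME and JAVA_HOME are the variables set above;
# an empty list means both point at existing directories.
print(missing_dirs(["SPARK_HOME", "JAVA_HOME"]))
```

If the list is not empty, adjust the corresponding paths for your own installation.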
sc = pyspark.SparkContext('local')
sc.range(100).filter(lambda x : (x+1) % 7 == 0).collect()
[6, 13, 20, 27, 34, 41, 48, 55, 62, 69, 76, 83, 90, 97]
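As a cross-check that does not need Spark, the same filter can be written in plain Python. This is only a sketch for comparison; the name `multiples` is illustrative and does not appear in the original notebook.

```python
# Plain-Python equivalent of sc.range(100).filter(...).collect().
# Keeps every x in 0..99 for which x + 1 is divisible by 7.
multiples = [x for x in range(100) if (x + 1) % 7 == 0]
print(multiples)
```

The list should match the Spark result above, which is a quick way to confirm that the `filter` predicate does what was intended.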
fn = 'spark-data.txt'
%%sh
cat <<EOF > 'spark-data.txt'
9
2
3
-12
49
2
EOF
! cat $fn
9
2
3
-12
49
2
sc.textFile(fn).map(int).sum()
53
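`textFile` yields one string per line, `map(int)` parses each line, and `sum()` adds the results. A minimal Spark-free sketch of the same computation, assuming the file holds one integer per line, looks like this. The temporary path and the names `values`, `path`, and `total` are illustrative only.

```python
import os
import tempfile

# Recreate the six-line data file in a temporary directory
# (illustrative path; the notebook itself uses 'spark-data.txt').
values = [9, 2, 3, -12, 49, 2]
path = os.path.join(tempfile.mkdtemp(), "spark-data.txt")
with open(path, "w") as f:
    f.write("\n".join(str(v) for v in values) + "\n")

# Equivalent of sc.textFile(fn).map(int).sum(): parse each line, then add.
with open(path) as f:
    total = sum(int(line) for line in f)
print(total)
```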