Description: This is a SageMath Worksheet that accompanies the Apache Spark Databricks notebook in Scala. It is for understanding K-Means clustering on a small sample of the 1M Songs data used in the course on Scalable Data Engineering Science, which is freely available from https://lamastex.github.io/scalable-data-science/sds/2/x/

Author: Raazesh Sainudiin

This is a support notebook in SageMath (actually a Worksheet) for the Scalable Data Science course. It is mostly used as a visual cognitive tool.

SageMath is perhaps the largest open-source effort in mathematical computing, and you can use it for serious mathematical work. See:

- the documentation at http://doc.sagemath.org/html/en/index.html,
- the FAQ at http://doc.sagemath.org/html/en/faq/index.html for why you might want to use SageMath for your own research (it is Python-based), and
- finally CoCalc, which this worksheet runs on. It is free for light workloads, so you can do your homework and research and collaborate with your colleagues here.

For the plotting we do next, see the docs here: https://doc.sagemath.org/html/en/reference/plot3d/sage/plot/plot3d/shapes2.html

For 3D interactive visualization possibilities, see http://sagemath.wikispaces.com/point3d and http://sagemath.wikispaces.com/plot3d (and the 10-minute YouTube video linked there).
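A minimal sketch of preparing the clustered data for `point3d`, assuming each row has already been parsed into a `(prediction, loudness, tempo, log_duration)` tuple (the column order of the CSV file shown below). The helper names `points_by_cluster` and `plot_clusters_3d` and the color choice are hypothetical; only `point3d` itself, with its `color` and `size` options, is SageMath's own function. It is imported inside the plotting helper, so the grouping part also runs in plain Python.

```python
from collections import defaultdict

# A few rows from the sample in this worksheet:
# (prediction, loudness, tempo, log_duration)
SAMPLE_ROWS = [
    (0, -11.422, 113.924, 5.715779455566171),
    (1, -9.086, 149.709, 5.128421707524712),
    (1, -12.934, 134.957, 5.246686488869869),
    (0, -8.849, 96.006, 5.337347484616152),
]

# Hypothetical choice: one color per cluster index.
CLUSTER_COLORS = {0: 'red', 1: 'blue'}


def points_by_cluster(rows):
    """Group rows into {cluster index: [(loudness, tempo, log_duration), ...]}."""
    groups = defaultdict(list)
    for prediction, loudness, tempo, log_duration in rows:
        groups[prediction].append((loudness, tempo, log_duration))
    return dict(groups)


def plot_clusters_3d(groups, size=5):
    """Overlay one point3d layer per cluster (requires SageMath)."""
    from sage.all import point3d  # only importable inside SageMath
    scene = None
    for cluster, pts in sorted(groups.items()):
        layer = point3d(pts, color=CLUSTER_COLORS.get(cluster, 'black'), size=size)
        scene = layer if scene is None else scene + layer  # combine 3D graphics
    return scene


groups = points_by_cluster(SAMPLE_ROWS)
# In a SageMath worksheet: plot_clusters_3d(groups).show()
```

Drawing one layer per cluster keeps the color-to-cluster mapping explicit, so the K-Means assignment is visible directly in the 3D scatter plot.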

For how to plot data read from a file, see https://ask.sagemath.org/question/9393/how-to-plot-data-from-a-file/.

The CSV file was downloaded from the results displayed in the Databricks notebook at https://lamastex.github.io/scalable-data-science/sds/2/2/.

The first 10 lines of the file look like this:

```
prediction,loudness,tempo,log_duration
0,-11.422,113.924,5.715779455566171
1,-9.086,149.709,5.128421707524712
1,-12.934,134.957,5.246686488869869
1,-6.552,130.152,5.700695916131519
0,-8.849,96.006,5.337347484616152
0,-20.277,100.777,4.290263120508892
0,-7.877,109.267,5.0043102665494
0,-5.989,114.493,5.630042029827832
0,-11.66,125.022,6.007690245599277
```
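The `prediction` column holds the index of the cluster that the K-Means model assigned each song to. A minimal sketch of one K-Means (Lloyd) iteration in plain Python on the nine sample rows above. It assumes the features are the unscaled `loudness`, `tempo` and `log_duration` columns. The two starting centroids are hypothetical values picked by hand, not the centroids of the Spark model. With them, the assignment step happens to reproduce the sample's `prediction` column.

```python
# Sample rows from the file: (prediction, loudness, tempo, log_duration)
SAMPLE = [
    (0, -11.422, 113.924, 5.715779455566171),
    (1, -9.086, 149.709, 5.128421707524712),
    (1, -12.934, 134.957, 5.246686488869869),
    (1, -6.552, 130.152, 5.700695916131519),
    (0, -8.849, 96.006, 5.337347484616152),
    (0, -20.277, 100.777, 4.290263120508892),
    (0, -7.877, 109.267, 5.0043102665494),
    (0, -5.989, 114.493, 5.630042029827832),
    (0, -11.66, 125.022, 6.007690245599277),
]

# Hypothetical starting centroids in (loudness, tempo, log_duration) space.
CENTROIDS = [(-10.0, 105.0, 5.3), (-9.0, 150.0, 5.4)]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points]


def update(points, labels, k):
    """Update step: each new centroid is the mean of the points assigned to it."""
    centroids = []
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        dim = len(members[0])  # assumes no cluster ends up empty
        centroids.append(tuple(sum(p[i] for p in members) / float(len(members))
                               for i in range(dim)))
    return centroids


points = [row[1:] for row in SAMPLE]  # drop the prediction column
labels = assign(points, CENTROIDS)
new_centroids = update(points, labels, len(CENTROIDS))
print(labels)  # compare with the prediction column
```

A full run repeats the assign and update steps until the labels stop changing. Spark's `KMeans` also picks its own initial centroids (k-means|| by default) instead of using hand-picked ones.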

The file, which has 1000 rows, has been uploaded to this SageMath Worksheet in CoCalc. It is in the current directory, at the path given to the Python `open()` function below.
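A minimal sketch of reading the file with Python's `csv` module. The function `read_clustered_songs` and the file name `clustered_songs.csv` are hypothetical stand-ins, because the worksheet's actual code and file name are not shown here. The function accepts any iterable of lines, so it works both on an open file and on a list of strings.

```python
import csv


def read_clustered_songs(lines):
    """Parse CSV lines with header prediction,loudness,tempo,log_duration
    into a list of (prediction, loudness, tempo, log_duration) tuples."""
    rows = []
    for record in csv.DictReader(lines):  # the header row supplies the keys
        rows.append((int(record['prediction']),
                     float(record['loudness']),
                     float(record['tempo']),
                     float(record['log_duration'])))
    return rows


# Hypothetical usage on the uploaded file:
# with open('clustered_songs.csv') as f:
#     songs = read_clustered_songs(f)

sample_lines = [
    "prediction,loudness,tempo,log_duration",
    "0,-11.422,113.924,5.715779455566171",
    "1,-9.086,149.709,5.128421707524712",
]
print(read_clustered_songs(sample_lines))
```

Looking columns up by header name rather than by position keeps the parser correct even if the column order of an exported file changes.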

[(-11.422, 113.924), (-8.849, 96.006), (-20.277, 100.777), (-7.877, 109.267), (-5.989, 114.493), (-11.66, 125.022), (-14.1, 90.442)]
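The first six pairs in the output above are the (`loudness`, `tempo`) values of the rows with `prediction` 0 in the sample shown earlier. A sketch of that kind of selection follows. It assumes rows are parsed as `(prediction, loudness, tempo, log_duration)` tuples, and `cluster_pairs` is a hypothetical helper name.

```python
def cluster_pairs(rows, cluster):
    """(loudness, tempo) pairs of all rows assigned to the given cluster."""
    return [(loudness, tempo)
            for prediction, loudness, tempo, _ in rows
            if prediction == cluster]


# The nine sample rows: (prediction, loudness, tempo, log_duration)
SAMPLE = [
    (0, -11.422, 113.924, 5.715779455566171),
    (1, -9.086, 149.709, 5.128421707524712),
    (1, -12.934, 134.957, 5.246686488869869),
    (1, -6.552, 130.152, 5.700695916131519),
    (0, -8.849, 96.006, 5.337347484616152),
    (0, -20.277, 100.777, 4.290263120508892),
    (0, -7.877, 109.267, 5.0043102665494),
    (0, -5.989, 114.493, 5.630042029827832),
    (0, -11.66, 125.022, 6.007690245599277),
]

print(cluster_pairs(SAMPLE, 0))
# [(-11.422, 113.924), (-8.849, 96.006), (-20.277, 100.777),
#  (-7.877, 109.267), (-5.989, 114.493), (-11.66, 125.022)]
```

In SageMath, a list of (x, y) tuples like this can be passed to `points` or `list_plot` for a 2D scatter plot of one cluster.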

(-52.781, 76.42, 6.292396371381782)
(-32.349, 134.157, 5.87741203189931)

(-3.069, 95.323, 5.554721829705378)
(-2.385, 209.986, 5.201678098897631)

[Interactive 3D plots: not rendered in this export]

To manipulate the rendering of the interactive 3D plot above, uncomment the line, put the cursor after the '.', and hit TAB to see the available methods.
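Outside an interactive session, Python's built-in `dir()` is a rough, non-interactive counterpart to TAB completion. A minimal sketch, where `public_methods` is a hypothetical helper that keeps only the public callable attributes:

```python
def public_methods(obj):
    """Sorted names of obj's public (non-underscore) callable attributes."""
    return sorted(name for name in dir(obj)
                  if not name.startswith('_') and callable(getattr(obj, name)))


# Example on a plain Python list; in a worksheet, pass the plot object instead.
print(public_methods([]))
```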

Also, don't forget the SageMath docs at http://doc.sagemath.org/html/en/index.html. Sage covers arithmetic, geometry, cryptography, calculus, and much more, and CoCalc is free for small learning workloads.