{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Julia with DataFrames and Queryverse on CoCalc\n", "\n", "## shell commands - one-time setup\n", "\n", "- open a .term file\n", "```\n", "JULIA_DEPOT_PATH=\"/home/user/qvtest\" julia-1\n", "... type \"]\" to enter pkg mode\n", "pkg> add DataFrames ... takes less than a minute\n", "pkg> add Queryverse ... takes 3 minutes\n", "```\n", "## Jupyter notebook... follow the steps below\n", "\n", "- select Julia 1.x Jupyter kernel\n", "\n", "Note that `using Queryverse` takes up to 10 minutes the first time, about 30 seconds after that\n", "\n", "## References\n", "\n", "- YouTube video [Intro to the Queryverse, a Julia data science stack | David Anthoff](https://www.youtube.com/watch?v=OFPNph-WxLM)\n", "- VegaLite example is from GitHub [VegaLite.jl](https://github.com/fredo-dedup/VegaLite.jl)\n", "\n", "\n", "\\- Hal Snyder" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "v\"1.0.3\"" ] }, "execution_count": 1, "metadata": { }, "output_type": "execute_result" } ], "source": [ "VERSION" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\"/home/user/qvtest\"" ] }, "execution_count": 2, "metadata": { }, "output_type": "execute_result" } ], "source": [ "DEPOT_PATH[1] = \"/home/user/qvtest\"" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "using Queryverse" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "

3 rows × 3 columns

nameagechildren
StringFloat64Int64
1John23.03
2Sally42.05
3Kirk59.02
" ] }, "execution_count": 4, "metadata": { }, "output_type": "execute_result" } ], "source": [ "df = DataFrame(name=[\"John\", \"Sally\", \"Kirk\"], age=[23., 42., 59.], children=[3,5,2])\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "

1 rows × 1 columns

Name
String
1sally
" ] }, "execution_count": 5, "metadata": { }, "output_type": "execute_result" } ], "source": [ "x = @from i in df begin\n", " @where i.age>30. && i.children > 2\n", " @select {Name=lowercase(i.name)}\n", " @collect DataFrame\n", "end" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "save(\"mydata.csv\", df)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "fhead (generic function with 2 methods)" ] }, "execution_count": 15, "metadata": { }, "output_type": "execute_result" } ], "source": [ "# display first few lines of a text file\n", "function fhead(fname, lines=4)\n", " open(fname) do file\n", " for i in enumerate(eachline(file))\n", " println(i[2])\n", " if i[1] > lines\n", " break\n", " end\n", " end\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"name\",\"age\",\"children\"\n", "\"John\",23.0,3\n", "\"Sally\",42.0,5\n", "\"Kirk\",59.0,2\n" ] } ], "source": [ "fhead(\"mydata.csv\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "using VegaLite, VegaDatasets" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: Some output was deleted.\n" ] } ], "source": [ "dataset(\"cars\") |>\n", "@vlplot(\n", " :point,\n", " x=:Horsepower,\n", " y=:Miles_per_Gallon,\n", " color=:Origin,\n", " width=400,\n", " height=400\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "VegaDatasets.VegaDataset" ] }, "execution_count": 29, "metadata": { }, "output_type": "execute_result" } ], "source": [ "cars = dataset(\"cars\");\n", "typeof(cars)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 41, "metadata": { }, "output_type": "execute_result" } ], "source": [ "# default number of rows when displaying DataFrame\n", "ENV[\"LINES\"] = 3" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "

406 rows × 9 columns

Miles_per_GallonCylindersOriginWeight_in_lbsDisplacementAccelerationNameYearHorsepower
Float64⍰Int64StringInt64Float64Float64StringStringInt64⍰
118.08USA3504307.012.0chevrolet chevelle malibu1970-01-01130
215.08USA3693350.011.5buick skylark 3201970-01-01165
318.08USA3436318.011.0plymouth satellite1970-01-01150
" ] }, "execution_count": 42, "metadata": { }, "output_type": "execute_result" } ], "source": [ "df = DataFrame(cars)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "

67 rows × 9 columns

Miles_per_GallonCylindersOriginWeight_in_lbsDisplacementAccelerationNameYearHorsepower
Float64⍰Int64StringInt64Float64Float64StringStringInt64⍰
115.08USA4341429.010.0ford galaxie 5001970-01-01198
214.08USA4354454.09.0chevrolet impala1970-01-01220
314.08USA4312440.08.5plymouth fury iii1970-01-01215
" ] }, "execution_count": 35, "metadata": { }, "output_type": "execute_result" } ], "source": [ "cars |>\n", " @filter(_.Origin==\"USA\" && _.Weight_in_lbs>4000) |> DataFrame" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ "cars |>\n", " @filter(_.Origin==\"USA\" && _.Weight_in_lbs>4000) |>\n", " save(\"us_heavy_cars.csv\")" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"Miles_per_Gallon\",\"Cylinders\",\"Origin\",\"Weight_in_lbs\",\"Displacement\",\"Acceleration\",\"Name\",\"Year\",\"Horsepower\"\n", "15.0,8,\"USA\",4341,429.0,10.0,\"ford galaxie 500\",\"1970-01-01\",198\n", "14.0,8,\"USA\",4354,454.0,9.0,\"chevrolet impala\",\"1970-01-01\",220\n", "14.0,8,\"USA\",4312,440.0,8.5,\"plymouth fury iii\",\"1970-01-01\",215\n", "14.0,8,\"USA\",4425,455.0,10.0,\"pontiac catalina\",\"1970-01-01\",225\n" ] } ], "source": [ "fhead(\"us_heavy_cars.csv\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# use Command line Julia and X11 mode for Voyager\n", "## commands\n", "1. open .x11 file in CoCalc\n", "1. in terminal pane (upper left), type the following\n", " ```\n", " JULIA_DEPOT_PATH=\"/home/user/qvtest\" julia-1\n", " ... in julia REPL\n", " using Queryverse\n", " using Vegalite, VegaDatasets\n", " cars = dataset(\"cars\");\n", " cars |> Voyager()\n", " ... wait for x11 pane to show data exploration GUI\n", " ... the X11 interface may be slow, depending on your ping time to CoCalc servers\n", " ```\n", "1. UI operations\n", " 1. Click in \"Data Voyager\" title bar to get pointer focus in that pane\n", " 1. In Fields menu, hover cursor over \"+\" to right of \"A Cylinders\" until it highlights, then drag into Encoding column, \"x\" value.\n", " 1. In Fields menu, hover cursor over \"+\" to right of \"# Horsepower\" until it highlights, then drag into Encoding column, \"y\" value.\n", " 1. Observe display of Horsepower vs. Cylinders.\n", "1. screen capture\n", "\n", " \n", "\n", "1. Watch the YouTube video *Intro to the Queryverse, a Julia data science stack by David Anthoff*\n", "\n", " " ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "collapsed": false }, "outputs": [ ], "source": [ ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.0", "language": "julia", "name": "julia-1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.0.3" } }, "nbformat": 4, "nbformat_minor": 0 }