{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Julia with DataFrames and Queryverse on CoCalc\n",
"\n",
"## shell commands - one-time setup\n",
"\n",
"- open a .term file\n",
"```\n",
"JULIA_DEPOT_PATH=\"/home/user/qvtest\" julia-1\n",
"... type \"]\" to enter pkg mode\n",
"pkg> add DataFrames ... takes less than a minute\n",
"pkg> add Queryverse ... takes 3 minutes\n",
"```\n",
"## Jupyter notebook... follow the steps below\n",
"\n",
"- select Julia 1.x Jupyter kernel\n",
"\n",
"Note that `using Queryverse` takes up to 10 minutes the first time, about 30 seconds after that\n",
"\n",
"## References\n",
"\n",
"- YouTube video [Intro to the Queryverse, a Julia data science stack | David Anthoff](https://www.youtube.com/watch?v=OFPNph-WxLM)\n",
"- VegaLite example is from GitHub [VegaLite.jl](https://github.com/fredo-dedup/VegaLite.jl)\n",
"\n",
"\n",
"\\- Hal Snyder"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"v\"1.0.3\""
]
},
"execution_count": 1,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"VERSION"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"/home/user/qvtest\""
]
},
"execution_count": 2,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"DEPOT_PATH[1] = \"/home/user/qvtest\""
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"using Queryverse"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
| name | age | children |
---|
| String | Float64 | Int64 |
---|
3 rows × 3 columns
1 | John | 23.0 | 3 |
---|
2 | Sally | 42.0 | 5 |
---|
3 | Kirk | 59.0 | 2 |
---|
"
]
},
"execution_count": 4,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"df = DataFrame(name=[\"John\", \"Sally\", \"Kirk\"], age=[23., 42., 59.], children=[3,5,2])\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
" | Name |
---|
| String |
---|
1 rows × 1 columns
1 | sally |
---|
"
]
},
"execution_count": 5,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"x = @from i in df begin\n",
" @where i.age>30. && i.children > 2\n",
" @select {Name=lowercase(i.name)}\n",
" @collect DataFrame\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"save(\"mydata.csv\", df)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"fhead (generic function with 2 methods)"
]
},
"execution_count": 15,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"# display first few lines of a text file\n",
"function fhead(fname, lines=4)\n",
" open(fname) do file\n",
" for i in enumerate(eachline(file))\n",
" println(i[2])\n",
" if i[1] > lines\n",
" break\n",
" end\n",
" end\n",
" end\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"name\",\"age\",\"children\"\n",
"\"John\",23.0,3\n",
"\"Sally\",42.0,5\n",
"\"Kirk\",59.0,2\n"
]
}
],
"source": [
"fhead(\"mydata.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"using VegaLite, VegaDatasets"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING: Some output was deleted.\n"
]
}
],
"source": [
"dataset(\"cars\") |>\n",
"@vlplot(\n",
" :point,\n",
" x=:Horsepower,\n",
" y=:Miles_per_Gallon,\n",
" color=:Origin,\n",
" width=400,\n",
" height=400\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"VegaDatasets.VegaDataset"
]
},
"execution_count": 29,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"cars = dataset(\"cars\");\n",
"typeof(cars)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 41,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"# default number of rows when displaying DataFrame\n",
"ENV[\"LINES\"] = 3"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
" | Miles_per_Gallon | Cylinders | Origin | Weight_in_lbs | Displacement | Acceleration | Name | Year | Horsepower |
---|
| Float64⍰ | Int64 | String | Int64 | Float64 | Float64 | String | String | Int64⍰ |
---|
406 rows × 9 columns
1 | 18.0 | 8 | USA | 3504 | 307.0 | 12.0 | chevrolet chevelle malibu | 1970-01-01 | 130 |
---|
2 | 15.0 | 8 | USA | 3693 | 350.0 | 11.5 | buick skylark 320 | 1970-01-01 | 165 |
---|
3 | 18.0 | 8 | USA | 3436 | 318.0 | 11.0 | plymouth satellite | 1970-01-01 | 150 |
---|
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
---|
"
]
},
"execution_count": 42,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"df = DataFrame(cars)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
" | Miles_per_Gallon | Cylinders | Origin | Weight_in_lbs | Displacement | Acceleration | Name | Year | Horsepower |
---|
| Float64⍰ | Int64 | String | Int64 | Float64 | Float64 | String | String | Int64⍰ |
---|
67 rows × 9 columns
1 | 15.0 | 8 | USA | 4341 | 429.0 | 10.0 | ford galaxie 500 | 1970-01-01 | 198 |
---|
2 | 14.0 | 8 | USA | 4354 | 454.0 | 9.0 | chevrolet impala | 1970-01-01 | 220 |
---|
3 | 14.0 | 8 | USA | 4312 | 440.0 | 8.5 | plymouth fury iii | 1970-01-01 | 215 |
---|
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
---|
"
]
},
"execution_count": 35,
"metadata": {
},
"output_type": "execute_result"
}
],
"source": [
"cars |>\n",
" @filter(_.Origin==\"USA\" && _.Weight_in_lbs>4000) |> DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
"cars |>\n",
" @filter(_.Origin==\"USA\" && _.Weight_in_lbs>4000) |>\n",
" save(\"us_heavy_cars.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"Miles_per_Gallon\",\"Cylinders\",\"Origin\",\"Weight_in_lbs\",\"Displacement\",\"Acceleration\",\"Name\",\"Year\",\"Horsepower\"\n",
"15.0,8,\"USA\",4341,429.0,10.0,\"ford galaxie 500\",\"1970-01-01\",198\n",
"14.0,8,\"USA\",4354,454.0,9.0,\"chevrolet impala\",\"1970-01-01\",220\n",
"14.0,8,\"USA\",4312,440.0,8.5,\"plymouth fury iii\",\"1970-01-01\",215\n",
"14.0,8,\"USA\",4425,455.0,10.0,\"pontiac catalina\",\"1970-01-01\",225\n"
]
}
],
"source": [
"fhead(\"us_heavy_cars.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# use Command line Julia and X11 mode for Voyager\n",
"## commands\n",
"1. open .x11 file in CoCalc\n",
"1. in terminal pane (upper left), type the following\n",
" ```\n",
" JULIA_DEPOT_PATH=\"/home/user/qvtest\" julia-1\n",
" ... in julia REPL\n",
" using Queryverse\n",
" using Vegalite, VegaDatasets\n",
" cars = dataset(\"cars\");\n",
" cars |> Voyager()\n",
" ... wait for x11 pane to show data exploration GUI\n",
" ... the X11 interface may be slow, depending on your ping time to CoCalc servers\n",
" ```\n",
"1. UI operations\n",
" 1. Click in \"Data Voyager\" title bar to get pointer focus in that pane\n",
" 1. In Fields menu, hover cursor over \"+\" to right of \"A Cylinders\" until it highlights, then drag into Encoding column, \"x\" value.\n",
" 1. In Fields menu, hover cursor over \"+\" to right of \"# Horsepower\" until it highlights, then drag into Encoding column, \"y\" value.\n",
" 1. Observe display of Horsepower vs. Cylinders.\n",
"1. screen capture\n",
"\n",
"
\n",
"\n",
"1. Watch the YouTube video *Intro to the Queryverse, a Julia data science stack by David Anthoff*\n",
"\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [
],
"source": [
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.0",
"language": "julia",
"name": "julia-1"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.0.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}