# Julia with DataFrames and Queryverse on CoCalc

## shell commands - one-time setup

- Open a .term file and run the following:
```
JULIA_DEPOT_PATH="/home/user/qvtest" julia-1
... type "]" to enter pkg mode
pkg> add DataFrames ... takes less than a minute
pkg> add Queryverse ... takes 3 minutes
```
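The same one-time installation could also be scripted with Julia's `Pkg` standard library instead of typing into pkg mode. This is only a sketch: the helper name `install_notebook_packages` is hypothetical, and it assumes Julia was started with `JULIA_DEPOT_PATH` set as shown above so that the packages land in that depot.

```julia
import Pkg

# Hypothetical helper (not part of the original setup): installs the two
# packages that the pkg-mode commands above add by hand.
function install_notebook_packages(pkgs = ["DataFrames", "Queryverse"])
    Pkg.add(pkgs)   # downloads and installs; needs network access
end
```

Calling `install_notebook_packages()` once from the REPL should have the same effect as the two `add` commands.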
## Jupyter notebook: follow the steps below

- Select the Julia 1.x Jupyter kernel.

Note that `using Queryverse` can take up to 10 minutes the first time it runs and about 30 seconds after that.

## References

- YouTube video [Intro to the Queryverse, a Julia data science stack | David Anthoff](https://www.youtube.com/watch?v=OFPNph-WxLM)
- VegaLite example is from GitHub [VegaLite.jl](https://github.com/fredo-dedup/VegaLite.jl)


\- Hal Snyder

In [1]:
VERSION

v"1.0.3"

In [2]:
DEPOT_PATH[1] = "/home/user/qvtest"

"/home/user/qvtest"
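The cell above overwrites the first entry of `DEPOT_PATH`. A slightly more cautious variant, sketched here as an assumption rather than something the original notebook does, puts the custom depot in front while keeping the existing entries.

```julia
# Path taken from this notebook; adjust to your own project directory.
depot = "/home/user/qvtest"

# Remember the current entries, then put the custom depot first
# without discarding the defaults.
previous = copy(DEPOT_PATH)
if isempty(DEPOT_PATH) || DEPOT_PATH[1] != depot
    pushfirst!(DEPOT_PATH, depot)
end

println(DEPOT_PATH[1])
```

Either way, the change has to happen before `using Queryverse`, so that packages are looked up in the custom depot.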

In [43]:
using Queryverse

In [4]:
df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])


| | name | age | children |
|---|---|---|---|
| | String | Float64 | Int64 |
| 1 | John | 23.0 | 3 |
| 2 | Sally | 42.0 | 5 |
| 3 | Kirk | 59.0 | 2 |


In [5]:
x = @from i in df begin
    @where i.age>30. && i.children > 2
    @select {Name=lowercase(i.name)}
    @collect DataFrame
end

| | Name |
|---|---|
| | String |
| 1 | sally |
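
To make explicit what the query computes, here is a Base-Julia sketch of the same filter and projection over a vector of named tuples. It does not use Query.jl and is not how Query.jl evaluates queries; the names `rows` and `selected` are illustrative only.

```julia
# Same three records as the DataFrame above, as plain named tuples.
rows = [
    (name = "John",  age = 23.0, children = 3),
    (name = "Sally", age = 42.0, children = 5),
    (name = "Kirk",  age = 59.0, children = 2),
]

# Keep the rows matching the @where condition ...
matching = filter(r -> r.age > 30.0 && r.children > 2, rows)

# ... then build the one-field projection that @select describes.
selected = map(r -> (Name = lowercase(r.name),), matching)

println(selected)
```

Only Sally passes both conditions: Kirk is over 30 but has 2 children, which fails `children > 2`.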


In [6]:
save("mydata.csv", df)

In [15]:
# display the first few lines of a text file
# note: the limit is checked after printing, so up to lines + 1 lines are shown
function fhead(fname, lines=4)
    open(fname) do file
        for (n, line) in enumerate(eachline(file))
            println(line)
            if n > lines
                break
            end
        end
    end
end

fhead (generic function with 2 methods)
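
Because `fhead` checks its limit after printing, it can show one line more than `lines`. If an exact limit is wanted, one possible alternative is sketched below with `Iterators.take`. The name `head_lines` and the temporary demo file are hypothetical and not part of the original notebook.

```julia
# Print and return at most `n` lines from the start of a text file.
function head_lines(fname, n = 4)
    lines = open(fname) do io
        collect(Iterators.take(eachline(io), n))
    end
    foreach(println, lines)
    return lines
end

# Demo on a temporary file created just for this example.
path, io = mktemp()
write(io, "l1\nl2\nl3\nl4\nl5\nl6\n")
close(io)
shown = head_lines(path, 4)
```

Returning the lines as well as printing them makes the helper easy to check or reuse.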

In [14]:
fhead("mydata.csv")

"name","age","children"
"John",23.0,3
"Sally",42.0,5
"Kirk",59.0,2


In [16]:
using VegaLite, VegaDatasets

In [17]:
dataset("cars") |>
@vlplot(
    :point,
    x=:Horsepower,
    y=:Miles_per_Gallon,
    color=:Origin,
    width=400,
    height=400
)



In [29]:
cars = dataset("cars");
typeof(cars)

VegaDatasets.VegaDataset

In [41]:
# limit how many rows are shown when a DataFrame is displayed
ENV["LINES"] = 3

3
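
The setting works, as far as I understand, because Julia's `displaysize()` falls back to the `LINES` and `COLUMNS` environment variables when no terminal size is available, and table display uses that size to decide how many rows to show. A small sketch to check the fallback, using `withenv` so the change is only temporary:

```julia
# Temporarily set LINES and read back the number of rows Julia reports.
rows_reported = withenv("LINES" => "3") do
    displaysize()[1]
end

println(rows_reported)
```

Unlike assigning to `ENV["LINES"]` directly, `withenv` restores the previous value when the block finishes.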

In [42]:
df = DataFrame(cars)

| | Miles_per_Gallon | Cylinders | Origin | Weight_in_lbs | Displacement | Acceleration | Name | Year | Horsepower |
|---|---|---|---|---|---|---|---|---|---|
| | Float64⍰ | Int64 | String | Int64 | Float64 | Float64 | String | String | Int64⍰ |
| 1 | 18.0 | 8 | USA | 3504 | 307.0 | 12.0 | chevrolet chevelle malibu | 1970-01-01 | 130 |
| 2 | 15.0 | 8 | USA | 3693 | 350.0 | 11.5 | buick skylark 320 | 1970-01-01 | 165 |
| 3 | 18.0 | 8 | USA | 3436 | 318.0 | 11.0 | plymouth satellite | 1970-01-01 | 150 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |


In [35]:
cars |>
    @filter(_.Origin=="USA" && _.Weight_in_lbs>4000) |> DataFrame

| | Miles_per_Gallon | Cylinders | Origin | Weight_in_lbs | Displacement | Acceleration | Name | Year | Horsepower |
|---|---|---|---|---|---|---|---|---|---|
| | Float64⍰ | Int64 | String | Int64 | Float64 | Float64 | String | String | Int64⍰ |
| 1 | 15.0 | 8 | USA | 4341 | 429.0 | 10.0 | ford galaxie 500 | 1970-01-01 | 198 |
| 2 | 14.0 | 8 | USA | 4354 | 454.0 | 9.0 | chevrolet impala | 1970-01-01 | 220 |
| 3 | 14.0 | 8 | USA | 4312 | 440.0 | 8.5 | plymouth fury iii | 1970-01-01 | 215 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |


In [37]:
cars |>
    @filter(_.Origin=="USA" && _.Weight_in_lbs>4000) |>
    save("us_heavy_cars.csv")

In [38]:
fhead("us_heavy_cars.csv")

"Miles_per_Gallon","Cylinders","Origin","Weight_in_lbs","Displacement","Acceleration","Name","Year","Horsepower"
15.0,8,"USA",4341,429.0,10.0,"ford galaxie 500","1970-01-01",198
14.0,8,"USA",4354,454.0,9.0,"chevrolet impala","1970-01-01",220
14.0,8,"USA",4312,440.0,8.5,"plymouth fury iii","1970-01-01",215
14.0,8,"USA",4425,455.0,10.0,"pontiac catalina","1970-01-01",225


# Use command-line Julia and X11 mode for Voyager
## Commands
1. Open a .x11 file in CoCalc.
1. In the terminal pane (upper left), type the following:
    ```
    JULIA_DEPOT_PATH="/home/user/qvtest" julia-1
    ... in julia REPL
    using Queryverse
    using VegaLite, VegaDatasets
    cars = dataset("cars");
    cars |> Voyager()
    ... wait for x11 pane to show data exploration GUI
    ... the X11 interface may be slow, depending on your ping time to CoCalc servers
    ```
1. UI operations
    1. Click in "Data Voyager" title bar to get pointer focus in that pane
    1. In Fields menu, hover cursor over "+" to right of "A Cylinders" until it highlights, then drag into Encoding column, "x" value.
    1. In Fields menu, hover cursor over "+" to right of "# Horsepower" until it highlights, then drag into Encoding column, "y" value.
    1. Observe display of Horsepower vs. Cylinders.
1. Screen capture

    <img src="voyager-cars.png" width=90%>

1. Watch the YouTube video *Intro to the Queryverse, a Julia data science stack by David Anthoff*

    <iframe width="560" height="315" src="https://www.youtube.com/embed/OFPNph-WxLM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>