A tale of two packages, that is, a parable of two perspectives…
A short (maybe misinformed) history
Data visualization and plotting in Julia has had a bit of a mottled history¹. Unlike the scientific Python community, which had an early, publication-quality incumbent in matplotlib, Julia visualization development has been more ad hoc and grassroots.
In the early days (I believe), the first libraries were simply wrappers around existing libraries², since the Julia ecosystem as a whole was very young. In this pre-Cambrian-explosion world PyPlot.jl ruled and, being an interface to matplotlib, it could do everything the Pythonistas could do. However, as other essential statistical and mathematical packages matured, space opened up to build out the visualization side of the package ecosystem.
But the ease of creating and distributing packages in Julia, combined with the rapidly growing and enthusiastic Julia community, led to a multiplicity of new libraries. Many of these were created to fill a scientific visualization niche³, and they still do. With this sort of proliferation, though, there was naturally some overlap for common, general-purpose visualization tasks, and it was confusing and overwhelming for a newcomer to know which package is best suited to your goals (if you don’t fall into one of those existing niches³).
In this post we will look at (what seem to be) the two frontrunners for general data visualization and plotting. Each package has a unique approach, and I will highlight things to consider when choosing a data visualization tool in general. We will do this by recreating an example that is simple enough to understand quickly yet complex enough to actually highlight the differences between these two libraries.
julia> ] # enter Pkg REPL
(@v1.4) pkg> activate .
(data-dailies) pkg> add Plots, Gadfly
using CSV, HTTP, DataFrames, Dates

url = "https://covidtracking.com/api/v1/states/ca/daily.csv"
res = HTTP.request("GET", url).body
columns = [:date, :totalTestResultsIncrease]
fmt = "yyyymmdd"
t = Dict(:date=>Date)
data = sort(CSV.read(res; dateformat=fmt, select=columns, types=t))
head(data)
6×2 DataFrames.DataFrame
│ Row │ date       │ totalTestResultsIncrease │
│     │ Dates.Date │ Int64                    │
├─────┼────────────┼──────────────────────────┤
│ 1   │ 2020-03-04 │ 0                        │
│ 2   │ 2020-03-05 │ 0                        │
│ 3   │ 2020-03-06 │ 7                        │
│ 4   │ 2020-03-07 │ 9                        │
│ 5   │ 2020-03-08 │ 19                       │
│ 6   │ 2020-03-09 │ 254                      │
Plots.jl was developed as a common interface to many of the existing Julia visualization packages, which is a huge convenience for end users. Instead of rewriting a visualization for every backend you want to render to, you can develop and fine-tune a graphic once using the Plots.jl API and optionally export it to any of the supported environments⁴.
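To make the "write once, render anywhere" idea concrete, here is a minimal sketch. It assumes Plots.jl is installed with its default GR backend; the toy data is made up, and the commented-out backend calls are only examples of what a switch looks like, since each needs its own backend package installed.

```julia
using Plots

# Hypothetical toy data, standing in for whatever series you want to draw
x = 1:10
y = x .^ 2

# Select a backend; GR ships with Plots.jl
gr()

# The plotting call itself is backend-agnostic
p = plot(x, y, seriestype=:sticks, label="demo", title="Same call, any backend")

# Rendering somewhere else is a one-line change before plotting.
# Left commented out because each requires the matching backend package:
# pyplot()
# plotlyjs()

println("Active backend: ", backend_name())
```

The point of the sketch is that only the backend selection line changes; the `plot` call stays the same.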
But what you gain in flexibility of output, you necessarily have to give up in expressivity of API. Since Plots.jl needs to interface to a variety of backends, the API is something of the average of all the backend APIs. And most scientific visualization APIs can trace their lineage back to the MATLAB plotting tradition (which is also true of the
While not intrinsically good or bad, know that this level of abstraction is really designed for rapidly creating scientific visualizations of tabular data (and often numeric data from experiments). As such, it should feel very easy and natural to use if you have… mostly numeric tabular data that you want to visualize with conventional plots⁵.
using Plots, RollingFunctions

# plot daily test increase as bars/sticks
Plots.plot(data.date, data.totalTestResultsIncrease,
    seriestype=:sticks,
    label="Test Increase",
    title = "California Total Testing Capacity",
    lw = 2)

# compute the 7-day average
window = 7
average = rollmean(data.totalTestResultsIncrease, window)

# to add another series we mutate the existing plot
Plots.plot!(data.date, cat(zeros(window - 1), average, dims=1),
    label="7-day Average",
    lw=3)
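The zero-padding in the `plot!` call is easy to miss, so here is a small standard-library sketch of why it is there. It assumes, as the padding above implies, that the rolling mean yields one value per full window and is therefore `window - 1` entries shorter than the input. The helper `trailing_mean` and the `daily` counts are hypothetical stand-ins, not part of RollingFunctions or the real dataset.

```julia
using Statistics

# Trailing moving average: one value per full window, so the result
# has length(xs) - window + 1 entries (hypothetical helper for illustration)
function trailing_mean(xs::AbstractVector{<:Real}, window::Integer)
    return [mean(@view xs[i-window+1:i]) for i in window:length(xs)]
end

# Made-up daily counts, not the real California data
daily = [0, 0, 7, 9, 19, 254, 100, 50, 60]
window = 7

average = trailing_mean(daily, window)

# Pad the front with zeros so the series lines up with the dates again
padded = vcat(zeros(window - 1), average)

println(length(daily), " inputs, ", length(average), " averages, ", length(padded), " padded values")
```

Padding with zeros keeps the two series the same length, at the cost of drawing a flat line at zero for the first `window - 1` days.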
Gadfly (and the Grammar of Graphics)
While the Plots.jl API is a bit more high level than Gadfly.jl's, it is much less expressive. I like to think of Plots.jl as a plotting framework that enables customization (in the convention-over-configuration sense), whereas Gadfly.jl is a library that provides you with visualization building blocks⁶. This distinction mirrors the general difference between frameworks and libraries.
using Gadfly

labels = ["Test Increase", "7-day Average"]
colors = ["deepskyblue", "tomato"]

# Gadfly can work with DataFrames directly
p = Gadfly.plot(data,
    # raw daily increase drawn as vertical hairs (default color)
    layer(
        x=:date, y=:totalTestResultsIncrease,
        Geom.hair,
        Theme(line_width=1.5pt)
    ),
    # smoothed trend line drawn in the second color
    layer(
        x=:date, y=:totalTestResultsIncrease,
        Geom.line,
        Stat.smooth(method=:loess, smoothing=.15),
        Theme(
            default_color=colors[2],  # a single color, not the whole vector
            line_width=2pt
        )
    ),
    Guide.xlabel("Date"),
    Guide.ylabel(labels[1]),  # the axis label takes a single string
    Guide.title("California Total Testing Capacity"),
    Guide.manual_color_key("", labels, colors),
    Theme(background_color="white")
)
As you can see by comparing the Plots.jl and Gadfly.jl examples, even though the Plots.jl API on the whole is a bit more succinct, a composition of Gadfly.jl geometries is much more flexible than the Plots.jl series types.
| Package | Use if you want | Weakness | Most Similar |
|---|---|---|---|
| Plots.jl | Multiple backends | High level but inflexible API | matplotlib |
| Gadfly.jl | A declarative Grammar of Graphics API | No built-in interactivity | ggplot2 |
| VegaLite.jl | Web based interactivity | Not designed for non-web environments (like PDFs) | Altair |
References and Extras
- JuliaPy. PyPlot.jl: Plotting for Julia based on matplotlib.pyplot. Accessed on: June 5, 2020.
- Christof Stocker. UnicodePlots.jl: Scientific plotting for working in the terminal. Accessed on: June 5, 2020.
- Tom Breloff. Plots.jl: A common API for Julia Visualization. Accessed on: June 5, 2020.
- Daniel C. Jones. Gadfly.jl: A Grammar of Graphics for Julia. Accessed on: June 5, 2020.
- David Anthoff. VegaLite.jl: Julia bindings to Vega-Lite. Accessed on: June 5, 2020.
- Leland Wilkinson. The Grammar of Graphics. Statistics and Computing. 1999.
- Hadley Wickham. A layered grammar of graphics. Journal of Computational and Graphical Statistics. 2010.
Writing a visualization library is far from a trivial task…
a GUI for rapid development, HTML/JS for interactive web plots, PDF for publications, etc.