Notebooks and related material from my PYPTUG talk:
There are many visualization packages available out there, each best suited to specific scenarios. In the past several years, I've covered Matplotlib, Seaborn, Vincent, ggplot2, 3D visualizations through matplotlib, D3.js, mpld3 and Bokeh, and mentioned a few more.
In this presentation, we will cover plotly (for JavaScript, R, Python and more), its related packages (dash, cufflinks), and when it makes sense to use it.
Francois Dion is the founder and Chief Data Scientist of Dion Research LLC, a firm specializing in analytics, data science, IoT and visualization.
He is the author of several open source packages, such as stemgraphic, the founder of PYPTUG, the Python user group for the Piedmont Triad of North Carolina, and he mentors various groups in Python, R and analytics at large. You might have run across his multi-part series on LinkedIn about data science books, including Part V on visualization.
Everything here assumes Python 3. Python 2 is untested.
The notebooks require Jupyter Notebook to be installed, as well as matplotlib, numpy, pandas, plotly and cufflinks.
Web applications (plotly_01, dash_01 to 04, gapminder) will also need Flask, pandas_datareader, dash, dash-core-components, dash-html-components and dash-renderer.
To install all of these:
pip install -r requirements.txt
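Before opening the notebooks or starting an app, it can help to confirm that the packages listed above are actually importable. The script below is a minimal standard-library sketch, not part of the repository. The import names are my assumption of how those pip packages are imported (pip names use dashes, import names use underscores, and Jupyter Notebook is imported as notebook); they were not checked against requirements.txt.

```python
"""Report which of the packages this repository needs cannot be imported."""
import importlib.util

# Assumed import names for the pip packages listed above (not read from requirements.txt).
NOTEBOOK_MODULES = ["notebook", "matplotlib", "numpy", "pandas", "plotly", "cufflinks"]
WEBAPP_MODULES = [
    "flask",
    "pandas_datareader",
    "dash",
    "dash_core_components",
    "dash_html_components",
    "dash_renderer",
]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    # find_spec locates a top-level module without executing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("notebooks", NOTEBOOK_MODULES), ("web apps", WEBAPP_MODULES)):
        missing = missing_modules(names)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Using find_spec rather than actually importing keeps the check fast and avoids side effects from heavy packages such as matplotlib.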
.
├── dash_01.py # Very simple Dash application
├── dash_02 # Combining a traditional Flask app with a Dash app
│   ├── composite.py
│   ├── static
│   └── templates
│       └── main.html
├── dash_03.py # callbacks, inputs and outputs
├── dash_04.py # a workaround to include raw HTML
├── data
│   ├── gapminder.csv # for the gapminder app
│   ├── midwest.csv # for the notebooks
│   └── school_earnings.csv # for the notebooks
├── F_Dion_Gapminder_revisited.ogv # Video. See Gapminder Revisited section below
├── gapminder.py # Hans Rosling's 2006 TED talk visualization
├── LICENSE
├── notebooks # Start here, with the notebooks
│   ├── 01-table_basic.ipynb
│   ├── 02-cufflink.ipynb
│   ├── 03-more_cufflinks.ipynb
│   └── 04-mpl_box_and_box.ipynb
├── plotly_01 # Plotly + Flask
│   ├── app.py
│   └── templates
│       └── main.html
└── README.md
For the notebooks, launch jupyter notebook from this folder, then go into the notebooks folder and open them.
For the web apps, execute:
python app.py
Replace app.py with the file for each app (e.g. dash_01.py). Point your browser to http://localhost:8050 for the Dash apps and to http://localhost:5000 for the Flask apps.
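If the browser reports a connection error, the app may not have started yet, or you may be on the other port. The helper below is a small convenience sketch (not part of the repository) that checks whether anything is accepting TCP connections on localhost at the two default ports mentioned above: 8050 for Dash and 5000 for Flask.

```python
import socket

# Default ports as described above: Dash on 8050, Flask on 5000.
DEFAULT_PORTS = {"dash": 8050, "flask": 5000}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts.
        return False


if __name__ == "__main__":
    for kind, port in DEFAULT_PORTS.items():
        state = "up" if is_listening(port) else "not responding"
        print(f"{kind} (http://localhost:{port}): {state}")
```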
In 2006, Hans Rosling (who passed away in February 2017; listen to datastori.es for a whole podcast on his legacy) gave a TED talk that fundamentally changed how people viewed visualization and the communication of statistics.
In the first five minutes of that talk, he presented his Gapminder software, using a bubble chart to compare life expectancy against fertility rate for a large number of countries. The size of each bubble represents the country's population.
I gathered the data he used (see the data folder), and in the gapminder.py program I filtered the years to the range 1964 to 2003, just as in the original talk. The plotly site also has a gapminder visualization, but it makes a different comparison: GDP versus life expectancy.
My gapminder version is a relatively simple Dash application, with a year slider at the bottom. In the original presentation, the years played automatically. In the video below, I drag the slider from 1964 to 2003. Notice what happens in 1994: because of autoscaling, the axes rescale and the whole picture shifts, which made me stop and hover over the country that appeared to be out of place.
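When bubble size encodes population, it is the bubble's area, not its diameter, that should track the value; otherwise large countries look far bigger than they are. Plotly's bubble chart documentation suggests sizemode='area' together with a sizeref computed from the largest value, and the sketch below does that arithmetic in plain Python. The 40-pixel maximum marker size and the population figures are hypothetical, and I'm not claiming this is exactly how gapminder.py sizes its bubbles.

```python
import math


def plotly_sizeref(sizes, max_marker_px=40):
    """sizeref for plotly markers with sizemode='area'.

    Follows the recipe from plotly's bubble chart docs:
    sizeref = 2 * max(size) / max_marker_px ** 2,
    so the largest value is drawn at roughly max_marker_px.
    """
    return 2.0 * max(sizes) / max_marker_px ** 2


def relative_diameter(value, reference):
    """With area scaling, bubble diameter grows with the square root of the value."""
    return math.sqrt(value / reference)


if __name__ == "__main__":
    populations = [100, 400, 2500]  # hypothetical populations, in millions
    # A plain dict like this can be passed as a scatter trace's `marker` argument.
    marker = {"size": populations, "sizemode": "area",
              "sizeref": plotly_sizeref(populations)}
    print(marker["sizeref"])
    print(relative_diameter(400, 100))  # 4x the population -> 2x the width
```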
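The year filter itself is a one-liner, but the layout of data/gapminder.csv isn't described here, so the sketch below assumes a long-format CSV with one row per country and year and a column named year (a hypothetical name; check the file's header). It keeps rows in the inclusive 1964 to 2003 window using only the csv module. With pandas, the equivalent would be df[df["year"].between(1964, 2003)], since between is inclusive by default.

```python
import csv
import io

FIRST_YEAR, LAST_YEAR = 1964, 2003  # window described above


def filter_years(lines, first=FIRST_YEAR, last=LAST_YEAR, year_field="year"):
    """Yield CSV rows (as dicts) whose year lies in [first, last], inclusive.

    `year_field` is an assumed column name, not taken from data/gapminder.csv.
    """
    for row in csv.DictReader(lines):
        if first <= int(row[year_field]) <= last:
            yield row


if __name__ == "__main__":
    # Hypothetical rows for illustration only (not real Gapminder data).
    sample = io.StringIO("country,year\nA,1960\nA,1964\nA,2003\nA,2005\n")
    print([row["year"] for row in filter_years(sample)])  # ['1964', '2003']
```

Against the real file, opening it with open("data/gapminder.csv", newline="") and passing the file object to filter_years should work, provided the column name matches.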
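Autoscaling explains why a single year can look so different from its neighbours: the axis range is refitted to whatever points are on screen, so one country far from the rest stretches the axis and squeezes everyone else. The sketch below contrasts an autoscale-like range computed per frame with one range fixed across all frames. The values are invented for illustration, and the padding fraction is an arbitrary choice. Fixing the range (in plotly, giving the axis an explicit range instead of relying on autorange) keeps frames comparable, although here it was precisely the autoscale jump that made the anomaly stand out.

```python
def padded_range(values, pad=0.05):
    """Axis range covering `values`, widened by `pad` times the span on each side."""
    lo, hi = min(values), max(values)
    margin = (hi - lo) * pad
    return (lo - margin, hi + margin)


def per_frame_ranges(frames, pad=0.05):
    """Autoscale-like behaviour: every frame (e.g. a year) gets its own range."""
    return {key: padded_range(vals, pad) for key, vals in frames.items()}


def fixed_range(frames, pad=0.05):
    """A single range for all frames, computed from every value at once."""
    all_values = [v for vals in frames.values() for v in vals]
    return padded_range(all_values, pad)


if __name__ == "__main__":
    # Invented life-expectancy-like values; frame 2 contains one outlier.
    frames = {1: [60.0, 65.0, 70.0], 2: [30.0, 65.0, 70.0], 3: [61.0, 66.0, 71.0]}
    print(per_frame_ranges(frames, pad=0))  # frame 2's range jumps down to 30.0
    print(fixed_range(frames, pad=0))       # (30.0, 71.0) for every frame
```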
More on this after the video (F_Dion_Gapminder_revisited.ogv in this repository).
So what happened in 1994?
If you need a hint: Rwanda. Civil war and genocide. Suddenly, this visualization has a lot more meaning. All because of the interactivity. A reminder that behind statistics, there are people.
A few links shared during the presentation:
- What's going on with this graph?
- anaconda
- Flask
- Jupyter notebook
- pandas
- Classic visualization remakes
- dash
- cufflinks
- plotly
- Napoleon's March on Moscow, plotly style
See also my LinkedIn article: There are humans behind those stats