Covid Bar Chart Race in 16 lines of Python

Brian Dorricott
3 min read · Aug 12, 2021
Covid Bar Chart Race Example

Play the following video to see how Bar Chart Race works. It was created with just 16 lines of code and a little post-processing to add the explanatory text.

Covid Bar Chart Race Video

Pre-requisites

You’ll need a couple of things installed to make this work. The first is the Python library “bar_chart_race”, which you can install with either pip or conda:

pip install bar_chart_race
conda install -c conda-forge bar_chart_race

To make the video you’ll also need FFmpeg, which you can install by following the instructions on the official FFmpeg site: http://www.ffmpeg.org
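Before going further, it can be worth confirming that FFmpeg is actually reachable from Python. The following is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is made up for this example, and it assumes the executable is called `ffmpeg` and lives on your PATH.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH.

    Hypothetical helper for this article, not part of bar_chart_race.
    """
    return shutil.which("ffmpeg") is not None


print("FFmpeg found" if ffmpeg_available() else "FFmpeg not found on PATH")
```

If this reports that FFmpeg is missing, writing the .mp4 file later on is likely to fail, so it is quicker to fix the installation now.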

Getting the data

First there was the question of the data. After some searching I found a list of new cases per day, per Australian state on https://www.covid19data.com.au. Here is a sample.

Date,NSW,VIC,QLD,SA,WA,TAS,NT,ACT
:::
30/03/2020,127,54,33,6,44,3,0,1
31/03/2020,114,88,55,32,9,0,4,2
1/04/2020,150,58,40,30,28,2,0,5
2/04/2020,116,63,57,18,8,3,3,4
3/04/2020,91,59,39,11,22,6,4,4
4/04/2020,104,22,27,11,14,2,0,2
:::
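Note that the dates are written day/month/year, and the day is not always zero-padded. As a small sketch of how that can be handled, the snippet below parses three rows copied from the sample above with an explicit format string; the variable names `sample_csv` and `sample` are just for illustration.

```python
import io

import pandas as pd

# Three rows copied from the sample above (the real file has many more).
sample_csv = """Date,NSW,VIC,QLD,SA,WA,TAS,NT,ACT
30/03/2020,127,54,33,6,44,3,0,1
31/03/2020,114,88,55,32,9,0,4,2
1/04/2020,150,58,40,30,28,2,0,5
"""

sample = pd.read_csv(io.StringIO(sample_csv))

# An explicit day/month/year format avoids any month/day confusion.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

print(sample["Date"].dt.month.tolist())
```

The last row is read as 1 April rather than 4 January, which is what we want for Australian data.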

Preparing the data

Let’s load the raw data into a pandas DataFrame and check we get what we expected. Here’s our first code segment (two lines, plus the imports):

import pandas as pd
import bar_chart_race as bcr
df = pd.read_csv("cases_daily_state.csv")
df.sample(5)
Raw data from https://www.covid19data.com.au/states-and-territories

Great. The output is what we expected. But it is not in the format that our bar chart race function requires. So let’s do a bit of processing:

rows = []
days = 14
for i in range(0, df.shape[0], days):
    y = df.iloc[i:i+days]
    a = y.sum(numeric_only=True)      # total new cases per state for this period
    a["Date"] = y.iloc[-1]["Date"]    # label the period with its last date
    rows.append(a)
corona = pd.DataFrame(rows)

corona = corona[0:-1]
corona.set_index("Date", inplace=True)
corona.index = pd.to_datetime(corona.index, format='%d/%m/%Y')
corona = corona.astype(float)         # make sure the counts are numeric
corona.sample(5)

Here I start with an empty list, append one row of summarised data per period (labelled with the last date of that period), and build the DataFrame from the list. “days” allows us to choose the resolution (1 day, 7 days, 14 days, etc.). Once our DataFrame is complete, we need to remove the last row since it refers to an incomplete sample, then use the date as the index. After processing we have the following DataFrame:

After processing and ready for Bar Chart Race function
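The same summarising step can also be written without an explicit loop. The sketch below is an alternative, not the code used for the video: the function name `summarise` and the made-up `demo` data are assumptions for illustration, and unlike the loop above it only drops the final block when that block is actually incomplete.

```python
import numpy as np
import pandas as pd


def summarise(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Sum daily counts into blocks of `days` rows, indexed by each block's last date."""
    block = np.arange(len(df)) // days            # block number for every row
    grouped = df.groupby(block)
    out = grouped.sum(numeric_only=True)          # totals per state per block
    last_dates = grouped["Date"].last().to_numpy()
    out.index = pd.to_datetime(last_dates, format="%d/%m/%Y")
    out.index.name = "Date"
    if len(df) % days:                            # final block is short: drop it
        out = out.iloc[:-1]
    return out


# Made-up data: 30 days with constant counts, dates in the same d/m/Y style.
dates = pd.date_range("2020-03-30", periods=30).strftime("%d/%m/%Y")
demo = pd.DataFrame({"Date": dates, "NSW": 1, "VIC": 2})

for days in (7, 14):
    print(days, "day blocks ->", summarise(demo, days).shape[0], "rows")
```

Smaller values of “days” give more rows, which means more periods in the final video and a longer running time.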

Creating the video

So now we are ready to create our chart. This is done with bar_chart_race, and you can see the parameters below in our last segment of code. Things to note: (a) I’ve used a function to display the total number of cases found each period; (b) the combination of figsize and dpi sets the size of the video, which is large here so that it can be posted on social media; (c) in this case the output goes to a video file, but if you omit the filename and are using a Jupyter notebook, the video is displayed in the notebook itself; and (d) this function can take a long time to run: think minutes, not seconds!

Have a play with the “days” variable from the previous section to see different video lengths and levels of detail.

# State colours from https://en.wikipedia.org/wiki/Australian_state_and_territory_colours
# NSW 87CEEB
# VIC 000080
# QLD 800000
# SA FF0000
# WA FFD700
# TAS 006A4E
# NT 000000
# ACT 00008B
colours = ["#87CEEB", "#000080", "#800000", "#FF0000", "#FFD700", "#006A4E", "#000000", "#00008B"]

def summary(values, ranks):
    total_cases = values.sum()
    s = f'Total new {total_cases:,.0f}'
    return {'x': .95, 'y': .10, 's': s, 'ha': 'right', 'size': 10}

bcr.bar_chart_race(df=corona,
                   filename="covid.mp4",
                   figsize=(8, 4.5),
                   dpi=300,
                   title="New Australian Covid cases by State (fortnightly)",
                   cmap=colours,
                   period_summary_func=summary,
                   period_label={'x': .95, 'y': .15, 'ha': 'right', 'size': 10},
                   period_fmt='%d-%b-%y')
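To see what the figsize and dpi combination means in practice, here is a small sketch of the arithmetic. It assumes the frame size is simply the figure size in inches multiplied by the dpi, which is how Matplotlib sizes a figure; the helper name `video_pixels` is made up for this example.

```python
def video_pixels(figsize, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


print(video_pixels((8, 4.5), 300))  # the settings used in the code above
print(video_pixels((8, 4.5), 240))  # same 16:9 shape at a lower dpi
```

With the settings used here the frames come out at 2400 x 1350 pixels. Lowering dpi to 240 would give 1920 x 1080 with the same layout, and should also shorten the rendering time.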

Acknowledgements

For more details on this function, check out the following web pages:

Data source: https://www.covid19data.com.au/states-and-territories
Data source: https://infogram.com/
Code info: https://www.dexplo.org/bar_chart_race/tutorial/
FFmpeg library: http://www.ffmpeg.org
