Making beautiful plots in Python (plus a shameless book plug!)
When I transitioned from R to working primarily in Python, one of the things I missed most was ggplot2. For me, ggplot2 plots look much nicer than matplotlib plots, and its syntax is more intuitive. Happily, last year I discovered that Hassan Kibirige has written a comprehensive port of ggplot2 to Python called plotnine. Hassan is still developing the package, but I’ve found the port fairly complete: there is not much I could do in the original ggplot2 that I’m unable to do in plotnine.
In fact, the port is so good that Mauricio and I have written a technical book about how to get the most out of plotnine. The book is an adaptation of our popular The Hitchhiker’s Guide to Ggplot2, written entirely in Python using the plotnine package, and it’s (very originally) called The Hitchhiker’s Guide to Plotnine. As in the original ggplot2 book, we’ve written chapters on a wide range of graphs: line and bar graphs, scatterplots and boxplots, through to histograms, linear regression graphs and LOWESS plots. Using plotnine, we have been able to recreate graphs that are very true to the original ggplot2 style and degree of customisation.
For example, you can create charts like the density plot below, with customised fonts, fill, background, layout and legend.
You can also use plotnine to recreate the styles of other publications. For example, this weighted scatterplot was created in the style of FiveThirtyEight.
We’ve also been able to recreate the XKCD-style plots we made for our previous book entirely in plotnine.
plotnine also allows you to go beyond simpler plots like bar graphs and scatterplots to create more advanced statistical visualisations. In the plot below, we’ve combined the capabilities of the scipy package with plotnine to chart some probability density functions.
The package also includes the useful faceting functions from ggplot2, so you can create subplots like the one below.
Hassan has made the syntax of plotnine as close to that of ggplot2 as possible, so the overall experience is quite intuitive for anyone familiar with ggplot2. For example, to create a histogram with a customised title and axis labels, as in the code below, you use the same ggplot() and geom_histogram() functions that you would use in R. Also like in R, you can specify the statistic the histogram plots, its binwidth and the breaks on the x-axis, although here we’ve used NumPy’s arange function to create the breaks rather than the seq function you might use in R. Finally, plotnine uses the same ggtitle, xlab and ylab functions to label your title and your axes.
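import numpy as np
from plotnine import (
    ggplot, aes, after_stat, geom_histogram, scale_x_continuous,
    ggtitle, xlab, ylab,
)
from plotnine.data import diamonds  # example dataset bundled with plotnine

p7 = (
    ggplot(diamonds, aes("price"))
    # Bar heights are counts; each bin covers US$500
    # (after_stat("count") replaces the older "..count.." notation)
    + geom_histogram(aes(y=after_stat("count")), binwidth=500)
    # x-axis breaks every US$2,500
    + scale_x_continuous(breaks=np.arange(0, 22500, 2500))
    + ggtitle("Price of diamonds by carat")
    + xlab("Price of diamond (US$)")
    + ylab("Frequency of price")
)
p7  # renders in a Jupyter notebook; in a script, save it with p7.save("histogram.png")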
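One small difference between np.arange and R’s seq is worth knowing when you translate axis breaks: arange stops before its end value, while seq(0, 22500, 2500) includes 22500. The snippet below only uses NumPy to show the difference. Nudging the stop value past the last break is one simple way to mimic seq (our suggestion, not something the plot above requires).

```python
import numpy as np

# np.arange excludes its stop value, so these breaks end at 20000, not 22500
breaks = np.arange(0, 22500, 2500)

# To mimic R's seq(0, 22500, 2500), push the stop value just past 22500
inclusive_breaks = np.arange(0, 22500 + 1, 2500)

print(breaks.tolist())            # [0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000]
print(inclusive_breaks.tolist())  # [0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000, 22500]
```

For the histogram above this makes no practical difference, since diamond prices in the bundled dataset stay below US$20,000.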
Overall, I really recommend plotnine as an alternative to matplotlib in Python, especially when you need to create presentation-ready charts. Our book is available on Leanpub, should you need some extra help learning the package or want to push your charts to the next level. Finally, I want to give a huge thanks to Hassan for all his hard work in building and maintaining this wonderful implementation of ggplot2 in Python!