
How to plot with python pandas

Introduction

Pandas is a Python library for quick and easy data analysis. In this tutorial we download a .csv file from the internet and draw a simple plot of its contents. We then improve the default pandas data frame plot and finally save it to a file.

Recommended tutorial

We recommend following the Anaconda Python tutorial to set up a data analysis development environment.

Step 1: Installation

conda install pandas matplotlib requests ipython

Step 2: Download the data for analysis

We are going to download the dataset using python, but this step is optional. You can check for more free datasets here.

Start the IPython console with the command ipython, then run:

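import requests

# Download the CSV file and stop early if the server reports an error
url = 'https://tutorials.technology/data/world-population.csv'
response = requests.get(url)
response.raise_for_status()

# Write the raw bytes to a local file
with open('world-population.csv', 'wb') as csv_file:
    csv_file.write(response.content)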
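
If requests is not available, the same download can be sketched with the standard library instead. Note that this swaps requests for urllib.request; download_file is a hypothetical helper name and the 30 second timeout is an arbitrary choice, not something the tutorial prescribes.

```python
import shutil
import urllib.request


def download_file(url, destination, timeout=30):
    """Copy the response body at `url` into the local file `destination`.

    Hypothetical helper for illustration; urlopen raises an exception
    on HTTP error statuses such as 404.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(destination, 'wb') as out_file:
            # Stream the body in chunks instead of loading it all in memory
            shutil.copyfileobj(response, out_file)
    return destination


# Usage with the tutorial's dataset (needs network access):
# download_file('https://tutorials.technology/data/world-population.csv',
#               'world-population.csv')
```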

Step 3: Use pandas read_csv to load data

Now we use read_csv to load the CSV data into a pandas data frame. To use the year for the X values, we pass the parameter index_col. Here index_col is an integer that refers to the number of the column to use as the index of the data. In this particular case we have a CSV with two columns, so index_col=0 makes the first column (the year) the index and leaves the other column as the data to plot.

import pandas
population = pandas.read_csv('world-population.csv', index_col=0)
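
To see what index_col changes, here is a small sketch on inline data. The column names Year and Population and the values are made-up placeholders, not the contents of world-population.csv.

```python
import io

import pandas

# Made-up two-column CSV standing in for the real file
sample_text = "Year,Population\n2000,10\n2001,20\n2002,30\n"

# Without index_col, pandas adds a 0..n-1 index and keeps both columns as data
without_index = pandas.read_csv(io.StringIO(sample_text))
print(list(without_index.columns))  # ['Year', 'Population']

# With index_col=0, the first column becomes the index (the X values when plotting)
with_index = pandas.read_csv(io.StringIO(sample_text), index_col=0)
print(with_index.index.name)        # Year
print(list(with_index.columns))     # ['Population']
```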

Step 4: Plotting the data with pandas

import matplotlib.pyplot as plt
population.plot()
plt.show()
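
The plot method returns a matplotlib Axes object, which the next step relies on to set labels. The sketch below illustrates this on made-up data with two hypothetical columns A and B. It selects the non-interactive Agg backend only so that it runs without a display, which is an assumption about the environment rather than a requirement of the tutorial.

```python
import matplotlib

matplotlib.use('Agg')  # assumption: no display available, so draw off-screen

import matplotlib.pyplot as plt
import pandas

# Made-up data: two columns sharing one index
frame = pandas.DataFrame(
    {'A': [1, 2, 3], 'B': [3, 2, 1]},
    index=pandas.Index([2000, 2001, 2002], name='Year'),
)

axes = frame.plot()

# pandas draws one line per column on the returned Axes
print(len(axes.get_lines()))

plt.close(axes.get_figure())  # release the figure when done
```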

At this point you should get a plot similar to this one:

Example of the obtained plot from pandas

Step 5: Improving the plot

First we are going to add the title to the plot. Then we set other parameters to improve the plot:

* lw : Line width. The default line is fairly thin, so we set it to 2.
* colormap : We use the jet color map, but a complete list of colormaps is available here.
* marker : Markers show the reader where each data point is. For a complete list of markers check here.
* markersize : The size of the marker.

The title is passed to the plot method through the title parameter. Finally, we set the x and y axis labels on the object returned by plot.

plot = population.plot(title='World Population', lw=2, colormap='jet', marker='.', markersize=10)
plot.set_xlabel("Year")
plot.set_ylabel("Population")
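
The colormap and marker names accepted above can also be listed from matplotlib itself. This is a minimal sketch; the exact lists depend on the installed matplotlib version, so no output is shown.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Names of all registered colormaps, e.g. usable as colormap='jet'
colormap_names = plt.colormaps()
print(len(colormap_names), 'colormaps, starting with', colormap_names[:5])

# Line2D.markers maps each marker code to a short description
for marker, description in list(Line2D.markers.items())[:5]:
    print(repr(marker), description)
```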

The plot should look like this one:

Improved plot of pandas data frame

Step 6: Saving the plot to an image

To save the plot to a file we just need to call plt.savefig as the last Python line. Here is the full example of the pandas data frame plot, saved to a file called population.png:


import matplotlib.pyplot as plt
import pandas
population = pandas.read_csv('world-population.csv', index_col=0)
plot = population.plot(title='World Population', lw=2, colormap='jet', marker='.', markersize=10)
plot.set_xlabel("Year")
plot.set_ylabel("Population")
plt.savefig('population.png')
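
savefig also accepts options that control the output quality. The sketch below uses made-up data and a temporary directory so it can run anywhere. The values dpi=150 and bbox_inches='tight' are illustrative choices, and the Agg backend is selected only on the assumption that no display is available.

```python
import os
import tempfile

import matplotlib

matplotlib.use('Agg')  # assumption: render off-screen, no window needed

import matplotlib.pyplot as plt
import pandas

# Made-up data standing in for the population data frame
frame = pandas.DataFrame(
    {'Population': [10, 20, 30]},
    index=pandas.Index([2000, 2001, 2002], name='Year'),
)

axes = frame.plot(title='Sample plot', lw=2, marker='.', markersize=10)
figure = axes.get_figure()

# dpi raises the resolution; bbox_inches='tight' trims surrounding whitespace
output_path = os.path.join(tempfile.mkdtemp(), 'population-sample.png')
figure.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close(figure)

print('saved to', output_path)
```

The output format is taken from the file extension, so changing the name to end in .pdf or .svg produces a vector image instead.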

The final result should look like this:

Final plot from a csv file using pandas

Appendix: Error UnicodeDecodeError: 'utf-8' codec can't decode byte

If you get an error like this one while using pandas.read_csv:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position : invalid start byte

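This means the file is not valid UTF-8, which is the encoding read_csv assumes by default. You need to pass the file's actual encoding through the encoding parameter of read_csv, for example:

pandas.read_csv('world-population.csv', encoding='latin-1')

latin-1 (or cp1252 for files exported on Windows) often solves the problem, but the file could be using another encoding.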
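
When the encoding is not known in advance, one option is to try a few candidates in order and keep the first one that decodes. The sketch below is only an illustration: read_csv_with_fallback is a hypothetical helper, not part of pandas, the candidate list is an assumption, and the sample bytes are made up to trigger the error.

```python
import io

import pandas


def read_csv_with_fallback(raw_bytes, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Return (data_frame, encoding) for the first encoding that decodes.

    Hypothetical helper for illustration; the candidate order is an assumption.
    """
    for encoding in encodings:
        try:
            frame = pandas.read_csv(io.BytesIO(raw_bytes), encoding=encoding)
            return frame, encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next one
    raise ValueError('none of the candidate encodings could decode the data')


# Made-up sample: byte 0xa4 is not valid UTF-8 on its own
sample_bytes = "year,currency\n2000,\u00a4\n".encode('latin-1')

frame, used_encoding = read_csv_with_fallback(sample_bytes)
print(used_encoding)
```

A successful decode does not prove the guess is right, because latin-1 accepts any byte sequence. Check a few non-ASCII values in the result before trusting it.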