ESRI REST and Python: pandas.DataFrame

13 Jan

I have been working in JavaScript a lot recently and have missed coding in python. I wanted to work on something that would have real world value so I decided to see if I could load an ESRI REST endpoint in to a pandas DataFrame. This post will only scratch the surface of what you can do with data in python but should give you an idea of what is possible and allow you to imagine some really interesting possibilities. I have posted all of the code as a static IPython notebook and on Python Fiddle for easy copy+paste.

Getting Data

We are going to start by grabbing some open data from the City of Albuquerque – I will use the Crime Incidents REST service. The first step is to make the request and read it.

import urllib, urllib2
param = {‘where’:’1=1′,’outFields’:’*’,’f’:’json’}
url = ‘…/APD_Incidents/MapServer/0/query ?’ + urllib.urlencode(param)
rawreply = urllib2.urlopen(url).read()

If you print rawreply, you will see a long string. We know it is a string because if you print rawreply[0] you will see { as the result. To confirm it, you can type(rawreply) and get back str.

Working with the Data

Now we need to convert it to JSON so that we can do something with the data.

import simplejson
reply = simplejson.loads(rawreply)
print reply[“features”][0][“attributes”][“date”]
print reply[“features”][0][“attributes”][“CVINC_TYPE”]

The above code will return something like


How many features do we have? 24,440.

print len(reply[“features”])

The above code will give you the count and show you the first three results. You can grab subsets of the data by splicing. If you want records 2 through 4, you can reply[“features”][2:5].

Let’s get to the good stuff.

Pandas DataFrame

JSON is great, but let’s put the data in to a table (DataFrame). First, a disclaimer. ESRI REST services return a deeply nested JSON object. Converting this to a DataFrame is more difficult than:

import as j
jsondata = j.read_json(the URL)

But it is not much more difficult. We just need to recreate the JSON object grabbing what we are interested in – the attributes of the features. We need a loop that will  create an array of dictionaries. Each dictionary will be the attributes of a single feature.

count = 0
while (count < len(reply[“features”])):
for key, value in reply[“features”][count][“attributes”].iteritems():
mydict[key]= value
count = count + 1

The code above initializes a counter, array and a dictionary. It then loops until there are no more features. The loop reads the attributes of the feature and creates a dictionary for each attribute and its value. Now we have an array of dictionaries. If you len(myFrame) you will get 24,440. That is the same number of records we had in our JSON object.

We have all our data and it is an array. We can now create a DataFrame from it.

from pandas import DataFrame as df
myFrame = df(data=myformateddata)

The code above imports the tools we need, assigns the frame to a variable and then prints it. You should see a table like the image below.

DataFrame with Crime Incidents from ESRI REST endpoint

DataFrame with Crime Incidents from ESRI REST endpoint

Data Manipulation

So what can we do with a DataFrame? First, let’s save it. We can export the DataFrame to Excel (The City of Albuquerque provides open data in several formats, but other sources may only provide you with the REST API so this would be a valuable method for converting the data).

from pandas import ExcelWriter
writer = ExcelWriter(‘CrimeIncidents.xlsx’)

Great! the data has been exported. Where did it go? The code below will return the directory with the file.

import os

You can open the Excel file and work with the data or send it to someone else. But let’s continue with some basic manipulation in the DataFrame.

How many of each type of incident do we have in our data?


Incidents Summarized

Incidents Summarized

Now I would like a barchart of the top 5 most frequent Incident Types.

import matplotlib.pyplot as plt

My Bar Chart of the 5 most common Incidents

My Bar Chart of the 5 most common Incidents

Lastly, let’s filter our results by date.

import datetime
import time
milli=time.mktime(myDate.timetuple()) * 1000+ myDate.microsecond / 1000

The above code creates a date of January 10, 2015. It then converts it to milliseconds – 1420873200000. If we pass the date to the DataFrame we can grab all the Incidents after January 10, 2015.


DataFrame for Incidents After January 10, 2015

DataFrame for Incidents After January 10, 2015

Now you know how to connect to an ESRI REST endpoint, grab the data, convert it to JSON and put it in a DataFrame. Once you have it in the DataFrame, you can now display a table, export the data to Excel, plot a barchart and filter the data on any field.


2 Responses to “ESRI REST and Python: pandas.DataFrame”


  1. Query Albuquerque Open Data with Turf.js | Architecture and Planning - February 9, 2015

    […] have posted on the Albuquerque Crime Incident data showing how to use Python and for converting the Date field. I have also demostrated how to use Turf.js for analysis. In this […]

  2. Esri, Python and pandas.DataFrame | Locationista - March 2, 2016

    […] Rickard has a nice post where he talks about getting government open data on crimes from an Esri REST web service into a Pandas DataFrame and out…. Lots of useful info in there. Thanks, […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: