
Graph Databases II: Routing

18 Aug

I wrote a previous post on graph databases that did little more than add some nodes and a relationship. The post required me to install Neo4j and the py2neo Python library. It was simple, but it was a start. I have moved on to other things since then, most notably Go. But I have found myself coming back in a roundabout way. I recently wrote a plugin for Open Trail Data in Leaflet. Trails are a network, but the data is really just lines. I thought that if we added nodes at the starting points, intersections, and ends of the trails, we could use them for routing. I set off to work.

The Data

The first step was grabbing a simple network. I used the Open Trail Data dummy data. I was going to draw nodes and then import them into a graph database, but I decided to shelve that for later – when I know I can route. The data is drawn in the image below.

Open Trail Data


The Graph

Next, I needed to create the graph database. I started with all the endpoints of the lines and then added nodes to the intersections. The code below shows how I created two nodes and a path between them.

from py2neo import Node, Path, Rel

one = Node("trail", id="1", junction="1")
oneint = Node("trail", id="1", junction="1,2,3,4")
p = Path(one, Rel("connects"), oneint)

The end result was a graph database as seen in the image below.

Finished Graph Database



Now it was time to route. I do not have weights or costs on the edges; I just wanted to see if I could get from one point to another. I chose node 3 to node 6. They are both on the left. I like this pair because there are two possible ways to go. To route, I used the following query:

results = graph.cypher.execute("START beginning=node(22), end=node(32) MATCH p = shortestPath(beginning-[*..5000]-end) RETURN p")

The nodes are 22 and 32 because those are the internal IDs assigned by the database. My result is a path:

(:trail {id:"3",junction:"3"})-[:connects]->(:trail {id:"1",junction:"1,2,3,4"})-[:connects]->(:trail {id:"1",junction:"1,7"})-
[:connects]->(:trail {id:"7",junction:"7,6,8,9,12"})-[:connects]->(:trail {id:"6",junction:"6"})

This tells me the database is going from node 3 to the intersection of 1,2,3,4 to the intersection of 1,7 to the intersection of 7,6,8,9,12 to the end node 6.

You can see the nodes by using:

resultPath = results[0][0]
for n in resultPath.nodes:
    print n

Now I know all the nodes required to get from one point to another. If I added (lat, long) values to each node, I could draw the path on a map.
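That idea only takes a few lines. The sketch below is hypothetical: the graph does not actually store coordinates yet, so the node properties and values here are invented for illustration.

```python
# hypothetical route nodes, as if each carried lat/lon properties
route_nodes = [
    {"junction": "3", "lat": 35.081, "lon": -106.628},
    {"junction": "1,2,3,4", "lat": 35.083, "lon": -106.630},
    {"junction": "6", "lat": 35.085, "lon": -106.633},
]

# the route geometry is just the ordered list of coordinates,
# ready to hand to something like Leaflet's L.polyline
coords = [(n["lat"], n["lon"]) for n in route_nodes]
print(coords)
```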

I will put this in my pocket for later. I am sure it will come up again and I will be better suited to solve a problem. In the meantime, I think I will work on figuring out how to get a standard data format into the database – shapefile or GeoJSON?

Graph Database and Albuquerque Bus Stops: Neo4j with py2neo

15 Apr

I have been slightly obsessed with the question: “How do you define network service areas client-side on a map?” I know it needs a networked data set and something to do with Dijkstra's algorithm (yes, we could just use an ESRI REST service, but there is not one available yet – I will ask the City). After looking at JavaScript implementations of NetworkX, I stumbled upon graph databases, most notably Neo4j. A networked data set is a graph. Guess what: it has Dijkstra built in, so I must be on the right path. I installed it and added a fake social graph using py2neo. That allowed me to make sure I could do a few things:

  • Add a node
  • Add a relationship
  • Add attributes

Now it was time to start with some real data.

My first test was to load Albuquerque Bus Stops for a single route. Here is what I have in my database.

Bus Stops for Route 766. No Relations added yet.


The image above was generated by calling the City of Albuquerque REST endpoint for bus stops, parsing the response, and putting it into Neo4j. The image is a view from the DB Manager. The code to do this is below.

from py2neo import Graph
from py2neo import Node, Relationship
from py2neo import authenticate
import urllib2
import json


for x in reply["features"]:
    pass  # the request code and loop body were lost from the original post
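Since the request and the loop body did not survive, here is a hedged sketch of the parsing step. The response structure and field names below are assumptions modeled on typical ESRI REST output, not the real schema, and the py2neo node creation is shown commented out since it needs a live Neo4j connection:

```python
import json

# a one-feature sample imitating the bus-stop REST response
# (field names here are assumptions, not the real schema)
rawreply = """{"features": [
    {"attributes": {"StopID": "66", "Street": "Central @ San Mateo"},
     "geometry": {"x": -106.58642, "y": 35.07778}}
]}"""

reply = json.loads(rawreply)

stops = []
for x in reply["features"]:
    props = dict(x["attributes"])
    props["lat"] = x["geometry"]["y"]
    props["lon"] = x["geometry"]["x"]
    stops.append(props)
    # with a live Neo4j connection, each stop would become a node:
    # graph.create(Node("stop", **props))

print(stops[0]["Street"])
```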

Notice there are no relationships! This is crucial if we are ever going to walk the network. I have manually added one, seen in the image below.

San Mateo links to Louisiana.


The code for this is:
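The snippet itself was lost from the post. One hedged way to do it is to build a Cypher statement and run it through the same graph.cypher.execute interface used in the routing post above; the stop label and street property names here are my assumptions:

```python
# build a Cypher statement linking two stops by street name
# ("stop" label and "street" property are assumptions for illustration)
san_mateo = "Central @ San Mateo"
louisiana = "Central @ Louisiana"
cypher = ('MATCH (a:stop {street: "%s"}), (b:stop {street: "%s"}) '
          'CREATE (a)-[:connects]->(b)') % (san_mateo, louisiana)

# with a live connection: graph.cypher.execute(cypher)
print(cypher)
```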



I need to think about how to automate the relationship creation based on stop order and direction (there are stops on both sides of the street). Then, I will need to figure out how to make a node have relationships to other routes. For example, many stops are connected to the 777 route and I do not want a separate node for each. I want one with a property showing routes.

Well, it's a start, to say the least. It has been fun learning about graph databases, and if GIS doesn't interest you, you could map your social network and walk it instead.

Historic Bus Location Data

27 Mar

Last night I was trying to think of interesting uses for a Raspberry Pi. One thing I came up with was a data logger. But what to log? Then I thought about a previous post on the Albuquerque Realtime Bus Data. Hmmm. What if I wanted to show the bus locations using a time slider? What if I wanted to see if they ever deviated from their routes, or from their schedules? I can't really do any analysis without the historic data, and the City does not give that out currently. So I think I found a use – logging Albuquerque bus data.

I don't have a Raspberry Pi yet, so I wrote a python script on my desktop to test the logger.

I am not going to post the code because I don't know the impact on the Albuquerque server, so I will give a brief explanation instead. The City has KML files for each bus route. Each route has multiple buses. I grabbed a single route – 766 – and parsed the results. I initially sent the results to a CSV, as you can see in the data below this post. Writing to CSV is not too helpful when the data gets large (I am not going to say BIG). Once I knew it worked, I sent the data to a spatially indexed MongoDB. In the database, I can now:

Get the total records

                        print collection.count()
Get all the Records

                        for x in collection.find():
                                      print x

Get a specific bus number

                       for x in collection.find({'number': '6903'}):
                                            print x

Or find near a lat, lng

                       for x in collection.find({"loc": {"$near": [35.10341, -106.56711]}}).limit(3):
                                            print x

With a database, multiple people can query it and perform operations on it. Lastly, if the data gets larger, Mongo can be split across multiple machines (sharding) to hold it all.

My MongoDB records look like:

{u'loc': [35.08156, -106.6287], u'nextstop': u'Central @ Cornell scheduled at 3:45 PM', u'number': u'6903', u'time': u'3:46:52 PM',
u'_id': ObjectId('5515cfd814cd2829e4c1b718'), u'speed': u'20.5 MPH'}

Here are the results of my original run. I ran the script for 7 minutes and got the following results for Route 766.


Hard to see, but it displays bus locations along Route 766 over a 7-minute period.


number,speed,time,nextstop,longitude,latitude
6409,0.0 MPH,1:34:02 PM,Central @ San Mateo (Rapid) scheduled at 1:31 PM,-106.58642,35.07778
6904,1.9 MPH,1:34:03 PM,Central @ Edith scheduled at 1:31 PM,-106.64776,35.08401
6411,0.0 MPH,1:33:56 PM,CUTC Bay B scheduled at 1:39 PM,-106.72266,35.07726
6407,21.7 MPH,1:34:00 PM,Central @ 1st (across from A.T.C.) scheduled at 1:34 PM,-106.64317,35.08359
6410,35.4 MPH,1:33:53 PM,Central @ San Mateo (Rapid) scheduled at 1:36 PM,-106.58247,35.07749
6403,0.0 MPH,1:34:02 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57089,35.10305
6903,29.8 MPH,1:33:56 PM,Central @ Atrisco scheduled at 1:36 PM,-106.69049,35.08443
6409,0.0 MPH,1:35:14 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.5852,35.07764
6904,1.9 MPH,1:35:14 PM,Central @ Edith scheduled at 1:31 PM,-106.6478,35.08413
6411,0.0 MPH,1:35:07 PM,Next stop is CUTC Bay B scheduled at 1:44 PM,-106.72525,35.07886
6407,0.0 MPH,1:35:11 PM,Copper @ 2nd scheduled at 1:34 PM,-106.64784,35.08417
6410,19.3 MPH,1:35:17 PM,Central @ Carlisle (Rapid) scheduled at 1:40 PM,-106.58734,35.07802
6403,0.0 MPH,1:35:14 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57087,35.10306
6903,14.9 MPH,1:35:08 PM,Central @ Tingley (Rapid) scheduled at 1:38 PM,-106.68485,35.08576
6409,23.6 MPH,1:36:38 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.58084,35.07723
6904,18.0 MPH,1:35:53 PM,Central @ Edith scheduled at 1:31 PM,-106.647,35.08373
6411,0.0 MPH,1:36:31 PM,Next stop is CUTC Bay B scheduled at 1:44 PM,-106.72525,35.07885
6407,0.6 MPH,1:36:35 PM,Copper @ 5th scheduled at 1:35 PM,-106.64944,35.08541
6410,0.0 MPH,1:36:40 PM,Central @ Carlisle (Rapid) scheduled at 1:40 PM,-106.59512,35.07883
6403,0.0 MPH,1:36:42 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57086,35.10306
6903,31.7 MPH,1:36:33 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.67824,35.09252
6409,37.9 MPH,1:37:49 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.57203,35.07627
6904,0.6 MPH,1:37:55 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63771,35.08276
6411,0.0 MPH,1:37:55 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07885
6407,1.9 MPH,1:37:47 PM,Copper @ 5th scheduled at 1:35 PM,-106.6496,35.08487
6410,0.0 MPH,1:37:52 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.60369,35.07979
6403,0.0 MPH,1:37:57 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57087,35.10306
6903,0.0 MPH,1:37:56 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.67159,35.09515
6409,0.0 MPH,1:39:14 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.56872,35.07595
6904,18.0 MPH,1:38:19 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63713,35.08265
6411,0.0 MPH,1:39:19 PM,Central @ Coors scheduled at 1:47 PM,-106.72527,35.07884
6407,3.1 MPH,1:39:10 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65295,35.086
6410,31.1 MPH,1:39:16 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.61167,35.08084
6403,5.6 MPH,1:39:23 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.5707,35.10368
6903,23.0 MPH,1:39:20 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.67011,35.09438
6409,26.1 MPH,1:40:37 PM,Louisiana @ Lomas scheduled at 1:38 PM,-106.56848,35.07723
6904,24.9 MPH,1:40:10 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63585,35.08255
6411,0.0 MPH,1:40:31 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07884
6407,23.0 MPH,1:40:36 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65675,35.0863
6410,34.2 MPH,1:40:40 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.61567,35.0811
6403,0.0 MPH,1:40:42 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.56875,35.10221
6903,23.6 MPH,1:40:32 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.66327,35.08892
6409,0.6 MPH,1:41:49 PM,Indian School @ Uptown Loop Road scheduled at 1:42 PM,-106.56849,35.08691
6904,24.9 MPH,1:41:55 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63585,35.08255
6411,0.0 MPH,1:41:55 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07884
6407,0.0 MPH,1:41:58 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65822,35.08648
6410,28.6 MPH,1:41:52 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.6216,35.08113
6403,0.0 MPH,1:42:01 PM,Louisiana @ Lomas scheduled at 1:45 PM,-106.56779,35.10184
6903,23.0 MPH,1:41:55 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.65819,35.08629
6409,41.0 MPH,1:43:13 PM,Indian School @ Uptown Loop Road scheduled at 1:42 PM,-106.56863,35.09282
6904,24.9 MPH,1:42:15 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.6217,35.08093
6411,0.0 MPH,1:43:19 PM,Central @ Coors scheduled at 1:47 PM,-106.72528,35.07883
6407,9.9 MPH,1:43:10 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.661,35.08784
6410,34.2 MPH,1:43:16 PM,Central @ Mulberry (Rapid) scheduled at 1:47 PM,-106.62524,35.08125
6403,16.8 MPH,1:43:22 PM,Louisiana @ Lomas scheduled at 1:45 PM,-106.56635,35.10156
6903,19.3 MPH,1:43:19 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.6549,35.084

Python Wrapper for Leaflet

20 Mar

I recently stumbled upon Folium – a python wrapper for Leaflet. I was excited, and it seemed to work well at first, but I slowly ran into problems and the pages loaded slowly. I probably did something wrong on my end, but I decided to write a simple wrapper on my own.

My wrapper is a python class with a function for each Leaflet feature, such as the map and markers. When you call each function, it writes a string to a file to generate the HTML. Below is my python code.

class l(object):

    def __init__(self, path):
        self.f = open(path, "w")
        # the Leaflet CSS/JS URLs were stripped from the original post;
        # point these at your preferred CDN
        self.f.write('<html><head><title>Map From Python</title>'
                     '<link rel="stylesheet" href="..." /></head><body>'
                     '<script src="..."></script>'
                     '<div style="height:900px; width:900px" id="map"></div>'
                     '<script>\n')

    def map(self, lat, long, zoom):
        self.f.write("var map = L.map('map', {center: [" + str(lat) + "," + str(long) + "], zoom: " + str(zoom) + "});\n")

    def marker(self, lat, long, popup=""):
        self.f.write('L.marker([' + str(lat) + ',' + str(long) + ']).bindPopup("' + str(popup) + '").addTo(map);\n')

    def makeMap(self):
        self.f.write('</script></body></html>\n')
        self.f.close()

To use the code, follow the example below.

>>> from pyLeaflet import l
>>> L = l("Paul.html")
>>>, -106, 8)
>>> L.marker(35, -106)
>>> L.marker(34, -106, "Hello from Python")
>>> L.makeMap()

The output will be an HTML file called Paul.html that displays a map with markers.

ESRI REST and Python: pandas.DataFrame

13 Jan

I have been working in JavaScript a lot recently and have missed coding in python. I wanted to work on something that would have real world value so I decided to see if I could load an ESRI REST endpoint in to a pandas DataFrame. This post will only scratch the surface of what you can do with data in python but should give you an idea of what is possible and allow you to imagine some really interesting possibilities. I have posted all of the code as a static IPython notebook and on Python Fiddle for easy copy+paste.

Getting Data

We are going to start by grabbing some open data from the City of Albuquerque – I will use the Crime Incidents REST service. The first step is to make the request and read it.

import urllib, urllib2
param = {'where': '1=1', 'outFields': '*', 'f': 'json'}
url = '…/APD_Incidents/MapServer/0/query?' + urllib.urlencode(param)
rawreply = urllib2.urlopen(url).read()

If you print rawreply, you will see a long string. We know it is a string because if you print rawreply[0] you will see { as the result. To confirm it, you can run type(rawreply) and get back str.

Working with the Data

Now we need to convert it to JSON so that we can do something with the data.

import simplejson
reply = simplejson.loads(rawreply)
print reply["features"][0]["attributes"]["date"]
print reply["features"][0]["attributes"]["CVINC_TYPE"]

The above code will return something like


How many features do we have? 24,440.

print len(reply["features"])

The above code will give you the count. You can also grab subsets of the data by slicing: if you want records 2 through 4, use reply["features"][2:5].
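Slicing works the same on any python list, so here is a toy example (the upper bound is exclusive, which is why 2:5 yields records 2 through 4):

```python
# toy list standing in for reply["features"]
features = ["rec0", "rec1", "rec2", "rec3", "rec4", "rec5"]

subset = features[2:5]   # records 2 through 4 (upper bound exclusive)
print(subset)
```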

Let’s get to the good stuff.

Pandas DataFrame

JSON is great, but let’s put the data in to a table (DataFrame). First, a disclaimer. ESRI REST services return a deeply nested JSON object. Converting this to a DataFrame is more difficult than:

import as j
jsondata = j.read_json(url)

But it is not much more difficult. We just need to recreate the JSON object, grabbing what we are interested in – the attributes of the features. We need a loop that will create an array of dictionaries. Each dictionary will be the attributes of a single feature.

count = 0
myformateddata = []
while (count < len(reply["features"])):
    mydict = {}
    for key, value in reply["features"][count]["attributes"].iteritems():
        mydict[key] = value
    myformateddata.append(mydict)
    count = count + 1

The code above initializes a counter, an array, and a dictionary. It then loops until there are no more features, reading the attributes of each feature into a dictionary and appending that dictionary to the array. Now we have an array of dictionaries. If you check len(myformateddata) you will get 24,440 – the same number of records we had in our JSON object.

We have all our data and it is an array. We can now create a DataFrame from it.

from pandas import DataFrame as df
myFrame = df(data=myformateddata)
print myFrame

The code above imports the tools we need, assigns the frame to a variable and then prints it. You should see a table like the image below.

DataFrame with Crime Incidents from ESRI REST endpoint


Data Manipulation

So what can we do with a DataFrame? First, let’s save it. We can export the DataFrame to Excel (The City of Albuquerque provides open data in several formats, but other sources may only provide you with the REST API so this would be a valuable method for converting the data).

from pandas import ExcelWriter
writer = ExcelWriter('CrimeIncidents.xlsx')
myFrame.to_excel(writer, 'Sheet1')

Great! The data has been exported. Where did it go? The code below will print the directory containing the file.

import os
print os.getcwd()

You can open the Excel file and work with the data or send it to someone else. But let’s continue with some basic manipulation in the DataFrame.

How many of each type of incident do we have in our data?


Incidents Summarized

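The snippet that produced this summary was lost from the post; with pandas it is a one-liner using value_counts. The sketch below uses a tiny inline sample in place of the real myFrame, with the CVINC_TYPE field from earlier:

```python
from pandas import DataFrame

# tiny inline sample standing in for the myFrame built earlier
myFrame = DataFrame([
    {"CVINC_TYPE": "THEFT"},
    {"CVINC_TYPE": "THEFT"},
    {"CVINC_TYPE": "BURGLARY"},
])

# count how many of each incident type we have
counts = myFrame["CVINC_TYPE"].value_counts()
print(counts)
```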

Now I would like a barchart of the top 5 most frequent Incident Types.

import matplotlib.pyplot as plt

My Bar Chart of the 5 most common Incidents

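Only the matplotlib import survived above, so here is a sketch of plotting code that could produce such a chart. The inline sample data and output filename are my inventions, and the Agg backend is used so it renders off-screen:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from pandas import DataFrame

# inline sample standing in for the real crime DataFrame
myFrame = DataFrame({"CVINC_TYPE": ["THEFT"] * 3 + ["BURGLARY"] * 2 + ["FRAUD"]})

# take the five most frequent incident types and plot them as bars
top5 = myFrame["CVINC_TYPE"].value_counts()[:5]
top5.plot(kind="bar")
plt.tight_layout()
plt.savefig("top5_incidents.png")
```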

Lastly, let’s filter our results by date.

import datetime
import time
myDate = datetime.datetime(2015, 1, 10)
milli = time.mktime(myDate.timetuple()) * 1000 + myDate.microsecond / 1000

The above code creates a date of January 10, 2015. It then converts it to milliseconds – 1420873200000. If we pass the date to the DataFrame we can grab all the Incidents after January 10, 2015.


DataFrame for Incidents After January 10, 2015

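The filter expression itself was stripped from the post; it likely looked something like the following. The sketch uses an inline sample, with the date field holding epoch milliseconds as in the REST attributes:

```python
import datetime
import time
from pandas import DataFrame

myDate = datetime.datetime(2015, 1, 10)
milli = time.mktime(myDate.timetuple()) * 1000

# inline sample: one record before the cutoff and one after
myFrame = DataFrame([
    {"CVINC_TYPE": "THEFT", "date": 1420000000000},   # before the cutoff
    {"CVINC_TYPE": "FRAUD", "date": 1421000000000},   # after the cutoff
])

# boolean indexing keeps only incidents after January 10, 2015
after = myFrame[myFrame["date"] > milli]
print(len(after))
```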

Now you know how to connect to an ESRI REST endpoint, grab the data, convert it to JSON and put it in a DataFrame. Once you have it in the DataFrame, you can now display a table, export the data to Excel, plot a barchart and filter the data on any field.

Revit and RabbitMQ: Passing Data in to Revit from Outside Applications

25 Oct


I threw together a quick website using CherryPy to pass data to Revit. The QR code pictured below, when scanned, will pull up the website, triggering the data in it to be sent to Revit. Picture a QR code with info on an HVAC unit: you scan it, and data about when it was installed or serviced is passed to Revit.

Sending Data in a URL to Revit. Data can be parsed to update parameters of an object.

QR Code with embedded data to send to Revit.


Modifying Revit from outside the program has been my primary interest for some time.

Initially, I was fascinated by the Revit DBLink plugin and wrote a few proof of concepts that relied on the following architecture:

Revit <----> Database <----> Outside Program

I have recently posted proof of concepts about sending data into Revit from outside – skipping the database export – but used GIS as my example, arguing that because GIS and Revit use the same API language it should be possible.

I now have a working example in Revit. The architecture no longer requires a Revit DB to be exported.

Revit <----- RabbitMQ <----- Outside Program (it can be bidirectional if you want it to be)

I wrote a Revit Add-In that connects to a messaging queue and retrieves any waiting messages. Getting the message into Revit is the hard part. Once in, we can use it to modify properties – like in the GIS example that drew points from a message.

The screenshots show the Add-In, No Messages Waiting, and a Message Coming In.

This is my proof of concept, and the goal is to modify this program to allow the scanning of a QR code to send a message into Revit and modify/add properties of an object. That architecture looks like this:

Revit <----- RabbitMQ <----- WebServer <----- QR code scanned by mobile device

The QR code opens a URL with parameters.

The web server is a CherryPy script that reads in the parameters and passes them as a message to RabbitMQ using Pika.
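The script itself wasn't included in the post. Here is a hedged sketch of its core: building the message body from the URL parameters. The parameter names and queue name are my inventions, and the Pika calls are shown commented out since they need a live RabbitMQ broker:

```python
import json

# hypothetical parameters parsed from the scanned URL's query string,
# e.g. ?id=123&serviced=2014-10-20
params = {"id": "123", "serviced": "2014-10-20"}
body = json.dumps(params)

# with pika and a running RabbitMQ broker, the handler would then publish:
# import pika
# connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
# channel =
# channel.queue_declare(queue="revit")
# channel.basic_publish(exchange="", routing_key="revit", body=body)
print(body)
```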

The Revit Add-In gets the message and modifies the properties of the wall with ID=123.

More pymongo

13 Sep


I am back in MongoDB mode. I grabbed some pollen data from the City of Albuquerque. It has a date, location, pollen type, and the count, covering 2004 to 2013. I loaded this into MongoDB from a CSV with this script:
from pymongo import MongoClient


for x in file.readlines():
    pass  # the connection setup and loop body were lost from the original post
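A hedged reconstruction of what the loader might have looked like; the column names are assumptions based on the description above, and the Mongo insert is shown commented out so the parsing can stand on its own:

```python
import csv

# inline sample rows standing in for the downloaded pollen CSV
# (the column names are assumptions based on the post's description)
lines = [
    "date,location,type,count",
    "2013-03-01,East,Elm,540",
    "2013-03-02,East,Elm,620",
]

docs = []
for row in csv.DictReader(lines):
    docs.append(dict(row))
    # with a live MongoDB connection each row would become a document:
    # collection.insert(dict(row))

print(len(docs))
```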
I can then query for elm data sorted by date.
from pymongo import MongoClient

client = MongoClient()                               # local MongoDB assumed
collection = client["pollen"]["counts"]              # database/collection names assumed
x = collection.find({"type": "Elm"}).sort("date")    # field names assumed
for entry in x:
    print entry

Great! But I want all elm data on the east side of ABQ sorted by date and plotted. Easy! Matplotlib and Pandas help out here:

from pandas import Series
import matplotlib.pyplot as plt
from pymongo import MongoClient

client = MongoClient()                               # local MongoDB assumed
collection = client["pollen"]["counts"]              # names assumed
dates, counts = [], []
x = collection.find({"type": "Elm", "location": "East"}).sort("date")  # fields assumed
for w in x:
    dates.append(w["date"])
    counts.append(int(w["count"]))
Series(counts, index=dates).plot()
plt.show()