Archive | Data RSS feed for this section

Graph Databases II: Routing

18 Aug

I wrote a previous post on graph databases that did little more than add some nodes and a relationship. The post required me to install neo4j and python py2neo. It was simple, but it was a start. I have moved on to other things since then, most notably Go. But I have found myself coming back in a round about way. I recently wrote a plugin for Open Trail Data in Leaflet. Trails are a network but the data is really just lines. I thought if we add nodes at the starting points, intersections and eds of the trails, we could use them for routing. I set off to work.

The Data

The first step was grabbing a simple network. I used the Open Trail Data dummy data. I was going to draw nodes then import them in to a graph database, but I decided to shelve that for later – when I know I can route. The data is drawn in the image below.

Open Trail Data

Open Trail Data

The Graph

Next, I needed to create the graph database. I started with all the endpoints of the lines and then added nodes to the intersections. The code below shows how I created two nodes and a path between them.

one = Node(“trail”,id=”1″,junction=”1″)
oneint = Node(“trail”,id=”1″,junction=”1,2,3,4″)
p=Path(one, Rel(“connects”),oneint)

The end result was a graph database as seen in the image below.

Finished Graph Database

Finished Graph Database

Route

Now it was time to route. I do not have weights or costs on the edges. I just wanted to see if I could get from one point to another. I chose node 3 to node 6. They are both on the left. I like it because there are two possibly ways to go. To route, I used the following query:

results=graph.cypher.execute(“START beginning=node(22),end=node(32) MATCH p = shortestPath(beginning-[*..5000]-end) RETURN p”)

The nodes are 22 and 32 because that it the internal ID assigned by the database. My result is a path:

(:trail {id:”3″,junction:”3″})-[:connects]->(:trail {id:”1″,junction:”1,2,3,4″})-[:connects]->(:trail {id:”1″,junction:”1,7″})-
[:connects]->(:trail {id:”7″,junction:”7,6,8,9,12″})-[:connects]->(:trail {id:”6″,junction:”6″})

This tells me the database is going from node 3 to the intersection of 1,2,3,4 to the intersection of 1,7 to the intersection of 7,6,8,9,12 to the end node 6.

You can see the nodes by using

resultPath = results[0][0]
solved=newPath.nodes
for n in solved:
   print n

Now I know all the nodes required to get from one point to another. If I added (lat,long) values to each node, I could have drawn the path on a map.

I will put this in my pocket for later. I am sure it will come up again and I will be better suited to solve a problem. In the meantime, I think I will work on figuring out how to get a standard data format in to the Database – shapefile or geojson?

Bypass Cognos Forms

29 Jul

The City of Albuquerque has a Cognos report that allows you to see employee earnings. You can view the report on their transparency website or you can download a copy on the open data site. I do not want to go through a Cognos front page every time I need the data and I do not want to check when the last time they exported the xls or xml version to the open data site. I want to grab a fresh copy – I really want a service. Since they do not have one, we are stuck using Cognos every time to get the live data. Luckily, we can script it.

Cognos has several JavaScript functions cognosLaunch(), cognosLaunchWithTarget() and cognosLaunchInWindow(). These functions take different parameters and then call cognosLaunchArgArray(). Where do you get the JavaScipt library from? The City of Albuquerque – or anyone who has Cognos installed. The file is located at:

http://cognospublic.cabq.gov/cabqcognos/cognoslaunch.js

You can link to this file in your HTML

http://cognospublic.cabq.gov/cabqcognos/cognoslaunch.js

Now, you just need to know how to format the inputs properly. You can find all the information you need by running the report on the transparency site first. When the report finishes, view the source. You will see all the variables highlighted in the image below:

Cognos Report Source reveals all the parameters.

Cognos Report Source reveals all the parameters.

Now, format the function, passing the correct data. For cognosLaunch(), you will have the function below:

cognosLaunch(“ui.gateway”,”http://cognospublic.cabq.gov/cabqcognos/cgi-bin/cognos.cgi”,”ui.tool”,”CognosViewer”,”ui.object”,
“/content/folder[@name=’Transparency’]/report[@name=’Transparency Report – Graded employees’]”,”ui.action”,”run”,”run.prompt”,
“false”,”ui”,”h1h2h3h4″,”run.outputFormat”,”CSV”);

Put this in an HTML file in the <script> section and you will launch a window and download the CSV automatically. I have put a file on GitHub. There is another example which includes HTML and a JS file. The CABQ.js file formats the function for you. In this file, you could pass optional search parameters. I will leave that part up to you – I like grabbing all the data.

You can pas different outputFormats as well – CSV, HTML, PDF, singleXLS, XHTML, XLWA, and XML. Lastly, the City does not allow ajax calls from cross domain, so you may need to have the CORS extension installed in chrome. You can get it from chrome.google.com.

How would I use this in production? I think I would run a simple Python(cherrypy) or Go server that hosts the HTML and JS. Then I would write my application to call the site. I know where the download will go, so I could parse it when done. Then I could either return it to the server or do something with it on my machine.

Data and Design

8 May

If you have read any of the posts on this blog, you should know I love data. But what you may not know is that I love architecture, design and a good sketch. I spent five hours getting Study For The War Coffer by Eugene Delacroix tattooed on my chest.

coffer

Often these two worlds collide. I came across a tweet today:

Mindlessly drawing with data? How dare she. I once thought it a good idea to write computer code that could read an architectural program and develop the floor plan automatically. While I still favor some of this thinking, I have had to think it through. And slowly I have come out against it, and I have sided with Tara on this issue.

There is something to be said for hand drawing. The lines made by a pen, with their varying weights, show movement in a still image. There is something beautiful about them. About the process of sketching. Freely moving your hand across the canvas. The AIA had a podcast on Didactic Drawing that really brought it all home for me. On a computer, scale can change. You can draw a hundred foot line and based on your zoom level (scale) it could be a millimeter long. On paper, your scale is fixed. The movement of your hand across the page lets you know how long the line is.

I am not against BIM. But without pre-sketching designs, these program make it easy to create boxes, squares and overall bland buildings, to draw without a set scale, to fully understand and feel the building you are creating.  To design with data is an idea I am still deeply attached to. But I think we walk a fine line between letting data inform design- on how people use buildings for example – to creating the design for us – as in my program example earlier.

Applications like Revit or Grasshopper make it east to start with a simple form – a box – and twist, pull, rotate and skew it to come up with a whole host of possible forms. The results are soulless – though some look really cool. I do not see the art in it. If we are just going to feed some data in to a model to generate a form and say “look at this cool form I created from using the coordinates of all tweets that had the word Gehry in it” then we might as well give up – though I find these kinds of experiments interesting.

Data is, of course, valuable for facility maintenance. I also find value in data on movements of individuals within buildings and with modeling designs for things like airflow, heat, sunlight, etc. These are the kinds of data that can inform design – or confirm that a specific design is a functional design.

I do not want to live in a City full of bland buildings, just as much as I do not want to live in a world full of monuments to the architect that are outrageously out of context. There needs to exist a balance of the art and the science, of architecture and data. And each needs to compliment the other.

 

 

Excel Geocoder

17 Apr

ESRI makes maps for Office. I thought this could be interesting and went to work on trying to insert a map in to a spreadsheet. I did not have much luck. Instead, I decided to throw together a quick macro that would geocode addresses in a spreadsheet and give back coordinates. I do not think VB Script can parse JSON, so the result is ugly, but you get the idea.

Start with a spreadsheet of addresses

sheet

 

Then create a Macro – I copied the code for reading the URL from Ryan Farley.

Sub Macro1()
Range(“A2”).Select
i = 2

Do Until IsEmpty(ActiveCell)

URL = “http://coagisweb.cabq.gov/arcgis/rest/services/locators/CABQ_NetCurr/GeocodeServer/findAddressCandidates?f=json&outSR=4326&street=&#8221; & ActiveCell.Value
Dim objHttp
Set objHttp = CreateObject(“Msxml2.ServerXMLHTTP”)
objHttp.Open “GET”, URL, False
objHttp.Send
Cells(i, 2).Value = objHttp.ResponseText
i = i + 1
ActiveCell.Offset(1, 0).Select
Set objHttp = Nothing
Loop

End Sub

The code starts at cell A2 and reads addresses until it reaches an empty cell. It takes the value and sends it to the ESRI REST endpoint for the City of Albuquerque Geocoding Service. It sets the cell next to it with the results. They should really be parsed, but I am too lazy and was just curious if it could be done. It can. The result is below.

results

I am still thinking of how to embed a web page in the sheet.

Graph Database and Albuquerque Bus Stops: Neo4j with py2neo

15 Apr

I have been slightly obsessed with the question: “How do you define network service areas client-side on a map.” I know it needs a networked data set and something to do with the Djikstra algorithm (Yes, we could just use an ESRI REST service but there is not one available yet – I will ask the City). After looking at JavaScript implementations of NetworkX, I stumbled upon graph databases, most notably Neo4J.  A networked data set is a graph. Guess what, it has Djikstra built-in, so I must be on the right path. I installed it and added a fake social graph using py2neo. That allowed me to make sure I could do a few things:

  • Add a node
  • Add a relationship
  • add attributes

Now it was time to start with some real data.

My first test was to load Albuquerque Bus Stops for a single route. Here is what I have in my database.

Bus Stops for Route 766. No Relations added yet.

Bus Stops for Route 766. No Relations added yet.

The image above was generated by calling the City of Albuquerque REST Endpoint for bus stops, parsing the response, and putting it in to Neo4J. The image is a view from the DB Manager. The code to do this is below.

from py2neo import Graph
from py2neo import Node, Relationship
from py2neo import authenticate
import urllib2
import json

authenticate(“localhost:7474″,”myUserName”,”myPassword”)
graph=Graph()
graph.delete_all()

url=”http://coagisweb.cabq.gov/arcgis/rest/services/public/fullviewer/mapserver/22/query?where=ROUTE=’766’&f=json&outFields=*&outSR=4326&#8243;
rawreply=urllib2.urlopen(url).read()

reply=json.loads(rawreply)

for x in reply[“features”]:
graph.create(Node(“stop”,route=x[“attributes”][“ROUTE”],direction=x[“attributes”][“DIRECTION”],street=x[“attributes”][“STREET”],intersection=x[“attributes”][“NEAR_INTER”],lat=x[“geometry”][“y”],long=x[“geometry”][“x”]))

Notice there are no Relationships! This is crucial if we will ever walk the network. I have manually added on, seen in the image below.

San Mateo links to Louisianna.

San Mateo links to Louisianna.

The code for this is:

rel=Relationship(graph.node(42),”Next”,graph.node(41))

graph.create(rel)

I need to think about how to automate the relationship creation based on stop order and direction (there are stops on both sides of the street). Then, I will need to figure out how to make a node have relationships to other routes. For example, many stops are connected to the 777 route and I do not want a separate node for each. I want one with a property showing routes.

Well, a start to say the least. It has been fun learning about graph databases and if GIS doesn’t interest you, you could map your social network and walk it.

Open Data Disclaimers and Terms of Service

7 Apr

I saw a tweet by @waldojaquith that commented on the State of NY Open Data Portals TOS. The TOS states that you cannot access the page

“…by using an automated device, script, bot, spider, crawler or scraper”

I can guess their intent is to prevent people from hitting the site with bots and overloading the server. But a device and script? This seems at best, too broad, at worst, ignorance as to how people will want to use the data. Furthermore, will I be prosecuted from doing so or will my IP address be blocked? Is this even enforceable? Can a government block a private citizen from a civic website?

This tweet made me curious about the city I live in – Albuquerque. I was dumbfounded by the content of their disclaimer/TOS. It read

“The City may require a user of this data to terminate any and all display, distribution or other use of any or all of the data provided at this website for any reason…”

Really? The City can call me and tell me to remove the bar chart of car thefts by month from my website because it is based on their data? Or, that I can no longer send the PublicArt.kmz file to someone?

I love Albuquerque Open Data. I find it quite useful and very good. So when the page says that

“The City makes no warranty, representation, or guaranty as to the content, accuracy, timeliness, or completeness of any of the data provided at this website.”

it makes me question the quality of the data. I get it, the data might be wrong, don’t hold it as gospel. Seems to be common sense to me. But this statement creates a distrust in the data. Is it authoritative? After reading the TOS, I would not take it to be.

If I were a gambling man, I would bet this is the work of the Legal Department. The same people that probably have email signatures saying if they send you confidential information by mistake you are in trouble for not deleting it. Yeah, I don’t think so Matlock. Saying it doesn’t make it true.

Government is made up of many departments. Those departments create a bunch of data. That data finds its way to an open data website probably run by the IT Department. The IT department just puts it out there and doesn’t really know anything about the data. They don’t make it so why would they stand by it? They point you to the department that made it. But that department probably didn’t care to release it in the first place and don’t want to be bothered exporting it or updating it. Or worse yet, fielding calls from citizens about it. But if someone is going to put out some data, there needs to be a level of trust or quality in the data.

Is it better to put out some data of quality by choice than to be legally obligated to provide it as the result of a FOIA request? I would imagine open data cuts down on a lot of those requests, saving a lot of time in money.

Once you put it out, please don’t think you have any control over what I do with it or who I send it to.

I do not know the answer so I am putting the question out there, and I did so on Twitter as well:

Has a City ever been sued for the quality of their open data?

 

Historic Bus Location Data

27 Mar

Last night I was trying to think of interesting uses for a Rasberry Pi. One thing I came up with was a data logger. But what to log? Then I thought about a previous post on the Albuquerque Realtime Bus Data. Hmmm. What if I wanted to show the bus locations using a time slider. What if I want to see if they ever deviated from their routes, or if they deviated from their schedules? I can’t really do any analysis without the historic data, and the City does not give that out currently. So I think I find a use – logging Albuquerque Bus Data.

I don’t have a Rasberry Pi, yet, so I wrote a python script on my desktop to test the logger.

I am not going to post the code because I don’t know the impact on the Albuquerque server. I will give a brief explanation. The City has KML files for each bus route. Each route has multiple buses. I grabbed a single route – 766, and parsed the results. I initially sent the results to a csv – as you can see in the data below this post. Writing to CSV is not too helpful when the data gets large (I am not going to say BIG). Once I knew it worked, I sent the data to a MongoDB that was spatially indexed. In the database, I can now:

Get the total records

                         collection.count()

Get all the Records

                        for x in collection.find():
                                      print x

Get a specific bus number

                       for x in collection.find({‘number’:’6903′}):
                                           print x

Or find near a lat,lng

                       for x in collection.find({“loc”:{“$near”:[35.10341,-106.56711]}}).limit(3):
                                           repr(x)

With a database, multiple people can query it and perform operations on it. Lastly, if the data gets larger, Mongo can be split (sharding) across multiple machines to hold it all.

My MongoDB records look like:

{u’loc’: [35.08156, -106.6287], u’nextstop’: u’Central @ Cornell scheduled at 3:45 PM’, u’number’: u’6903′, u’time’: u’3:46:52 PM’,
u’_id’: ObjectId(‘5515cfd814cd2829e4c1b718′), u’speed’: u’20.5 MPH’}

Here is the results of my original run. I ran the script for 7 minutes and got the following results for Route 766.

Bus

Hard to see, but displays bus locations along route 766 over a 7 minute period.

 

6409,0.0 MPH,1:34:02 PM,Central @ San Mateo (Rapid) scheduled at 1:31 PM,-106.58642,35.07778
6904,1.9 MPH,1:34:03 PM,Central @ Edith scheduled at 1:31 PM,-106.64776,35.08401
6411,0.0 MPH,1:33:56 PM,CUTC Bay B scheduled at 1:39 PM,-106.72266,35.07726
6407,21.7 MPH,1:34:00 PM,Central @ 1st (across from A.T.C.) scheduled at 1:34 PM,-106.64317,35.08359
6410,35.4 MPH,1:33:53 PM,Central @ San Mateo (Rapid) scheduled at 1:36 PM,-106.58247,35.07749
6403,0.0 MPH,1:34:02 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57089,35.10305
6903,29.8 MPH,1:33:56 PM,Central @ Atrisco scheduled at 1:36 PM,-106.69049,35.08443
6409,0.0 MPH,1:35:14 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.5852,35.07764
6904,1.9 MPH,1:35:14 PM,Central @ Edith scheduled at 1:31 PM,-106.6478,35.08413
6411,0.0 MPH,1:35:07 PM,Next stop is CUTC Bay B scheduled at 1:44 PM,-106.72525,35.07886
6407,0.0 MPH,1:35:11 PM,Copper @ 2nd scheduled at 1:34 PM,-106.64784,35.08417
6410,19.3 MPH,1:35:17 PM,Central @ Carlisle (Rapid) scheduled at 1:40 PM,-106.58734,35.07802
6403,0.0 MPH,1:35:14 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57087,35.10306
6903,14.9 MPH,1:35:08 PM,Central @ Tingley (Rapid) scheduled at 1:38 PM,-106.68485,35.08576
6409,23.6 MPH,1:36:38 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.58084,35.07723
6904,18.0 MPH,1:35:53 PM,Central @ Edith scheduled at 1:31 PM,-106.647,35.08373
6411,0.0 MPH,1:36:31 PM,Next stop is CUTC Bay B scheduled at 1:44 PM,-106.72525,35.07885
6407,0.6 MPH,1:36:35 PM,Copper @ 5th scheduled at 1:35 PM,-106.64944,35.08541
6410,0.0 MPH,1:36:40 PM,Central @ Carlisle (Rapid) scheduled at 1:40 PM,-106.59512,35.07883
6403,0.0 MPH,1:36:42 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57086,35.10306
6903,31.7 MPH,1:36:33 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.67824,35.09252
6409,37.9 MPH,1:37:49 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.57203,35.07627
6904,0.6 MPH,1:37:55 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63771,35.08276
6411,0.0 MPH,1:37:55 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07885
6407,1.9 MPH,1:37:47 PM,Copper @ 5th scheduled at 1:35 PM,-106.6496,35.08487
6410,0.0 MPH,1:37:52 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.60369,35.07979
6403,0.0 MPH,1:37:57 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.57087,35.10306
6903,0.0 MPH,1:37:56 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.67159,35.09515
6409,0.0 MPH,1:39:14 PM,Louisiana @ Central (Rapid) scheduled at 1:36 PM,-106.56872,35.07595
6904,18.0 MPH,1:38:19 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63713,35.08265
6411,0.0 MPH,1:39:19 PM,Central @ Coors scheduled at 1:47 PM,-106.72527,35.07884
6407,3.1 MPH,1:39:10 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65295,35.086
6410,31.1 MPH,1:39:16 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.61167,35.08084
6403,5.6 MPH,1:39:23 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.5707,35.10368
6903,23.0 MPH,1:39:20 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.67011,35.09438
6409,26.1 MPH,1:40:37 PM,Louisiana @ Lomas scheduled at 1:38 PM,-106.56848,35.07723
6904,24.9 MPH,1:40:10 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63585,35.08255
6411,0.0 MPH,1:40:31 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07884
6407,23.0 MPH,1:40:36 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65675,35.0863
6410,34.2 MPH,1:40:40 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.61567,35.0811
6403,0.0 MPH,1:40:42 PM,Indian School @ Louisiana scheduled at 1:40 PM,-106.56875,35.10221
6903,23.6 MPH,1:40:32 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.66327,35.08892
6409,0.6 MPH,1:41:49 PM,Indian School @ Uptown Loop Road scheduled at 1:42 PM,-106.56849,35.08691
6904,24.9 MPH,1:41:55 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.63585,35.08255
6411,0.0 MPH,1:41:55 PM,Central @ Coors scheduled at 1:47 PM,-106.72526,35.07884
6407,0.0 MPH,1:41:58 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.65822,35.08648
6410,28.6 MPH,1:41:52 PM,Central @ Yale (UNM) scheduled at 1:44 PM,-106.6216,35.08113
6403,0.0 MPH,1:42:01 PM,Louisiana @ Lomas scheduled at 1:45 PM,-106.56779,35.10184
6903,23.0 MPH,1:41:55 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.65819,35.08629
6409,41.0 MPH,1:43:13 PM,Indian School @ Uptown Loop Road scheduled at 1:42 PM,-106.56863,35.09282
6904,24.9 MPH,1:42:15 PM,Central @ Cedar (Rapid) scheduled at 1:33 PM,-106.6217,35.08093
6411,0.0 MPH,1:43:19 PM,Central @ Coors scheduled at 1:47 PM,-106.72528,35.07883
6407,9.9 MPH,1:43:10 PM,Central @ Rio Grande (Rapid) scheduled at 1:40 PM,-106.661,35.08784
6410,34.2 MPH,1:43:16 PM,Central @ Mulberry (Rapid) scheduled at 1:47 PM,-106.62524,35.08125
6403,16.8 MPH,1:43:22 PM,Louisiana @ Lomas scheduled at 1:45 PM,-106.56635,35.10156
6903,19.3 MPH,1:43:19 PM,Gold @ 5th (Rapid) scheduled at 1:44 PM,-106.6549,35.084