Tag Archives: Albuquerque

Albuquerque Open Data

18 Mar

I have put together several posts on Albuquerque Open Data and have felt bad that the examples were not live. I shutdown my old website and tried to use JsBin for a few examples, but I was not consistent and couldn’t upload third party JavaScript files. Well, I found a solution.

GitHub Pages to the Rescue

I have a GitHub repo. A GitHub Repo is a place to put code where you can harness the full power of git – branching, committing and others can pull and push code to your repo. GitHub also has a thing called GitHub Pages where you can host sites. I have one. And I am now using it to host my presentations. I do not own Albuquerque Open Data, but the community does. The City puts the data out there for us to use. As a community member, I decided to create a GitHub repo for Albuquerque Data and a GitHub page for it. You can download the code for all the examples, or run them. You can also grab a copy and modify thm. Lastly, you can add your own code examples to the repo.

The Examples

There are only a few examples thus far.

  • Aerial – Zoom to a location in ABQ. Double click to get back an aerial image of the map window.
  • Crime – Click the map to see incidents near you.
  • Density
  • Elevation – Draw a line with lots of segments to see an elevation profile.
  • Game – Click the map to add a marker. Drag the marker to a park to win.

The Code for the Examples

You can get the code for each example by clicking the link.

  1. Aerial
  2. Crime
  3. Density
  4. Elevation
  5. Game

You can also grab the whole Repo as a ZIP file.

Advertisements

Reverse Engineer City of Albuquerque Web Server Logs API

25 Feb

I recently stumbled across a Code For America experiment that shows live information from the City of Albuquerque Web Server. This is cool, but if you read this blog, you know I want the service! Give me that REST endpoint. I set out to reverse engineering their application and will show you two real-time endpoints with some different parameters.

The Application

The Code for America application is clearly hitting an endpoint. It’s just the way it has to work. So where is the endpoint? Well, a cursory glance at the code reveals two: realtime and historic. So hit the endpoint

https://shrouded-fjord-5635.herokuapp.com/realtime?

and you get the following error:

{“error”:{“errors”:[{“domain”:”global”,”reason”:”required”,”message”:”Required parameter: ids”,”locationType”:”parameter”,”location”:”ids”}],”code”:400,”message”:”Required parameter: ids”}}

Interesting. we need a parameter ids. Hit the endpoint with the parameter.

{“error”:{“errors”:[{“domain”:”global”,”reason”:”invalidParameter”,”message”:”Invalid value ”. Values must match the following regular expression: ‘ga:[0-9]+'”,”locationType”:”parameter”,”location”:”ids”}],”code”:400,”message”:”Invalid value ”. Values must match the following regular expression: ‘ga:[0-9]+'”}}

Now I know the value of

ids=ga: SOME NUMBER

The number is in the code, so I grabbed it. Reading the code, you can see the structure of the endpoint.

endpoint: function(){
return “/realtime?ids=ga:”+matrix.settings.profileId+”&metrics=rt:activeUsers&max-results=10”
}

So let’s give it a try.

{
“kind”: “analytics#realtimeData”,
“id”: “https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&metrics=rt:activeUsers”,
“query”: {
“ids”: “ga:48754”,
“metrics”: [
“rt:activeUsers”
],
“max-results”: 1000
},
“totalResults”: 1,
“selfLink”: “https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&metrics=rt:activeUsers”,
“profileInfo”: {
“profileId”: “48754”,
“accountId”: “86004”,
“webPropertyId”: “UA-86004-1”,
“internalWebPropertyId”: “97739”,
“profileName”: “2. http://www.cabq.gov All Traffic”,
“tableId”: “realtime:48754”
},
“columnHeaders”: [
{
“name”: “rt:activeUsers”,
“columnType”: “METRIC”,
“dataType”: “INTEGER”
}
],
“totalsForAllResults”: {
“rt:activeUsers”: “172”
},
“rows”: [
[
“172”
]
]
}

There you have it: 172 active users. We can pass another parameter

metrics=rt:pageViews

Then we see:

{“kind”:”analytics#realtimeData”,”id”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&metrics=rt:pageviews”,”query”:{“ids”:”ga:48754″,”metrics”:[“rt:pageviews”],”max-results”:1000},”totalResults”:1,”selfLink”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&metrics=rt:pageviews”,”profileInfo”:{“profileId”:”48754″,”accountId”:”86004″,”webPropertyId”:”UA-86004-1″,”internalWebPropertyId”:”97739″,”profileName”:”2. http://www.cabq.gov All Traffic”,”tableId”:”realtime:48754″},”columnHeaders”:[{“name”:”rt:pageviews”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],”totalsForAllResults”:{“rt:pageviews”:”2172″},“rows”:[[“2172”]]}

I formatted the first result nicely and will just post JSON blobs so as not to take up to much space. Now you see 2172 views. My guess is that this is within either today or some time time frame of today.

What are people looking at? Lets change our metrics to pageviews and get the pagetitle. We also pass the largest possible result value.

metrics=rt:pageviews&dimensions=rt:pageTitle&max-results=10000

Right now, this is what we get.

{“kind”:”analytics#realtimeData”,”id”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&dimensions=rt:pageTitle&metrics=rt:pageviews&max-results=10000″,”query”:{“ids”:”ga:48754″,”dimensions”:”rt:pageTitle”,”metrics”:[“rt:pageviews”],”max-results”:10000},”totalResults”:454,”selfLink”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&dimensions=rt:pageTitle&metrics=rt:pageviews&max-results=10000″,”profileInfo”:{“profileId”:”48754″,”accountId”:”86004″,”webPropertyId”:”UA-86004-1″,”internalWebPropertyId”:”97739″,”profileName”:”2. http://www.cabq.gov All Traffic”,”tableId”:”realtime:48754″},”columnHeaders”:[{“name”:”rt:pageTitle”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”rt:pageviews”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],”totalsForAllResults”:{“rt:pageviews”:”2127″},”rows”:[[“2015 Spring Green Waste Pick-up — City of Albuquerque”,”1″],[“4th Magic Treehouse: Dinosaurs after Dark — City of Albuquerque”,”1″],[“A Night in the 40’s: Big Band Swing — City of Albuquerque”,”1″],[“ABQ BioPark — City of Albuquerque”,”12″],[“ABQ RIDE Providing Free Transportation to Veterans with V.A. Hospital Cards — City of Albuquerque”,”5″],[“ABQ Recycles — City of Albuquerque”,”2″],[“ABQ View — City of Albuquerque”,”24″],[“ABQ Volunteers — City of Albuquerque”,”6″],[“ACH Services Information — City of Albuquerque”,”1″],[“About 311 — City of Albuquerque”,”1″],[“About Us — City of Albuquerque”,”2″],[“About the BioPark — City of Albuquerque”,”1″],[“About the Councilor — City of Albuquerque”,”1″],[“Activities & Education Programs — City of Albuquerque”,”1″],[“Activities Catalog — City of Albuquerque”,”1″],[“Address Atlas Pages — City of Albuquerque”,”2″],[“Address Query – Advanced Map Viewer — City of Albuquerque”,”5″],[“Address Report — City of Albuquerque”,”19″],[“Administration, Two AFSCME Unions Reach Agreement — City of Albuquerque”,”1″],[“Admission & Hours — City of Albuquerque”,”5″],[“Advanced Map Viewer User Guide — City of Albuquerque”,”1″],[“Advanced Map Viewer — City of Albuquerque”,”14″],[“Agendas from Greater Albuquerque Bicycling Advisory Committee Meetings — City of Albuquerque”,”1″],

Notice there were only 454 total results. Each record looks like

[“ABQ RIDE Providing Free Transportation to Veterans with V.A. Hospital Cards — City of Albuquerque”,”5″]

This is pageTitle comma PageViews as stated in the columnHeaders property below.

columnHeaders”:[{“name”:”rt:pageTitle”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”rt:pageviews”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],

Where is the page with the title ABQ View – City of Albuquerque? Using pagePath we can find out.

{“kind”:”analytics#realtimeData”,”id”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&dimensions=rt:pagePath,rt:pageTitle&metrics=rt:pageviews&max-results=10000″,”query”:{“ids”:”ga:48754″,”dimensions”:”rt:pagePath,rt:pageTitle”,”metrics”:[“rt:pageviews”],”max-results”:10000},”totalResults”:704,”selfLink”:”https://www.googleapis.com/analytics/v3/data/realtime?ids=ga:48754&dimensions=rt:pagePath,rt:pageTitle&metrics=rt:pageviews&max-results=10000″,”profileInfo”:{“profileId”:”48754″,”accountId”:”86004″,”webPropertyId”:”UA-86004-1″,”internalWebPropertyId”:”97739″,”profileName”:”2. http://www.cabq.gov All Traffic”,”tableId”:”realtime:48754″},”columnHeaders”:[{“name”:”rt:pagePath”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”rt:pageTitle”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”rt:pageviews”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],”totalsForAllResults”:{“rt:pageviews”:”2173″},”rows”:[[“/311/311-Information/about-311″,”About 311 — City of Albuquerque”,”1″],[“/311/311-Information/contacting-311″,”How to Contact 311 — City of Albuquerque”,”1″],[“/311/311-Information/contacting-311/dialing-311/index.html”,”Dialing 311 — City of Albuquerque”,”1″],[“/311/resident-services”,”Online Resident Services — City of Albuquerque”,”1″],[“/311/resident-services/online-resident-services/folder_summary_view?b_start:int=15&-C=”,”Online Resident Services — City of Albuquerque”,”1″],[“/311/resident-services/online-resident-services/folder_summary_view?b_start:int=30&-C=”,”Online Resident Services — City of Albuquerque”,”1″],[“/a-z”,”Albuquerque A-Z — City of Albuquerque”,”27″],[“/a-z/a-z”,”Albuquerque A-Z — City of Albuquerque”,”7″],[“/about/offsite.html”,”Leaving http://www.cabq.gov External Link Disclaimer — City of Albuquerque”,”1″],[“/abq-apps”,”ABQ Apps — City of Albuquerque”,”1″],[“/abq-data”,”ABQ Data — City of Albuquerque”,”1″],[“/abq-view/index.html”,”ABQ View — City of Albuquerque”,”27″],[“/abq-volunteers”,”ABQ Volunteers — City of Albuquerque”,”9″],[“/abq-volunteers/index.html”,”ABQ Volunteers — City of Albuquerque”,”1″],[“/acl_users/credentials_cookie_auth/require_login?came_from=https://www.cabq.gov/abq-volunteers/volunteering-announcement-form”,”City of Albuquerque”,”1″],[“/airport”,”Airport — City of Albuquerque”,”7″],[“/airport/airlines-flight-services”,”Airlines — City of Albuquerque”,”7″]

Here is a single record:

[“/311/311-Information/about-311″,”About 311 — City of Albuquerque”,”1″]

And appending the URL to cabq.gov gives us:

Capture

 

Now you can see what people are looking at on the City of Albuquerque website. Try gathering statistics and compile the most popular page for a day as Code for America has done.

I said I was only going to show the real-time but I had to grab historic.

Historic Data

You can grab historic data by changing the endpoint to /historic. The metric values change to:

start-date

end-date

They are formatted as YYYY-MM-DD. You can also pass dimensions of:

nthMinute

nthHour

And here is an example of a session from 2015-02-21 to 2015-02-23 using Hours:

https://shrouded-fjord-5635.herokuapp.com//historic?ids=ga:48754&dimensions=ga:nthHour&metrics=ga:sessions&start-date=2015-02-21&end-date=2015-02-23

{“kind”:”analytics#gaData”,”id”:”https://www.googleapis.com/analytics/v3/data/ga?ids=ga:48754&dimensions=ga:nthHour&metrics=ga:sessions&start-date=2015-02-21&end-date=2015-02-23″,”query”:{“start-date”:”2015-02-21″,”end-date”:”2015-02-23″,”ids”:”ga:48754″,”dimensions”:”ga:nthHour”,”metrics”:[“ga:sessions”],”start-index”:1,”max-results”:1000},”itemsPerPage”:1000,”totalResults”:72,”selfLink”:”https://www.googleapis.com/analytics/v3/data/ga?ids=ga:48754&dimensions=ga:nthHour&metrics=ga:sessions&start-date=2015-02-21&end-date=2015-02-23″,”profileInfo”:{“profileId”:”48754″,”accountId”:”86004″,”webPropertyId”:”UA-86004-1″,”internalWebPropertyId”:”97739″,”profileName”:”2. http://www.cabq.gov All Traffic”,”tableId”:”ga:48754″},”containsSampledData”:false,”columnHeaders”:[{“name”:”ga:nthHour”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”ga:sessions”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],”totalsForAllResults”:{“ga:sessions”:”48097″},“rows”:[[“000000″,”242”],[“000001″,”110”],[“000002″,”91”],[“000003″,”91”],[“000004″,”106”],[“000005″,”147”],[“000006″,”308”],[“000007″,”671”],[“000008″,”912”],[“000009″,”1098”],[“000010″,”1119”],[“000011″,”1019”],[“000012″,”1016”],[“000013″,”959”],[“000014″,”901”],[“000015″,”880”],[“000016″,”800”],[“000017″,”665”],[“000018″,”693”],[“000019″,”646”],[“000020″,”610”],[“000021″,”543”],[“000022″,”443”],[“000023″,”289”],[“000024″,”219”],[“000025″,”133”],[“000026″,”100”],[“000027″,”84”],[“000028″,”81”],[“000029″,”149”],[“000030″,”231”],[“000031″,”524”],[“000032″,”744”],[“000033″,”859”],[“000034″,”894”],[“000035″,”848”],[“000036″,”861”],[“000037″,”892”],[“000038″,”860”],[“000039″,”832”],[“000040″,”842”],[“000041″,”809”],[“000042″,”744”],[“000043″,”650”],[“000044″,”711”],[“000045″,”583”],[“000046″,”483”],[“000047″,”305”],[“000048″,”220”],[“000049″,”115”],[“000050″,”98”],[“000051″,”56”],[“000052″,”127”],[“000053″,”269”],[“000054″,”481”],[“000055″,”787”],[“000056″,”1269”],[“000057″,”1467”],[“000058″,”1595”],[“000059″,”1613”],[“000060″,”1462”],[“000061″,”1446”],[“000062″,”1528”],[“000063″,”1545”],[“000064″,”1343”],[“000065″,”995”],[“000066″,”792”],[“000067″,”742”],[“000068″,”812”],[“000069″,”728”],[“000070″,”508”],[“000071″,”302”]]}

A single records contains the hour and the number of sessions:

[“000064″,”1343”]

What to the People of Albuquerque want from their Government?

By analyzing what the people of Albuquerque search for on the City website, we can come to some conclusions as to what they deem important or what concerns they have. To do so, we need the pages they looked at over a period of time. We can use the historic endpoint and pass pageTitle to find out.

https://shrouded-fjord-5635.herokuapp.com//historic?ids=ga:48754&dimensions=ga:nthHour&metrics=ga:pageviews&dimensions=ga:pageTitle&sessions&start-date=2015-02-21&end-date=2015-02-23

{“kind”:”analytics#gaData”,”id”:”https://www.googleapis.com/analytics/v3/data/ga?ids=ga:48754&dimensions=ga:pageTitle&metrics=ga:pageviews&start-date=2015-02-21&end-date=2015-02-23″,”query”:{“start-date”:”2015-02-21″,”end-date”:”2015-02-23″,”ids”:”ga:48754″,”dimensions”:”ga:pageTitle”,”metrics”:[“ga:pageviews”],”start-index”:1,”max-results”:1000},”itemsPerPage”:1000,”totalResults”:3090,”selfLink”:”https://www.googleapis.com/analytics/v3/data/ga?ids=ga:48754&dimensions=ga:pageTitle&metrics=ga:pageviews&start-date=2015-02-21&end-date=2015-02-23″,”nextLink”:”https://www.googleapis.com/analytics/v3/data/ga?ids=ga:48754&dimensions=ga:pageTitle&metrics=ga:pageviews&start-date=2015-02-21&end-date=2015-02-23&start-index=1001&max-results=1000″,”profileInfo”:{“profileId”:”48754″,”accountId”:”86004″,”webPropertyId”:”UA-86004-1″,”internalWebPropertyId”:”97739″,”profileName”:”2. http://www.cabq.gov All Traffic”,”tableId”:”ga:48754″},”containsSampledData”:false,”columnHeaders”:[{“name”:”ga:pageTitle”,”columnType”:”DIMENSION”,”dataType”:”STRING”},{“name”:”ga:pageviews”,”columnType”:”METRIC”,”dataType”:”INTEGER”}],”totalsForAllResults”:{“ga:pageviews”:”129345″},”rows”:[[“\”A Senior I Know\” Essay Contest — City of Albuquerque”,”30″],[“\”On the Commons\” Magazine Features Albuquerque Open Space — City of Albuquerque”,”3″],[“\”We Love Our BioPark\” Video Contest Launches — City of Albuquerque”,”3″],[“\”Website Transparency Team\”: Week of April 18, 2011 — City of Albuquerque”,”1″],[“‘Pacific Coral Reef’ Exhibit Now Open — City of Albuquerque”,”1″],[“(Memorial) Tribute Tree Program — City of Albuquerque”,”3″],[“(not set)”,”13″],[“01-28-2015 – 15-11-104F – Follow-up – On-Call Contractors – Department of Municipal Development — City of Albuquerque”,”1″],[“01-28-2015 – 15-12-107F – Follow-up – Health and Social Service Centers – Department of Family and Community Services — City of Albuquerque”,”1″],[“02-17-12 CASE# 12-202 Whistleblower Complaint against the Solid Waste Management Department — City of Albuquerque”,”1″],[“02-28-07 — 06-04-105F Follow-Up – Environmental Health Department Expenditures — City of Albuquerque”,”2″],[“04-16-09 — 07-204 – Final Investigative Report – Stolen Vehicles, SWMD — City of Albuquerque”,”1″],[“06-25-12 Case# 12-213 Collision Involving a Motor Coach Operator — City of Albuquerque”,”1″],[“06-25-2014 Case # 204 Water Authority — City of Albuquerque”,”1″],[“1 – Educated, literate residents — City of Albuquerque”,”1″],[“1-11-10 RYAB Facilitated Meeting Notes — City of Albuquerque”,”1″],[“1.1 Adult Educational Achievement Rates — City of Albuquerque”,”2″],[“10 – Basic needs provided for — City of Albuquerque”,”1″],[“10-Year Spending Trends — City of Albuquerque”,”8″],[“10.1 Residents Living in Poverty — City of Albuquerque”,”1″],[“10.2 Unemployment Rate — City of Albuquerque”,”1″],[“11 – The public is safe — City of Albuquerque”,”1″],[“11-19-07 — 07-202 – Investigation – Alleged Excessive Overtime Claim, ABQ Ride Department — City of Albuquerque”,”1″],

Conclusion

I have given you the endpoints. It is up to you to parse out the data. The code for that is on almost every post on this blog. Just grab any AJAX request up to JSON.parse() and grab results.rows[x].[0 through 1]. Something like that should be close.

 

Improving Albuquerque Crime Incidents Query

10 Feb

In my last post, I showed how to query the Albuquerque Crime Incident. This post will improve on it.

In my original post, you had to do the following:

  1. Catch a point on click
  2. Draw a buffer
  3. Get the envelope of the buffer
  4. Query

There was nothing wrong with this, but it does bring in incidents beyond a half mile – the envelope is a square representation of the buffer. But we used the envelope parameter in the REST endpoint so we had no choice.

Well, let’s use the polygon parameter and only grab points in our buffer!

In the original code, after the line:

buffered = turf.buffer(b,.5,”miles”);

add the line below.

g='{“rings”:’+JSON.stringify(buffered.features[0].geometry.coordinates)+’}’;

You can remove the envelope line from the original code.

Now, just change your parameters to reflect the geometryType=Polygon and the new geometry=g.

var params=”f=json&outSR=4326&outFields=*&geometryType=esriGeometryPolygon&spatialRel=esriSpatialRelIntersects&inSR=4326&geometry=”+g;

You will now have incidents only in your buffer.

Capture

Query Albuquerque Open Data with Turf.js

9 Feb

I have posted on the Albuquerque Crime Incident data showing how to use Python and for converting the Date field. I have also demostrated how to use Turf.js for analysis. In this post, I will show you how to combine the two to limit your query of the Albuquerque Crime Incident Data.

The Idea

The goal of this post is to allow a user to click on a map and get back all the crime data within a half mile of the click. The end result will look like the image below.

Capture

 

Create the Geometry

To make this work, we need to catch where the user clicks on the map and convert it to GeoJSON so we can use it in our Turf.js operations without having to convert it to a Turf.point.

map.on(“click”,function(e){

a=L.marker(e.latlng);
b=a.toGeoJSON();

});

Now that we have a point, we need to create the buffer and then grab the envelope.

var buffered = turf.buffer(b,.5,”miles”);
var result = turf.featurecollection(buffered.features.concat(b));
enveloped = turf.envelope(result);

Query the Service

With the envelope, we can query the REST endpoint and pass the lower left and upper right coordinates to get the Crime Incidents in the bounding box. I have done several posts on making an AJAX request to ArcServer and will not go in to detail here. Just note that we will have two parameters that I do not use often.

geometryType=esriGeometryEnvelope

spatialRel=esriSpatialRelIntersects

And a geometry parameter that will take the corner points of the envelope. The code below shows how to make the query.

var params=”f=json&outSR=4326&outFields=*&geometryType=esriGeometryEnvelope&spatialRel=esriSpatialRelIntersects&inSR=4326&geometry=”+enveloped.geometry.coordinates[0][0][0]+”,”+enveloped.geometry.coordinates[0][0][1]+”,”+enveloped.geometry.coordinates[0][2][0]+”,”+enveloped.geometry.coordinates[0][2][1];
var url = “http://coagisweb.cabq.gov/arcgis/rest/services/public/APD_Incidents/MapServer/0/query”;
console.log(params);
http=new XMLHttpRequest();
http.open(“POST”, url, true);
http.setRequestHeader(“Content-type”, “application/x-www-form-urlencoded”);
http.onreadystatechange = function() {//Call a function when the state changes.
if(http.readyState == 4 && http.status == 200) {
console.log(http.responseText);
var result= JSON.parse(http.responseText);
for(x=0;x<Object.keys(result.features).length;x++){
xy=result.features[x].geometry;
console.log(result.features[x].geometry);
adate=new Date(result.features[x].attributes.date);
L.marker([result.features[x].geometry.y,result.features[x].geometry.x]).bindPopup(“<h3>”+result.features[x].attributes.CVINC_TYPE+”</h3><p>”+adate.toString()).addTo(map);}
}}

http.send(params);

I added a popup showing the crime type and then formatted the epoch data to a human readable string. Now if you click on the map, you will get back a series of incidents. To improve on this, you could pass a specific date range but the data itself is for the last 180 days (rolling, meaning tomorrow one day will be dropped off).

Learning R with Albuquerque Open Data

19 Jan

I have posted several tutorials on using REST endpoints from the City of Albuquerque, but the city has many data sets available in CSV that have no REST endpoint. In this post, I am going to show you how to use RStudio to perform simple statistics and generate graphs from the data.

The first thing we need to do is get a copy of R – a free statistical computing environment. We also need RStudio. This will make working with R much easier.

Once you have everything installed, go the the Albuquerque Open Data page. Scroll to the bottom and select “Graded Employee Earnings.” Download the csv file. I renamed the file “pay.txt” just to make things easier later. This file contains all the graded city employee names, their base pay and ytd earnings. Should be an interesting data set to play with.

Now launch your newly installed RStudio.

The details presented below will be very minimal. It is my intent to show you how quickly you can accomplish basic statistical analysis in R, not to provide an in depth tutorial on R. There are many free tutorials, a long one at the r-project and a shorter one at the r-project.

Start by opening the data. The data is actually tab delimited, so set that as the sep value.

data <- read.delim(file=”pay.txt”,header=TRUE, sep=”\t”, quote=”\””)

You can see what is in it by typing data, but it is a long list so we will use the head function to get the first few rows with a header.

head(data)

If you only want a specific column you can type

data[“BASE.HOURLY.RATE”]

or multiple columns

data[c(“BASE.HOURLY.RATE”,”DEPARTMENT”)]

You can grab some summary statistics. I only want to know about the hourly wage so I will pass that column as shown earlier.

summary(data[“BASE.HOURLY.RATE”])

This should result in the image below

summary

Half of all graded employees make less then $17.48 an hour and there is someone making $59.37. There is not a single graded employee making less than $8.75.

Graphs help us visualize data, so let’s create a frequency graph for hourly rate. We need to know the range of the data – the summary told us (take the min and max), but we will use the range function.

range(data[“BASE.HOURLY.RATE”])

We need groups for our counts so we will create breaks using the range. I rounded down on the min and up on the max.

breaks=seq(8,60,by=2)

Now I want to grab the data without the header and cut it in to the breaks(groups).

mydata=data[“BASE.HOURLY.RATE”][,1]

mydata.cut=cut(mydata,breaks,right=FALSE)

Now combine the data and plot it.

cbind(mydata.freq)
plot(mydata.freq)

In the lower right corner of your RStudio window, you will see your chart. You can export it to several formats. It should look like the image below.

RplotFreq

Hourly wages start to fall after about $17 an hour but jump up again at around $27. It looks like most people make between $12 to $22. This is confirmed by our summary earlier (the average is $19.04).

You can easily generate a histogram

hist(mydata)

RplotHistogram

Or a box plot

boxplot(mydata,horizontal=TRUE)

RplotBoxplot

You can graph the density. This will further confirm our belief that the majority of employees make $12-$22.

den=density(mydata)
plot(den, main=”Density of Base Hourly Rate”)

RplotDensity

I can see a tail on the right side of all three graphs. Let’s check the skewness to confirm it. To do so, we need to install a package called moments, import it and then execute the function.

install.packages(“moments”)
library(moments)
skewness(mydata)

The result is 1.100316. A positive number indicates a skew to the right. I was correct.

The last thing I want to do is find the median hourly wage by department. I will grab the department name and hourly wage columns and then aggregate them.

subset <-data[c(“DEPARTMENT”,”BASE.HOURLY.RATE”)]
theMedian <-aggregate(x=subset[“BASE.HOURLY.RATE”],FUN=median,by=list(Group.Department=subset$DEPARTMENT))
theMedian

The results are shown in the image below.

agg

Council Services has the highest median hourly wage and Family and Community Services has the lowest.

R is a very powerful tool for statistical analysis and RStudio helps make it even easier. In two lines of code we had the Min,Median,Mean, Max, 1st and 3rd Quartiles. We also generated graphs with single lines of code and saved them out as images. If you need to perform data wrangling, R could be a great solution for you. I am going to make a point to try to learn more. Can never have enough tools in your belt.

 

ESRI REST and Python: pandas.DataFrame

13 Jan

I have been working in JavaScript a lot recently and have missed coding in python. I wanted to work on something that would have real world value so I decided to see if I could load an ESRI REST endpoint in to a pandas DataFrame. This post will only scratch the surface of what you can do with data in python but should give you an idea of what is possible and allow you to imagine some really interesting possibilities. I have posted all of the code as a static IPython notebook and on Python Fiddle for easy copy+paste.

Getting Data

We are going to start by grabbing some open data from the City of Albuquerque – I will use the Crime Incidents REST service. The first step is to make the request and read it.

import urllib, urllib2
param = {‘where’:’1=1′,’outFields’:’*’,’f’:’json’}
url = ‘http://coagisweb.cabq.gov/…/APD_Incidents/MapServer/0/query ?’ + urllib.urlencode(param)
rawreply = urllib2.urlopen(url).read()

If you print rawreply, you will see a long string. We know it is a string because if you print rawreply[0] you will see { as the result. To confirm it, you can type(rawreply) and get back str.

Working with the Data

Now we need to convert it to JSON so that we can do something with the data.

import simplejson
reply = simplejson.loads(rawreply)
print reply[“features”][0][“attributes”][“date”]
print reply[“features”][0][“attributes”][“CVINC_TYPE”]

The above code will return something like

1405468800000
LARCENY ALL OTHER

How many features do we have? 24,440.

print len(reply[“features”])
reply[“features”][:3]

The above code will give you the count and show you the first three results. You can grab subsets of the data by splicing. If you want records 2 through 4, you can reply[“features”][2:5].

Let’s get to the good stuff.

Pandas DataFrame

JSON is great, but let’s put the data in to a table (DataFrame). First, a disclaimer. ESRI REST services return a deeply nested JSON object. Converting this to a DataFrame is more difficult than:

import pandas.io.json as j
jsondata = j.read_json(the URL)

But it is not much more difficult. We just need to recreate the JSON object grabbing what we are interested in – the attributes of the features. We need a loop that will  create an array of dictionaries. Each dictionary will be the attributes of a single feature.

count = 0
myformateddata=[]
while (count < len(reply[“features”])):
mydict={}
for key, value in reply[“features”][count][“attributes”].iteritems():
mydict[key]= value
myformateddata.append(mydict)
count = count + 1

The code above initializes a counter, array and a dictionary. It then loops until there are no more features. The loop reads the attributes of the feature and creates a dictionary for each attribute and its value. Now we have an array of dictionaries. If you len(myFrame) you will get 24,440. That is the same number of records we had in our JSON object.

We have all our data and it is an array. We can now create a DataFrame from it.

from pandas import DataFrame as df
myFrame = df(data=myformateddata)
myFrame

The code above imports the tools we need, assigns the frame to a variable and then prints it. You should see a table like the image below.

DataFrame with Crime Incidents from ESRI REST endpoint

DataFrame with Crime Incidents from ESRI REST endpoint

Data Manipulation

So what can we do with a DataFrame? First, let’s save it. We can export the DataFrame to Excel (The City of Albuquerque provides open data in several formats, but other sources may only provide you with the REST API so this would be a valuable method for converting the data).

from pandas import ExcelWriter
writer = ExcelWriter(‘CrimeIncidents.xlsx’)
myFrame.to_excel(writer,’Sheet1′,index=False)
writer.save()

Great! the data has been exported. Where did it go? The code below will return the directory with the file.

import os
os.getcwd()

You can open the Excel file and work with the data or send it to someone else. But let’s continue with some basic manipulation in the DataFrame.

How many of each type of incident do we have in our data?

myFrame[“CVINC_TYPE”].value_counts()

Incidents Summarized

Incidents Summarized

Now I would like a barchart of the top 5 most frequent Incident Types.

import matplotlib.pyplot as plt
incidentType=myFrame[“CVINC_TYPE”].value_counts()[:5]
incidentType.plot(kind=’barh’,rot=0)

My Bar Chart of the 5 most common Incidents

My Bar Chart of the 5 most common Incidents

Lastly, let’s filter our results by date.

import datetime
import time
myDate=datetime.datetime(2015,1,10)
milli=time.mktime(myDate.timetuple()) * 1000+ myDate.microsecond / 1000

The above code creates a date of January 10, 2015. It then converts it to milliseconds – 1420873200000. If we pass the date to the DataFrame we can grab all the Incidents after January 10, 2015.

myFrame[(myFrame.date>milli)]

DataFrame for Incidents After January 10, 2015

DataFrame for Incidents After January 10, 2015

Now you know how to connect to an ESRI REST endpoint, grab the data, convert it to JSON and put it in a DataFrame. Once you have it in the DataFrame, you can now display a table, export the data to Excel, plot a barchart and filter the data on any field.

Handling Dates in Open Data with JavaScript and Python

13 Jan

Have you ever opened public data and been greeted by a date field with a value like 1419490800000? If it is not obvious to you that the number represents December 25, 2014 then you are not alone. The date has been returned to you in milliseconds since January 1, 1970 at 00:00:00. It might be difficult for you and me to do the math and get a human readable value but for the computer it is a trivial operation. In this article, I will show you how to manipulate a date in milliseconds using data from cabq.gov.   Albuquerque has a growing collection of open data and one newer data sets is Crime Incidents. For this example. let’s grab the REST endpoint. Scrolling to the bottom of the page and clicking the supported operation link QUERY, we can look at the data before coding against it. In the form, set the where field to 1=1, the out fields to * and click the Query (GET) button. It will take a few minutes, but I received 24,629 records (this data set returns incidents for the last 180 days).

Sample of my query results

Sample of my query results

As you can see, the date field is in milliseconds. We will move on to writing a webpage to grab the data and return the date in a variety of other formats – which will be more useful for building our applications.

JavaScript Date

There are many date libraries available in JavaScript but for this example we will only use the built-in date object. The first thing we need to do is grab some data. I will start by using a query to grab a subset of incidents from 1/9/2015 until today. While the data is returned in milliseconds, we can query it using a human readable date format. Below, I have created a variable params by concatenating several strings. The strings use the escape() function because browsers tend to garble URLs with special characters and the function allows us to pre-encode them so we know what the conversion of the characters will be.

q1=’where=’; q2=escape(‘”date”>date’); q3=escape(” ‘2015-01-09 00:00:00′”) ; q4=”&outfields=*&f=json”; var params=q1.concat(q2,q3,q4);

Now that we have the parameter to pass, the rest is a standard AJAX request to the URL of the service.

var url = “http://coagisweb.cabq.gov/…/query “; http=new XMLHttpRequest(); http.open(“POST”, url, true); http.setRequestHeader(“Content-type”, “application/x-www-form-urlencoded”); http.onreadystatechange = function() { if(http.readyState == 4 && http.status == 200) { //Code goes here }} http.send(params);

Now we have queried the data and have an if statement that will run when the server hands us our results. The first thing to do is grab the response and convert it to JSON. Then we will grab the date from the first result and display it in the console.

theGeom= JSON.parse(http.responseText); adate=new Date(theGeom.features[0].attributes.date); console.log(“RESPONSE FROM ESRI: ” + theGeom.features[0].attributes.date);

If we were successful, you should see a millisecond date in the console. Using the date object, you can convert milliseconds to any of the below formats. Note that months are indexed starting at 0. That means January is month 0 and December is month 11. You need to add 1 to a month for it to be correct.

console.log(“TIME STRING: ” + adate.toTimeString()); //0=january 11 = december console.log(“MONTH: ” + adate.getMonth()+1); console.log(“DAY: ” + adate.getDay()); console.log(“YEAR: ” + adate.getFullYear()); console.log(“HOURS: ” + adate.getHours()); console.log(“MINUTES: ” + adate.getMinutes()); console.log(“SECONDS: ” + adate.getSeconds()); console.log(“ISO 8601: ” + adate.toISOString()); console.log(“UTC: ” + adate.toUTCString()); console.log(“STRING: ” + adate.toString()); console.log(“LOCALE: ” + adate.toLocaleDateString());

Results in the console

Results in the console

Using Python

Using Python, we can do exactly the same thing as the JavaScript example, but this time we will run the code of our desktop. We will need to import the urllibs and datetime. I have chosen to use a third party library simplejson.

import urllib, urllib2, datetime, simplejson

To query a REST endpoint in Python, you will:

  1. create a parameters object
  2. concatenate it to a url
  3. then open the url and read the reply.

param = {‘where’:’1=1′,’outFields’:’*’,’f’:’json’} url = ‘http://coagisweb.cabq.gov/…/MapServer/0/query? ‘ + urllib.urlencode(param) rawreply = urllib2.urlopen(url).read()

If you print rawreply, you will see a large string. You will need to convert the string to JSON. This is where I used simplejson.

reply = simplejson.loads(rawreply) print reply[“features”][0][“attributes”][“date”] print reply[“features”][0][“attributes”][“CVINC_TYPE”]

Your program should have printed out the millisecond date for the first result and the incident type. Now, we can assign the date to a variable and convert it to a readable format.

d=reply[“features”][0][“attributes”][“date”] readable = datetime.datetime.fromtimestamp(d/1000) print readable

The above code should print out something like: 2014-07-14 18:00:00. Now that you have the readable date, you can extract and convert it.

print “YEAR: ” + str(readable.year) print “MONTH: “+ str(readable.month) print “DAY: “+ str(readable.day) print “HOUR: ” + str(readable.hour) print “MINUTE: ” + str(readable.minute) print “SECOND: ” + str(readable.second) print “ISO: ” + str(readable.isoformat())

When all the code is executed, you will see the image below.

Console output of python code

Console output of python code

There you have it, converting milliseconds to a human readable date using two different languages.