Use OpenShift to Map River Levels With Flask, MongoDB, Leaflet, and OpenStreetMap


The application I’m going to write about here actually had its start a
few years back. I have a number of friends in the whitewater paddling
community around Boston and at the time, I had started playing around
with some of the “Web 2.0” (now that’s a dated term!) shiny-ness such as
Google Maps and the Google Maps API. What if, one of those friends
asked, I wrote an app that would let us see at a glance the water levels
for the local rivers we paddle?

I created that application, and you can see it running on OpenShift at http://wwos-bitmason.rhcloud.com/.

I spent a fair bit of time poking at the idea but I never got to a
working application. There were just too many moving parts. The USGS,
which maintains the gauges, didn’t have an easy way to get at the data;
I would have had to scrape it off Web pages. I would have had to run a
back-end PHP application and set up an associated database to
periodically do the scraping. And the Google Maps API, which I was
planning to use to display the data, was in relatively early days. I
periodically thought about revisiting the idea, but all those moving
parts made it all seem like too much bother just to get to the point
where I could, well, code the app.

Fast forward to OpenShift community day in Boston this past June. If
you’re reading this, you probably know what
OpenShift is. But in case you don’t, it’s
Red Hat’s Platform-as-a-Service, a cloud computing service model that
aims to make life easier for developers by abstracting away underlying
infrastructure details that aren’t typically relevant to writing Web
applications (such as the provisioning and patching of operating
systems). For our purposes here, I’m going to focus on the hosted Online
version of OpenShift, although there are also enterprise and community
(OpenShift Origin) versions.

Anyway, my colleague Steven Citron-Pousty showed me a demo he had put
together that displayed the location of national parks as pins on a map
that could be scaled and zoomed. Jaw drops. This was exactly what I had
been trying to do! I still had some details to work out–such as getting
the data–but SteveCP had shown me a basic approach that also, not
incidentally, used OpenShift to greatly simplify getting a lot of the
infrastructure set up and operating. It doesn’t eliminate the supporting
components–and, indeed, one of the nice things about OpenShift is that
it gives you a rich choice of languages, frameworks, and other tooling.
But it eliminates much of the unproductive busywork associated with
installing and configuring software on bare metal or on a bare VM. 

But enough preliminaries. On to the app! (I also encourage you to check
out Part 1 and Part 2 of this series of posts by SteveCP, which cover
overlapping topics. I’ll move relatively quickly over material detailed
in those posts.)

1. Architecture

I’ll start by briefly discussing how everything fits together so that
you can better understand why we’re doing what we’re doing when we get
into the details.

The application runs on a small gear on OpenShift Online. A gear is an
isolated application container and you get three small gears with a free
OpenShift account. The application is written in Python 2.6 and
Flask, a “micro framework” for Python based
on Werkzeug and Jinja 2. (A couple of utility programs I wrote to
manipulate the data are also written in Python.) Flask templating is
used to display an HTML/CSS/JS file that uses Leaflet, “a modern
open-source JavaScript library for mobile-friendly interactive maps,” to
display the pins and map tiles (which come from OpenStreetMap). The data
is obtained from a USGS Web service (returning JSON) and stored in a
MongoDB database. 

As you can see, there are a lot of moving parts, but it’s surprisingly
manageable with OpenShift. Indeed, I probably spent more time wrestling
with the complexities of data formats and retrieval than I did with the
rest of the code. For the purposes of this post, I’m going to focus on
the application itself and deal only briefly with the particulars of the
USGS data, which I’ll cover in more detail in a future post.

2. Building the Foundation

First we set up the infrastructure. I assume that you have the rhc client
tools and git installed and have a basic understanding of working with
and pushing code to the OpenShift service. The name of the application
is wwos. 

SteveCP’s posts listed above provide some more detail, but here are the
basic steps:

Create the application with the Python 2.6 cartridge:

rhc app create -t python-2.6 -a wwos

Use the Flask quickstart (or go to the repository and clone the files
manually):

cd wwos
git remote add upstream -m master git://github.com/openshift/openshift-mongo-flask-example.git
git pull -s recursive -X theirs upstream master

(Ideally Flask would be packaged in a cartridge. We just haven’t done so
yet. However, with the new v2 cartridge architecture, it should now be
quicker and easier to create new cartridges.)

Add MongoDB:

rhc cartridge add mongodb-2.2 -a wwos

Add Cron:

rhc cartridge add cron-1.4 -a wwos

Now… Actually, that’s all there is to getting things set up. Not bad. On
to coding.

3. Create a File with the Initial Data

The USGS won’t let you pull data associated with all their gauges at one
time; you have to specify at least one “major filter.” It turns out, for
reasons I won’t delve into in this post, that the best approach seems to
be to create a list of two-letter lowercase state abbreviations (plus DC
and Puerto Rico) and iterate over the list thusly:

statelist = ["al","ak","az","ar","ca","co","ct","de","dc","fl","ga","hi","id","il","in","ia","ks","ky","la","me","md","ma","mi","mn","ms","mo","mt","ne","nv","nh","nj","nm","ny","nc","nd","oh","ok","or","pa","ri","sc","sd","tn","tx","ut","vt","va","wa","wv","wi","wy","pr"]

import json
import urllib2

for i in statelist:
    requesturl = "http://waterservices.usgs.gov/nwis/iv/?format=json,1.1&stateCd=" + i + "&parameterCd=00060,00065&siteType=ST"
    req = urllib2.Request(requesturl)
    opener = urllib2.build_opener()
    f = opener.open(req)
    entry = json.loads(f.read())

Each request returns all the gauges in the state of type “ST” (stream).

For each entry, we then iterate through the individual gauges and save
the values.

count = len(entry['value']['timeSeries']) - 1

while count >= 0:
    agaugenum = entry['value']['timeSeries'][count]['sourceInfo']['siteCode'][0]['value']
    asitename = entry['value']['timeSeries'][count]['sourceInfo']['siteName']
    alat = entry['value']['timeSeries'][count]['sourceInfo']['geoLocation']['geogLocation']['latitude']
    along = entry['value']['timeSeries'][count]['sourceInfo']['geoLocation']['geogLocation']['longitude']

    agauge = {
        "sitename": asitename,
        "pos": [along, alat],
        "flow": 0,
        "height": 0,
        "timestamp": 0,
        "statecode": i
    }

    # output is an empty dictionary created outside all loops
    output[agaugenum] = agauge
    count = count - 1

A few comments:

  • Yes, the JSON returned by the USGS is quite baroque. That’s why I’m
    saving the full details for another post, though a trimmed sketch of
    the structure follows this list.
  • count starts at the index of the last record returned for a given
    state and counts down to zero.
  • Longitude and latitude are saved the way they are for reasons that
    will become clear in the next section.
  • Flow, height, and timestamp are placeholders for time-variant data.
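
Here, then, is a heavily trimmed sketch of the shape of one timeSeries
entry, inferred from the fields the code above reads. The site name,
gauge number, and coordinates come from a real gauge shown later in this
post; the reading and timestamp are made-up illustrations, and the real
responses carry many more fields.

{
    "sourceInfo": {
        "siteName": "PLUNGE C NR EAST HIGHLANDS CA",
        "siteCode": [ { "value": "11055500" } ],
        "geoLocation": {
            "geogLocation": { "latitude": 34.11834458, "longitude": -117.141704 }
        }
    },
    "variable": { "variableCode": [ { "variableID": 45807197 } ] },
    "values": [ { "value": [ { "value": "148", "dateTime": "2013-08-01T12:00:00.000-04:00" } ] } ]
}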

Finally, once we’ve iterated through all the states (and all the gauges
within each state), we write out the data to a JSON-ish file.
(Mongoimport is a bit fussy about formats and I couldn’t get the
built-in Python dump functions to work properly.)

fileout = open('gaugesall.json', 'w')

for k in output:
    agaugestr = '{ "_id" : "' + k + '",'
    asitenamestr = ' "sitename" : "' + output[k]["sitename"] + '" ,'
    astatecodestr = ' "statecode" : "' + output[k]["statecode"] + '" ,'
    aposstr = ' "pos" : [' + str(output[k]["pos"][0]) + ', ' + str(output[k]["pos"][1]) + '] ,'
    aflowstr = ' "flow" : "' + str(output[k]["flow"]) + '" ,'
    aheightstr = ' "height" : "' + str(output[k]["height"]) + '" ,'
    atimestampstr = ' "timestamp" : "' + str(output[k]["timestamp"]) + '" }'

    outstr = agaugestr + asitenamestr + astatecodestr + aposstr + aflowstr + aheightstr + atimestampstr
    fileout.write(outstr)
    fileout.write("\n")

fileout.close()

Your file should look like this, but with a whole lot more lines:

{ "_id" : "08072760", "sitename" : "Langham Ck at W Little York Rd nr Addicks, TX" , "statecode" : "tx" , "pos" : [-95.646612, 29.86717035] , "flow" : "0" , "height" : "0" , "timestamp" : "0" }{ "_id" : "11055500", "sitename" : "PLUNGE C NR EAST HIGHLANDS CA" , "statecode" : "ca" , "pos" : [-117.141704, 34.11834458] , "flow" : "0" , "height" : "0" , "timestamp" : "0" }

4. Load the Data into MongoDB

Because OpenShift is a platform as a service and not infrastructure as a
service, there are a limited number of places where you can write data
on the server. One of those locations is app-root/data (under the
application’s home directory). That is where we will scp our data file:

scp gaugesall.json YOURSSH@wwos-YOURDOMAIN.rhcloud.com:app-root/data/

Now that we have the file on the server we can SSH in and then run the
mongoimport command:

mongoimport -d gauges -c gaugepoints --type json --file app-root/data/gaugesall.json -h $OPENSHIFT_MONGODB_DB_HOST -u admin -p $OPENSHIFT_MONGODB_DB_PASSWORD

This should import about 9,500 points into a collection called
gaugepoints in the gauges database.

Now create a 2d index to spatially enable your database.

mongo

>use gauges

>db.gaugepoints.ensureIndex( { pos : "2d" } )

(This is why we stored the longitude and latitude as we did previously.)
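
If you’d rather stay in Python, the same index can be created with
PyMongo. A minimal sketch, assuming the same environment variable the
application uses elsewhere:

import os
import pymongo

# Connect the same way the application does and build the 2d index
# on the pos field.
conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
db = conn.gauges
db.gaugepoints.ensure_index([("pos", pymongo.GEO2D)])
conn.close()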

5. Create a Utility to Update the Database as a Cron Job

We now want to write some code to add the flow, height, and creation
time to the gauges. For our purposes here, I’m going to describe doing
so with a Python program that gets executed as a cron job by OpenShift.
One could also use virtually identical code executed through Flask in
response to an HTTP request, which allows for manual or scheduled
updates from outside OpenShift. I’ll discuss this briefly in the next
section.

Most of the code in this utility is the same as that used to create the
gaugesall.json file so I’m only going to describe the differences.

First, you need to establish a connection to the database.

conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
db = conn.gauges

Then we iterate through the states and get the count for a given state
as before, but now the code within that inner loop is as follows. (The
try/except blocks deal with occasional glitches in the data for some
gauges.)

while count >= 0:
    agaugenum = entry['value']['timeSeries'][count]['sourceInfo']['siteCode'][0]['value']
    variablecode = str(entry['value']['timeSeries'][count]['variable']['variableCode'][0]['variableID'])

    try:
        variablevalue = str(entry['value']['timeSeries'][count]['values'][0]['value'][0]['value'])
    except:
        variablevalue = ""

    try:
        creationtime = str(entry['value']['timeSeries'][count]['values'][0]['value'][0]['dateTime'])
    except:
        creationtime = ""

    # Gage ht. ft. variableID 45807202
    if variablecode == '45807202':
        db.gaugepoints.update({"_id": agaugenum}, {"$set": {"height": variablevalue}})

    # Discharge cfs variableID 45807197
    if variablecode == '45807197':
        db.gaugepoints.update({"_id": agaugenum}, {"$set": {"flow": variablevalue}})

    db.gaugepoints.update({"_id": agaugenum}, {"$set": {"timestamp": creationtime}})

    count = count - 1

# close the connection once all the states have been processed
conn.close()

A couple of things to note:

  • When we created the collection, we saved the gauge number
    (agaugenum) as _id. This automatically makes it the primary
    index in MongoDB, which makes sense because we are making all the
    changes against that index.
  • The business with the variablecode comes about because each JSON
    record returned by the USGS Web service is actually just for one
    variable. Thus, for most stream gauges, two records are returned:
    one for the height and one for the flow. (Other variables are
    ignored.) The illustrative document after this list shows the result
    of folding both readings back into a single gauge record.
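
After an update pass, a gauge document ends up looking something like
this (the flow, height, and timestamp values here are illustrative, not
real readings):

{ "_id" : "11055500", "sitename" : "PLUNGE C NR EAST HIGHLANDS CA" , "statecode" : "ca" , "pos" : [-117.141704, 34.11834458] , "flow" : "148" , "height" : "2.31" , "timestamp" : "2013-08-01T12:00:00.000-04:00" }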

You can now add this utility to your repo/.openshift/cron/daily
directory and push it to OpenShift. (See this post for more information
on cron.)
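
To sketch what that looks like (the file name here is my invention, not
from the original repo): the cron cartridge runs every executable file
in the daily directory once a day, so the update utility can simply live
there directly.

#!/usr/bin/env python
# Hypothetical .openshift/cron/daily/updategauges -- remember to make it
# executable (chmod +x) before committing and pushing.

import os
import pymongo

conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
db = conn.gauges
# ... iterate over the states and apply the updates shown above ...
conn.close()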

6. Filling out the Flask Framework

SteveCP discusses Flask in one of the earlier linked posts and I won’t
repeat what he writes here. As a way of testing things out, you probably
want to write a couple of simple Flask functions. For example, this
function returns JSON for all the records in the collection if you go to
http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges.

@app.route("/ws/gauges")def gauges():    #setup the connection to the gauges database    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])    db = conn.gauges

#query the DB for all the gaugepoints    result = db.gaugepoints.find()

#Now turn the results into valid JSON    return str(json.dumps({'results':list(result)},default=json_util.default))

You can also write a function that updates a single state in response
to, say,

http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges/update/state?st=ma

@app.route("/ws/gauges/update/state")def updatestate():

   statelist = ["al","ak","az","ar","ca","co","ct","de","dc","fl","ga","hi","id","il","in","ia","ks","ky","la","me","md","ma","mi","mn","ms","mo","mt","ne","nv","nh","nj","nm","ny","nc","nd","oh","ok","or","pa","ri","sc","sd","tn","tx","ut","vt","va","wa","wv","wi","wy","pr"]  #setup the connection to the gauges database    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])    db = conn.gauges

    i = request.args.get('st')

    requesturl = "http://waterservices.usgs.gov/nwis/iv/?format=json,1.1&stateCd=" + i +"&parameterCd=00060,00065&siteType=ST"

This is followed by the same code we wrote for the earlier update
utility. (Updates through the Web service interface seem to be more
robust if you do them state by state, which allows for failures of
individual requests. That’s easy enough to do in a script file, as
sketched below.)
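
For example, a minimal driver script along these lines hits the endpoint
once per state so that one failed request doesn’t abort the whole
refresh. The hostname is a placeholder, and the truncated statelist
stands in for the full list from section 3:

import urllib2

statelist = ["al", "ak", "az"]  # ...the full list from section 3

for st in statelist:
    try:
        urllib2.urlopen("http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges/update/state?st=" + st)
    except urllib2.URLError:
        print "update failed for", st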

However, what we really need Flask to do for us is to return the points
within a bounding box defined by longitude/latitude pairs.

# find gauges within a lat/long bounding box passed in as query
# parameters (within?lat1=45.5&lon1=-82&lat2=42&lon2=-84)
@app.route("/ws/gauges/within")
def within():
    # set up the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
    db = conn.gauges

    # get the request parameters
    lat1 = float(request.args.get('lat1'))
    lon1 = float(request.args.get('lon1'))
    lat2 = float(request.args.get('lat2'))
    lon2 = float(request.args.get('lon2'))

    # use the request parameters in the query
    result = db.gaugepoints.find({"pos": {"$within": {"$box": [[lon1, lat1], [lon2, lat2]]}}})

    # turn the results into valid JSON
    return str(json.dumps(list(result), default=json_util.default))

What’s going on here is that the framework is being passed a request
with query parameters and a result is returned using one of MongoDB’s
geo functions (which is why we had to spatially enable the pos field in
our collection earlier). But where does this query come from? To answer
that, I’m going to show you one more very short chunk of code from our
app and follow where that takes us.

If you were to type http://wwos-YOURDOMAIN.rhcloud.com/, you’d end up
here:

@app.route("/")def mainapp():    return render_template("index.html")

7. The Map

Flask uses Jinja2 templating, and that’s about all I’m going to say on
that subject, as our use of templates here is extremely simple. Suffice
it to say that, under your application’s wsgi directory (where a file
called application calls your customized code), you create a directory
called templates and you put your index.html file there. This means that
if you type http://wwos-YOURDOMAIN.rhcloud.com into a browser, you get
to index.html (by way of Flask); a sketch of the layout follows. As
mentioned earlier, this application uses Leaflet to display the map and
pins. I’m not going to go through all the details (Leaflet’s API
documentation is pretty good) but I’ll walk you through the overall flow
in index.html, leaving out a lot of the CSS/styling/etc.
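
Concretely, the relevant layout looks roughly like this (annotations
mine):

wsgi/
├── application      # WSGI entry point that calls your Flask code
└── templates/
    └── index.html   # the Leaflet map page returned by mainapp()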

First we pull in the Leaflet JavaScript library:

 <script src="http://cdn.leafletjs.com/leaflet-0.5.1/leaflet.js"></script> 

Then we create a map and add a layer group to that map. (I center it
near Boston. You could write code to try to use your location.)

var map = L.map('map').setView([42.35, -71.06], 10);
var markerLayerGroup = L.layerGroup().addTo(map);

We use OpenStreetMap tiles. (Leaflet lets you use a variety of tile
sources.)

L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    maxZoom: 18,
    attribution: 'Map data &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a> Written by <a href="http://www.bitmasons.com">Gordon Haff</a>. Running on OpenShift by Red Hat. <a href="http://bitmason.blogspot.com/p/blog-page_18.html">About.</a>'
}).addTo(map);

Now, drumroll please, we tie the whole application together. In response
to appropriate map events (the map.on lines below), we call getPins,
which sends an HTTP request back to the very application that served
index.html in the first place.

function getPins(e) {
    bounds = map.getBounds();
    url = "/ws/gauges/within?lat1=" + bounds.getNorthEast().lat +
          "&lon1=" + bounds.getNorthEast().lng +
          "&lat2=" + bounds.getSouthWest().lat +
          "&lon2=" + bounds.getSouthWest().lng;
    $.get(url, pinTheMap, "json");
}

// the map.on lines: refresh the pins when the view changes (moveend and
// zoomend are reasonable event choices for this)
map.on('moveend', getPins);
map.on('zoomend', getPins);

Finally, we draw the pins, in the pinTheMap callback passed to $.get
above, using the data retrieved from that request. (We check how far out
the map is zoomed because, with over 9,000 points, performance becomes
unacceptable if you try to draw too many on the screen. A minimum zoom
level of 8, which corresponds to an area covering a couple of
average-sized states on most devices, seemed about right.)

[Image: the finished map, showing river gauge pins on OpenStreetMap tiles]

8. An Application is Never Really Done

As I mentioned earlier, I’ll be writing another post that gets into some
details about the USGS data and some of the challenges I encountered and
(mostly) overcame dealing with it. Beyond this particular dataset, you
could also use the same basic code to write applications that show all
manner of places, whether points of interest or something else.

One thing that I should add to the application is the ability to
optionally center the map on the user’s location. Because some browsers
don’t reliably fire locationerror events, this turned out to be harder
to do well than anticipated, so I left it out for now.

Coming back to OpenShift, one thing I hope you take away is that, while
this app certainly has some complexities, setting up the basic
infrastructure was pretty straightforward and let me pretty much jump
right into retrieving data and coding. At the same time, OpenShift gave
me lots of options about how to go about developing this application,
from language to database to framework.

Here’s the code on GitHub.
