Free Text and Spatial Search with Spatial4J and Lucene Spatial - Archived

Lucene gives you a better way to spatially index your documents than this picture
Hey there, Shifters. One of my talks at FOSS4G 2013 covered Lucene Spatial. Today’s post follows up on my post about creating Lucene indices by adding spatial capabilities to the index. By the end you will have a full example of how to create a fast, full-featured, full-text spatial search on any documents you want to use.

How to add spatial to your Lucene index

In the last post I covered how to create a Lucene index, so in this post I will just cover how to add spatial. The first thing you need to understand is that Lucene handles spatial with two pieces. A lot of this work was done by Dave Smiley. He gave a great presentation on all this technology at Lucene/Solr Revolution 2013. If you really want to dig in deep, I suggest you watch his 1 hour 15 minute video; my blog post is more the Too Long Didn’t Listen (TL;DL) version.

  • Spatial4J: This Java library provides geospatial shapes, distance calculations, and shape import and export. It is Apache licensed, so it can be used with other ASF projects. Lucene Spatial uses Spatial4J to create the spatial objects that get indexed along with the documents. It is also used when calculating distances in a query or when converting between distance units. Spatial4J can handle both real-world coordinates on a sphere (what comes out of a GPS unit) and projected coordinates (any 2D map), for shapes as well as distances.

Short aside: The oldest Java-based spatial library is JTS, and it is used in many other Open Source Java geospatial projects. Spatial4J uses JTS under the hood if you want to work with Polygon shapes. Unfortunately, until recently JTS was LGPL and so could not be included in Lucene. JTS has announced its intention to move to a BSD-type license, which should allow Spatial4J and JTS to start working together for more Java spatial goodness for all. One of the beauties of FOSS is the ability to see development discussions happen in the open.

  • Lucene Spatial: After many different custom iterations, Lucene Spatial is now built right into Lucene as a standard library. It is new with the 4.x releases of Lucene. Lucene Spatial provides the indexing and search strategies for Spatial4J shapes stored in a Lucene index. Its base class, SpatialStrategy, defines the signature that any spatial strategy must fulfill. You then use the same strategy for both writing and reading the index.
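
To make the "distance calculations" and "distance units" mentioned above a bit more concrete, here is a minimal sketch in plain Java of a great-circle (haversine) distance and a kilometers-to-degrees conversion. This is not Spatial4J's API: the class name, method names, and Earth radius constant are assumptions chosen for illustration, and the sphere is only an approximation of the real Earth.

```java
// Illustrative stand-in only; Spatial4J ships its own distance calculators.
class GreatCircleSketch {
    // Mean Earth radius in kilometers (assumed spherical model).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Haversine distance in kilometers between two lat/lon points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Converts a surface distance in kilometers to degrees of arc on the sphere.
    static double kmToDegrees(double km) {
        return Math.toDegrees(km / EARTH_RADIUS_KM);
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator.
        double km = haversineKm(0.0, 0.0, 0.0, 1.0);
        System.out.println("distance km: " + km);
        System.out.println("back to degrees: " + kmToDegrees(km));
    }
}
```

A spatial query radius is usually given in kilometers or miles, while a geographic index works in degrees, which is why a conversion like the second method is needed at query time.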

Today I will show the code to use Spatial4J with Lucene Spatial to add a spatially indexed field to your Lucene index.

What is the flow to add spatial

  1. You create a spatial strategy which determines how things will be indexed
  2. Then for each “document”, you create one (or more) spatial objects to store in a field
  3. Then you use the spatial strategy to transform the object into an indexable field
  4. Add the field to the document (and any other fields you want to store or index)
  5. Add the document to the index
  6. Commit and save your index

Let’s see the code

All of my code to make a Lucene spatial index is up on GitHub. It is just a modification of my original Lucene index builder, which is explained in my previous blog post.

Only one class has changed, because it is the only code that creates fields and documents. Minimal code change around your project is one of the benefits of loose coupling and good encapsulation.

At line 52 you can see the code where we set up the spatial pieces. The SpatialContext comes from Spatial4J and is a factory used for generating spatial objects. First we tell it that we are using unprojected coordinates and that we want sub-meter precision, which we get by using a geohash 11 characters long. We also create our SpatialStrategy as a RecursivePrefixTreeStrategy, which seems to be the preferred choice in most general cases.
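
To see why an 11-character geohash gives sub-meter precision, here is a small sketch of the standard geohash algorithm in plain Java. It is not the code Lucene uses internally: the class and method names are hypothetical, and the meters-per-degree constant is an approximation that holds at the equator. Each character carries 5 bits, so 11 characters give 55 bits split between longitude and latitude, and the resulting cell is roughly 15 cm on a side at the equator.

```java
// Illustrative sketch of geohash encoding and cell size; names are hypothetical.
class GeohashPrecision {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    // Approximate length of one degree at the equator, in meters (assumption).
    static final double METERS_PER_DEGREE = 111_320.0;

    // Encodes a lat/lon pair (degrees) into a geohash of the requested length.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    // Returns {cell width, cell height} in degrees for a geohash of this length.
    static double[] cellSizeDegrees(int length) {
        int totalBits = 5 * length;
        int lonBits = (totalBits + 1) / 2; // longitude gets the extra bit
        int latBits = totalBits / 2;
        return new double[] { 360.0 / Math.pow(2, lonBits), 180.0 / Math.pow(2, latBits) };
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 11));
        double[] cell = cellSizeDegrees(11);
        System.out.println("width m:  " + cell[0] * METERS_PER_DEGREE);
        System.out.println("height m: " + cell[1] * METERS_PER_DEGREE);
    }
}
```

The width in meters shrinks as you move away from the equator, so the sub-meter claim holds everywhere on the globe.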

The only other changes start at line 100. First we make a point using Spatial4J with the coordinates parsed from the JSON file. Since it is quite possible that there can be multiple shapes in a Shape (think of multiple wells being drilled at a site but all with the same name), we iterate through the shapes. For each one we create an indexable field and add it to the document. This field is indexed for searching but will not return a value in results, so we also add the coordinates as a string field. That way we can return the coordinates as JSON to help with map making.

That’s it. We have now created a Lucene index which has “spatial smarts”. It took a bit of explaining but the actual code to carry this out is quite simple.

What’s Next

The next post in this series will create a spatial REST-style web service with the same signatures as my MongoDB REST services. This is also the beauty of the “small w” web services that use REST. The entire implementation under the hood can be different, but the client consuming the JSON or XML doesn’t have to care. I love me some fun spatial coding!
