Geospatial data management is one of the flavors of the month:
- Last week, Teradata claimed it has the most sophisticated analytic geospatial data management capability.
- Also last week, Netezza’s newly acquired Netezza Spatial technology attracted a lot of attention.
- This week, Oracle called attention to its geospatial capabilities.
So I asked Netezza and Teradata what this geospatial analytics stuff is all about.
The first thing to note is that OLTP/general-purpose DBMS and analytic DBMS handle geospatial data differently. That is, most serious general-purpose RDBMS use an indexing scheme like R-trees, which excel at managing individual geographic-coordinate database records. Analytic DBMS vendors, however, who focus on analyzing large sets of data, implement geospatial datatypes as user-defined functions (UDFs) or the equivalent, with no special indexing. Instead, they rely on parallel execution of analytics, and integration with other analytic parallel processing.
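To make the index-free approach concrete, here is a minimal sketch of what "UDF plus parallel scan" might look like. Everything in it is an assumption for illustration: the `haversine_km` function stands in for a vendor's distance UDF, the two lists stand in for data partitions on separate nodes, and Python threads only mimic the shape of MPP execution (they do not give real CPU parallelism). It is not how Netezza or Teradata actually implement anything.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (the stand-in 'UDF')."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def scan_partition(rows, center, radius_km):
    """Full scan of one partition: no index, just apply the UDF to every row."""
    clat, clon = center
    return [r for r in rows if haversine_km(r[1], r[2], clat, clon) <= radius_km]

def parallel_within(partitions, center, radius_km):
    """Scan all partitions concurrently and merge the per-partition results."""
    n = len(partitions)
    with ThreadPoolExecutor(max_workers=n) as pool:
        chunks = pool.map(scan_partition, partitions, [center] * n, [radius_km] * n)
        return [row for chunk in chunks for row in chunk]

# Hypothetical rows: (id, latitude, longitude), spread over two partitions.
partitions = [
    [("a", 40.7128, -74.0060), ("b", 34.0522, -118.2437)],
    [("c", 40.7357, -74.1724), ("d", 41.8781, -87.6298)],
]
nearby = parallel_within(partitions, center=(40.73, -74.00), radius_km=25.0)
```

The point of the sketch is the trade-off: every row gets touched, but each partition is scanned independently, so the work divides across however many nodes hold the data.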
Therefore, IBM customers perhaps aside, nobody is using geospatial indexes integrated into a high-performance parallel data warehouse DBMS. Netezza claims that it turns out to be a lot faster to do geospatial analytics in standalone “silos” than integrated into OLTP DBMS – or at least, that’s the choice most users have made to date. I suspect that Netezza’s view of the market is naturally biased towards those who have already determined general-purpose DBMS’ geospatial analytics capabilities aren’t fast enough. But anyhow, it’s perfectly plausible that MPP geospatial queries run a whole lot faster than SMP ones.
Except in the one marketing claim noted above, Teradata has been pretty quiet about its geospatial capabilities. The Teradata Geospatial Extension product page is laughably sparse. When pressed, Teradata grudgingly confessed to having several deployed geospatial customers, but offered no details as to industry, uses, etc., or even whether the customers were classified – and by the way, would I please be so kind as not to identify the Teradata person or people who told me even that much?
As you might guess, I also had trouble getting a clear sense of why Teradata thinks its geospatial capabilities are more “sophisticated” than Netezza’s. But I gathered it had something to do with a really cool way of parallelizing UDFs, and perhaps also of integrating UDFs with each other or with normal database operations.
Netezza, by contrast, has been quite visible on the geospatial front. According to Phil Francisco and Razi Raziuddin of Netezza, business highlights of the Netezza Spatial story include:
- Last year, when Netezza announced the Netezza Developer Network, one of the members was building geospatial datatype support. Netezza recently acquired that technology, brought it in-house, and called it Netezza Spatial.
- Like other NDN technology, Netezza Spatial is written in C and compiled onto the FPGA.
- Netezza Spatial is in general availability.
- There is one Netezza Spatial customer, Guy Carpenter Insurance.
- Netezza Spatial has also been tested by retailers and telecommunication companies.
Technically, it seems like most geospatial vendors support pretty similar functionality, in line with standards from the OGC (Open Geospatial Consortium). Geospatial deals with three fundamental kinds of objects:
- Points (e.g., with latitude/longitude coordinates)
- Line segments (basically, pairs of endpoints)
- Polygons (basically, ordered sets of vertices)
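For readers who think in code, the three object kinds can be sketched as simple types. This is only an illustration under my own assumptions; the class and field names (`Point`, `LineSegment`, `Polygon`, `edges`) are hypothetical and are not taken from the OGC specifications or from any vendor's product.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Point:
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees

@dataclass(frozen=True)
class LineSegment:
    start: Point  # a segment is just a pair of endpoints
    end: Point

@dataclass(frozen=True)
class Polygon:
    vertices: Tuple[Point, ...]  # ordered; the ring is implicitly closed

    def __post_init__(self):
        if len(self.vertices) < 3:
            raise ValueError("a polygon needs at least three vertices")

    def edges(self) -> List[LineSegment]:
        """Boundary as segments, including the closing edge back to the first vertex."""
        n = len(self.vertices)
        return [LineSegment(self.vertices[i], self.vertices[(i + 1) % n])
                for i in range(n)]

# Hypothetical triangular service area.
service_area = Polygon((Point(40.0, -74.0), Point(40.0, -73.0), Point(41.0, -73.5)))
```

Note how each kind is built from the one before it: a polygon's boundary is a list of segments, and a segment is a pair of points.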
Examples of functions that can be computed on these objects include:
- Distance (between any kinds of object, not just points). Common examples include:
  - Figuring out which points are within a given distance of a given polygon
  - Finding a nearest neighbor
- Set operations on polygonal regions, especially but not only intersections. Applications of these include considering:
  - How many customers are served by multiple stores in a chain (i.e., where stores’ areas of service overlap)
  - Where cellular telephone towers’ service areas overlap
  - Where different kinds of insurance high-risk zones overlap
- Geometric calculations such as area or perimeter.
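The functions listed above can be sketched in a few dozen lines, under some loudly stated assumptions: coordinates are planar (x, y) pairs rather than latitude/longitude, polygons are simple, and the intersection routine (Sutherland-Hodgman clipping, my choice of technique, not anything a vendor told me they use) only works when the clipping polygon is convex and listed counter-clockwise. All function names are hypothetical.

```python
import math

def area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices in order."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def perimeter(poly):
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def point_segment_distance(p, a, b):
    """Distance from p to the closest point on segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def contains(poly, p):
    """Ray-casting point-in-polygon test (boundary cases not handled specially)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def point_polygon_distance(poly, p):
    """Zero if p is inside poly, else the distance to the nearest edge."""
    if contains(poly, p):
        return 0.0
    n = len(poly)
    return min(point_segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))

def within_distance(points, poly, d):
    return [p for p in points if point_polygon_distance(poly, p) <= d]

def nearest_neighbor(target, candidates):
    return min(candidates, key=lambda c: math.dist(target, c))

def clip_convex(subject, clip):
    """Sutherland-Hodgman: intersect subject with a convex, counter-clockwise clip."""
    def inside(p, a, b):  # p lies on or to the left of directed edge a->b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):  # crossing point of line p-q with line a-b
        d1 = p[0] * q[1] - p[1] * q[0]
        d2 = a[0] * b[1] - a[1] * b[0]
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        return ((d1 * (a[0] - b[0]) - (p[0] - q[0]) * d2) / den,
                (d1 * (a[1] - b[1]) - (p[1] - q[1]) * d2) / den)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        points, output = output, []
        for j in range(len(points)):
            cur, prev = points[j], points[j - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
        if not output:
            break  # no overlap at all
    return output

# Hypothetical data: two overlapping store service areas and three customers.
store_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
store_b = [(2, 2), (6, 2), (6, 6), (2, 6)]
customers = [(2, 2), (5, 2), (10, 10)]
overlap = clip_convex(store_a, store_b)
close_to_a = within_distance(customers, store_a, 1.5)
```

A real implementation would also have to deal with the curvature of the Earth, non-convex regions, and polygons with holes, which is a large part of why this functionality is sold as a product rather than written ad hoc.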
If you’ve followed along this far, you may be thinking something like “Wait a moment! Map-oriented GUIs became staples of BI dashboard demos years ago! Surely this stuff isn’t new.” (E.g., I recall Mike Stonebraker showing me something along those lines when he was still at Informix.) But according to Phil and Razi, geography-in-BI hasn’t actually been based on latitude/longitude coordinates, but rather on conventional tabular fields like zip code or state/province. My observations over the years are consistent with that claim.
So who might actually use this stuff? Obvious vertical markets include:
- Land use planning and similar civilian government functions
- Earth science
- Electric utilities (major users of transactional geospatial datatypes)
- Telcos – huge users of data warehousing, and they somewhat resemble electric utilities
- Retailers – highly concerned with location
- Property/casualty insurers – ditto
- Direct snailmailers – if any survive in this era of cheap electronic communication