GIS and geospatial
Analysis of data management technology optimized for geospatial data, whether by specialized indexing or user-defined functions
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to: Read more
|Categories: Cloudera, Data warehouse appliances, Data warehousing, GIS and geospatial, Hadoop, IBM and DB2, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics||5 Comments|
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (née “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more. Read more
|Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems||4 Comments|
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more
|Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Surveillance and privacy, Telecommunications, Web analytics||6 Comments|
Some notes, follow-up, and links before I head out to California: Read more
|Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics||3 Comments|
As you might imagine, there are a lot of blog posts I’d like to write that I never seem to get around to, and things I’d like to comment on without bothering to write a full post about them. In some cases I just tweet a comment or link and leave it at that.
And it’s not going to get any better. Next week = the oft-postponed elder care trip. Then I’m back for a short week. Then I’m off on my quarterly visit to the SF area. Soon thereafter I’ll have a lot to do in connection with Enzee Universe. And at that point another month will have gone by.
Anyhow: Read more
|Categories: Analytic technologies, Business intelligence, Data warehousing, Exadata, GIS and geospatial, Google, IBM and DB2, Netezza, Oracle, Parallelization, SAP AG, SAS Institute||3 Comments|
I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.
|Categories: Analytic technologies, Data warehousing, eBay, GIS and geospatial, Microsoft and SQL*Server, SciDB, Scientific research, Web analytics||5 Comments|
Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what’s going on, although names, dates, and details will have to await conversations and press releases this week.
- Teradata is productizing “private cloud,” under names including “Teradata Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to leapfrog Greenplum in its “Enterprise Data Cloud” strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:
  - Virtual as opposed to just physical data marts, based on robust workload management software. (Advantage: Teradata)
  - Pricing, deployment options. (Advantage: Greenplum)
  - Features that don’t directly relate to enterprise/private cloud. (Advantage: Either, often Teradata.)
- Teradata is generally strengthening its data movement technology, e.g. for making various appliances work in sync. I’m not too clear yet on the details of that. I think this is what Teradata’s phrase “ecosystem management” refers to.
- Teradata is (pre-)announcing, at least as a statement of direction, an appliance based on solid-state drives (SSDs). I’ve thought for a while that Teradata was a leader in thinking through the issues around solid-state memory in data warehousing, so it makes sense that they’re among the leaders in actually coming to market as well. I plan to say more after meeting with, e.g., Carson Schmidt.
- Teradata has achieved a 300%ish speed-up in geospatial processing. I gather this is largely a byproduct of the parallel analytics work Teradata did around strengthening its SAS integration. However, there don’t seem to be a lot of Teradata geospatial users yet.
- Teradata Express, Teradata’s free Windows-based crippleware, is being ported to Amazon EC2 and VMware as well. Presumably to avoid cannibalizing Teradata product sales, there are quite a few limitations on Teradata Express, including system capacity, database size, and “no production use.”
- Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. Previously, the focus of what Teradata discussed in this regard was query rewrite. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e., materialized views – will be included as well.
- A data model based on multidimensional arrays, not sets of tuples
- A storage model based on versions and not update in place
- Built-in support for provenance (lineage), workflows, and uncertainty
- Scalability to 100s of petabytes and 1,000s of nodes with high degrees of tolerance to failures
- Support for “external” data objects so that data sets can be queried and manipulated without ever having to be loaded into the database
- Open source in order to foster a community of contributors and to ensure that data is never “locked up,” a critical requirement for scientists
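To make the first two bullets concrete, here is a toy sketch of an array-shaped data model in which writes create new versions rather than updating in place. It is purely illustrative: the class `VersionedArray` and its methods are hypothetical names invented for this example, not anything from SciDB, and a real system would store versions far more economically than by copying snapshots.

```python
class VersionedArray:
    """Toy 2-D array whose writes create new versions instead of overwriting.

    Hypothetical illustration only; not SciDB's API or storage format.
    """

    def __init__(self, rows, cols, fill=0.0):
        self.shape = (rows, cols)
        base = tuple(tuple(fill for _ in range(cols)) for _ in range(rows))
        self._versions = [base]  # version 0 is the initial state

    def write(self, row, col, value):
        """Record a new version with one cell changed; return its version number."""
        latest = self._versions[-1]
        new_row = latest[row][:col] + (value,) + latest[row][col + 1:]
        self._versions.append(latest[:row] + (new_row,) + latest[row + 1:])
        return len(self._versions) - 1

    def read(self, row, col, version=None):
        """Read a cell by its coordinates, from the latest or a named version."""
        snapshot = self._versions[-1 if version is None else version]
        return snapshot[row][col]

    def version_count(self):
        """Number of versions recorded so far, including the initial one."""
        return len(self._versions)
```

Cells are addressed by position (`read(1, 2)`) rather than looked up as tuples by key, and an earlier state stays readable after a write (`read(1, 2, version=0)`), which is the property that makes lineage and reproducibility easier to reason about.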
However: Read more
|Categories: Analytic technologies, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, Facebook, GIS and geospatial, Hadoop, Open source, SciDB, Scientific research, Specific users, Web analytics||7 Comments|
Last October I wrote about the Teradata 13 release of Teradata’s database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:
- Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
- UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.
To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well. Read more
|Categories: Analytic technologies, Data types, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, GIS and geospatial, Parallelization, SAS Institute, Teradata, Theory and architecture||6 Comments|
Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for its own. It’s called Teradata Developer Exchange, or DevX for short. Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so. Major elements are about what one would expect:
- Surprisingly, so far as I can tell, no forums
If you’re a Teradata user, you absolutely should check out Teradata DevX. If you just research Teradata — my situation — there are some aspects that might be of interest anyway. In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility. Mainly, these are UDFs (User-Defined Functions), in areas such as:
- Geospatial data
- Imitating Oracle or DB2 UDFs (as migration aids)
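For readers who haven’t looked at geospatial UDFs, the logic inside one is often quite small. The sketch below is not taken from Teradata’s downloads; it is a plain-Python stand-in showing the sort of scalar function (great-circle distance via the haversine formula) and filter that such a UDF might wrap. The function names, the row layout, and the Earth-radius constant are all assumptions of this example.

```python
import math

# Mean Earth radius in kilometers; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(rows, center, radius_km):
    """Return names of (name, lat, lon) rows within radius_km of center."""
    clat, clon = center
    return [name for name, lat, lon in rows
            if great_circle_km(clat, clon, lat, lon) <= radius_km]


if __name__ == "__main__":
    # Hypothetical sample rows: (name, latitude, longitude).
    places = [
        ("Boston", 42.3601, -71.0589),
        ("Cambridge", 42.3736, -71.1097),
        ("San Francisco", 37.7749, -122.4194),
    ]
    print(within_radius(places, (42.3601, -71.0589), 10.0))
```

In a DBMS, the distance function would be registered as a UDF and called from SQL, so that the filter runs in parallel next to the data instead of in a client program.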
Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint. A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard. A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.
|Categories: Database compression, Emulation, transparency, portability, GIS and geospatial, Teradata||2 Comments|