My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the union of database and analytic technologies – the intersection of those two sectors is an area of particular focus, but is far from the whole of my coverage.)
One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including:
Simpler database technology, by which I mean DBMS that are:
- Easier to administer than market-leading systems …
- … even if at the cost of being special-purpose
Examples include:
- MySQL and older mid-tier RDBMS such as Progress
- Many analytic DBMS and appliances, most notably Netezza’s
For general purpose or OLTP uses, I’m not a big fan of MySQL (not enough progress in making it industrial-strength), PostgreSQL (no good company behind it – I’m a non-fan of EnterpriseDB), or Ingres (open source or not, it’s an antiquated system that hasn’t been invested in as much as Oracle, DB2 or SQL Server).
But I get the impression there are a lot of contenders among small startups, featuring very new architectures for OLTP or general-purpose database management. VoltDB comes to mind. NimbusDB is finally within range of getting funded. Dan Weinreb told me Friday he knows of a bunch of others as well. And that’s all before we even get into the NoSQL kind of alternative.
Flexible storage architectures. That’s starting out with an emphasis on hybrid columnar, as in the examples of Vertica and Greenplum. Oracle (to whom I’m under no NDA obligation) and other vendors (to whom I am) are going that way as well.
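For readers who haven't seen the idea spelled out, here is a minimal sketch of what "hybrid columnar" can mean: rows are grouped into blocks, and each block is stored column by column. This is purely illustrative and is not how Vertica, Greenplum, or Oracle actually implement their storage; the function names (`to_hybrid_blocks`, `column_sum`, `fetch_row`) and the in-memory dict layout are assumptions made up for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Block = Dict[str, List[Any]]  # hypothetical layout: column name -> values in one block


def to_hybrid_blocks(rows: List[Row], block_size: int) -> List[Block]:
    """Group rows into fixed-size blocks; store each block column by column."""
    blocks: List[Block] = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        columns = chunk[0].keys()  # assumes every row has the same columns
        blocks.append({col: [row[col] for row in chunk] for col in columns})
    return blocks


def column_sum(blocks: List[Block], column: str) -> float:
    """Analytic-style scan: touch only one column within each block."""
    return sum(sum(block[column]) for block in blocks)


def fetch_row(blocks: List[Block], index: int, block_size: int) -> Row:
    """Row-style lookup: all of one row's values live in a single block."""
    block = blocks[index // block_size]
    offset = index % block_size
    return {col: values[offset] for col, values in block.items()}
```

The point of the design is the trade-off: a scan over one column reads contiguous values, while reassembling a whole row still only requires visiting one block.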
Multi-tier database architectures, by which I mean at least two things:
- The database tier/server tier split of Exadata
- Hybrid RAM/disk architectures, examples of which include:
- Vertica’s RAM-based write-optimized store
- Sensage’s CEP-in-the-DBMS
- This in-memory analytics stuff we keep hearing about from the BI vendors
- Any true in-memory/disk hybrid, such as the regrettably sidelined solidDB
- Smart thinking by numerous DBMS vendors about optimizing the use of RAM and/or Level 2 cache
Netezza is particularly interesting to watch in this regard because it:
- Had a pretty strict storage/other processing split in prior product generations and …
- … ditched that in its latest generation …
- … which however is focused on optimizing the use of RAM cache
Also noteworthy is Petascan, the stealth-mode – and therefore harder to watch right now – company I keep teasing about, which makes a strong case for carrying the database/storage tier split into the flash/solid-state memory technology generation. Calpont also has a server/storage tier split, but that’s of mainly theoretical interest unless and until Calpont actually ships an MPP version of InfiniDB.
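As a rough illustration of the hybrid RAM/disk idea mentioned above, the sketch below pairs a small in-memory write buffer with a larger sorted store, and answers queries from both. It is a toy under stated assumptions, not a description of Vertica's or any other vendor's product: the class name `HybridStore`, the `flush_threshold` parameter, and the use of a Python list to stand in for disk are all hypothetical.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, str]  # (key, value)


class HybridStore:
    """Toy write-optimized buffer in front of a sorted, read-optimized store."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.write_buffer: List[Record] = []  # stands in for RAM; unsorted, cheap appends
        self.sorted_store: List[Record] = []  # stands in for disk; kept sorted by key

    def insert(self, key: int, value: str) -> None:
        """Append to the buffer; move the buffer to the sorted store when it fills."""
        self.write_buffer.append((key, value))
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Merge buffered writes into the sorted store in one batch."""
        self.sorted_store = sorted(self.sorted_store + self.write_buffer)
        self.write_buffer = []

    def range_query(self, low: int, high: int) -> List[Record]:
        """Return records with low <= key <= high from both tiers."""
        # A 1-tuple (k,) sorts before any (k, value), so bisect_left finds
        # the first record whose key is >= k.
        start = bisect.bisect_left(self.sorted_store, (low,))
        end = bisect.bisect_left(self.sorted_store, (high + 1,))
        from_store = self.sorted_store[start:end]
        from_buffer = [r for r in self.write_buffer if low <= r[0] <= high]
        return sorted(from_store + from_buffer)
```

Writes stay cheap because they never touch the sorted structure directly, and reads stay correct because they consult both tiers.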
An ever-better understanding of scale-out technology, in several respects, including:
- Query, notably data movement for MPP DBMS
- Update, especially minimalistic DBMS approaches, be they sharded MySQL or more NoSQLish
- Number-crunching, especially via MapReduce and/or parallel analytic libraries integrated into DBMS
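Two of the scale-out bullets above, sharded updates and MapReduce-style number-crunching, can be sketched in a few lines. This is a simplified illustration rather than any particular product's design: the shards are plain Python dicts, the helper names (`shard_for`, `upsert`, `map_shard`, `reduce_partials`, `global_average`) are invented for the example, and CRC32 is just one convenient deterministic hash.

```python
import zlib
from functools import reduce
from typing import Dict, List, Tuple

Shard = Dict[str, float]     # hypothetical shard: key -> amount
Partial = Tuple[float, int]  # (sum of amounts, number of keys)


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def upsert(shards: List[Shard], key: str, amount: float) -> None:
    """Update path: each write touches exactly one shard."""
    shards[shard_for(key, len(shards))][key] = amount


def map_shard(shard: Shard) -> Partial:
    """Map step: each shard computes a small partial result locally."""
    return (sum(shard.values()), len(shard))


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine partial results; only these tuples move between nodes."""
    return (a[0] + b[0], a[1] + b[1])


def global_average(shards: List[Shard]) -> float:
    """Average over all shards without shipping the raw rows anywhere."""
    total, count = reduce(reduce_partials, map(map_shard, shards), (0.0, 0))
    return total / count if count else 0.0
```

The data-movement point is visible in `global_average`: the only things combined centrally are one small tuple per shard, not the underlying rows.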
Cool trends I touched on more briefly include:
- More data being available for analysis. This was a core theme of my Enzee Universe keynote speeches; there are also some notes on it in my post based on my Boston Big Data Summit talk.
- More users being served by analytics. Ditto.
- Data exploration/visualization, à la QlikView, Spotfire, or Tableau, and also the faceted stuff.
- The democratization of data mining. But I’m not as sure of that one as of the others…
One area I flat-out forgot to mention is easy data mart spin-out.
Other posts based on my January 2010 New England Database Summit keynote address:
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Flash, other solid-state memory, and disk
- Open issues in database and analytic technology