StreamBase is a decently established startup, possibly the largest company in its area. Truviso, in the process of changing its name from Amalgamated Insight, has a dozen employees, one referenceable customer, and a product not yet in general availability. Both have ambitious plans for conquering the world, based on similar stories. And the stories make a considerable amount of sense.
Both companies’ core product is a memory-centric SQL engine designed to execute queries without ever writing data to disk. Of course, they both have persistence stories too: Truviso by being tightly integrated into open-source PostgreSQL, StreamBase more via “yeah, we can hand the data off to a conventional DBMS.” But the basic idea is to route each piece of incoming data through a whole lot of different in-memory filters, to see which queries it satisfies, rather than executing many queries in sequence against disk-based data.
The basic paradigm for filtering – certainly in Truviso’s case, and I think in StreamBase’s as well – is from the columnar/bitmap school. A record is a vector or a set of vectors. A query or filter is a vector or set of vectors. So take some dot products, see where you are, and you’ll know if the query is satisfied.
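To make the dot-product idea concrete, here is a minimal Python sketch. It is my own illustration, not either vendor's actual implementation, and it assumes the simplest case: each record is encoded as a 0/1 vector over a hypothetical catalog of predicates, each query is a 0/1 vector marking the predicates it requires, and a query is satisfied when the dot product equals the number of predicates it asks for. The predicate catalog, field names, and query names are all invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical predicate catalog: one vector position per predicate.
PREDICATES: List[Callable[[dict], bool]] = [
    lambda r: r["symbol"] == "IBM",    # position 0
    lambda r: r["price"] > 100.0,      # position 1
    lambda r: r["volume"] >= 10_000,   # position 2
]

# Hypothetical standing queries: a 1 means "this predicate must hold".
QUERIES: Dict[str, List[int]] = {
    "big_ibm_trade": [1, 0, 1],
    "pricey": [0, 1, 0],
    "pricey_ibm": [1, 1, 0],
}


def encode(record: dict) -> List[int]:
    """Turn a record into a 0/1 vector, one entry per predicate."""
    return [1 if predicate(record) else 0 for predicate in PREDICATES]


def dot(a: List[int], b: List[int]) -> int:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_queries(record: dict) -> List[str]:
    """Return the names of all standing queries the record satisfies."""
    bits = encode(record)
    # A conjunctive query matches when every required predicate is set,
    # i.e. the dot product equals the count of 1s in the query vector.
    return [name for name, q in QUERIES.items() if dot(bits, q) == sum(q)]


trade = {"symbol": "IBM", "price": 95.0, "volume": 20_000}
matches = matching_queries(trade)
```

The point of the layout is that the record is encoded once and then checked against every standing query with cheap arithmetic, all in memory. A real engine would presumably use packed bitmaps rather than Python lists, but that is an assumption on my part.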
Not coincidentally, I would think, this is very similar to the approach taken by various memory-centric BI vendors, such as QlikTech, or SAP in its BI Accelerator. It’s also the approach taken by text indexers. Indeed, the first product of this kind I ever heard of was actually in the text area, from Verity, a decade ago, even though Verity had only a very small team for sophisticated DBMS-type work. (The whole kernel group was 6-7 people.) And also perhaps not coincidentally, StreamBase reports that despite a lack of text processing primitives, a lot of the use of their technology is for text filtering, I presume in national-security kinds of applications.
The core market for this stuff is financial trading, where the rule of thumb is that a complete query, decision, and transaction has to be done in 30 milliseconds, and database lookup gets to use less than 10% of that time. (Hence the memory-centric requirement; there’s simply no way to search a disk usefully in less than 4-7 milliseconds.) That’s also where Progress Apama (which has a rules-based approach to the same problems) is focused.
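The arithmetic behind that parenthetical is worth spelling out. The short Python sketch below uses only the figures quoted above (a 30 millisecond total budget, under 10% of it for database lookup, 4-7 milliseconds for a disk search); the variable names are mine.

```python
TOTAL_BUDGET_MS = 30          # rule-of-thumb budget for query + decision + transaction
LOOKUP_SHARE_PERCENT = 10     # database lookup gets less than this share
DISK_ACCESS_MS = (4, 7)       # quoted range for a useful disk search

# Upper bound on the time available for the lookup step.
lookup_budget_ms = TOTAL_BUDGET_MS * LOOKUP_SHARE_PERCENT / 100

# Does either end of the disk range fit inside that bound?
disk_fits = [t < lookup_budget_ms for t in DISK_ACCESS_MS]

print(f"lookup budget: under {lookup_budget_ms} ms")
print(f"disk access within budget: {disk_fits}")
```

The lookup has to finish in under 3 milliseconds, so even the optimistic end of the disk range blows the budget on a single access.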
The two other obvious and traditional markets for memory-centric technology are national security and telecommunications. (E.g., there’s no prize for guessing which three industries are called out on StreamBase’s website.) But it goes further than that. For example:
- MMORPGs (Massively MultiPlayer Online Role-Playing Games). I forget how many details StreamBase has disclosed by now, but they have something going in that space. By the way, I also have a research project ongoing about the technology of MMORPGs, for an eventual Network World column. I love my work!
- Online travel/reservations demand management. StreamBase is active there too.
- Logistics. UPS is an investor in Truviso.
- General real-time BI.
Adopting this technology is easier than one might at first think. StreamBase has a nice Eclipse-based development tool. Truviso has some packaged BI/visualization. StreamBase notes that one use of its technology is preprocessing input streams for existing or conventional applications. Truviso claims to make ETL easier as well, although I confess to so far only having seen their smoke on that subject, and not yet detected the associated fire.
One point StreamBase noted is that this preprocessing is not just for speed/latency. Especially in the intelligence business, the raw data is huge. Vast amounts of data reduction are an absolute requirement. As we go to civilian deployments of widespread sensor technology – RFID, GPS/presence, whatever – similar needs may well arise.
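As a rough illustration of what that kind of data reduction looks like, here is a minimal Python sketch that collapses a time-ordered stream of raw sensor readings into one summary row per sensor per time window before anything reaches a downstream application. The reading format, the 60-second window, and the choice of count and maximum as the summary are all assumptions made for the example, not anything StreamBase or Truviso has described.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[float, str, float]      # (timestamp_seconds, sensor_id, value)
Summary = Tuple[int, str, int, float]   # (window_index, sensor_id, count, max_value)


def reduce_stream(readings: Iterable[Reading],
                  window_s: float = 60.0) -> Iterator[Summary]:
    """Collapse time-ordered readings into one row per sensor per window."""
    current_window = None
    stats: Dict[str, Tuple[int, float]] = {}
    for ts, sensor, value in readings:
        window = int(ts // window_s)
        # When the stream moves into a new window, flush the old one.
        if current_window is not None and window != current_window:
            for s, (count, peak) in sorted(stats.items()):
                yield (current_window, s, count, peak)
            stats = {}
        current_window = window
        count, peak = stats.get(sensor, (0, value))
        stats[sensor] = (count + 1, max(peak, value))
    # Flush whatever is left in the final window.
    if current_window is not None:
        for s, (count, peak) in sorted(stats.items()):
            yield (current_window, s, count, peak)


raw = [(0.0, "a", 1.0), (10.0, "a", 3.0), (20.0, "b", 2.0), (61.0, "a", 5.0)]
reduced = list(reduce_stream(raw))
```

Only the running per-window statistics are held in memory; the raw readings are dropped as soon as they have been counted, which is where the reduction comes from.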
I could and probably at some point will say a lot more on these subjects. There are technical details (e.g., Truviso fondly claims to have a more complete product, despite being a much smaller company that’s been in business for a much shorter length of time, apparently because it bundles in more of the SQL functionality suited to disk-centric systems). There’s how this all fits into my thesis on the disruption/reinvention of BI, and what’s still lacking – some more personal alert/KPI management, guys, if you please! And so on. Please stay tuned.
And please let me know what you think based on what you know today.