Well-resourced Silicon Valley start-ups typically announce their existence multiple times. Company formation, angel funding, Series A funding, Series B funding, company launch, product beta, and product general availability may not be 7 different “news events”, but they’re apt to be at least 3-4. Platfora, no exception to this rule, is hitting general availability today, and in connection with that I learned a bit more about what they are up to.
In simplest terms, Platfora offers exploratory business intelligence against Hadoop-based data. As per last weekend’s post about exploratory BI, a key requirement is speed; and so far as I can tell, any technological innovation Platfora offers relates to the need for speed. Specifically, I drilled into Platfora’s performance architecture on the query processing side (and associated data movement); Platfora also brags of rendering hundreds of thousands of “marks” quickly in HTML5 visualizations, but I haven’t a clue as to whether that’s much of an accomplishment in itself.
Platfora’s marketing suggests it obviates the need for a data warehouse at all; for most enterprises, of course, that is a great exaggeration. But another dubious aspect of Platfora marketing actually serves to understate the product’s merits — Platfora claims to have an “in-memory” product, when what’s really the case is that Platfora’s memory-centric technology uses both RAM and disk to manage larger data marts than could reasonably be fit into RAM alone. Expanding on what I wrote about Platfora when it de-stealthed:
- Platfora incrementally batch-loads data from Hadoop into its own bare-bones SQL data store, and does BI against that. That data store:
  - Of course wants to run in-memory whenever possible …
  - … but also has a significant disk-based aspect.
  - Is true-columnar on disk and in memory alike.
  - Stores all columns from a given row on the same nodes.
- Specifically, Platfora builds star-schema data marts, called “lenses”. To avoid data bloat on the Platfora servers:
  - Two lenses with the same data often only store it once.
  - The data for a given lens can be “evicted” if it won’t be needed for a while. (But the specifications for the lens are of course kept in case you want to rebuild it later.)
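
To make the lens bookkeeping concrete, here is a minimal Python sketch of the two bloat-avoidance ideas: shared data is stored once, and a lens's data can be evicted while its specification is kept for a later rebuild. This is an illustration only, not Platfora's implementation; the `LensStore` class, its method names, and the idea of sharing at the granularity of whole columns are all assumptions made for the example.

```python
class LensStore:
    """Toy model of lens storage (hypothetical; not Platfora's design)."""

    def __init__(self):
        self.specs = {}     # lens name -> tuple of column names (always kept)
        self.columns = {}   # column name -> values, stored once however many lenses use it
        self.loaded = set() # lenses whose data is currently materialized

    def build(self, name, column_names, source):
        """Materialize a lens, copying only columns that are not already stored."""
        self.specs[name] = tuple(column_names)
        for col in column_names:
            if col not in self.columns:
                self.columns[col] = list(source[col])
        self.loaded.add(name)

    def evict(self, name):
        """Drop a lens's data but keep its spec; shared columns survive."""
        self.loaded.discard(name)
        still_needed = {c for n in self.loaded for c in self.specs[n]}
        for col in self.specs[name]:
            if col not in still_needed:
                self.columns.pop(col, None)

    def rebuild(self, name, source):
        """Re-materialize an evicted lens from its retained spec."""
        self.build(name, self.specs[name], source)
```

In this toy version, evicting one lens frees only the columns no other loaded lens still uses, which is one simple way to reconcile "store it once" with "evict when idle".
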
Notes on Platfora’s Hadoop ETL (Extract/Transform/Load) include:
- The basic idea is that you periodically re-run a job to pick up incremental changes since the last load.
- Right now that’s just a cron job or something. Platfora plans to add scheduling features imminently.*
- Platfora is sensitive to Hive partitioning.
- Platfora can run filters and so on to extract non-Hive data (the more common case).
*But in a sad comment on Hadoop’s workload management capabilities, Platfora doesn’t expect these features to be much used, at least at first.
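
The "periodically re-run a job" pattern can be sketched in a few lines of Python. This is a hedged illustration, not Platfora's loader: the function name `incremental_load`, the use of a date-like partition key as the high-water mark, and the sample keys are all assumptions. The point is only that partitioned source data lets each run skip everything it has already loaded.

```python
def incremental_load(partitions, store, last_loaded):
    """Append rows from partitions newer than `last_loaded` to `store`.

    `partitions` maps a sortable partition key (e.g. a date string) to rows.
    Returns the new high-water mark to pass to the next run.
    """
    new_keys = sorted(k for k in partitions if k > last_loaded)
    for key in new_keys:
        store.extend(partitions[key])
    return new_keys[-1] if new_keys else last_loaded


# Hypothetical usage: a scheduler (cron, say) would call this on each run.
partitions = {"2013-03-01": [1, 2], "2013-03-02": [3], "2013-03-03": [4, 5]}
store = []
mark = incremental_load(partitions, store, "2013-03-01")
```

Running the job again with the returned mark and no new partitions leaves the store untouched, which is what makes blind periodic re-runs safe.
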
Platfora’s aggregation story goes something like this:
- If an aggregate can be updated incrementally — for example a count or sum — Platfora probably will maintain it for you and update it on load.
- Ditto if it can be maintained almost incrementally — for example an average.
- Platfora also does Distinct calculations, even though those have to be worked through on its own servers.
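
The distinction among these three cases is easy to show in code. The Python sketch below is an assumption-laden illustration, not Platfora's engine: the class name `RunningAggregates` and its fields are hypothetical. Count and sum fold in each new batch directly, average is derived from those two, and a distinct count needs the underlying values (here a set), which is why it cannot be maintained from earlier summary numbers alone.

```python
from dataclasses import dataclass, field


@dataclass
class RunningAggregates:
    """Aggregates maintained across incremental loads (illustrative only)."""
    count: int = 0
    total: float = 0.0
    distinct_values: set = field(default_factory=set)

    def load_batch(self, values):
        """Fold one incremental batch into the stored aggregates."""
        for v in values:
            self.count += 1              # incremental
            self.total += v              # incremental
            self.distinct_values.add(v)  # needs the values, not just a prior count

    @property
    def average(self):
        # "Almost incremental": derived from two incrementally kept numbers.
        return self.total / self.count if self.count else None

    @property
    def distinct_count(self):
        return len(self.distinct_values)


agg = RunningAggregates()
agg.load_batch([1, 2, 2])
agg.load_batch([2, 3])
```

Note that adding the per-batch distinct counts (2 and 2) would give 4, while the true answer across both batches is 3; that gap is the reason distinct calculations need more work than counts and sums.
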
As you would expect, Version 1 of the Platfora data store has various limitations, such as:
- Platfora Version 1 can’t do much with arrays or (other) nested data structures — it just transforms them into JSON strings.
- Platfora’s SQL support is limited.
- The Platfora data store has a “fat head” master (but at least that head is multi-node).
Naturally, Platfora hopes to fix these issues down the road.
Finally, a few company notes:
- Platfora has had 20 beta users, mainly but not entirely among online businesses.
- Platfora has close to 50 people.
- Platfora is currently focused on US direct sales, relying on inbound leads.