Talking with my clients at SAND can be confusing. That said:
- I need to revise my figures for SAND’s customer count way downward.
- SAND finally has a reasonably clear positioning.
- SAND’s product actually seems to have a lot of features.
A few months ago, I wrote:
SAND Technology reported >600 total customers, including >100 direct.
Upon talking with the company, I need to revise that figure downward, from > 600 to 15.
One embarrassing point: SAND is a client, and I view it as part of my job to save clients from that kind of inadvertent misstatement.
It turns out that SAND has a very impressive customer — Dunnhumby, a data mart outsourcer with 200 terabytes of data in SAND, 30 or so incoming data streams, 400 or so nodes … and 600 or so end customers, all of which SAND was counting as OEM end customers for its DBMS. But I, other industry observers, and other vendors generally don’t count that way.
Besides Dunnhumby, SAND has 14 other customers on maintenance, with < 1 terabyte of data each. Until recently, SAND had a couple dozen more customers than that, but it sold its SAP-oriented archiving/near-line storage product line to Informatica.
I still don’t know where the “> 100 direct” part came from.
After the sale of its other product line, SAND is squarely in the market for analytic DBMS. SAND’s sales efforts seem to be focused on investigative analytics, although some of its existing users seem to be more focused on operational analytics. Most specifically, SAND is trying to focus on “people data” (customer loyalty, health care, etc.) rather than purely machine-generated data, with the paradigmatic target application being personalized marketing.
SAND technical highlights include:
- SAND sells a columnar analytic DBMS.
- The SAND DBMS operates on bitmaps, with heavy use of run-length encoding on the bitmaps. Bitmaps are used for everything except BLOBs (Binary Large OBjects).
- Actual data compression also comes into play, e.g. as result sets are being assembled. This is based on a true global dictionary — multiple columns are tokenized together.
- Indeed, SAND can decompose columns and tokenize their parts (e.g. time stamps).
- SAND’s workload management is aware of RAM and CPU, but not explicitly of I/O.
- SAND lets you pin certain tables or even table segments in RAM if you want to.
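To make the bitmap and dictionary bullets above concrete, here is a minimal Python sketch of the general technique: values from several columns share one dictionary of tokens, each column keeps one bitmap per token, and each bitmap is run-length encoded. This is an illustration only, not SAND's actual storage format; the function names, the sample columns, and the `(bit, run_length)` encoding are all assumptions made for the example.

```python
from itertools import groupby


def build_global_dictionary(columns):
    """Assign one integer token per distinct value across ALL columns."""
    dictionary = {}
    for values in columns.values():
        for value in values:
            dictionary.setdefault(value, len(dictionary))
    return dictionary


def build_bitmaps(values, dictionary):
    """Return one bitmap (list of 0/1, one bit per row) per token in a column."""
    bitmaps = {}
    for row, value in enumerate(values):
        token = dictionary[value]
        bitmaps.setdefault(token, [0] * len(values))[row] = 1
    return bitmaps


def rle_encode(bits):
    """Run-length encode a bitmap as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into a flat bitmap."""
    return [bit for bit, length in runs for _ in range(length)]


# Hypothetical sample data: two columns that draw on the same value domain.
columns = {
    "home_city": ["Leeds", "Leeds", "Leeds", "York", "York", "Leeds"],
    "store_city": ["York", "York", "Leeds", "Leeds", "Leeds", "Leeds"],
}

dictionary = build_global_dictionary(columns)
encoded = {
    name: {
        token: rle_encode(bitmap)
        for token, bitmap in build_bitmaps(values, dictionary).items()
    }
    for name, values in columns.items()
}
```

Because both columns are tokenized against the same dictionary, "Leeds" gets the same token in `home_city` and `store_city`, which is the sense in which a dictionary is global rather than per-column. Long runs of identical bits, as in sorted or low-cardinality columns, are where run-length encoding pays off.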
SAND’s update story is straightforward: when data comes in, all the columns and bitmaps are updated as needed. Still, since SAND is columnar, you wouldn’t expect true updates in place, and you’d be right. Rather, SAND uses lock-free MVCC (MultiVersion Concurrency Control) with garbage collection. The MVCC is also exploited for a kind of time travel, and further for some kind of virtual data mart capability.
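For readers who haven't seen how MVCC leads naturally to time travel, here is a minimal single-threaded Python sketch. It is not SAND's design: the `VersionedStore` class, its method names, and the integer commit clock are hypothetical, and the sketch makes no attempt at the lock-free concurrency mentioned above. It only shows that appending versions instead of overwriting them lets a reader ask for the data as of an earlier point, and that garbage collection is what eventually takes that ability away.

```python
class VersionedStore:
    """Toy multiversion store: writes append versions, reads pick one by timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # monotonically increasing commit timestamp

    def write(self, key, value):
        """Append a new version instead of updating in place; return its timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, as_of=None):
        """Return the newest value committed at or before `as_of` (default: now)."""
        ts = self._clock if as_of is None else as_of
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def garbage_collect(self, oldest_active_ts):
        """Drop versions that no reader at or after `oldest_active_ts` can see."""
        for key, versions in list(self._versions.items()):
            keep_from = 0
            for i, (commit_ts, _) in enumerate(versions):
                if commit_ts <= oldest_active_ts:
                    keep_from = i  # newest version still visible at that timestamp
            self._versions[key] = versions[keep_from:]


store = VersionedStore()
t1 = store.write("segment", "gold")
t2 = store.write("segment", "platinum")

current = store.read("segment")            # latest version
historical = store.read("segment", as_of=t1)  # "time travel" to the first commit
```

A virtual data mart could be thought of along similar lines, as a named view pinned to a particular timestamp, though that is speculation about the general idea rather than a description of what SAND ships.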
SAND’s parallelization story is a bit complicated.
- SAND has, or at least has the potential for, node specialization, with database and storage nodes being different.
- In principle, disks are specific to storage nodes, and it’s a configuration option as to whether a database node sees one, some, or all storage nodes.
- In practice, Dunnhumby is the only SAND customer that operates on anything other than a shared-disk basis. Dunnhumby’s configuration mixes and matches various SAND sharing options.
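The "one, some, or all" configuration choice in the list above can be sketched as a simple visibility map from database nodes to the storage nodes they can see. The Python below is purely illustrative: the node names, the `classify_topology` function, and the labels "partitioned" and "mixed" are assumptions for the example, not SAND configuration syntax or terminology.

```python
def classify_topology(visibility, storage_nodes):
    """Classify a database-node -> storage-node visibility map.

    visibility: dict mapping each database node to the storage nodes it can see.
    storage_nodes: iterable of all storage nodes in the cluster.
    """
    all_storage = set(storage_nodes)
    seen = [set(nodes) for nodes in visibility.values()]

    # Every database node sees every storage node.
    if all(s == all_storage for s in seen):
        return "shared-disk"

    # No storage node is visible to more than one database node.
    if sum(len(s) for s in seen) == len(set().union(*seen)):
        return "partitioned"

    # Anything in between: some storage shared, some not.
    return "mixed"


storage = ["s1", "s2"]
shared = classify_topology({"db1": ["s1", "s2"], "db2": ["s1", "s2"]}, storage)
split = classify_topology({"db1": ["s1"], "db2": ["s2"]}, storage)
mixed = classify_topology({"db1": ["s1", "s2"], "db2": ["s2"]}, storage)
```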
SAND is proud of its PMML (Predictive Modeling Markup Language) scoring capabilities, but otherwise hasn’t shipped much in the way of analytic platform capabilities. That said, work is underway on a user-defined table function capability that can also query external tables, fire off MapReduce jobs, and so on, under the code name UQL.