May 12, 2010
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included:
- Nothing in my original short post about Clustrix was actually incorrect.
- Clustrix plans to reveal actual production “name-brand” customers soon.
- The name of Clustrix’s software, or at least the guts thereof, is Sierra.
- Clustrix’s products have actually been in general availability since last quarter, with some versions at customer sites for 2 years. Development started 3½ years ago.
- Clustrix says its technology is for OLTP systems, which it calls “non-batch/non-analytic,” with mixed read/write workloads. All Clustrix’s example target markets are “internet verticals,” such as photo sharing, gaming, social media, e-commerce, etc.
- Clustrix’s heart is in SQL, as is most of its customer base. Clustrix Sierra’s key-value-store option has little or no performance advantage over Clustrix Sierra’s SQL option, nor any other advantage over SQL that came up in discussion.
- Clustrix Sierra is “wire-compatible” with MySQL, but doesn’t use MySQL code; Clustrix wrote all the code itself.
- Clustrix asserts that Clustrix Sierra supports the “vast majority” of MySQL features. Examples of MySQL features Clustrix doesn’t support at this time are full-text search and geospatial indexing.
- Indeed, Clustrix claims Clustrix Sierra can be used to replace MySQL with few or zero changes to existing applications.
- I specifically asked about referential integrity, which has a poor performance reputation in MySQL. Besides saying they supported it, Clustrix said that some customers actually use referential integrity in some of their less active tables.
- Clustrix Sierra is fully ACID-compliant, with no eventual consistency or RYW (read-your-writes) consistency story. The default number of copies of each datum is two, and they’re kept consistent via two-phase commit.
- Clustrix Sierra is fully parallel, with no “head” node. I forgot to ask how it was determined which queries would be addressed to and/or controlled by which nodes, but I presume there’s some sort of a load-balancing scheme.
- Clustrix says that because Clustrix Sierra uses MVCC (Multi-Version Concurrency Control), and thus reads and writes don’t block each other, global locks aren’t a major issue. (They’re rare or short or something – I have trouble seeing why they would be non-existent.)
- Clustrix says there’s a second class of locks and latches that are purely local and short-lived, for B-tree indexes and the like. (I didn’t drill down into those either.) I guess this means Clustrix Sierra is B-tree-centric, which makes sense for an OLTP-oriented system.
- Clustrix Sierra distributes data among nodes via consistent hashing (default), range partitioning, or “full distribution” (i.e., copying a – presumably small – table to each node). The choice of distribution plans is manual now; more automation is a future feature.
- Clustrix Sierra’s CBO (Cost-Based Optimizer) is, as one would hope, distribution-aware.
- Clustrix Sierra compiles query fragments and ships them off to the relevant nodes. A fragment might contain instructions both for SQL to be executed locally and for where data is to be sent next.
- Clustrix says that Clustrix Sierra does data migration and redistribution (e.g., when you add a node) transparently online, and further says that in practice this doesn’t cause a performance hit.
- As for Clustrix hardware:
- Clustrix makes Type I appliances.
- A Clustrix node contains 2 quad-core chips, 32 gigs of RAM, and seven 160 GB solid-state drives.
- Specifically, Clustrix is using Intel SSDs, with a SAS interface.
- Clustrix says solid-state memory isn’t really essential to the product design; it’s just cheap in terms of $/IOPS (I/O operations per second).
- A minimum Clustrix configuration is 3 nodes, for redundancy. After that you can add nodes one at a time. Clustrix says it built a 20-node system in-house, leading me to suspect that customers don’t have anything bigger than 20 nodes either.
- That 20-node Clustrix system was tested to show near-linear scalability. (In discussing this, Clustrix tends to forget to use the word “near”.)
- Clustrix has partnered with somebody to provide global 4-hour-response support. As of now Clustrix seems to be active mainly in North America and Europe.
- Clustrix is formed from the combination of two startups, which I’ve heard elsewhere were called Clustrix and Sprout. Exactly when the combination happened sounds a little different depending on who’s telling the story (one version has the predecessors still being separate well into 2008, but Clustrix implies the combination happened pretty much on Day 1).
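For readers who haven't run into consistent hashing, the default distribution scheme mentioned above, here is a minimal Python sketch of the general technique, with two copies of each datum placed on distinct nodes. To be clear, this is not Clustrix's code, and I have no idea how Sierra implements it internally. The `HashRing` class, the node names, the use of SHA-256, and the 64 virtual points per node are all assumptions made up for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (first 64 bits of SHA-256)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not Clustrix's implementation."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes      # virtual points per node (assumed value)
        self._points = []         # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def owners(self, key, copies=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._points:
            return []
        start = bisect.bisect_left(self._points, (_hash(key), ""))
        found = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


if __name__ == "__main__":
    # Start at the 3-node minimum, then add a fourth node.
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"row:{i}" for i in range(1000)]
    before = {k: ring.owners(k)[0] for k in keys}
    ring.add_node("node-d")
    moved = sum(1 for k in keys if ring.owners(k)[0] != before[k])
    print(f"{moved} of {len(keys)} keys changed primary owner")
```

The point of the technique is that adding a node only reassigns the keys that land on the new node's stretch of the ring; everything else stays put. That property is what makes adding nodes one at a time plausible without reshuffling the whole database, though how much it costs in practice is a separate question from what this sketch shows.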
Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory