Citrusleaf
Discussion of Citrusleaf and its product, also called Citrusleaf.
Citrusleaf RTA
Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when:
- You want to update dashboards within a minute.
- You want to update predictive models fairly quickly (within the hour?), although it’s not clear to me how much the models are being updated or changed with that latency.
The metrics envisioned are:
- 100 or so ad impressions per person …
- … for 1 billion or so people …
- … stored for 30-90 days …
- … where each ad impression is a fairly short record …
- … stored on disk …
- … but indexed in a way so that the index can fit into RAM.
- 50-100,000 writes per second. (I didn’t ask on what amount of hardware.)
- Several hundred reads per second.
A consistent relational schema is NOT assumed.
Citrusleaf’s solution is:
- Have one index entry for each of the 1 billion people.
- Bang each new object/record to disk. Include in it a pointer to the previous object/record for the same person.
- Each time a new object/record is added, update the index in place so that it now points to the new once. Hence, the index is sized according to the number of people, not according to the total number of objects/records.
- Eventually let objects/records age off in the obvious way.
The downside is that when you do read 100 objects/records per person, you might need to do 100 seeks.
Introduction to Citrusleaf
Citrusleaf is the vendor of yet another short-request/NoSQL database management system, conveniently named Citrusleaf. Highlights for Citrusleaf the company include:
- 8 employees.
- $2 million in recently acquired venture capital.
- 1 1/2 – 2 1/2 years of total company history, depending on how you count.
- An undisclosed but nonzero number of paying customers, concentrated in the real-time advertising market, with a typical application being cookie management.
Citrusleaf the product is a kind of key-value store; however, the values are in the form of rows, so what you really look up is (key, field name, value) triples. Right now only the keys are indexed; futures include indexing on the individual fields, so as to support some basic analytics. SQL support is an eventual goal. Other Citrusleaf buzzword basics include:
- ACID-compliant.
- Log-structured.
- Tunable consistency model.
To date, Citrusleaf customers have focused on sub-millisecond data retrieval, preferably .2-.3 milliseconds. Accordingly, none has chosen to put the primary Citrusleaf data store on disk. Rather:
- Citrusleaf indexes are always in RAM. (Citrusleaf forces this, actually.)
- You can keep data in RAM and copy it to disk.
- You can keep data on solid-state drives. (Just A Bunch Of Flash or Fusion I/O.)
I don’t have a good grasp on what the data structure for those indexes is.
Citrusleaf characterizes its customers as firms that have “a couple of KB” of data on “every” person in North America. Naively, that sounds like a terabyte or less to me, but Citrusleaf says 1-3 terabytes is most common. Or to quote the press release, “The most common deployments for Citrusleaf 2.0 are terabytes of data, billions of objects, and 200K plus transactions per second per node, with sub-millisecond latency.” 4-8 nodes seems to be typical for Citrusleaf databases (all figures pre-replication). I didn’t ask what kind of hardware is at each node.
Citrusleaf data distribution features include: Read more
| Categories: Citrusleaf, NoSQL, Parallelization | 3 Comments |
