Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when:
- You want to update dashboards within a minute.
- You want to update predictive models fairly quickly (within the hour?), although it’s not clear to me how much the models are being updated or changed with that latency.
The metrics envisioned are:
- 100 or so ad impressions per person …
- … for 1 billion or so people …
- … stored for 30-90 days …
- … where each ad impression is a fairly short record …
- … stored on disk …
- … but indexed in a way so that the index can fit into RAM.
- 50-100,000 writes per second. (I didn’t ask on what amount of hardware.)
- Several hundred reads per second.
A consistent relational schema is NOT assumed.
Citrusleaf’s solution is:
- Have one index entry for each of the 1 billion people.
- Bang each new object/record to disk. Include in it a pointer to the previous object/record for the same person.
- Each time a new object/record is added, update the index in place so that it now points to the new once. Hence, the index is sized according to the number of people, not according to the total number of objects/records.
- Eventually let objects/records age off in the obvious way.
The downside is that when you do read 100 objects/records per person, you might need to do 100 seeks.