I find myself in need of a word or phrase that means bring data together from various sources so that it’s ready to be used, where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.
I mean for the term “data mustering” to encompass at least three scenarios:
- Integrated (relational) data warehouse.
- Big bit bucket.
- Big bit stream.
Let me explain what I mean by each.
“Integrated data warehouse” is a phrase Teradata has started using for enterprise data warehouses that, like approximately every other EDW in the entire history of data warehousing, aren’t truly enterprise-wide. In other words, it means “not just a data mart”. No category name is perfect, but I think that one works reasonably well.
I previously described the big bit bucket use case as
Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing.
and quickly added
Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include Hadoop appliances (which I don’t believe in), Splunk (which in many use cases I do), and MarkLogic (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.
I think I’ll stand pat on that explanation.
By analogy, a big bit stream is various streams of data, assembled in the custody of a streaming engine. Sybase told me Wednesday that this scenario appears in both of the traditional markets for CEP/streaming — national intelligence, where it is a major use of streaming, and capital markets in some use cases as well. And it’s consistent with what I’ve heard from other CEP/streaming vendors as well.
As for where I got the word “mustering” — it’s a military term, for when you assemble your troops and their gear either for inspection or for actual use. The main modern usage I know of the word is as part of the phrase “pass muster”, which originally referred to the concept that the person being paid to put a regiment together should from time to time demonstrate that the regiment physically existed in the form that regimental records seemed to show.