In surveying MapReduce applications to date, I said that they fell mainly into three overlapping categories:
- Text tokenization, indexing, and search
- Creation of other kinds of data structures (e.g., graphs)
- Data mining and machine learning
I really should have included a fourth:
- Data transformation
Nokia just released another MapReduce implementation, Disco, and its list of applications to date fits right into that template. The relevant quote is:
"This far Disco has been succesfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data."