May 6, 2014

Notes and comments, May 6, 2014

After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.

Here is a catch-all post to complete the set. 

1. The recently-announced Cloudera/MongoDB relationship* is still at the Barney stage. That said, I’m optimistic that their stated intention to add substance to the relationship will eventually come to fruition. If nothing else, the two companies have high regard for each other, at least at the Mike Olson/Max Schireson level.

*That’s one of numerous deals with my fingerprints on it, but in this case only lightly. It was probably on track to happen even without my nudges.

2. Most of what I talked about when I visited MongoDB is confidential; the public stuff was mainly in my recent MongoDB technology post. But in one exception, I asked Max for an update as to MongoDB enterprise use cases. He reported a cluster in data combination, especially but not only in use cases that have both high-volume and dynamic-schema aspects. Specific examples Max cited included:

3. I didn’t ask everybody I saw in California about business trends, and much of what we did discuss was confidential. That said:

4. Platfora is focusing a bit, starting with clickstream and security — i.e., event series stuff. And by the way, they report that the term “event series” is working well for them.

5. I gather from a variety of comments and conversations that Amazon Redshift has achieved considerable traction.

6. Something I can’t find evidence of having posted before: I think multiple businesses monitor online sales or similar business successes as a guide to network problems. eBay did this via a custom in-memory MOLAP (Multidimensional OnLine Analytical Processing) system years ago. Best evidence that this is hardly restricted to eBay: all the “me-too” responses I get from telling that story.

7. Citus Data tells me that as of PostgreSQL 9.4, Postgres will be able to return just the part of a JSON column needed for a query. This is as opposed to storing the whole thing as text and only retrieving it in its entirety.

8. In the comments to my “Spark on fire” post, Patrick McFadin pointed out that Mahout is transitioning from MapReduce to Spark. (All new work will be on Spark, although old MapReduce-based routines will continue to be supported.) It turns out that Derrick Harris wrote about that over a month ago, and I just missed the news.

9. Also in predictive analytics — there are rumblings that R could eventually be supplanted by Julia, although R’s massive libraries of algorithms still give it the advantage now.

10. Multiple vendors, fed up with the intermittent slowdowns from garbage collection, are moving some processing off the Java heap. Unfortunately, I neglected to ask any of them what the remaining differences then were between Java and C++ programming.

11. And to finish on a light note: BDAS — the project of which Spark is only a part — is pronounced “bad-ass”, something I first heard from Dave Patterson.
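
A footnote to point 6, for readers who want to picture the idea of watching sales as a proxy for network health. What follows is a minimal sketch only, and is not how eBay's MOLAP system worked: it swaps in a plain trailing-average check. The function name `find_sales_drops`, the `window` and `threshold` parameters, and the sample figures are all hypothetical, made up for illustration.

```python
from collections import deque
from statistics import mean


def find_sales_drops(sales_per_minute, window=5, threshold=0.5):
    """Return the indexes of minutes whose sales fall below
    threshold * (average of the previous `window` minutes).

    Hypothetical helper: a trailing-average check, not a MOLAP system.
    """
    recent = deque(maxlen=window)  # the last `window` observations
    alerts = []
    for minute, sales in enumerate(sales_per_minute):
        # Only judge a minute once a full window of history exists.
        if len(recent) == window:
            baseline = mean(recent)
            if baseline > 0 and sales < threshold * baseline:
                alerts.append(minute)
        recent.append(sales)
    return alerts


# Made-up per-minute sales counts with a dip at minutes 6 and 7.
sample = [100, 104, 98, 101, 99, 102, 40, 35, 97, 103]
print(find_sales_drops(sample))
```

One design caveat: the low readings themselves enter the trailing window, so a long outage gradually drags the baseline down. A real system would presumably compare against a seasonally adjusted expectation instead.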

Comments

4 Responses to “Notes and comments, May 6, 2014”

  1. Ariel Weisberg on May 6th, 2014 10:59 am

    Flexible schema has to be one of the worst and most easily co-opted differentiators that MongoDB has.

    With Postgres on board it kind of surprises me that MySQL doesn’t have an answer for a flexible schema column type. It seems like everything that isn’t Postgres or MySQL (or old school RDBMS) got on the flexible schema train post haste.

  2. Mark Callaghan on May 6th, 2014 11:47 am

    MariaDB has some of it. More is on the way in all variants of MySQL. I have begun reading about the PG features and they are impressive.

  3. Rules for names | Strategic Messaging on May 11th, 2014 1:46 am

    […] Platfora’s latest release focused on data sets that — after Platfora assembles them for you — are sort of like time series but also somewhat like event streams. “Event series” was the winning name. Edit (May 2014): Platfora reports that that choice worked out well. […]

  4. clive boulton on May 14th, 2014 6:00 pm

    On 10. Multiple vendors, fed up with the intermittent slowdowns from GC:

    * download.Google.com in C++ rewritten in Go.
    * office.microsoft.com jobs in C# rewritten in C++
    * spacecurve.com CTO use a barrel processor in C++

    Has concurrency in multi-core / multi-data center arrived?
