- Kafka has gotten considerable attention and adoption in streaming.
- Kafka is open source, out of LinkedIn.
- Folks who built it there, led by Jay Kreps, now have a company called Confluent.
- Confluent seems to be pursuing a fairly standard open source business model around Kafka.
- Confluent seems to be in the low to mid teens in paying customers.
- Confluent believes 1000s of Kafka clusters are in production.
- Confluent reports 40 employees and $31 million raised.
At its core Kafka is very simple:
- Kafka accepts streams of data in substantially any format, and then streams the data back out, potentially in a highly parallel way.
- Any producer or consumer of data can connect to Kafka, via what can reasonably be called a publish/subscribe model.
- Kafka handles various issues of scaling, load balancing, fault tolerance and so on.
So it seems fair to say:
- Kafka offers the benefits of hub vs. point-to-point connectivity.
- Kafka acts like a kind of switch, in the telecom sense. (However, this is probably not a very useful metaphor in practice.)
|Categories: Data integration and middleware, Humor, Kafka and Confluent, Market share and customer counts, Microsoft and SQL*Server, Open source, Specific users, Streaming and complex event processing (CEP)||10 Comments|
My favorite educational video growing up, by far, was a 1960 film embedded below. I love it because it pranks its viewers, starting right in the opening scene. (Start at the 0:50 mark to see what I mean.)
If you’re ever in the position of helping a kid or young adult understand physics, this video could be a great help. Frankly, it could help in political discussions as well …
- Question: Why do policemen work in pairs?
- Answer: One to read and one to write.
A lot has happened in MongoDB technology over the past year. For starters:
- The big news in MongoDB 3.0* is the WiredTiger storage engine. The top-level claims for that are that one should “typically” expect (individual cases can of course vary greatly):
- 7-10X improvement in write performance.
- No change in read performance (which however was boosted in MongoDB 2.6).
- ~70% reduction in data size due to compression (disk only).
- ~50% reduction in index size due to compression (disk and memory both).
- MongoDB has been adding administration modules.
- A remote/cloud version came out with, if I understand correctly, MongoDB 2.6.
- An on-premise version came out with 3.0.
- They have similar features, but are expected to grow apart from each other over time. They have different names.
*Newly-released MongoDB 3.0 is what was previously going to be MongoDB 2.8. My clients at MongoDB finally decided to give a “bigger” release a new first-digit version number.
To forestall confusion, let me quickly add: Read more
|Categories: Database compression, Hadoop, Humor, In-memory DBMS, MongoDB, NoSQL, Open source, Structured documents, Sybase||9 Comments|
If the makers of MMO RPGs (Massive Multi-Player Online Role-Playing Games) aren’t quite the worst database application developers in the world, they’re at least on the short list for consideration. The makers of Guild Wars didn’t even try to have decent database functionality. A decade later, when they introduced Guild Wars 2, the database-oriented functionality (auction house, real-money store, etc.) would crash for days at a time. Lord of the Rings Online evidently had multiple issues with database functionality. Now I’m playing Elder Scrolls Online, which on the whole is a great game, but which may have the most database screw-ups of all.
ESO has been live for less than 3 weeks, and in that time:
2. Guild functionality has at times been taken down while the rest of the game functioned.
3. Those problems aside, bank and guild bank functionality are broken, via what might be considered performance bugs. Problems I repeatedly encounter include:
- If you deposit a few items, the bank soon goes into a wait state where you can’t use it for a minute or more.
- Similarly, when you try to access a guild — i.e. group — bank, you often find it in an unresponsive state.
- If you make a series of updates a second apart, the game tells you you’re doing things too quickly, and insists that you slow down a lot.
- Items that are supposed to “stack” appear in 2 or more stacks; i.e., a very simple kind of aggregation is failing. There are also several other related recurring errors, which I conjecture have the same underlying cause.
In general, it seems like that what should be a collection of database records is really just a list, parsed each time an update occurs, periodically flushed in its entirety to disk, with all the performance problems you’d expect from that kind of choice.
Before the advent of cheap computing power, statistics was a rather dismal subject. David Lax scared me off from studying much of it by saying that 90% of statistics was done on sets of measure 0.
The following cautionary tale also dates to that era. Other light verse below. Read more
Modern analytics described in three old jokes.
The drunk under the lamppost
A man is on his hands and knees, looking for something under a lamppost and obviously not finding it. The neighborhood policeman asks what he is doing.
“I’m looking for my keys.”
“Did you lose them around here?”
“Not exactly; I think they fell out of my pocket down the street a bit.”
“Then why aren’t you looking for them down the street?”
“The light is better over here.”
Some people use statistics the way a drunk uses a lamppost — more for support than for illumination.
Seek and …
A family that’s looking to start organic gardening has a large pile of manure dumped into their backyard. Their daughter grabs a shovel and digs in excitedly, shouting:
“Look at all this … stuff! There must be a pony in here somewhere!!”
Various reporters have asked me about Oracle’s third quarter 2012 earnings conference call. Specific Q&A includes:
What did Oracle do to have its earnings beat Wall Street’s estimates?
Have a bad second quarter and then set Wall Street’s expectations too low for Q3. This isn’t about strong results; it’s about modest expectations.
Can Oracle be a leader in both hardware and software?
- It’s not inconceivable.
- The observation that Oracle, IBM, and Teradata all are pushing hardware-software combinations has been intriguing ever since IBM bought Netezza. (SAP really isn’t, however; ditto Microsoft.)
- I do think Oracle may be somewhat overoptimistic as to how cooperative the Sun user base will be in buying more high-end product and in paying more in maintenance for the gear they already have.
Beyond that, please see below.
What about Oracle in the cloud?
MySQL is an important cloud supplier. But Oracle overall hasn’t demonstrated much understanding of what cloud technology and business are all about. An expensive SaaS acquisition here or there could indeed help somewhat, but it seems as if Oracle still has a very long way to go.
|Categories: Cloud computing, Exadata, Humor, In-memory DBMS, Oracle, SAP AG, Software as a Service (SaaS)||5 Comments|
It all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] bring Hadoop into their heart of their architectures. (All emphasis mine.)
That did it. I tweeted, in succession:
- Actually, I vote for Hadoop as the lungs of the EDW — first place of entry for essential nutrients.
- Data integration can be the heart of the EDW, pumping stuff around. RDBMS/analytic platform can be the brain.
- iPad-based dashboards that may engender envy, but which actually are only used occasionally and briefly … well, you get the picture.*
*Woody Allen said in Sleeper that the brain was his second-favorite organ.
Of course, that body of work was quickly challenged. Responses included: Read more
|Categories: Analytic technologies, Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Fun stuff, Hadoop, Humor, MapReduce||Leave a Comment|
The competition for April Fool’s Day humor is brisk, as I documented in 2010 with two lists of excellent pranks. So I went against the grain that year, offering a collection of strange-but-true stories — such as how I came to have heartthrob James Marsters autograph a shirtless picture of himself, why I regretted that graduating athletic powerhouse Ohio State University at age 16 cost me my NCAA eligibility, and the sore butt I got from spending an afternoon with Bill Gates’ girlfriend, herself a well-known industry figure.
That post seemed to go over well, even if I’m a little disappointed at how few people joined in with stories of their own. So I’m opting for strange-but-true this year also, just more aligned with the usual subjects of my blogging. And thus, without further ado, here’s the story of
The client that was confused about security
Some notes, follow-up, and links before I head out to California: Read more
|Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics||3 Comments|