Examples of machine-generated data
Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include:
Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics
Information found in public-facing social networks
Here are some examples illustrating two recent themes of mine, namely:
- Easily-available information reveals all sorts of things about us.
- Graph-based analysis is on the rise.
Pete Warden scraped all of Facebook’s social graph (at least for the United States), and put up a really interesting-looking visualization of same. Facebook’s lawyers came down on him, and he quickly agreed to destroy the data he’d scraped, but he also published ideas on how other people could duplicate his work.
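To make the “graph-based analysis” theme a little more concrete, here is a minimal Python sketch of one kind of summary a scraped friend graph supports: rolling person-level friendships up into city-to-city link counts. The people, cities, and the `city_links` helper are hypothetical, and this is not a description of how Warden actually built his visualization.

```python
from collections import Counter


def city_links(friendships, home_city):
    """Roll person-to-person friendship edges up into city-to-city link counts.

    friendships: iterable of (person_a, person_b) pairs (undirected edges)
    home_city:   dict mapping each person to a city (hypothetical data)
    """
    counts = Counter()
    for a, b in friendships:
        city_a, city_b = home_city[a], home_city[b]
        if city_a == city_b:
            continue  # only inter-city ties are counted
        # Sort the pair so (Boston, Denver) and (Denver, Boston) share one key.
        counts[tuple(sorted((city_a, city_b)))] += 1
    return counts


if __name__ == "__main__":
    home_city = {"ann": "Boston", "bob": "Boston", "cat": "Denver", "dan": "Seattle"}
    friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
    for (c1, c2), n in city_links(friendships, home_city).most_common():
        print(f"{c1} <-> {c2}: {n}")
```

Running it prints `Boston <-> Denver: 2` followed by `Denver <-> Seattle: 1`; the within-Boston friendship is deliberately dropped.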
Warden has since given an interview in which he outlines some of the things researchers hoped to do with this data:
Categories: Analytic technologies, Facebook, RDF and graphs, Surveillance and privacy
Thoughts on IBM’s anti-Oracle announcements
IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle’s Exadata-centric strategy. I haven’t been briefed, so I just have those to go on.
On the whole, the releases look pretty lame. Highlights seem to include:
- Maybe a claim of enhanced data compression.
- Otherwise, no obvious new technology except product packaging and bundling.
- Aggressive plans to throw capital at the Sun channel to convert it to selling IBM gear. (A figure of $1/2 billion is mentioned, for financing.)
Disappointingly, IBM shows a lot of confusion between:
- Text data
- Machine-generated data such as that from sensors
While both are highly important, those are very different things. IBM has not in the past shown much impressive technology in either of those two areas, and based on these releases, I presume that trend is continuing.
Edits:
I see from press coverage that at least one new IBM model has some Fusion I/O solid-state memory boards in it. Makes sense.
A Twitter hashtag has a number of observations from the event. Not much substance I could detect except various kinds of Oracle bashing.
Categories: Database compression, Exadata, IBM and DB2, Oracle, Solid-state memory
Notes on the evolution of OLTP database management systems
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general-purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice versa, or whether they will co-exist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.
I shall explain.
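One generic way to reduce the cost of joins is to store child rows alongside their parent, so that reading a parent plus its children becomes a single lookup rather than a join. The Python sketch below contrasts a query-time hash join over normalized customer and order “tables” with such a grouped layout. The schema, `hash_join`, and `GroupedStore` are all hypothetical; this illustrates the general idea, not any particular vendor’s design.

```python
from collections import defaultdict


def hash_join(customers, orders):
    """Normalized layout: join customers to orders at query time.

    Builds a hash index on orders.customer_id, then probes it per customer.
    """
    index = defaultdict(list)
    for order in orders:
        index[order["customer_id"]].append(order)
    return {cid: {**row, "orders": index.get(cid, [])} for cid, row in customers.items()}


class GroupedStore:
    """Grouped layout: each order is written next to its parent customer."""

    def __init__(self):
        self.groups = {}

    def insert_customer(self, cid, name):
        self.groups[cid] = {"name": name, "orders": []}

    def insert_order(self, cid, order_id, total):
        # The parent/child relationship is resolved once, at write time.
        self.groups[cid]["orders"].append({"order_id": order_id, "total": total})

    def customer_with_orders(self, cid):
        # A single keyed lookup; no join is needed at read time.
        return self.groups[cid]


if __name__ == "__main__":
    customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
    orders = [
        {"order_id": 10, "customer_id": 1, "total": 250},
        {"order_id": 11, "customer_id": 2, "total": 75},
        {"order_id": 12, "customer_id": 1, "total": 40},
    ]
    joined = hash_join(customers, orders)

    store = GroupedStore()
    for cid, row in customers.items():
        store.insert_customer(cid, row["name"])
    for o in orders:
        store.insert_order(o["customer_id"], o["order_id"], o["total"])

    print([o["order_id"] for o in joined[1]["orders"]])
    print([o["order_id"] for o in store.customer_with_orders(1)["orders"]])
```

Both lines print `[10, 12]`. The trade-off is that the grouped layout favors the parent/child access path it was built around; queries that cut across it (say, all orders above some total) gain nothing from the grouping.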
The retention of everything
I’d like to reemphasize a point I’ve been making for a while about data retention:
Categories: Archiving and information preservation, Surveillance and privacy, Web analytics
Liberty and privacy, once again
I’ve long argued three points:
- It is inevitable* that governments and other constituencies will obtain huge amounts of information, which can be used to drastically restrict everybody’s privacy and freedom.
- To protect against this grave threat, multiple layers of defense are needed, technical and legal/regulatory/social/political alike.
- One particular layer is getting insufficient attention, namely restrictions upon the use (as opposed to the acquisition or retention) of data.
*And indeed in many ways even desirable
I surprised people by leading with the liberty/privacy subject at my New England Database Summit keynote; considerable discussion ensued, largely supportive. I hope for a similar outcome when I keynote the Aster Big Data Summit in Washington, DC in May. And I expect to do even more to advance the liberty/privacy discussion as 2010 unfolds.
Fortunately, I’m not the only one thinking or talking about these liberty/privacy issues.
Akiban highlights
Akiban responded quickly to my complaints about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It’s still early days for Akiban product development, so some details haven’t been determined yet, and others I just haven’t yet pinned down. Still, I know a lot more than I did a day ago. Highlights of my talk with Akiban included:
Categories: Akiban, MySQL, Object, OLTP, Software as a Service (SaaS)
Netezza nails April Fool’s Day
Netezza has nailed April Fool’s Day this year. 🙂 (Their site will revert to normal after April 1, so I may later edit this post accordingly.)
Categories: Data warehouse appliances, Data warehousing, Fun stuff, Humor, Netezza
Pranks, apocryphal and otherwise
I’ve been posting a bit about pranks of various kinds, mainly geeky ones. But so far I’ve only covered real pranks, rather than the much funnier imaginary ones.
The classic of that genre, of course, is a certain database-oriented xkcd comic strip. (If you haven’t instantly guessed what I’m talking about, you must see that strip.) And in a similar vein, I further offer six examples of xkcd’s “My Hobby” strips. (The last two are not for the sexually squeamish, but the others are pretty G-rated.)
One thing I just learned about xkcd — if you mouse over the strip, you get another joke. Some are almost as funny as the main strip. So even if you have already seen the database-classic xkcd linked above, you might want to revisit it. 😉
In a very different vein is Dadhacker’s list of real or imaginary past shenanigans (Edit: The original link is fried, but here’s a partial replacement), which starts:
I am not permitted to replace a coworker’s reference books (including his Knuth, Sedgewick, and C++ reference manuals) with several linear feet of steamy hardback romance novels.
I will not name my variables after nasty tropical diseases, or executives who are under indictment for fraud.
Elevators are not toys, nor should they ever be wired into the corporate net.
Funny and vaguely prankish (and not for the language-squeamish) is this non-xkcd comic about NoSQL. And finally (definitely also for the non-squeamish), see the first long comment in this Reddit thread, which seems to have successfully pranked a whole lot of readers.
Categories: Fun stuff, Humor, NoSQL
Quick news, links, comments, etc.
Some notes based on what I’ve been reading recently: