April 20, 2009

MySQL storage engine round-up, with Oracle-related thoughts

Here’s what I know about MySQL storage engines, more or less.

MySQL with MyISAM is fast. But it’s not transactional. Except for limited purposes, MySQL with MyISAM is a pretty crummy DBMS. Nothing can change that.
MySQL with InnoDB is transactional. But it’s not particularly fast. MySQL with InnoDB is a pretty mediocre DBMS. Oracle could fix that, at least partially, over time.
I don’t know much about Falcon, Maria, and so on. With Oracle winding up owning both MySQL and InnoDB, the motivation for those engines (except as Oracle-free forks) might fade.
Infobright is the most established of the rest. At the moment I’m not recommending it for most industrial-strength uses unless the user is particularly cash-constrained. But I wouldn’t be surprised if that changed soon. A cheap, fast, simple columnar analytic DBMS has a place in the world.
Kickfire is next in line, offering a hardware-based growth path for users who’ve maxed out on what unaided MySQL can do. It remains to be seen for how many users the desire to keep things simple and stay with MySQL outweighs the desire to avoid custom hardware. Having Oracle salespeople all over those accounts surely wouldn’t help. Kickfire also has a second market, namely OEM vendors who are mainly interested in the superfast chip. That would probably be pretty unaffected by Oracle.
Tokutek offers a technical proposition that’s hard to match head-on without going the CEP route. Users who care are likely to be MySQL shops. Tokutek’s main challenge is to prove that it sufficiently outdoes competing technical strategies for sufficiently many users. Oracle ownership of MySQL seems pretty irrelevant to Tokutek’s success or failure.
Calpont offers a kind of lightweight Exadata alternative. With Calpont’s packaging and positioning perennially unclear, it’s difficult to predict the effect of a particular change — i.e., Oracle buying MySQL — in Calpont’s market environment.
I haven’t heard from transactionally-oriented ScaleDB since I wrote about them a year ago. Apparently, they’re rolling out beta product this week, and their venerable techie guru sadly passed away earlier this month.

Categories: Calpont, Columnar database management, Data warehousing, Exadata, Infobright, Kickfire, MySQL, Open source, Oracle, Tokutek and TokuDB

14 Comments

April 20, 2009

Should the Oracle/MySQL combo face antitrust opposition?

Oracle is a powerhouse in database management systems, but it’s hardly a monopolist. IBM revels in contriving figures that show it to have market share comparable to Oracle’s, and Microsoft has a very solid position as well. Smaller players like Teradata, Sybase, and MySQL are also thriving. And of course there’s a whole wave of newer DBMS companies, from Netezza on, showing that the DBMS industry isn’t even the secure oligopoly it appeared to be earlier this decade.

However, it’s certainly legitimate to define a product category of “real” DBMS that includes everything from MySQL on up, but not Microsoft Access and other low-end data management products. In that universe, while MySQL is a trivial addition to Oracle’s revenue, it’s a huge increment to Oracle’s unit market share. A merged Oracle/MySQL will dwarf the competition in ways that Oracle or MySQL alone don’t. Read more

Categories: MySQL, Open source, Oracle

10 Comments

April 20, 2009

First thoughts on Oracle acquiring Sun

Wow.
And during the week of the MySQL conference, too.
In the must-read slide presentation, Oracle says all the right things about being committed to all product lines and technologies. On the whole, this is believable.
Oracle says it’s focusing Sun hardware sales on existing Oracle/Sun customers. Makes sense.
Oracle mentions OpenStorage prominently. Makes sense. Integrating DBMS with storage is Oracle’s high-end DBMS future. (E.g., Exadata.)
HP can’t be happy.
MySQL and InnoDB are reunited.
MySQL is apt to get decent, much as it would have under IBM.
Even so, if you really believe in open source’s freedom, it’s time to look at PostgreSQL …
… or EnterpriseDB’s Postgres Plus, although my recent dealings with EnterpriseDB underscore the importance of being VERY careful about counting your fingers after you shake hands with that company.
And I wouldn’t be surprised if another shoe dropped soon on the EnterpriseDB front. (Please excuse the mixed metaphor.)
I used to laugh at how many different app servers Sun had acquired. Oracle acquired a number too. Together it’s quite a pile of them.
Oracle says acquiring Java is a great big deal. I’m not sure I see why that would really be true.

More later. I have a radio interview in a few minutes on a very different subject.

Categories: EnterpriseDB and Postgres Plus, HP and Neoview, MySQL, Open source, Oracle, PostgreSQL

20 Comments

April 20, 2009

Calpont update — you read it here first!

Calpont has gone through a lot of strategy iterations since its founding. The super-short version is that Calpont originally planned an appliance built around a SQL chip, much like Kickfire. But after various changes in management and venture backing, Calpont turned itself into a software-only analytic DBMS vendor relying on a MySQL front end. Calpont is now at the stage of announcing an Early Adopter program at the MySQL conference on Wednesday, although details of Calpont’s product release timing, pricing, feature set, etc. are all To Be Determined.

Minor highlights of the Calpont technical story include: Read more

Categories: Calpont, Columnar database management, Data warehousing, MySQL, Open source, Parallelization, Theory and architecture

Infobright update

For the past couple of quarters, Infobright has been MySQL’s partner of choice for larger data warehousing applications. Infobright’s stated business metrics, and I quote, include:

> 50 Customers in 7 Countries

> 25 Partners on 4 continents

A vibrant open source community

+1 million visitors

Approaching 10,000 downloads

2,000 active community participants

These may be compared with analogous metrics Infobright offered in February.

Infobright has also made or promised a variety of technological enhancements. Ones that are either shipping now or promised soon include: Read more

Categories: Columnar database management, Data warehousing, Infobright, MySQL, Open source

6 Comments

April 16, 2009

Introduction to Tokutek

Tokutek has a paradoxical pitch: Tokutek writes data particularly quickly, and therefore you’re supposed to buy Tokutek for query-oriented uses. Highlights of the Tokutek story include:

Tokutek is a MySQL storage engine.
MySQL/Tokutek writes indexed data a lot faster than B-tree-based alternatives. (The claim is 10s of 1000s of rows per second on a single server.)
MySQL/Tokutek reads data at B-tree speeds. (But not, I presume, at the speed of specialized analytic DBMS.)
Tokutek is not yet ACID-compliant. They’re working on that, but we don’t know what the performance implications will be when they achieve it. ACID compliance won’t come as soon as the May release (Tokutek Version 2.0).
Tokutek has made one sale. Others are in the pipeline.

Tokutek’s initial target market is the usual combination of clickstream/personalization/other network management. The idea is that many data warehouse technologies have trouble getting latency below, say, 15 seconds to 5 minutes, at least at very high update volumes. So if immediacy is more important than raw complex query performance, Tokutek’s performance profile could be attractive. Read more

Categories: Data warehousing, MySQL, Tokutek and TokuDB, Web analytics

14 Comments

April 15, 2009

Cloudera presents the MapReduce bull case

Monday was fire-drill day regarding MapReduce vs. MPP relational DBMS. The upshot was that I was quoted in Computerworld and paraphrased in GigaOm as being a little more negative on MapReduce than I really am, in line with my comment

Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to imply.

Tuesday afternoon the dial turned a couple notches more positive yet, when I talked with Michael Olson and Jeff Hammerbacher of Cloudera. Cloudera is a new company, built around the open source MapReduce implementation Hadoop. So far Cloudera gives away its Hadoop distribution, without charging for any sort of maintenance or subscription, and just gets revenue from professional services. Presumably, Cloudera plans for this business model to change down the road.

Much of our discussion revolved around Facebook, where Jeff directed a huge and diverse Hadoop effort. Apparently, Hadoop played much of the role of an enterprise data warehouse at Facebook — at least for clickstream/network data — including:

2 1/2 petabytes of data managed via Hadoop
10 terabytes/day of data ingested via Hadoop (Edit: Some of these metrics have been updated in a subsequent post about Facebook.)
Ad targeting queries run every 15 minutes in Hadoop
Dashboard roll-up queries run every hour in Hadoop
Ad-hoc research/analytic Hadoop queries run whenever
Anti-fraud analysis done in Hadoop
Text mining (e.g., of things written on people’s “walls”) done in Hadoop
100s or 1000s of simultaneous Hadoop queries
JSON-based social network analysis in Hadoop

Some Facebook data, however, was put into an Oracle RAC cluster for business intelligence. And Jeff does concede that query execution is slower in Hadoop than in a relational DBMS. Hadoop was also used to build the index for Facebook’s custom text search engine.
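For readers who haven't seen the MapReduce pattern spelled out, here is a minimal sketch of an hourly dashboard roll-up in plain Python. It is not Hadoop's API and not Facebook's code; the record layout, the sample data, and every function name are assumptions made up for illustration. It only shows the map, shuffle, and reduce steps that a framework like Hadoop would spread across many machines.

```python
from collections import defaultdict

# Hypothetical clickstream records as (hour, page) pairs. Not real data.
CLICKS = [
    ("2009-04-14T09", "/home"),
    ("2009-04-14T09", "/profile"),
    ("2009-04-14T09", "/home"),
    ("2009-04-14T10", "/home"),
]


def map_phase(records):
    """Emit one ((hour, page), 1) pair per click."""
    for hour, page in records:
        yield (hour, page), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def hourly_rollup(records):
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    for (hour, page), count in sorted(hourly_rollup(CLICKS).items()):
        print(hour, page, count)
```

In a relational DBMS the same roll-up would be a single GROUP BY, which is part of why the speed comparison above matters.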

Jeff’s reasons for liking Hadoop over relational DBMS at Facebook included: Read more

Categories: Analytic technologies, Cloudera, Data warehousing, EAI, EII, ETL, ELT, ETLT, Facebook, Hadoop, MapReduce, Petabyte-scale data management, RDF and graphs, Specific users, Web analytics

27 Comments

April 14, 2009

Maybe Amazon should be using a real DBMS after all

Supposedly

Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as “adult,” the source said. (Technically, the flag for adult content was flipped from ‘false’ to ‘true.’)

“It’s no big policy change, just some field that’s been around forever filled out incorrectly,” the source said.

Amazon employees worked on the problem well past midnight, and then handed it over to an international team, he said.

This was the best practice for reversing an error — how? Is SimpleDB somehow implicated? If this story is remotely true, and if there’s a sensible database architecture, I can’t imagine why there wouldn’t be a faster fix.
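To make "a faster fix" concrete, here is a rough sketch of the sort of thing I have in mind, using Python's built-in sqlite3 module. Nothing here reflects Amazon's actual systems; the table names, columns, and batch label are all hypothetical. The idea is simply that if every bulk change records the old value under a batch label, then undoing the whole batch is one statement rather than a night's work.

```python
import sqlite3


def make_catalog():
    """Build a tiny in-memory catalog. Schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            adult INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE items_audit (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            batch TEXT NOT NULL,
            item_id INTEGER NOT NULL,
            old_adult INTEGER NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO items (id, title, adult) VALUES (?, ?, ?)",
        [(1, "Novel", 0), (2, "Cookbook", 0), (3, "Adult title", 1)],
    )
    conn.commit()
    return conn


def bulk_flag(conn, batch, item_ids, adult):
    """Set the adult flag on many items, logging each old value first."""
    with conn:  # one transaction: commits on success, rolls back on error
        for item_id in item_ids:
            conn.execute(
                "INSERT INTO items_audit (batch, item_id, old_adult) "
                "SELECT ?, id, adult FROM items WHERE id = ?",
                (batch, item_id),
            )
            conn.execute(
                "UPDATE items SET adult = ? WHERE id = ?", (adult, item_id)
            )


def revert_batch(conn, batch):
    """Restore every item touched by the batch to its logged old value."""
    with conn:
        conn.execute(
            "UPDATE items SET adult = ("
            "    SELECT old_adult FROM items_audit"
            "    WHERE items_audit.item_id = items.id"
            "      AND items_audit.batch = ?"
            ") WHERE id IN ("
            "    SELECT item_id FROM items_audit WHERE batch = ?"
            ")",
            (batch, batch),
        )
        conn.execute("DELETE FROM items_audit WHERE batch = ?", (batch,))


def flags(conn):
    """Return the adult flags in id order, for inspection."""
    return [row[0] for row in conn.execute("SELECT adult FROM items ORDER BY id")]
```

Calling `bulk_flag(conn, "bad-edit", [1, 2], 1)` followed by `revert_batch(conn, "bad-edit")` leaves the catalog as it started. Whether anything like this was available to Amazon, I have no idea.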

Categories: Amazon and its cloud

7 Comments

April 14, 2009

There always seems to be a fire drill around MapReduce news

Last August I flew out to see my new clients at Greenplum. They told me they planned to roll out MapReduce in a few weeks, and asked for my help in publicizing it. From their offices I went to dinner with non-clients Aster Data, who told me they’d gotten wind of a Greenplum MapReduce announcement and planned to come out ahead of it. A couple of hours later, Aster signed up as a client. In something of a pickle — but not one of my own making — I knocked heads, and persuaded both vendors to announce MapReduce at the same time, namely the following Monday. Lots of publicity ensued for both vendors, and everybody was reasonably satisfied. Read more

Categories: About this blog, Analytic technologies, Aster Data, Greenplum, MapReduce, Michael Stonebraker, Vertica Systems

1 Comment

April 14, 2009

eBay thinks MPP DBMS clobber MapReduce

Last week I talked with Oliver Ratzesberger and his team at eBay, whom I already knew to be MapReduce non-fans. This time I picked up more detail.

Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should only be used sporadically. This view is based in part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes or 96 lower-powered nodes running another (currently unnamed, as per yet another of my PR fire drills) MPP DBMS, a simulation of Terasort executed in 78 and 120 seconds respectively, which is very comparable to the times Google and Yahoo got on 1000 nodes or more.

And by the way, if you use many fewer nodes, you also consume much less floor space and electric power.
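The node-count point can be put as back-of-the-envelope arithmetic. The sketch below multiplies nodes by elapsed seconds for the two eBay runs quoted above. The 1000-node, 78-second comparison row is an assumption for illustration only, since the only figures given here are "very comparable" times on "1000 nodes or more"; it is not a reported Google or Yahoo result, and nodes of different power are not directly comparable anyway.

```python
def node_seconds(nodes, seconds):
    """Crude resource measure: nodes occupied times elapsed seconds."""
    return nodes * seconds


# Figures quoted in the post.
teradata = node_seconds(72, 78)     # 5,616 node-seconds
other_mpp = node_seconds(96, 120)   # 11,520 node-seconds

# ASSUMPTION, not a reported result: a 1000-node cluster that finishes
# in the same 78 seconds as the Teradata run.
ASSUMED_CLUSTER_NODES = 1000
ASSUMED_CLUSTER_SECONDS = 78
hypothetical_cluster = node_seconds(ASSUMED_CLUSTER_NODES, ASSUMED_CLUSTER_SECONDS)

# How many times more node-seconds the assumed cluster would burn.
ratio_vs_teradata = hypothetical_cluster / teradata

if __name__ == "__main__":
    print("Teradata node-seconds:", teradata)
    print("Other MPP node-seconds:", other_mpp)
    print("Assumed 1000-node cluster node-seconds:", hypothetical_cluster)
    print("Ratio vs. Teradata:", round(ratio_vs_teradata, 1))
```

Under that assumption the larger cluster uses roughly 14 times the node-seconds of the Teradata run. That is a hardware-footprint comparison, and is not the same thing as Oliver's 6-8X estimate.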

Categories: Analytic technologies, eBay, Hadoop, MapReduce, Parallelization, Teradata

11 Comments


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

