MySQL, hash joins and Infobright
Over a period of 24 hours or so, Daniel Abadi, Dmitriy Ryaboy and Randolph Pullen all remarked on MySQL’s lack of hash joins. (MySQL relies on nested loops instead, which were state-of-the-art technology around the time of the Boris Yeltsin administration.) This led me to wonder: why is this not a problem for Infobright?
Per Infobright chief scientist Dominik Slezak, the answer is:
Infobright performs joins using its own optimization/execution layers (that actually include hash join algorithms and advanced knowledge-grid-based nested loop optimizations in particular).
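To make the nested-loop versus hash-join contrast concrete, here is a minimal Python sketch of both algorithms over in-memory lists of rows. The row format, sample data, and function names are illustrative assumptions, not MySQL or Infobright code, and real engines add indexes, join buffers, and spill-to-disk handling that this sketch omits.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def nested_loop_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Compare every left row with every right row: O(len(left) * len(right))."""
    out = []
    for l in left:
        for r in right:
            if l[key] == r[key]:
                out.append((l, r))
    return out


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Build a hash table on one input, then probe it: roughly O(len(left) + len(right))."""
    # Build phase: index the right-hand rows by join key (keys must be hashable).
    table: Dict[Any, List[Row]] = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    # Probe phase: each left row looks up its matches directly instead of scanning.
    out = []
    for l in left:
        for r in table.get(l[key], []):
            out.append((l, r))
    return out


# Hypothetical sample data for illustration.
orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 5}, {"cust": 1, "amt": 7}]
customers = [{"cust": 1, "name": "Acme"}, {"cust": 2, "name": "Globex"}]
print(len(hash_join(orders, customers, "cust")))
```

Both functions return the same pairs in the same order; the difference is cost. The hash join makes one pass to build a table on one input and one pass to probe it, so its work grows roughly with the sum of the input sizes rather than their product, which is why engines typically build on the smaller input.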
Categories: Infobright, MySQL, Theory and architecture
DataStax introduces a Cassandra-based Hadoop distribution called Brisk
Cassandra company DataStax is introducing a Hadoop distribution called Brisk, for use cases that combine short-request and analytic processing. Brisk in essence replaces HDFS (Hadoop Distributed File System) with a Cassandra-based file system called CassandraFS. The whole thing is due to be released (Apache open source) within the next 45 days.
The core claims for Cassandra/Brisk/CassandraFS are:
- CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk.
- CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
- Brisk/CassandraFS is much easier to administer than HDFS. In particular, there is no NameNode or JobTracker single point of failure, nor any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
- Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.
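One way to see why a strictly peer-to-peer design can dispense with a NameNode is consistent hashing, which Cassandra uses to place data on a token ring. The Python sketch below is a simplified illustration under stated assumptions: one MD5-derived token per node, hypothetical node names, and cluster membership already known to every node (Cassandra spreads it by gossip). It is not Brisk’s or Cassandra’s actual partitioner or replication code.

```python
import bisect
import hashlib
from typing import List


class TokenRing:
    """Simplified consistent-hash ring: any node can compute data placement
    locally, so no central metadata server (NameNode-style head node) is needed."""

    def __init__(self, nodes: List[str]) -> None:
        # Assumption for the sketch: each node's token is the hash of its name.
        # Real clusters assign tokens differently and may give a node many tokens.
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key: str, rf: int = 3) -> List[str]:
        """Return the rf nodes responsible for key: start at the first node whose
        token is >= the key's token, then walk clockwise, wrapping around."""
        rf = min(rf, len(self._ring))
        start = bisect.bisect_left(self._tokens, self._hash(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


# Hypothetical four-node cluster.
ring = TokenRing(["n1", "n2", "n3", "n4"])
print(ring.replicas("row-42"))
```

Because placement is a pure function of the key and the membership list, two nodes that agree on membership compute the same replicas without consulting any coordinator.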
There’s a pretty good white paper about all this, which also recites general Cassandra claims. [Edit: And here at last is the link.]
Categories: Cassandra, DataStax, Hadoop, HBase, MapReduce, Open source
Hadapt (commercialized HadoopDB)
Hadapt, the company commercializing the HadoopDB project, is finally launching, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The goal is the same SQL/MapReduce integration you get with Hive, but with much better performance and perhaps somewhat better SQL functionality. Advantages vs. a DBMS-based analytic platform that includes MapReduce (e.g. Aster Data) are less clear.
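The HadoopDB idea of a DBMS on every node, coordinated by MapReduce, can be sketched in a few lines of Python. Here in-memory SQLite databases stand in for the per-node DBMSs and plain functions stand in for Hadoop map and reduce tasks; the table, columns, and data are hypothetical, and none of this is Hadapt’s code.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


def make_node(rows: List[Tuple[str, float]]) -> sqlite3.Connection:
    """Stand-in for one cluster node running its own DBMS over its shard of data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    # Push the heavy lifting (scan, grouping, partial aggregation) into the local DBMS.
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials: List[List[Tuple[str, float]]]) -> Dict[str, float]:
    # Merge per-node partial sums into global totals, as a reducer would.
    totals: Dict[str, float] = defaultdict(float)
    for node_result in partials:
        for region, subtotal in node_result:
            totals[region] += subtotal
    return dict(totals)


# Two hypothetical nodes, each holding a different shard of the sales table.
nodes = [
    make_node([("east", 10.0), ("west", 5.0)]),
    make_node([("east", 2.5), ("north", 4.0)]),
]
print(reduce_phase([map_phase(n) for n in nodes]))
```

The design point is that each node’s SQL engine does the scanning and partial aggregation, so the MapReduce layer only moves and merges small intermediate results.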
MySQL soundbites
Oracle announced MySQL enhancements, plus intentions to use MySQL to compete against Microsoft SQL Server. My thoughts, lightly edited from an instant message Q&A, include:
- Given how hard Oracle fought the antitrust authorities to keep MySQL around the time of the acquisition, we always knew they were serious about the business.
- We’ll know they’re even more serious if they buy MySQL enhancements such as Infobright, dbShards, or Schooner MySQL.
- Oracle-quality MySQL’s most obvious target is SQL Server.
- But if you’ve bought into the Windows stack, why not stay bought-in?
- MySQL vs. SQL Server competition is mainly about new applications; few users will actually switch.
- A lot of SaaS vendors use Oracle Standard Edition, and have some MySQL somewhere as well. They don’t want to pay up for Oracle Enterprise Edition or Exadata. Good MySQL could suit them.
- Mainly, I see the Short Request Processing market as being a battle between MySQL versions and NoSQL systems. (I’m a VoltDB pessimist.)
The last question was “Is there an easy shorthand to describe how Oracle DB is superior to MySQL even with these improvements?” My responses, again lightly edited, were:
Categories: Analytic technologies, Exadata, MySQL, NoSQL, Oracle, Software as a Service (SaaS)
Three ways Fedex is a metaphor for data integration
It occurs to me that there are three reasons why Federal Express, aka Fedex, is a great metaphor for data integration.
Teradata, Aster Data, and Teradata/Aster
Teradata is acquiring Aster Data. Naturally, the deal is being presented with Treaty of Tordesillas-style positioning: Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this.
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata
Updating our vendor client disclosures
Edit: This disclosure has been superseded by a March 2012 version.
From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:
- This is a list of Monash Advantage members.
- All our vendor clients are Monash Advantage members, unless …
- … we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
- We do not usually disclose our user clients.
- We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
- Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)
With that said, our vendor client disclosures at this time are:
- Aster Data
- Cloudera
- CodeFutures/dbShards
- Couchbase
- EMC/Greenplum
- Endeca
- IBM/Netezza
- Infobright
- Intel
- MarkLogic
- ParAccel
- QlikTech
- salesforce.com/database.com
- SAND Technology
- SAP/Sybase
- Schooner Information Technology
- Skytide
- Splunk
- Teradata
- Vertica
Some quick notes on HP-Vertica
HP is acquiring Vertica.
Categories: In-memory DBMS, Investment research and trading, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP), VoltDB and H-Store
Now we know why Vertica has been so weirdly evasive
Communicating with Vertica has been tricky recently. But now that HP has announced it is buying Vertica, I pretty much have to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post.
Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems
Upcoming webinar on investigative analytics
I recently coined the phrase investigative analytics to conflate:
- Statistics, data mining, machine learning, and/or predictive analytics.
- The more research-oriented aspects of business intelligence tools:
  - Ad-hoc query.
  - Drilldown.
  - Most things done by BI-using “business analysts.”
  - Most things within BI called “data exploration.”
- Analogous technologies as applied to non-tabular data types such as text or graph.
This will be the basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by its joint sponsors, Aster Data and Tableau Software.
Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.