March 24, 2011

MySQL, hash joins and Infobright

Over a 24 hour or so period, Daniel Abadi, Dmitriy Ryaboy and Randolph Pullen all remarked on MySQL’s lack of hash joins. (It relies on nested loops instead, which were state-of-the-art technology around the time of the Boris Yeltsin administration.) This led me to wonder — why is this not a problem for Infobright?

Per Infobright chief scientist Dominik Slezak, the answer is:

Infobright perform joins using its own optimization/execution layers (that actually include hash join algorithms and advanced knowledge-grid-based nested loop optimizations in particular).
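For readers who want the distinction spelled out, here is a minimal textbook sketch of the two join strategies in Python. It is not MySQL's or Infobright's actual implementation; the function names and the sample `customers`/`orders` rows are made up for illustration. The nested loop version compares every row pair, while the hash join builds a lookup table on one input and probes it with the other.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] == r[key]
    ]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for r in right:  # build phase: one pass over the right input
        buckets[r[key]].append(r)
    return [
        {**l, **r}
        for l in left  # probe phase: one lookup per left row
        for r in buckets.get(l[key], [])
    ]


# Hypothetical sample data, for illustration only.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"cust_id": 1, "total": 30},
    {"cust_id": 2, "total": 15},
    {"cust_id": 1, "total": 5},
]

# Both strategies return the same rows; only the amount of work differs.
assert nested_loop_join(orders, customers, "cust_id") == hash_join(
    orders, customers, "cust_id"
)
```

On large inputs the hash join does roughly one pass over each table, which is why its absence gets remarked upon.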

Categories: Infobright, MySQL, Theory and architecture

4 Comments

March 23, 2011

DataStax introduces a Cassandra-based Hadoop distribution called Brisk

Cassandra company DataStax is introducing a Hadoop distribution called Brisk, for use cases that combine short-request and analytic processing. Brisk in essence replaces HDFS (Hadoop Distributed File System) with a Cassandra-based file system called CassandraFS. The whole thing is due to be released (Apache open source) within the next 45 days.

The core claims for Cassandra/Brisk/CassandraFS are:

CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk.
CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
Brisk/CassandraFS is much easier to administer than HDFS. In particular, there are no NameNodes, JobTracker single points of failure, or any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.

There’s a pretty good white paper around all this, which also recites general Cassandra claims — [edit] and here at last is the link.

Categories: Cassandra, DataStax, Hadoop, HBase, MapReduce, Open source

3 Comments

March 23, 2011

Hadapt (commercialized HadoopDB)

The HadoopDB company Hadapt is finally launching, based on the HadoopDB project, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based analytic platform that includes MapReduce — e.g. Aster Data — are less clear. Read more
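As a rough sketch of the "DBMS on every node, MapReduce on top" idea, the toy below uses in-memory SQLite databases as stand-ins for the per-node DBMS and plain Python functions as stand-ins for the map and reduce steps. Everything here (the `sales` table, the function names, the choice of SQLite) is an assumption for illustration, not Hadapt's or HadoopDB's code.

```python
import sqlite3
from collections import Counter


def make_node(rows):
    """Create one in-memory database standing in for a worker node's DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn):
    """Push the SQL down to the node; each node returns its partial sums."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials):
    """Merge the per-node partial sums into one global result."""
    totals = Counter()
    for rows in partials:
        for region, subtotal in rows:
            totals[region] += subtotal
    return dict(totals)


# Two hypothetical nodes, each holding a slice of the data.
nodes = [
    make_node([("east", 10), ("west", 5)]),
    make_node([("east", 7), ("north", 3)]),
]
totals = reduce_phase(map_phase(n) for n in nodes)
```

The point of the design is that each node's DBMS does the local filtering and aggregation, so the MapReduce layer only has to move and combine small partial results.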

Categories: Analytic technologies, Data warehousing, Hadapt, Hadoop, MapReduce, MySQL, Open source, Parallelization, PostgreSQL, SQL/Hadoop integration, Theory and architecture, VectorWise

12 Comments

March 15, 2011

MySQL soundbites

Oracle announced MySQL enhancements, plus intentions to use MySQL to compete against Microsoft SQL Server. My thoughts, lightly edited from an instant message Q&A, include:

Given how hard Oracle fought the antitrust authorities at the time of the acquisition in order to keep MySQL, we always knew they were serious about the business.
We’ll know they’re even more serious if they buy MySQL enhancements such as Infobright, dbShards, or Schooner MySQL.
Oracle-quality MySQL’s most obvious target is SQL Server.
But if you’ve bought into the Windows stack, why not stay bought-in?
MySQL vs. SQL Server competition is mainly about new applications; few users will actually switch.
A lot of SaaS vendors use Oracle Standard Edition, and have some MySQL somewhere as well. They don’t want to pay up for Oracle Enterprise Edition or Exadata. Good MySQL could suit them.
Mainly, I see the Short Request Processing market as being a battle between MySQL versions and NoSQL systems. (I’m a VoltDB pessimist.)

The last question was “Is there an easy shorthand to describe how Oracle DB is superior to MySQL even with these improvements?” My responses, again lightly edited, were: Read more

Categories: Analytic technologies, Exadata, MySQL, NoSQL, Oracle, Software as a Service (SaaS)

2 Comments

March 6, 2011

Three ways Fedex is a metaphor for data integration

It occurs to me that there are three reasons why Federal Express, aka Fedex, is a great metaphor for data integration. Read more

Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, SnapLogic

2 Comments

March 4, 2011

Teradata, Aster Data, and Teradata/Aster

Teradata is acquiring Aster Data. Naturally, the deal is being presented with a Treaty of Tordesillas kind of positioning — Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this. Read more

Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata

9 Comments

February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

This is a list of Monash Advantage members.
All our vendor clients are Monash Advantage members, unless …
… we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
We do not usually disclose our user clients.
We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)

With that said, our vendor client disclosures at this time are:

Aster Data
Cloudera
CodeFutures/dbShards
Couchbase
EMC/Greenplum
Endeca
IBM/Netezza
Infobright
Intel
MarkLogic
ParAccel
QlikTech
salesforce.com/database.com
SAND Technology
SAP/Sybase
Schooner Information Technology
Skytide
Splunk
Teradata
Vertica

Categories: About this blog, Aster Data, Cloudera, Couchbase, dbShards and CodeFutures, EMC, Greenplum, IBM and DB2, Infobright, Intel, MarkLogic, Netezza, ParAccel, QlikTech and QlikView, SAND Technology, SAP AG, Schooner Information Technology, Splunk, Sybase, Tableau Software, Teradata, Vertica Systems

1 Comment

February 14, 2011

Some quick notes on HP-Vertica

HP is acquiring Vertica. Read more

Categories: In-memory DBMS, Investment research and trading, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP), VoltDB and H-Store

13 Comments

February 14, 2011

Now we know why Vertica has been so weirdly evasive

Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post. Read more

Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems

10 Comments

February 12, 2011

Upcoming webinar on investigative analytics

I recently coined the phrase investigative analytics to conflate:

Statistics, data mining, machine learning, and/or predictive analytics.
The more research-oriented aspects of business intelligence tools:
    Ad-hoc query.
    Drilldown.
    Most things done by BI-using “business analysts.”
    Most things within BI called “data exploration.”
Analogous technologies as applied to non-tabular data types such as text or graph.

This will be the basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by the webcast’s joint sponsors Aster Data and Tableau Software.

Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.

Categories: Analytic technologies, Aster Data, Business intelligence, Data warehousing, Presentations, Tableau Software

3 Comments
