Schooner — flash-based, now software-only, and very fast
Last October I wrote about Schooner Information Technology, which made flash-based appliances for MySQL, memcached, or persistent memcached. Schooner sold those appliances to nearly 20 customers, but even so decided software-only was a better way to go.
Schooner’s core value proposition is that one Schooner box with flash does the job of a lot of MySQL or NoSQL boxes with hard drives. Highlights of the Schooner story — of which you can find more detail at the Schooner website — now include: Read more
Categories: Clustering, memcached, MySQL, OLTP, Schooner Information Technology, Solid-state memory | 4 Comments |
ScaleBase, another MPP OLTP quasi-DBMS
Liran Zelkha of ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That is:
- Like dbShards, ScaleBase is a proxy that looks to the application like a scale-out DBMS, but routes work to multiple servers running MySQL against different shards of the database. Other DBMS beyond MySQL are planned, but PostgreSQL — which dbShards supports — did not get mentioned.
- Sharding is done at configuration time, and is transparent to the application. You want to shard the big tables and replicate the small ones, because if you join two sharded tables, performance can be slow. ScaleBase may have more of a configuration-advisor wizard than dbShards does.
- Each shard is replicated to a mirror, in a high-availability way.
- You can use ScaleBase across multiple data centers, but there’s little or no magic to overcome the performance issues that would arise in many use cases.
- Much like dbShards, ScaleBase supports three kinds of sharding — hash, list, and range.
- ScaleBase currently has no support whatsoever for stored procedures, which is slightly less than dbShards has.
- Liran stresses that ScaleBase looks even to management tools — e.g. TOAD — like a single DBMS.
- ScaleBase runs on EC2 and private cloud.
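To make the three sharding styles mentioned above concrete, here is a minimal Python sketch of how a proxy might choose a shard by hash, list, or range. It is an illustration only, not ScaleBase's or dbShards' actual implementation; the shard names, the country-code list map, the range boundaries, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import bisect
import zlib

# Hypothetical shard names; a real deployment would map these to MySQL servers.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def hash_shard(key):
    """Hash sharding: a stable hash of the key, modulo the shard count."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return SHARDS[digest % len(SHARDS)]


# List sharding: explicit value-to-shard assignments (assumed for illustration).
LIST_MAP = {"US": "shard0", "CA": "shard0", "DE": "shard1", "JP": "shard2"}


def list_shard(value, default="shard3"):
    """List sharding: look the value up, falling back to a catch-all shard."""
    return LIST_MAP.get(value, default)


# Range sharding: sorted exclusive upper bounds for every shard but the last.
RANGE_BOUNDS = [1000, 2000, 3000]


def range_shard(key):
    """Range sharding: keys below 1000 go to shard0, 1000-1999 to shard1, etc."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]
```

A join between two tables sharded on different keys would need rows from several shards at once, which is one reason the advice above is to shard the big tables and replicate the small ones.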
Our talk didn’t get deeply technical, and I don’t know exactly how ScaleBase’s replication works. But a website reference to a small transaction log in a distributed cache does sound, while not identical to the dbShards approach, at least directionally similar.
ScaleBase is a year or so old, with about 6 people, based in the Boston area despite strong Israeli roots. ScaleBase has raised a round of venture capital; I didn’t ask for details.
Liran says that ScaleBase is in closed beta, with some production users, at least one of whom has over 100 database servers.
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, ScaleBase, Transparent sharding | 10 Comments |
dbShards update
I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:
- dbShards gives you, in effect, an MPP DBMS based on MySQL or PostgreSQL, meant for OLTP (OnLine Transaction Processing). dbShards always did distributed queries, and now does distributed transactions as well.
- dbShards works by sharding the database and automagically sending work to the correct shard.
- For safety, dbShards of course replicates each shard. Contrary to what I said in the previous post, the replication method is not log-shipping.
- At this time, dbShards only works in a single data center.
- dbShards can handle any SQL that would work through, say, a JDBC driver, and is not particularly sensitive to data type. However, dbShards’ stored procedure support is iffy — if a procedure touches data in more than one shard, it simply fails.
One dbShards customer writes half a billion rows on a busy day, and serves 3,000-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster, with uninspiring hardware, in the cloud. The database has 16 shards, aggregating 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.
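The figures above can be put in rough perspective with a short Python sketch. It shows one plausible way 128 virtual shards could map onto 16 physical shards, plus the average write rate implied by half a billion rows in a day. The contiguous-block mapping, the CRC32 hash, and the assumption that writes are spread evenly over 24 hours and across shards are all mine, not details from CodeFutures.

```python
import zlib

PHYSICAL_SHARDS = 16   # from the customer example
VIRTUAL_SHARDS = 128   # from the customer example


def virtual_shard(key):
    """Hash a key to one of the 128 virtual shards (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % VIRTUAL_SHARDS


def physical_shard(vshard):
    """Assumed mapping: contiguous blocks of 8 virtual shards per physical shard."""
    return vshard // (VIRTUAL_SHARDS // PHYSICAL_SHARDS)


def route(key):
    """Route a key to a physical shard by way of its virtual shard."""
    return physical_shard(virtual_shard(key))


# Back-of-the-envelope write rate, assuming an even spread over 24 hours.
rows_per_day = 500_000_000
avg_rows_per_second = rows_per_day / 86_400                       # about 5,787
avg_rows_per_second_per_shard = avg_rows_per_second / PHYSICAL_SHARDS  # about 362
```

A common motivation for virtual shards in general is that a virtual shard can later be reassigned to a different server without rehashing every key. Peak write rates on a busy day would of course be higher than these averages.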
dbShards’ replication scheme works like this: Read more
Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, Transparent sharding | 9 Comments |
Choices in analytic computing system design
When I posted a long list of architectural options for analytic DBMS, I left a couple of IOUs in for missing parts. One was in the area of what is sometimes called advanced-analytics functionality, which roughly speaking means aspects of analytic database management systems that are not directly related to conventional* SQL queries.
*Main examples of “conventional” = filtering, simple aggregations.
The point of such functionality is generally twofold. First, it helps you execute analytic algorithms with high performance, due to reducing data movement and/or executing the analytics in parallel. Second, it helps you create and execute sophisticated analytic processes with (relatively) little effort.
For now, I’m going to refer to an analytic RDBMS that has been extended by advanced-analytics functionality as an analytic computing system, rather than as some kind of “platform,” although I suspect the latter term is more likely to wind up winning. So far, there have been five major categories of subsystem or add-on module that contribute to making an analytic DBMS a more fully-fledged analytic computing system:
- SQL extensions. Examples include SQL:2003 analytics (notably windowing), or vendor-specific temporal functionality.
- A framework for UDFs (User-Defined Functions) to further extend SQL. At its core, a relational DBMS is a big SQL interpreter. SQL, while powerful, only does a limited number of things. User-Defined Functions are new predicates in the SQL language that do additional things.
- An execution engine for analytic processes that is less coupled to the SQL engine than a pure UDF framework might be. The two main approaches are MapReduce (e.g. Aster Data) and general C++ libraries (Netezza, ParAccel).
- Libraries of pre-built analytic processes. Commonly included are statistics, (other) machine learning, general linear algebra, and Monte Carlo analysis. Some of these functions are fully parallelized (perhaps tens per vendor). Others just play nicely with the vendor’s execution framework, in that a separate copy can be run on each node (up to thousands per vendor, for those who bring in open source statistics libraries).
- Development tools such as integrated development environments (IDEs). Aster keeps trying to convince me that having built a nice Eclipse IDE is a major competitive differentiation.
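As a small illustration of the UDF idea in the list above, the following Python sketch registers a user-defined function with SQLite and then calls it from SQL as if it were built in. SQLite stands in here for an analytic DBMS purely for convenience; the function name `is_big_spender`, the `orders` table, and the threshold of 100 are hypothetical, and the UDF frameworks of analytic DBMS vendors will differ in their details.

```python
import sqlite3


def is_big_spender(amount):
    """Hypothetical UDF: 1 if the amount exceeds 100, else 0."""
    return 1 if amount is not None and amount > 100 else 0


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("is_big_spender", 1, is_big_spender)

conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 250.0), ("bob", 40.0), ("cy", 101.5)],
)

# The UDF is used here as a predicate in the WHERE clause.
rows = conn.execute(
    "SELECT customer FROM orders WHERE is_big_spender(amount) ORDER BY customer"
).fetchall()
```

In this sketch the function runs inside the query, next to the data, rather than after the rows have been shipped to a client, which is a small-scale version of the reduced data movement discussed above.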
Categories: Aster Data, MapReduce, Netezza, ParAccel, Parallelization, Predictive modeling and advanced analytics, Workload management | 8 Comments |
Notes, links, and comments January 20, 2011
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (née “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more. Read more
Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems | 4 Comments |
Sound bites on HP/Microsoft and Neoview
HP and Microsoft put out a press release. Three new appliances are being announced, and we’re being reminded of at least one past announcement. I wasn’t briefed, and wouldn’t want to comment on, say, price/performance or feature particulars. That said:
- HP Neoview seems pretty dead.
- I haven’t heard a single favorable reference to HP Neoview since I remarked in March, 2010 that “HP Neoview is reeling.”
- A reporter asked me “What went wrong?” Well, almost any new analytic DBMS/appliance product will compete mainly on two things in its early days — price/performance (or absolute performance), and just how (im)mature it initially is. (Aster Data may be the only prominent exception to that rule.) Presumably, HP Neoview did badly by those metrics.
- HP Neoview was widely conjectured to be a pet project of ousted former HP CEO Mark Hurd.
- Nobody tells me of competing with Microsoft SQL Server 2008 Parallel Data Warehouse (i.e. Madison/DATAllegro) either. Thus, in particular, I haven’t heard any reason to believe there’s anything good about the technology, especially now that the ever-upbeat Stuart Frost has left Microsoft. I’m conjecturing that Parallel Data Warehouse is focused heavily on the existing Microsoft installed base.
- Speaking of Aster — even under NDA, they won’t tell me or give me any useful hints as to who their undisclosed strategic investor is. Well, HP has a long history of investing in sometimes-competing DBMS vendors (back to Oracle and Informix), and a good reason to keep quiet (reluctance to admit the end of Neoview). Hmm …
- The consolidation appliance in the HP/Microsoft announcement is a clear response to Oracle’s Exadata strategy, or (which is probably more accurate) to the same market opportunity Oracle identified.
- I couldn’t quite figure out whether the cheap data warehouse appliance included Microsoft PowerPivot support, but that would make sense if it did.
Categories: Aster Data, Data warehouse appliances, Data warehousing, HP and Neoview, Microsoft and SQL*Server | 3 Comments |
Architectural options for analytic database management systems
Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general. Read more
Mike Stonebraker on “real column stores”
Mike Stonebraker has a post up on Vertica’s blog trying to differentiate “real” from “pretend” column stores. (Edit: That post seems to have come back down, but as of 1/19 it can be found in Google Cache.) In essence, Mike argues that the One Right Way to design a column store is Vertica’s, a position that Daniel Abadi used to share but has since retreated from.
There are some good things about that post, and some not-so-good. The worst paragraph is probably
Several row-store vendors (including Oracle, Greenplum and Aster Data) now claim to be selling a column store. Obviously, this would require a complete rewrite of a DBMS to move from Figure 1 to Figure 2. Hence, none of the “pretenders” have actually done this. Instead all have implemented some aspects of column stores, and then claim to be the real thing. This blog defines what the “real enchilada” looks like, and how to tell it from the pretenders.
which I question on two levels. Read more
Categories: Aster Data, Columnar database management, Database compression, Michael Stonebraker, Sybase, Theory and architecture, Vertica Systems | 24 Comments |
The six useful things you can do with analytic technology
I seem to be in the mode of sharing some of my frameworks for thinking about analytic technology. Here’s another one.
Ultimately, there are six useful things you can do with analytic technology:
- You can make an immediate decision.
- You can plan in support of future decisions.
- You can research, investigate, and analyze in support of future decisions.
- You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- You can communicate, to help other people and organizations do these same things.
- You can provide support, in technology or data gathering, for one of the other functions.
Technology vendors often cite similar taxonomies, claiming to have all the categories (as they conceive them) nicely represented, in slickly integrated fashion. They exaggerate. Most of these categories are in rapid flux, and the rest should be. Analytic technology still has a long way to go.
In more detail: Read more
Categories: Analytic technologies, Business intelligence, Cognos, Data warehousing, RDF and graphs, Text | 13 Comments |
Document-oriented DBMS without joins
When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for XQuery, but not for anything like SQL.)
Categories: CouchDB, MarkLogic, NoSQL, Structured documents, Text, Theory and architecture | 8 Comments |