Aster Data
Analysis of data warehouse DBMS vendor Aster Data.
Hope for a new PostgreSQL era?
In a comedy of briefing errors, I’m not too clear on the details of my client salesforce.com’s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:
- PostgreSQL is good technology.
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways. (Database extensibility if nothing else.)
- PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)
- Neither EnterpriseDB (which now calls itself “The enterprise PostgreSQL company”) nor the PostgreSQL community leadership have covered themselves with stewardship glory.
- A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).
- PostgreSQL advancement is not dead. For example, Hadapt beta users are running actual PostgreSQL on many nodes each.
- There’s no assurance that Oracle will be a benevolent MySQL steward forever. (Specifically, Oracle’s “Play nicely with others” antitrust commitments expire in 2014.)
So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.
*While Vertica was originally released using little or no PostgreSQL code — reports varied — it featured high degrees of PostgreSQL compatibility.
| Categories: Aster Data, EnterpriseDB and Postgres Plus, Greenplum, MySQL, Netezza, Open source, Vertica Systems, salesforce.com | 7 Comments |
Highlights of a busy news week
I put up 14 posts over the past week, so perhaps you haven’t had a chance yet to read them all.
Highlights included:
- My most important post of the week was a general guide to IT vendor strategy. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar.
- The best comment thread of the week was probably on my post about scale-out relational OLTP choices, in which people discussed the merits of various particular alternatives.
- I recommended that people strongly consider attending XLDB 5 in Menlo Park on October 18-19.
Most of the posts, however, were reactions to news events. In particular:
- Teradata announced that Teradata 14 will be hybrid-columnar, more in Vertica’s way than in Greenplum’s or Aster Data’s. (Pay no attention to the Wall Street Journal’s apparent belief that no other analytic DBMS is hybrid-columnar at all.)
- Aster announced the unsurprising news that there will be a Teradata Aster appliance. Also, Aster talked about greater analytic flexibility in the forthcoming Aster 5.0.
- With Oracle OpenWorld coming up, Oracle decided to get some of its announcing out of the way early. In particular, it announced the Oracle Database Appliance, which is small-business-friendly hardware for running the Oracle DBMS. However, the Oracle Database Appliance doesn’t seem to do much about the complexity of running the Oracle DBMS software.
- In a catch-all Hadoop post, I noted that:
- Oracle has now clearly said it has a Hadoop appliance coming, no doubt next week at OpenWorld.
- I still can’t see why Hadoop appliances would succeed, but a lot of smart folks seem to disagree with me.
- Greenplum announced what looks like a nice but unimportant little product upgrade.
- It’s a really good thing that previously reported plans to revamp Hadoop are underway.
- DataStax announced that it really is a Cassandra company after all. Pay no attention to previous marketing that seemed to put DataStax in the same Hadoop-alternative category as, say, MapR.
- Ingres has changed its name to Actian. The announcement seems like a confession that Ingres and VectorWise are going nowhere.
| Categories: Aster Data, Data warehousing, DataStax, Greenplum, Hadoop, Ingres, Teradata, VectorWise | Leave a Comment |
Workload management and RAM
Closing out my recent round of Teradata-related posts, here’s a little anomaly:
- Teradata is proud that Teradata 14’s workload management now explicitly manages I/O, to go with Teradata’s long-standing management of CPU. Teradata’s WLM still does not explicitly manage RAM.
- Aster is proud that Aster 5’s workload management now explicitly manages RAM, to go along with the WLM capabilities Aster has had for a while managing CPU and I/O. Aster’s Tasso Argyros believes this is an important capability, at least in some edge cases.
- Mike Pilcher of SAND emailed me that SAND’s WLM capabilities to explicitly manage CPU, I/O, and RAM are very well-received by the marketplace.
| Categories: Aster Data, Data warehousing, SAND Technology, Teradata, Workload management | 4 Comments |
Hybrid-columnar soundbites
Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday’s post on Teradata columnar:
- Oracle does not actually offer columnar I/O; the other three systems do. But see the “I won’t be surprised” part in yesterday’s Teradata post.
- Aster does not offer columnar compression; the other three do.
- EMC Greenplum and Teradata offer different kinds of ways to mix column and row storage in the same table; each has its advantages.
- Teradata generally has a more mature and capable offering than EMC Greenplum, for most purposes, whichever way you choose to organize your tables.
Edit: The Wall Street Journal got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:
“While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.”
Googling on “Teradata To Unveil New Analytics Product To Speed Business Adoption” might get you around the paywall to see the offending piece.
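For readers who want a concrete picture of what "mixing column and row storage in the same table" can mean, here is a toy sketch. It is not how Teradata, EMC Greenplum, or any other vendor implements the idea; the `HybridTable` class, its method names, and the sample columns are all hypothetical, chosen only to show one table keeping some columns as separate arrays and the rest together as row tuples.

```python
class HybridTable:
    """Toy table: chosen columns live in their own arrays (columnar);
    the remaining columns are stored together as row tuples.
    Purely illustrative, not any vendor's storage format."""

    def __init__(self, columnar, row_group):
        self.columnar = {name: [] for name in columnar}  # one list per column
        self.row_names = tuple(row_group)                 # columns kept row-wise
        self.rows = []                                    # list of tuples

    def insert(self, record):
        # Split each incoming record across the two storage areas.
        for name, values in self.columnar.items():
            values.append(record[name])
        self.rows.append(tuple(record[name] for name in self.row_names))

    def scan_column(self, name):
        # A columnar column is read from a single array; a row-stored
        # column requires walking every row tuple.
        if name in self.columnar:
            return list(self.columnar[name])
        idx = self.row_names.index(name)
        return [row[idx] for row in self.rows]

    def fetch_row(self, i):
        # Reassemble a full record from both storage areas.
        record = {name: values[i] for name, values in self.columnar.items()}
        record.update(zip(self.row_names, self.rows[i]))
        return record


table = HybridTable(columnar=["amount"], row_group=["customer", "city"])
table.insert({"customer": "a", "city": "x", "amount": 10})
table.insert({"customer": "b", "city": "y", "amount": 32})

print(sum(table.scan_column("amount")))  # aggregate over the columnar part
print(table.fetch_row(1))                # whole-record lookup
```

The trade-off the sketch hints at: aggregates over `amount` read one array, while whole-record lookups have to stitch the two areas back together.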
| Categories: Aster Data, Columnar database management, Data warehousing, Database compression, Greenplum, Teradata | 2 Comments |
Aster Database Release 5 and Teradata Aster appliance
It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn’t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.
Along with the announcements comes updated positioning such as:
- Better SQL than the MapReduce alternatives have.
- Better MapReduce than the SQL alternatives have.
- Easy(ier) way to do complex analytics on multi-structured data. (Aster has embraced that term.)
and of course
- Now also with Teradata’s beautifully engineered hardware and system management software!
| Categories: Aster Data, Data warehouse appliances, Data warehousing, Predictive modeling and advanced analytics, Teradata, Workload management | Leave a Comment |
Aster Data business trends
Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren’t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on investigative analytics — they’ve long endorsed my use of the term — and on the batch run/scoring kinds of applications that inform operational systems.
| Categories: Analytic technologies, Application areas, Aster Data, Data warehousing, DataStax, Liberty and privacy, RDF and graphs, Teradata, Web analytics | 1 Comment |
Data management at Zynga and LinkedIn
Mike Driscoll and his Metamarkets colleagues organized a bit of a bash Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn’s People You May Know application.
It’s blindingly obvious that Zynga is one of Vertica’s petabyte-scale customers, given that Zynga sends 5 TB/day of data into Vertica, and keeps that data for about a year. (Zynga may retain even more data going forward; in particular, Zynga regrets ever having thrown out the first month of data for any game it’s tried to launch.) This is game actions, for the most part, rather than log files; true logs generally go into Splunk.
I don’t know whether the missing data is completely thrown away, or just stashed on inaccessible tapes somewhere.
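The petabyte-scale inference above is simple arithmetic, sketched here as a back-of-the-envelope check. The 5 TB/day and roughly-one-year figures come from the conversation described above; treating "about a year" as exactly 365 days, using decimal units, and ignoring compression are my assumptions, and the function name is hypothetical.

```python
TB_PER_DAY = 5         # daily load into Vertica, as quoted above
RETENTION_DAYS = 365   # assumption: "about a year" taken as 365 days
TB_PER_PB = 1000       # assumption: decimal units (1 PB = 1000 TB)


def retained_petabytes(tb_per_day, retention_days, tb_per_pb=TB_PER_PB):
    """Raw (uncompressed) data volume held at steady state, in petabytes."""
    return tb_per_day * retention_days / tb_per_pb


print(retained_petabytes(TB_PER_DAY, RETENTION_DAYS))  # 1.825
```

That is raw input volume; what actually sits on disk after compression could be considerably smaller.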
I found two aspects of the Zynga story particularly interesting. First, those 5 TB/day are going straight into Vertica (from, I presume, memcached/Membase/Couchbase), as Zynga decided that sending the data to some kind of log first was more trouble than it’s worth. Second, there’s Zynga’s approach to analytic database design. Highlights of that include: Read more
| Categories: Aster Data, Couchbase, Data models and architecture, Games and virtual worlds, Greenplum, Hadoop, Petabyte-scale data management, Specific users, Vertica Systems, Zynga | 24 Comments |
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning.
Notes and links, June 15, 2011
Five things: Read more
Alternatives for Hadoop/MapReduce data storage and management
There’s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future.
Known HDFS and Hadoop data storage and management issues include but are not limited to:
- Hadoop is run by a master node, and specifically a namenode, that’s a single point of failure.
- HDFS compression could be better.
- HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.
- Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.
Different entities have different ideas about how such deficiencies should be addressed.
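To put a rough number on the three-copies-versus-two point in the list above, here is a minimal sketch of the raw capacity needed under each replication factor. The 100 TB of logical data is a made-up figure and the function name is hypothetical; the sketch also ignores compression and any other storage overhead.

```python
def raw_capacity_tb(logical_tb, replication_factor):
    """Raw disk needed to hold logical_tb of data at a given copy count."""
    return logical_tb * replication_factor


logical_tb = 100  # hypothetical amount of user data

three_copies = raw_capacity_tb(logical_tb, 3)  # HDFS's preferred copy count
two_copies = raw_capacity_tb(logical_tb, 2)    # what many DBMS settle for

extra = (three_copies - two_copies) / two_copies
print(three_copies, two_copies, f"{extra:.0%}")  # 300 200 50%
```

Under these assumptions, the third copy means buying half again as much disk as a two-copy system would need for the same data.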
| Categories: Aster Data, Cassandra, Cloudera, Data warehouse appliances, DataStax, EMC, Greenplum, Hadapt, Hadoop, IBM and DB2, MapReduce, MongoDB and 10gen, Netezza, Parallelization | 21 Comments |
