Workload management
Discussion of workload management technology, typically in analytic or mixed-workload DBMS.
Clarifying SAND’s customer metrics, positioning and technical story
Talking with my clients at SAND can be confusing. That said:
- I need to revise my figures for SAND’s customer count way downward.
- SAND finally has a reasonably clear positioning.
- SAND’s product actually seems to have a lot of features.
A few months ago, I wrote:
SAND Technology reported >600 total customers, including >100 direct.
Upon talking with the company, I need to revise that figure downward, from >600 to 15.
Exasol update
I last wrote about Exasol in 2008. After talking with the team Friday, I’m fixing that now.
The general theme was as you’d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.
Top-level points included:
- Exasol’s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.
- Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.
- Exasol has 25 EXASolution customers, all in Germany.*
- 5 of those are “cloud” customers, at hosting providers engaged by Exasol.
- EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.
- Pretty much the whole company is in Nuremberg.
Hadapt is moving forward
I’ve talked with my clients at Hadapt a couple of times recently. News highlights include:
- The Hadapt 1.0 product is going “Early Access” today.
- General availability of Hadapt 1.0 is targeted for an officially unspecified time frame, but it’s soon.
- Hadapt raised a nice round of venture capital.
- Hadapt added Sharmila Mulligan to the board.
- Dave Kellogg is in the picture too, albeit not as involved as Sharmila.
- Hadapt has moved the company to Cambridge, which is preferable to Yale environs for obvious reasons. (First location = space they’re borrowing from their investors at Bessemer.)
- Headcount is in the low teens, with a target of doubling fast.
The Hadapt product story hasn’t changed significantly from what it was before. Specific points I can add include:
| Categories: Hadapt, Hadoop, MapReduce, PostgreSQL, Theory and architecture, Workload management | 4 Comments |
MarkLogic’s Hadoop connector
It’s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic’s new Hadoop connector.
Most of what’s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:
- Hadoop can talk XQuery to MarkLogic. But alternatively, Hadoop can use a long-established simple(r) Java API for streaming documents into or out of a MarkLogic database.
- Hadoop can make requests to MarkLogic in MarkLogic’s normal mode of operation, namely to address any node in the MarkLogic cluster, which then serves as a “head” node for the duration of that particular request. But alternatively, Hadoop can use a long-standing MarkLogic option to circumvent the whole DBMS cluster and only talk to one specific MarkLogic node.
Otherwise, the whole thing is just what you would think:
- Hadoop can read from and write to MarkLogic, in parallel at both ends.
- If Hadoop is just writing to MarkLogic, there’s a good chance the process is properly called “ETL.”
- If Hadoop is reading a lot from MarkLogic, there’s a good chance the process is properly called “batch analytics.”
MarkLogic said that it wrote this Hadoop connector itself.
| Categories: Clustering, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, MarkLogic, Parallelization, Workload management | 2 Comments |
Workload management and RAM
Closing out my recent round of Teradata-related posts, here’s a little anomaly:
- Teradata is proud that Teradata 14’s workload management now explicitly manages I/O, to go with Teradata’s long-standing management of CPU. Teradata’s WLM still does not explicitly manage RAM.
- Aster is proud that Aster 5’s workload management now explicitly manages RAM, to go along with the WLM capabilities Aster has had for a while managing CPU and I/O. Aster’s Tasso Argyros believes this is an important capability, at least in some edge cases.
- Mike Pilcher of SAND emailed me that SAND’s WLM capabilities to explicitly manage CPU, I/O, and RAM are very well-received by the marketplace.
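To make "explicitly manages RAM" concrete, here is a minimal admission-control sketch. It is not any vendor's implementation; the class names, units and budget numbers are all hypothetical, and real WLM does far more (priorities, queuing, throttling). The only point is that a workload manager which tracks CPU and I/O but not RAM can admit a combination of queries that oversubscribes memory.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Resources a workload class may hold at once (hypothetical units)."""
    cpu_cores: float
    io_mbps: float
    ram_gb: float


@dataclass
class Query:
    """Estimated resource demand of one query (hypothetical units)."""
    name: str
    cpu_cores: float
    io_mbps: float
    ram_gb: float


class WorkloadClass:
    """Toy workload class: admits a query only if managed resources fit."""

    def __init__(self, name, budget, manage_ram=True):
        self.name = name
        self.budget = budget
        self.manage_ram = manage_ram  # False mimics WLM that ignores RAM
        self.running = []

    def _in_use(self, attr):
        # Sum of one resource across all currently admitted queries.
        return sum(getattr(q, attr) for q in self.running)

    def admit(self, query):
        """Return True and start tracking the query if it fits, else False."""
        managed = ["cpu_cores", "io_mbps"]
        if self.manage_ram:
            managed.append("ram_gb")
        for attr in managed:
            if self._in_use(attr) + getattr(query, attr) > getattr(self.budget, attr):
                return False
        self.running.append(query)
        return True


budget = Budget(cpu_cores=8, io_mbps=400, ram_gb=64)
reports = [Query("report_a", 2, 100, 40), Query("report_b", 2, 100, 40)]

with_ram = WorkloadClass("with_ram", budget, manage_ram=True)
without_ram = WorkloadClass("without_ram", budget, manage_ram=False)

print([with_ram.admit(q) for q in reports])     # second query is held back on RAM
print([without_ram.admit(q) for q in reports])  # both admitted; RAM oversubscribed
```

In the second case both queries fit the CPU and I/O budgets, so they run together and ask for 80 GB against a 64 GB allowance. That is the kind of edge case where explicit RAM management would matter.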
| Categories: Aster Data, Data warehousing, SAND Technology, Teradata, Workload management | 4 Comments |
Aster Database Release 5 and Teradata Aster appliance
It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn’t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.
Along with the announcements comes updated positioning such as:
- Better SQL than the MapReduce alternatives have.
- Better MapReduce than the SQL alternatives have.
- Easy(ier) way to do complex analytics on multi-structured data. (Aster has embraced that term.)
and of course
- Now also with Teradata’s beautifully engineered hardware and system management software!
| Categories: Aster Data, Data warehouse appliances, Data warehousing, Predictive modeling and advanced analytics, Teradata, Workload management | Leave a Comment |
Virtual data marts in Sybase IQ
I made a few remarks about Sybase IQ 15.3 when it became generally available in July. Now that I’ve had a current briefing, I’ll make a few more.
The key enhancement in Sybase IQ 15.3 is distributed query — what others might call parallel query — aka PlexQ. A Sybase IQ query can now be distributed among many nodes, all talking to the same SAN (Storage-Area Network). Any Sybase IQ node can take the responsibility of being the “leader” for that particular query.
In itself, this isn’t that impressive; all the same things could have been said about pre-Exadata Oracle.* But PlexQ goes somewhat further than just removing a bottleneck from Sybase IQ. Notably, Sybase has rolled out a virtual data mart capability. Highlights of the Sybase IQ virtual data mart story include:
| Categories: Columnar database management, Data warehousing, Oracle, Parallelization, Sybase, Theory and architecture, Workload management | 1 Comment |
Hadoop evolution
I wanted to learn more about Hadoop and its futures, so I talked Friday with Arun Murthy of Hortonworks.* Most of what we talked about was:
- NameNode evolution, and the related issue of file-count limitations.
- JobTracker evolution.
Arun previously addressed these issues and more in a June slide deck.
| Categories: Hadoop, MapReduce, Parallelization, Workload management, Yahoo | 6 Comments |
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need, and in most cases you’ll need several, is a great early step in your analytic technology planning.
Vertica as an analytic platform
Vertica 5.0 is coming out today, and delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s* (User-Defined Transform Functions). Vertica UDT syntax basics start:
