Analysis of Sybase and its various product lines, such as Sybase IQ. Related subjects include:
The cardinal rules of DBMS development
Rule 1: Developing a good DBMS requires 5-7 years and tens of millions of dollars.
That’s if things go extremely well.
Rule 2: You aren’t an exception to Rule 1.
- Concurrent workloads benchmarked in the lab are poor predictors of concurrent performance in real life.
- Mixed workload management is harder than you’re assuming it is.
- Those minor edge cases in which your Version 1 product works poorly aren’t minor after all.
DBMS with Hadoop underpinnings …
… aren’t exceptions to the cardinal rules of DBMS development. That applies to Impala (Cloudera), Stinger (Hortonworks), and Hadapt, among others. Fortunately, the relevant vendors seem to be well aware of this fact. Read more
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — evaluations
To my taste, the most glaring mis-rankings in the 2012/2013 Gartner Magic Quadrant for Data Warehouse Database Management are that it is too positive on Kognitio and too negative on Infobright. Secondarily, it is too negative on HP Vertica, and too positive on ParAccel and Actian/VectorWise. So let’s consider those vendors first.
Gartner seems confused about Kognitio’s products and history alike.
- Gartner calls Kognitio an “in-memory” DBMS, which is not accurate.
- Gartner doesn’t remark on Kognitio’s worst-in-class* compression.
- Gartner gives Kognitio oddly high marks for a late, me-too Hadoop integration strategy.
- Gartner writes as if Kognitio’s next attempt at the US market will be the first one, which is not the case.
- Gartner says that Kognitio pioneered data warehouse SaaS (Software as a Service), which actually has existed since the pre-relational 1970s.
Gartner is correct, however, to note that Kognitio doesn’t sell much stuff overall.
In the cases of HP Vertica, Infobright, ParAccel, and Actian/VectorWise, the 2012 Gartner Magic Quadrant for Data Warehouse Database Management’s facts are fairly accurate, but I dispute Gartner’s evaluation. When it comes to Vertica: Read more
Analyzing companies of any size is hard. Analyzing large ones, however, is harder yet.
- I get (much) less substance in an hour on the phone with a megacorp than I do when I talk with a smaller company.
- What large companies say is less reliable than what I hear from smaller ones.
- Large companies have policies, procedures, bureaucracy and attitudes that get in the way of communicating in the first place.
Such limitations should be borne in mind in connection with anything I write about, for example, Oracle, Microsoft, IBM, or SAP.
There are many reasons for large companies to communicate less usefully with analysts than smaller ones do. Some of the biggest are:
- For reasons of internal information flow, the people I talk with just know less than their counterparts at smaller companies. Similarly, what they do “know” is more often wrong, since different parts of the same company may not hold identical views.
- That’s when we talk about real issues at all, which can get crowded out by large companies’ voluminous efforts in complex positioning, messaging, and product names.
- Huge companies have huge bureaucracies, and they hurt.
- A small company C-level executive can make smart decisions about what to say or not say. A large company minion doesn’t have the same freedom.
- Just the process of getting access to even a mid-level spokesminion at a large company is harder than reaching a senior person at a smaller outfit.
- Large firms are clearest when communicating with their existing customers and those organizations’ key influencers. They’re less effective or clear when opening themselves up to competitive comparisons.
- If a company wants to behave unethically in its analyst dealings, there are economies of scale to doing so.
|Categories: About this blog, IBM and DB2, Microsoft and SQL*Server, Oracle, SAP AG, Sybase||6 Comments|
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
“In-database analytics” is a catch-all term for analytic capabilities, beyond standard SQL, running on the same machine as and under the management of an analytic DBMS. These can run in one or both of two modes:
- In-process or unfenced, i.e. in the same process as the DBMS itself. This option gives maximum performance, but any defects in the analytic code may crash the whole DBMS. Also, it generally requires that the code be in the same language as the DBMS, i.e. C++.
- Out-of-process or fenced, i.e. in a separate process. This option sacrifices performance, in favor of reliability and language flexibility.
In-database analytics may offer great performance and scalability advantages versus the alternative of extracting data and having it be processed on a separate server. This is particularly likely to be the case in MPP (Massively Parallel Processing) analytic DBMS environments.
Examples of in-database analytics include:
- Creating temporary data structures that persist past the life of a query.
- Creating temporary data structures that are non-tabular.
- Predictive modeling that uses all the same nodes in an MPP cluster where the data resides.
- Predictive analytics (scoring only).
Other common domains for in-database analytics include sessionization, time series analysis, and relationship analytics.
Notable products offering in-database analytics include:
- Teradata Aster SQL/MR.
- Multiple other analytic platforms, such as Sybase IQ, Vertica, or IBM Netezza. Indeed, in-database analytics are a defining feature of analytic platforms.
- Fuzzy Logix (for predictive analytics).
|Categories: Analytic glossary, Aster Data, Data warehousing, IBM and DB2, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, Sybase, Teradata, Vertica Systems||8 Comments|
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Was designed to do relational stuff* from the get-go, even if it now does other things too.
- Supports a lot of SQL.
- Was designed primarily to do non-relational things.*
- Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: Read more
I’d like to survey a few related ideas:
- Enterprises should each have a variety of different analytic data stores.
- Vendors — especially but not only IBM and Teradata — are acknowledging and marketing around the point that enterprises should each have a number of different analytic data stores.
- In addition to having multiple analytic data management technology stacks, it is also desirable to have an agile way to spin out multiple virtual or physical relational data marts using a single RDBMS. Vendors are addressing that need.
- Some observers think that the real essence of analytic data management will be in data integration, not the actual data management.
Here goes. Read more
|Categories: Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Exadata, Greenplum, Hadoop, Hortonworks, IBM and DB2, Informatica, Netezza, Oracle, Sybase, Teradata, Workload management||12 Comments|
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous year’s versions. Two factors jump to mind as possible reasons:
- This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
- Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more
I find myself in need of a word or phrase that means bring data together from various sources so that it’s ready to be used, where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.
I mean for the term “data mustering” to encompass at least three scenarios:
- Integrated (relational) data warehouse.
- Big bit bucket.
- Big bit stream.
Let me explain what I mean by each. Read more
|Categories: Complex event processing (CEP), Data warehousing, Investment research and trading, Sybase, Teradata||12 Comments|
I last wrote about Exasol in 2008. After talking with the team Friday, I’m fixing that now. The general theme was as you’d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.
Top-level points included:
- Exasol’s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.
- Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.
- Exasol has 25 EXASolution customers, all in Germany.*
- 5 of those are “cloud” customers, at hosting providers engaged by Exasol.
- EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.
- Pretty much the whole company is in Nuremberg.
|Categories: Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Exasol, Market share and customer counts, Pricing, Software as a Service (SaaS), Specific users, Sybase, Workload management||1 Comment|
When I agreed to launch the StreamBase LiveView product via DBMS 2, I planned to catch up on the whole CEP/streaming area first. Due to the power and internet outages last week, that didn’t entirely happen. So I’ll do a bit of that now, albeit more cryptically than I hoped and intended.
- The upshot of my what to call CEP thread in August was that “streaming” and “event processing” are not the same concept, but it so happens that they have the most traction where they intersect. That said, I both observe and endorse an apparent shift from “event” to “stream” as the core of the terminology, in a reversal of my opinion of several years ago.
- IBM continues to throw a lot of resources at its System S/ InfoSphere Streams product, but I haven’t heard yet of much marketplace success. That said, I believe IBM is still pretty serious about Streams, as one would expect from an effort whose code name so cheekily references System R. In particular, Streams shows up prominently on IBM’s top-level analytic architecture slide.
- Sybase recently released its ESP (Event Stream Processor) 5.0, which it says is the full merger of the Aleri and Coral8 predecessors. You can still get Sybase ESP without buying into the full Sybase RAP stack, and Sybase has no plans to change that.
- Sybase has discontinued all the business intelligence types of products Aleri and Coral8 were developing. Rather, Sybase is OEMing Panopticon, which it reports has been well received. Other than the discontinuation of the BI efforts, there seem to be few Aleri or Coral8 features missing from the merged Sybase ESP product.
- Truviso continues to be out of the picture.
- I have more to say about StreamBase separately.
- I have more to say about Sybase and IBM, which I’ll get to when I can.
- I have nothing new on Progress Apama. I also know little about any of the open source efforts.
Meanwhile, if you want to see technically nitty-gritty posts about the CEP/streaming area, you may want to look at my CEP/streaming coverage circa 2007-9, based on conversations with (among others) Mike Stonebraker, John Bates, and Mark Tsimelzon.
|Categories: Business intelligence, Complex event processing (CEP), IBM and DB2, StreamBase, Sybase, Truviso||4 Comments|