May 13th, 2008 Curt Monash
McObject — vendor of memory-centric DBMS eXtremeDB — is a tiny, tiny company, without a development team of the size one would think needed to turn out one or more highly-reliable DBMS. So I haven’t spent a lot of time thinking about whether it’s a serious alternative to solidDB for embedded DBMS, e.g. in telecom equipment. However:
- IBM’s acquisition of Solid seems to suggest a focus on DB2 caching rather than the embedded market
- McObject actually has built up something of a customer list, as per the boilerplate on any of its press releases.
And they do seem to have some nice features, including Patricia tries (like solidDB), R-trees (for geospatial), and some kind of hybrid disk-centric/memory-centric operation.
Posted in GIS and geospatial, McObject and eXtremeDB, Memory-centric data management, solidDB | 2 Comments »
May 8th, 2008 Curt Monash
In which we bring you another instantiation of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.
When EnterpriseDB announced a partnership with Truviso for a “blade,” I naturally assumed they were using the term in a more-or-less standard way, and hence believed that it was more than a “Barney” press release.* Silly me. Rather than referring to something closely akin to “datablade,” EnterpriseDB’s “blade” program turns out to be just a catchall set of partnerships.
*A “Barney” announcement is one whose entire content boils down to “I love you; you love me.”
According to EnterpriseDB CTO Bob Zurek, the main features of the “blade” program include:
Read the rest of this entry »
Posted in Data types, EnterpriseDB and Postgres Plus, Open source RDBMS, Portability, transparency, and plug-compatibility, PostgreSQL, Relational database management systems, Specialized data management in general | 3 Comments »
April 29th, 2008 Curt Monash
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB’s Bob Zurek endorsed my tentative summary of what this means technically, namely:
- There’s data being managed transactionally by EnterpriseDB.
- Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
- If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Data types, EnterpriseDB and Postgres Plus, Games and virtual worlds, Memory-centric data management, Open source RDBMS, PostgreSQL, Specialized data management in general, Truviso | 1 Comment »
April 29th, 2008 Curt Monash
Mark Logic* has an interesting, complex story. They sell a technology stack based on an XML DBMS with text search designed in from the get go. They usually want to be known as a “content” technology provider rather than a DBMS vendor, but not quite always.
*Note: Product name = MarkLogic, company name = Mark Logic.
I’ve agreed to do a white paper and webcast for Mark Logic (sponsored, of course). But before I start serious work on those, I want to blog based on what I know. As always, feedback is warmly encouraged.
Some of the big differences between MarkLogic and other DBMS are:
- MarkLogic’s primary DML/DDL (Data Manipulation/Description Language) is XQuery. Indeed, Mark Logic is in many ways the chief standard-bearer for pure XQuery, as opposed to SQL/XQuery hybrids.
- MarkLogic’s XML processing is much faster than many alternatives. A client told me last year that – in an application that had nothing to do with MarkLogic’s traditional strength of text search – MarkLogic’s performance beat IBM DB2/Viper’s by “an order of magnitude.” And I think they were using the phrase correctly (i.e., 10X or so).
- MarkLogic indexes all kinds of entities and facts, automagically, without any schema-prebuilding. (Nor, I gather, do they depend on individual documents carrying proper DTDs.) So there actually isn’t a lot of DDL. (Mark Logic claims in one test MarkLogic had more or less 0 DDL, vs. 20,000 lines in DB2/Viper.) What MarkLogic indexes includes, as Mark Logic puts it:
- As opposed to most extended-relational DBMS, MarkLogic indexes all kinds of information in a single, tightly integrated index. Mark Logic claims this is part of the reason for MarkLogic’s good performance, and asserts that competitors’ lack of full integration often causes overhead and/or gets in the way of optimal query plans. (For example, Mark Logic claims that Microsoft SQL Server’s optimizer is so FUBARed that it always does the text part of a search first.) Interestingly, Intersystems’ object-oriented Cache’ does pretty much the same thing.
- MarkLogic is proud of its text search extensions to XQuery. I’ve neglected to ask how that relates to the XQuery standards process. (For example, text search wasn’t integrated into the SQL standard until SQL3.)
Other architectural highlights include:
Read the rest of this entry »
Posted in Data types, IBM and DB2, Mark Logic, Native XML | 1 Comment »
March 28th, 2008 Curt Monash
Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.
I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
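To make the sparsity point concrete, here is a minimal Python sketch. The product names and attributes are invented for illustration (they are not from Sabin's post), and the code has nothing to do with how SQL Server actually stores sparse columns. It simply flattens entities with different attribute sets into one fixed schema and measures how much of the resulting wide table is NULL.

```python
# Hypothetical products; every name and attribute here is made up.
products = [
    {"id": 1, "name": "T-shirt", "color": "red", "shirt_size": "M"},
    {"id": 2, "name": "Drill", "voltage": 18, "chuck_mm": 13},
    {"id": 3, "name": "Sofa", "color": "grey", "seats": 3},
]


def wide_table(rows):
    """Flatten rows into one fixed schema; absent attributes become None (NULL)."""
    columns = sorted({key for row in rows for key in row})
    table = [[row.get(col) for col in columns] for row in rows]
    return columns, table


def null_fraction(table):
    """Share of cells in the wide table that are NULL."""
    cells = [cell for row in table for cell in row]
    return sum(cell is None for cell in cells) / len(cells)


columns, table = wide_table(products)
print(columns)
print(f"{null_fraction(table):.0%} of cells are NULL")
```

Each new kind of product adds columns that are empty for nearly every other row, which is why either cheap sparse-column storage or a schema-flexible format becomes attractive as the variety grows.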
Posted in Database theory and practice, MySQL, Native XML | 2 Comments »
March 6th, 2008 Curt Monash
As usual, Microsoft forgot to brief me, but Mary Jo Foley reports on Microsoft SQL Server Data Services. A look at the official site clarifies that this database-in-a-cloud offering uses “Microsoft SQL Server as a data storage node.” However, there seems to be a software layer on top of SQL Server providing scale-out and appropriate management.
In addition to the more-than-SQL-Server layer, there seems to be a less-than-SQL-Server aspect as well. In particular, Microsoft SQL Server Data Services boasts “Support for simple types: string, numeric, datetime, boolean.” XML is the “primary wire format,” and hints dropped about the schema philosophy sound XMLish too.
Interestingly, Foley reports that Microsoft plans to offer an on-premises version of Microsoft SQL Server Data Services as well.
Posted in Cloud computing, Microsoft and SQL*Server, Native XML | No Comments »
February 16th, 2008 Curt Monash
In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog.
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
- Science DBMSs — after all MatLab does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
- XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read the rest of this entry »
Posted in Data types, Database diversity, Database theory and practice, Michael Stonebraker, Mid-range DBMS, OLTP database management, RDF and graphs, Relational database management systems | No Comments »
February 15th, 2008 Curt Monash
This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.
Relational database management systems have three essential elements:
- Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product to do what it was well-designed for.
- Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
- Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.
If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.
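As a rough illustration of the predicate-logic point, the Python sketch below contrasts a binary match with a relevance ranking. The documents, the query, and the scoring rule (a raw count of query-term occurrences) are all assumptions chosen for brevity. Real text engines use far more elaborate scoring.

```python
# Toy document collection; contents are invented for illustration.
docs = {
    "d1": "native xml database with xml indexing",
    "d2": "relational database with xml column",
    "d3": "stream processing engine",
}


def matches(text, term):
    """Binary predicate: a document either contains the term or it does not."""
    return term in text.split()


def score(text, terms):
    """Toy relevance score: total occurrences of the query terms."""
    words = text.split()
    return sum(words.count(term) for term in terms)


query = ["xml", "database"]

# Predicate logic yields an unordered set of qualifying documents.
hits = [d for d, text in docs.items() if all(matches(text, t) for t in query)]

# Ranking needs a graded score on top of the yes/no answer.
ranked = sorted(hits, key=lambda d: score(docs[d], query), reverse=True)
print(hits, ranked)
```

The predicate can say which documents qualify, but ordering them takes a graded score, and that is the part that fits awkwardly into yes/no logic.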
Read the rest of this entry »
Posted in Data types, Database diversity, Database theory and practice | 1 Comment »
February 1st, 2008 Curt Monash
Dan Weinreb was one of the key techies at Object Design, the company that made the object-oriented database management system ObjectStore. (Object Design later merged into Excelon, which was eventually sold to Progress, which has deemphasized but still supports ObjectStore.) Recently he wrote a pair of long and fascinating articles about Object Design, ObjectStore, and OODBMS, the first of which makes the case that “object-oriented database management systems succeeded.” Read the rest of this entry »
Posted in Objects, Progress, Apama, and DataDirect | No Comments »
February 1st, 2008 Curt Monash
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
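The idea of schemaless documents plus views calculated on demand can be sketched in a few lines of plain Python. This is not CouchDB's API. The function names (`posts_by_author`, `compute_view`) and the sample documents are hypothetical, and the sketch only shows the general pattern of running a map function over whatever documents happen to exist.

```python
from collections import defaultdict

# Schemaless "documents": plain dicts with no shared set of fields.
documents = [
    {"_id": "a", "type": "post", "author": "ann", "tags": ["xml", "dbms"]},
    {"_id": "b", "type": "comment", "author": "bob"},
    {"_id": "c", "type": "post", "author": "ann", "tags": ["cep"]},
]


def posts_by_author(doc):
    """Map function: yield (key, value) pairs for the documents it cares about."""
    if doc.get("type") == "post":
        yield doc["author"], doc["_id"]


def compute_view(docs, map_fn):
    """Compute a view on demand by running map_fn over every document."""
    view = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            view[key].append(value)
    return dict(view)


print(compute_view(documents, posts_by_author))
```

Documents that lack the fields a view cares about are simply skipped, so no schema has to be declared up front.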
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read the rest of this entry »
Posted in CouchDB, Database diversity, Database theory and practice, Native XML | 3 Comments »
January 31st, 2008 Curt Monash
My recent post about datatype extensibility zoomed over at least one head, as per the comment thread. Since then I’ve googled, and come to suspect that part of what I was assuming as common knowledge may not be so common after all. So I’m going to back up and explain a bit about data access methods, as well as the sub-topic of data structures. If you take nothing else away from this post, I hope it will at least remind you of the sheer variety of ways data can be stored on disk or in RAM.
First, let’s define the concept of data access method in three steps:
Read the rest of this entry »
Posted in Data types | 1 Comment »
January 28th, 2008 Curt Monash
Question of the day #2
Who is actually using native XML?
Mark Logic is having a fine time using its native XML engine for custom publishing. One outfit I know of is using a native XML engine for something like web analytics, but is driving me crazy by never coming through on permission to divulge details. There’s a bit of native XML use out there supporting the insurance industry’s ACORD standard.
And after that I quickly run out of examples of native XML use. Read the rest of this entry »
Posted in Data types, IBM and DB2, Mark Logic, Microsoft and SQL*Server, Native XML, Oracle | 1 Comment »
January 28th, 2008 Curt Monash
I have quite the excess of “flu-like symptoms,” and nothing substantive I’m writing today is coming to fruition. So instead of forcing the issue, I’m going to put a few questions out for discussion.
Question of the day #1
Is anybody indexing the actual contents of still images, video, or sound files?
Obviously, there are applications that serve huge numbers of videos, pictures, and/or songs — YouTube, Flickr, iTunes, and so on. But generally, these media are just handled as files or BLOBs, while all the database indexing is on alphanumeric metadata such as title, tags, uploader, date, download stats, comments, and so on.
The technology certainly exists to be more sophisticated. Consider, for example, Oracle’s Still Image datatype, which in typical Oracle fashion implements the relevant parts of SQL/MM and goes yet further. Read the rest of this entry »
Posted in Data types, Oracle | 1 Comment »
January 27th, 2008 Curt Monash
Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
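To show why a natural one-dimensional sort order matters, here is a minimal sort/merge join in Python. The table contents and column names are invented, and this is a textbook sketch, not any particular DBMS's implementation.

```python
def sort_merge_join(left, right, key):
    """Join two lists of dicts on `key` by sorting both, then merging in one pass."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Pair the current left row with every right row sharing this key.
            k = j
            while k < len(right) and right[k][key] == left_key:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out


# Hypothetical sample tables.
customers = [{"cust_id": 2, "name": "Bo"}, {"cust_id": 1, "name": "Al"}]
orders = [
    {"cust_id": 1, "total": 30},
    {"cust_id": 2, "total": 5},
    {"cust_id": 1, "total": 12},
]
joined = sort_merge_join(customers, orders, "cust_id")
print(joined)
```

The merge step works only because the keys can be compared with `<` and `>`. Two-dimensional locations or free text have no single ordering of that kind, so they call for other access methods.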
But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, and time series (which, even though they’re numeric, benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatypes, either on their own or within an essentially relational data management environment.
Read the rest of this entry »
Posted in Data types, GIS and geospatial, Native XML, Objects, Text | 9 Comments »
January 24th, 2008 Curt Monash
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
Read the rest of this entry »
Posted in Data types, Google, BigTable, and MapReduce, Scientific research | No Comments »
December 17th, 2007 Curt Monash
Every few months I try to make contact with Intersystems. Sometimes they graciously respond, promising to schedule a briefing, which then never happens. Other times they don’t even bother. Now, on one level I can’t blame them, based on what happened at my last briefing. Read the rest of this entry »
Posted in Hierarchies, networks, graphs, and trees, Objects | 5 Comments »
December 8th, 2007 Curt Monash
Since I was researching Software AG anyway, I took the opportunity to ask about Software AG’s native XML DBMS Tamino, which certainly has some fans. Jim Fowler, Software AG’s Director of Market Development, Enterprise Transaction Systems, was kind enough to write up the following for me:
As you know, when Tamino was released in the late 1990s it was one of the first – if not the first – commercially available native XML database. We now have several hundred Tamino customers worldwide, and Software AG is fully committed to supporting our customers.
At the same time, we recognize that XML has matured and evolved in many different directions during the past decade;
Read the rest of this entry »
Posted in Data types, Native XML, Software AG and ADABAS | No Comments »
December 5th, 2007 Curt Monash
I’m going to praise EnterpriseDB’s marketing communications twice in two blog posts, because I really liked some of the crunch they put into a press release announcing a MySQL replacement at FortiusOne. To wit (emphasis mine):
The PostGIS geospatial extensions to PostgreSQL played a key role in FortiusOne’s selection of EnterpriseDB Advanced Server, a PostgreSQL-based solution, and dramatically improved performance. FortiusOne needed to run complex spatial queries against large datasets quickly and efficiently, and found the MySQL spatial extensions to be far less complete and comprehensive than PostGIS. EnterpriseDB Advanced Server processes some of GeoCommons’ database-intensive rendering requests in one-thirtieth of the time required by MySQL. During peak loads, GeoCommons processes more than one hundred thousand complex requests per hour, requiring true enterprise-class performance and scalability.
Another major factor in FortiusOne’s replacement of MySQL with EnterpriseDB Advanced Server was the company’s need for advanced partitioning, custom triggers, and functional indexing. EnterpriseDB’s advanced partitioning capabilities instantly enabled linear performance, even with tables having billions of rows.
Read the rest of this entry »
Posted in Data types, EnterpriseDB and Postgres Plus, GIS and geospatial, MySQL | 10 Comments »
November 7th, 2007 Curt Monash
Vertica quietly announced an appliance bundling deal with HP and Red Hat today. That got me quickly onto the phone with Vertica’s Andy Ellicott, to discuss a few different subjects. Most interesting was the part about Vertica’s customer base, highlights of which included:
- Vertica’s claim to have “50” customers includes a bunch of unpaid licenses, many of them in academia.
- Vertica has about 15 paying customers.
- Based on conversations with mutual prospects, Vertica believes that’s more customers than DATAllegro has. (Of course, each DATAllegro sale is bigger than one of Vertica’s. Even so, I hope Vertica is wrong in its estimate, since DATAllegro told me its customer count was “double digit” quite a while ago.)
- Most Vertica customers manage over 1 terabyte of user data. A couple have bought licenses showing they intend to manage 20 terabytes or so.
- Vertica’s biggest customer/application category – existing customers and sales pipelines alike – is call detail records for telecommunications companies. (Other data warehouse specialists also have activity in the CDR area.) Major applications are billing assurance (getting the inter-carrier charges right) and marketing analysis. Call center uses are still in the future.
- Vertica’s other big market to date is investment research/tick history. Surely not coincidentally, this is a big area of focus for Mike Stonebraker, evidently at both companies for which he’s CTO. (The other, of course, is StreamBase.)
- Runners-up in market activity are clickstream analysis and general consumer analytics. These seem to be present in Vertica’s pipeline more than in the actual customer base.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business Objects, DATAllegro, Data warehouse appliances, Data warehousing, HP and Neoview, RDF and graphs, Relational database management systems, Vertica Systems | No Comments »
October 22nd, 2007 Curt Monash
Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.
In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose MarkLogic, with Cache’ having been the only other choice to make the short list.
Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China). Read the rest of this entry »
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Intersystems and Cache', Mark Logic, Native XML | No Comments »
September 27th, 2007 Curt Monash
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is:
- Extending Netezza’s SQL via user-defined functions (which probably wasn’t too hard, especially since the Netezza engine is related to PostgreSQL).
- Providing a C-to-Verilog compiler.
- Providing an application development environment and associated tools. (Presumably rather primitive, but I haven’t really checked it out.)
The applications mentioned in the NDN press release, and I quote directly, are:
- Multi-dimensional geospatial analytics on comprehensive data sets for risk management
- Predictive model scoring for customer segmentation, enabling real-time offer provisioning for customers
- Iterative modeling and analytics on billions of call detail records (CDRs) for telco price optimization
- Real-time Monte Carlo simulations on terabytes of detail-level data for risk management
- “Fingerprinting” with hashing algorithms for chain-of-custody document fingerprinting and to ensure that files transferred are intact
- Fuzzy text search analysis uses algorithms that provide a “best guess” of most likely results
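The “fingerprinting” item in the list above is the easiest to illustrate. The sketch below is ordinary host-side Python, not code for Netezza's FPGAs, and the choice of SHA-256 plus the sample payload are assumptions. It only shows the general idea of hashing a file's bytes so that any change in transit is detectable.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest that changes if any byte of the data changes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical payload standing in for a transferred file.
original = b"call detail record batch 42\n"
sent = fingerprint(original)

received_ok = original
received_bad = original.replace(b"42", b"43")

print(fingerprint(received_ok) == sent)   # intact copy: digests match
print(fingerprint(received_bad) == sent)  # altered copy: digests differ
```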
Netezza says that the greatest interest has come from usual-suspect sophisticated users, specifically intelligence agencies and perhaps also financial services firms. But naturally, the partners actually trotted out at Netezza’s user conference were mainly hopeful small-company ISVs. The biggest stir was made by not-so-small SAS, which evidently believes this new capability will provide massive improvements to SAS/Netezza combined performance.
In principle, there are four different ways this new programmability could be a big win: Read the rest of this entry »
Posted in Data warehouse appliances, Data warehousing, Native XML, Netezza, PostgreSQL, Relational database management systems, SAS Institute, Specialized data management in general | 8 Comments »
September 24th, 2007 Curt Monash
Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a linked-list DBMS called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.
Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible before, although it had been possible in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.
There also are some big performance boosts. Read the rest of this entry »
Posted in Database compression, Hierarchies, networks, graphs, and trees, Memory-centric data management, Microsoft and SQL*Server, Mid-range DBMS, OLTP database management, Pervasive Software, Portability, transparency, and plug-compatibility, Relational database management systems | No Comments »
August 12th, 2007 Curt Monash
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
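A data reduction app of this kind can be sketched as a Python generator. The event shape, the threshold, and the rolling-average “enhancement” are all hypothetical, and a real CEP engine would express this in its own query language rather than in application code.

```python
def reduce_stream(events, threshold, window=3):
    """Filter a stream and 'enhance' the survivors with a rolling average."""
    recent = []
    for event in events:
        recent.append(event["value"])
        if len(recent) > window:
            recent.pop(0)  # keep only the last `window` values
        if event["value"] >= threshold:
            # Forward only interesting events, with added context.
            yield {**event, "rolling_avg": sum(recent) / len(recent)}


# Hypothetical sensor readings standing in for a gushing input stream.
events = [{"sensor": "s1", "value": v} for v in [1, 2, 9, 3, 12, 2]]
alerts = list(reduce_stream(events, threshold=9))
print(alerts)
```

Six input events go in and two come out, each carrying a value the raw stream did not contain. That is the "small, modified subset" pattern in miniature.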
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, IBM and DB2, Memory-centric data management, Native XML, StreamBase | No Comments »
August 3rd, 2007 Curt Monash
Complex event/stream processing vendor Coral8 raised its hand and offered a briefing – non-technical, alas, but at least it was a start. Here are some of the highlights: Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, StreamBase | No Comments »
July 13th, 2007 Curt Monash
I just finished a short Monash Letter on markets for nonstandard data management software. Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:
- When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first — but once they fail new technologies are tried out.
- Up through the “Bowling Alley,” markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern. However, they rarely experience a “Tornado” or mass adoption.
- I think this is apt to change. My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).
Posted in Complex event/stream processing (CEP), Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, RDF and graphs | No Comments »