Object
Analysis of data management technology optimized for object data. Related subjects include:
- Native XML database management
- Intersystems, vendor of the Cache’ object-oriented DBMS
Detailed analysis of Perst and other in-memory object-oriented DBMS
Dan Weinreb — inspired by but not linking to my recent short post on McObject’s object-oriented in-memory DBMS Perst — has posted a detailed discussion of Perst on his own blog. For context, he compares it briefly to analogous products, most especially Progress’s — which used to be ObjectStore, of which Dan was the chief architect.
This was based on documentation and general sleuthing (Dan figured out who McObject got Perst from), rather than hands-on experience, so performance figures and the like aren’t validated. Still, if you’re interested in such technology, it’s a fascinating post.
Categories: In-memory DBMS, McObject, Memory-centric data management, Object | Leave a Comment |
Open source in-memory DBMS
I’ve gotten email about two different open source in-memory DBMS products/projects. I don’t know much about either, but in case you care, here are some pointers to more info.
First, the McObject guys — who also sell a relational in-memory product — have an object-oriented, apparently Java-centric product called Perst. They’ve sent over various press releases about same, the details of which didn’t make much of an impression on me. (Upon review, I see that one of the main improvements they cite in Perst 3.0 is that they added 38 pages of documentation.)
Second, I just got email about something called CSQL Cache. You can read more about CSQL Cache here, if you’re willing to navigate some fractured English. CSQL’s SourceForge page is here. My impression is that CSQL Cache is an in-memory DBMS focused on, you guessed it, caching. It definitely seems to talk SQL, but possibly its native data model is of some other kind (there are references both to “file-based” and “network”.)
Categories: Cache, DBMS product categories, In-memory DBMS, McObject, Memory-centric data management, Object, OLTP, Open source | 5 Comments |
Dan Weinreb on ObjectStore
Dan Weinreb was one of the key techies at Object Design, the company that made the object-oriented database management system ObjectStore. (Object Design later merger into Excelon, which was eventually sold to Progress, which has deemphasized but still supports ObjectStore.) Recently he wrote a pair of long and fascinating articles* about Object Design, ObjectStore, and OODBMS, the first of which makes the case that “object-oriented database management systems succeeded.”
Read more
The 4 main approaches to datatype extensibility
Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, time series (even though they’re numeric, they benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment. Read more
Categories: Data types, GIS and geospatial, Object, Structured documents, Text | 10 Comments |
Intersystems’ stealth marketing has gotten pretty extreme
Every few months I try to make contact with Intersystems. Sometimes they graciously respond, promising to schedule a briefing, which then never happens. Other times they don’t even bother. Now, on one level I can’t blame them, based on what happened at my last briefing. Read more
Categories: Intersystems and Cache', Object | 5 Comments |
Progress Software progress report
For the past 20+ years – all the way back to when it was still privately held — I’ve periodically gotten up to speed on Progress Software. I’m trying again now, and to that end dropped by yesterday for a chat with Jeff Stamen. I’ll give a brief overview now – which is probably all I’m qualified to do right now anyway – and then loop back with more detailed info after I get it.
After a reorganization at the beginning of this (November) fiscal year, the vast majority of Progress’ products fall into one of five buckets, which I shall glibly refer to in decreasing order of size as “Progress Classic,” “SOA,” “drivers,” “memory-centric,” and “EasyAsk.” Here’s a quick overview of each. Read more
Categories: Companies and products, Mid-range, Object, OLTP, Progress, Apama, and DataDirect | 3 Comments |
Hot times at Intersystems
About a year ago, I wrote a very favorable column focusing on Intersystems’ OODBMS Cache’. Cache’ appears to be the one OODBMS product that has good performance even in a standard disk-centric configuration, notwithstanding that random pointer access seems to be antithetical to good disk performance.
Intersystems also has a hot new Cache’-based integration product, Ensemble. They attempted to brief me on it (somewhat belatedly, truth be told) last Wednesday. Through no fault of the product, however, the briefing didn’t go so well. I still look forward to learning more about Ensemble.
Categories: EAI, EII, ETL, ELT, ETLT, Humor, Intersystems and Cache', Object, OLTP | Leave a Comment |
White paper on memory-centric data management — excerpt
Here’s an excerpt from the introduction to my new white paper on memory-centric data management. I don’t know why WordPress insists on showing the table gridlines, but I won’t try to fix that now. Anyhow, if you’re interested enough to read most of this excerpt, I strongly suggest downloading the full paper.
|
Introduction
|
Conventional DBMS don’t always perform adequately. |
Ideally, IT managers would never need to think about the details of data management technology. Market-leading, general-purpose DBMS (DataBase Management Systems) would do a great job of meeting all information management needs. But we don’t live in an ideal world. Even after decades of great technical advances, conventional DBMS still can’t give your users all the information they need, when and where they need it, at acceptable cost. As a result, specialty data management products continue to be needed, filling the gaps where more general DBMS don’t do an adequate job.
|
Memory-centric technology is a powerful alternative. |
One category on the upswing is memory-centric data management technology. While conventional DBMS are designed to get data on and off disk quickly, memory-centric products (which may or may not be full DBMS) assume all the data is in RAM in the first place. The implications of this design choice can be profound. RAM access speeds are up to 1,000,000 times faster than random reads on disk. Consequently, whole new classes of data access methods can be used when the disk speed bottleneck is ignored. Sequential access is much faster in RAM, too, allowing yet another group of efficient data access approaches to be implemented.
|
It does things disk-based systems can’t. |
If you want to query a used-book database a million times a minute, that’s hard to do in a standard relational DBMS. But Progress’ ObjectStore gets it done for Amazon. If you want to recalculate a set of OLAP (OnLine Analytic Processing) cubes in real-time, don’t look to a disk-based system of any kind. But Applix’s TM1 can do just that. And if you want to stick DBMS instances on 99 nodes of a telecom network, all persisting data to a 100th node, a disk-centric system isn’t your best choice – but Solid’s BoostEngine should get the job done.
|
Memory-centric data managers fill the gap, in various guises. |
Those products are some leading examples of a diverse group of specialist memory-centric data management products. Such products can be optimized for OLAP or OLTP (OnLine Transaction Processing) or event-stream processing. They may be positioned as DBMS, quasi-DBMS, BI (Business Intelligence) features, or some utterly new kind of middleware. They may come from top-tier software vendors or from the rawest of startups. But they all share a common design philosophy: Optimize the use of ever-faster semiconductors, rather than focusing on (relatively) slow-spinning disks.
|
They have a rich variety of benefits. |
For any technology that radically improves price/performance (or any other measure of IT efficiency), the benefits can be found in three main categories:
For memory-centric data management, the “things that you couldn’t do before at all” are concentrated in areas that are highly real-time or that use non-relational data structures. Conversely, for many relational and/or OLTP apps, memory-centric technology is essentially a much cheaper/better/faster way of doing what you were already struggling through all along.
|
Memory-centric technology has many applications. |
Through both OEM and direct purchases, many enterprises have already adopted memory-centric technology. For example: |
|
|
Categories: Data types, Memory-centric data management, MOLAP, Object, OLTP, Open source, Progress, Apama, and DataDirect | 3 Comments |
Another OLTP success for memory-centric OO
Computerworld published a Progress ObjectStore OLTP success story.
Hotel reservations system, this time. Not as impressive as the Amazon store — what is? — but still nice.
Categories: Cache, Memory-centric data management, Object, OLTP, Progress, Apama, and DataDirect, Theory and architecture | 5 Comments |
Defining and surveying “Memory-centric data management”
I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.
I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:
- TimesTen (now owned by Oracle): TimesTen is the quintessentional “in-memory DBMS.” It’s a fairly full relational DBMS, but if you want to persist memory to disk it has to be handed off to a conventional DBMS. Historically, that has usually been MySQL or Oracle. TimesTen’s biggest market penetration has been in financial trading.
- Solid Information Technology‘s BoostEngine: Solid is a Finnish company (or was — it’s pretty American now) specializing in embedded DBMS sold mainly for telecommunication uses. Big OEM customers include several well-known telecom equipment manufacturers and HP (for OpenView). “Embedded” often means no DBA, no monitor, no keyboard — they box manufacturer installs it and there it stays for the life of the product. Solid has to offer strong replication capabilities, since its products are often used in highly distributed (e.g., multiblade, multibox) environments. So it’s taken the next step and exploited the replication by allowing customers to use some instances of the product disklessly.
- Event-stream products from Streambase and Progress: The canonical application for event-stream products is automating financial trading decisions based on the flow of market information. Mike Stonebraker, the brains behind Streambase, has recently popularized the idea; Progress bought Apama, who actually have been in the business longer. These applications require even more speed than the financial trading apps that TimesTen handles, and they discard most of the information they look at. In-memory is the only way to go.
- Progress’s ObjectStore: ObjectStore comes from the company Object Design, which merged into Excelon, which was acquired by Progress. It’s really a toolkit for building DBMS and similar systems, which is why it’s at various times been marketed as an OODBMS and an XML DBMS, without a lot of success either way. But there have been a few sterling apps built in ObjectStore even so, including a key part of the Amazon bookstore Despite this limited market success, a significant fraction of Progress’s best engineering talent has moved over to the Real-Time Division to focus on ObjectStore and other memory-centric products. The memory-centric aspect of ObjectStore is this: ObjectStore’s big virtue is that it gets objects from disk to memory and vice-versa very efficiently, then distributes and caches them around a network as needed. This was originally invented for client/server processing, but works fine in a multi-server thin client setup as well. And object processing, of course, relies on a whole lot of pointers. And pointer-chasing is pretty much the worst way to deal with the disk speed barrier, unless you do it in main memory.
- Applix‘s TM1: Like many companies in the analytics area, Applix has had trouble deciding whether it sells applications, BI system software, or both. But in any case its core technology is TM1, a memory-centric MOLAP offering. Traditional MOLAP products reside on the horns of a nasty dilemma: They rely on precalculation to give good performance, but that causes ghastly database explosion. Applix gets out of this problem by doing no precalculation whatsoever, loading the data into main memory, and executing all queries on the fly.
- SAP’s BI Accelerator: SAP is building out an elaborate technology stack with NetWeaver, especially in the BI area. One important aspect is that the full data warehouse is logically broken (or copied) into a series of data marts called “InfoCubes.” BI Accelerator takes the logical next step, loading an entire InfoCube into main memory. Almost every query is executed via a full table scan, which would be insane on disk but makes perfect sense when the data is already in RAM.
So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.