April 29th, 2008 Curt Monash
Mark Logic* has an interesting, complex story. They sell a technology stack based on an XML DBMS with text search designed in from the get-go. They usually want to be known as a “content” technology provider rather than a DBMS vendor, but not quite always.
*Note: Product name = MarkLogic, company name = Mark Logic.
I’ve agreed to do a white paper and webcast for Mark Logic (sponsored, of course). But before I start serious work on those, I want to blog based on what I know. As always, feedback is warmly encouraged.
Some of the big differences between MarkLogic and other DBMS are:
- MarkLogic’s primary DML/DDL (Data Manipulation/Description Language) is XQuery. Indeed, Mark Logic is in many ways the chief standard-bearer for pure XQuery, as opposed to SQL/XQuery hybrids.
- MarkLogic’s XML processing is much faster than many alternatives. A client told me last year that – in an application that had nothing to do with MarkLogic’s traditional strength of text search – MarkLogic’s performance beat IBM DB2/Viper’s by “an order of magnitude.” And I think they were using the phrase correctly (i.e., 10X or so).
- MarkLogic indexes all kinds of entities and facts, automagically, without any schema pre-building. (Nor, I gather, do they depend on individual documents carrying proper DTDs.) So there actually isn’t a lot of DDL. (Mark Logic claims that in one test MarkLogic needed more or less zero lines of DDL, vs. 20,000 lines in DB2/Viper.) What MarkLogic indexes includes, as Mark Logic puts it:
- As opposed to most extended-relational DBMS, MarkLogic indexes all kinds of information in a single, tightly integrated index. Mark Logic claims this is part of the reason for MarkLogic’s good performance, and asserts that competitors’ lack of full integration often causes overhead and/or gets in the way of optimal query plans. (For example, Mark Logic claims that Microsoft SQL Server’s optimizer is so FUBARed that it always does the text part of a search first.) Interestingly, Intersystems’ object-oriented Cache’ does pretty much the same thing.
- MarkLogic is proud of its text search extensions to XQuery. I’ve neglected to ask how that relates to the XQuery standards process. (For example, text search wasn’t integrated into the SQL standard until SQL3.)
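To make the “single, tightly integrated index” idea concrete, here is a toy Python sketch of one inverted index that holds structural keys (element names) and text keys (words, and words-within-elements) side by side, so one lookup routine answers mixed structure-plus-text questions with no schema declared up front. The key scheme is a hypothetical simplification for illustration only; it is not MarkLogic’s actual index design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(docs):
    """Build one inverted index over both structure and text.

    Hypothetical key scheme, for illustration only:
      ("word", w)              word w occurs in some element's text
      ("element", tag)         an element named tag occurs
      ("element-word", tag, w) word w occurs directly inside element tag
    Only elem.text is indexed; tail text is ignored to keep the sketch short.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index[("element", elem.tag)].add(doc_id)
            for word in (elem.text or "").lower().split():
                index[("word", word)].add(doc_id)
                index[("element-word", elem.tag, word)].add(doc_id)
    return index

def query(index, *keys):
    """AND together several index lookups, structural and textual alike."""
    postings = [index.get(key, set()) for key in keys]
    return set.intersection(*postings) if postings else set()

# Two documents with different, undeclared structures; no DDL is needed.
docs = {
    1: "<article><title>XML databases</title><body>native storage</body></article>",
    2: "<memo><title>Budget</title><body>XML export</body></memo>",
}
idx = build_index(docs)

print(query(idx, ("element-word", "title", "xml")))      # {1}
print(query(idx, ("element", "memo"), ("word", "xml")))  # {2}
```

Because structural and text postings live in the same index, neither kind of condition has to be evaluated first and handed to a separate engine, which is the point Mark Logic makes about integrated query planning.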
Other architectural highlights include:
Read the rest of this entry »
Posted in Data types, IBM and DB2, Mark Logic, Native XML | 1 Comment »
March 28th, 2008 Curt Monash
Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.
I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
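A small illustration of why variable attributes strain a fixed schema: if every attribute any product has becomes a column, most cells end up NULL, and each new kind of product adds more columns. This is plain Python with hypothetical product data, not a model of how SQL Server’s sparse columns are actually stored.

```python
# Hypothetical products whose attribute sets differ by kind.
products = [
    {"sku": "A1", "name": "T-shirt", "color": "red", "size": "M"},
    {"sku": "B2", "name": "Paint", "color": "teal", "volume_l": 4},
    {"sku": "C3", "name": "Hard drive", "capacity_gb": 500},
]

def wide_table(entities):
    """Lay entities out as one fixed-schema table: every attribute seen becomes a column."""
    columns = sorted({key for entity in entities for key in entity})
    rows = [[entity.get(column) for column in columns] for entity in entities]
    return columns, rows

def null_fraction(rows):
    """Share of cells in the wide layout that hold no value."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)

columns, rows = wide_table(products)
print(columns)              # six columns for three products
print(null_fraction(rows))  # 7 of 18 cells are empty
```

The per-entity dictionaries themselves are the “sparse” representation; the wide table is what a fixed relational schema forces unless the DBMS manages sparsity for you.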
Posted in Database theory and practice, MySQL, Native XML | 2 Comments »
March 6th, 2008 Curt Monash
As usual, Microsoft forgot to brief me, but Mary Jo Foley reports on Microsoft SQL Server Data Services. A look at the official site clarifies that this database-in-a-cloud offering uses “Microsoft SQL Server as a data storage node.” However, there seems to be a software layer on top of SQL Server providing scale-out and appropriate management.
In addition to the more-than-SQL-Server layer, there seems to be a less-than-SQL-Server aspect as well. In particular, Microsoft SQL Server Data Services boasts “Support for simple types: string, numeric, datetime, boolean.” XML is the “primary wire format,” and hints dropped about the schema philosophy sound XMLish too.
Interestingly, Foley reports that Microsoft plans to offer an on-premises version of Microsoft SQL Server Data Services as well.
Posted in Cloud computing, Microsoft and SQL*Server, Native XML | No Comments »
February 1st, 2008 Curt Monash
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read the rest of this entry »
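The “bang documents in, calculate views as needed” model can be sketched as a map step that emits key/value pairs from each schema-less document, plus an optional reduce step per key. CouchDB itself expresses views as JavaScript map/reduce functions served over HTTP; the Python below only mirrors the shape of the idea, and the document fields are hypothetical.

```python
from collections import defaultdict

# Schema-less documents: not every document carries the same fields.
docs = [
    {"_id": "1", "type": "post", "author": "alice", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 80},
    {"_id": "3", "type": "comment", "author": "alice"},
    {"_id": "4", "type": "post", "author": "alice", "words": 40},
]

def map_fn(doc):
    """Emit (key, value) pairs; documents that don't fit simply emit nothing."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]

def reduce_fn(values):
    return sum(values)

def compute_view(documents, mapper, reducer=None):
    """Run the map over every document, group by key, then optionally reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    if reducer is None:
        return dict(grouped)
    return {key: reducer(values) for key, values in grouped.items()}

print(compute_view(docs, map_fn))             # {'alice': [120, 40], 'bob': [80]}
print(compute_view(docs, map_fn, reduce_fn))  # {'alice': 160, 'bob': 80}
```

No schema constrains what goes in; the “schema” lives in each view function and is applied only when the view is computed.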
Posted in CouchDB, Database diversity, Database theory and practice, Native XML | 3 Comments »
January 28th, 2008 Curt Monash
Question of the day #2
Who is actually using native XML?
Mark Logic is having a fine time using its native XML engine for custom publishing. One outfit I know of is using a native XML DBMS for something like web analytics, but is driving me crazy by never coming through on permission to divulge details. There’s a bit of native XML use out there supporting the insurance industry’s ACORD standard.
And after that I quickly run out of examples of native XML use. Read the rest of this entry »
Posted in Data types, IBM and DB2, Mark Logic, Microsoft and SQL*Server, Native XML, Oracle | 1 Comment »
January 27th, 2008 Curt Monash
Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
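Why a one-dimensional sort order matters: once both inputs are sorted on the join key, a sort/merge join can walk them in a single coordinated pass. The following is an in-memory textbook sketch in Python with hypothetical customer/order data; real DBMS implementations work over disk pages and index structures.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples on a key by sorting both and merging."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on the right...
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # ...and pair it with every left row carrying the same key.
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    out.append(left[i] + right_row)
                i += 1
            j = j_end
    return out

customers = [(2, "Bob"), (1, "Ann"), (3, "Cy")]
orders = [(1, "book"), (3, "dvd"), (1, "cd"), (4, "mp3")]
print(sort_merge_join(customers, orders))
# [(1, 'Ann', 1, 'book'), (1, 'Ann', 1, 'cd'), (3, 'Cy', 3, 'dvd')]
```

The algorithm depends entirely on keys comparing with `<` and `>` in one dimension, which is exactly what text, geospatial, and XML data lack.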
But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
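Geospatial data shows the sort-order problem most plainly: a location is two-dimensional, so there is no natural one-dimensional order for a B-tree to exploit. One common workaround (among several; R-trees and grid files are others) is a space-filling curve such as the Z-order or Morton curve, which interleaves the bits of the two coordinates into one integer key. The 16-bit grid resolution and sample points below are assumptions for illustration.

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of grid coordinates x and y into one Z-order integer."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code

def grid_cell(lat, lon, bits=16):
    """Quantize latitude/longitude onto a 2**bits by 2**bits grid (resolution is an assumption)."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    return x, y

def spatial_key(lat, lon, bits=16):
    return morton_code(*grid_cell(lat, lon, bits), bits=bits)

# Points can now be sorted, or stored in an ordinary B-tree, by a single integer key.
places = {"Boston": (42.36, -71.06), "Cambridge": (42.37, -71.11), "Sydney": (-33.87, 151.21)}
ordered = sorted(places, key=lambda name: spatial_key(*places[name]))
print(ordered)
```

The catch is that the curve preserves locality only partially: nearby points usually get nearby keys, but some neighbors land far apart, so range searches need extra work. That is one reason geospatial data benefits from special handling.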
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, and time series (even though they’re numeric, they benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment.
Read the rest of this entry »
Posted in Data types, GIS and geospatial, Native XML, Objects, Text | 9 Comments »
December 8th, 2007 Curt Monash
Since I was researching Software AG anyway, I took the opportunity to ask about Software AG’s native XML DBMS Tamino, which certainly has some fans. Jim Fowler, Software AG’s Director of Market Development, Enterprise Transaction Systems, was kind enough to write up the following for me:
As you know, when Tamino was released in the late 1990s it was one of the first – if not the first – commercially available native XML database. We now have several hundred Tamino customers worldwide, and Software AG is fully committed to supporting our customers.
At the same time, we recognize that XML has matured and evolved in many different directions during the past decade;
Read the rest of this entry »
Posted in Data types, Native XML, Software AG and ADABAS | No Comments »
October 22nd, 2007 Curt Monash
Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.
In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose MarkLogic, with Cache’ having been the only other choice to make the short list.
Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China). Read the rest of this entry »
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Intersystems and Cache', Mark Logic, Native XML | No Comments »
September 27th, 2007 Curt Monash
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is:
- Extending Netezza’s SQL via user-defined functions (which probably wasn’t too hard, especially since the Netezza engine is related to PostgreSQL).
- Providing a C-to-Verilog compiler.
- Providing an application development environment and associated tools. (Presumably rather primitive, but I haven’t really checked it out.)
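For readers who haven’t written one, the general idea of extending SQL with a user-defined function can be shown with SQLite through Python’s standard `sqlite3` module. This is only an illustration of the UDF concept; Netezza’s mechanism (C code compiled for its SPUs) is not shown, and `risk_bucket` and the `trades` table are hypothetical.

```python
import sqlite3

def risk_bucket(amount):
    """Hypothetical scoring function; any Python callable can be registered."""
    if amount is None:
        return None
    return "high" if amount >= 1000 else "low"

conn = sqlite3.connect(":memory:")
# Register the function so SQL can call it by name with one argument.
conn.create_function("risk_bucket", 1, risk_bucket)
conn.execute("CREATE TABLE trades (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [(1, 50.0), (2, 2500.0), (3, 999.99)])

rows = conn.execute(
    "SELECT id, risk_bucket(amount) FROM trades ORDER BY id"
).fetchall()
print(rows)  # [(1, 'low'), (2, 'high'), (3, 'low')]
```

The appeal of doing this on the database nodes, rather than in a client, is that the function runs next to the data instead of pulling every row across the network.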
The applications mentioned in the NDN press release, and I quote directly, are:
- Multi-dimensional geospatial analytics on comprehensive data sets for risk management
- Predictive model scoring for customer segmentation, enabling real-time offer provisioning for customers
- Iterative modeling and analytics on billions of call detail records (CDRs) for telco price optimization
- Real-time Monte Carlo simulations on terabytes of detail-level data for risk management
- “Fingerprinting” with hashing algorithms for chain-of-custody document fingerprinting and to ensure that files transferred are intact
- Fuzzy text search analysis uses algorithms that provide a “best guess” of most likely results
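The fingerprinting item is the easiest of these to illustrate: hash a file in chunks, and compare digests on both ends of a transfer. The press release doesn’t say which hash algorithm is meant; SHA-256 is an assumption here, and this is generic Python rather than anything Netezza ships.

```python
import hashlib
import io

def fingerprint(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def transfer_intact(sent, received):
    """A transferred file is considered intact if both fingerprints match."""
    return fingerprint(sent) == fingerprint(received)

print(fingerprint(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(transfer_intact(io.BytesIO(b"ledger"), io.BytesIO(b"ledgeR")))  # False
```

Chunked reading keeps memory use flat regardless of file size, which matters when the “files” are large data sets.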
Netezza says that the greatest interest has come from usual-suspect sophisticated users, specifically intelligence agencies and perhaps also financial services firms. But naturally, the partners actually trotted out at Netezza’s user conference were mainly hopeful small-company ISVs. The biggest stir was made by not-so-small SAS, which evidently believes this new capability will provide massive improvements to SAS/Netezza combined performance.
In principle, there are four different ways this new programmability could be a big win: Read the rest of this entry »
Posted in Data warehouse appliances, Data warehousing, Native XML, Netezza, PostgreSQL, Relational database management systems, SAS Institute, Specialized data management in general | 8 Comments »
August 12th, 2007 Curt Monash
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
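The data-reduction pattern can be sketched as a generator that consumes a stream, keeps a little state in memory, and forwards only a small, enriched subset. This is generic Python, not how any particular CEP product expresses it (those engines typically use SQL-like stream languages); the sensor readings, threshold, and window size are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, NamedTuple

class Reading(NamedTuple):
    sensor: str
    value: float

def reduce_stream(events: Iterable[Reading], threshold: float, window: int = 3) -> Iterator[dict]:
    """Forward only readings above threshold, each enriched with that sensor's moving average."""
    recent = defaultdict(lambda: deque(maxlen=window))  # last `window` values per sensor
    for event in events:
        history = recent[event.sensor]
        history.append(event.value)
        if event.value > threshold:          # filter: drop the uninteresting bulk
            yield {"sensor": event.sensor,   # enhance: attach context computed in memory
                   "value": event.value,
                   "moving_avg": sum(history) / len(history)}

events = [Reading("a", 1.0), Reading("a", 2.0), Reading("b", 10.0),
          Reading("a", 9.0), Reading("a", 3.0)]
for alert in reduce_stream(events, threshold=5.0):
    print(alert)
```

Five inputs become two outputs here; in a real deployment the ratio is what justifies putting a memory-centric engine in front of everything downstream.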
Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, IBM and DB2, Memory-centric data management, Native XML, StreamBase | No Comments »
August 3rd, 2007 Curt Monash
Complex event/stream processing vendor Coral8 raised its hand and offered a briefing – non-technical, alas, but at least it was a start. Here are some of the highlights: Read the rest of this entry »
Posted in Complex event/stream processing (CEP), Coral8, Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, StreamBase | No Comments »
July 13th, 2007 Curt Monash
I just finished a short Monash Letter on markets for nonstandard data management software. Of course, the whole thing is available only to Monash Advantage members, but here are some salient points:
- When new kinds of data are managed, new kinds of data management are used. More precisely, the old ways are tried first, but once they fail, new technologies are tried out.
- Up through the “Bowling Alley,” markets for nonstandard data management technology commonly follow the classic Geoffrey Moore pattern. However, they rarely experience a “Tornado” or mass adoption.
- I think this is apt to change. My three strongest candidates are native XML, RDF, and memory-centric event/stream processing used for data reduction (as opposed to sub-millisecond latency, which I do think will continue to be a niche requirement).
Posted in Complex event/stream processing (CEP), Hierarchies, networks, graphs, and trees, Memory-centric data management, Native XML, RDF and graphs | No Comments »
June 14th, 2007 Curt Monash
I’ve been implying that the short list for native XML database engine vendors should be Mark Logic, IBM, and maybe Microsoft, on the theory that Progress and Intersystems tried the market and pulled back. Well, add Intersystems to the list, and not necessarily in last place. They’ve long had a very fast nonrelational engine in Cache’. Perhaps building Ensemble on it has induced them to sharpen up the XML capabilities again.
Anyhow, while I’m not at liberty to explain more of my reasoning (i.e., to disclose my evidence) — Cache’ should be taken seriously as an XML DBMS alternative … even if I never can seem to get a proper DBMS briefing from them (which is far from entirely being their fault).
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Intersystems and Cache', Mark Logic, Microsoft and SQL*Server, Native XML, Progress, Apama, and DataDirect | 1 Comment »
August 26th, 2006 Curt Monash
I’ve been interested in the Mark Logic story from the first time CEO Dave Kellogg told me about it. Basically, Mark Logic sells an XML-based DBMS optimized for text search, called MarkLogic Server. For obvious reasons, they don’t want to position it as a DBMS; hence they call it an “XML content server” instead. I posted about their marketing and application focus over on Text Technologies. In this post, I’ll dive a little deeper into the core technology.
Read the rest of this entry »
Posted in Hierarchies, networks, graphs, and trees, Mark Logic, Native XML | 1 Comment »
May 15th, 2006 Curt Monash
Philip Howard likes DB2’s Viper release. Truth be told, Philip Howard seems to like most products, whether they deserve it or not. But in this case, I think his analysis is spot-on.
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Native XML, OLTP database management, Relational database management systems | No Comments »
May 2nd, 2006 Curt Monash
I had a chat a couple of weeks ago with Bob Picciano, who runs servers (i.e., DBMS) for IBM. I came away feeling that, while they don’t use that name, they’re well down the DBMS2 path. By no means is this SAP’s level of commitment; after all, they have to cater to traditional technology strategies as well. But they definitely seem to be getting there.
Why do I say that? Well, in no particular order:
- They have a huge commitment to a data integration business, with an increasing XML focus.
- Their favorite buzzword these days is “information-intensive,” which seems to amount to semi-composite apps that may talk in part to unstructured/semi-structured data.
- They’re serious about their XML data server.
- Unprompted – well, OK, he’s clearly read my stuff, but other than that it was unprompted – Bob referred to one of the key benefits (real and perceived) of XML storage as being “schema flexibility.”
- By accident or design, IBM has a multi-server, horses-for-courses DBMS strategy: DB2 in two important flavors, XML server, Multivalue/Pick (that’s growing, by the way), and so on.
The big piece of a DBMS2 strategy that IBM seems to be lacking is a data-oriented services repository. IBM has had disasters in the past with over-grand repository plans, so they’re treading cautiously this time around. There also might be an organizational issue; DBMS and integration technology sit in separate divisions, and I doubt it’s yet appreciated throughout IBM how central data is to an SOA strategy.
But that not-so-minor detail aside, IBM definitely seems to be developing a DBMS2-like technology vision.
Posted in EII, ETL, and/or EAI, Hierarchies, networks, graphs, and trees, IBM and DB2, Native XML, OLTP database management, Relational database management systems | No Comments »
April 10th, 2006 Curt Monash
IBM’s recent press release on Viper says:
Viper is expected to be the only database product able to seamlessly manage both conventional relational data and pure XML data without requiring the XML data to be reformatted or placed into a large object within the database.
That, so far as I know, is true, at least among major products.
I’m willing to apply the “native” label to Microsoft’s implementation anyway, because conceptually there’s little or no necessary performance difference between their approach and IBM’s. (Dang. I thought I posted more details on that months ago. I need to remedy the lack soon.)
As for Oracle — well, right now Oracle has a bit of a competitive problem …
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Microsoft and SQL*Server, Native XML, Oracle | 1 Comment »
April 6th, 2006 Curt Monash
From Barbara Darrow’s “Unblog”:
“How we store XML on the database is, excuse me, none of your business. The point is you can write an app using XML standards,” said Mark Drake, manager of product management for XML technology for the Redwood Shores, Calif. vendor.
“Whether we shred it, parse it, it doesn’t matter. There is no such thing as a native XML storage model, there is no W3C standard or 11th stone tablet, telling us how,” he noted.
So implementation doesn’t matter? I.e., performance doesn’t matter?
That’s not generally Oracle’s viewpoint in areas where it has a performance or implementation advantage, or even parity …
Posted in Hierarchies, networks, graphs, and trees, Native XML, Oracle | 3 Comments »
March 14th, 2006 Curt Monash
Software AG consultant Jose Huerga reminded me that Software AG has been selling XML database managers for a long time, and that they are now up to Release 4.4 of Tamino.
Personally, I’m out of touch with Software AG (e.g., I last visited Darmstadt in 1984). Would anybody care to share knowledge of or experiences with this product?
Posted in Hierarchies, networks, graphs, and trees, Native XML | 4 Comments »
January 31st, 2006 Curt Monash
IBM announced the freeware version of DB2 today. I’ll post links to the details later, but I want to highlight a couple of interesting implications:
1. They define the cutoff between the free and paid version not by how big a database you can manage on disk, but rather by how much RAM the software can address. This supports my thesis that effective use of RAM is crucial to DBMS performance, and its corollary: specially optimized memory-centric data management products deserve a place in most large enterprises’ product portfolios.
2. Having a free version of DB2 lets one play with whatever features DB2 may have that simply aren’t available in other DBMS, to see if they’re worth using. And the most significant such feature, in my opinion, is native XML storage. Whatever else this product does or doesn’t accomplish, it may serve to speed adoption of IBM’s native XML server technology.
Posted in Hierarchies, networks, graphs, and trees, IBM and DB2, Memory-centric data management, Mid-range DBMS, Native XML, OLTP database management, Relational database management systems | No Comments »
January 26th, 2006 Curt Monash
In my recent column on XML storage (referenced a couple of posts back), I referenced a Microsoft-provided example of an inventory database. A retailer (I think an online one) wanted to manage books and DVDs and so on, and search across attributes that were common to the different entity kinds, such as title.
Obviously, there are relational alternatives. Items have unique SKU numbers, and they have one of a limited number of kinds, and a set of integrity constraints could mandate that an item was listed in the appropriate table for its kind and no other, and then common attributes could be searched on via views that amounted to unions (or derived tables kept synchronized via their own integrity constraints).
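Here is roughly what that relational alternative looks like, sketched in SQLite via Python with hypothetical data. The `CHECK` and foreign-key constraints enforce only part of the rule; guaranteeing that an item appears in the table for its kind “and no other” would need triggers, which are omitted. Searching the common `title` attribute goes through a `UNION ALL` view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE items (sku TEXT PRIMARY KEY,
                    kind TEXT NOT NULL CHECK (kind IN ('book', 'dvd')));
CREATE TABLE books (sku TEXT PRIMARY KEY REFERENCES items(sku),
                    title TEXT, author TEXT, pages INTEGER);
CREATE TABLE dvds  (sku TEXT PRIMARY KEY REFERENCES items(sku),
                    title TEXT, director TEXT, minutes INTEGER);
-- The common attribute is searched through a union view over the per-kind tables.
CREATE VIEW titles AS
    SELECT sku, 'book' AS kind, title FROM books
    UNION ALL
    SELECT sku, 'dvd' AS kind, title FROM dvds;
""")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("B1", "book"), ("B2", "book"), ("D1", "dvd")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)",
                 [("B1", "Database Design", "Smith", 300), ("B2", "The Sea", "Jones", 200)])
conn.execute("INSERT INTO dvds VALUES (?, ?, ?, ?)", ("D1", "Database Wars", "Lee", 95))

rows = conn.execute(
    "SELECT sku, kind FROM titles WHERE title LIKE ? ORDER BY sku", ("%Database%",)
).fetchall()
print(rows)  # [('B1', 'book'), ('D1', 'dvd')]
```

Adding a new kind of item means a new table, a new `CHECK` value, and a new branch in the view, which is the schema-evolution cost Microsoft’s reply is about.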
I pushed back at Microsoft — which is, you may recall, not just an XML advocate but also one of the largest RDBMS vendors — with this kind of reasoning, and they responded with the following, which I just decided to (with permission) post verbatim.
“If all you ever do is manage books and DVDs, then managing them relationally works well, especially if their properties do not change. However, you may want to add CDs and MP3 on memory cards and many other items that all have different properties. Then you quickly run into an administration overhead and may not be able to keep up with your schema evolution (and you need an additional DBA for managing the complex relational schema). Even if you use a relational approach that stores common properties in joint tables, the recomposition costs of the information for one item may become too expensive to bear.”
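For contrast, a sketch of the XML side of the argument: each item carries whatever properties its kind needs, a new kind (here a hypothetical MP3 memory card) is added with no schema change, and the common `title` element is still searchable in one pass. Python’s `xml.etree.ElementTree` stands in for a real XQuery engine to keep the example self-contained; the catalog data is hypothetical.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <item sku="B1" kind="book"><title>Database Design</title><author>Smith</author><pages>300</pages></item>
  <item sku="D1" kind="dvd"><title>Database Wars</title><director>Lee</director><minutes>95</minutes></item>
  <item sku="M1" kind="mp3card"><title>Database Blues</title><artist>Kim</artist><capacity_mb>512</capacity_mb></item>
</catalog>
""")

def skus_with_title_containing(root, word):
    """Case-insensitive search on the one attribute all item kinds share."""
    return [item.get("sku") for item in root.findall("item")
            if word.lower() in (item.findtext("title") or "").lower()]

print(skus_with_title_containing(catalog, "database"))  # ['B1', 'D1', 'M1']
print(skus_with_title_containing(catalog, "wars"))      # ['D1']
```

Reading one item back is also a single element fetch rather than a multi-table recomposition, which is the cost Microsoft’s last sentence points to.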
Posted in Hierarchies, networks, graphs, and trees, Microsoft and SQL*Server, Native XML | 3 Comments »
January 16th, 2006 Curt Monash
After several months of headfakes, I finally did a column on XML storage this month. There turned out to be room for application discussion, but not for much technical nitty-gritty.
The app discussion is pretty consistent with what I’d already posted here, although I wish I’d gone into more detail on the inventory database example. (Stay tuned for followup here!)
I also intend to post soon with some technical detail about how XML storage is actually handled.
I also got some good insight from MarkLogic about what customers wanted in their text-centric markets. More on that soon too.
And by the way — I didn’t pick the Oracle-bashing title. I also didn’t pick the Oracle-bashing title for my Network World “Hot Seat” video. But somehow, the Oracle-doubting parts of my views are of special interest to my friends in the media. And it’s not as if the titles say anything I actually disagree with …
Posted in Hierarchies, networks, graphs, and trees, Native XML, OLTP database management, Oracle | 2 Comments »
January 9th, 2006 Curt Monash
It’s not that easy to find detailed, vendor-neutral explanations of XML storage in RDBMS. One reason may be that there isn’t much vendor-neutral reality to talk about yet; each implementation is different.
Anyhow, while it’s not overwhelming, I found one book chapter online that’s fairly useful for reviewing one or another somewhat murky area of the technology. Here’s a link to the section on shredding.
The book in question is a collection of chapters by various XQuery experts, a couple of whom have made strong, direct contributions to my research for this blog. I’m not sure I see the point in buying ANY book about a technology area so ill-defined and fast-changing, especially one over a year old. But if I did want a book, it would be very high on my list of ones to consider.
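As a concrete picture of what “shredding” means, here is one of the simplest generic schemes, an “edge table” in which every element becomes a row holding its parent’s id, tag, and text. It is one approach among several (the chapter and each vendor differ), attributes are ignored for brevity, and the order document is hypothetical. Note that even a short path query costs a self-join per step, and reassembling the whole document costs more.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text, conn):
    """Store each element of one XML document as a row in a generic 'edge' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS nodes ("
                 "id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    next_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM nodes").fetchone()[0]

    def walk(elem, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        text = (elem.text or "").strip() or None  # whitespace-only text stored as NULL
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, elem.tag, text))
        for child in elem:  # children visited in document order
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)

conn = sqlite3.connect(":memory:")
shred("<order><customer>Acme</customer>"
      "<line><sku>A1</sku><qty>2</qty></line>"
      "<line><sku>B2</sku><qty>5</qty></line></order>", conn)

# "Which SKUs appear on order lines?" needs a self-join for the line/sku step.
skus = conn.execute(
    "SELECT c.text FROM nodes AS p JOIN nodes AS c ON c.parent = p.id "
    "WHERE p.tag = 'line' AND c.tag = 'sku' ORDER BY c.id"
).fetchall()
print(skus)  # [('A1',), ('B2',)]
```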
Posted in Native XML | No Comments »
December 14th, 2005 Curt Monash
From a DevX article on Microsoft’s SQL Server 2005
Depending on your situation, XML can also be the best choice for storing even highly structured data. Here are a few practical reasons to consider storing data in a field of type XML:
* Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML. You can always expose it to relational consumers with an XML view.
* Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software. Extensibility is an extra advantage when prototyping, or when working with rapidly changing problem domains such as bioinformatics.
* Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.
* Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.
Nothing there to disagree with too heavily, although I can think of some other reasons that might rank higher yet.
Posted in Hierarchies, networks, graphs, and trees, Microsoft and SQL*Server, Native XML | 4 Comments »
November 17th, 2005 Curt Monash
The introduction and technical-implementation part of this discussion was in Part 1.
It seems likely that widespread adoption of native XML storage is, at best, several years off, if for no other reason than that the DML (Data Manipulation Language) situation is still rather primitive. But looking beyond that nontrivial problem, it does seem as if there are broad classes of application that might go better in native XML. Here’s a survey.
First of all, there’s what might be called custom document composition – technical publishing, customized technical manuals, etc. If you make complex products, or sell information, this is obviously an important specialty application for you. Otherwise, it probably is rather peripheral, at least for now. If you do have an interest in this area, by the way, you shouldn’t only look at the big guys’ XML offerings; you should even talk to specialists like Mark Logic. (Mark Logic sells an XML-only DBMS with a strong text-search orientation.)
Second, there are complex documents with low update rates. Medical records are a prime example – and, by the way, many of those are stored in InterSystems’ OODBMS Cache rather than in a relational system. Other examples might include insurance claims, media assets, etc. – basically, the areas that have been thought of as the purview of document management systems. In many cases, these apps ain’t broke and shouldn’t be fixed, such as when they exist mainly to satisfy slow-changing regulatory requirements. Besides, it’s not obvious that native XML is particularly useful for these apps anyway. Often, the information is in a DBMS for three main reasons: General manageability (e.g., backup), ad-hoc searchability, and management of metadata. If the metadata is simple enough to fit comfortably into a tabular structure, extended-relational DBMS may be satisfactory as underpinnings for these apps indefinitely.
Third, and here’s where it really begins to get interesting, is complex transactional documents. One of the flagship apps in Viper’s alpha test was financial derivatives trading, with complex, number-laden, term-laden contracts being processed very quickly, and it’s easy to envision that kind of functionality spreading across the trading sector. Governments – wisely or not – may want to require new complex forms to be filled out, or to make older ones easier to process. (E.g., tax returns, or applications for various kinds of permits.) If privacy concerns allow, medical information might be collected and processed centrally by governments or large insurance providers. Complex service-level agreements could be negotiated for a broad variety of product and service categories. Customers might demand radically faster processing of insurance claims than has historically been necessary. Indeed, it’s hard to think of an industry sector where complex transactional documents might not gain a foothold. And if you’re looking for high performance access to portions of documents, native XML may well be the best storage choice.
Finally, there’s a fourth category, which I’ll give the trendy-looking name Profiles 2.0, in imitation of Web 2.0, Identity 2.0, and so on. Here’s what I mean by it. A number of the hottest buzzconcepts in computing focus on collecting, organizing, and using information about individual people – presence, identity, personalization/customization, narrowcasting/market-of-one, data mining/predictive analytics, weblog analysis, social software, and so on. Put all those together, and you have a humongous hairball of a user profile that no current systems come close to handling properly.
Let’s think about some characteristics of this data. Some of it is transient. Some of it is unreliable. Some of it indeed is guesswork – albeit educated guesswork – rather than fact (e.g., the results of data mining analyses). Much of it exists for some profilees but not others. Much of it is naturally tree- or graph-shaped (e.g., information about website traversal, product category interests, relationship networks, role-based authorizations, etc.) There are many kinds of it; pulling it all together relationally can lead to Joins From Hell.
And this isn’t just for individuals; similar kinds of stories can be told for information about organizations, battleships, and so on. Those are objects with rich internal structures. True, those can usually be modeled hierarchically – but at each node, some of the complications mentioned in the prior paragraph occur. Profiling an enterprise is even messier than profiling a single individual who shops or works there.
Applications using this kind of information are typically extremely primitive, even though the beginnings of the personalization hype are now 7-8 years in the past. I don’t think we’re going to get these kinds of systems right until we take a true, holistic view of individuals and their profiles – and until we learn how to think about apps whose fundamental objects keep changing in shape. But as hard as the problem is, it has to be worked on immediately, because what I’m talking about here are some of the major classes of competitive-advantage app.
So Profiles 2.0 isn’t something we can just ignore. And when we do pay attention to it, I don’t think we’ll find that it looks very natural dressed in rows and columns.
Posted in Hierarchies, networks, graphs, and trees, Intersystems and Cache', Mark Logic, Native XML | 2 Comments »