September 7, 2012

Integrated internet system design

What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.

Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.

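The top integration and integration-like challenges for me, from a practical standpoint, are the four discussed below: integrating silos, dynamic schemas with joins, low-latency business intelligence, and more responsive personalization.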

Other concerns that get mentioned include:

Let’s skip those latter issues for now, focusing instead on the first four.

Integrating silos

While the software industry has been working on application integration for decades, there’s clearly a long way yet to go. Let me illustrate by way of a personal story.

I needed a new laptop computer on short notice, and decided to go with an HP Folio.* Driving to a local Wal-Mart seemed more practical than ordering online, as a couple of stores near my house were listed as having it in stock. I called just to check; both were out of stock. The Wal-Mart folks on the phone told me such errors are routine.

*It was pretty much the cheapest all-solid-state credible alternative I could find, is said to have a good keyboard, and has an Ethernet port for all those client visits when guest Wi-Fi doesn’t work.

You may recall my outraged tweets about a similar silos-of-non-integration story in Dell customer support, a couple of years back. Yet Dell is one of the larger computer companies in the world, while Wal-Mart is one of the most accomplished computer users. If Wal-Mart and Dell can’t get basic system functionality right, just imagine how screwed up everybody else is.

Dynamic schemas with joins

There are multiple reasons to use dynamic schemas over fixed ones. This is especially true when recording web interaction data, because every page can have very different information to log. But there are also multiple reasons to want to use joins, especially when your application combines two or more of:

That doesn’t mean that a fully general join syntax is needed in every DBMS. But it does mean that the workarounds to joinlessness I wrote about a couple of years ago often don’t suffice.
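To make the combination concrete, here is a minimal Python sketch of a join between dynamic-schema log records and a fixed-schema table. It is an illustration only, not any particular DBMS's approach; the `users` table, the `events` log, and the `join_events_to_users` helper are all hypothetical names invented for the example.

```python
# Hypothetical fixed-schema table: one row per user, same columns every time.
users = {
    1: {"name": "Ann", "plan": "pro"},
    2: {"name": "Bob", "plan": "free"},
}

# Hypothetical dynamic-schema log: each page records whatever fields it likes.
events = [
    {"user_id": 1, "page": "search", "query": "laptop"},
    {"user_id": 2, "page": "product", "sku": "folio-13", "in_stock": False},
    {"user_id": 3, "page": "home"},  # no matching user row
]


def join_events_to_users(events, users):
    """Inner-join events to users on user_id, keeping every event field."""
    joined = []
    for event in events:
        user = users.get(event.get("user_id"))
        if user is None:
            continue  # inner join: drop events with no matching user
        row = dict(event)
        # Prefix user columns to make clashes with ad hoc event fields less likely.
        row.update({f"user_{key}": value for key, value in user.items()})
        joined.append(row)
    return joined
```

The joined rows keep whatever fields each page happened to log, so the result is itself dynamic-schema. Doing this in application code for every query is exactly the kind of workaround that stops scaling once several such tables are involved.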

Fortunately, much better stuff is being developed. The best that I know of still awaits launch — but I’ve begun to connect users with vendors who can address that problem head-on.

Low-latency business intelligence

If you have data pounding into a short-request system, there are several levels of BI you could try to do on it in human real time.

*Counters are the canonical example.

Single-server RDBMS have, for years, combined OLTP (OnLine Transaction Processing) and a reasonable amount of reporting or BI. As needs get more intense, Oracle and SAP are throwing hardware at the problem, via Exadata, Exalytics, HANA, and so on. But suppose you prefer a short-request system that scales out, runs on cheap commodity hardware, and fits well into the cloud. What do you do then?

One approach, which in some form I’ve recommended to multiple clients, is to stream the data to some kind of analytic data store, and serve your analytics from there. That technology is getting better all the time, even though many vendors haven’t yet recognized the magnitude of the need and opportunity.
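As a rough sketch of the streaming idea, using the counters mentioned above as the simplest case: each event from the short-request system is forwarded to a separate store that keeps per-minute aggregates, and reports read only those aggregates. The `MinuteCounters` class and the `ts`/`page` event fields are assumptions made for illustration; a real deployment would put a message queue and a proper analytic DBMS in these roles.

```python
from collections import Counter


class MinuteCounters:
    """Toy stand-in for an analytic store fed by a stream of events."""

    def __init__(self):
        self._counts = Counter()  # (minute, page) -> number of hits

    def ingest(self, event):
        # Bucket by whole minute so queries read small pre-aggregated rows.
        minute = int(event["ts"]) // 60
        self._counts[(minute, event["page"])] += 1

    def hits(self, page, start_ts, end_ts):
        """Total hits for a page over the minutes covering [start_ts, end_ts]."""
        first, last = int(start_ts) // 60, int(end_ts) // 60
        return sum(self._counts[(m, page)] for m in range(first, last + 1))
```

A dashboard would call something like `store.hits("home", now - 300, now)` to chart the last five minutes without touching the transactional system. Granularity is one minute, so a window that starts or ends mid-minute includes the whole bucket.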

More responsive personalization

Another kind of human-real-time analytics is even more important — automated response, such as ad personalization. Ideally, you want your response to be well-informed by everything the user has been doing over the past few minutes and even seconds. But two difficulties loom.

First, if we combine this point and the previous two, we might ideally want to stream data from a NoSQL store to an analytic one and back to a short-request SQL DBMS. That would be — complicated. Fortunately, there are a variety of not-crazy approaches, with varying degrees of cost, pain, or risk, with more coming soon as different kinds of data stores somewhat re-converge.

The second problem is more conceptual. What are the models and algorithms that tell us how to personalize based on up-to-the-second information? Since only the most simple-minded approaches seem practical to implement, only the most simple-minded answers have ever been worked out. A lot of data science lies ahead — and for once I don’t think that term is overwrought.
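For a sense of what "simple-minded" means here, the following Python sketch personalizes on nothing more than the most frequent content category a user touched in the last few minutes. Everything in it is an assumption for illustration: the `RecentActivity` class, the category labels, the five-minute default window, and the requirement that events arrive in timestamp order.

```python
from collections import Counter, deque


class RecentActivity:
    """Sliding window over one user's recent events (assumed to arrive in time order)."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, category), oldest first

    def record(self, ts, category):
        self._events.append((ts, category))

    def top_category(self, now, default=None):
        # Evict anything older than the window before counting.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        if not self._events:
            return default
        counts = Counter(category for _, category in self._events)
        return counts.most_common(1)[0][0]
```

An ad server might call `top_category(now)` on each request and fall back to a default when the window is empty. The sketch ignores recency weighting, sequence, and cross-user signals, which is where the harder modeling questions begin.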

And with that I’m shutting down for 2 1/2 weeks for vacation. Depending on how things go with my new HP Folio :) , as well as Wi-Fi in Istanbul, I hope to be fairly responsive to blog comments and email, and indeed will work on setting up a long October California trip. But I also hope that, for once, there isn’t any vacation-busting news; I’ve had some bad luck in that regard before, professionally and personally alike.


4 Responses to “Integrated internet system design”

  1. Rob Klopp on September 7th, 2012 3:07 pm

    Isn’t everybody throwing hardware at the problem?

    Applying NoSQL systems that solve a subset of the problem very efficiently and then replicating data to an analytics data store to solve for the places NoSQL is inefficient is throwing hardware at the problem.

    Then replicating the data to a data warehouse to tie it together over time… and so that you avoid having your HP Folio appear in stock when it is actually out… throws more hardware still.

  2. Curt Monash on September 7th, 2012 3:34 pm

    Everybody throws hardware at a problem. Not everybody throws whole beachfuls of silicon.

  3. Zip on September 12th, 2012 10:38 am

    Any thoughts on FoundationDB?

  4. Brian on September 12th, 2012 5:37 pm

    Great stuff and content..I run a dba website You seem to be on the cutting edge of database systems.. The integration part is what I am seeing as an overall problem. It’s been my experience that these store IP systems are never up to date.. it just takes one bonehead employee not to do a stock update for the system to be out of stock! Oh 1st world problems right?!
