Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:
I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.
1. So far, the buzzword of the year is “real-time analytics”, generally with “operational” or “big data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. AeroSpike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. WibiData), log analysis vendors (led by Splunk), data management vendors (e.g. Cloudera), and of course the CEP industry.
Yeah, yeah, I know — not all the named companies are in exactly the right market category. But that’s hard to avoid.
Why this gold rush? On the demand side, there’s a real or imagined need for speed. On the supply side, I’d say:
- There are vast numbers of companies offering data-management-related technology. They need ways to differentiate.
- Doing analytics at short-request speeds is an obvious data-management-related challenge, and not yet comprehensively addressed.
2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas — and these overlap — are:
- Customer interaction
- Network and sensor monitoring
- Game and mobile application back-ends
Also arising fairly frequently are:
- Algorithmic trading
- Risk measurement
- Law enforcement/national security
- Stakeholder-facing analytics
I’m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.
In typical debates, the extremists on both sides are wrong. “SQL vs. NoSQL” is an example of that rule. For many traditional categories of database or application, it is reasonable to say:
- Relational databases are usually still a good default assumption …
- … but increasingly often, the default should be overridden with a more useful alternative.
Reasons to abandon SQL in any given area usually start:
- Creating a traditional relational schema is possible …
- … but it’s tedious or difficult …
- … especially since schema design is supposed to be done before you start coding.
Some would further say that NoSQL is cheaper, scales better, is cooler or whatever, but given the range of NewSQL alternatives, those claims are often overstated.
Sectors where these reasons kick in include but are not limited to: Read more
|Categories: Health care, Investment research and trading, Log analysis, NewSQL, NoSQL, Web analytics||8 Comments|
I recently opined that, especially for cutting-edge internet businesses, analytic applications were not a realistic option; rather, analytic application subsystems are the most you can currently expect. Erin Griffith further observed that the problem isn’t just confined to analytics:
“We didn’t need 90 percent of the stuff they were offering, and when we told them what we did need — integration with social, curation tools, individual boutiques and analytics — they had nothing”
… a suitable solution to merge his editorial staff’s output with his separate site for selling tickets to events and goods … was not available, so had to build his own hybrid publishing and commerce platform. Likewise, Birchbox had to build a custom backend so that it could include videos and editorial content alongside its e-commerce site.
… it’s DIY or die.
With that as background, let’s consider why building business-to-consumer internet software is so complicated.
I’d suggest that a consumer website starts with four major conceptual parts: Read more
With Strata/Hadoop World being next week, there is much Hadoop discussion. One theme of the season is BI over Hadoop. I have at least 5 clients claiming they’re uniquely positioned to support that (most of whom partner with a 6th client, Tableau); the first 2 whose offerings I’ve actually written about are Teradata Aster and Hadapt. More generally, I’m hearing “Using Hadoop is hard; we’re here to make it easier for you.”
If enterprises aren’t yet happily running business intelligence against Hadoop, what are they doing with it instead? I took the opportunity to ask Cloudera, whose answers didn’t contradict anything I’m hearing elsewhere. As Cloudera tells it (approximately — this part of the conversation* was rushed): Read more
|Categories: Business intelligence, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, HBase, Health care, Investment research and trading, MapR, Market share and customer counts, Telecommunications, Web analytics||4 Comments|
In a short October, 2011 post about Datameer, I wrote:
Datameer is designed to let you do simple stuff on large amounts of data, where “large amounts of data” typically means data in Hadoop, and “simple stuff” includes basic versions of a spreadsheet, of BI, and of EtL (Extract/Transform/Load, without much in the way of T).
That’s all still mainly true, although with the recent Datameer 2.0:
- You can run Datameer and the underlying Hadoop on a desktop or workgroup group.
- There are some infographics pretty-picture-drawing capabilities, which will surely delight those who like vector-based HTML 5 pictures of coffee cups, saucers and macaroons.
- No doubt Datameer has been generally enhanced on multiple fronts.
In essence, Datameer has two positionings.
- One is “OK, you’ve got Hadoop — now wouldn’t you like to do something useful with it?” That can include both business intelligence and ETL.
- Beyond that, Datameer founder/CEO Stefan Groschupf’s core argument is that schema-on-read is really, really useful, even at the cost of absorbing a potentially large performance hit. In other words, he’s making a case for a form of non-relational BI.
|Categories: Business intelligence, Data models and architecture, Datameer, EAI, EII, ETL, ELT, ETLT, Hadoop, Log analysis, Market share and customer counts, Web analytics||8 Comments|
I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:
- In-memory relational DBMS.
- Being released single-box only. Transparent sharding is under development for release in the fall. Basic replication is under development too.
- Subset of SQL-92.
- MySQL wire-compatible (SQL coverage issues excepted).
MemSQL’s performance claims include:
- Read performance 10% or so worse than memcached.
- Write performance 20% or so better than memcached.
- 1.2 million inserts/second on a 64-core, 1/2 TB of RAM machine.
- Similarly, 1/2 billion records loaded in under 20 minutes.
MemSQL company basics include: Read more
|Categories: Database compression, In-memory DBMS, Investment research and trading, Market share and customer counts, memcached, MemSQL, OLTP, Pricing, Web analytics||3 Comments|
I previously dropped a few hints about my clients at Metamarkets, mentioning that they:
- Have built vertical-market analytic platform technology.
- Use a lot of Hadoop.
- Throw good parties. (That’s where the background photo on my Twitter page comes from.)
But while they’re a joy to talk with, writing about Metamarkets has been frustrating, with many hours and pages of wasted of effort. Even so, I’m trying again, in a three-post series:
Much like Workday, Inc., Metamarkets is a SaaS (Software as a Service) company, with numerous tiers of servers and an affinity for doing things in RAM. That’s where most of the similarities end, however, as Metamarkets is a much smaller company than Workday, doing very different things.
Metamarkets’ business is SaaS (Software as a Service) business intelligence, on large data sets, with low latency in both senses (fresh data can be queried on, and the queries happen at RAM speed). As you might imagine, Metamarkets is used by digital marketers and other kinds of internet companies, whose data typically wants to be in the cloud anyway. Approximate metrics for Metamarkets (and it may well have exceeded these by now) include 10 customers, 100,000 queries/day, 80 billion 100-byte events/month (before summarization), 20 employees, 1 popular CEO, and a metric ton of venture capital.
To understand how Metamarkets’ technology works, it probably helps to start by realizing: Read more
Last November, I wrote two posts on agile predictive analytics. It’s time to return to the subject. I’m used to KXEN talking about the ability to do predictive modeling, very quickly, perhaps without professional statisticians; that the core of what KXEN does. But I was surprised when Revolution Analytics told me a similar story, based on a different approach, because ordinarily that’s not how R is used at all.
Ultimately, there seem to be three reasons why you’d want quick turnaround on your predictive modeling: Read more
|Categories: Business intelligence, Investment research and trading, KXEN, Predictive modeling and advanced analytics, Revolution Analytics, Telecommunications, Web analytics||10 Comments|
There are several reasons it’s hard to confirm great analytic user stories. First, there aren’t as many jaw-dropping use cases as one might think. For as I wrote about performance, new technology tends to make things better, but not radically so. After all, if its applications are …
… all that bloody important, then probably people have already been making do to get it done as best they can, even in an inferior way.
Further, some of the best stories are hard to confirm; even the famed beer/diapers story isn’t really true. Many application areas are hard to nail down due to confidentiality, especially but not only in such “adversarial” domains as anti-terrorism, anti-spam, or anti-fraud.
Even so, I have two questions in my inbox that boil down to “What are the coolest or most significant analytics stories out there?” So let’s round up some of what I know. Read more
|Categories: Analytic technologies, Google, Health care, Investment research and trading, Predictive modeling and advanced analytics, Scientific research, Telecommunications, Web analytics||6 Comments|
It is a reasonable (over)simplification to say that my business boils down to:
- Advising vendors what/how to sell.
- Advising users what/how to buy.
One complication that commonly creeps in is that different groups of users have different buying practices and technology needs. Usually, I nod to that point in passing, perhaps by listing different application areas for a company or product. But now let’s address it head on. Whether or not you care about the particulars, I hope the sheer length of this post reminds you that there are many different market segments out there.
Last June I wrote:
In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.
I’d further say that it matters whether the buyer:
- Is a large central IT organization.
- Is the well-staffed IT organization of a particular business department.
- Is a small, frazzled IT organization.
- Has strong engineering or technical skills, but less in the way of IT specialists.
- Is trying to skate by without much technical knowledge of any kind.
Now let’s map those considerations (and others) to some specific market segments. Read more
|Categories: Data mart outsourcing, Games and virtual worlds, IBM and DB2, Investment research and trading, Microsoft and SQL*Server, Open source, Software as a Service (SaaS), Telecommunications, Web analytics||9 Comments|