Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:
- Any subcategory
- (in Text Technologies) Specific application areas for text analytics
Interana has an interesting story, in technology and business model alike. For starters:
- Interana does ad-hoc event series analytics, which they call “interactive behavioral analytics solutions”.
- Interana has a full-stack analytic offering, include:
- Its own columnar DBMS …
- … which has a non-SQL DML (Data Manipulation Language) meant to handle event series a lot more fluently than SQL does, but which the user is never expected to learn because …
- … there also are BI-like visual analytics tools that support plenty of drilldown.
- Interana sells all this to “product” departments rather than marketing, because marketing doesn’t sufficiently value Interana’s ad-hoc query flexibility.
- Interana boasts >40 customers, with annual subscription fees ranging from high 5 figures to low 7 digits.
And to be clear — if we leave aside any questions of marketing-name sizzle, this really is business intelligence. The closest Interana comes to helping with predictive modeling is giving its ad-hoc users inspiration as to where they should focus their modeling attention.
Interana also has an interesting twist in its business model, which I hope can be used successfully by other enterprise software startups as well. Read more
0. A huge fraction of what’s important in analytics amounts to making sure that you are analyzing the right data. To a large extent, “the right data” means “the right subset of your data”.
1. In line with that theme:
- Relational query languages, at their core, subset data. Yes, they all also do arithmetic, and many do more math or other processing than just that. But it all starts with the set theory.
- Underscoring the power of this approach, other data architectures over which analytics is done usually wind up with SQL or “SQL-like” language access as well.
2. Business intelligence interfaces today don’t look that different from what we had in the 1980s or 1990s. The biggest visible* changes, in my opinion, have been in the realm of better drilldown, ala QlikView and then Tableau. Drilldown, of course, is the main UI for business analysts and end users to subset data themselves.
*I used the word “visible” on purpose. The advances at the back end have been enormous, and much of that redounds to the benefit of BI.
3. I wrote 2 1/2 years ago that sophisticated predictive modeling commonly fit the template:
- Divide your data into clusters.
- Model each cluster separately.
That continues to be tough work. Attempts to productize shortcuts have not caught fire.
|Categories: Business intelligence, Data warehousing, Databricks, Spark and BDAS, Google, Log analysis, Memory-centric data management, Predictive modeling and advanced analytics, QlikTech and QlikView, Tableau Software, Web analytics||Leave a Comment|
For starters, let me say:
- SequoiaDB, the company, is my client.
- SequoiaDB, the product, is the main product of SequoiaDB, the company.
- SequoiaDB, the company, has another product line SequoiaCM, which subsumes SequoiaDB in content management use cases.
- SequoiaDB, the product, is fundamentally a JSON data store. But it has a relational front end …
- … and is usually sold for RDBMS-like use cases …
- … except when it is sold as part of SequoiaCM, which adds in a large object/block store and a content-management-oriented library.
- SequoiaDB’s products are open source.
- SequoiaDB’s largest installation seems to be 2 PB across 100 nodes; that includes block storage.
- Figures for DBMS-only database sizes aren’t as clear, but the sweet spot of the cluster-size range for such use cases seems to be 6-30 nodes.
- SequoiaDB, the company, was founded in Toronto, by former IBM DB2 folks.
- Even so, it’s fairly accurate to view SequoiaDB as a Chinese company. Specifically:
- SequoiaDB’s founders were Chinese nationals.
- Most of them went back to China.
- Other employees to date have been entirely Chinese.
- Sales to date have been entirely in China, but SequoiaDB has international aspirations
- SequoiaDB has >100 employees, a large majority of which are split fairly evenly between “engineering” and “implementation and technical support”.
- SequoiaDB’s marketing (as opposed to sales) department is astonishingly tiny.
- SequoiaDB cites >100 subscription customers, including 10 in the global Fortune 500, a large fraction of which are in the banking sector. (Other sectors mentioned repeatedly are government and telecom.)
Unfortunately, SequoiaDB has not captured a lot of detailed information about unpaid open source production usage.
|Categories: Application areas, Business intelligence, Data models and architecture, Data warehousing, Databricks, Spark and BDAS, Market share and customer counts, NoSQL, OLTP, Open source, PostgreSQL, SequoiaDB, Structured documents||4 Comments|
“Real-time” technology excites people, and has for decades. Yet the actual, useful technology to meet “real-time” requirements remains immature, especially in cases which call for rapid human decision-making. Here are some notes on that conundrum.
1. I recently posted that “real-time” is getting real. But there are multiple technology challenges involved, including:
- General streaming. Some of my posts on that subject are linked at the bottom of my August post on Flink.
- Low-latency ingest of data into structures from which it can be immediately analyzed. That helps drive the (re)integration of operational data stores, analytic data stores, and other analytic support — e.g. via Spark.
- Business intelligence that can be used quickly enough. This is a major ongoing challenge. My clients at Zoomdata may be thinking about this area more clearly than most, but even they are still in the early stages of providing what users need.
- Advanced analytics that can be done quickly enough. Answers there may come through developments in anomaly management, but that area is still in its super-early days.
- Alerting, which has been under-addressed for decades. Perhaps the anomaly management vendors will finally solve it.
|Categories: Business intelligence, Databricks, Spark and BDAS, In-memory DBMS, Investment research and trading, Log analysis, Streaming and complex event processing (CEP), Text, Web analytics, Zoomdata||2 Comments|
Then felt I like some watcher of the skies
When a new planet swims into his ken
— John Keats, “On First Looking Into Chapman’s Homer”
1. In June I wrote about why anomaly management is hard. Well, not only is it hard to do; it’s hard to talk about as well. One reason, I think, is that it’s hard to define what an anomaly is. And that’s a structural problem, not just a semantic one — if something is well enough understood to be easily described, then how much of an anomaly is it after all?
Artificial intelligence is famously hard to define for similar reasons.
“Anomaly management” and similar terms are not yet in the software marketing mainstream, and may never be. But naming aside, the actual subject matter is important.
2. Anomaly analysis is clearly at the heart of several sectors, including:
- IT operations
- Factory and other physical-plant operations
Each of those areas features one or both of the frameworks:
- Surprises are likely to be bad.
- Coincidences are likely to be suspicious.
So if you want to identify, understand, avert and/or remediate bad stuff, data anomalies are the first place to look.
3. The “insights” promised by many analytics vendors — especially those who sell to marketing departments — are also often heralded by anomalies. Already in the 1970s, Walmart observed that red clothing sold particularly well in Omaha, while orange flew off the shelves in Syracuse. And so, in large college towns, they stocked their stores to the gills with clothing in the colors of the local football team. They also noticed that fancy dresses for little girls sold especially well in Hispanic communities … specifically for girls at the age of First Communion.
|Categories: Business intelligence, Log analysis, Predictive modeling and advanced analytics, Web analytics||1 Comment|
1. The cloud is super-hot. Duh. And so, like any hot buzzword, “cloud” means different things to different marketers. Four of the biggest things that have been called “cloud” are:
- The Amazon cloud, Microsoft Azure, and their competitors, aka public cloud.
- Software as a service, aka SaaS.
- Co-location in off-premises data centers, aka colo.
- On-premises clusters (truly on-prem or colo as the case may be) designed to run a broad variety of applications, aka private cloud.
Further, there’s always the idea of hybrid cloud, in which a vendor peddles private cloud systems (usually appliances) running similar technology stacks to what they run in their proprietary public clouds. A number of vendors have backed away from such stories, but a few are still pushing it, including Oracle and Microsoft.
This is a good example of Monash’s Laws of Commercial Semantics.
2. Due to economies of scale, only a few companies should operate their own data centers, aka true on-prem(ises). The rest should use some combination of colo, SaaS, and public cloud.
This fact now seems to be widely understood.
I’ve been an analyst for 35 years, and debates about “real-time” technology have run through my whole career. Some of those debates are by now pretty much settled. In particular:
- Yes, interactive computer response is crucial.
- Into the 1980s, many apps were batch-only. Demand for such apps dried up.
- Business intelligence should occur at interactive speeds, which is a major reason that there’s a market for high-performance analytic RDBMS.
- Theoretical arguments about “true” real-time vs. near-real-time are often pointless.
- What matters in most cases is human users’ perceptions of speed.
- Most of the exceptions to that rule occur when machines race other machines, for example in automated bidding (high frequency trading or otherwise) or in network security.
A big issue that does remain open is: How fresh does data need to be? My preferred summary answer is: As fresh as is needed to support the best decision-making. I think that formulation starts with several advantages:
- It respects the obvious point that different use cases require different levels of data freshness.
- It cautions against people who think they need fresh information but aren’t in a position to use it. (Such users have driven much bogus “real-time” demand in the past.)
- It covers cases of both human and automated decision-making.
Straightforward applications of this principle include: Read more
Databricks CEO Ali Ghodsi checked in because he disagreed with part of my recent post about Databricks. Ali’s take on Databricks’ position in the Spark world includes:
- What I called Databricks’ “secondary business” of “licensing stuff to Spark distributors” was really about second/third tier support. Fair enough. But distributors of stacks including Spark, for whatever combination of on-premise and cloud as the case may be, may in many cases be viewed as competitors to Databricks cloud-only service. So why should Databricks help them?
- Databricks’ investment in Spark Summit and similar evangelism is larger than I realized.
- Ali suggests that the fraction of Databricks’ engineering devoted to open source Spark is greater than I understood during my recent visit.
Ali also walked me through customer use cases and adoption in wonderful detail. In general:
- A large majority of Databricks customers have machine learning use cases.
- Predicting and preventing user/customer churn is a huge issue across multiple market sectors.
The story on those sectors, per Ali, is: Read more
Five years ago, in a taxonomy of analytic business benefits, I wrote:
A large fraction of all analytic efforts ultimately serve one or more of three purposes:
- Problem and anomaly detection and diagnosis
- Planning and optimization
That continues to be true today. Now let’s add a bit of spin.
1. A large fraction of analytics is adversarial. In particular: Read more
|Categories: Business intelligence, Investment research and trading, Log analysis, Predictive modeling and advanced analytics, RDF and graphs, Surveillance and privacy, Web analytics||3 Comments|
Numerous tussles fit the template:
- A government wants access to data contained in one or more devices (mobile/personal or server as the case may be).
- The computer’s manufacturer or operator doesn’t want to provide it, for reasons including:
- That’s what customers prefer.
- That’s what other governments require.
- Being pro-liberty is the right and moral choice. (Yes, right and wrong do sometimes actually come into play. )
As a general rule, what’s best for any kind of company is — pricing and so on aside — whatever is best or most pleasing for their customers or users. This would suggest that it is in tech companies’ best interest to favor privacy, but there are two important quasi-exceptions: Read more
|Categories: Amazon and its cloud, Google, Microsoft and SQL*Server, Surveillance and privacy, Web analytics||2 Comments|