Oracle is buying Endeca
Oracle is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be it.
In a previous post about Endeca, I wrote:
… the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.
That kind of thing could help Oracle with apps like the wireless telco product catalog deal MongoDB got.
Going back to the Endeca-post quote well, Endeca itself said:
Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.
And I pointed out that the MDEX engine was a columnar DBMS.
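To make the quoted "no overarching schema" point concrete, here is a hypothetical Python sketch (not Endeca code, and far cruder than a real columnar engine) of deriving common dimensions from records that each carry their own metadata:

```python
from collections import defaultdict

# Hypothetical records, each carrying its own fields ("metadata") --
# no shared schema is declared up front.
records = [
    {"type": "camera", "brand": "Acme", "megapixels": 12, "price": 299},
    {"type": "phone",  "brand": "Acme", "screen_inches": 4.3, "price": 499},
    {"type": "review", "brand": "Acme", "sentiment": "positive"},
]

# Derive "common dimensions": collect values column-style and see which
# fields are shared widely enough to serve as navigation facets.
columns = defaultdict(set)
for rec in records:
    for field, value in rec.items():
        columns[field].add(value)

for field in sorted(columns, key=lambda f: -sum(f in r for r in records)):
    coverage = sum(field in r for r in records) / len(records)
    print(f"{field}: coverage={coverage:.0%}, values={sorted(map(str, columns[field]))}")
```

Fields that show up across most records ("brand", "price") become natural facets for guided navigation, while fields unique to one record type remain queryable without any schema change.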
Meanwhile, Oracle’s own columnar DBMS efforts have been disappointing. Endeca could be an intended answer to that. However, while Oracle’s track record with standalone DBMS acquisitions is admirable (DEC RDB, MySQL, etc.), Oracle’s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)
So while I would expect Endeca’s flagship e-commerce shopping engine products to flourish under Oracle’s ownership, I would be cautious about the integration of Endeca’s core technology into the Oracle product line.
Categories: Columnar database management, Endeca, Oracle | 7 Comments |
Vertica Community Edition
The press release announcing Vertica’s Community Edition is a bit vague. And indeed, much of what I know about Vertica Community Edition is along the lines of “This is what I think will happen, but of course it could still change.” That said, I believe:
- Vertica Community Edition has all of regular Vertica’s features. However …
- … HP Vertica reserves the right to open a feature gap in future releases.
- The license restriction on Vertica Community Edition is that you’re limited to 1 terabyte of data, and 3 nodes. I imagine that’s for one production copy, and you’re perfectly free to also set up mirrors for test, development, disaster recovery, and so on. However …
- … HP Vertica would be annoyed if you stuck a free copy of Vertica on each of 50 nodes and managed the whole thing via, say, Hadapt.
- HP Vertica plans to be very generous with true academic researchers, suspending or waiving limits on database size and node count. Not coincidentally, Vertica Community Edition is being announced at XLDB, where Vertica is also a top-level sponsor. (I introduced Vertica and XLDB’s Jacek Becla to each other as soon as I heard about Vertica’s Community Edition plans.)
- The only support available for Vertica Community Edition is through forums. This could change.
I’m a big supporter of the Vertica Community Edition idea, for four reasons:
- It should now be easier to download and evaluate Vertica.
- Vertica Community Edition could be a big help to academic researchers.
- Vertica could now be more appealing to some of the “Omigod, we’re outgrowing Oracle Standard Edition and we don’t want to pay up for Oracle Enterprise Edition/Exadata” crowd.
- People are still under the impression that what Vertica actually charges today resembles its long-ago list prices. This announcement may help puncture that outdated image of Vertica's pricing.
Categories: Pricing, Vertica Systems | 7 Comments |
Commercial software for academic use
As Jacek Becla explained:
- Academic scientists like their software to be open source, for reasons that include both free-like-speech and free-like-beer.
- What’s more, they like their software to be dead-simple to administer and use, since they often lack the dedicated human resources for anything else.
Even so, I think that academic researchers, in the natural and social sciences alike, commonly overlook the wealth of commercial software that could help them in their efforts.
I further think that the commercial software industry could do a better job of exposing its work to academics, where by “expose” I mean:
- Give your stuff to academics for free.
- Call their attention to your free offering.
Reasons to do so include:
- Public benefit. Scientific research is important.
- Training future customers. There’s huge academic/commercial crossover, especially as students join the for-profit workforce.
Categories: Business intelligence, Data warehousing, Infobright, Petabyte-scale data management, Predictive modeling and advanced analytics, Scientific research | 7 Comments |
Compression in Sybase ASE 15.7
Sybase recently came out with Adaptive Server Enterprise 15.7, which is essentially the “Make SAP happy” release. Features that were slated for 2012 release, but which SAP wanted, were accelerated into 2011. Features that weren’t even slated for 2012, but which SAP wanted, were also brought into 2011. Not coincidentally, SAP Business Suite will soon run on Sybase Adaptive Server Enterprise 15.7.
15.7 turns out to be the first release of Sybase ASE with data compression. Sybase fondly believes that it is matching DB2 and leapfrogging Oracle in compression rate with a single compression scheme, namely page-level tokenization. More precisely, SAP and Sybase seem to believe that about compression rates for actual SAP application databases, based on some degree of testing. Read more
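To illustrate what page-level tokenization means in general terms (a sketch of the generic technique, not Sybase’s actual on-page format), the idea is to build a small dictionary per page and replace repeated values with short tokens:

```python
def compress_page(rows):
    """Toy page-level tokenization: build a per-page dictionary of
    distinct values and store each cell as a small integer token."""
    dictionary = []          # token -> value
    value_to_token = {}      # value -> token
    encoded_rows = []
    for row in rows:
        encoded = []
        for value in row:
            if value not in value_to_token:
                value_to_token[value] = len(dictionary)
                dictionary.append(value)
            encoded.append(value_to_token[value])
        encoded_rows.append(encoded)
    return dictionary, encoded_rows

def decompress_page(dictionary, encoded_rows):
    return [[dictionary[token] for token in row] for row in encoded_rows]

# Repetitive data -- common in SAP-style application tables -- compresses
# well, because each repeated value is stored once per page plus small tokens.
page = [("DE", "EUR", "OPEN"), ("DE", "EUR", "OPEN"), ("FR", "EUR", "CLOSED")]
dictionary, encoded = compress_page(page)
assert decompress_page(dictionary, encoded) == [list(r) for r in page]
```

How well such a scheme does in practice obviously depends on how repetitive the actual data is, which is why the SAP-database testing mentioned above matters.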
Categories: Database compression, Sybase | 5 Comments |
IBM is buying parallelization expert Platform Computing
IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes: Read more
Categories: Hadoop, IBM and DB2, Investment research and trading, MapReduce, Parallelization, Scientific research | 5 Comments |
Text data management, Part 3: Analytic and progressively enhanced
This is Part 3 of a three-post series. The posts cover:
- Confusion about text data management.
- Choices for text data management (general and short-request).
- Choices for text data management (analytic).
I’ve gone on for two long posts about text data management already, but even so I’ve glossed over a major point:
Using text data commonly involves a long series of data enhancement steps.
Even before you do what we’d normally think of as “analysis”, text markup can include steps such as:
- Figure out where the words break.
- Figure out where the clauses and sentences break.
- Figure out where the paragraphs, sections, and chapters break.
- (Where necessary) map the words to similar ones — spelling correction, stemming, etc.
- Figure out which words are grammatically which parts of speech.
- Figure out which pronouns and so on refer to which other words. (Technical term: Anaphora resolution.)
- Figure out what was being said, one clause at a time.
- Figure out the emotion — or “sentiment” — associated with it.
Those processes can add up to dozens of steps. And maybe, six months down the road, you’ll think of more steps yet.
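A minimal sketch of how such a multi-step enhancement pipeline might be wired together (hypothetical step functions in Python, not any particular vendor’s toolkit):

```python
import re

# Each enhancement step takes and returns a document dict, so new steps --
# the ones you think of six months later -- can just be appended to the list.
def split_words(doc):
    doc["words"] = re.findall(r"[A-Za-z']+", doc["text"])
    return doc

def split_sentences(doc):
    doc["sentences"] = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    return doc

def tag_sentiment(doc):
    # Crude placeholder for a real sentiment model.
    positive = {"good", "great", "love"}
    negative = {"bad", "terrible", "hate"}
    score = sum((w.lower() in positive) - (w.lower() in negative)
                for w in doc["words"])
    doc["sentiment"] = ("positive" if score > 0
                        else "negative" if score < 0 else "neutral")
    return doc

PIPELINE = [split_words, split_sentences, tag_sentiment]  # dozens of steps in real life

def enhance(text):
    doc = {"text": text}
    for step in PIPELINE:
        doc = step(doc)
    return doc

print(enhance("I love this phone. The battery is not bad."))
```

The point of the list-of-steps structure is exactly the one made above: enhancement is open-ended, and the pipeline has to tolerate steps being added or reordered over time.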
Categories: Data warehousing, Hadoop, NoSQL, Text | 4 Comments |
Text data management, Part 2: General and short-request
This is Part 2 of a three-post series. The posts cover:
- Confusion about text data management.
- Choices for text data management (general and short-request).
- Choices for text data management (analytic).
I’ve recently given widely varied advice about managing text (and similar files — images and so on), ranging from
Sure, just keep going with your old strategy of keeping .PDFs in the file system and pointing to them from the relational database. That’s an easy performance optimization vs. having the RDBMS manage them as BLOBs.
to
I suspect MongoDB isn’t heavyweight enough for your document management needs, let alone just dumping everything into Hadoop. Why don’t you take a look at MarkLogic?
Here are some reasons why.
There are three basic kinds of text management use case:
- Text as payload.
- Text as search parameter.
- Text as analytic input.
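For the “text as payload” case, the file-system-plus-pointer advice quoted above boils down to something like the following hypothetical sketch (SQLite and made-up table and column names standing in for whatever RDBMS and schema are actually in use):

```python
import sqlite3
from pathlib import Path

DOC_ROOT = Path("doc_store")            # hypothetical document directory on the file system
DOC_ROOT.mkdir(exist_ok=True)

conn = sqlite3.connect("catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        doc_id  INTEGER PRIMARY KEY,
        title   TEXT,
        path    TEXT      -- pointer to the .PDF, not a BLOB
    )
""")

def add_document(title, pdf_bytes):
    # Write the payload to the file system; store only metadata plus a pointer.
    cur = conn.execute("INSERT INTO documents (title, path) VALUES (?, ?)", (title, ""))
    doc_id = cur.lastrowid
    path = DOC_ROOT / f"{doc_id}.pdf"
    path.write_bytes(pdf_bytes)
    conn.execute("UPDATE documents SET path = ? WHERE doc_id = ?", (str(path), doc_id))
    conn.commit()
    return doc_id

def fetch_document(doc_id):
    (path,) = conn.execute("SELECT path FROM documents WHERE doc_id = ?", (doc_id,)).fetchone()
    return Path(path).read_bytes()      # the RDBMS never touches the payload itself
```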
Categories: MarkLogic, NoSQL, Text | 5 Comments |
Text data management, Part 1: Confusion
This is Part 1 of a three-post series. The posts cover:
- Confusion about text data management.
- Choices for text data management (general and short-request).
- Choices for text data management (analytic).
There’s much confusion about the management of text data, among technology users, vendors, and investors alike. Reasons seem to include:
- The terminology around text data is inaccurate.
- Data volume estimates for text are misleading.
- Multiple different technologies are in the mix, including:
- Enterprise text search.
- Text analytics — text mining, sentiment analysis, etc.
- Document stores — e.g. document-oriented NoSQL, or MarkLogic.
- Log management and parsing — e.g. Splunk.
- Text archiving — e.g., various specialty email archiving products I couldn’t even name.
- Public web search — Google et al.
- Text search vendors have disappointed, especially technically.
- Text analytics vendors have disappointed, especially financially.
- Other analytic technology vendors ignore what the text analytic vendors actually have accomplished, and reinvent inferior wheels rather than OEM the state of the art.
Above all: The use cases for text data vary greatly, just as the use cases for simply-structured databases do.
There are probably fewer people now than there were six years ago who need to be told that text and relational database management are very different things. Other misconceptions, however, appear to be on the rise. Specific points that are commonly overlooked include: Read more
Categories: Analytic technologies, Archiving and information preservation, Google, Log analysis, MarkLogic, NoSQL, Oracle, Splunk, Text | 2 Comments |
Cloudera versus Hortonworks
A few weeks ago I wrote:
The other big part of Hortonworks’ story is the claim that it holds the axe in Apache Hadoop development.
and
… just how dominant Hortonworks really is in core Hadoop development is a bit unclear. Meanwhile, Cloudera people seem to be leading a number of Hadoop companion or sub-projects, including the first two I can think of that relate to Hadoop integration or connectivity, namely Sqoop and Flume. So I’m not persuaded that the “we know this stuff better” part of the Hortonworks partnering story really holds up.
Now Mike Olson — CEO of my client Cloudera — has posted his analysis of the matter, in response to an earlier Hortonworks post asserting its claims. In essence, Mike argues:
- It’s ridiculous to say any one company, e.g. Hortonworks, has a controlling position in Hadoop development.
- Such diversity is a Very Good Thing.
- Cloudera folks now contribute and always have contributed to Hadoop at a higher rate than Hortonworks folks.
- If you consider just core Hadoop projects — the most favorable way of counting from a Hortonworks standpoint — Hortonworks has a lead, but not all that big of one.
Categories: Cloudera, Hadoop, Hortonworks, MapReduce, Open source | 6 Comments |
Teradata Unity and the idea of active-active data warehouse replication
Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week. That made it an easy decision for Teradata to preannounce its big news, Teradata Columnar and the rest of Teradata 14. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for replication technology based on Teradata’s Xkoto acquisition.
The core mission of Teradata Unity is asynchronous, near-real-time replication across Teradata systems. The point of “asynchronous” is performance. The point of “near-real-time” is that Teradata Unity can be used for high availability and disaster recovery, and further can be used to allow real work on HA and DR database copies. Teradata Unity works request-at-a-time, which limits performance somewhat;* Unity has a lock manager that makes sure updates are applied in the same order on all copies, in cases where locks are needed at all.
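A highly simplified, hypothetical sketch of that ordering idea (Python threads standing in for Teradata systems; this is not Teradata’s implementation): a sequencing/lock layer assigns one global order to write requests, and each copy applies them asynchronously but strictly in that order.

```python
import queue
import threading
import time

class Sequencer:
    """Toy stand-in for a lock manager: assigns one global order to writes."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next_seq = 0

    def order(self, request):
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
        return seq, request

class Replica(threading.Thread):
    """Receives writes asynchronously but applies them strictly in sequence order."""
    def __init__(self, name):
        super().__init__(daemon=True)
        self.name = name
        self.inbox = queue.Queue()
        self.applied = []
        self._expected = 0

    def run(self):
        pending = {}
        while True:
            seq, request = self.inbox.get()
            pending[seq] = request
            while self._expected in pending:   # fill any gaps before applying
                self.applied.append(pending.pop(self._expected))
                self._expected += 1

# Each write goes through the sequencer once, then fans out to every copy.
sequencer = Sequencer()
replicas = [Replica("primary"), Replica("dr_copy")]
for r in replicas:
    r.start()
for req in ["INSERT a", "UPDATE a", "DELETE a"]:
    ordered = sequencer.order(req)
    for r in replicas:
        r.inbox.put(ordered)

time.sleep(0.2)
print({r.name: r.applied for r in replicas})   # both copies applied the same order
```

The replicas fall behind the sequencer under load (that’s the “asynchronous, near-real-time” part), but because every copy replays the same global order, work done against an HA or DR copy sees a consistent state.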
Categories: Data warehousing, Teradata | 2 Comments |