ITA Software and Needlebase
Rumors are flying that Google may acquire ITA Software. I know nothing of their validity, but I have known about ITA Software for a while. Random notes include:
- ITA Software builds huge OLTP systems that it runs itself on behalf of airlines.
- Very, very unusually, ITA Software builds these huge OLTP systems in LISP.
- ITA Software is an Oracle shop (see Dan Weinreb’s comment).
- ITA Software is run by a techie (again, see Dan Weinreb’s comment).
- ITA Software has an interesting screen-scraping/web ETL project called Needlebase.
ITA’s software does both price/reservation lookup/checking and reservation-making. I’ve had trouble keeping it straight, but I think the lookup is ITA’s actual business, and the reservation-making is ITA’s Next Big Thing. This is one of the ultimate federated-transaction-processing applications, because it involves coordinating huge OLTP systems run, in some cases, by companies that are bitter competitors with each other. Network latencies have to allow for intercontinental travel of the data itself.
Indeed, airline reservation systems are pretty much the ultimate in OLTP themselves. As the story goes, transaction monitors were invented for airline reservation systems in the 1960s.
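The federated coordination described above is, at heart, the distributed-commit problem, classically handled with two-phase commit: no airline's system makes a booking durable unless every system involved agrees to. Purely as illustration (nothing here reflects ITA's actual design; all class and function names are invented), a minimal sketch:

```python
# Minimal two-phase commit sketch. A coordinator asks every participant
# (think: each airline's reservation system) to prepare, and commits only
# if all vote yes; any "no" vote aborts the whole booking.

class Participant:
    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: durably stage the change, then vote yes or no.
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect votes (short-circuits on the first "no").
    if all(p.prepare() for p in participants):
        # Phase 2: unanimous yes, so commit everywhere.
        for p in participants:
            p.commit()
        return "committed"
    # Otherwise roll everyone back. (Timeouts and coordinator-failure
    # recovery, the hard parts in practice, are omitted here.)
    for p in participants:
        p.rollback()
    return "aborted"
```

The real-world difficulty is exactly what the sketch omits: intercontinental latencies and independently operated systems make the failure and timeout handling, not the happy path, the bulk of the engineering.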
A really small project for ITA Software is Needlebase. I stopped by ITA to look at Needlebase in January; it turns out to be a very smart and hence interesting screen-scraping system. The idea is that people publish database information to the web, and you may want to look at their web pages and recover the database records they are based on. Applications of this to the airline industry, which has hundreds of thousands of price changes per day — and I may be too low by one or two orders of magnitude when I say that — should be fairly obvious. ITA Software has aspirations of applying Needlebase to other sectors as well, or more precisely of having users who do so. Last I looked, ITA hadn't put significant resources behind stimulating Needlebase adoption — but Google might well change that.
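Needlebase's machinery is far more sophisticated than this, but the core idea, recovering records from published pages, can be illustrated with a toy scraper. A sketch using only Python's standard library (the page layout and field names are invented):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Toy record recovery: turn an HTML table back into dict records."""
    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows
        self._row = []       # cells of the row being parsed
        self._cell = None    # text of the cell being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr":
            self.rows.append(self._row)

def scrape_records(html):
    # Treat the first row as the schema, the rest as records.
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

For example, feeding it a page fragment like `<table><tr><th>route</th><th>fare</th></tr><tr><td>BOS-SFO</td><td>199</td></tr></table>` recovers the record `{"route": "BOS-SFO", "fare": "199"}`. The hard parts Needlebase actually tackles, such as inferring the schema across messy and inconsistent pages, are exactly what this sketch skips.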
Edit: I just re-found an old characterization of (some of) what ITA Software does by — who else? — Dan Weinreb:
I am working on our new product, an airline reservation system. It’s an online transaction-processing system that must be up 99.99% of the time, maintaining maximum response time (e.g. on www.aircanada.com). It’s a very, very complicated system. The presentation layer is written in Java using conventional techniques. The business rule layer is written in Common Lisp; about 500,000 lines of code (plus another 100,000 or so of open source libraries). The database layer is Oracle RAC. We operate our own data centers, some here in Massachusetts and a disaster-recovery site in Canada (separate power grid).
Related links
- ITA Software and Needlebase websites
- More about LISP 🙂
Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Google, OLTP, Oracle
I’ll be speaking in Washington, DC on May 6
My clients at Aster Data are putting on a sequence of conferences called “Big Data Summit(s)”, and wanted me to keynote one. I agreed to the one in Washington, DC, on May 6, on the condition that I would be allowed to start with the same liberty and privacy themes I started my New England Database Summit keynote with. Since I already knew Aster to be one of the multiple companies in this industry that is responsibly concerned about the liberty and privacy threats we’re all helping cause, I expected them to agree to that condition immediately, and indeed they did.
On a rough-draft basis, my talk concept is:
Implications of New Analytic Technology in four areas:
- Liberty & privacy
- Data acquisition & retention
- Data exploration
- Operationalized analytics
I haven’t done any work yet on the talk besides coming up with that snippet, and probably won’t until the week before I give it. Suggestions are welcome.
If anybody actually has a link to a clear discussion of legislative and regulatory data retention requirements, that would be cool. I know they’ve exploded, but I don’t have the details.
Categories: Analytic technologies, Archiving and information preservation, Aster Data, Data warehousing, Presentations, Surveillance and privacy
Greenplum et alia’s BigDataNews.com site
Greenplum recently started a website BigDataNews.com, and quickly signed up Aster Data as a co-sponsor. (Edit: As per a comment below, the decision to sign up additional sponsors was made by the site’s independent publisher.) It’s actually being run by Brett Sheppard, a former Gartner/DataQuest analyst who now gets involved in this kind of thing. (Brett and I may be working on another project soon, with Greenplum funding.)
The heart of the site is feeds* from a variety of high-profile blogs (DBMS2, Daniel Abadi’s, Joe Hellerstein’s, James Kobielus’s, et al.), plus some additional posts written by Brett (primarily) or Greenplum folks. Highlights of Brett’s posts include:
- What I am told was an unauthorized revelation that Greenplum Chorus is built on CouchDB and Erlang.
- An impassioned defense of the integrity of Gartner’s analysis.
*At least in my case, that’s just a post title or snippet, plus a link back to the main post. The same goes for mapreduce.org, actually.
Categories: Analytic technologies, Data warehousing, Greenplum, NoSQL
Aster Data’s mapreduce.org site
Aster Data has started a site mapreduce.org, which purports to compile “the best information about MapReduce.” At the moment, mapreduce.org highlights include:
- A feed of MapReduce-related posts from several blogs, including this one.
- A calendar of MapReduce-related events, not necessarily Aster-specific, integrated with a feed combining …
- … Aster MapReduce-related press releases and also …
- … not necessarily Aster-specific MapReduce-related press articles.
- Links to a lot of Aster Data MapReduce-related collateral. Some of that stuff is quite good.*
- A sycophantic introduction from Colin White praising the value of the mapreduce.org “independent forum.”
*I did a couple of MapReduce-related webinars for Aster late last year. 🙂 But seriously — Aster does a good job of writing clear and informative collateral.
Categories: Analytic technologies, Aster Data, MapReduce
Introduction to Datameer
Elder care issues have flared up with a vengeance, so I’m not going to be blogging much for a while, and surely not at any length. That said, my first post about Datameer was never going to be very long, so let’s get right to it:
- Datameer offers a business intelligence and analytics stack that runs on any distribution of Hadoop.
- Datameer is still building a lot of features that it talks about, for target release in (I think) the fall.
- Datameer’s pride and joy is its user interface. Very laudably for a software start-up, Datameer claims to have spent considerable time with professional user interface designers.
- Datameer’s core user interface metaphor is formula definition via a spreadsheet.
- Datameer includes 124 functions one can use in these formulae, ranging from math stuff to text tokenization.
- Datameer does some straight BI, with 4 kinds of “visualization” headed for 20 kinds later. But if you want to do hard-core BI, use Datameer to dump data into an RDBMS and then use the BI tool of your choice. (Datameer’s messaging does tend to obscure or even contradict that point.)
- Rather, Datameer seems to be designed for the classic MapReduce use cases of ETL and heavy data crunching.
- Datameer’s messaging includes a bit about “Datameer is real-time, even though Hadoop is generally thought of as batch.” So far as I can tell, what that boils down to is …
- … Datameer will let you examine sample and/or partial query results before a full Hadoop run is over. Apparently, there are three different ways Datameer lets you do this:
- You can truly query against a sample of the data set.
- You can query against intermediate results, when only some stages of the Hadoop process have already been run.
- You can drill down into a “distributed index,” whatever the heck that means when Datameer says it.
- Datameer will let you import data from 15 or so different kinds of sources, SQL, NoSQL, and file system alike.
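The first of those three approaches, querying against a sample, is simple to illustrate. A toy sketch in plain Python (nothing Datameer-specific; the function names are invented), showing how a sampled estimate can approximate the exact aggregate long before a full pass would finish:

```python
import random

def full_average(values):
    # The "full Hadoop run": exact, but slow when values is huge.
    return sum(values) / len(values)

def sampled_average(values, fraction=0.01, seed=42):
    # The early answer: estimate the same aggregate from a small
    # random sample drawn without replacement.
    rng = random.Random(seed)
    k = max(1, int(len(values) * fraction))
    sample = rng.sample(values, k)
    return sum(sample) / len(sample)
```

On, say, a list of 100,000 values, `sampled_average` touches 1% of the data and lands close to `full_average`; the trade-off is the usual one of sampling error versus time to first answer, which is presumably what any "real-time over batch Hadoop" claim is negotiating.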
Categories: Analytic technologies, Business intelligence, Datameer, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce
Story of an analytic DBMS evaluation
One of our readers was kind enough to walk me through his analytic DBMS evaluation process. The story is:
- The X Company (XCo) has a <1 TB database.
- 100s of XCo’s customers log in at once to run reports. 50-200 concurrent queries is a good target number.
- XCo had been “suffering” with Oracle and wanted to upgrade.
- XCo didn’t have a lot of money to spend. Netezza pulled out of the sales cycle early due to budget (and this was recently enough that Netezza Skimmer could have been bid).
- Greenplum didn’t offer any references that approached the desired number of concurrent users.
- Ultimately the evaluation came down to Vertica and ParAccel.
- Vertica won.
Notes on the Vertica vs. ParAccel selection include: Read more
Categories: Analytic technologies, Benchmarks and POCs, Buying processes, Data warehousing, Greenplum, Netezza, Oracle, ParAccel, Vertica Systems
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include: Read more
Is the enterprise data warehouse a myth?
An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
Pick ONE. Read more
Categories: Data models and architecture, Data warehousing, Database diversity, Teradata, Theory and architecture
Examples of machine-generated data
Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include: Read more
Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics
Thoughts on IBM’s anti-Oracle announcements
IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle’s Exadata-centric strategy. I haven’t been briefed, so I just have those to go on.
On the whole, the releases look pretty lame. Highlights seem to include:
- Maybe a claim of enhanced data compression.
- Otherwise, no obvious new technology except product packaging and bundling.
- Aggressive plans to throw capital at the Sun channel to convert it to selling IBM gear. (A figure of $1/2 billion is mentioned, for financing.)
Disappointingly, IBM shows a lot of confusion between:
- Text data
- Machine-generated data such as that from sensors
While both are highly important, those are very different things. IBM has not in the past shown much impressive technology in either of those two areas, and based on these releases, I presume that trend is continuing.
Edits:
I see from press coverage that at least one new IBM model has some Fusion I/O solid-state memory boards in it. Makes sense.
A Twitter hashtag has a number of observations from the event. Not much substance that I could detect, except various kinds of Oracle bashing.
Categories: Database compression, Exadata, IBM and DB2, Oracle, Solid-state memory