Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Teradata decides to compete head-on as a data warehouse appliance vendor
In a press release today that is surely timed to impinge on the Netezza user conference news cycle, Teradata has come out swinging. Highlights include:
- Teradata, which long avoided the “appliance” term, now says it sells both “data warehouse appliances” and “data mart appliances.” Indeed, it claims to have “invented the original appliance,” which is pretty close to being true.
- Teradata claims its “new appliance easily delivers up to 5 to 10 times performance improvement over competitors’ appliances,” at $119,000 per terabyte US list price.
- Teradata claims a 150% faster “scan rate” than competitors. Teradata is surely thinking of Netezza when saying that.
- Teradata claims 10X performance improvement on “selected queries” vs. the “competition.”
- Teradata thinks its geospatial data management capability is better than competitors’, and that this is an important indicator of Teradata’s greater overall sophistication.
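Percentage claims like these are easy to misread. The following is a small sketch of the arithmetic. It assumes that “150% faster” means 150% more throughput (2.5 times the competitor's rate, not 1.5 times). It also uses the $119,000 per terabyte US list price quoted above. The 20-terabyte system size is a hypothetical input, not a figure from the release.

```python
# Sketch: interpreting Teradata's marketing numbers.
# Assumption: "X% faster" means the rate is (1 + X/100) times the baseline rate.
LIST_PRICE_PER_TB_USD = 119_000  # US list price quoted in the press release


def speedup_from_percent_faster(percent_faster: float) -> float:
    """Convert a 'percent faster' claim into a rate multiplier."""
    return 1.0 + percent_faster / 100.0


def relative_scan_time(percent_faster: float) -> float:
    """Fraction of the baseline time needed to scan the same amount of data."""
    return 1.0 / speedup_from_percent_faster(percent_faster)


def list_price(terabytes: float) -> float:
    """List price for a given capacity at the quoted per-terabyte rate."""
    return terabytes * LIST_PRICE_PER_TB_USD


if __name__ == "__main__":
    print(speedup_from_percent_faster(150))  # 2.5
    print(relative_scan_time(150))           # 0.4
    print(list_price(20))                    # 2380000 (hypothetical 20 TB system)
```

Under that reading, a 150% faster scan finishes in 40% of the competitor's time. That is a large claimed advantage, but still short of the "5 to 10 times" figures, which Teradata applies only to overall or "selected" query performance.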
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, GIS and geospatial, Netezza, Teradata
Infobright update
In connection with the announcements that:
- Infobright is open sourcing its analytical DBMS product (which is a really good idea)
- Infobright raised a $10 million VC round, with Sun as a new investor
I got my first real Infobright update since January.
Categories: Data warehousing, Infobright, MySQL, Open source
Infobright’s open source move has a lot of potential
Infobright announced today that it’s going full-bore into open source (specifically in the MySQL ecosystem), with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons.
Categories: Columnar database management, Data warehousing, Infobright, MySQL, Open source
Infobright goes open source — sound bites
As has recently become my custom when there is industry news, I herewith provide quotable sound bites about Infobright and its move to an open source strategy. Weather permitting, I’ll be on a plane to the Netezza conference this afternoon. And I’ve only slept about 10 hours since Thursday. So I hope these suffice, although if they don’t and you email me, I’ll try to respond sometime Tuesday morning.
- For almost anybody in the MySQL world who needs high-performance analytics, Infobright is the first good solution.
- Infobright’s product strengths and use cases are a great match for open source.
- Most leading analytic DBMS have open source roots, but they generally haven’t been open sourced themselves. Infobright immediately becomes one of the premier open source analytic database offerings. The only serious open source rival that’s coming to mind is MonetDB.
- Storage engines are MySQL’s Achilles’ heel. Each good MySQL storage engine is precious.
- Infobright has enough production references to show that it can get the job done for many data mart uses. It won’t meet everybody’s needs, but it’s well worth an experimental download.
- If you want to build a little data mart and run it yourself, most good products are too complicated or expensive. But in the right use cases, Infobright pretty much runs itself, and there’s no arguing with the Community Edition price (free).
- So Infobright is a great fit for the individual downloader – i.e., for the stereotypical open source user.
- Netezza, Vertica, ParAccel, Greenplum, and Aster Data are all based in one way or another on PostgreSQL (even though Vertica includes no PostgreSQL code). DATAllegro was based on Ingres. Infobright and Kickfire are based on MySQL.
- If Infobright doesn’t get the job done, try downloading Vertica, which – while closed source – is also free for download and development.
- The “rough set” part of Infobright’s story is a lot of mumbo-jumbo, but the “knowledge grid” part is more real.
- When you compare Infobright to Teradata, Netezza, Greenplum, or even Vertica, it’s kind of a toy. But when you compare it to generic MySQL, it’s more like rocket science.
- Infobright was too little, too late in the mainstream analytic DBMS market. They had to do something different. Kudos to them for recognizing that.
- The Infobright product has some serious limitations. If you want a market that’s willing to adopt a DBMS with serious limitations, the MySQL world is the place for you.
Posts today on open source DBMS
- Infobright’s smart move to open source
- General Infobright update
- Infobright sound bites
- The many faces of open source DBMS
Categories: Data warehousing, Infobright, MySQL, Open source
How will SSDs get incorporated into data warehousing?
SSDs (solid-state drives) have gotten a lot of recent attention as an eventual replacement for spinning disk. I haven’t researched expected timelines in detail, but George Crump offered a plausible scenario recently in a highly visible InformationWeek blog post. After the great recent (and still ongoing!) discussion in the SAN vs. DAS comment thread, I’d like to throw some questions out for discussion, including:
- Just how much faster than disk will SSDs be for random reads?
- Will SSDs be faster or slower than disk for sequential reads, and by how much?
- What will the speed comparison be on SSDs between sequential and random reads?
- How many times will it be possible to write to an SSD? Will this be a problem?
- Will DBMS — which today invariably assume that storage is homogeneous — need to take account of storage heterogeneity?
- What are the implications of SSDs for database and DBMS architecture?
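One way to frame the random-versus-sequential questions is a back-of-envelope model. In it, read time equals the number of positioning operations times per-operation access latency, plus bytes transferred divided by bandwidth. The sketch below assumes that simple model and ignores caching, queuing, and write effects. The `Device` class and the device parameters in the usage example are hypothetical placeholders, not measurements of any real disk or SSD.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical storage device described by just two parameters."""
    name: str
    access_latency_s: float   # time per random positioning operation
    bandwidth_bytes_s: float  # sustained transfer rate once positioned


def read_time(dev: Device, n_reads: int, bytes_per_read: int) -> float:
    """Estimated seconds for n_reads independent (random) reads."""
    return n_reads * (dev.access_latency_s + bytes_per_read / dev.bandwidth_bytes_s)


def sequential_time(dev: Device, total_bytes: int) -> float:
    """Estimated seconds for one long sequential read (a single positioning op)."""
    return dev.access_latency_s + total_bytes / dev.bandwidth_bytes_s


def random_to_sequential_ratio(dev: Device, total_bytes: int, block: int) -> float:
    """How many times slower reading total_bytes as random blocks is than reading it
    sequentially. Any remainder smaller than one block is ignored."""
    n_blocks = total_bytes // block
    return read_time(dev, n_blocks, block) / sequential_time(dev, total_bytes)


if __name__ == "__main__":
    # Placeholder parameters for illustration only, not vendor or benchmark figures.
    disk_like = Device("high-latency device", 0.005, 100e6)
    ssd_like = Device("low-latency device", 0.0001, 100e6)
    for dev in (disk_like, ssd_like):
        print(dev.name, random_to_sequential_ratio(dev, 1_000_000_000, 8192))
```

In this model, the random-to-sequential gap is driven almost entirely by access latency. Suppose SSDs cut latency much more than they change bandwidth. Then the gap narrows, and a DBMS that assumes random reads are expensive may be optimizing for the wrong storage.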
I commented on some of these issues a year ago. Now it’s your turn. 🙂
Categories: Data warehousing, Solid-state memory, Storage
Some Netezza customer metrics
From the conference call on Netezza’s July, 2008 Q1 results, with figures as of the end of Q1:
- There are now 191 Netezza customers.
- 18 of those were new.
- 78% of Netezza’s business was in North America and 22% was international.
- Netezza operates in 10 countries.
- “The top 4 vertical markets represented approximately 75% of our business, with those markets being telcos, retail, financial services, and the analytic service provider segment.”
- One analytic service provider was greater than 10% of revenue for the quarter, and is expected to keep buying a lot in subsequent quarters. Also, one analytic service provider standardized on Netezza. I’m guessing that’s the same customer.
- “We ended the quarter with 45 [quota] carrying teams made up of a sales rep and a systems engineer and our plan is to continue to hire direct sales teams at the pace of 3 to 5 per quarter every quarter. These direct reps accounted for 85% of the business while the indirect activity was 15% this quarter.”
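A few derived figures follow directly from the numbers above. This sketch makes two assumptions. First, it reads “18 of those were new” as 18 customers added during Q1. Second, it treats the 3-to-5-teams-per-quarter hiring plan as a simple linear projection over four quarters. The projection is an illustration only, not a Netezza forecast.

```python
# Derived figures from Netezza's end-of-Q1 metrics as reported on the call.
TOTAL_CUSTOMERS = 191
NEW_CUSTOMERS = 18   # assumption: customers added during Q1
SALES_TEAMS = 45     # quota-carrying teams (sales rep + systems engineer)


def customers_at_start_of_quarter(total: int, new: int) -> int:
    """Customer count before the quarter's additions."""
    return total - new


def projected_teams(start: int, per_quarter: int, quarters: int) -> int:
    """Linear projection of sales teams at a fixed hiring pace (illustrative only)."""
    return start + per_quarter * quarters


print(customers_at_start_of_quarter(TOTAL_CUSTOMERS, NEW_CUSTOMERS))  # 173
print(round(100 * NEW_CUSTOMERS / TOTAL_CUSTOMERS, 1))               # 9.4 (% of base that is new)
# Four quarters at the low and high ends of the stated 3-5 per quarter pace:
print(projected_teams(SALES_TEAMS, 3, 4), projected_teams(SALES_TEAMS, 5, 4))  # 57 65
```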
Categories: Application areas, Data mart outsourcing, Data warehouse appliances, Data warehousing, Market share and customer counts, Netezza, Telecommunications
Teradata’s major vertical markets in 2007
From a May, 2008 earnings conference call transcript:
- telecommunication, media and entertainment industry is 28%;
- financial services is 24%;
- retail is 19% of our revenues last year;
- manufacturing 9%;
- government 7%;
- travel and transportation 6%;
- and healthcare 5%.
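The listed verticals do not quite sum to 100%. The quick tally below uses the percentages exactly as quoted. The transcript excerpt does not say what the remaining 2% covers; other industries or rounding are the obvious candidates.

```python
# Teradata 2007 revenue share by vertical, as quoted on the May 2008 call.
VERTICAL_SHARE_PCT = {
    "telecommunications, media and entertainment": 28,
    "financial services": 24,
    "retail": 19,
    "manufacturing": 9,
    "government": 7,
    "travel and transportation": 6,
    "healthcare": 5,
}

listed_total = sum(VERTICAL_SHARE_PCT.values())
print(listed_total)        # 98
print(100 - listed_total)  # 2 (share not attributed to any listed vertical)

# Concentration in the three largest verticals.
top_three = sorted(VERTICAL_SHARE_PCT.values(), reverse=True)[:3]
print(sum(top_three))      # 71
```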
Categories: Application areas, Data warehouse appliances, Data warehousing, Telecommunications, Teradata
Teradata/Netezza/Tesco kerfuffle
Netezza evidently put out a press release bragging of a competitive replacement of Teradata at UK retailing giant Tesco. That press release can no longer be found on Netezza’s site, but it lives on elsewhere. Meanwhile, Teradata has put out a press release in which Tesco is quoted emphatically contradicting what it is quoted as saying in the Netezza press release. While I haven’t discussed this with Netezza, my guess is that somebody there got a little overenthusiastic in advance of their user conference next week and thought they’d gotten a permission they really hadn’t.
Beyond that, I’d note that the Netezza quote made reference to around 25 heavy analytical users, while the Teradata quote talked of 8000 people across more than 2000 suppliers.
Categories: Data warehouse appliances, Data warehousing, Memory-centric data management, Netezza, Oracle, Specific users, Teradata
Mike Stonebraker’s counterarguments to MapReduce’s popularity
In response to recent postings I’ve done about MapReduce, Mike Stonebraker just got on the phone to give me his views. His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings.
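To make the claimed equivalence concrete, here is a minimal sketch, not Stonebraker's own example. It computes the same per-key sum in two ways:

- as an explicit map, shuffle (group by key), and reduce pipeline in plain Python, and
- as a SQL `GROUP BY` run in SQLite through Python's standard `sqlite3` module.

SQLite is a single-node engine, so this shows only that the two formulations produce the same answer. It says nothing about parallel performance. The log records are made-up sample data.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical sample data: (page, bytes_served) log records.
LOG = [("home", 120), ("about", 40), ("home", 80), ("pricing", 55), ("about", 10)]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    page, nbytes = record
    yield page, nbytes


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def mapreduce_totals(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


def sql_totals(records):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (page TEXT, nbytes INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", records)
    rows = conn.execute("SELECT page, SUM(nbytes) FROM log GROUP BY page").fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(mapreduce_totals(LOG))                     # {'home': 200, 'about': 50, 'pricing': 55}
    print(sql_totals(LOG) == mapreduce_totals(LOG))  # True
```

Roughly, the map step plays the role of a per-row projection or expression, the shuffle plays the role of grouping, and the reduce plays the role of an aggregate function. In a parallel DBMS, the grouping step is where rows get redistributed across nodes.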
Categories: Data warehousing, MapReduce, Michael Stonebraker, PostgreSQL
More data on data warehouse sizes and issues
I spoke today with Paul Barth and Randy Bean of consultancy NewVantage Partners. The core of NewVantage’s business seems to be helping large enterprises (especially financial services) with their data warehouse strategies. Takeaways — none of which should shock regular readers of DBMS2 — included:
- Administrative cost and difficulty are often the single biggest issue in selecting analytic DBMS products.
- Oracle hits a wall around 10 terabytes of user data. The one customer NewVantage can think of with an Oracle data warehouse over 10 terabytes is fleeing Oracle for Netezza.
- NewVantage says that very specialized data warehouses on Oracle could conceivably be larger than that.
- NewVantage does have a customer on DB2/UDB in the 30-40 terabyte range. That customer does a lot of careful tuning to make it work.
- About 15% of NewVantage’s customers use Netezza. Few if any use newer analytic DBMS (but I got the sense more will soon). The rest rely on “traditional” DBMS, a group that includes Teradata.
Categories: Data warehousing, IBM and DB2, Netezza, Oracle