Teradata Columnar and Teradata 14 compression
Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).
The basic idea of Teradata Columnar is:
- Each table can be stored in Teradata in row format, column format, or a mix.
- You can do almost anything with a Teradata columnar table that you can do with a row-based one.
- If you choose column storage, you also get some new compression choices. (A toy sketch of why appears below.)
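To make the compression point concrete, here's a minimal Python sketch of why per-column storage opens up compression choices that row storage doesn't. It illustrates the general columnar idea only; the table, the data, and the run-length scheme are my own toy assumptions, not Teradata's actual formats or algorithms.

```python
# Toy sketch: why column storage compresses better than row storage.
# Nothing here is Teradata's actual implementation; run-length encoding
# just stands in for the broader family of per-column compression schemes.

rows = [
    ("alice", "MA", 2011),
    ("bob",   "MA", 2011),
    ("carol", "MA", 2010),
    ("dave",  "CA", 2010),
]

def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# In row format, adjacent values mix names, states, and years, so runs
# of identical values are rare. Pivot to columns and compress each one:
for column in zip(*rows):
    print(run_length_encode(list(column)))
# The state column collapses to [('MA', 3), ('CA', 1)] and the year
# column to [(2011, 2), (2010, 2)]: self-similar data compresses well.
```

Real columnar systems layer on further tricks such as dictionary and delta encoding, but the self-similarity point is the heart of it.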
Oracle Database Appliance soundbites
It turns out that Oracle’s new small appliance isn’t really an Exadata Mini-Me. Rather, the Oracle Database Appliance is — well, it seems to be a box with an Oracle DBMS in it. (Plus Oracle RAC and so on.) The whole thing is priced for and targeted at the SMB (Small & Medium Business) market, whatever that means to Oracle.
I’m not hugely optimistic about the Oracle Database Appliance. Rather, my thoughts — lightly edited from a chat with a reporter — include:
- This doesn’t solve Oracle’s SMB problems, which include:
  - Oracle software is too difficult and costly to administer. The appliance will make a modest dent in that one, but it’s not any kind of game-changer, because the issues relate to the antique design of the Oracle DBMS. (I.e., I think ongoing database administration is a bigger deal than, say, one-time system set-up.)
  - SMBs use third-party applications whenever they can, with an increasing preference for SaaS. Application and SaaS vendors prefer non-Oracle alternatives when they are feasible.
- Thus, Oracle is not well positioned to thrive in the SMB market … except maybe through its MySQL subsidiary, but that has a long way to go too.
- Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
XLDB: The one conference I like to attend
I’m not a big fan of conferences, but I really like XLDB. Last year I got a lot out of XLDB, even though I couldn’t stay long (my elder care issues were in full swing). The year before I attended the whole thing — in Lyon, France, no less — and learned a lot more. This year’s XLDB conference is at SLAC — the organization formerly known as the Stanford Linear Accelerator Center — on Sand Hill Road in Menlo Park, October 18-19. As of right now, I plan to be there, at least on the first day. XLDB’s agenda and registration details (inexpensive) can be found on the XLDB conference website.
The only reason I wouldn’t go is if that turned out to be a lousy week for me to travel to California.
The people who go to XLDB tend to be really smart — either research scientists, hardcore database technologists, or others who can hold their own with those folks. Audience participation can be intense; the most talkative members I can recall were Mike Stonebraker, Martin Kersten, Michael McIntire, and myself. Even the vendor folks tend to be smart — past examples include Stephen Brobst, Jeff Hammerbacher, Luke Lonergan, and IBM Fellow Laura Haas. When we had a datageek bash on my last trip to the SF area, several guys said they were planning to attend XLDB as well.
XLDB stands for eXtremely Large DataBases, and those are indeed what gets talked about there. Read more
Exadata Mini-Me?
It is being suggested that Oracle is about to introduce small, (relatively) cheap Exadata boxes. Key quotes include:
We estimate a price point of $100K-$200K, well below Exadata prices of $500K-$2.5M.
and
- The Exadata could fit under a desk;
- Customers wouldn’t need a database admin to maintain the Exadata environment;
- The focus of the Exadata mini would be ease of management over running complex enterprise applications.
The whole thing sounds appealing, but I must confess that the idea of “zero-DBA” Oracle takes me aback. It might look OK at demo time, but I have trouble imagining it working in live production situations.
Are there any remaining reasons to put new OLTP applications on disk?
Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include (a back-of-envelope sizing sketch follows the list):
- 100s of gigabytes of data at first, growing to >1 terabyte over time.
- High peak loads.
- Public cloud portability (but they have private data centers they can use today).
- Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.
- Stream the data to a data warehouse, which will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)
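Given those parameters, the fit-in-memory question is mostly arithmetic. Here's a hedged back-of-envelope sketch in Python; the node size, replication factor, and overhead multiplier are illustrative assumptions of mine, not the client's actual numbers.

```python
# Back-of-envelope: does the OLTP data set fit in RAM across a cluster?
# All figures are illustrative assumptions, not client or vendor numbers.
import math

ram_gb_per_node = 128   # plausible commodity/cloud node size (assumed)
replication_factor = 2  # second in-memory copy for failover (assumed)
overhead = 1.5          # indexes, fragmentation, peak-load headroom (assumed)

def nodes_needed(data_tb):
    total_gb = data_tb * 1024 * replication_factor * overhead
    return math.ceil(total_gb / ram_gb_per_node)

for data_tb in (0.5, 1.5):  # hundreds of GB today; comfortably past 1 TB later
    print(f"{data_tb} TB of data -> {nodes_needed(data_tb)} nodes "
          f"of {ram_gb_per_node} GB RAM")
# 0.5 TB -> 12 nodes; 1.5 TB -> 36 nodes. Even with generous overhead,
# keeping the whole OLTP data set in memory takes only a modest cluster.
```

On arithmetic like that, it's hard to see what disk buys you on the OLTP side.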
So I’m leaning to saying: Read more
The database architecture of salesforce.com, force.com, and database.com
salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat cagey about technical details, for reasons such as:
- A long-ago marketing decision to not give infrastructure details, so as to convey a “Don’t worry; we’ll take care of everything” message.
- Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com’s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.
- A desire to hide the recipe for salesforce.com’s secret sauce.
- Force of habit — I’m not sure salesforce even knows how to tell its technical story with any clarity.
Actually, salesforce.com has moved some kinds of data out of Oracle that used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com’s data is stored in Oracle — a single Oracle instance, which salesforce believes may be the largest Oracle instance in the world.
salesforce.com, force.com, database.com, data.com, heroku.com — notes and context
As previously noted, I attended Dreamforce, the user conference for my clients at salesforce.com. When I work with them, I focus primarily on database.com and related businesses. I’ve had to struggle a bit, however, to sort out the various pieces, and specifically the differences among:
- salesforce.com. This is the parent company, and the runaway leader in the SaaS (Software as a Service) enterprise application market, especially in the area of CRM (Customer Relationship Management).
- force.com. This is salesforce.com’s application development stack split out for other SaaS vendors to use, both inside and outside the CRM segment. It can be referred to as a PaaS offering (Platform as a Service). force.com relies on a proprietary salesforce.com language called APEX, which has a strong stored procedure/ database trigger orientation.
- database.com. This is the database part of force.com, spun out separately in general availability as of Dreamforce two weeks ago.
- data.com. Also launched at Dreamforce (and based, if I understand correctly, on an acquisition), this is a provider of 3rd-party data you might use as inputs to your CRM systems.
- Heroku. Another salesforce.com acquisition, Heroku is in essence a PaaS competitor to force.com. Heroku is focused on Ruby and Java, and supports a number of DBMS, SQL and NoSQL alike.
- AppExchange. This is a marketplace for things designed to integrate with salesforce.com (and perhaps also apps built on force.com). The latest claim is that there are 1200+ AppExchange offerings.
- The complete set of SaaS apps built on force.com. A 2008 white paper refers to 47,000 organizations being “supported” by force.com. Recently I’ve heard a figure just under 100,000. I’m not clear as to what that metric measures — aggregate users of SaaS apps built via force.com? Clearly there are a lot of SaaS apps built on force.com, with actual customers, but I don’t know how big “a lot” is. (Perhaps a salesforce.com person could chime in on the comment thread with some clarity.)
Kaminario goes (mainly) flash
Kaminario, which used to be in the business of solid-state storage via DRAM, is now emphasizing hybrid DRAM/flash storage appliances instead. The reason is evidently price. Per terabyte of primary storage (before mirroring onto disk and so on):
- A Kaminario K2 DRAM-only appliance costs $100K.
- A Kaminario K2 flash-only appliance costs $30K (but nobody buys that configuration).
- A typical Kaminario K2 hybrid DRAM/flash appliance might cost $35K (which tells us that there’s a lot more flash than DRAM; the arithmetic is sketched below).
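The "lot more flash than DRAM" inference falls straight out of the prices. A quick sketch, assuming (my simplification) that the hybrid is priced as a linear blend of the DRAM-only and flash-only figures:

```python
# Infer the DRAM share of a typical hybrid K2 from per-TB pricing,
# assuming (my simplification) the hybrid price is a linear blend.
dram_per_tb   = 100_000  # DRAM-only appliance
flash_per_tb  =  30_000  # flash-only appliance
hybrid_per_tb =  35_000  # typical hybrid configuration

# Solve: f * 100K + (1 - f) * 30K = 35K  for the DRAM fraction f
dram_fraction = (hybrid_per_tb - flash_per_tb) / (dram_per_tb - flash_per_tb)
print(f"Implied DRAM share: {dram_fraction:.1%}")  # ~7.1% DRAM, ~93% flash
```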
Kaminario positions DRAM as where you focus your most write-intensive/bottlenecking loads, such as logging or temp space, with the primary benefit being performance and a secondary benefit being reduced wear on your flash.
Hadoop notes
I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — I aim to restart them later this month — but that still leaves some other issues to discuss, namely adoption and partnering.
The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:
- Internet businesses.
- Traditional enterprises’ internet operations.
- Traditional enterprises’ other operations.
Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.
“Big data” has jumped the shark
I frequently observe that no market categorization is ever precise and, in particular, that bad jargon drives out good. But when it comes to “big data” or “big data analytics”, matters are worse yet. The definitive shark-jumping moment may be Forrester Research’s Brian Hopkins’ claim that:
… typical data warehouse appliances, even if they are petascale and parallel, [are] NOT big data solutions.
Nonsense almost as bad can be found in other venues.
Forrester seems to claim that “big data” is characterized by Volume, Velocity, Variety, and Variability. Others, less alliteratively inclined, might put Complexity in the mix. So far, so good; after all, much of what people call “big data” is collections of disparate data streams, all collected somewhere in a big bit bucket. But when people start making Variety and/or Variability part of the very definition of “big data”, they’ve gone too far; variety may be common in big data, but it isn’t what makes data big.