- SANs (Storage Area Networks) are pulling ahead of DAS (Direct Attached Storage).
- Much of the growth in storage is due to data warehousing.
- MPP (Massively Parallel Processing) is pulling ahead of SMP (Symmetric MultiProcessing) for high-end data warehousing.
- MPP architectures are commonly shared-nothing.
- Shared-nothing entails DAS.
But if you think about it, those facts don’t exactly add up.
The freshest take I have on the subject right now comes from Vertica, with whom I met on Wednesday. It turns out that while Vertica initially thought DAS would be the only viable way to go, quite a few Vertica customers actually run SANs. A big Vertica installation these days is 10 nodes and 10-20 TB of user data, while a small one might be 1/10 that size. Within that range, SANs do just fine, as long as they have sufficient bandwidth, which commonly equates to 4 gigabit HBAs (Host Bus Adapters). One point Vertica noted is that SANs commonly have lots of cache and 15K RPM disks, which sounds higher-end than the storage hardware I usually hear about in DAS configurations.
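To put rough numbers on "sufficient bandwidth," here is a back-of-envelope sketch in Python. Everything beyond the node count, data size, and 4 gigabit HBA figure is my assumption, not something Vertica told me: one HBA per node, data spread evenly across nodes, about 80% of the nominal link rate usable as payload, and no allowance for compression, caching, or disk limits. The function name and parameters are hypothetical.

```python
def full_scan_seconds(total_tb, nodes, hba_gbit_per_s=4.0,
                      hbas_per_node=1, efficiency=0.8):
    """Seconds for each node to read its share of the data once via its HBAs.

    Assumptions (not from the source): even data distribution,
    `efficiency` of the nominal link rate usable as payload,
    decimal units (1 TB = 1e12 bytes), and the HBA as the only bottleneck.
    """
    bytes_per_node = total_tb * 1e12 / nodes
    bytes_per_second = hba_gbit_per_s * 1e9 / 8 * efficiency * hbas_per_node
    return bytes_per_node / bytes_per_second


# The "big installation" range quoted above: 10 nodes, 10-20 TB of user data.
for total_tb in (10, 20):
    seconds = full_scan_seconds(total_tb, nodes=10)
    print(f"{total_tb} TB over 10 nodes: about {seconds / 60:.0f} minutes per full scan")
```

Under those assumptions each node moves roughly 400 MB per second, so reading a 1-2 TB share end to end takes somewhere between 40 minutes and an hour and a half. Real queries rarely touch every byte, and a columnar, compressed store reads far less than the raw user data size, so treat this as an upper bound on the HBA's contribution rather than a prediction of query time.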
Also interesting is CEO Jeff Vogel’s view over at Calpont. Jeff’s roots are in storage, and in response to my recent blog post mentioning storage issues, he sent over a note that I got permission to publish here. It says:
The storage blog underscores the increasing importance of efficient storage capacity and the impact that mixed workloads have on DW performance. It’s also front and center on the single biggest issue facing the growth prospects of the DW industry – Concurrency/User scalability. Storage efficiency and mixed workload variation have implications to cost. The single biggest cost issue in the data center is storage. It’s also a huge management cost issue. Energy consumption has only made the issue worse. Notwithstanding the issues associated with data access and availability, the proliferation of DW storage throughout the enterprise will bring business users and storage administrators together around a common storage services strategy. What emerges is a DW infrastructure strategy that incorporates storage management disciplines that have existed for years as part of a larger effort centered on Information Lifecycle Management (ILM) principles. The true benefits of ILM were only possible after companies went through the painful process of consolidating and centralizing their storage. Consolidation was made possible with the advent of the storage area network (SAN). There’s a reason why an overwhelming majority of all raw storage shipped into the enterprise is connected to a SAN.
From my perspective, the storage blog raises some important lessons for DW vendors and offers some interesting parallels. SANs made it possible for Unix and midrange OS servers to become part of the consolidated storage effort. As a result, we saw spectacular growth in applications attached to those servers. As a consequence, DAS retreated and is less than 20% of the market for warehouse and analytics. According to IDC over 50% of warehouse and analytics application storage in 2007 was midrange and is expected to top 60% within 5 years. The ‘Servers’ parallel here is Users. For DW applications to take off in a material way, we’ll need to solve the User scalability issues associated with cost, access and workload flexibility. Back to the “storage” future.
While the new DWA players have done a good job of addressing DBMS performance, scalability and costs, they don’t go far enough. Asset utilization, infrastructure flexibility and storage policies that govern data are fundamental concerns that DBMS players will need to contemplate at the architecture level. For example, while compression should be part of any solution, it’s only one storage cost dimension of a multi-faceted puzzle to asset utilization and, conversely, doesn’t fully address the broader issues mentioned in the blog. Up to now, these issues haven’t slowed DW sales, but you can hear the train in the distance.
We’re down a path on a crucial subject and we’ll be hearing a lot more from the storage players as to how the benefits of storage management will play out with DBMS solutions. As an industry, we’ll need to move beyond quid-pro-quo reference architectures and begin to think about how database behavior can help storage vendors lower costs, increase asset utilization, improve flexibility and in return, bring more Users into the game. A rising tide lifts all boats.