My California trip last week focused mainly on software — duh! — but I had some interesting hardware/storage/architecture discussions as well, especially in the areas of:
- Rack- or data-center-scale systems.
- The real or imagined demise of Moore’s Law.
I also got updated as to typical Hadoop hardware.
If systems are designed at the whole-rack level or higher, then there can be much more flexibility and efficiency in terms of mixing and connecting CPU, RAM and storage. The Google/Facebook/Amazon cool kids are widely understood to be following this approach, so others are naturally considering it as well. The most interesting of several mentions of that point came when I got the chance to talk with Berkeley computer architecture guru Dave Patterson, who's working on plans for 100-petabyte/terabit-networking kinds of systems, for use after 2020 or so. (If you're interested, you might want to contact him; I'm sure he'd love more commercial sponsorship.)
One of Dave's design assumptions is that Moore's Law really will end soon (or at least greatly slow down), if by Moore's Law you mean that every 18 months or so one can get twice as many transistors onto a chip of the same area and cost as one could before. However, while he thinks that applies to CPU and RAM, Dave thinks flash is an exception. I gathered that he thinks the power/heat reasons for Moore's Law to end will be much harder to defeat than the other ones; note that flash, because of what it's used for, has vastly less power running through it than CPU or RAM do.
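To make the stakes of that assumption concrete, here's a back-of-the-envelope sketch of what an 18-month doubling cadence implies for transistor density over time. The function and its parameters are just my illustration of the arithmetic, not anything Dave specified:

```python
def density_multiplier(years, months_per_doubling=18):
    """How many times denser a chip gets after `years`,
    assuming one doubling every `months_per_doubling` months."""
    doublings = years * 12 / months_per_doubling
    return 2 ** doublings

# Over 6 years the cadence implies 4 doublings, i.e. 16x the transistors
# on the same chip area and cost -- the trajectory in question for
# CPU and RAM, with flash as Dave's posited exception.
print(density_multiplier(6))  # -> 16.0
```

If the cadence merely slows, say to 30 months per doubling, the same 6 years yield under 3 doublings; a lot of system-design math changes.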
Otherwise, I didn’t gain much new insight into actual flash uptake. Everybody thinks flash is or soon will be very important; but in many segments, folks are trading off disk vs. RAM without worrying much about the intermediate flash alternative.
I visited two Hadoop distribution vendors this trip, namely the ones who are my clients – Cloudera and MapR. I remembered to ask one of them, Cloudera, about typical Hadoop hardware, and got answers that sounded consistent with hardware trends Hortonworks told me about last August. The story is, more or less:
- The default assumption remains $20-30K/node, 2 sockets, 12 disks. (Edit: See lively price discussion in the comments below.)
- Most hardware vendors have standard/default Hadoop boxes by now, and in many cases customers just buy what’s on offer.
- The aforementioned disks sometimes get up to 4 terabytes now.
- 128GB is now the norm for RAM. 256GB is common. Higher amounts are seen, up to – in rare cases – 2-4 TB.
- Flash is of interest, but isn’t being demanded much yet. This could change when flash’s storage density matches disk’s.
- Flash interest is highest for Impala.
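Taking the default box above at face value (12 disks, up to 4 TB each), here's rough usable-capacity math per node. The 3x replication factor is HDFS's standard default, and the ~25% allowance for non-HDFS overhead (OS, temp/shuffle space) is my illustrative assumption, not a figure from Cloudera:

```python
def usable_tb_per_node(disks=12, tb_per_disk=4, replication=3, overhead=0.25):
    """Approximate usable HDFS capacity per node, in TB.

    `overhead` is the fraction of raw disk reserved for the OS and
    temp/shuffle space (an assumed figure, not a vendor number).
    """
    raw = disks * tb_per_disk              # raw spindle capacity: 48 TB
    after_overhead = raw * (1 - overhead)  # space available to HDFS
    return after_overhead / replication    # divide by the replication factor

print(usable_tb_per_node())  # 48 TB raw -> ~12 TB usable per node
```

So a $20-30K node with today's biggest common disks nets out to very roughly 12 TB of usable HDFS capacity under those assumptions, which is the kind of figure that makes flash's lower storage density a real obstacle.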
Cloudera suggested that the larger amounts of RAM tend to be used when customers frame the need as putting certain analytic datasets entirely in RAM. This rings true to me; there's lots of evidence that users think that way, and not just in analytic cases. This is probably one of the reasons users often jump straight from disk to RAM without fully exploring the opportunities of flash.