Amazon and its cloud
Analysis of Amazon’s role in database and analytic technology, especially via the S3/EC2 cloud computing initiative. Also covered are SimpleDB and Amazon’s role as a technology user.
Some NoSQL links
I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I’m poking around a bit reading stuff on the subjects. Here are some links I found.
- A little over a year ago, Julian Browne put up a great post on Eric Brewer’s CAP conjecture/theorem, which provides much of the impetus to relax the traditional requirement for atomicity/consistency.
- Even more directly inspirational to NoSQL technology development were two seminal papers: Google’s on BigTable and Amazon’s on Dynamo. (That said, I’m having trouble getting myself to actually read them from start to finish, especially since they’ve been superseded by subsequent technology development.)
- 10gen (the MongoDB guys) hosted a NoSQL conference yesterday. Much blogging has ensued. The best post I’ve seen so far was by Adam Marcus. I find the graph database notes near the bottom particularly interesting.
- Mark Callaghan hit back against the NoSQL movement hype, and in particular against the “MySQL/memcached is passé” meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he praised or at least expressed hope for a variety of MySQL-related technologies, including Tokutek’s TokuDB and Continuent’s Tungsten.
- In connection with that debate, Mark Rendle offered a funny rant, mainly pro-NoSQL, in the style of a Socratic dialogue.
- John Quinn of Digg recently described Digg’s move from MySQL to Cassandra, and outlined a lot of features Digg was adding to Cassandra, all of which it is open-sourcing.
- The NoSQL guys maintain their own long list of NoSQL-related links.
| Categories: Amazon and its cloud, Cassandra, Continuent, Google, MySQL, NoSQL, Open source, RDF and graphs, Tokutek | 5 Comments |
Introduction to Gooddata
Around the end of the Cold War, Esther Dyson took it upon herself to go repeatedly to Eastern Europe and do a lot of rah-rah and catalysis, hoping to spark software and other computer entrepreneurs. I don’t know how many people’s lives she significantly affected (I’d guess it’s actually quite a few), but in any case the number is not zero. Roman Stanek, who has built and sold a couple of software businesses, cites her as a key influence setting him on his path.
Roman’s latest venture is business intelligence firm Gooddata. Gooddata was founded in 2007 and has been soliciting and getting attention for a while, so I was surprised to learn that Gooddata officially launched just a few weeks ago. Anyhow, some less technical highlights of the Gooddata story include: Read more
Sneakernet to the cloud
Recently, Amazon CTO Werner Vogels put up a blog post which suggested that, now and in the future, the best way to get large databases into the cloud is via sneakernet. In some circumstances, he is surely right. Possible implications include:
- When sending data to the cloud, you probably want to compress it to the max before sending. Clearpace’s new RainStor structured-data archiving service emphasizes that idea. RainStor marketing says cloud, cloud, cloud — but Clearpace thinks you really should have a bit of its software onsite too, to compress the data before sending it across the wire.
- Getting data from one cloud to another cloud could be problematic. I’m fond of saying that weblog data naturally lives in the cloud at your hosting company’s location, so you should analyze it there too. But this makes the most sense if you analyze it or at least filter/reduce it in place. (That said, the really, really big web companies have lots of different data centers, and presumably do move huge amounts of log data from place to place.)
But for one-time moves of data sets, sure, sneakernet/snail mail should work just fine.
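To make the "compress it to the max before sending" point concrete, here is a minimal sketch using Python's standard gzip module. It is not how RainStor or any other product works; the function name `compress_for_upload` and the sample log line are made up for illustration, and real data will usually compress less dramatically than this highly repetitive sample.

```python
import gzip


def compress_for_upload(raw: bytes, level: int = 9) -> bytes:
    """Gzip a payload at the highest compression level before it leaves the site.

    Hypothetical helper for illustration only.
    """
    return gzip.compress(raw, compresslevel=level)


# Assumed sample data: repetitive weblog-style lines, which compress very well.
log_line = b'203.0.113.7 - - "GET /index.html HTTP/1.1" 200 5120\n'
raw = log_line * 10_000
packed = compress_for_upload(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")

# The round trip must be lossless, or the cloud copy is useless.
restored = gzip.decompress(packed)
assert restored == raw
```

Whatever the actual ratio turns out to be on your data, the bytes that cross the wire (or ride in the box of disks) are the compressed ones, which is the whole point of doing this step onsite.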
| Categories: Amazon and its cloud, Cloud computing, Database compression, EAI, EII, ETL, ELT, ETLT, Web analytics | 2 Comments |
Maybe Amazon should be using a real DBMS after all
Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as “adult,” the source said. (Technically, the flag for adult content was flipped from ‘false’ to ‘true.’)
“It’s no big policy change, just some field that’s been around forever filled out incorrectly,” the source said.
Amazon employees worked on the problem well past midnight, and then handed it over to an international team, he said.
This was the best practice for reversing an error — how? Is SimpleDB somehow implicated? If this story is remotely true, and if there’s a sensible database architecture, I can’t imagine why there wouldn’t be a faster fix.
| Categories: Amazon and its cloud | 7 Comments |
Amazon Elastic MapReduce
Amazon is introducing a beta of Amazon Elastic MapReduce. What it boils down to is cheap, on-demand Hadoop.
This seems like a great way to experiment with MapReduce and see if you like it. But for serious use, I don’t know why you wouldn’t prefer MapReduce more closely integrated into a DBMS.
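For anyone who wants to see what "experimenting with MapReduce" means before renting a cluster, here is a minimal in-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It assumes nothing about Hadoop's or Amazon's actual APIs; every function name below is hypothetical, and a real job would spread these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["cheap on-demand Hadoop", "cheap MapReduce"])
print(counts)
```

The map and reduce functions are the only parts a programmer normally writes; the framework owns the shuffle, the distribution of work, and the failure handling.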
| Categories: Amazon and its cloud, Cloud computing, MapReduce | 1 Comment |
April Fool’s Day highlights
Amazon says it’s taking “cloud” computing to new heights, as it were.
Derivative funds and large government-subsidized entities will be especially interested in FACE’s transmodal operation. They can allocate a dedicated FACE, load it up with data, and then send it out to sea to perform advanced processing in safety. The government will have absolutely no chance of acting against them, because they will be too busy trying to decide which Federal Air Regulation (FAR) was violated, not to mention scheduling news conferences.
First excellent April Fool’s joke I saw this year was from The Guardian. The best so far is from Expedia. Others are linked in my Twitter feed. And personally, I’m encouraging the concept of April No-Fooling Day.
| Categories: Amazon and its cloud, Cloud computing, Humor | Leave a Comment |
Oracle announces an Amazon cloud offering
Per the Amazon Web Services Blog, Oracle announced that Oracle can be run in the Amazon cloud (i.e., on EC2, with EBS for persistent storage). Clustering is probably weak, however; e.g., there’s no RAC support, as per Oracle’s well-written FAQ. Perhaps not coincidentally, the FAQ seems to suggest that the primary use case at this time is backup, and backup is generally a major point of emphasis on Oracle’s cloud computing page.
Another use case could be development, but that depends in part on pricing. And whether Oracle’s offering seems attractively priced compared with, for example, the similar ones from EnterpriseDB and Elastra depends a lot on whether you’ve already negotiated an unlimited-use license for Oracle.
James Kobielus, who presumably was pre-briefed, has more to say.
| Categories: Amazon and its cloud, Cloud computing, Oracle | 1 Comment |
EnterpriseDB joins Elastra in the Amazon cloud
When Elastra announced their service to host MySQL and PostgreSQL in the Amazon S3/EC2 cloud, I immediately told my dear darling clients at EnterpriseDB they should do the same. Whereupon they told me it would happen soon. However, they neglected to tell me when it was actually announced. So I know no more than can be found in this Computerworld article.
But I’ll say this: it’s a very tempting option, both for new web-based applications or businesses and simply as a development platform pending later redeployment.
| Categories: Amazon and its cloud, Cloud computing, Elastra, EnterpriseDB and Postgres Plus, Mid-range, OLTP, Open source, Software as a Service (SaaS) | 2 Comments |
Variants on SimpleDB
Ralf describes SimpleDB, a project for an open source/desktop equivalent, a .NET version, and so on. Who knew that there was so much need for a database manager that could easily lose your data forever (with simple programming errors) and that is a lead-pipe cinch to repeatedly misplace it for a while (the built-in latency issues)?
To wit: Read more
| Categories: Amazon and its cloud | 1 Comment |
Elastra – somewhat more sensible Amazon-based DBMS option
Elastra is a startup offering MySQL and PostgreSQL SaaS instances in the Amazon S3/EC2 cloud. On their board is John Hummer, which I generally regard as a good thing, although it’s hardly a guarantee of success.* High Scalability raises some doubts about Elastra’s pricing, but I think that may be missing the point. Read more
