January 20, 2011

Notes, links, and comments January 20, 2011

I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)

First and foremost, the fourth annual New England Database Summit (née “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.

The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.

One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more.

Speaking of blog problems, we’ve had performance/reliability glitches here again. Melissa Bradshaw determined that the problem was an apparently activated WP Super Cache not actually caching anything. We should be OK now, so please let me know if there are further difficulties. One interesting step — it turns out that there’s a WordPress plug-in that does automatic EXPLAINs (if you’re the blog administrator).

Another interesting Mike Stonebraker post can be found (at least for now) over on the VoltDB blog. He continued his assault on the CAP Theorem, arguing that availability is an exaggerated concern when there are bug- or other human-error-driven kinds of outages, and also arguing that the concept of “partition tolerance” is misguided. Commenters pushed back, pointing out that in geographically distributed scenarios, the CAP Theorem sense of partitioning is quite a legitimate concern.

When I posted an expansive definition of machine-generated data a few weeks ago, Daniel Abadi shot back advocating a narrower one (see the comment thread, which includes a link to his thoughtful post). The disagreement boils down to conflicting intuitions as to whether the machine-data/true-human-data ratio will keep growing rapidly, in hybrid cases such as web logs or social gaming.

Dave McClure recently offered a survey of hot startup investing themes. High on his list were location-based services, which is a reminder to us all that geo-spatial data is becoming much more important. Ray Wang is savvy enough to understand the privacy dangers location-based services cause, but influential though Ray is, his view will probably remain in the minority. Machine-generated data and video each also make appearances on Dave’s list.

And wait! I have even more links for you! Several are taken from Thomas Houston’s choices for The Best Tech Writing of 2010. He chose well. I recommend sampling his list further.

- In an article about new electronic exchanges, the New York Times shared some numbers: 56% of stock trading volume is “high speed,” as is 1/3 or so of domestic futures volume; a NASDAQ trade takes 0.1 milliseconds, a trade that involves Chicago/NYC communication 13 milliseconds, and one involving NYC/Frankfurt 60 milliseconds. Slashdot offers photos and other context.
- James Taylor caught up with once-hot KXEN, and evidently got the impression KXEN was focusing a lot of its efforts on the tedious, time-consuming data-preparation side of modeling.
- Richard Tibbetts is being pretty funny on his blog.
- (Slashdot) The Russian government seems to be getting into open source software in a big way. PostgreSQL is already big in Russia (close to 1 million installations, I was once told), so this might conceivably add some energy to its development.
- Drupal 7 now has “a built-in test environment, version upgrade manager, and a database abstraction layer for use with MariaDB, SQL Server, MongoDB, Oracle, MySQL, PostgreSQL, and SQLite.” That may explain how MongoDB can hope to further penetrate the Drupal market.
- The Boston Phoenix argues that government lacks the manpower, budget, and expertise to keep up with its responsibilities in preserving and exposing information. Fixing that problem sounds like a pretty worthy open source development effort to me.

Finally:

- Clay Shirky reminded us that modern machine learning is what replaced old-style AI.
- Nominally reviewing a book he obviously disdains, Garry Kasparov (in my opinion the most admirable world chess champion ever) surveyed computer chess in a quick, nontechnical way. The whole thing is a bit wordy even so, so I’ll quote one part:

In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. … The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems


Comments

4 Responses to “Notes, links, and comments January 20, 2011”

Joe on January 20th, 2011 1:40 pm

January 20, 2010?
Curt Monash on January 20th, 2011 5:14 pm

Yikes. Will fix typo!
Mike Pilcher on January 27th, 2011 11:24 am

Curt,

If you want to speak to someone about Vertica’s marketing you could call me. It appears that the Vertica team are afraid of debate when it comes to what should be in a CDBMS (Columnar Database Management System), it makes me wonder what they’re missing. Text Analytics, Right-time search? How do they scale their database for users without GBCC (Generation Based Concurrency Control)? Hmmm. You can read more about the debate that was here on SAND.com. (Though at this point it’s not so much a debate — that takes two and someone at Vertica blinked and ran for cover.)

Mike
Curt Monash on January 29th, 2011 6:00 pm

I don’t know about that, Mike. For all their failings, they still have a clearer story than you do.
