Scientific data sharing
I’ve been posting recently about some issues in scientific data management. One topic I haven’t addressed yet is policies around data sharing. Generally:
- Scientists, like other academics, have their research judged largely on the basis of their published papers.
- The data that scientists capture benefits their careers mainly by informing, and being used in, their published papers.
- Scientists are correspondingly uninterested in, if not actively opposed to, sharing their data with the rest of the world:
  - Promptly (for the data they use to directly support their publications).
  - Perhaps ever (for the rest of the data).
On the other hand, it’s blindingly obvious that the world as a whole would be better off with widespread scientific data sharing, provided that making data “free” doesn’t significantly undermine scientists’ incentives to capture it in the first place. And institutions such as funding agencies are taking note. Thus:
Scientific data management technology should be suitable for either of two scenarios:
- Data is widely shared among scientists.
- Data is jealously guarded by the scientists who first gather it.
Biologists, it seems, are furthest along in sharing data. But they’ve had some drama about that recently. My very incomplete knowledge includes:
- At XLDB, it was said that in some areas of biology — and perhaps in some journals? — it was required that you make your data available to get a paper published.
- The NIH (National Institutes of Health) often requires or at least encourages data sharing as a condition of funding.
- A common practice is for data to be shared immediately, but for anybody except the scientists who gathered it to be prohibited from using it in a paper for a 12-month embargo period.
- There was a recent kerfuffle as an embargo was broken, on data residing in the NIH-sponsored genomics data repository dbGaP.
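To make the embargo practice described above concrete, here is a minimal Python sketch of how a repository might decide whether someone may use shared data in a paper. Everything in it is an illustrative assumption rather than how dbGaP or any NIH system actually works: the names `add_months` and `may_publish`, the idea of identifying people by plain strings, and the choice to measure the 12 months in calendar months from the date the data was shared.

```python
import calendar
from datetime import date

# Assumption: a 12-month embargo, as in the practice described above.
# Real repositories set their own terms.
EMBARGO_MONTHS = 12


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so Feb 29 plus 12 months becomes Feb 28.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def may_publish(requester: str, collectors: set, shared_on: date, today: date) -> bool:
    """Hypothetical check: may `requester` use the data in a paper today?

    The scientists who gathered the data are never embargoed;
    everybody else must wait until the embargo period has elapsed.
    """
    if requester in collectors:
        return True
    return today >= add_months(shared_on, EMBARGO_MONTHS)
```

Note that a check like this only governs publication, not access. Under the practice described, the data itself is visible to everyone from day one, which is why an embargo can be broken in the first place.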
Nature Magazine seems to have had a recent issue focused on data sharing. My favorite link from that page is this comment thread.
Comments
7 Responses to “Scientific data sharing”
I must point out that your discussion is American-centric, whereas it shouldn’t be. If the UK (say) funding agencies require data sharing (and they do), this changes the game for everyone, including the Americans. I don’t see people sharing the data just “nationally”.
As an aside, I recently wrote a book chapter which has some relevance here:
On the Challenges of Collaborative Data Processing
http://arxiv.org/abs/0906.0910
In this chapter, we ask “collaborative text editing led to Wikipedia, where can collaborative data processing lead?”
At the very least, I feel that we are asking the right question. (As to answering it, it gets tougher.)
I agree with Daniel Lemire that this should very much be a global issue, although many of the responses are inevitably going to be nationally based. You can check out the Australian National Data Service (ANDS) to find out about a very large initiative in this area that is happening in my part of the world. http://ands.org.au/about-ands.html