May 17, 2012

Thoughts on “data science”

Teradata is paying me to join a panel on “data science” in downtown Boston, Tuesday May 22, at 3:00 pm. A planning phone call led me to jot down a few notes on the subject, which I’m herewith adapting into a blog post.

For starters, I have some concerns about the concepts of data science and data scientist. Too often, the term “data scientist” is used to suggest that one person needs to have strong skills both in analytics and in data management. But in reality, splitting those roles makes perfect sense. Further:

- It may or may not make sense to say that a computer scientist is doing “science”; the term “data scientist” inherits that ambiguity.
- It may or may not make sense to say that a corporate scientist is doing “science”; for example, a petroleum geologist might do very valuable work without making any scientific discoveries. The term “data scientist” inherits that ambiguity too.
- Too often, people use the term big data as if it were something radically new, rather than a continuation of what has been done in large-scale analytic data management for decades. “Data science” has a similar problem.
- The term “data science” sounds as if you need specialized academic training to do it, which isn’t really true.

The leader in raising these issues is probably Neil Raden.

But there’s one respect in which I think the term “data science” is highly appropriate. In conventional science, gathering data is just as much of an accomplishment as analyzing it. Indeed, most Nobel Prizes are given for experimental results. Similarly, if you’re doing data science, you should be thinking hard about how to corral ever more useful data. Techniques include but are not limited to:

- Keeping data you used to throw away. This has driven a lot of growth in relational data warehouses and big bit buckets alike.
- Bribing customers and prospects. Loyalty cards are the paradigmatic example.
- Split testing. The more internet-based users you have, the more tests you can do.
- Storing derived data. That can be as simple as pre-computing the scores from your predictive analytics model, or it can be as complex as running a 50-step sequence of Hadoop jobs.
- Getting data from third parties, for example:
  - Supply chain partners (right now this rarely amounts to more than simple BI, but that could change in the future).
  - Data vendors of various kinds (e.g. credit bureaus).
  - Social media/the internet in general, which also usually involves some kind of service provider.
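
To make the simple end of “storing derived data” concrete, here is a minimal sketch of pre-computing model scores and keeping them next to the raw data. Everything named in it is an assumption for illustration: the `customer` and `customer_score` tables, their columns, and the toy `score` function, which merely stands in for a real predictive model.

```python
import sqlite3


def score(recency_days, orders):
    """Hypothetical stand-in for a predictive model (not from the post)."""
    return round(orders / (1.0 + recency_days), 4)


def precompute_scores(conn):
    """Compute a score per customer and store it as derived data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_score "
        "(customer_id INTEGER PRIMARY KEY, score REAL)"
    )
    rows = conn.execute(
        "SELECT customer_id, recency_days, orders FROM customer"
    ).fetchall()
    # Re-running the job overwrites old scores rather than duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO customer_score VALUES (?, ?)",
        [(cid, score(recency, orders)) for cid, recency, orders in rows],
    )
    conn.commit()
    return len(rows)


# Illustrative raw data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_id INTEGER PRIMARY KEY, recency_days INTEGER, orders INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, 0, 4), (2, 9, 5), (3, 30, 0)],
)

scored = precompute_scores(conn)
top_customer = conn.execute(
    "SELECT customer_id FROM customer_score ORDER BY score DESC LIMIT 1"
).fetchone()[0]
```

Once the scores are stored, later queries can rank or filter customers without re-running the model, which is the whole point of keeping the derived data around.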

Categories: Analytic technologies, Data warehousing, Predictive modeling and advanced analytics, Teradata

Comments

4 Responses to “Thoughts on “data science””

Mark Stacey on May 17th, 2012 7:01 am

Great feedback – I do think one place “data scientist” is appropriate is the scientist who is now using tech to collect data and do analysis.

Not different from previously, except that with the ubiquity of sensors, gathering data about the physical world is easier.

That holds in industrial processes, in running your car, and even in a modern exercise monitor such as the high-end Polar, Garmin and Suunto models: my Polar RS800CX has more instrumentation than my first car! (By count, about 3 times as much.)

Pulling in data from these different types of sensors, and then applying statistical analysis methods – that’s data *science*.

R. Scott on May 17th, 2012 12:50 pm

I think the essence of data science is the techniques and activities necessary to arrive at actionable insight.

Alex on May 17th, 2012 2:47 pm

Nice summary. I also think that in real life, in many cases, the collection/load/transformation etc. is actually done by “data engineers” (that seems to be the term in fashion, I guess) and the analysis after that by “data scientists”, but I could be wrong.

Thomas W Dinsmore on May 18th, 2012 9:38 am

“Data scientist” is how we refer to analysts who do not depend on user-friendly tools and vendor-defined OOTB “solutions”.

For the record, none of the generic techniques cited — from retaining data previously discarded, to leveraging experimental design, to leveraging third party data — are new. Technology, however, has advanced the frontier of what is commercially viable.
