I won’t add to the proliferation of “What’s a data scientist?” posts on the internet. I am as bored of them as you are. Instead, I will point you to this Venn diagram from Drew Conway, which nails it in my opinion. In my analytics career I aim for the center of this chart, and so far I am doing reasonably well:
- Hacking skills: I’ve been coding in R regularly since 2011, SAS Base certified since 2013, proficient in Excel, capable in SQL, Tableau, MicroStrategy, Business Objects, SPSS Modeler and some GIS tools.
- Math & Statistics Knowledge: I have an engineering bachelor’s and MS, a graduate certificate in statistics, and an MS in business analytics. Credentials – check! Thankfully, I have had the good fortune to put this knowledge to work, including developing and applying decision trees, regression, association rule mining, k-nearest neighbors, k-means clustering, principal component analysis, random forests, genetic algorithms, hierarchical clustering, bootstrap, cross validation, logistic regression, naive Bayes classifiers, etc.
- Substantive Expertise: I have been lucky to get my hands on data in the tax, insurance, cell phone, finance, energy, real property and healthcare industries, to name a few. Substantive expertise only comes from putting in time in the real world. Books will only teach you so much.
I can largely cover all three circles in the Venn diagram, but I’ll never be comfortable referring to myself as an expert in any one of them, never mind all three. Here’s why: Mark Little is a favorite journalist/entrepreneur of mine, and I like his ten principles of social journalism, particularly number one: “There is always someone closer to the story.” I generalize this principle and constantly remind myself that there are always people who know more than I do in each of the three areas above. Call that humility or insecurity or whatever, but it keeps me on my game.