The Fallacy of the Data Scientist's Venn Diagram
Originally published on LinkedIn
Well, fallacy may be a strong word. How about incomplete? I’m talking about the Venn diagram that depicts the skills needed to be a data scientist. I think Drew Conway was the first to draw it and, since then, hundreds of variations have followed. The one here is pretty close to the first one I saw.
It broke the job down into three distinct skills: data, statistics, and programming. I always liked that one. I never considered myself the best at any one of them, but I was certainly good enough at all three. As a new entrant in the field, such a chart gave me the hope and courage that I could make it.
Today, after a few years as a data scientist, this is how I would draw it:
Not as symmetrical as the original and a big shift in responsibilities!
Data Scientists Are Always Selling
Data scientists are always selling, and I have been selling since day one. This isn’t by choice or as a natural progression up the career ladder, but because it’s a brand new field that is continuously reinventing itself.
I’ve been a data science team founder, consultant, trainer, mentor, etc., all the while producing work described in the three original circles. If you’ve been at it for a few years, you’re top leadership, willingly or unwillingly. I’ve worked in environments ranging from a customer sitting three cubicles down to customers across the country spending 100k to kick-start their own data science team. The selling never abates.
We’ll start with the obvious - selling to the customer is a big one! You have to convince them to hire your services, which means explaining what data science can do for their needs and sometimes what data science is. External or internal customers, same deal. They know about their problems, but don’t always know what a machine learning or AI solution would look like. Once you have done your due diligence and built a model, you have to convince them to actually use what they paid for (yes, this does happen).
Then you have to explain it all over again to the actual field users - they’re rarely involved in the development phase. You need to work with them using a different language. They’re the ones who will say ‘there goes my job’ in a joking but nervous tone. Besides explaining how it works, you need to reassure them that it won’t take their job away but will instead make them much, much better at it. All that is selling…
But it doesn’t end there, far from it. Internally, the job requires many tools. They change all the time - few know how to use them, fewer know how to interpret them, and even fewer know how to ply them into practical working pipelines. For that reason, we’re always called on to explain, justify, and break down the choices made in order to convince co-workers, managers, the C-suite, etc. about what the heck is going on here. This is actually a serious responsibility. The tools and techniques you advocate will invariably affect a department’s or company’s business analytics, warehousing practices, web-serving platforms and even who will be hired or laid off next. More selling…
And finally, we are continuously selling to ourselves - claiming the right to be called a Data Scientist, defending the title, and convincing ourselves that we can keep up with the Cambrian explosion of tools and techniques (think feature engineering in the age of deep neural networks, or relational databases in the age of distributed computing, and all things open source). A harder sell, though I make it all the time, is telling myself that I am clever enough to stay a step ahead of AI and its potentially nefarious effects on my profession, the people I work for, and the world. Selling, selling, selling.