Data science is the hottest field in tech today. None other than the famous Harvard Business Review christened it “The Sexiest Job of the 21st Century,” job website Glassdoor named it the #1 job in America for the second year in a row, and well-known IT leader IDC Technologies predicts not only a need for 180,000 people with deep analytics skills by 2018 but also that our field will be a $203 billion industry by 2020. Additionally, beginning data scientists report being passionate about the field, the theory of machine learning, and the vast array of applications. It seems like nearly everyone with statistics skills, programming skills, and business acumen is looking to break into the field. But behind all the excitement is a growing tension between newcomers and those who already hold data science jobs. Beginning data scientists are eager to join the workforce and show what they can do. Established data scientists are wary about the skillset and experience of all these would-be data scientists and are demanding an awful lot before giving their seal of approval to candidates for open positions in their companies.
The Sometimes Dubious Quality of Data Science Education
To be fair, the incredible demand for data science has spawned scores of opportunities to become a data scientist in short order, some of dubious quality. Statistics and machine learning books are being re-titled and re-branded as data science books. Universities are scrambling to offer courses and full degrees in the field. Online educational portals, such as Coursera and Udemy, are filled with instruction targeted toward data scientists. And then there are full-fledged, multi-month bootcamps that charge five figures with the promise of not only teaching students data science but also helping them land their first job.
Sadly, these offerings vary wildly in quality. Anyone with basic video recording capabilities can create a course on Udemy. In fact, I once took a course on text mining that I recognized as simply a video version of an example problem from a data science textbook. The instructor’s code was literally a line-by-line reproduction of what was in the book. And while Coursera offers instruction from professors at prestigious universities, those professors are often skilled researchers but not-so-skilled instructors.
While beginning data scientists flock to these offerings, eager to develop and improve their skills, existing data scientists often look upon them as unsatisfactory at best and predatory at worst. A common complaint is that these courses and books are not teaching data science; they are teaching machine learning. Machine learning is an important part of data science, to be sure, but it is only a part. The problem is twofold. First, machine learning is the exciting (dare I say “sexy”) part of data science. The rest is brutally hard work at times. Second, the non-machine-learning aspects (e.g., data acquisition, casting a business problem as a data science one) are difficult to teach.
And what does it say about someone if they take an online course, anyway? Students who take courses after a full day of work argue that this represents an above-and-beyond effort on their part. Existing data scientists, often holders of advanced degrees, sometimes look upon these students as “dabblers” in the field, or as people trying to break in without the necessary skillset.
How Much Do You Really Need To Know To Be A Data Scientist?
Adding to the tension between the “haves” and “have nots” (or “not yet haves”) are the job descriptions. These often read like wish lists, with employers seeking candidates with an impressive array of skills, expertise, and experience. Beginning data scientists are often confused about the difference between machine learning and data science, largely because of the limited scope of instruction mentioned above. “Why isn’t a BS degree good enough to get an entry-level position?” they wonder. “After all, one can get a degree as a Mechanical Engineer without five years’ experience.” Further, if there is such a demand for data scientists, why are employers being so picky?
“Because data scientists need to have a broad skillset and they command high salaries,” is one common response. “Simply taking a few Coursera courses doesn’t suddenly make you a data scientist.” This backlash has led to the term “fake data scientist,” a term I find distasteful.
Employers are biased towards applicants with advanced degrees, a preference that beginning data scientists don’t fully understand. The mathematics and statistics required for using machine learning techniques are pretty rudimentary. So why an advanced degree? Because it demonstrates an ability to tackle projects. And I don’t mean the “capstone projects” you encounter in courses. A master’s thesis or doctoral dissertation represents a considerable investment of effort over months and an ability to perform many of the tasks of data science that aren’t taught in classes. These projects also need to be novel. You can’t simply do the tired “sentiment analysis of tweets” project that everyone seems to have tackled at one point.
Facebook groups, LinkedIn news feeds, and question-and-answer sites like Quora and Reddit are filled with beginning data scientists asking how much they really need to know. Ask five data scientists this question and you’re bound to get five different answers. In fact, data science blogger Erin Shellman once lamented that one needs to know “everything about all the disciplines” to be adequately prepared for interviews. So what is a beginning data scientist to do?
Let’s Give Beginning Data Scientists A Fair Break
While some practicing data scientists may see the large number of candidates looking to break into our field as something akin to an onslaught of the unwashed hordes (and I swear I must get unsolicited resumes from would-be data scientists every day), I think we need to cut these newcomers a break and start considering whether they do indeed have the basic skills needed to do the day-to-day work of a junior data scientist (or can acquire them quickly). As I said before, the level of math needed to understand machine learning algorithms is pretty low. Anyone graduating with a Bachelor’s degree in a scientific field has enough knowledge. The vast majority of our day-to-day work is data munging, and that certainly doesn’t require an advanced degree. What is required is experience on long-term, complex projects so, for you job seekers out there, make sure you have some of these listed on your resume and LinkedIn profile.
I’ve heard it said that the most necessary skill for a data scientist is curiosity, since our entire field is about digging through mountains of data to extract insights. To that, I would add a strong work ethic, a healthy level of skepticism, and excellent communication skills (written, oral, and interpersonal). These are often labeled “soft skills.” Label notwithstanding, they should be considered as critical as the typical hard, technical skills. Job seekers, again, take note. You won’t learn this stuff from a Coursera Specialization, but you’d better be able to demonstrate it.
The dam that has kept junior data scientists out is breaking and I think this is a good thing. And I’m excited and proud to be playing a role in this revolution. Beginning data scientists shouldn’t have to suffer through a maze to get a rewarding job. That’s why I operate this blog and recently created my flagship course, BreakIntoDataScience.com. The course is a one-stop-shop, end-to-end system on how to get hired. If companies won’t open the gates and start letting beginning data scientists in, at least I can show newcomers how to climb the wall.
And, quite frankly, the world needs you. As IDC says, there’s a huge future ahead of us. Let’s see some unity in our field, along with encouragement and mentorship of the next generation of data scientists. And for those of you looking for a job, keep at it. The rewards are well worth your efforts.