In the end... What are data scientists made of?

Are you confused between Ponicode and CircleCI? It’s not you, it’s us. Ponicode was acquired by CircleCI as of March 2022. The content and material published prior to this date remains under Ponicode’s name. When in doubt: Ponicode = CircleCI.

As you may know, the Ponicode team is a melting pot of people from various backgrounds, where data scientists and software engineers work hand in hand towards the same goal: designing an AI-powered solution that helps you reach high-quality code faster and more easily. Why combine these two profiles within the same team? What are their areas of expertise, and what does this combination bring to the team and the product?

First, let’s take a closer look at the data scientist’s job and how it has evolved.

What does a data scientist’s job consist of?

The role of data scientists can vary depending on the type of organization they work in. For instance, Google and other GAFA companies don’t actually hire data scientists, only software engineers. Why is that? Do we even need data scientists, then?

Yes. Strictly speaking, data scientists are computer scientists who specialize in statistics. Depending on the job description, they work on prediction and modeling, or on educating and evangelizing people about data usage, whether within the company they work for or in completely different fields such as marketing. In that last case, they don’t always put their work into production. Data scientist roles easily stretch from engineering to consulting and usually call for a hybrid set of skills.

Now, what does it mean to be a data scientist today?

The field has expanded over the last few years. A growing number of people are switching careers into data science with only basic training, driven by model performance rather than quality of execution. We could call them the “Kaggle generation”, after the website known for hosting famous data science competitions.

What is truly missing from their training is code quality. Let me explain what I mean by code quality and why data scientists often get it wrong.

There are several key components to code quality. Let’s start with naming.

Code is a means of communication between machines and people, which makes naming crucial: it is how others know what your code is about and how it should behave. Data scientists tend to learn a more mathematical style of code that doesn’t need to be explicit to people, and they adopt names like df, x or y. Told you: not explicit.
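To make the naming point concrete, here is a minimal, hypothetical sketch in Python: the same computation written twice, once with terse mathematical names and once with explicit ones. The function names and the sales data are illustrative assumptions, not taken from any real codebase.

```python
# Terse, "mathematical" naming: correct, but the reader has to guess
# what x, y and the return value represent.
def f(x, y):
    return sum(a * b for a, b in zip(x, y)) / sum(y)


# Explicit naming: the same computation, but the intent is readable.
def weighted_average_price(unit_prices, quantities_sold):
    """Average unit price, weighted by the quantity sold at each price."""
    total_revenue = sum(
        price * quantity for price, quantity in zip(unit_prices, quantities_sold)
    )
    total_quantity = sum(quantities_sold)
    return total_revenue / total_quantity


if __name__ == "__main__":
    prices = [10.0, 12.0, 8.0]
    quantities = [3, 1, 6]
    print(f(prices, quantities))
    print(weighted_average_price(prices, quantities))
```

Both calls return the same value; only the second one tells you what that value means without reading the function body.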

Next comes code architecture, another key pillar of code quality and essential for building code that stays maintainable in the long run. Data scientists are usually more familiar with notebooks, whose loose, cell-by-cell structure makes code harder to maintain.
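One common way to address this, sketched here as an assumption rather than as Ponicode’s own practice, is to move the logic out of notebook cells into small, named functions that can be imported, reused and tested, so that the notebook only calls them. The CSV layout (a hypothetical `age` column) and all function names below are illustrative.

```python
import csv
import io
import statistics


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract_valid_ages(rows):
    """Keep the 'age' values that parse as non-negative integers."""
    ages = []
    for row in rows:
        raw_age = (row.get("age") or "").strip()
        if raw_age.isdigit():
            ages.append(int(raw_age))
    return ages


def summarize_ages(ages):
    """Return basic statistics; the mean is None when there is no data."""
    return {
        "count": len(ages),
        "mean": statistics.mean(ages) if ages else None,
    }


def run_pipeline(csv_text):
    """Each former notebook cell becomes one explicit, testable step."""
    rows = load_rows(csv_text)
    ages = extract_valid_ages(rows)
    return summarize_ages(ages)


if __name__ == "__main__":
    sample = "name,age\nAda,36\nAlan,41\nGrace,\n"
    print(run_pipeline(sample))
```

Because each step is a plain function, it can be tested in isolation, and the notebook no longer depends on cells being run in the right order.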

Also, data scientists tend to review and assess their code from a statistical point of view. That is valuable from a mathematical angle, but less so from a software perspective, because the code may still crash on edge cases.

Indeed, data scientists are taught to discard edge cases in mathematical models and are not always in the habit of testing their code, at least not before pushing it to production. They can end up with unstable code that, in the worst-case scenario, breaks and puts the entire model in jeopardy. That would mean starting the work over from scratch, at the cost of a lot of valuable time and effort.
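As an illustration, here is a minimal sketch in Python built around a hypothetical min-max scaling step (not taken from Ponicode’s product). It shows how an edge case that is easy to dismiss on paper, such as a column whose values are all identical, crashes the code, and how a few small tests catch it before deployment. Mapping constant input to 0.0 is an assumption; other conventions are possible.

```python
def min_max_scale_naive(values):
    """Scale values to [0, 1]. Fails on [] (min() of an empty list)
    and on constant input (division by zero)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def min_max_scale(values):
    """Scale values to [0, 1], handling the edge cases explicitly."""
    if not values:
        return []  # nothing to scale
    low, high = min(values), max(values)
    if high == low:
        # Constant input has no spread; mapping it to 0.0 is an assumption.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Small pytest-style tests: they can also be run directly (see below).
def test_typical_input():
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_edge_cases():
    assert min_max_scale([]) == []
    assert min_max_scale([5, 5, 5]) == [0.0, 0.0, 0.0]


def test_naive_version_crashes_on_constant_input():
    try:
        min_max_scale_naive([5, 5, 5])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")


if __name__ == "__main__":
    test_typical_input()
    test_edge_cases()
    test_naive_version_crashes_on_constant_input()
    print("all tests passed")
```

The naive version is mathematically fine for "normal" data; the tests make the unusual inputs part of the specification instead of a surprise in production.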

We are not trying to give data scientists a bad reputation, far from it. So here is why they should definitely work in software factories, and what they can bring to the team!

Data scientists have a culture of numbers: they always want to know how technical decisions impact the product, and they have the training to measure it. This brings a whole new dimension to the work by introducing statistics, which are sorely missing and still underused in software factories.

As strong mathematicians and problem solvers, they bring a scientific way of thinking to the table and are always eager to read about the latest research and developments, uncovering new findings that level up the product they are working on!

Because code quality is still a fairly recent concept and way of working, especially in data science, they are well placed to challenge existing methods and find new ones.

For these reasons, data scientists should definitely work in a software factory alongside software engineers: they bring a fresh look and challenge the way software is built. Combining data scientists and software engineers is key to bringing innovation and new thinking to the product. At least, that is what we have observed here at Ponicode, and why we want to share this testimony today.

PS: Data science alone is not always enough. Applying it to real-world problems is the thrill of the job. And what could be more challenging than applying data science to code quality and software engineering?

And that’s why data scientists are more than welcome in the crazy Ponicode team! 

Find out more about the job 🦄
