When should Data Scientists unit test?

There are three significant stages in a machine learning pipeline: pre-processing, model training, and model evaluation.


1) Pre-processing is the stage of model building where we transform raw data into usable data. Take the example of the travel industry, where websites need dynamic pricing. Companies have a huge amount of raw data at their disposal that can be used to optimise prices: they know, for instance, that sunny and warm destinations are more popular than cold and gloomy ones when summer comes. This kind of information needs to be pre-processed before it can be fed to the machine learning model. Pre-processing pipelines are made of complex code, and this code needs to be thoroughly tested to avoid bugs and flaws. Robust pre-processing code that feeds the model clean information lets data scientists be confident that what happens in the production environment is equivalent to what happened when they trained their models in their sandbox environment.
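As a minimal sketch of what such a pre-processing step might look like (the column names and the temperature threshold below are purely illustrative assumptions, not a real pricing dataset):

```python
import pandas as pd

def add_seasonality_features(df: pd.DataFrame) -> pd.DataFrame:
    """Turn raw weather and date columns into features a pricing model can consume."""
    out = df.copy()
    # Flag warm destinations (the 20 degree threshold is an illustrative assumption).
    out["is_warm_destination"] = out["avg_temp_celsius"] >= 20
    # Flag Northern-hemisphere summer departures.
    out["is_summer"] = out["departure_month"].isin([6, 7, 8])
    return out

raw = pd.DataFrame({
    "destination": ["Lisbon", "Oslo"],
    "avg_temp_celsius": [27, 9],
    "departure_month": [7, 7],
})
print(add_seasonality_features(raw))
```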


Unit tests are paramount at this stage, as a well-coded pre-processing pipeline is key to moving safely to the next stage. Let’s take another example where we are pre-processing text to feed to a model. The text pre-processing pipeline will correct grammar mistakes, put the text in lowercase, and divide it into smaller elements. Each of these steps must be unit tested so that, if we add new pre-processing steps in the future, the new elements do not weaken the existing pipeline. This is where Ponicode can be very useful: it lets you write non-regression tests, and each time you write a new data pre-processing function you can quickly and directly generate unit tests for it. From a maintainability, legacy-code and collaboration perspective, this means that other data scientists, or your future self, can change existing pre-processing pipelines with the peace of mind of knowing they are not breaking anything that already works, while identifying new flaws fast and efficiently.
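As an illustration, here is a minimal sketch of what two of these steps and their non-regression tests could look like (the function names and test file are hypothetical, and the grammar-correction step is omitted for brevity); the tests run with pytest:

```python
# test_preprocessing.py -- illustrative sketch, not generated output.

def clean_text(text: str) -> str:
    """Lowercase the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())

def tokenize(text: str) -> list[str]:
    """Split a cleaned sentence into word-level tokens."""
    return clean_text(text).split(" ")

# Non-regression unit tests: if a future pre-processing step changes this
# behaviour, these tests fail and flag the regression immediately.
def test_clean_text_lowercases_and_normalises_whitespace():
    assert clean_text("Sunny  Destinations ") == "sunny destinations"

def test_tokenize_splits_into_words():
    assert tokenize("Sunny destinations are popular") == [
        "sunny", "destinations", "are", "popular",
    ]
```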


2) Model training is the stage where we feed a machine learning algorithm with data to help identify and learn good values for all attributes involved. At this stage, unit tests help to make sure that the model in production is stable.

A machine learning model is generally defined by its weights: these are the parameters of the model. We have to make sure that our training pipeline works, that it trains our model correctly and adjusts the weights appropriately.

In this context, we can write unit tests that check whether the weights adjust correctly after the training pipeline has run. These tests verify that the weights are actually shifting, and shifting in the right direction; in other words, they really check whether the model is training correctly or not.
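As a rough sketch of the idea, the test below replaces the real training pipeline with a stand-in (a single gradient-descent step on a least-squares loss) and asserts both that the weights move and that the training loss decreases:

```python
import numpy as np

def train_one_step(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One gradient-descent step on a mean-squared-error loss (stand-in for the real pipeline)."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def test_weights_shift_and_loss_decreases():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5])   # synthetic targets with known true weights
    w_before = np.zeros(3)
    w_after = train_one_step(w_before, X, y)

    loss = lambda w: float(np.mean((X @ w - y) ** 2))
    # The weights must actually move ...
    assert not np.allclose(w_before, w_after)
    # ... and move in the right direction: the training loss must go down.
    assert loss(w_after) < loss(w_before)
```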


3) Evaluation. This stage is usually tailor-made to the model: data scientists write customised performance indicators (KPIs) that reflect their goals as closely as possible. From this perspective, we can write unit tests that verify those KPIs are correctly implemented. This allows us to add new elements to our evaluation pipeline in the future without breaking any existing indicators.
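For example, a custom KPI such as a mean absolute percentage error can be pinned down by a unit test against a hand-computed value (the metric below is just an illustration, not a prescribed choice):

```python
import numpy as np
import pytest

def mean_absolute_percentage_error(y_true, y_pred) -> float:
    """Custom evaluation KPI: average absolute error relative to the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def test_mape_matches_hand_computed_value():
    # Errors of 10% and 20% on the two points should average out to 15%.
    assert mean_absolute_percentage_error([100, 200], [110, 160]) == pytest.approx(0.15)
```

With the KPI pinned down like this, refactoring the evaluation pipeline later cannot silently change what the indicator reports.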

Tests, and specifically unit tests, can consequently really reinforce the robustness of your model and make it time- and collaboration-proof. Following code quality best practices is essential groundwork to cover, as data projects carry an increasing burden of legacy code. Questioning the reliability of the model without first making sure that the code around it has been well tested is counterproductive. And for the data scientists and machine learning engineers who do not yet know their way around unit tests, we built Ponicode for Data Science: a low-code VS Code extension to build robust unit test files in a few clicks.