Building a computer vision model is not a trivial task, and if you are building one as you read this article, we understand how difficult the process can be. Each stage of building a model has its common pitfalls. This article examines them and suggests ways to avoid them so that you and your team can be more productive.
Pitfall 1: Underestimating the project scope
Before beginning a computer vision project, it is important for the team of data scientists to take a good look at the requirements, metrics and scope and draft out plans using project management frameworks to manage the whole process.
Here is an example: A second-hand car dealership is excited about using AI in its business. They want clients to be able to point their phone at a car and see details of the vehicle and the proposed price. To do this, they would like to build a computer vision model that detects the car's model and suggests a price.
It seems exciting at first until the team building this finds out that they didn’t consider:
- If a public annotated car database for all varieties of cars is available or how they can access it.
- Since these are used cars, how should the price change when there’s a scratch? Do they need extra data for each car type with scratches so the model can tell when there’s a scratch?
- Can the buyer know how much mileage the car has by looking at it?
This will probably take a lot of time and money. The problem can be tackled by reframing the problem statement more narrowly, listing requirements with product metrics in mind, and visualizing the different stages of creating the product so that all team members are on the same page. This reduces friction between team members and, with regular review of the plan, helps the project succeed with fewer hitches. A better problem statement for a start might be “We want to create a model that can detect and price all Mercedes cars”, before looking at the context and scope of the entire project.
This stage involves lots of brainstorming and it is great to get it right as soon as the project kicks off but in real world scenarios it might not be as perfect as you would like it to be. Don’t be scared to revisit your project goals, metrics and scope.
Pitfall 2: Faulty data labeling and annotation process
A trivial example of data collection might be scraping the internet for 500 pictures of pigs, cats and dogs for an animal classifier. Image annotation usually comes next, so that the model can work with the data. Data science teams might not see this as a pitfall because using quality data is a no-brainer, but it is still worth reiterating the importance of this process: the model depends on the data provided, so annotation errors become model errors.
It is always important to note that while labeling or outsourcing this task:
- The labeller might not have the required skills to do the work.
- Different labellers have varying context in the way they see the data given to them.
- The cost of paying labellers, or the precision risk of using automatic annotation tools, should be considered in relation to the project goals and timeline.
This can lead to costly mistakes in the project, short and long term.
To avoid this problem:
- Data science teams should not assume that their initial guidelines are interpreted correctly by labellers.
- They should monitor progress for quality and give feedback when needed to improve the annotation process.
- If the task is small enough, the team should label the data themselves, discuss any issues, and improve the process.
A model can tolerate a small share of incorrectly labeled data, but if the share grows, the team will have to go through the data and relabel it manually. With the variety of labeled open source datasets now available for computer vision tasks, the likelihood of making this mistake is reduced.
This process is as important as modeling the data so pay much attention to this process and be involved if need be. Create clear guidelines and monitor the annotation process to ensure quality from the start of the task till the end.
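One way to monitor annotation quality is to have two labellers annotate the same small sample and measure how often they agree beyond what chance would give. The following is a minimal sketch using Cohen's kappa, which is a technique chosen here for illustration and not something the workflow above prescribes. The function name and the sample labels are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two labellers, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each labeller assigned labels independently,
    # following their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labellers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sample: the same six images labelled by two people.
labeller_1 = ["pig", "cat", "dog", "dog", "cat", "pig"]
labeller_2 = ["pig", "cat", "dog", "cat", "cat", "pig"]
kappa = cohens_kappa(labeller_1, labeller_2)
```

A kappa close to 1 suggests the guidelines are being read consistently. A low value is a prompt to review the guidelines with the labellers before more data is annotated.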
Pitfall 3: Data leakage
Data leakage is a pitfall mainly because it hides overfitting: the model looks better during evaluation than it really is. This problem primarily happens when:
- A team makes the mistake of sharing data between the test set and the training set while splitting data in the preprocessing stage. When the model is then evaluated on the test set, it shows an unrealistically high accuracy because it has already seen data from the test set. It’s like seeing your Christmas gift the night before you are allowed to (you already know what you are getting for Christmas). It sounds like an obvious mistake, yet it is difficult to detect because every team wants higher accuracy and leakage delivers just that. When the model is used on truly unseen data, it does not do as well as expected.
For example, in creating a classifier, the output variable can be mistakenly used as a predictor feature during feature selection. The good news is that in the real world, this seldom happens and can easily be identified during the modeling process.
- A team makes the mistake of using suggestive or giveaway features during feature selection. Consider predicting a medical condition where a ‘type of surgery’ feature, which suggests the kind of condition a patient might have, is chosen during feature selection. Appendectomy suggests that the condition is appendicitis; mastectomy suggests breast cancer. This happens regularly and is more difficult to solve.
How to avoid data leakage:
- Be suspicious when your model is doing too well and check the data distribution again and split properly. In this case, the model might be memorizing the leaked data presented to it instead of learning and generalizing.
- Pay more attention to the features and extract the appropriate ones to avoid selecting output or giveaway features. Features that have a very high correlation with the target have to be examined more closely and handled carefully.
- Preprocessing steps (such as normalization statistics) should be fitted on the training set only and then applied unchanged to the test set.
- Cross validation is also recommended by data scientists, with some reservations: preprocessing has to be repeated inside each fold, otherwise the folds leak into each other.
Having a very accurate model is important, but be on guard for data leakage, which often doesn’t present itself as a problem until the model is tested with unseen data. Pay attention to the data processing stage to ensure this pitfall is avoided. Beware of giveaway features and ensure you don’t leak data from the test set into the training set or vice versa. When your model is too good to be true, be suspicious.
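Two of the checks above can be sketched in a few lines: looking for items that appear in both splits, and computing normalization statistics from the training split only. This is a minimal illustration under stated assumptions, not a complete leakage audit. The helper names, the byte strings standing in for image files, and the "brightness" feature are all hypothetical.

```python
import hashlib
import statistics


def find_shared_items(train_items, test_items):
    """Return content hashes present in both splits (exact duplicates only)."""
    def digest(raw):
        return hashlib.sha256(raw).hexdigest()

    train_hashes = {digest(item) for item in train_items}
    test_hashes = {digest(item) for item in test_items}
    return train_hashes & test_hashes


def fit_normalizer(train_values):
    """Compute normalization statistics from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid dividing by zero
    return mean, std


def normalize(values, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Hypothetical data: raw image bytes and one numeric feature per image.
train_images = [b"img-a", b"img-b", b"img-c"]
test_images = [b"img-c", b"img-d"]  # "img-c" has leaked into the test split
leaked = find_shared_items(train_images, test_images)

train_brightness = [0.2, 0.4, 0.6]
test_brightness = [0.8, 1.0]
mean, std = fit_normalizer(train_brightness)         # fitted on train only
test_scaled = normalize(test_brightness, mean, std)  # reused on test
```

Hashing only catches byte-identical files. Near-duplicates such as resized or re-encoded copies would need a different comparison, which is outside this sketch.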
Pitfall 4: Being unmindful of data bias
The optimal Machine Learning dataset allows you to train a model that will perform well once you deploy it, ensuring the model makes the proper prediction regardless of the example. However, suppose particular examples are under- or overrepresented in the dataset. In that case, you may find out that your computer vision model may not be generalizable to production cases resulting in decreased accuracy in those scenarios.
Some of the sources of bias you should look out for when working on computer vision projects include:
- Selection bias: When you train your CV model on only a skewed subset of scenarios it may encounter in production.
- Measurement bias: When there are differences in the way you collect image data for training and the way production data will be collected for model prediction.
- Confirmation bias: When the data collection method mirrors real-world biases and value distributions, so that the model reinforces undesirable behaviors.
Detecting a bias and recognizing its origin are the most important steps in resolving it. Data slicing evaluations are one technique to accomplish this. Data slicing can help you better understand the model’s performance on various sub-datasets.
You can slice or subset the dataset and study the model’s behavior on each slice or subset. You may then look at any groups for which the sliced metrics differ significantly from the total dataset.
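The slicing idea can be sketched as follows: group evaluation records by a metadata field, compute accuracy per group, and flag groups that fall well below the overall figure. This is a minimal sketch. The record layout, the "lighting" field, and the 0.10 gap threshold are assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Group evaluation records by a metadata field and compute accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[slice_key]
        total[group] += 1
        correct[group] += int(rec["label"] == rec["prediction"])
    return {group: correct[group] / total[group] for group in total}


def flag_weak_slices(records, slice_key, max_gap=0.10):
    """Return slices whose accuracy is more than max_gap below overall accuracy."""
    overall = sum(r["label"] == r["prediction"] for r in records) / len(records)
    per_slice = accuracy_by_slice(records, slice_key)
    return {g: acc for g, acc in per_slice.items() if overall - acc > max_gap}


# Hypothetical evaluation records; "lighting" is an assumed metadata field.
records = [
    {"label": "car", "prediction": "car", "lighting": "day"},
    {"label": "car", "prediction": "car", "lighting": "day"},
    {"label": "truck", "prediction": "truck", "lighting": "day"},
    {"label": "car", "prediction": "truck", "lighting": "night"},
    {"label": "truck", "prediction": "truck", "lighting": "night"},
]
weak = flag_weak_slices(records, "lighting")
```

Here the "night" slice is flagged because its accuracy is far below the overall accuracy, which would suggest collecting more night-time images or checking how they were labeled.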
Pitfall 5: Poor model evaluation and testing
Accuracy is the first thing on everyone’s mind when a computer vision model, or any other Machine Learning model, is being built. This is not necessarily wrong, and a model can be evaluated with just one metric. In real-life projects, however, more than one metric might be needed to satisfy all the requirements of a client.
Consider a model created to alert a driver when the car is too close to the car in front. First, the model should accurately recognize that there is a car ahead and determine the distance to it. The run-time to trigger the alert also matters: if it is too slow, the driver might not have time to act. A threshold is set to decide whether the model is fast enough. In this case, accuracy becomes the optimizing metric and speed the ‘satisficing’ metric (it only has to meet an acceptable threshold).
There are metrics for the different models used and varying ways to evaluate them. To avoid this pitfall:
- Read about the algorithm you are applying and check how best to evaluate it
- Test the model on unseen test data to check if its satisficing metric is being met. In the case of the car, evaluate if the time the model takes is good enough.
Also, you can use different libraries to evaluate the model quickly. The Deepchecks library makes it straightforward for Machine Learning engineers and data scientists to assess data and models, helping every team pick the best features and use the best metrics for evaluation.
In addition, Deepchecks accompanies you through various validation needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models.
Proper evaluation and testing should be done. Instead of focusing only on accuracy, make sure you are creating a wholesome model which takes into consideration both the optimizing and satisficing metrics.
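The distinction between an optimizing and a satisficing metric can be sketched as a selection rule: discard candidates that miss the latency threshold, then pick the most accurate of the rest. This is a minimal sketch. The candidate names, the numbers, and the 100 ms threshold are invented for illustration.

```python
def pick_model(candidates, max_latency_ms):
    """Pick the most accurate candidate among those meeting the latency threshold."""
    # Satisficing metric: latency only has to be good enough.
    acceptable = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    if not acceptable:
        return None  # nothing meets the threshold
    # Optimizing metric: among acceptable models, maximize accuracy.
    return max(acceptable, key=lambda c: c["accuracy"])


# Hypothetical results measured on a held-out test set.
candidates = [
    {"name": "large", "accuracy": 0.97, "latency_ms": 180},
    {"name": "medium", "accuracy": 0.95, "latency_ms": 60},
    {"name": "small", "accuracy": 0.90, "latency_ms": 20},
]
best = pick_model(candidates, max_latency_ms=100)
```

With these numbers the "medium" candidate is chosen: "large" is more accurate but too slow to be useful for a driver alert.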
Pitfall 6: Neglecting model error analysis
Project teams often only check the model’s accuracy, then celebrate if it’s good or change parameters until it reaches the accuracy level needed. They forget that the errors give more insight into how the model works.
For example, the training error for a computer vision classifier can be 1%, while on the dev set the classifier shows a 12% error. At this point data scientists examine the images the classifier got wrong and either change them or enhance them in a way that reduces the error. If the training set and dev set come from the same distribution, this gap can signal a significant variance issue, meaning the algorithm might not be generalizing well enough. When they are not from the same distribution, this conclusion doesn’t hold. Also, depending on the project, the model bias might be high, and knowing what that means and how to address it will improve the model drastically.
Error analysis can be done:
- Manually, by randomly selecting data and their predicted labels from any of the subsets you are interested in, checking for yourself how many were wrong, and noting the issues.
- Using a confusion matrix, which shows the number of correct and wrong predictions for each class.
- By using class activation maps (CAM), which help data scientists see which regions of an image contributed most to the predicted class.
This gives some insight into how your model is performing and enables teams to correct errors as they see fit.
If your model is not doing so well, don’t panic. Know exactly what’s wrong by doing a thorough error analysis which can be manual in some cases or done programmatically. This can enhance your model’s performance and it is critical to the success of the final model.
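A programmatic first pass at error analysis can be quite small: count (true label, predicted label) pairs, which is the content of a confusion matrix, list the items the model got wrong so they can be inspected by hand, and compute the gap between dev and training error. This is a minimal sketch. The file names, labels, and helper names are placeholders, and the error rates are the 1% and 12% from the example above.

```python
from collections import Counter


def confusion_counts(labels, predictions):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(labels, predictions))


def misclassified(items, labels, predictions):
    """Return the items the model got wrong, with true and predicted labels."""
    return [(item, y, p) for item, y, p in zip(items, labels, predictions) if y != p]


def variance_gap(train_error, dev_error):
    """Gap between dev and training error. A large gap hints at high variance
    when both sets come from the same distribution."""
    return dev_error - train_error


# Hypothetical dev-set results; file names are placeholders.
files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
labels = ["cat", "dog", "dog", "pig"]
preds = ["cat", "cat", "dog", "pig"]

counts = confusion_counts(labels, preds)
wrong = misclassified(files, labels, preds)
gap = variance_gap(train_error=0.01, dev_error=0.12)
```

The `wrong` list is the starting point for manual review: opening those images usually reveals whether the problem is blur, mislabeling, or an underrepresented class.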
Pitfall 7: Selecting an inappropriate transfer learning technique for your task
Another aspect that practitioners tend to overlook is that developing well-performing CV models is as much an art as it is a science. There are two primary types of transfer learning techniques a practitioner could adopt when using pre-trained models on small datasets:
- Transfer learning through feature extraction,
- Transfer learning through fine-tuning.
In some cases where practitioners use pre-trained models, they choose feature extraction over fine-tuning. While the right type of transfer learning depends on the problem being solved, choosing feature extraction over fine-tuning the final layers tends to produce a lower-performing model.
When in doubt over which transfer learning technique to use, lean towards fine-tuning the final layers of the model, especially if you are training on a small amount of image data.
Pitfall 8: Performing large-scale data augmentation on the CPU
Teams will often find that they have to train their computer vision models on large datasets. In most cases, they load the data into memory and perform data augmentation on the CPU. If the dataset is small, running your data augmentation jobs on the CPU might not be a hassle, but that changes once your dataset starts scaling up. For larger datasets, you should avoid running your data processing jobs on your CPU.
You can better optimize your data augmentation jobs by moving them from the CPU to GPUs. For example, NVIDIA provides the DALI library, which can load and preprocess data and offload augmentation to an NVIDIA GPU in your training pipeline.
Computer vision deep learning models are often data hungry, and as your data scales, you need to make sure you are optimizing your pipeline to efficiently process your dataset and load it to the model for training.
You can learn how to start using NVIDIA DALI in this video guide.
Pitfall 9: Thinking deployment is the final step
There is a common myth that Machine Learning models automatically correct themselves after deployment and that little needs to be done to the model afterwards. This might be true for fields like reinforcement learning, but even there the model parameters are modified after a period of time to perform optimally.
The model generalizes from a fairly small amount of data compared to the vast data it hasn’t encountered. Its accuracy can steadily decline with time until it is no longer ideal to keep using it. As user input becomes more complex, models find it difficult to adapt without human intervention. The dynamic society we live in might also change the interpretation of data. For example, consider a scenario where policies change due to the highly debated issue of privacy in computer vision: what will a team do in cases like that? Teams cannot afford to be absent from the life cycle of their products.
To ensure that the model is updated:
- Roles should be assigned to ensure that the model is monitored. This can be done with the help of different tools, chosen for their ease of use, flexibility, monitoring functionality, overhead and alert system.
- Root cause analysis can be used to find the main causes of an issue so that they can be fixed with a proper plan of action.
- Create key indicators to track and ensure that there is a system of monitoring these indicators.
Deployment is not the end of the road. The model needs to be updated over time, and this can only be done efficiently when people are tasked with maintaining the model in production. When something goes wrong, it is necessary to know where it is coming from and fix it as fast as possible to limit business-related losses.
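One simple key indicator to track is accuracy over the most recent predictions for which a true label later becomes available. The sketch below keeps a rolling window and raises a flag when accuracy falls under a threshold. It assumes that labelled feedback exists in production, which is not always the case, and the class name, window size, and threshold are illustrative choices.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent labelled predictions and flag drops."""

    def __init__(self, window_size, min_accuracy):
        # A bounded deque drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


monitor = RollingAccuracyMonitor(window_size=4, min_accuracy=0.75)
feedback = [("car", "car"), ("car", "truck"), ("bus", "car"), ("car", "car")]
for predicted, actual in feedback:
    monitor.record(predicted, actual)
alert = monitor.needs_attention()
```

In practice the flag would feed whatever alert system the team already uses, and a triggered alert is the cue to start root cause analysis.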
Pitfall 10: Neglecting monitoring model usage/cost
There are often situations where you run data processing or training jobs on the Cloud or on on-prem infrastructure. If you are running these jobs on the Cloud, chances are that you have rented compute devices and perhaps accelerators such as GPUs and TPUs. One pitfall you might fall into is not monitoring how your jobs use memory and compute, which can be costly in both money and time.
In cases where you are running your image data processing jobs on distributed clusters, actively monitoring your resource usage can give you insight into how to use the resources effectively. For instance, if you are using multi-processing to take advantage of your compute resources, you may not know whether it is a better technique than multi-threading unless you monitor your resource usage.
If you are running jobs on your GPU, you want to ensure that the jobs make full use of the GPU, saving you time and money. If you are running your jobs on the Cloud, you should monitor memory and compute usage to understand how your processes use these resources. You can consider purpose-built experiment management tools such as Weights & Biases and MLflow to monitor your resource usage if your default Cloud option does not cut it.
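Before reaching for a dedicated tool, a rough picture of what a job costs can be obtained with the Python standard library. The sketch below wraps a block of work and records wall-clock time and peak Python memory. It is only an approximation: `tracemalloc` sees Python allocations, not GPU memory or memory allocated by native libraries, and the list manipulation stands in for a real preprocessing job.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_usage(name, report):
    """Record wall-clock time and peak Python memory for the wrapped block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[name] = {"seconds": elapsed, "peak_mib": peak_bytes / 2**20}


report = {}
with track_usage("fake_augmentation", report):
    # Stand-in for a real preprocessing job: build and reverse a large list.
    pixels = [i % 256 for i in range(200_000)]
    flipped = pixels[::-1]
```

Running the same wrapper around a multi-processing and a multi-threading version of a job gives a first comparison of the two approaches on your own data.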
You have probably made some of these mistakes before, but that doesn’t change the fact that you and your teammates are rock stars who need to keep up the good work while looking out for these pitfalls in the model building process. The aim of this article was to remind you of the common challenges, and we sincerely hope you take note of them to avoid them in the future.
We have shared lots of information on pitfalls, but if you are a curious cat like me, you can also visit our blog on Machine Learning topics, which might directly or indirectly help you avoid the pitfalls listed above. The field constantly evolves, and the blog is a good place to find useful information and stay updated.
If you have experienced some challenges we have not outlined here or would like to ask more questions, you can join our community for amazing data scientists and Machine Learning engineers here. For now, take it one step at a time building amazing solutions.