
Computer Vision Models: Workflow and Tools

This blog post was written by Preet Sanghavi as part of the Deepchecks Community Blog. If you would like to contribute your own blog post, feel free to reach out to us via blog@deepchecks.com. We typically pay a symbolic fee for content that’s accepted by our reviewers.


“A picture is worth a thousand words.” Just as we humans interpret a lot of information from photographs, intelligent machines can also learn from images. While Artificial Intelligence (AI) allows machines to think, Computer Vision (CV) enables machines to learn patterns from visual data.

A CV pipeline (a.k.a. vision pipeline) automates the workflow of turning raw image data into predictions. It typically follows the phases below.

General Computer Vision Pipeline


Phase 1: Input Data Selection

This is the data acquisition stage. A computer vision model learns from images and video frames, so it is important to choose relevant sources to properly train the model.

Phase 2: Data Pre-processing

Here, we pre-process the acquired data. Passing it through a number of filters and transformations ensures that the collected data is suitable for training and testing purposes.
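As a minimal sketch of what this phase often looks like in practice (assuming OpenCV and NumPy are installed; the file name is a placeholder), pre-processing might include resizing, denoising, and normalization:

import cv2
import numpy as np

# "raw_image.jpg" is a placeholder for an acquired image
img = cv2.imread("raw_image.jpg")

# Resize to a fixed input size, smooth out noise, and scale pixels to [0, 1]
img = cv2.resize(img, (224, 224))
img = cv2.GaussianBlur(img, (3, 3), 0)
img = img.astype(np.float32) / 255.0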

Phase 3: Defining the Problem Statement

We define the problem to be solved using the pre-processed images. Facial recognition, image enhancement, and image data augmentation are some popular examples.

Phase 4: Feature Extraction

Just like with any Machine Learning (ML) model, feature extraction is part of building the CV pipeline. It involves the automated curation and selection of key features or data points that contribute toward solving the problem statement.
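In deep-learning-based CV, feature extraction is often handled by a pre-trained network. As a rough sketch (assuming TensorFlow is installed; the random batch merely stands in for real, pre-processed images), MobileNetV2 can be used as an off-the-shelf feature extractor:

import numpy as np
import tensorflow as tf

# Pre-trained MobileNetV2 without its classification head; global average
# pooling returns one feature vector per image
extractor = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3)
)

# A random batch stands in for real, pre-processed images
images = np.random.rand(4, 224, 224, 3).astype("float32") * 255.0
images = tf.keras.applications.mobilenet_v2.preprocess_input(images)

features = extractor.predict(images)
print(features.shape)  # (4, 1280): one 1280-dimensional feature vector per image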

Phase 5: Training

This phase involves training the model by updating the weights associated with each feature. Generally, the higher the accuracy achieved during training (without overfitting), the better the model. This is the penultimate stage of the ML pipeline.

Phase 6: Testing

This is the final stage of any CV pipeline. It involves making decisions, predictions, or judgements based on the weights learned during training and measuring accuracy against the ground-truth values. The dataset is generally divided into two parts, training and testing, in an 80:20 ratio.
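As a minimal sketch of such a split (assuming scikit-learn is installed; the arrays here are dummies standing in for real images and labels):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: 100 pre-processed 64x64 grayscale images and their labels
X = np.random.rand(100, 64, 64)
y = np.random.randint(0, 10, size=100)

# 80:20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 80 20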

Now, let us explore different tools that help us build CV models.


Top 6 Tools for Computer Vision Modeling

OpenCV

OpenCV is the most renowned and widely used CV library. It provides access to more than 2,000 different algorithms that can be used to build your own model.

Some common use-cases of the OpenCV library:

  • Face detection and recognition
  • Red-eye removal from images
  • Object identification
  • Object dimension identification
  • Object tracking

OpenCV is supported across many languages, including Python, MATLAB, Java, and C++, and can be used on multiple operating systems such as Windows, Linux distributions like Ubuntu, and macOS.

Tech giants like Google, Amazon, and Tesla regularly make use of this library to hone their algorithms. One of the key advantages of OpenCV is that it is free to use and open source.

Check out this colab for a deeper dive into OpenCV.
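As a quick taste of the library, here is a rough sketch of face detection using one of the Haar cascade classifiers that ships with OpenCV (the image path is a placeholder):

import cv2

# Load a pre-trained Haar cascade face detector bundled with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

# "photo.jpg" is a placeholder; point it at any local image
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a bounding box around each one
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)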

TensorFlow

TensorFlow is another widely used Deep Learning platform with a rich set of tools, functions, and libraries. It is also considered one of the premier tools for CV modeling. Moreover, just like OpenCV, TensorFlow runs on multiple operating systems like Windows, Ubuntu, and macOS, and works well with languages like Python, C, C++, and Java.

TensorFlow is mainly used to speed up ML systems that rely on CV algorithms. It is an open-source platform that helps significantly decrease model size and improve overall model accuracy.

The following code imports the MNIST dataset from TensorFlow and makes use of the Sequential model to make predictions on it.

import tensorflow as tf

# Load the MNIST handwritten-digit dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple fully connected classifier for 28x28 grayscale digit images
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy'])

# Train for 5 epochs and evaluate on the held-out test set
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Python code for training a CV model on the MNIST dataset

The only downside to TensorFlow is that it consumes a large amount of resources in practical operation.

There are several overlaps between TensorFlow and OpenCV when it comes to CV Modeling. However, there are two key points that help distinguish between the use-cases:

  1. TensorFlow is superior to OpenCV when it comes to training a model for a specific task on custom datasets, thanks to its wide array of options and the ability to re-train models using the DNN module. You can read more about it here.
  2. OpenCV is preferred for deploying models for tasks like image segmentation, object detection, and classification, as it allows deployment as part of C++ apps, APIs, or a Software Development Kit with enhanced performance (a sketch of loading a TensorFlow model through OpenCV's DNN module follows this list).
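To illustrate the second point, here is a rough sketch of running inference on a TensorFlow model through OpenCV's DNN module (the model, config, and image file names are placeholders, and the blob size depends on the model you load):

import cv2

# Placeholder paths to a TensorFlow frozen graph and its text graph config
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")

img = cv2.imread("photo.jpg")
# Convert the image into the 4-D blob the network expects
blob = cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False)

net.setInput(blob)
detections = net.forward()
print(detections.shape)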

SimpleCV

While OpenCV and TensorFlow offer a wide range of options, SimpleCV is an excellent choice for easy-to-use and explainable CV modeling.

SimpleCV is an open-source framework that simplifies access to major libraries like OpenCV. The most straightforward way to see SimpleCV's use-cases in action is to go through its tutorial here.

from SimpleCV import Camera
# Initialize the camera
cam = Camera()
# Loop to continuously get images
while True:
    # Get Image from camera
    img = cam.getImage()
    # Make image black and white
    img = img.binarize()
    # Draw the text "Hello World" on image
    img.drawText("Hello World!")
    # Show the image
    img.show()

Python code for drawing text on an image using SimpleCV's built-in drawText function.

The following image shows the SimpleCV threshold function. The threshold method sets each pixel in an image to black or white depending on its brightness.

SimpleCV threshold function

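Since SimpleCV builds on OpenCV under the hood, here is, for comparison, a rough sketch of the equivalent binary thresholding written directly against OpenCV (the image path is a placeholder):

import cv2

# "photo.jpg" is a placeholder; load the image directly in grayscale
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

# Pixels brighter than 127 become white (255); the rest become black (0)
_, black_and_white = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("thresholded.jpg", black_and_white)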

BoofCV

BoofCV is a Java-based software tool that enables the modeling of real-time CV applications. It is free to use and open source under the Apache license. It provides end-to-end support for developing both simple and complex CV models through a straightforward user interface.

BoofCV is organized into several packages: image processing, features, geometric vision, calibration, recognition, visualization, and IO. Creating and printing QR codes, camera calibration, and image downsampling are examples of BoofCV's different use-cases.

SimpleCV vs. BoofCV:

  1. SimpleCV is a simplified framework that allows users to perform CV tasks without having to learn concepts like bit-depth, file formats or storage. It works as a gateway for beginners to explore the use-cases of OpenCV.
  2. BoofCV is an advanced Java-based CV library that can be used via pre-built JARs on Maven Central. BoofCV is known for its superior performance compared to other CV libraries, as it supports memory and data-structure recycling. You can read more about it here.

DeepFace

DeepFace is a free and open-source library extensively used for facial recognition. It provides a very simple interface, allowing you to carry out complex CV tasks with just a single line of code.

These are the four key application areas associated with DeepFace.

1. Facial Verification
Facial verification is the process of comparing two faces to check whether they match. This can be used, for example, to validate that a person's face matches the photo on an identification document.

2. Facial Identification
Commonly referred to as facial recognition, this involves identifying a face within a set of thousands of images. Under the hood, facial verification is run on the query face against thousands of stored images to find the best match.

3. Facial Analysis
Facial analysis involves collecting information from an image and identifying its key visual characteristics (e.g., intensity, brightness). This can be used across multiple applications like emotion classification from facial images, gender classification, or age prediction.

4. Real-time Applications

There are plenty of real-time applications of facial recognition and identification. Closed-circuit television (CCTV) footage and security cameras are examples where instant identification of a face can be useful in reducing crime.

pip install deepface

Command to install the DeepFace package

from deepface import DeepFace
result = DeepFace.verify(img1_path = "img1.jpg", img2_path = "img2.jpg")

Python code to verify two images using DeepFace's built-in verify function

The aforementioned code returns a dictionary that indicates, among other details, whether the two faces were verified as belonging to the same person.
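Facial analysis follows the same one-line pattern. As a rough sketch (the image path is a placeholder, and the exact return format varies between DeepFace versions), the analyze function can report age, gender, and emotion:

from deepface import DeepFace

# Analyze age, gender, and emotion for the face in the (placeholder) image
analysis = DeepFace.analyze(img_path="img1.jpg", actions=["age", "gender", "emotion"])
print(analysis)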

While OpenCV, BoofCV, and other CV libraries and frameworks can help with face detection tasks, they are not built specifically for it. Face detection involves learning a plethora of complex features and making inferences based on them, and DeepFace is a specialized library built for exactly this purpose. Moreover, because DeepFace is lightweight, complex operations can be carried out with a single line of code.

YOLO

“You Only Look Once” (YOLO) is one of the most widely used object detection tools on the market, which makes it a highly reliable choice. YOLO is designed specifically for object detection in CV.

Object Detection YOLO

Source: Real-time Object Detection with YOLO

Here is a more detailed illustration of how YOLO works:

How YOLO v3 works


The key advantage of YOLO compared to other object detection frameworks and libraries is that it performs inference at very high speed (approximately 45 frames per second). This is because YOLO's architecture performs classification and bounding-box regression in a single pass. Read more about it here.
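There are many YOLO implementations; here is a rough sketch using the popular ultralytics/yolov5 release loaded through torch.hub (the image path is a placeholder, and PyTorch plus the YOLOv5 dependencies must be installed):

import torch

# Load a small pre-trained YOLOv5 model from the ultralytics repository
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# "street.jpg" is a placeholder image path
results = model("street.jpg")

# Print detected classes, confidences, and bounding boxes
results.print()
print(results.pandas().xyxy[0])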

In this article, we have provided an overview of six different tools for CV modeling. These tools help us build simple and complex applications ranging from object detection to deep facial analysis. It is important to test and evaluate each platform depending on the problem statement and the resources available.

To explore Deepchecks’ open-source library, go try it out yourself! Don’t forget to ⭐ their Github repo, it’s really a big deal for open-source-led companies like Deepchecks.
