According to The Financial Cost of Fraud Report, fraud costs businesses and individuals around the world more than US$5 trillion every year. Although fraudulent transactions represent only a small percentage of any organization’s revenue, they can still run into the millions of dollars and carry regulatory implications. Over the last decade, more and more organizations have moved from manually written, rule-based systems to AI fraud detection systems that use machine learning.
Fraud detection can be extremely complex, since it requires thousands of computations to be performed in milliseconds. The model’s ability to distinguish between normal and abnormal behavior must be improved continuously, using both supervised and unsupervised ML techniques. The problem is typically approached with anomaly detection techniques (e.g., Isolation Forest combined with business logic), classification algorithms (e.g., XGBoost after artificially balancing the classes), or a combination of the two.
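To make "artificially balancing the classes" concrete, here is a minimal sketch of one common approach, random oversampling of the minority class. It uses only the Python standard library; the function name `oversample_minority` and the toy transaction records are illustrative assumptions, not part of any particular fraud detection product.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to fill the gap to the largest class.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data: 98 legitimate transactions (label 0) and 2 fraudulent ones (label 1).
rows = [{"amount": 20.0 + i} for i in range(98)]
rows += [{"amount": 9500.0}, {"amount": 12000.0}]
labels = [0] * 98 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
balanced_counts = Counter(balanced_labels)
```

Oversampling of this kind is normally applied to the training split only. Duplicating rows before the train/test split would place copies of the same fraud case on both sides and inflate evaluation metrics.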
In many cases, the ML pipeline depends on a range of data sources with varying formats, some of which are owned by third parties and may be changed without notice. Dirty data may go undetected because preprocessing methods silently fix many of the problems, such as inconsistencies, noise, or missing values. Moreover, new types of fraud periodically emerge that weren’t represented in the training data when the model was last trained. These issues are hard to detect in any ML system, but the challenge intensifies when the class imbalance is extreme, when false-negative predictions go undetected, or when the pipeline includes an unsupervised learning component.
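The effect of extreme class imbalance on evaluation can be illustrated with a small worked example. The sketch below assumes a hypothetical dataset of 10,000 transactions with a 0.1% fraud rate and a degenerate model that never flags anything; the numbers are invented for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    # Share of actual fraud cases that the model caught.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 10 fraud cases among 10,000 transactions.
y_true = [1] * 10 + [0] * 9990
# A degenerate model that labels every transaction as legitimate.
y_pred = [0] * 10000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
rec = recall(tp, fn)
```

Here accuracy is very high even though every fraud case is missed, which is why recall on the fraud class (and the count of false negatives) needs to be tracked separately from overall accuracy.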
How Deepchecks Can Help
Whether your fraud detection system is based on supervised, unsupervised, or semi-supervised machine learning, Deepchecks can be used to ensure the end-to-end quality of your ML pipeline and to help you deal with new patterns of fraudulent behavior. Deepchecks connects to the different components of your training and production pipelines (raw features, processed features, predictions, labels) and learns their behavior over time. It then enables you to:
- Increase pipeline transparency, including characteristics and metrics related to raw and third-party data sources
- Detect concept drift, out-of-distribution samples, anomalies, and data integrity issues in real time
- Reduce false positives based on the combined analysis of both training and production datasets
- Define a rule-based model to determine whether input and output data are legitimate, with the system triggering alerts and displaying insights
- Segment your client base for tailored monitoring using preemptive analysis and re-training reports
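As a rough illustration of what comparing training and production data for drift can look like, the sketch below computes the Population Stability Index (PSI) for a single numeric feature. This is a generic, standard-library example and not the Deepchecks API; the function name `psi`, the bin count, and the toy data are assumptions made for the example.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample (expected)
    and a new sample (actual), using equal-width bins fitted on the
    reference sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy feature: transaction amounts seen at training time.
train_amounts = [float(i % 100) for i in range(1000)]
# Hypothetical production data where every amount is shifted upward by 40.
prod_amounts = [v + 40.0 for v in train_amounts]

psi_same = psi(train_amounts, train_amounts)
psi_shifted = psi(train_amounts, prod_amounts)
```

A PSI of zero means the two binned distributions are identical, and larger values indicate a larger shift. A commonly cited rule of thumb treats values above roughly 0.25 as a significant change, though a suitable alert threshold depends on the feature and the use case.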
If you work with transactional data and face issues such as class imbalance, undetected false negatives, or unsupervised components in your system, Deepchecks is the best choice for ensuring your machine learning system’s quality.