Michaël Romagné | Machine Learning Engineer

E-commerce is a wild world where merchants want to grow their business by selling their products online, but have to deal with malicious actors who seek to buy those products for "free" and resell them to real customers at a big discount. This is a win-win transaction for the fraudster and the buyer, but a big loss for the merchant. In fact, most Payment Service Providers state that it is the responsibility of the e-commerce merchant to detect and block fraud, and the merchant may be charged if it fails to do so. That's where Machine Learning comes into play.

The Risk of Fraud for merchants

First of all, it is worth diving into the definition of fraud. Consider a transaction made by a shopper on a website. The merchant receives the funds, the goods are delivered to the customer and life goes on. For most transactions, that's it. However, in rare cases, several weeks or months later, a cardholder may be surprised by a charge on their bank account that they never intentionally made, and dispute it with their bank. In that case, the bank initiates a chargeback, which is sent to the merchant through the card network. If the merchant cannot show that the chargeback is an abuse by the cardholder, the merchant has to return the funds to the cardholder and pay additional chargeback fees to the Payment Service Provider (a few dozen euros per transaction). On top of that, if an e-merchant is the victim of too many frauds, the banks will accept fewer payments for this company, as illustrated by Microsoft in the figure below.

To make things even worse, once the e-merchant becomes aware of a fraud, it may not be able to withdraw the product from the fraudster's (shopper's) account. This is particularly true for physical products, but also for some virtual products.

You can imagine how this money can then be used, from isolated fraudsters trying to make easy money to organized groups financing criminal organisations. This is an ethical issue that businesses are willing to tackle. But the most important problem for these companies is the exponential growth of the number of frauds once a breach is open. That is what fraudsters are looking for: easy money through easy breaches. And as in every security domain, the best defense for companies is to create as much friction as possible on the fraudster's path, to deter malicious actors from attacking their business and push them toward other, more vulnerable merchants.

How Machine Learning can help

The friction concept is interesting for Machine Learning Engineers. It means that we do not have to develop a perfect model catching every single fraud. Rest assured that if an attacker is motivated enough, they will manage to get through your model or infrastructure.
For business stakeholders, this is sometimes hard to accept, as fraud raises strong feelings. Nobody wants to be a victim of fraud. However, it is key to look at the cost of fraud and put fear in perspective. Indeed, it may turn out that in some segments of your e-commerce business, fraud chargebacks are not that expensive, depending on your Payment Service Provider agreement. In this case, you should be much more lenient: if you try to block more frauds, you will necessarily block more legit customers.

In these situations, you want to perform a cost analysis to evaluate your model's impact and estimate the Return On Investment of the project. You associate gains and losses with each decision of your model: how much money do you save when you block a fraudster? How much do you lose when you block a legit customer? You may also be replacing an existing fraud detection tool, usually a rule engine that has been in your company for years. You can start by computing metrics like the fraud rate and the block rate of this system. Then, you may notice that it is composed of complex handwritten hard rules, accumulated over years of fraud management by payment and fraud experts. The rules are not versioned and the knowledge is held by only a few experts, making them hard to interpret and maintain. This is a good use case for ML.
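The cost analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chargeback fee, the margin rate and the way a loss is counted are hypothetical values to replace with your own, and the fraud labels of blocked transactions are assumed to be known (for example from a historical replay).

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    amount: float    # transaction amount in euros
    is_fraud: bool   # label known after the chargeback delay
    blocked: bool    # decision taken by the system under evaluation

# Hypothetical business parameters, to be replaced with your own figures.
CHARGEBACK_FEE = 20.0   # assumed fee charged per chargeback
MARGIN_RATE = 0.25      # assumed share of the amount that is margin

def decision_value(o: Outcome) -> float:
    """Gain (positive) or loss (negative) in euros for one decision."""
    if o.blocked:
        # Blocking a fraud costs nothing; blocking a legit customer loses the margin.
        return 0.0 if o.is_fraud else -o.amount * MARGIN_RATE
    if o.is_fraud:
        # Accepted fraud: the funds are returned and the fee is paid (simplified).
        return -(o.amount + CHARGEBACK_FEE)
    # Accepted legit transaction: the merchant earns its margin.
    return o.amount * MARGIN_RATE

def summarize(outcomes):
    """Block rate, fraud rate among accepted transactions, and net value."""
    accepted = [o for o in outcomes if not o.blocked]
    return {
        "block_rate": sum(o.blocked for o in outcomes) / len(outcomes),
        "fraud_rate": sum(o.is_fraud for o in accepted) / len(accepted),
        "net_value": sum(decision_value(o) for o in outcomes),
    }

outcomes = [
    Outcome(100.0, is_fraud=False, blocked=False),
    Outcome(50.0, is_fraud=False, blocked=True),
    Outcome(200.0, is_fraud=True, blocked=False),
    Outcome(80.0, is_fraud=True, blocked=True),
]
summary = summarize(outcomes)
```

Running the same summary on the decisions of the existing rule engine and on those of a candidate model gives a first, rough comparison of the two systems in euros rather than in abstract metrics.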

There are still some challenges to overcome:

The e-commerce transactions must be analyzed in real time, with very low latency (you have to compute multiple features and run inference).
The context is adversarial: fraudsters will continuously adapt to beat your model.
As described in the chargeback flow, the fraud labels come from banks with a long delay (from several weeks to a few months), so the labels of recent transactions are corrupted.
If a transaction is blocked in real time, you will never get any label, because the payment is not completed.
Ideally, there are very few frauds among all your transactions, leading to highly skewed datasets.

So the Machine Learning problem we are trying to solve has slightly evolved. We want to replace a rule engine with a machine learning model, taking advantage of a lot of transactional data, with the objective of controlling the chargeback rate while accepting as many legit customers as possible.

How can you tackle this problem?

A Feature Store

Merchants can take advantage of their transactional data thanks to Machine Learning. For Fraud Detection, Feature Engineering is key. Thus, ML teams must spend time with fraud experts to determine the characteristics of each fraud attack that the merchant has undergone in its history. Here are some risk factors that can be translated into features:

The customer has already committed fraud in the past.
A high number of transactions in a short amount of time.
A high amount spent with an account created very recently.
Surprising geographical information, not aligned with the customer history.
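As an illustration, the risk factors listed above could be turned into features along these lines. This is a minimal sketch using only the standard library: the dictionary keys ("ts", "amount", "country", "was_fraud"), the one-hour window and the function name are hypothetical choices, not a prescribed schema.

```python
from datetime import datetime, timedelta

def risk_features(tx, history, account_created_at, window=timedelta(hours=1)):
    """Compute a few risk features for `tx` from the customer's past transactions.

    `tx` and each item of `history` are dicts with hypothetical keys:
    "ts" (datetime), "amount" (float), "country" (str), "was_fraud" (bool).
    """
    past = [h for h in history if h["ts"] < tx["ts"]]
    recent = [h for h in past if tx["ts"] - h["ts"] <= window]
    known_countries = {h["country"] for h in past}
    return {
        # The customer already committed fraud in the past.
        "has_past_fraud": any(h["was_fraud"] for h in past),
        # Number of transactions in a short amount of time.
        "tx_count_last_window": len(recent),
        # Amount spent, to be read together with the age of the account.
        "amount": tx["amount"],
        "account_age_days": (tx["ts"] - account_created_at).days,
        # Country never seen in the customer history.
        "new_country": bool(past) and tx["country"] not in known_countries,
    }

now = datetime(2024, 1, 10, 12, 0)
history = [
    {"ts": now - timedelta(minutes=5), "amount": 20.0, "country": "FR", "was_fraud": False},
    {"ts": now - timedelta(minutes=30), "amount": 35.0, "country": "FR", "was_fraud": False},
    {"ts": now - timedelta(days=3), "amount": 15.0, "country": "FR", "was_fraud": False},
]
tx = {"ts": now, "amount": 900.0, "country": "BR"}
features = risk_features(tx, history, account_created_at=now - timedelta(days=4))
```

Here the transaction combines a high amount, a four-day-old account, two other purchases within the hour and a country that never appeared before, which is the kind of signal a model can pick up.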

These features have to be computed in real time and require several table joins that take time. Thus, to meet the low-latency requirement, you need an efficient way of getting these values. That's what a Feature Store can bring. By computing parts of the features prior to the transaction (in daily batches for features that allow it), you can retrieve feature values much faster.

Furthermore, the Feature Store is a more reliable way of computing features for this kind of application: the definitions are centralized, they are tested and the values are monitored. It ensures an alignment between offline (training) and online (real-time inference) feature values. Below is a simple schema describing a type of Feature Store. If you want more detail, you can have a look at this great talk by Jeanine Harb, Data Engineer.
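To make the precompute-then-lookup idea concrete, here is a toy in-memory stand-in for a Feature Store. It is only a sketch: the class, its method names and the feature names are hypothetical, and a real Feature Store adds storage, testing and monitoring on top of this pattern.

```python
class InMemoryFeatureStore:
    """Toy stand-in for a feature store: a batch path and an online path."""

    def __init__(self):
        self._batch = {}  # customer_id -> features precomputed by the daily job

    def run_daily_batch(self, transactions):
        """Offline path: aggregate the history once, ahead of any payment."""
        self._batch.clear()
        for t in transactions:
            row = self._batch.setdefault(
                t["customer_id"], {"tx_count_total": 0, "amount_total": 0.0}
            )
            row["tx_count_total"] += 1
            row["amount_total"] += t["amount"]

    def get_online_features(self, customer_id, amount):
        """Online path: a key lookup plus cheap request-time features."""
        base = self._batch.get(
            customer_id, {"tx_count_total": 0, "amount_total": 0.0}
        )
        count = base["tx_count_total"]
        avg = base["amount_total"] / count if count else 0.0
        return {
            **base,
            "amount": amount,
            # Ratio to the customer's average basket, None for unknown customers.
            "amount_vs_avg": amount / avg if avg else None,
        }

store = InMemoryFeatureStore()
store.run_daily_batch([
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
])
known = store.get_online_features("c1", amount=200.0)
unknown = store.get_online_features("c2", amount=200.0)
```

The expensive aggregation runs once in the batch job, while the payment path only does a dictionary lookup and a division, and both training and inference can read features through the same definitions.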

XGBoost for the win

In Fraud Detection, datasets are highly imbalanced, and delays in the identification of fraudulent transactions corrupt the labels. For these reasons, semi-supervised and weakly supervised learning can be explored, with Deviation Networks for anomaly detection for instance. In these paradigms, all "legit" transactions in your dataset, for which you have not received any fraud flag yet, are treated as unlabeled data. Indeed, any of them may turn out to be a fraud someday if a cardholder complains to their bank. The only sure labels are frauds, because you received a chargeback from the bank. Deviation Networks learn to push the scores of known anomalies far from those of the normal data distribution, so that data points similar to these anomalies stand out as well. The algorithm thus assumes that frauds are anomalies, meaning that some of their feature values differ from those of legit transactions. Unfortunately, apart from easy patterns, most frauds do not differ much from normal data because of the adversarial behavior of fraudsters. You have to continuously work on Feature Engineering to counter them. Nevertheless, this exploration is valuable to analyze past data and dig into "legit" transactions that look similar to known frauds, allowing you to clean your dataset labels.

On tabular data, the secret sauce remains having strong features correlated with fraud patterns and training Gradient Boosted Trees. Add undersampling to rebalance your dataset before optimising an XGBoost model, and leave the last few weeks of transactions out of your training set, as their labels are too corrupted. Feature Engineering and collaboration with fraud experts are always the most effective strategy to refine your models, ensuring alignment with business objectives by catching fraud efficiently and accepting most legit users.
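The two data preparation steps mentioned above, leaving out recent transactions and undersampling legit ones, might look like this. It is a sketch with assumed values: the eight-week maturity period, the ratio of five legit rows per fraud and the row keys are hypothetical and should be tuned to your own chargeback delays.

```python
import random
from datetime import date, timedelta

def build_training_set(rows, today, maturity=timedelta(weeks=8),
                       legit_per_fraud=5, seed=0):
    """Drop rows whose labels are still immature, then undersample legit rows.

    `rows` are dicts with hypothetical keys "date" (datetime.date)
    and "is_fraud" (bool).
    """
    # Recent "legit" rows may still become chargebacks, so they are excluded.
    mature = [r for r in rows if r["date"] <= today - maturity]
    frauds = [r for r in mature if r["is_fraud"]]
    legits = [r for r in mature if not r["is_fraud"]]
    # Keep every fraud and a random subset of the legit transactions.
    rng = random.Random(seed)
    keep = min(len(legits), legit_per_fraud * len(frauds))
    sample = frauds + rng.sample(legits, keep)
    rng.shuffle(sample)
    return sample

today = date(2024, 6, 1)
rows = (
    [{"date": date(2024, 1, 1), "is_fraud": i < 4} for i in range(104)]
    + [{"date": date(2024, 5, 25), "is_fraud": True} for _ in range(10)]
)
train = build_training_set(rows, today)
```

The resulting rows can then be converted to a feature matrix and passed to the gradient boosting library of your choice. Keep in mind that undersampling changes the fraud proportion seen by the model, so decision thresholds should be chosen on data with the real class balance.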

Moreover, there are awesome explainability tools such as explainerdashboard. They are very convenient to debug a tree-based model, fully understand it and explain the model's decisions when there is a customer inquiry. On top of that, you can define unit tests on key segments of your dataset to protect against performance regressions when a new model is deployed.
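A performance unit test on key segments, as mentioned above, can be as simple as checking a recall floor per segment. The sketch below assumes hypothetical segment names, row keys and thresholds, and uses a trivial amount rule in place of a real model.

```python
def recall(y_true, y_pred):
    """Share of actual frauds that the model flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t]
    return sum(flagged) / len(flagged) if flagged else 1.0

def check_segments(rows, predict, min_recall):
    """Return the segments where the candidate model is below its recall floor.

    `rows` are dicts with hypothetical keys "segment", "features", "is_fraud";
    `predict` maps features to True (block) or False (accept).
    """
    failures = {}
    for segment, floor in min_recall.items():
        subset = [r for r in rows if r["segment"] == segment]
        score = recall(
            [r["is_fraud"] for r in subset],
            [predict(r["features"]) for r in subset],
        )
        if score < floor:
            failures[segment] = score
    return failures

rows = [
    {"segment": "gift_cards", "features": {"amount": 500}, "is_fraud": True},
    {"segment": "gift_cards", "features": {"amount": 40}, "is_fraud": True},
    {"segment": "gift_cards", "features": {"amount": 30}, "is_fraud": False},
    {"segment": "subscriptions", "features": {"amount": 300}, "is_fraud": True},
    {"segment": "subscriptions", "features": {"amount": 10}, "is_fraud": False},
]

def candidate_model(features):
    # Placeholder for a real model: block anything above 100 euros.
    return features["amount"] > 100

failures = check_segments(
    rows, candidate_model, {"gift_cards": 0.8, "subscriptions": 0.8}
)
```

In a CI job, the release would be stopped whenever `failures` is not empty. Here the placeholder model misses a small gift card fraud, so that segment is reported.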

Explainability is key for Fraud Detection, so you may even prefer to tune a rule engine, leveraging the pattern mining algorithm FP-growth for instance. This algorithm finds the combinations of categorical feature values that occur most often with fraud. You can then send these rules to the fraud experts, who can validate them before they go to production. With a clean and tested rule registry and regularly updated rulesets, this approach is great. It is used by Uber in their product Uber RADAR.
To be complete on this Rule Engine vs Machine Learning subject, I would also like to share Jeremy Jordan's talk, one of the best I have seen as he sums up all the aspects of applying ML in Security, showcasing that the Rule Engine should be the basis of these systems.
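To give an idea of the kind of candidate rules that pattern mining produces, here is a brute-force count of feature-value combinations among frauds. It is deliberately not FP-growth: it enumerates every combination, which only works on small data, whereas FP-growth finds the same frequent itemsets efficiently. The item format and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_fraud_patterns(transactions, min_support=2, max_size=2):
    """Count feature-value combinations among frauds and report their fraud rate.

    `transactions` are (items, is_fraud) pairs where `items` is a set of
    "feature=value" strings. Brute-force enumeration, not FP-growth.
    """
    fraud_counts, all_counts = Counter(), Counter()
    for items, is_fraud in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                all_counts[combo] += 1
                if is_fraud:
                    fraud_counts[combo] += 1
    rules = [
        (combo, n, n / all_counts[combo])  # (pattern, fraud count, fraud rate)
        for combo, n in fraud_counts.items()
        if n >= min_support
    ]
    # Highest fraud rate first, then the most frequent patterns.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))

transactions = [
    ({"country=BR", "card=prepaid"}, True),
    ({"country=BR", "card=prepaid"}, True),
    ({"country=BR", "card=credit"}, False),
    ({"country=FR", "card=prepaid"}, False),
    ({"country=FR", "card=credit"}, False),
]
rules = frequent_fraud_patterns(transactions)
```

On this toy data, the combination of a prepaid card and the country BR is always fraudulent, while each value alone is not. That is the type of candidate rule a fraud expert would review before it goes to production.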

Model evaluation and Business KPIs

When a transaction is blocked, the payment flow ends, so you get no label for it. It means that you only have labels for transactions that your model accepted, which turn out to be either fraud or legit. In classification terms, these are True Negatives (legit transactions accepted by the model) and False Negatives (frauds accepted by the model). To compute classification metrics, you also need labels for positive predictions, that is, transactions that the model would have blocked.
A solution to this challenge is the Control Group: for a subsample of all the transactions in a day, you bypass the model decision and just log the model score, so that the payments of this subset are completed and you can analyze your model's decisions once their labels arrive. Of course, you have to be careful about how you select the customers or transactions for the Control Group, to prevent fraudsters from taking advantage of it. This allows you to rigorously assess the impact of your fraud detection system and refine your strategies.
The concept of Control Group is extensively presented by Stripe in a PyData talk.
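One possible way to implement a Control Group is sketched below. It is an assumption-laden illustration, not the method presented in the talk: the secret key, the 1% rate and the 0.9 threshold are hypothetical, and the keyed hash is just one option to make the assignment stable without being guessable from the transaction id.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # hypothetical secret, kept server-side
CONTROL_RATE = 0.01        # assumed share of traffic that bypasses the model

def in_control_group(transaction_id: str) -> bool:
    """Keyed hash: stable for a given id, not predictable without the key."""
    digest = hmac.new(SECRET_KEY, transaction_id.encode(), hashlib.sha256).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < CONTROL_RATE

def decide(transaction_id: str, score: float, threshold: float = 0.9):
    """Return (accepted, would_block); control transactions are always accepted."""
    would_block = score >= threshold
    accepted = in_control_group(transaction_id) or not would_block
    return accepted, would_block

def precision_recall(control_rows):
    """`control_rows` are (would_block, is_fraud) pairs with mature labels."""
    tp = sum(1 for b, f in control_rows if b and f)
    fp = sum(1 for b, f in control_rows if b and not f)
    fn = sum(1 for b, f in control_rows if not b and f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

metrics = precision_recall(
    [(True, True), (True, False), (False, True), (False, False)]
)
```

Because every Control Group payment is completed, the logged `would_block` flag can be compared with the label that arrives later, which gives the True Positives and False Positives that are otherwise missing.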

In the end, the success of Fraud Detection projects is reflected in key metrics such as a gain in net sales due to fewer legit transactions being blocked, and the valuable time saved for fraud experts. You also decentralize knowledge so that the in-house tool can be owned by more people.

A Live product to support

Now that your Fraud Detection model is deployed (and the platform is built), the Fraud Detection product is live and the team has to maintain it. First, this means monitoring it, using tools like Grafana for real-time monitoring and Tableau or Streamlit for dashboards. Alerts need to be set properly and thresholds fine-tuned so that you can react as soon as possible when there is an incident (fraud attack, platform down...).
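As a simple illustration of such an alert, the sketch below watches the block rate over a sliding window of decisions and reports when it leaves an expected band. The class name, window size and bounds are hypothetical, and in practice this logic would usually live in your monitoring tool rather than in application code.

```python
from collections import deque

class BlockRateAlert:
    """Flag when the block rate over the last `window` decisions leaves a band."""

    def __init__(self, window=1000, low=0.005, high=0.05):
        self.decisions = deque(maxlen=window)
        self.low, self.high = low, high

    def record(self, blocked: bool):
        """Store one decision and return an alert message, or None."""
        self.decisions.append(blocked)
        if len(self.decisions) < self.decisions.maxlen:
            return None  # not enough data yet
        rate = sum(self.decisions) / len(self.decisions)
        if rate > self.high:
            return f"block rate {rate:.1%} above {self.high:.1%}"
        if rate < self.low:
            return f"block rate {rate:.1%} below {self.low:.1%}"
        return None

alert = BlockRateAlert(window=4, low=0.2, high=0.6)
warmup = [alert.record(b) for b in [True, False, False, False]]
steady = [alert.record(True), alert.record(True)]
spike = alert.record(True)
```

A block rate that is too high may point to a fraud attack or a model issue, while a block rate that is too low may mean that the model is bypassed or down, so both bounds are worth alerting on.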

Then, it means having the right set of tools to retrain and deploy Machine Learning models when there is an emerging fraud pattern. For this, you need:

A tool to launch remote jobs on powerful machines. You can use SkyPilot or Okteto (Kubernetes only), and DVC to run reproducible pipelines with data versioning.
An experiment tracking tool: ClearML is great for this and much more. MLflow is also a good option.
A robust CI to deploy your models: model performance unit tests to avoid regressions and GitLab CI for the release jobs.

Last words

This is just another story of Machine Learning models in production, showing that modeling is only the tip of the iceberg in production use cases. It also reminds us that Decision Trees still rock in business and that mastering classical ML algorithms is a must-have skill.
On top of modeling, there is so much to discover in MLOps, DevOps, Data Engineering, Software Engineering, making the Machine Learning Engineer role a wonderful place for curious and creative people.

ML for Fraud Detection