Combat fraudsters at Ubisoft

A recipe for online merchants.

Duration: 10 min | Date: Mar 13, 2024

For three years, I immersed myself in the gaming world at Ubisoft. The company has developed some of the most iconic video games, including the Assassin's Creed series, Rainbow Six Siege, Far Cry, Rayman, Just Dance and many others, making it the biggest French video game studio and one of the major ones in the world.
From the start, Ubisoft has placed Innovation and Data at the center of its business. They started tracking gaming metrics very early on, making it a rich place for Data practitioners: recommendation systems, chat moderation, in-game cheater detection and bots trained in-game are examples of this diversity.

At Ubisoft, my primary focus was E-commerce Fraud Detection. Ubisoft sells lots of games and in-game products (virtual currency, skins, DLCs) on its website, in the Ubiconnect gaming application and on gaming platforms like Steam. Some of these products are very successful and therefore appealing to fraudsters, who seek to buy them for "free" and resell them to real gamers at a big discount. This is a win-win transaction for them, but a big loss for the merchant. In fact, most Payment Service Providers state that it is the responsibility of the e-commerce merchant to detect and block fraud, and the merchant may be charged if it fails to do so. That's where Machine Learning comes into play.

The Risk of Fraud


Chargeback

First of all, it is worth diving into our definition of fraud. Let's take a transaction made by a shopper in the Ubisoft e-commerce ecosystem. Ubisoft receives the funds, the goods are delivered to the customer/player and he can have fun. For most transactions, that's it. However, in rare cases, several weeks or months later, a cardholder may be surprised by a Ubisoft charge on his bank account that he never intentionally made, and dispute it with his bank. In that case, the bank initiates a chargeback, which is sent to Ubisoft through the card network. If Ubisoft cannot show that the chargeback is an abuse by the cardholder, they have to return the funds to the cardholder, plus additional chargeback fees charged by the Payment Service Provider (15 euros for Paypal, 20 euros for Worldpay). On top of that, if an e-merchant is the victim of too many frauds, the banks will accept fewer payments for this company, as illustrated by Microsoft in the figure below.

acceptance

When Ubisoft becomes aware of a fraud, they may not even be able to withdraw the product from the fraudster's (shopper's) account. Virtual currency packs, for instance, are not revocable and represent 80 to 90% of the in-game transactions, so they are highly targeted by fraud. The malicious actors can then sell these products at a very low price and still make a good margin.

You can imagine how this money can be used, from isolated fraudsters trying to make easy money to organized groups willing to finance criminal organisations. This is an ethical issue that businesses are willing to tackle. But the most important problem for these companies is the exponential growth of the number of frauds once a breach is open. That is what fraudsters are looking for: easy money through easy breaches. And as in every security domain, the best defense for companies is to create as much friction as possible on the fraudster's path, to deter the malicious actor from attacking their business and push him toward other, more vulnerable merchants.

How Machine Learning can help

The friction concept is interesting for Machine Learning Engineers. It means that we do not have to develop a perfect model catching every single fraud. Be sure that if an attacker is motivated enough, he will manage to get past your model or infrastructure.
For business stakeholders, this is sometimes hard to accept, as fraud raises strong feelings. Nobody wants to be a victim of fraud. However, it is key to look at the cost of fraud and put the fear in perspective. Indeed, it may turn out that in some segments of your e-commerce business, fraud chargebacks are not that expensive. That is the case for Ubisoft on Steam in-game transactions, for instance, where Steam is liable for fraud and does not forward chargebacks to game producers. In this case, you should be much more lenient: if you try to block more frauds, you will necessarily block more legit customers.

tp

In these situations, you want to perform a cost analysis to evaluate your model and estimate the Return On Investment of the project. You associate gains and losses with each decision of your model: how much money do you save when you block a fraudster? How much do you lose when you block a legit customer?
In this project, the baseline is the performance of the product previously used to detect fraud. We know that this tool manages to keep the chargeback rate low, but at the cost of a very high False Positive Rate. This is due to complex handwritten hard rules, accumulated over years of fraud management by payment and fraud experts. The rules are not versioned and the knowledge is held almost entirely by a single person, making it a perfect use case for ML success.
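To make the cost analysis concrete, here is a minimal sketch comparing two blocking policies on the same labeled transactions. The cost model is an assumption for illustration: a missed fraud costs the refunded amount plus a chargeback fee, and a blocked legit customer costs the lost sale. The names `CostAssumptions` and `policy_cost`, the amounts and the decisions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostAssumptions:
    """Illustrative unit economics (assumed values, not real figures)."""
    chargeback_fee: float = 20.0  # fee per chargeback, in euros
    lost_sale_rate: float = 1.0   # share of a blocked legit order counted as lost


def policy_cost(decisions, costs):
    """Total cost of a blocking policy over labeled transactions.

    decisions: iterable of (amount, is_fraud, blocked) tuples.
    """
    total = 0.0
    for amount, is_fraud, blocked in decisions:
        if is_fraud and not blocked:
            # Missed fraud: refund the cardholder and pay the chargeback fee.
            total += amount + costs.chargeback_fee
        elif blocked and not is_fraud:
            # Blocked legit customer: the sale is lost.
            total += amount * costs.lost_sale_rate
    return total


costs = CostAssumptions()
# The same four transactions, as decided by two hypothetical policies.
baseline = [(50.0, False, True), (20.0, False, False),
            (100.0, True, False), (10.0, True, True)]
model = [(50.0, False, False), (20.0, False, False),
         (100.0, True, True), (10.0, True, False)]

savings = policy_cost(baseline, costs) - policy_cost(model, costs)
print(f"Estimated savings vs baseline: {savings:.2f} euros")
```

On a segment where the platform is liable for fraud, as with Steam, the missed-fraud cost would be close to zero, which pushes the best policy toward accepting more transactions.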

There are still some challenges to overcome. So the Machine Learning problem we are trying to solve has slightly evolved. We want to replace a rule engine with a machine learning model, taking advantage of a large amount of transactional data, with the objective of controlling the chargeback rate while accepting as many legit customers as possible.

How Do We Tackle It?

A Feature Store

Ubisoft has a lot of gaming and transactional data that Machine Learning Engineers can exploit. For Fraud Detection, Feature Engineering is key. Thus, we spent a lot of time with Ubisoft Fraud experts to determine the characteristics of each fraud attack Ubisoft has undergone in its history. Here are some risk factors we spotted and translated into features:

These features have to be computed in real time, and they require several table joins that take time. Thus, in order to meet the strict SLA, we needed an efficient way of getting these values. That's what a Feature Store can bring. By computing parts of the features in daily batches, we can retrieve feature values much more rapidly. For instance, each day you can append the timestamps of the new transactions made by each customer to that customer's time series, and maintain these arrays of all past transactions in tables. During inference, you just have to apply the feature function to those arrays.
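The idea can be sketched as follows. This is only an illustration, assuming an in-memory dictionary in place of the real tables and a hypothetical feature, the number of transactions a customer made in a recent time window. The names `daily_batch_update` and `tx_count_in_window` are made up for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def daily_batch_update(store, new_transactions):
    """Batch step: append the day's (customer_id, timestamp) pairs to each
    customer's time series and keep every touched series sorted."""
    touched = set()
    for customer_id, ts in new_transactions:
        store[customer_id].append(ts)
        touched.add(customer_id)
    for customer_id in touched:
        store[customer_id].sort()


def tx_count_in_window(store, customer_id, now, window_seconds):
    """Inference step: count stored transactions in [now - window, now).

    Two binary searches on the precomputed array, no table join needed.
    """
    series = store.get(customer_id, [])
    lo = bisect_left(series, now - window_seconds)
    hi = bisect_left(series, now)
    return hi - lo


# Timestamps are plain epoch seconds here, for readability.
store = defaultdict(list)
daily_batch_update(store, [("alice", 1000), ("alice", 4000), ("bob", 2000)])
daily_batch_update(store, [("alice", 3000)])

print(tx_count_in_window(store, "alice", now=4500, window_seconds=2000))
print(tx_count_in_window(store, "carol", now=4500, window_seconds=2000))
```

Because the same function runs on the same stored arrays offline and online, the training and inference values of the feature stay aligned by construction.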

Furthermore, the Feature Store is a more reliable way of computing features for this kind of application: the definitions are centralized, they are tested and the values are monitored. It ensures alignment between offline (training) and online (real-time inference) feature values. Below is a simple schema describing the Feature Store. If you want more detail, you can have a look at this great talk by Jeanine Harb, former Data Engineer in the team.

fsv2

XGBoost for the win


In Fraud Detection, we have highly imbalanced datasets and delays in the identification of fraudulent transactions, which corrupt our labels. For these reasons, we explored semi-supervised and weakly supervised learning, with Deviation Networks for Anomaly Detection for instance. We assumed that all our legit transactions, for which we had not received any fraud flag yet, were actually unlabeled data. Indeed, each of them may still turn out to be a fraud someday if someone complains to his bank. The only sure labels are frauds, because we received a chargeback from the bank. Deviation networks push the scores of known anomalies far from those of normal data, while keeping the scores of normal data points close together. When we use this algorithm, we assume that frauds are anomalies, meaning that some of their features differ from those of legit transactions. Unfortunately, apart from easy fraud patterns, most frauds do not differ much from normal data, due to the adversarial behavior of fraudsters who try to mimic normal transactions. Nevertheless, this exploration was valuable for analyzing past data and digging into "legit" transactions that looked similar to known frauds, allowing us to clean our dataset labels.
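For readers unfamiliar with Deviation Networks, here is a minimal sketch of the deviation loss as I understand it from the original paper, written on raw anomaly scores rather than on a neural network. The margin of 5 and the 5000 reference scores drawn from a standard normal are assumed defaults for illustration, not necessarily the values used in this project. Label 1 means a known fraud, label 0 means unlabeled data treated as normal.

```python
import random
import statistics


def deviation_loss(scores, labels, margin=5.0, n_reference=5000, seed=0):
    """Average deviation loss over a batch of anomaly scores.

    Each score is standardized against reference scores drawn from N(0, 1).
    Unlabeled points are pulled toward the reference mean, while known
    anomalies are pushed at least `margin` standard deviations above it.
    """
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(n_reference)]
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)

    losses = []
    for score, label in zip(scores, labels):
        deviation = (score - mu) / sigma
        if label == 1:
            # Known fraud: penalized until its deviation exceeds the margin.
            losses.append(max(0.0, margin - deviation))
        else:
            # Unlabeled: penalized for any deviation from the reference mean.
            losses.append(abs(deviation))
    return statistics.fmean(losses)


labels = [0, 1]
well_separated = deviation_loss([0.0, 8.0], labels)
inverted = deviation_loss([8.0, 0.0], labels)
print(well_separated < inverted)
```

The loss only becomes small when frauds score far above normal data, which is why the approach struggles once fraudsters make their transactions look normal.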

Despite many attempts to improve performance, on tabular data the secret sauce remains having strong features correlated with fraud and training Gradient Boosted Trees. We added an undersampling step to rebalance our dataset before optimising an XGBoost model, and we left the last few weeks of transactions out of our training set, as their labels were too corrupted. Feature Engineering and our collaboration with fraud experts have always been the most effective way to refine our model, ensuring its alignment with business objectives by catching fraud efficiently while accepting most legit users.
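The two data preparation steps above can be sketched as below. This is an illustration only: the four-week maturity window, the ratio of five legit rows per fraud and the function name `build_training_set` are assumptions, and the model fitting itself is left out.

```python
import random
from datetime import datetime, timedelta


def build_training_set(rows, now, maturity_weeks=4, legit_per_fraud=5, seed=0):
    """Drop recent rows whose labels are not mature, then undersample legits.

    rows: list of dicts with a 'ts' datetime and an 'is_fraud' boolean.
    """
    cutoff = now - timedelta(weeks=maturity_weeks)
    mature = [r for r in rows if r["ts"] < cutoff]

    frauds = [r for r in mature if r["is_fraud"]]
    legits = [r for r in mature if not r["is_fraud"]]

    # Keep every fraud and a random subset of the legit majority class.
    rng = random.Random(seed)
    n_keep = min(len(legits), legit_per_fraud * len(frauds))
    return frauds + rng.sample(legits, n_keep)


now = datetime(2024, 3, 1)
old = now - timedelta(weeks=10)
recent = now - timedelta(weeks=1)
rows = (
    [{"ts": old, "is_fraud": True} for _ in range(2)]
    + [{"ts": old, "is_fraud": False} for _ in range(100)]
    + [{"ts": recent, "is_fraud": False} for _ in range(10)]
)

train = build_training_set(rows, now)
print(len(train), sum(r["is_fraud"] for r in train))
```

The resulting rows would then be turned into a feature matrix and passed to the gradient boosted trees. Undersampling changes the fraud rate seen by the model, so the decision threshold has to be chosen on data with the real class balance.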

Moreover, we used explainability tools such as explainerdashboard. This is very convenient for debugging our model, fully understanding it and explaining its decisions when there is a customer inquiry. On top of that, we added unit tests on key segments of our dataset to protect us against performance regressions when we deploy a new model.
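A segment-level regression check could look like the sketch below. The segment names, the thresholds and the helper names `segment_metrics` and `failing_segments` are hypothetical; the point is only that a candidate model is rejected when recall or False Positive Rate degrades on any key segment.

```python
from collections import defaultdict


def segment_metrics(records):
    """Recall and false positive rate per segment.

    records: iterable of (segment, is_fraud, blocked) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for segment, is_fraud, blocked in records:
        if is_fraud:
            key = "tp" if blocked else "fn"
        else:
            key = "fp" if blocked else "tn"
        counts[segment][key] += 1

    metrics = {}
    for segment, c in counts.items():
        n_fraud = c["tp"] + c["fn"]
        n_legit = c["fp"] + c["tn"]
        metrics[segment] = {
            "recall": c["tp"] / n_fraud if n_fraud else None,
            "fpr": c["fp"] / n_legit if n_legit else None,
        }
    return metrics


def failing_segments(metrics, min_recall, max_fpr):
    """List (segment, metric) pairs that violate the thresholds."""
    failures = []
    for segment, m in metrics.items():
        if m["recall"] is not None and m["recall"] < min_recall:
            failures.append((segment, "recall"))
        if m["fpr"] is not None and m["fpr"] > max_fpr:
            failures.append((segment, "fpr"))
    return failures


records = (
    [("virtual_currency", True, True)] * 3
    + [("virtual_currency", True, False)]
    + [("virtual_currency", False, False)] * 9
    + [("virtual_currency", False, True)]
    + [("full_games", True, True), ("full_games", True, False)]
    + [("full_games", False, False)] * 10
)

metrics = segment_metrics(records)
print(failing_segments(metrics, min_recall=0.7, max_fpr=0.2))
```

In a test suite, the deployment would simply assert that the list of failing segments is empty.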

exp

Explainability is key for this project. So after moving from a very complex rule engine (the previous product) to a Machine Learning model with great Feature Engineering, we finally started to come back to a rule engine, leveraging the pattern mining algorithm FP-growth. We extract the combinations of categorical feature values that occur most often with fraud. We then send these rules to the fraud experts, who can validate them before they go to production. With a clean and tested rule registry and regularly updated rulesets, this approach is great. It is used by Uber in their product Uber RADAR.
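To show what kind of rules come out of this step, here is a small sketch. Note that it is not FP-growth: it enumerates feature combinations by brute force, which yields the same frequent itemsets on small data but does not scale the way FP-growth's tree structure does. The feature names, the thresholds and the function name `mine_fraud_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_fraud_rules(transactions, min_support=2, min_confidence=0.8, max_len=2):
    """Find feature-value combinations that mostly occur with fraud.

    transactions: list of (features, is_fraud), features being a dict of
    categorical values. Returns (itemset, fraud_count, confidence) tuples.
    """
    total_counts = Counter()
    fraud_counts = Counter()
    for features, is_fraud in transactions:
        items = sorted(features.items())
        for size in range(1, max_len + 1):
            for itemset in combinations(items, size):
                total_counts[itemset] += 1
                if is_fraud:
                    fraud_counts[itemset] += 1

    rules = []
    for itemset, n_fraud in fraud_counts.items():
        if n_fraud < min_support:
            continue
        # Confidence: share of transactions matching the itemset that are fraud.
        confidence = n_fraud / total_counts[itemset]
        if confidence >= min_confidence:
            rules.append((itemset, n_fraud, confidence))
    rules.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rules


transactions = [
    ({"country": "AA", "payment": "card", "product": "currency"}, True),
    ({"country": "AA", "payment": "card", "product": "currency"}, True),
    ({"country": "AA", "payment": "wallet", "product": "game"}, False),
    ({"country": "BB", "payment": "card", "product": "currency"}, False),
    ({"country": "BB", "payment": "card", "product": "game"}, False),
]

rules = mine_fraud_rules(transactions)
for itemset, n_fraud, confidence in rules:
    print(dict(itemset), n_fraud, confidence)
```

Each mined rule is a readable conjunction of feature values, which is what makes it easy for a fraud expert to review before it reaches production.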
To be complete on this Rule Engine vs Machine Learning subject, I would also like to share Jeremy Jordan's wonderful talk, one of the best I have seen, as it sums up all the aspects of applying ML in Security and makes the case that a Rule Engine should be the basis of these systems.

Model evaluation and Business KPIs

When a transaction is blocked, we do not get any label, as the payment ends there. It means that we only have labels for the transactions that our model accepted, which will turn out to be either fraud or legit. In statistical terms, these are True Negatives (legit transactions that were accepted by the model) and False Negatives (frauds that were accepted by the model). To compute classification metrics, we also need labels for some positive instances, that is, transactions the model wanted to block.
Thus, we implemented a Control Group: for a subsample of all the transactions in a day, we bypass the model decision and just log the model score, so that the payments of this subset are completed and we can still analyze our model's decisions. This allowed us to rigorously assess the impact of our fraud detection system and refine our strategies.
The concept of Control Group is extensively presented by Stripe in a PyData talk.
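A Control Group can be sketched as below, assuming a deterministic hash-based assignment so that the same transaction always falls in the same group. The 1% rate, the salt, the score threshold and the function names are all assumptions for illustration.

```python
import hashlib


def in_control_group(transaction_id, rate=0.01, salt="control-v1"):
    """Deterministically assign a share `rate` of transactions to the group."""
    digest = hashlib.sha256(f"{salt}:{transaction_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


def decide(transaction_id, score, threshold=0.5, control_rate=0.01):
    """Log what the model wanted, but never block control transactions."""
    would_block = score >= threshold
    control = in_control_group(transaction_id, control_rate)
    return {
        "would_block": would_block,
        "control": control,
        "blocked": would_block and not control,
    }


def control_group_metrics(logged):
    """Precision and recall from control-group logs with mature labels.

    logged: iterable of (would_block, is_fraud) pairs.
    """
    logged = list(logged)
    tp = sum(1 for w, f in logged if w and f)
    fp = sum(1 for w, f in logged if w and not f)
    fn = sum(1 for w, f in logged if not w and f)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


logged = [(True, True), (True, False), (False, False), (False, True), (True, True)]
precision, recall = control_group_metrics(logged)
print(precision, recall)
```

Since every control transaction is completed, its label eventually arrives whether or not the model wanted to block it, which is what makes True Positives and False Positives measurable. The price is the fraud knowingly let through, so the rate is a trade-off between measurement quality and cost.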

In the end, the success of our project is reflected in key metrics such as a 5% gain in net sales and the valuable time saved for fraud experts. Thanks to this project, we also spotted human errors in the rule-based system that were increasing false positives, and we decentralized knowledge so that the tool can be owned by more people. Lastly, we replaced the third-party tool on 80% of all PC transactions, a total of 80M euros per year.

A Live product to support

Now that the platform is ready and the model is deployed, the Fraud Detection product is live and the team has to maintain it. First, this means monitoring it, using Grafana for real-time monitoring and Tableau for dashboarding, as most people in the team and business stakeholders were familiar with it. Alerts need to be set properly and thresholds fine-tuned so that we can react as soon as possible when there is an incident (fraud attack, platform down...).

Then, it means having the right set of tools to retrain and deploy Machine Learning models when a new fraud pattern emerges. For this, we need:

mle_stack

Last words

This is just another story of Machine Learning models in production, showing that modeling is only the tip of the iceberg in production use cases. It also reminds us that Decision Trees still rock in business and that mastering classical ML algorithms is a must-have skill.
On top of modeling, there is so much to discover in MLOps, DevOps, Data Engineering and Software Engineering, making the Machine Learning Engineer role a wonderful place for curious and creative people.
I thank Ubisoft again for the opportunity to work on this project with such a great team!