A recipe for online merchants.
Duration: 10 min | Date: Mar 13, 2024
For three years, I immersed myself in the Gaming World at Ubisoft. The company has developed some of the most iconic video games, including the Assassin's Creed series, Rainbow Six Siege, Far Cry, Rayman, Just Dance and many others, making it the biggest French video game studio and one of the major ones in the world.
From the start, Ubisoft has placed Innovation and Data at the center of its business. It started tracking gaming metrics very early on, which makes it a rich place for Data practitioners: recommendation systems, chat moderation, in-game cheater detection and in-game trained bots are examples of this diversity.
At Ubisoft, my primary focus was E-commerce Fraud Detection. Ubisoft sells many games and in-game products (virtual currency, skins, DLCs) on its website, in the Ubiconnect gaming application and on Gaming platforms like Steam.
Some of these products are very successful and attract fraudsters, who try to buy them for "free" and resell them to real gamers at a big discount. This is a win-win transaction for them, but a big loss for the merchant.
In fact, most Payment Service Providers state that it is the responsibility of the e-commerce merchant to detect and block Fraud, and the merchant may be charged if it does not do so. That's where Machine Learning comes into play.
First of all, it is worth diving into our definition of Fraud. Consider a transaction made by a shopper in the Ubisoft e-commerce ecosystem. Ubisoft receives the funds, the goods are delivered to the customer/player and they can have fun. For most transactions, that's it. However, in rare cases, several weeks or months later, a cardholder may be surprised by a Ubisoft charge on their bank account that they never intentionally made, and dispute it with their bank. In that case, the bank initiates a chargeback, which is sent to Ubisoft through the card network. If Ubisoft cannot show that the chargeback is an abuse by the cardholder, it has to return the funds to the cardholder and pay an additional chargeback fee charged by the Payment Service Provider (15 euros for PayPal, 20 euros for Worldpay). On top of that, if an e-merchant is the victim of too much fraud, banks will accept fewer of its payments, as illustrated by Microsoft in the figure below.
You can imagine how this money can be used, from isolated fraudsters trying to make easy money to organized groups financing criminal organisations. This is an ethical issue that businesses are willing to tackle. But the most important problem for these companies is the exponential growth in the number of frauds once a breach is open. That is what fraudsters are looking for: easy money through easy breaches. And as in every security domain, the best defense for companies is to create as much friction as possible on the fraudster's path, to deter malicious actors from attacking their business and push them toward other, more vulnerable merchants.
The friction concept is interesting for Machine Learning Engineers. It means that we do not have to develop a perfect model that catches every single fraud. You can be sure that if an attacker is motivated enough, they will manage to get past your model or infrastructure.
For business stakeholders, this is sometimes hard to accept, as fraud raises strong feelings. Nobody wants to be a victim of fraud.
However, it is key to look at the cost of fraud and put the fear in perspective.
Indeed, it may turn out that in some segments of your e-commerce business, fraud chargebacks are not that expensive. That is the case for Ubisoft on Steam in-game transactions for instance, in which Steam is liable for fraud and does not forward chargebacks to game producers. In this case, you should be much more lenient: if you try to block more fraud, you will necessarily block more legit customers.
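To make this trade-off concrete, here is a minimal sketch of choosing a blocking threshold by cost. Everything in it is an assumption made for illustration: the `Transaction` fields, the `total_cost` and `best_threshold` helpers, the made-up history, and the simplification that a blocked legit sale is fully lost while an accepted fraud costs the refund plus the chargeback fee. It is not how Ubisoft computed its thresholds.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    score: float    # model fraud score in [0, 1]
    amount: float   # basket value in euros
    is_fraud: bool  # label, known once the chargeback delay has passed


def total_cost(transactions, threshold, chargeback_fee, merchant_liable=True):
    """Cost in euros of blocking every transaction with score >= threshold."""
    cost = 0.0
    for t in transactions:
        blocked = t.score >= threshold
        if blocked and not t.is_fraud:
            cost += t.amount  # assumption: a blocked legit sale is fully lost
        elif not blocked and t.is_fraud and merchant_liable:
            cost += t.amount + chargeback_fee  # refund plus chargeback fee
    return cost


def best_threshold(transactions, candidates, chargeback_fee, merchant_liable=True):
    """Cheapest candidate threshold; ties go to the more lenient (higher) one."""
    return min(
        candidates,
        key=lambda th: (total_cost(transactions, th, chargeback_fee, merchant_liable), -th),
    )


# Made-up history, for illustration only.
history = [
    Transaction(0.05, 60.0, False),
    Transaction(0.20, 20.0, False),
    Transaction(0.35, 10.0, False),
    Transaction(0.55, 50.0, False),
    Transaction(0.60, 40.0, True),
    Transaction(0.90, 100.0, True),
]
candidates = [0.3, 0.5, 0.7, 1.01]  # 1.01 means "block nothing"

liable = best_threshold(history, candidates, chargeback_fee=20.0)
not_liable = best_threshold(history, candidates, chargeback_fee=20.0, merchant_liable=False)
print(liable, not_liable)
```

On this toy history, the cheapest threshold is stricter when the merchant pays for chargebacks, and the cheapest option becomes "block nothing" when the platform is liable, which is the lenient behaviour described above.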
Ubisoft has a lot of gaming and transactional data that Machine Learning Engineers can exploit. For Fraud Detection, Feature Engineering is key. Thus, we spent a lot of time with Ubisoft Fraud experts to determine the characteristics of each fraud attack Ubisoft has undergone in its history. Here are some risk factors we spotted and translated into features:
Furthermore, the Feature Store is a more reliable way of computing features for this kind of application: the definitions are centralized and tested, and the values are monitored. It ensures alignment between offline (training) and online (real-time inference) feature values. Below is a simple schema describing the Feature Store. If you want more detail, you can have a look at this great talk by Jeanine Harb, former Data Engineer in the team.
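The offline/online alignment can be illustrated with a minimal sketch in which a single feature definition serves both paths. This is not the actual Feature Store: the `purchases_last_24h` feature, the `FEATURES` registry and the two helper functions are hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta


def purchases_last_24h(purchase_times, as_of):
    """Hypothetical feature: number of purchases in the 24 hours before `as_of`."""
    window_start = as_of - timedelta(hours=24)
    return sum(1 for t in purchase_times if window_start <= t < as_of)


# Single, central registry of feature definitions (hypothetical).
FEATURES = {"purchases_last_24h": purchases_last_24h}


def offline_features(history, rows):
    """Training path: features as of each past transaction's timestamp."""
    return [
        {name: fn(history[account], ts) for name, fn in FEATURES.items()}
        for account, ts in rows
    ]


def online_features(history, account, now):
    """Inference path: same definitions, evaluated at request time."""
    return {name: fn(history[account], now) for name, fn in FEATURES.items()}


history = {
    "account_1": [
        datetime(2024, 3, 1, 10),
        datetime(2024, 3, 1, 20),
        datetime(2024, 3, 2, 9),
    ]
}
ts = datetime(2024, 3, 2, 12)
offline = offline_features(history, [("account_1", ts)])[0]
online = online_features(history, "account_1", ts)
print(offline, online)
```

Because both paths read the same registry, a value computed for training at a given timestamp matches what the online path would have returned at that moment.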
Despite many tests to improve performance, on tabular data the secret sauce remains having strong features correlated with fraud and training Gradient Boosted Trees. We added an undersampling step to rebalance our dataset before optimising an XGBoost model, and we excluded the last few weeks of transactions from our training set as the data was too corrupted. Feature Engineering and our collaboration with fraud experts have always been the most effective strategy to refine our model, ensuring its alignment with business objectives by catching fraud efficiently and accepting most legit users.
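As a rough illustration of the dataset preparation described above, the sketch below drops the most recent weeks and then randomly undersamples legit transactions. The function names, the four-week holdout, the two-legit-per-fraud ratio and the synthetic rows are all assumptions; the real pipeline and its parameters are not given in this article, and the model fitting step is left out.

```python
import random
from datetime import date, timedelta


def training_window(rows, today, holdout_weeks=4):
    """Drop the most recent weeks (holdout_weeks is a made-up default)."""
    cutoff = today - timedelta(weeks=holdout_weeks)
    return [r for r in rows if r["date"] < cutoff]


def undersample(rows, legit_per_fraud=1.0, seed=0):
    """Keep every fraud and a random subset of the legit transactions."""
    rng = random.Random(seed)
    frauds = [r for r in rows if r["is_fraud"]]
    legits = [r for r in rows if not r["is_fraud"]]
    n_keep = min(len(legits), int(legit_per_fraud * len(frauds)))
    sample = frauds + rng.sample(legits, n_keep)
    rng.shuffle(sample)  # avoid having all frauds first
    return sample


# Synthetic data: 2% fraud, spread over 60 days.
rows = [
    {"date": date(2024, 1, 1) + timedelta(days=i % 60), "is_fraud": i % 50 == 0}
    for i in range(1000)
]
recent_removed = training_window(rows, today=date(2024, 3, 1))
balanced = undersample(recent_removed, legit_per_fraud=2.0)
n_fraud = sum(r["is_fraud"] for r in balanced)
print(n_fraud, len(balanced) - n_fraud)
```

The balanced sample would then be used to fit the gradient boosted model, while evaluation should still run on data with the original class balance.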
Moreover, we used explainability tools such as explainerdashboard. This is very convenient for debugging our model, fully understanding it and explaining its decisions when there is a customer inquiry. On top of that, we added unit tests on key segments of our dataset to protect us against performance regressions when we deploy a new model.
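A segment-level regression test could look like the following minimal sketch, which compares the fraud recall of a candidate model with the current one on each segment. The segment names, the score fields, the 0.5 threshold and the 2-point tolerance are hypothetical; the actual tests and metrics used by the team are not described here.

```python
from collections import defaultdict


def recall_by_segment(rows, predict, segment_key="segment"):
    """Share of frauds blocked by `predict`, per segment."""
    caught = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        if row["is_fraud"]:
            total[row[segment_key]] += 1
            caught[row[segment_key]] += int(predict(row))
    return {seg: caught[seg] / total[seg] for seg in total}


def regressions(rows, candidate, baseline, tolerance=0.02):
    """Segments where the candidate's recall drops by more than `tolerance`."""
    new = recall_by_segment(rows, candidate)
    old = recall_by_segment(rows, baseline)
    return sorted(seg for seg in old if new.get(seg, 0.0) < old[seg] - tolerance)


# Made-up labelled rows with the scores of both models.
rows = [
    {"segment": "virtual_currency", "is_fraud": True, "old": 0.9, "new": 0.8},
    {"segment": "virtual_currency", "is_fraud": True, "old": 0.7, "new": 0.6},
    {"segment": "full_game", "is_fraud": True, "old": 0.8, "new": 0.3},
    {"segment": "full_game", "is_fraud": True, "old": 0.6, "new": 0.9},
    {"segment": "full_game", "is_fraud": False, "old": 0.1, "new": 0.1},
]


def baseline(row):
    return row["old"] >= 0.5


def candidate(row):
    return row["new"] >= 0.5


failing = regressions(rows, candidate, baseline)
print(failing)
```

In a test suite, the deployment check would simply assert that the returned list is empty; a similar check on the share of legit transactions accepted would guard the other side of the trade-off.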
When a transaction is blocked, we do not get any label, as the payment ends there.
It means that we only have labels for transactions that our model accepted, which turn out to be either fraud or legit.
In statistical terms, these are True Negatives (legit transactions that were accepted by the model) and False Negatives (frauds that were accepted by the model).
To compute Classification metrics, we also need labels for predicted positives, i.e. transactions that the model would have blocked.
Thus, we implemented a Control Group: for a subsample of each day's transactions, we bypass the model decision and only log the model score, so that the payments of this subset are completed and we can still analyze our model's decisions.
This allowed us to rigorously assess the impact of our fraud detection system and refine our strategies.
The concept of Control Group is extensively presented by Stripe in a PyData talk.
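Here is a minimal sketch of how such a Control Group could be wired, assuming a deterministic hash-based assignment. The function names, the 1% rate, the 0.5 threshold and the sample rows are illustrative assumptions, not the production implementation.

```python
import hashlib


def in_control_group(transaction_id, rate=0.01):
    """Deterministically assign a fixed share of transactions to the control group."""
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < rate * 2**64


def decide(transaction_id, score, threshold=0.5, rate=0.01):
    """Log what the model would do, but never block control-group payments."""
    would_block = score >= threshold
    control = in_control_group(transaction_id, rate)
    return {
        "score": score,
        "would_block": would_block,
        "control": control,
        "blocked": would_block and not control,
    }


def precision_recall(control_rows):
    """Metrics on control-group rows, which all completed and received a label."""
    tp = sum(1 for r in control_rows if r["would_block"] and r["is_fraud"])
    fp = sum(1 for r in control_rows if r["would_block"] and not r["is_fraud"])
    fn = sum(1 for r in control_rows if not r["would_block"] and r["is_fraud"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labelled control-group rows.
control_rows = [
    {"would_block": True, "is_fraud": True},
    {"would_block": True, "is_fraud": False},
    {"would_block": False, "is_fraud": True},
    {"would_block": False, "is_fraud": False},
]
metrics = precision_recall(control_rows)
print(metrics)
```

Hashing the transaction identifier keeps the assignment reproducible across services and retries. The rate is itself a business decision, since every fraud in the control group is let through on purpose.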
In the end, the success of our project is reflected in key metrics such as a 5% gain in net sales and the valuable time saved for fraud experts. Thanks to this project, we also uncovered human errors in the rule-based system that were increasing false positives, and we decentralized knowledge so that the tool can be owned by more people. Lastly, we replaced the third-party tool on 80% of all PC transactions, a total of 80M euros per year.
Now that the platform is ready and the model is deployed, the Fraud Detection product is live and the team has to maintain it. First, this means monitoring it, using Grafana for real-time monitoring and Tableau for dashboarding, as most people in the team and business stakeholders were familiar with it. Alerts need to be set properly and thresholds fine-tuned so that we can react as soon as possible when there is an incident (fraud attack, platform down...).
Then, it means having the right set of tools to retrain and deploy Machine Learning models when a new fraud pattern emerges. For this, we need:
This is just another story of Machine Learning models in production, showing that Modeling is only the tip of the iceberg in production use cases. It also reminds us that Decision Trees still rock in business and that mastering classical ML algorithms is a must-have skill.
On top of modeling, there is so much to discover in MLOps, DevOps, Data Engineering and Software Engineering, making the Machine Learning Engineer role a wonderful place for curious and creative people.
I thank Ubisoft again for the opportunity to work on this project with such a great team!