Real-Time Fraud Detection
Challenge
Our team needed to identify fraudulent bank transactions in a dataset of 5.3 billion data points. The dataset was extremely imbalanced: fraudulent transactions made up just 0.015% of it.
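To make the scale of the imbalance concrete, the short sketch below turns the two figures above into absolute counts and a negative-to-positive ratio. The write-up does not say how the imbalance was handled. The ratio is shown only because it is a common heuristic for weighting the positive class in boosted-tree libraries (for example the `scale_pos_weight` parameter in XGBoost and LightGBM). The function names are hypothetical.

```python
# Figures taken from the text above.
TOTAL_TRANSACTIONS = 5_300_000_000
FRAUD_RATE_PERCENT = 0.015


def class_counts(total, fraud_rate_percent):
    """Return (fraud_count, legitimate_count) for a fraud rate given in percent."""
    fraud = round(total * fraud_rate_percent / 100)
    return fraud, total - fraud


def negative_to_positive_ratio(fraud, legitimate):
    """Ratio of legitimate to fraudulent rows (an assumed weighting heuristic)."""
    return legitimate / fraud


fraud, legitimate = class_counts(TOTAL_TRANSACTIONS, FRAUD_RATE_PERCENT)
ratio = negative_to_positive_ratio(fraud, legitimate)

print(f"fraudulent: {fraud:,}")                    # fraudulent: 795,000
print(f"legitimate: {legitimate:,}")               # legitimate: 5,299,205,000
print(f"negative-to-positive ratio: {ratio:.1f}")  # negative-to-positive ratio: 6665.7
```

In other words, roughly one transaction in every 6,667 is fraudulent, so a model that labels everything as legitimate would be right almost all the time while catching no fraud.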
The customer had strict scalability and low-latency requirements. They also wanted to notify their own customers in real time whenever the machine learning system identified suspicious behavior.
The solution we provided was built on top of Azure. We had to manage both real-time data and static information; after processing it with Azure Functions, we stored the data in Azure Blob Storage. We used AzureML to programmatically train a GPU version of a boosted tree algorithm, with a Data Science Virtual Machine as the compute target.
The trained model was operationalized using Azure Kubernetes Service, and we introduced a websocket framework to notify the customer when a transaction was labeled as fraudulent.
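The push-notification idea can be sketched without any network code. The example below is not the project's implementation: it replaces real websocket connections with in-process `asyncio.Queue` objects so that it runs with the standard library alone. The `AlertHub` class, the `FRAUD_THRESHOLD` value, the customer and transaction identifiers, and the message fields are all hypothetical.

```python
import asyncio
import json

FRAUD_THRESHOLD = 0.5  # hypothetical cut-off on the model score


class AlertHub:
    """In-process stand-in for a set of open websocket connections."""

    def __init__(self):
        self._connections = {}  # customer_id -> asyncio.Queue

    def connect(self, customer_id):
        """Register a customer and return the queue that plays the socket's role."""
        queue = asyncio.Queue()
        self._connections[customer_id] = queue
        return queue

    async def publish(self, customer_id, transaction_id, score):
        """Push an alert only if the score crosses the threshold."""
        if score < FRAUD_THRESHOLD:
            return False
        queue = self._connections.get(customer_id)
        if queue is None:  # customer is not connected
            return False
        message = {"transaction_id": transaction_id, "score": score, "label": "fraud"}
        await queue.put(json.dumps(message))
        return True


async def demo():
    hub = AlertHub()
    inbox = hub.connect("customer-1")
    await hub.publish("customer-1", "tx-001", 0.12)  # below threshold, no alert
    await hub.publish("customer-1", "tx-002", 0.97)  # above threshold, alert sent
    received = []
    while not inbox.empty():
        received.append(json.loads(inbox.get_nowait()))
    return received


alerts = asyncio.run(demo())
print(alerts)
```

In a real deployment the queue would be replaced by an open websocket, and `publish` would be called by the scoring service each time the model labels a transaction.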
Results
Time to production: | Reduced 165% |
Train time: | 150x faster (than baseline) |
AUC: | +56% (over baseline) |
F1 score: | +67% (over baseline) |
Process: | Built repeatable and automatic ML pipeline |
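For readers unfamiliar with the two metrics reported above, the sketch below computes F1 and AUC from scratch on a tiny, made-up sample. The labels, scores and the 0.5 threshold are illustrative only and are unrelated to the project's data or results.

```python
def f1_score(labels, scores, threshold=0.5):
    """F1 is the harmonic mean of precision and recall at a fixed threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy sample: 8 legitimate transactions (0) and 2 fraudulent ones (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.2, 0.1, 0.6, 0.4, 0.3, 0.7, 0.4]

f1 = f1_score(labels, scores)
area = auc(labels, scores)
print(f"F1:  {f1:.3f}")    # F1:  0.500
print(f"AUC: {area:.3f}")  # AUC: 0.906
```

Both metrics ignore the large number of correctly classified legitimate transactions, which is why they are more informative than plain accuracy on a heavily imbalanced dataset.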