Background

This project shows how Amazon S3, Amazon SageMaker, and Amazon QuickSight can be used together for fraud detection.

Why Fraud?

In 2021, credit card fraud reached its highest level ever, with 393,207 reports filed. This poses a major risk to lenders and borrowers alike, and cloud-based analytics offer a practical way to detect it.

Dataset

https://www.kaggle.com/datasets/kartik2112/fraud-detection

This is a simulated credit card transaction dataset containing legitimate and fraudulent transactions from 1 January 2019 to 31 December 2020. It covers the credit cards of 1,000 customers transacting with a pool of 800 merchants.
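
Because fraudulent transactions are rare, it is worth checking the class balance before training. The sketch below assumes the Kaggle CSV files (for example `fraudTrain.csv`) contain an `is_fraud` column holding 0 or 1; check the actual header before relying on it. It also computes sum(negative) / sum(positive), which the XGBoost documentation suggests as a starting value for its `scale_pos_weight` parameter on unbalanced data.

```python
import csv
import io
from typing import Iterable, TextIO


def class_balance(rows: Iterable[dict], label_col: str = "is_fraud") -> dict:
    """Count legitimate vs. fraudulent rows and derive imbalance statistics.

    Assumes the label column holds "0" (legitimate) or "1" (fraud).
    """
    counts = {0: 0, 1: 0}
    for row in rows:
        counts[int(row[label_col])] += 1
    total = counts[0] + counts[1]
    return {
        "legit": counts[0],
        "fraud": counts[1],
        "fraud_rate": counts[1] / total if total else 0.0,
        # XGBoost's suggested starting point: sum(negative) / sum(positive)
        "scale_pos_weight": counts[0] / counts[1] if counts[1] else float("inf"),
    }


def class_balance_from_csv(f: TextIO) -> dict:
    """Read a CSV with a header row and compute the class balance."""
    return class_balance(csv.DictReader(f))


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real training file
    sample = io.StringIO(
        "trans_num,amt,is_fraud\n"
        "a1,12.50,0\n"
        "a2,830.10,1\n"
        "a3,4.99,0\n"
        "a4,56.00,0\n"
    )
    print(class_balance_from_csv(sample))
    # prints {'legit': 3, 'fraud': 1, 'fraud_rate': 0.25, 'scale_pos_weight': 3.0}
```

On the real file, the same call would be `class_balance_from_csv(open("fraudTrain.csv", newline=""))`.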

Tools and Pipeline

To cover data storage, data processing, and data visualization end to end, we used three tools from the AWS ecosystem: Amazon S3, Amazon SageMaker, and Amazon QuickSight.

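Pipeline: Amazon S3 (raw data) -> Amazon SageMaker (model training and predictions) -> Amazon S3 (model results) -> Amazon QuickSight (dashboard)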

Amazon S3

In our pipeline, S3 stores both the data and the model results, which lets SageMaker and QuickSight integrate with one another. Files can be organized into folders (in S3, these are key prefixes), and stored objects can be accessed by other AWS services that have been granted permission.
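
S3 has a flat namespace, so a "folder" is simply a shared key prefix such as `data/raw/`. The sketch below shows one possible layout for the pipeline's objects; the prefixes and the bucket name used in the example are hypothetical placeholders, not the project's actual structure. The upload helper accepts any client with an `upload_file(Filename, Bucket, Key)` method, which boto3's S3 client provides, so it can be tested offline with a stub.

```python
from dataclasses import dataclass
from pathlib import Path, PurePosixPath

# Hypothetical folder layout (key prefixes); adjust to the real bucket structure.
PREFIXES = {
    "raw": "data/raw",
    "train": "data/train",
    "test": "data/test",
    "model": "model/output",
    "results": "results",
}


@dataclass(frozen=True)
class S3Location:
    bucket: str
    key: str

    @property
    def uri(self) -> str:
        return f"s3://{self.bucket}/{self.key}"


def location(bucket: str, area: str, filename: str) -> S3Location:
    """Map a file name to its key under one of the pipeline's prefixes."""
    key = str(PurePosixPath(PREFIXES[area]) / filename)
    return S3Location(bucket, key)


def upload(client, local_path: str, bucket: str, area: str) -> S3Location:
    """Upload a local file under the prefix for `area`.

    `client` only needs an upload_file(Filename, Bucket, Key) method,
    e.g. boto3.client("s3"); a stub object works for offline testing.
    """
    loc = location(bucket, area, Path(local_path).name)
    client.upload_file(local_path, loc.bucket, loc.key)
    return loc
```

For example, `upload(boto3.client("s3"), "fraudTrain.csv", "my-fraud-bucket", "raw")` would store the file at `s3://my-fraud-bucket/data/raw/fraudTrain.csv`.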

S3-Bucket Definition

S3 Folder Structure

Amazon SageMaker

SageMaker is the machine learning stage of the pipeline: it uses AWS cloud compute to build and run an XGBoost fraud detection model that helps predict future fraud.

SageMaker provides JupyterLab for building models and exploring data in Python and R, with preconfigured environments that include frameworks ranging from PyTorch to TensorFlow. It can also build models automatically, for example with XGBoost, and generate an end-to-end description of the data.
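
If the model is trained with SageMaker's built-in XGBoost algorithm, its CSV training input is expected to have the target in the first column and no header row. The sketch below converts rows of the Kaggle dataset into that layout. The feature subset (`amt`, `city_pop`, `lat`, `long`, `merch_lat`, `merch_long`) is an illustrative assumption, not the project's actual feature set; text fields such as `category` would need to be encoded before they could be included.

```python
import csv
import io
from typing import Iterable, Sequence

# Hypothetical numeric feature subset; the real model may use different columns.
FEATURES = ("amt", "city_pop", "lat", "long", "merch_lat", "merch_long")


def to_xgboost_csv(rows: Iterable[dict],
                   features: Sequence[str] = FEATURES,
                   label_col: str = "is_fraud") -> str:
    """Return CSV text with the label in the first column and no header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Label first (0/1), then the numeric features in a fixed order
        writer.writerow([int(row[label_col])] + [float(row[c]) for c in features])
    return out.getvalue()
```

Rows could come from `csv.DictReader(open("fraudTrain.csv", newline=""))`, and the resulting text written to a file and uploaded to S3 as the training channel. The feature order chosen here must be reused at inference time.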

Once the model is built, it can be deployed to a hosted endpoint. Applications call this endpoint to access the model and get predictions.
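
A deployed endpoint is called through the `sagemaker-runtime` client's `invoke_endpoint` method. For CSV inference, the built-in XGBoost algorithm expects the features without the label column. The sketch below assumes a text/csv endpoint and accepts scores separated by either commas or newlines, since the exact response layout is treated here as an assumption; the endpoint name in the test is a hypothetical placeholder.

```python
import csv
import io
from typing import List, Sequence


def build_payload(feature_rows: Sequence[Sequence[float]]) -> str:
    """Serialize feature rows as header-less CSV (no label column)."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(feature_rows)
    return out.getvalue()


def parse_scores(body: bytes) -> List[float]:
    """Parse the endpoint's text response into one score per input row.

    Both comma- and newline-separated responses are accepted.
    """
    text = body.decode("utf-8").replace("\n", ",")
    return [float(tok) for tok in text.split(",") if tok.strip()]


def predict(runtime_client, endpoint_name: str,
            feature_rows: Sequence[Sequence[float]]) -> List[float]:
    """Send rows to a deployed endpoint and return the parsed scores."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_payload(feature_rows),
    )
    # response["Body"] is a stream; read() returns the raw bytes
    return parse_scores(response["Body"].read())
```

In practice `runtime_client` would be `boto3.client("sagemaker-runtime")`, and each row must list the features in the same order used for training.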

Endpoint hosted

Endpoint output for sample data sent to the model
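
Assuming the model was trained with a binary logistic objective, the endpoint returns a fraud probability per transaction rather than a label. To compare sample output against known labels, the scores need a cut-off; the 0.5 default below is an assumption, and on imbalanced data a different threshold may balance precision and recall better.

```python
from typing import Dict, Sequence


def evaluate(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> Dict[str, float]:
    """Threshold probabilities into 0/1 predictions and score them."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

Accuracy alone is misleading on data this imbalanced, because predicting "legitimate" for every transaction already scores highly; precision and recall on the fraud class are more informative.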

Amazon QuickSight

QuickSight is the final, visualization stage of the pipeline. It lets us build a dashboard that interactively displays information about our dataset alongside the fraud predictions from our model.
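
QuickSight reads data from S3 through a manifest file that lists the objects (or key prefixes) to import and describes their format. The sketch below generates such a manifest for CSV model results; the bucket and prefix in the example are hypothetical placeholders, and the field names follow the QuickSight S3 manifest format as understood here, so verify them against the current QuickSight documentation.

```python
import json
from typing import Sequence


def quicksight_manifest(uris: Sequence[str] = (),
                        prefixes: Sequence[str] = (),
                        contains_header: bool = True) -> str:
    """Build a QuickSight S3 manifest (JSON text) for CSV files."""
    if not uris and not prefixes:
        raise ValueError("provide at least one URI or URI prefix")
    locations = []
    if uris:
        locations.append({"URIs": list(uris)})           # individual objects
    if prefixes:
        locations.append({"URIPrefixes": list(prefixes)})  # whole "folders"
    manifest = {
        "fileLocations": locations,
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            # Documented examples use the string "true" rather than a JSON boolean
            "containsHeader": "true" if contains_header else "false",
        },
    }
    return json.dumps(manifest, indent=2)
```

For example, `quicksight_manifest(prefixes=["s3://my-fraud-bucket/results/"])` produces a manifest covering every results file under that prefix, which can then be supplied when creating an S3 data source in QuickSight.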

Conclusion

S3 provides reliable data storage, SageMaker makes it easy to build and deploy models, and QuickSight can be used to generate dashboards and reports. All three tools are straightforward to use and access.