FraudShield

XGBoost v0.9.3 · 20% holdout

Fraud is rare.
That makes it hard.

Trained on the ULB public credit card fraud dataset: 284,807 transactions, of which only 0.172% are labelled fraud. Tuned to minimise false positives while still catching as much of the rare fraud class as possible.
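A minimal sketch of that setup, assuming the standard ULB export (a creditcard.csv with columns Time, V1–V28, Amount and a binary Class label); the file name, column layout and random seed are assumptions, not confirmed by this page.

    # Sketch: stratified 20% holdout on the ULB credit card fraud data.
    # "creditcard.csv", its column layout and the seed are assumptions.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("creditcard.csv")
    X = df.drop(columns=["Class"])
    y = df["Class"]

    # Stratify so the ~0.172% fraud rate is preserved in the 20% holdout.
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42
    )
    print(len(X_hold), int(y_hold.sum()))  # holdout size and fraud count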

F1 Score: 0.87 · ROC-AUC: 0.98 · False Positive Rate: 0.02%

Dataset Composition

Fraud Rate: 0.17% · Legit: 284,315 · Fraud: 492

Transaction Analyser

Adjust PCA components to see live predictions
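Behind a panel like this, scoring is presumably a single predict_proba call on the trained booster. The sketch below shows the idea; the saved-model path, feature ordering and example slider values are illustrative assumptions, and the sklearn-style load_model call needs a reasonably recent xgboost.

    # Sketch: score one hand-built transaction with a trained XGBoost model.
    # "fraudshield.json", the feature order and the slider values are made up.
    import pandas as pd
    import xgboost as xgb

    model = xgb.XGBClassifier()
    model.load_model("fraudshield.json")

    features = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
    row = pd.DataFrame([[0.0] * len(features)], columns=features)
    row.loc[0, "V14"] = -5.0         # move one PCA component, as a slider would
    row.loc[0, "Amount"] = 149.62

    proba = float(model.predict_proba(row)[0, 1])
    print(f"fraud probability: {proba:.4f}")
    print("flagged" if proba >= 0.50 else "clear")   # the page's t=0.50 threshold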

Model Comparison

Scores on stratified holdout (n=56,961)
Model                 Precision   Recall   F1     AUC
Logistic Regression   0.87        0.62     0.72   0.94
Random Forest         0.91        0.78     0.84   0.97
XGBoost               0.93        0.85     0.87   0.98

XGBoost wins on F1 for this class-imbalanced task after SMOTE oversampling of the training set. Logistic Regression is included as a baseline: it shows how far a linear decision surface gets on the PCA features before recall falls away.
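A sketch of that resampling step, continuing from the split above and using imbalanced-learn's SMOTE with k=5 as listed in the footer; the XGBoost hyperparameters are placeholders, not the project's actual settings.

    # Sketch: SMOTE on the training split only, then fit XGBoost.
    # Requires imbalanced-learn; hyperparameters are placeholders.
    from imblearn.over_sampling import SMOTE
    from xgboost import XGBClassifier

    smote = SMOTE(k_neighbors=5, random_state=42)
    X_res, y_res = smote.fit_resample(X_train, y_train)   # the holdout is never resampled

    clf = XGBClassifier(n_estimators=400, max_depth=6, learning_rate=0.1)
    clf.fit(X_res, y_res)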

Confusion Matrix

XGBoost @ threshold 0.50 — 56,961 holdout samples
                  Predicted Legit           Predicted Fraud
True Legit        56,850 (true negative)    12 (false positive)
True Fraud        15 (false negative)       85 (true positive)
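The counts above follow from thresholding the holdout probabilities at 0.50; a sketch, reusing clf and the holdout split from the earlier sketches:

    # Sketch: confusion matrix on the holdout at the page's 0.50 threshold.
    from sklearn.metrics import confusion_matrix

    proba_hold = clf.predict_proba(X_hold)[:, 1]
    pred_hold = (proba_hold >= 0.50).astype(int)

    tn, fp, fn, tp = confusion_matrix(y_hold, pred_hold).ravel()
    print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
    print(f"FPR = {fp / (fp + tn):.3%}   recall = {tp / (tp + fn):.1%}")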

ROC Curve

TPR vs FPR · operating point at t=0.50
AUC = 0.98
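The curve and its marked operating point come from the same holdout probabilities; a sketch with scikit-learn:

    # Sketch: ROC curve, AUC, and the FPR/TPR at the t=0.50 operating point.
    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    fpr, tpr, thresholds = roc_curve(y_hold, proba_hold)
    print(f"AUC = {roc_auc_score(y_hold, proba_hold):.2f}")

    idx = int(np.argmin(np.abs(thresholds - 0.50)))   # point nearest the 0.50 threshold
    print(f"operating point: FPR = {fpr[idx]:.4f}, TPR = {tpr[idx]:.2f}")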
Dataset: ULB 2013 public · Samples: 284,807 · Fraud: 492 (0.172%) · Resampling: SMOTE k=5 (train only) · Last trained: 2025-01-14