Machine Learning

Khmer MNIST Digit Classification

Custom Khmer handwritten digit classification pipeline with advanced preprocessing and a scratch-built ResNet model

Python, PyTorch, ResNet, OpenCV, NumPy, Scikit-learn

Developed a Khmer handwritten digit classification system built around a compact ResNet implemented from scratch and a preprocessing pipeline designed for noisy, real-world character images. Augmentation, normalization, and denoising were applied to improve robustness and generalization. Experiments over architecture depth, optimization settings, and regularization brought the model to 99.3% validation accuracy. The project covers practical deep learning model design and data-centric optimization, and provides a foundation for Khmer OCR.

Project Overview

This project addresses Khmer handwritten digit recognition from both the data side and the model side. I improved the full training pipeline, adding advanced preprocessing and a custom ResNet implementation, to increase classification performance and stability.

Key Features

  • Custom ResNet Architecture: Built a lightweight ResNet model from scratch for Khmer digit classification
  • Advanced Preprocessing: Applied denoising, normalization, and image enhancement for cleaner model input
  • Robust Training Pipeline: Used data augmentation and optimization tuning to improve generalization
  • High Validation Performance: Achieved 99.3% accuracy on the validation set
  • Low-Resource Adaptation: Designed the solution to perform well on Khmer handwritten character patterns
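To make the "lightweight ResNet from scratch" idea concrete, here is a minimal PyTorch sketch of a residual block and a small network built from it. The names `BasicBlock` and `CompactResNet`, the channel widths (16, 32, 64), the three-stage depth, and the 1x28x28 grayscale input are all assumptions for illustration; the project's actual architecture is not specified above and may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (hypothetical layout)."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the input with a 1x1 convolution when its shape changes,
        # so it can be added to the block output.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))


class CompactResNet(nn.Module):
    """Small ResNet for single-channel digit images (assumed 10 classes)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
        )
        self.layer1 = BasicBlock(16, 16)
        self.layer2 = BasicBlock(16, 32, stride=2)  # halves spatial size
        self.layer3 = BasicBlock(32, 64, stride=2)  # halves it again
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer3(self.layer2(self.layer1(x)))
        x = self.pool(x).flatten(1)
        return self.fc(x)  # raw logits, one per class
```

Global average pooling before the final linear layer keeps the parameter count low and makes the network independent of the exact input resolution, which suits a compact model.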

Technical Stack

  • Framework: PyTorch for model development and training
  • Model Design: Custom ResNet blocks implemented from scratch
  • Data Processing: OpenCV and NumPy for preprocessing and transformation pipeline
  • Evaluation: Scikit-learn metrics for validation analysis and performance tracking

Achievements

  • Improved the baseline model by redesigning architecture and preprocessing strategy
  • Built and trained a ResNet model from scratch, tailored for Khmer digit recognition
  • Reached 99.3% validation accuracy through iterative experimentation and optimization
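A validation figure like the one above is typically computed from predicted and true labels. The following sketch shows one way to do that with Scikit-learn, adding a confusion matrix and per-class recall so that weak digits are visible alongside overall accuracy. The helper name `summarize_validation` and the assumption of ten digit classes labeled 0 to 9 are illustrative and not taken from the project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def summarize_validation(y_true, y_pred, num_classes=10):
    """Return overall accuracy, confusion matrix, and per-class recall.

    Assumes integer class labels in the range 0..num_classes-1.
    """
    labels = list(range(num_classes))
    accuracy = accuracy_score(y_true, y_pred)

    # Rows are true classes, columns are predicted classes.
    matrix = confusion_matrix(y_true, y_pred, labels=labels)

    # Recall per class = correct predictions / samples of that class.
    # Classes with no samples are reported as 0.0 instead of dividing by zero.
    support = matrix.sum(axis=1)
    recall = np.divide(
        matrix.diagonal(),
        support,
        out=np.zeros(num_classes),
        where=support > 0,
    )
    return accuracy, matrix, recall
```

Off-diagonal entries of the confusion matrix show which digit pairs the model confuses most often, which can guide further preprocessing or augmentation work.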