Projects

Implementations, experiments, and explorations.

ML & Deep Learning

GPT-2 Speedrun: Single-Node Multi-GPU Pre-Training (DDP) GitHub →
PyTorch, Distributed Data Parallel (DDP), torch.compile, AMP (BF16/FP16)
  • Implemented an end-to-end GPT-2 (124M) pre-training stack with DDP, gradient accumulation, a cosine LR schedule with linear warmup, checkpoint/resume, and optional initialization from HuggingFace GPT-2 weights.
  • Optimized throughput via torch.compile, fused AdamW (CUDA), TF32 matmuls, Flash SDP attention when available, pinned-memory non-blocking transfers, and mixed precision (BF16, or FP16 with GradScaler).
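The warmup-plus-cosine schedule from the first bullet can be sketched in plain Python; the hyperparameter values below are illustrative defaults, not the project's actual settings:

```python
import math

def lr_schedule(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, max_steps=1000):
    """Linear warmup to max_lr, then cosine decay to min_lr (illustrative values)."""
    if step < warmup_steps:
        # Linear warmup: ramp from ~0 up to max_lr over warmup_steps.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        # Past the schedule, hold the floor learning rate.
        return min_lr
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

In the training loop, the returned value is assigned to each optimizer param group's `lr` before `optimizer.step()`.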
BeaconGrad GitHub →
Python, NumPy
  • A NumPy-based tensor automatic-differentiation (autograd) engine with broadcasting-aware backpropagation, neural-network modules, and optimizers; gradients validated via finite-difference gradchecks and float64 parity tests against PyTorch.
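The core mechanism, broadcasting-aware reverse-mode autodiff, can be sketched in a few dozen lines of NumPy; this is a simplified illustration, not the BeaconGrad API:

```python
import numpy as np

class Tensor:
    """Minimal reverse-mode autograd tensor (illustrative sketch)."""
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward_fn = None

    @staticmethod
    def _unbroadcast(grad, shape):
        # Sum the gradient over axes that were broadcast in the forward pass,
        # so it matches the original operand's shape.
        while grad.ndim > len(shape):
            grad = grad.sum(axis=0)
        for axis, dim in enumerate(shape):
            if dim == 1:
                grad = grad.sum(axis=axis, keepdims=True)
        return grad

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def backward_fn(g):
            self.grad += Tensor._unbroadcast(g, self.data.shape)
            other.grad += Tensor._unbroadcast(g, other.data.shape)
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def backward_fn(g):
            self.grad += Tensor._unbroadcast(g * other.data, self.data.shape)
            other.grad += Tensor._unbroadcast(g * self.data, other.data.shape)
        out._backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), (self,))
        def backward_fn(g):
            self.grad += g * np.ones_like(self.data)
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topological sort, then apply the chain rule from the output back.
        topo, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                topo.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(topo):
            if t._backward_fn is not None:
                t._backward_fn(t.grad)
```

A gradcheck then compares `x.grad` against `(f(x + eps) - f(x - eps)) / (2 * eps)` per element, which is what the finite-difference validation in the bullet refers to.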
Optimized YOLOv11 for Document Layout Recognition and Inference
PyTorch, YOLO, TensorRT, onnxruntime, OpenVINO
  • Fine-tuned YOLOv11 on DocLayNet for document layout analysis (captions, footnotes, formulas, etc.).
  • Accelerated inference with TensorRT, ONNX Runtime, and OpenVINO, achieving scalable batch processing with threaded execution.
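The threaded batch-processing pattern from the second bullet can be sketched as follows; `run_inference` is a hypothetical stub standing in for a real ONNX Runtime / TensorRT session call, which releases the GIL during native execution and so benefits from thread-level overlap:

```python
from concurrent.futures import ThreadPoolExecutor

def run_inference(batch):
    """Hypothetical stub: a real pipeline would call session.run(...) here."""
    return [x * 2 for x in batch]

def batched(items, batch_size):
    # Yield consecutive fixed-size chunks of the input list.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def threaded_batch_inference(items, batch_size=4, workers=2):
    # Map batches across a thread pool; map() preserves input order,
    # so results come back aligned with the original items.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_inference, batched(items, batch_size))
    return [y for batch in results for y in batch]
```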
Expandable Subspace Ensemble for Class-Incremental Learning GitHub →
PyTorch, NumPy
  • Implemented a subspace-expansion technique that preserves performance on previously learned classes (mitigating catastrophic forgetting), benchmarked from scratch on CIFAR-10.
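The expand-and-freeze idea behind such ensembles can be illustrated with a toy nearest-prototype classifier: each task only adds new, frozen class prototypes, so earlier decision rules are never overwritten. This is a simplified sketch, not the project's subspace-projection method:

```python
import numpy as np

class ExpandableEnsemble:
    """Toy expand-and-freeze classifier: per-class prototypes are added
    per task and never updated afterward (simplified illustration)."""
    def __init__(self):
        self.prototypes = []   # one frozen mean feature vector per class
        self.labels = []

    def fit_task(self, features, labels):
        # Expand: add a prototype (class-mean feature) for each new class.
        for c in np.unique(labels):
            self.prototypes.append(features[labels == c].mean(axis=0))
            self.labels.append(int(c))

    def predict(self, features):
        # Nearest-prototype classification over all classes seen so far.
        protos = np.stack(self.prototypes)
        dists = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=2)
        return np.array(self.labels)[dists.argmin(axis=1)]
```

Because task-1 prototypes are untouched when task 2 arrives, accuracy on task-1 classes is retained by construction, which is the forgetting-avoidance property the bullet describes.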

Generative & Probabilistic

Discrete Walk-Jump Sampling for Protein Discovery GitHub →
PyTorch, Energy-Based Models, Langevin MCMC, Contrastive Divergence, Denoising Networks
  • Implemented Discrete Walk-Jump Sampling for antibody sequence generation using EBMs trained via contrastive divergence.
  • Employed Langevin MCMC for exploration and one-step denoising for refinement, optimizing sampling efficiency and sequence quality.
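The walk-jump loop can be sketched on a 1D Gaussian toy problem where the smoothed score is known in closed form; in the actual project, a trained EBM supplies the score and a denoising network performs the jump, and all values here are illustrative:

```python
import numpy as np

def walk_jump_sample(n_samples=2000, sigma=0.5, n_steps=200, step=0.1, seed=0):
    """Walk-jump sketch for a toy target N(0, 1): the smoothed variable
    y = x + sigma * eps is N(0, 1 + sigma^2), so its score is analytic here."""
    rng = np.random.default_rng(seed)
    var_y = 1.0 + sigma**2
    score = lambda y: -y / var_y          # grad log of the smoothed density

    # Walk: Langevin MCMC in the smoothed (noisy) space.
    y = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        y = y + step * score(y) + np.sqrt(2.0 * step) * rng.standard_normal(n_samples)

    # Jump: one-step denoising via Tweedie's formula, x_hat = y + sigma^2 * score(y).
    return y + sigma**2 * score(y)
```

The "walk" explores the smoother noisy density (easier to mix over), and the single deterministic "jump" maps each noisy sample back toward the clean data manifold.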
Concrete Score Matching: Generalized Score Matching for Discrete Data GitHub →
PyTorch, NumPy, Concrete Score Matching, Metropolis–Hastings
  • Implemented the Concrete Score Matching (CSM) algorithm to learn score functions over discrete state spaces.
  • Used Metropolis–Hastings sampling for data generation and visualized true vs. generated distributions.
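The Metropolis–Hastings sampler from the second bullet can be sketched over a small finite state space with a uniform symmetric proposal; in the project the (learned) scores parameterize the target, whereas here the target probabilities are given directly for illustration:

```python
import numpy as np

def mh_discrete(target_probs, n_steps=20000, seed=0):
    """Metropolis-Hastings over a finite state space with a uniform
    random-walk proposal (illustrative toy version)."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target_probs, dtype=float)
    target = target / target.sum()
    k = len(target)
    state = rng.integers(k)
    counts = np.zeros(k)
    for _ in range(n_steps):
        # Symmetric proposal, so the acceptance ratio is just p(new) / p(old).
        proposal = rng.integers(k)
        if rng.random() < target[proposal] / target[state]:
            state = proposal
        counts[state] += 1
    return counts / counts.sum()
```

Comparing the returned empirical frequencies against `target_probs` is exactly the "true vs. generated distributions" visualization the bullet mentions.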