Projects
Implementations, experiments, and explorations.
✦
ML & Deep Learning
GPT-2 Speedrun: Single-Node Multi-GPU Pre-Training (DDP) · GitHub →
PyTorch, Distributed Data Parallel (DDP), torch.compile, AMP (BF16/FP16)
- Implemented an end-to-end GPT-2 (124M) pre-training stack with DDP gradient accumulation, a cosine learning-rate schedule with warmup, checkpoint/resume, and optional initialization from Hugging Face GPT-2 weights.
- Optimized throughput with torch.compile, fused AdamW (CUDA), TF32 matmuls, FlashAttention through PyTorch's scaled dot-product attention (SDPA) when available, pinned-memory non-blocking host-to-device transfers, and mixed precision (BF16, or FP16 with GradScaler).
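The learning-rate schedule and gradient-accumulation bookkeeping from the first bullet can be sketched in plain Python. This is a minimal sketch, not the project's code: the linear warmup shape, the default values (max_lr=6e-4, min_lr=6e-5, 10 warmup steps, 100 total steps), and the helper names `lr_at` and `grad_accum_steps` are assumptions chosen for illustration. In a DDP loop, each micro-step's loss would be divided by the accumulation count, and the gradient all-reduce is typically skipped on all but the last micro-step (for example with DDP's `no_sync()` context manager).

```python
import math


def lr_at(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=10, max_steps=100):
    """Learning rate at `step`: linear warmup, then cosine decay to min_lr.

    All defaults are illustrative assumptions, not the project's settings.
    """
    if step < warmup_steps:
        # Linear ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    progress = (step - warmup_steps) / (max_steps - warmup_steps)  # 0 -> 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)


def grad_accum_steps(total_batch_tokens, micro_batch, seq_len, world_size):
    """Micro-steps per optimizer step so all ranks together see total_batch_tokens."""
    tokens_per_micro_step = micro_batch * seq_len * world_size
    if total_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("total batch must be a multiple of micro_batch * seq_len * world_size")
    return total_batch_tokens // tokens_per_micro_step


# Example: a 2**19-token (~0.5M) batch on 8 GPUs with 16 x 1024-token micro-batches.
print(grad_accum_steps(524_288, micro_batch=16, seq_len=1024, world_size=8))  # 4
```

Computing the accumulation count from a token budget keeps the effective batch size fixed when the number of GPUs changes.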
BeaconGrad · GitHub →
Python, NumPy
- Built a NumPy-based tensor automatic-differentiation (autograd) engine with broadcasting-aware backpropagation, neural-network modules, and optimizers; validated gradients with finite-difference gradient checks and float64 parity tests against PyTorch.
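Broadcasting-aware backprop and finite-difference gradient checks, the two ideas in the bullet above, fit in a few lines of NumPy. This is a hedged sketch, not BeaconGrad's actual API: `unbroadcast` and `numerical_grad` are hypothetical names. When an operand was broadcast in the forward pass, its gradient has to be summed back over the broadcast axes; a central-difference estimate then confirms the analytic result.

```python
import numpy as np


def unbroadcast(grad, shape):
    """Reduce an upstream gradient to `shape` by undoing NumPy broadcasting."""
    # Broadcasting may prepend axes: sum them away.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Axes that were size 1 in the operand were stretched: sum them back to 1.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x (float64)."""
    x = np.array(x, dtype=np.float64)  # work on a copy
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        original = x[idx]
        x[idx] = original + eps
        f_plus = f(x)
        x[idx] = original - eps
        f_minus = f(x)
        x[idx] = original  # restore before the next coordinate
        grad[idx] = (f_plus - f_minus) / (2.0 * eps)
    return grad


# Example: d/db of sum((a + b) * c), where b (shape (4,)) broadcasts against a (3, 4).
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 4)), rng.normal(size=4), rng.normal(size=(3, 4))
analytic = unbroadcast(c, b.shape)  # the upstream gradient of (a + b) is c
numeric = numerical_grad(lambda b_: np.sum((a + b_) * c), b)
print(np.allclose(analytic, numeric, atol=1e-6))  # prints True
```

Running the check in float64 keeps finite-difference error far below the tolerance, which is the usual reason gradchecks avoid float32.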
Optimized YOLOv11 for Document Layout Recognition and Inference
PyTorch, YOLO, TensorRT, onnxruntime, OpenVINO
- Fine-tuned YOLOv11 on DocLayNet for document layout analysis (captions, footnotes, formulas, etc.).
- Accelerated inference with TensorRT, ONNX Runtime, and OpenVINO backends, and scaled batch processing with threaded execution.
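Batched, threaded execution can be sketched with the standard library alone. `infer_batch` is a hypothetical stand-in for a call into whichever backend is loaded (for example, a wrapped ONNX Runtime session); the batch size, worker count, and helper names are assumptions. Threads only pay off if the backend's native inference call releases Python's GIL, and whether one session can be shared safely across threads depends on the backend, so check its documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size."""
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


def run_batched(
    items: Sequence[T],
    infer_batch: Callable[[List[T]], List[R]],
    batch_size: int = 8,
    max_workers: int = 4,
) -> List[R]:
    """Run infer_batch over batches on a thread pool; results keep input order."""
    batches = make_batches(items, batch_size)
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields outputs in submission order, even if batches finish out of order.
        for batch, outputs in zip(batches, pool.map(infer_batch, batches)):
            if len(outputs) != len(batch):
                raise ValueError("infer_batch must return one result per input")
            results.extend(outputs)
    return results


# Usage with a dummy "model" that doubles each input.
print(run_batched(list(range(10)), lambda b: [x * 2 for x in b], batch_size=3))
```

Keeping batches ordered means downstream code can zip results back to page images without tracking IDs.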
Expandable Subspace Ensemble for Class-Incremental Learning · GitHub →
PyTorch, NumPy
- Implemented a subspace-expansion technique that adds new feature subspaces as new classes arrive, retaining previously learned classes and mitigating catastrophic forgetting; trained from scratch and benchmarked on CIFAR-10.
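The subspace-expansion idea can be illustrated with a deliberately simplified linear toy in NumPy. Everything here is an assumption for illustration, not the project's method: each task's subspace is the top singular vectors of that task's inputs (standing in for a learned per-task module), earlier subspaces are frozen, features are the concatenation of all subspaces, and classes are predicted by cosine similarity to class prototypes. Storing input-space class means to rebuild old prototypes in newly added subspaces only works because these projections are linear.

```python
import numpy as np


class ExpandableSubspaces:
    """Hypothetical linear toy of a subspace ensemble for class-incremental learning."""

    def __init__(self, sub_dim):
        self.sub_dim = sub_dim
        self.bases = []        # one frozen (in_dim, sub_dim) basis per task
        self.class_means = {}  # class id -> input-space mean (valid for linear toy only)

    def add_task(self, x, y):
        # New subspace: top right-singular vectors of this task's data.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        self.bases.append(vt[: self.sub_dim].T)  # earlier bases are never updated
        for c in np.unique(y):
            self.class_means[int(c)] = x[y == c].mean(axis=0)

    def embed(self, x):
        # Ensemble of subspaces: concatenate projections onto every basis.
        return np.concatenate([x @ b for b in self.bases], axis=-1)

    def predict(self, x):
        classes = sorted(self.class_means)
        protos = self.embed(np.stack([self.class_means[c] for c in classes]))
        feats = self.embed(x)
        protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        # Nearest prototype by cosine similarity.
        return np.array(classes)[np.argmax(feats @ protos.T, axis=1)]
```

Because old bases are frozen, old classes keep the same features after each expansion; only the prototype set and feature width grow.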
Generative & Probabilistic
Discrete Walk-Jump Sampling for Protein Discovery · GitHub →
PyTorch, Energy-Based Models, Langevin MCMC, Contrastive Divergence, Denoising Networks
- Implemented Discrete Walk-Jump Sampling for antibody sequence generation, using energy-based models (EBMs) trained via contrastive divergence.
- Used Langevin MCMC to explore the smoothed sequence space (the walk) and one-step denoising to project samples back to clean sequences (the jump), targeting both sampling efficiency and sequence quality.
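The walk and jump steps can be sketched in NumPy on a toy problem. To stay self-contained, this sketch swaps the trained EBM and denoising network for the analytic score of a Gaussian-smoothed empirical distribution over one-hot training sequences, so it illustrates the sampling loop only, not the project's models; the function names, chain initialization, and hyperparameters are hypothetical. The walk is unadjusted Langevin dynamics on the smoothed density, and the jump is a one-step (Tweedie) denoise, E[x | y] = y + σ²∇log p_σ(y), followed by an argmax back to tokens.

```python
import numpy as np


def one_hot(seqs, vocab):
    """Flatten integer sequences (n, L) into one-hot vectors (n, L * vocab)."""
    seqs = np.asarray(seqs)
    return np.eye(vocab)[seqs].reshape(len(seqs), -1)


def smoothed_score(y, data, sigma):
    """Score and posterior mean of p_sigma(y) = mean_i N(y; x_i, sigma^2 I).

    y: (m, d) noisy points; data: (n, d) one-hot training sequences.
    """
    sq_dist = ((y[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (m, n)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)  # posterior p(x_i | y)
    x_hat = w @ data                   # E[x | y], the Tweedie denoiser
    score = (x_hat - y) / sigma**2     # grad_y log p_sigma(y)
    return score, x_hat


def walk_jump(data, seq_len, vocab, sigma=0.5, n_steps=200, step_size=0.01,
              n_chains=8, seed=0):
    rng = np.random.default_rng(seed)
    d = seq_len * vocab
    # Start chains from noise around the uniform point (an assumption).
    y = 1.0 / vocab + sigma * rng.normal(size=(n_chains, d))
    for _ in range(n_steps):
        # Walk: unadjusted Langevin step on the smoothed density.
        score, _ = smoothed_score(y, data, sigma)
        y = y + step_size * score + np.sqrt(2.0 * step_size) * rng.normal(size=y.shape)
    # Jump: one-step denoise, then snap each position to its most likely token.
    _, x_hat = smoothed_score(y, data, sigma)
    return x_hat.reshape(n_chains, seq_len, vocab).argmax(axis=-1)
```

Because this toy density is built from the training set itself, jumps can only produce blends of training sequences; learned models are what the real method relies on to go beyond them.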
Concrete Score Matching: Generalized Score Matching for Discrete Data · GitHub →
PyTorch, NumPy, Concrete Score Matching, Metropolis–Hastings
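- Implemented the CSM algorithm to learn concrete scores, which generalize the (Stein) score to discrete spaces by replacing gradients with local probability differences between each point and its neighbors.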
- Used Metropolis–Hastings sampling for data generation and visualized true vs. generated distributions.
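A toy version of both bullets: on a 1-D chain of states 0..K-1 with neighbors x-1 and x+1, the concrete score is c(x) = [(p(x-1) - p(x))/p(x), (p(x+1) - p(x))/p(x)], so 1 + c(x) is exactly the probability ratio that Metropolis–Hastings needs for a symmetric ±1 proposal. In CSM a network c_θ(x) is trained to match this quantity; this sketch instead uses the exact score of a known p, and the chain neighborhood, helper names, and sampler settings are assumptions for illustration.

```python
import numpy as np


def concrete_score(p):
    """Concrete score on a 1-D chain with neighbors x-1 and x+1.

    Out-of-range neighbors are treated as probability 0, giving score -1.
    Returns an array of shape (K, 2): column 0 = left, column 1 = right.
    """
    padded = np.concatenate([[0.0], p, [0.0]])
    left = (padded[:-2] - p) / p    # (p(x-1) - p(x)) / p(x)
    right = (padded[2:] - p) / p    # (p(x+1) - p(x)) / p(x)
    return np.stack([left, right], axis=1)


def mh_sample(score, n_steps, n_chains=1000, seed=0):
    """Vectorized Metropolis-Hastings with a symmetric +/-1 proposal.

    The acceptance ratio p(x')/p(x) is 1 + score[x, direction], so the sampler
    only needs the concrete score, never p itself.
    """
    rng = np.random.default_rng(seed)
    k = score.shape[0]
    x = rng.integers(k, size=n_chains)          # random starting states
    for _ in range(n_steps):
        d = rng.integers(2, size=n_chains)      # 0: propose x-1, 1: propose x+1
        ratio = 1.0 + score[x, d]               # p(x')/p(x); exactly 0 past the ends
        accept = rng.random(n_chains) < ratio   # accept with prob min(1, ratio)
        x = np.where(accept, x + 2 * d - 1, x)
    return x


p = np.array([1, 2, 3, 4, 5, 5, 4, 3, 2, 1], dtype=float)
p /= p.sum()
samples = mh_sample(concrete_score(p), n_steps=500, n_chains=20000)
empirical = np.bincount(samples, minlength=len(p)) / len(samples)
```

Comparing `empirical` with `p`, for example as side-by-side bar charts, is the kind of true-vs-generated check the bullet describes.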
✦