6 Step Optimization of GeMMs in CUDA I aim to take a naive implementation of single-precision (FP32) General Matrix Multiplication (GeMM) and optimize it so its computations can be parallelized effectively on GPUs with CUDA C/C++.
Low-Precision Arithmetic in ML Systems Have you ever wondered how modern AI systems handle billions of calculations without melting your computer? The secret sauce lies in something called low-precision arithmetic. Let’s dive into what this means and why it matters.
CUDA 4: Profiling CUDA Kernels Some tools, metrics, and techniques for CUDA kernel profiling, making the optimization process more systematic and approachable.
CUDA 3: Your Checklist for Optimizing CUDA Kernels How to optimize CUDA kernels and build intuition for kernel optimizations.
7 Step Optimization of Parallel Reduction with CUDA Taking a simple parallel reduction and optimizing it in 7 steps.
A Checklist for Your Next SWE Interview How to best prepare for your next software engineering interview: a month before, a week before, a day before!
Transform Your Networking Skills: 5 Steps to Building Powerful Connections for Recruitment Season How to connect with recruiters and mentors in a way that is meaningful and long-lasting.
Ultimate Timeline for Landing a Summer SWE Internship How you can prepare now to land your dream internship in summer 2025. Short and simple.
Why Backpropagation Falls Short of Its True Purpose Let’s uncover how backpropagation is drifting away from truly recreating the brain’s learning process.
Simplify Reinforcement Learning Models (Conceptually) A beginner-friendly guide to understanding key concepts and strategies in Reinforcement Learning, revealing how they seamlessly come together.
Introduction to Kolmogorov-Arnold Networks (KANs) An introduction to KANs, a new contender to MLPs, and their approach to neural network design.
Maximizing Grace Hopper Conference 2025 What to expect and how to prepare for the biggest tech conference for women!