Rimika Writes

Latest

6 Step Optimization of GeMMs in CUDA

I aim to take a naive implementation of single-precision (FP32) General Matrix Multiplication (GeMM) and optimize it so its computations can be parallelized effectively on GPUs with CUDA C/C++.

Low-Precision Arithmetic in ML Systems

Have you ever wondered how modern AI systems handle billions of calculations without melting your computer? The secret sauce lies in something called low-precision arithmetic. Let’s dive into what this means and why it matters.

CUDA 4: Profiling CUDA Kernels

Some tools, metrics, and techniques for CUDA kernel profiling, making the optimization process more systematic and approachable.

CUDA 3: Your Checklist for Optimizing CUDA Kernels

How to optimize CUDA kernels and how to build intuition for kernel optimizations.

CUDA 1: GPU v/s CPU

Taking it a step further from the basics and comparing CPUs and GPUs!

CUDA 0: From OS to GPUs

Let's get started with CUDA and learn the basics of parallel programming.

7 Step Optimization of Parallel Reduction with CUDA

Taking a simple parallel reduction and optimizing it in 7 steps.

A Checklist for Your Next SWE Interview

How to best prepare for your next software engineering interview: a month before, a week before, and a day before!

Transform Your Networking Skills: 5 Steps to Building Powerful Connections for Recruitment Season

How to connect with recruiters and mentors in a way that is meaningful and long-lasting.

Ultimate Timeline for Landing a Summer SWE Internship

How you can prepare now to land your dream internship for summer 2025. Short and simple.

Why Backpropagation Falls Short of Its True Purpose

Let’s uncover how backpropagation is drifting away from truly recreating the brain’s learning process.

Simplify Reinforcement Learning Models (Conceptually)

A beginner-friendly guide to understanding key concepts and strategies in Reinforcement Learning, revealing how they seamlessly come together.

Introduction to Kolmogorov-Arnold Networks (KANs)

An introduction to KANs, a new contender to MLPs, and their novel approach to neural network design.

Maximizing Grace Hopper Conference 2025

What to expect and how to prepare for the biggest tech conference for women!