The AI/ML Engineer’s Starter Guide to GPU Programming

Author: Alex Razvant, Senior Software/AI Engineer @ Axon
Publication: The AI Merge (9.5K+ subscribers)
Date: January 30, 2025
Access: Free article

Programming on GPUs from scratch by implementing CUDA Kernels in C++, CuPy Python, and OpenAI Triton.

Overview

A hands-on starter guide that covers GPU programming from the ground up; no prior GPU experience is required. It walks through implementing the same kernel with three approaches: CUDA in C++, CuPy in Python, and OpenAI Triton.

Sections

NVIDIA CUDA (C++) — build a basic CUDA kernel, then a more complex one
CuPy (Python) — CUDA-equivalent operations using CuPy’s NumPy-like API
CUDA Alternatives — when to use what
OpenAI Triton — a higher-level language for GPU programming

Key Topics

Writing CUDA kernels in C++ from scratch
GPU memory management
Kernel launch configuration (grids, blocks, threads)
CuPy for Python-native GPU programming
OpenAI Triton as a more accessible alternative to raw CUDA
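
The launch-configuration topic above (grids, blocks, threads) can be sketched without a GPU. The pure-Python simulation below is an illustration and is not taken from the article: it assumes a 1D launch, and the names `launch_config`, `vector_add_kernel`, and `simulate_launch` are hypothetical. It mirrors how a CUDA kernel derives a global index as `blockIdx.x * blockDim.x + threadIdx.x` and guards against threads that fall past the end of the data.

```python
def launch_config(n, threads_per_block=256):
    """Return (num_blocks, threads_per_block) covering n elements (1D launch)."""
    # Ceiling division: enough blocks so that num_blocks * threads_per_block >= n.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    return num_blocks, threads_per_block


def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed by one simulated thread (hypothetical stand-in for a kernel)."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(out):  # bounds guard: the last block may have spare threads
        out[i] = a[i] + b[i]


def simulate_launch(a, b, threads_per_block=256):
    """Run the kernel body once per (block, thread) pair, sequentially."""
    n = len(a)
    out = [0.0] * n
    num_blocks, block_dim = launch_config(n, threads_per_block)
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out)
    return out


a = [float(i) for i in range(1000)]
b = [2.0 * x for x in a]
result = simulate_launch(a, b)
print(launch_config(1000))  # (4, 256): 1024 threads for 1000 elements
```

On a real GPU the two loops disappear: every (block, thread) pair runs the kernel body in parallel. In this example 24 of the 1024 launched threads do nothing because of the bounds guard, which is the usual cost of rounding the grid size up.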

Related

LLMs from Scratch
AI Engineering From Scratch

Tags: gpu-programming, cuda, triton, cupy, ml-engineering, tutorial
