The AI/ML Engineer’s Starter Guide to GPU Programming
Author: Alex Razvant, Senior Software/AI Engineer @ Axon
Publication: The AI Merge (9.5K+ subscribers)
Date: January 30, 2025
Access: Free article
Programming GPUs from scratch by implementing CUDA kernels in C++, CuPy (Python), and OpenAI Triton.
Overview
A hands-on starter guide that covers GPU programming from the ground up and assumes no prior GPU experience. It walks through implementing the same kernel with three different approaches.
Sections
- NVIDIA CUDA (C++) — build a basic CUDA kernel, then a more complex one
- CuPy (Python) — CUDA-equivalent operations using CuPy’s NumPy-like API
- CUDA Alternatives — when to use what
- OpenAI Triton — a higher-level language for GPU programming
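The CUDA and Triton sections differ mainly in what one unit of work is: in CUDA each thread usually computes one element, while in Triton each program instance processes a whole block of elements behind a mask. The sketch below emulates both models on the CPU for a simple element-wise kernel. Vector addition is an assumption chosen for illustration (this outline does not name the article's kernel), the function names are hypothetical, and the nested loops only stand in for work the GPU would run in parallel.

```python
# CPU-only emulation of two GPU programming models for one element-wise
# kernel (vector addition, an illustrative assumption). Nothing here runs
# on a GPU; the loops stand in for parallel hardware.
from typing import List


def ceil_div(a: int, b: int) -> int:
    """Number of b-sized chunks needed to cover a items."""
    return (a + b - 1) // b


def cuda_style_add(x: List[float], y: List[float], threads_per_block: int = 4) -> List[float]:
    """CUDA model: each thread computes exactly one output element."""
    n = len(x)
    out = [0.0] * n
    grid_dim = ceil_div(n, threads_per_block)          # number of blocks in the grid
    for block_idx in range(grid_dim):                  # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):    # plays the role of threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # global element index
            if i < n:                                  # guard: the last block may overhang
                out[i] = x[i] + y[i]
    return out


def triton_style_add(x: List[float], y: List[float], block_size: int = 4) -> List[float]:
    """Triton model: each program instance handles a block of offsets at once."""
    n = len(x)
    out = [0.0] * n
    grid = ceil_div(n, block_size)                     # analogous to triton.cdiv(n, BLOCK_SIZE)
    for pid in range(grid):                            # analogous to tl.program_id(axis=0)
        # analogous to pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        offsets = [pid * block_size + k for k in range(block_size)]
        mask = [o < n for o in offsets]                # analogous to offsets < n_elements
        for o, valid in zip(offsets, mask):
            if valid:                                  # masked load, add, store
                out[o] = x[o] + y[o]
    return out


x = [float(i) for i in range(10)]
y = [2.0 * i for i in range(10)]
print(cuda_style_add(x, y))
print(triton_style_add(x, y))
```

With 10 elements and a block size of 4, both versions launch 3 blocks (or program instances) and rely on the bounds check or mask to skip the 2 overhanging slots in the last one.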
Key Topics
- Writing CUDA kernels in C++ from scratch
- GPU memory management
- Kernel launch configuration (grids, blocks, threads)
- CuPy for Python-native GPU programming
- OpenAI Triton as a more accessible alternative to raw CUDA
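Choosing a launch configuration comes down to picking a block shape and then enough blocks to cover the data. The sketch below, assuming a 2D matrix with the common CUDA convention that x indexes columns and y indexes rows, computes the grid size with ceiling division and emulates every thread's `(row, col)` to check that each element is covered exactly once. The 16 x 16 block size and all function names are illustrative assumptions, not taken from the article.

```python
# CPU emulation of a 2D CUDA-style launch configuration.
# Assumptions (illustrative): x maps to columns, y maps to rows, 16x16 blocks.
from typing import Dict, Tuple


def ceil_div(a: int, b: int) -> int:
    """Number of b-sized chunks needed to cover a items."""
    return (a + b - 1) // b


def launch_config_2d(rows: int, cols: int, block: Tuple[int, int] = (16, 16)) -> Tuple[int, int]:
    """Return (grid_x, grid_y) so that grid * block covers a rows x cols matrix."""
    bx, by = block
    return ceil_div(cols, bx), ceil_div(rows, by)


def simulate_coverage(rows: int, cols: int, block: Tuple[int, int] = (16, 16)):
    """Emulate each thread's (row, col); count hits per element and idle threads."""
    bx, by = block
    gx, gy = launch_config_2d(rows, cols, block)
    hits: Dict[Tuple[int, int], int] = {}
    idle = 0
    for block_y in range(gy):
        for block_x in range(gx):
            for ty in range(by):
                for tx in range(bx):
                    row = block_y * by + ty   # blockIdx.y * blockDim.y + threadIdx.y
                    col = block_x * bx + tx   # blockIdx.x * blockDim.x + threadIdx.x
                    if row < rows and col < cols:  # bounds guard inside the kernel
                        hits[(row, col)] = hits.get((row, col), 0) + 1
                    else:
                        idle += 1             # thread launched but has no element
    return hits, idle


grid = launch_config_2d(100, 50)
hits, idle = simulate_coverage(100, 50)
print(grid, len(hits), idle)
```

For a 100 x 50 matrix with 16 x 16 blocks, this yields a 4 x 7 grid, or 7,168 threads for 5,000 elements; the remaining 2,168 threads fall outside the matrix and exit through the bounds guard, which is why the guard is needed whenever the data size is not a multiple of the block size.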