The AI/ML Engineer’s Starter Guide to GPU Programming

Author: Alex Razvant, Senior Software/AI Engineer @ Axon
Publication: The AI Merge (9.5K+ subscribers)
Date: January 30, 2025
Access: Free article

Programming GPUs from scratch by implementing kernels in CUDA C++, CuPy (Python), and OpenAI Triton.

Overview

A hands-on starter guide that covers GPU programming from the ground up, with no prior GPU experience required. It walks through implementing the same kernel using three different approaches: CUDA C++, CuPy, and OpenAI Triton.

Sections

  1. NVIDIA CUDA (C++) — build a basic CUDA kernel, then a more complex one
  2. CuPy (Python) — CUDA-equivalent operations using CuPy’s NumPy-like API
  3. CUDA Alternatives — when to use what
  4. OpenAI Triton — a higher-level language for GPU programming

Key Topics

  • Writing CUDA kernels in C++ from scratch
  • GPU memory management
  • Kernel launch configuration (grids, blocks, threads)
  • CuPy for Python-native GPU programming
  • OpenAI Triton as a more accessible alternative to raw CUDA
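
To make the launch-configuration topic above concrete, here is a minimal CPU-only sketch in plain Python. It is not taken from the article and does not run on a GPU. It only emulates how a 1D CUDA launch maps blocks and threads onto array elements. The function names `launch_config` and `vector_add_emulated` and the default of 256 threads per block are assumptions chosen for illustration.

```python
def launch_config(n_elements, threads_per_block=256):
    """Return (blocks_per_grid, threads_per_block) for a 1D launch.

    Uses ceiling division so that blocks * threads >= n_elements.
    """
    blocks_per_grid = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks_per_grid, threads_per_block


def vector_add_emulated(a, b, threads_per_block=256):
    """Emulate a 1D vector-add kernel sequentially on the CPU.

    On a real GPU every (block, thread) pair runs in parallel; the two
    loops below only stand in for that grid of threads.
    """
    n = len(a)
    out = [0] * n
    blocks, threads = launch_config(n, threads_per_block)
    for block_idx in range(blocks):          # plays the role of blockIdx.x
        for thread_idx in range(threads):    # plays the role of threadIdx.x
            # Global index, as in: blockIdx.x * blockDim.x + threadIdx.x
            i = block_idx * threads + thread_idx
            # Bounds check: the last block may have threads past the end
            if i < n:
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(launch_config(1000, 256))
    print(vector_add_emulated([1, 2, 3], [10, 20, 30], threads_per_block=2))
```

The bounds check matters because the grid is rounded up to a whole number of blocks, so some threads in the final block usually have no element to process.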