GPU Architecture - Our Imaginary Model
- Global Memory (GMEM)
  A large memory that any part of the GPU can access.
- Streaming Multiprocessors (SMs)
  Each SM holds blocks of threads grouped into warps; a warp is 32 threads with contiguous thread IDs, executing in SIMT style.
- SM Local Resources
  - Shared Memory (SMEM): Fast, user-managed memory shared across the threads in a block.
  - Registers: Each thread has its own private registers.
  - Execution Units: Threads issue load/store/compute instructions via dedicated units.
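
The warp grouping in this model comes down to simple index arithmetic. Below is a minimal host-side C++ sketch of it, not device code. It assumes the warp size of 32 stated in the model and a flat (1D) thread ID within a block. The names `kWarpSize`, `WarpCoord`, `warp_coord`, and `warps_per_block` are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Warp size taken from the model: 32 threads per warp.
constexpr std::uint32_t kWarpSize = 32;

// Position of a thread inside its block (hypothetical helper type).
struct WarpCoord {
    std::uint32_t warp;  // which warp of the block the thread belongs to
    std::uint32_t lane;  // index of the thread within that warp, 0..31
};

// Map a flat thread ID within a block to (warp, lane).
// Contiguous IDs 0..31 form warp 0, 32..63 form warp 1, and so on.
constexpr WarpCoord warp_coord(std::uint32_t thread_id) {
    return WarpCoord{thread_id / kWarpSize, thread_id % kWarpSize};
}

// Number of warps needed to hold a block of the given size.
// Assumption: a partially filled last warp still occupies a whole warp slot.
constexpr std::uint32_t warps_per_block(std::uint32_t threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}
```

For example, thread 37 maps to warp 1, lane 5, and a block of 100 threads occupies 4 warps, the last of which is only partly filled. In CUDA device code the same arithmetic is typically written with `threadIdx.x` and the built-in `warpSize`.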