GPU Architecture - Our Imaginary Model

Global Memory (GMEM)
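Large memory that any part of the GPU can access: every SM, and therefore every thread in every block, can read and write it. The trade-off for its size is speed: it is much slower to access than the on-chip memories inside each SM.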
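To make this concrete, here is a minimal Python sketch of the imaginary model. It is not real GPU code, and `vector_add`, the flat `gmem` list, and the offset parameters are hypothetical. Global memory is modeled as one flat list. Every simulated thread, whichever block it belongs to, reads its inputs from that same list and writes its output back to it. Each thread finds its element through a global index, `block_idx * block_dim + thread_idx`.

```python
def vector_add(gmem, a_off, b_off, out_off, n, block_dim):
    """Compute out[i] = a[i] + b[i] for i < n, one simulated thread per element.

    a, b, and out are regions of the single flat global-memory list `gmem`,
    starting at offsets a_off, b_off, and out_off (hypothetical layout).
    """
    num_blocks = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(num_blocks):      # each block could run on any SM
        for thread_idx in range(block_dim):  # simulated threads, run one by one
            i = block_idx * block_dim + thread_idx  # global element index
            if i < n:  # the last block may have more threads than elements
                gmem[out_off + i] = gmem[a_off + i] + gmem[b_off + i]
    return gmem
```

For example, with `gmem = [1, 2, 3, 10, 20, 30, 0, 0, 0]`, calling `vector_add(gmem, 0, 3, 6, 3, block_dim=2)` leaves `[11, 22, 33]` in `gmem[6:9]`. The two blocks touch different elements of the same global list, with no copying between them.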
Streaming Multiprocessors (SMs)
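Each SM runs one or more thread blocks. A block is split into warps: groups of 32 threads with consecutive thread IDs that execute the same instruction in lockstep (SIMT: single instruction, multiple threads).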
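Here is a small Python sketch of how this model groups threads. It assumes one-dimensional thread IDs within a block, and the helper names are hypothetical. Warp and lane indices follow directly from integer division and remainder by the warp size. `simt_execute` illustrates SIMT: it applies a single instruction to every lane of a warp. An active mask stands in for lanes that sit out, for example after a divergent branch.

```python
WARP_SIZE = 32  # threads per warp in this model


def warp_of(thread_id):
    """Return (warp index within the block, lane index within that warp)."""
    return thread_id // WARP_SIZE, thread_id % WARP_SIZE


def warps_in_block(block_dim):
    """Split thread IDs 0..block_dim-1 into warps of consecutive IDs."""
    return [list(range(start, min(start + WARP_SIZE, block_dim)))
            for start in range(0, block_dim, WARP_SIZE)]


def simt_execute(warp_regs, instruction, active_mask):
    """Apply one instruction to every active lane; inactive lanes keep their value."""
    return [instruction(reg) if active else reg
            for reg, active in zip(warp_regs, active_mask)]
```

Thread 37, for example, lands in warp 1 as lane 5. A 100-thread block needs four warps, and the last one holds only threads 96 to 99.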
SM Local Resources
- Shared Memory (SMEM): Fast, on-chip, programmer-managed memory shared by all threads in the same block; threads in other blocks cannot access it.
- Registers: Each thread has its own private registers.
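- Execution Units: Threads issue load/store and arithmetic instructions, which dedicated units on the SM execute: load/store units move data between memory and registers, and arithmetic units perform the computation.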
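These resources work together in the classic block-level sum. Below is a toy, sequential Python simulation of it. It is not CUDA, and `block_sum` and its parameters are hypothetical. It assumes `block_dim` is a power of two and that global memory holds a whole number of blocks. Each simulated thread loads one element from global memory into a register and stores it into the block's shared memory. The threads then combine values pairwise in shared memory, and finally thread 0 writes the block's total back to global memory.

```python
def block_sum(gmem, block_idx, block_dim, out):
    """Sum block `block_idx`'s slice of gmem and store it in out[block_idx].

    Assumes block_dim is a power of two and gmem holds only full blocks.
    """
    # Shared memory: one slot per thread, private to this block.
    # A fresh list per call models that other blocks cannot see it.
    smem = [0] * block_dim

    # Phase 1: each thread copies one element GMEM -> register -> SMEM.
    for tid in range(block_dim):
        reg = gmem[block_idx * block_dim + tid]  # this thread's private register
        smem[tid] = reg
    # (a real block would need a barrier here before reading other slots)

    # Phase 2: tree reduction; each step halves the number of active threads.
    stride = block_dim // 2
    while stride > 0:
        for tid in range(stride):
            # Thread tid adds its partner's partial sum into its own slot.
            smem[tid] += smem[tid + stride]
        stride //= 2
        # (and another barrier before the next step)

    # Phase 3: thread 0 writes the block's result back to global memory.
    out[block_idx] = smem[0]
```

Because the loops run one simulated thread at a time, the barriers a real block would need between phases are implicit here. They are marked with comments to show where they would go.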