A multiprocessor can process several blocks concurrently by partitioning its registers and shared memory among them. More precisely, the number of registers available per thread is equal to the total number of registers per multiprocessor divided by the number of concurrent threads, where that thread count is first rounded up to the nearest multiple of 64. The number of concurrent threads is equal to the number of concurrent blocks multiplied by the number of threads per block.
Source: www.nvidia.com
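
The register calculation described above can be sketched as a few lines of arithmetic. This is a minimal illustration rather than anything from NVIDIA: the function names `round_up` and `registers_per_thread` are hypothetical, the figure of 8192 registers per multiprocessor is only an example value, and the use of integer (floor) division is an assumption based on registers being allocated in whole units.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def registers_per_thread(total_registers, concurrent_blocks, threads_per_block,
                         granularity=64):
    """Estimate the registers available to each thread.

    Follows the rule quoted in the text: the concurrent thread count
    (blocks * threads per block) is rounded up to a multiple of
    `granularity`, and the multiprocessor's registers are divided by it.
    Floor division is an assumption, not something the text states.
    """
    concurrent_threads = concurrent_blocks * threads_per_block
    return total_registers // round_up(concurrent_threads, granularity)


# Example values (illustrative only): 8192 registers, 3 blocks of 100 threads.
# 300 concurrent threads round up to 320, so each thread gets 8192 // 320 = 25.
print(registers_per_thread(8192, 3, 100))
```

With these example numbers, raising the number of concurrent blocks or the block size lowers the per-thread register budget, which is the trade-off the paragraph describes.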