A. Size of a Grid:
- gridDim.x (1-Dimensional)
- gridDim.x and gridDim.y (2-Dimensional; for a N x N Grid they are both equal to N)
B. Size of a Block:
- blockDim.x (1-Dimensional)
- blockDim.x and blockDim.y (2-Dimensional; for a N x N Block they are both equal to N)
C. Thread Local Index within its block (assuming a 1-Dimensional Block):
- threadIdx.x
D. Block Index within the Grid:
- blockIdx.x (1-Dimensional)
- blockIdx.x (2-Dimensional) --> current Column Index of the Block within a N x N Grid
- blockIdx.y (2-Dimensional) --> current Row Index of the Block within a N x N Grid
E. Thread Global Index across the entire grid (assuming a 1-Dimensional Grid of 1-Dimensional Blocks), as shown in the sketch just after this list:
- (blockDim.x * blockIdx.x) + threadIdx.x
F. Thread Global Index across the entire grid (assuming a 2-Dimensional Grid of 2-Dimensional Blocks), used in the full kernel sketch at the end of the Quick Example below:
F-1. Obtain the current global Column Index (assuming you have a N x N Block):
- (blockIdx.x * blockDim.x) + threadIdx.x
F-2. Obtain the current global Row Index:
- (blockIdx.y * blockDim.y) + threadIdx.y
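A minimal sketch of item E inside a kernel (the kernel name, the data pointer and the numElements parameter are placeholders I made up for illustration):

// E: global thread index for a 1-Dimensional Grid of 1-Dimensional Blocks
__global__ void kernel1D(float *data, int numElements)
{
    int globalIdx = (blockDim.x * blockIdx.x) + threadIdx.x;
    if (globalIdx < numElements)     // skip threads launched past the end of the data
        data[globalIdx] *= 2.0f;     // "process" the element (here: just double it)
}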
Quick Example
N = 1024. You have to process N x N elements (1024 x 1024). You could decompose the grid like so: set the blockSize to 16 (threads per Block, per dimension). Then gridSize = N / blockSize --> gridSize = 1024 / 16 = 64 (Blocks per dimension). Maybe not the most efficient decomposition, but since it's only an example it will do!
So your grid is composed of 4096 Blocks (64 x 64), and each Block is composed of 256 threads (16 x 16).
Total Blocks * Total Threads per Block = 4096 * 256 = 1,048,576 = N * N = 1024 * 1024.
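A sketch of how that decomposition might be expressed on the host side (d_data and the kernel name processMatrix are placeholders; the kernel itself is sketched a little further down):

const int N = 1024;
float *d_data;                               // device buffer holding the N x N elements
cudaMalloc(&d_data, N * N * sizeof(float));

dim3 block(16, 16);                          // 16 x 16 = 256 threads per Block
dim3 grid(N / 16, N / 16);                   // 64 x 64 = 4096 Blocks
processMatrix<<<grid, block>>>(d_data, N);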
To process each element serially, you would probably have a nested for loop:
for (int col = 0; col < N; ++col)
    for (int row = 0; row < N; ++row)
        processElement(row, col);   // processElement is a placeholder for your per-element work
To access each element for processing in CUDA (assuming you are storing the results in a 1-Dimensional, row-major array; see the kernel sketch below):
- Index = (Global Row * Number of Elements per Row) + Global Column
- Global Row = (blockIdx.y * blockDim.y) + threadIdx.y
- Global Column = (blockIdx.x * blockDim.x) + threadIdx.x
- Number of Elements per Row = N = the number of elements length-wise (1024 in my example)
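Putting those formulas together, a minimal kernel for the 1024 x 1024 example might look like this (the name processMatrix and the "multiply by 2" operation are placeholders for whatever processing you actually need):

// Each thread processes one element of an N x N matrix stored in a
// flat, row-major 1-Dimensional array.
__global__ void processMatrix(float *data, int N)
{
    int globalRow = (blockIdx.y * blockDim.y) + threadIdx.y;
    int globalCol = (blockIdx.x * blockDim.x) + threadIdx.x;

    if (globalRow < N && globalCol < N)             // bounds check; safe even if N is not a multiple of the block size
        data[(globalRow * N) + globalCol] *= 2.0f;  // (Global Row * N) + Global Column
}

Launched with the dim3 grid(64, 64) / dim3 block(16, 16) configuration from the example above, this covers all 1024 x 1024 elements, one thread per element.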
More quick tips in the future ...