I was coding away on an assignment when I ran into a conundrum: I was getting weird results when copying data onto the device. An array copied onto the device would be accessible in one run, yet inaccessible in another.
Coding CUDA kernels on OS X adds a complication: cuda-gdb is not available (yet) on OS X, so I have to rely on old school debugging techniques ... a code walk through and print statements! After numerous tests and frustrations ... I figured out I was running into a problem with global memory:
marklagatuz$ /Developer/GPU\ Computing/C/bin/darwin/release/deviceQuery
Device 0: "GeForce 9400M"
Total amount of global memory: 265945088 bytes
The above reads approximately 266MB of global memory (265,945,088 bytes). I had 4 arrays of 67MB each being copied onto the device, which is roughly 268MB in total and more than the device has. I was clearly running out of memory. This would explain why a different array caused problems on each run.
Lesson learned: Check your device(s) limitations before coding away! Then again ... you should be doing that anyways!
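The arithmetic behind this can be sketched on the host in plain C++. This is only an illustration, not the code from my assignment: the helper name fits_on_device is made up, and I am assuming "67MB" means 67 * 10^6 bytes since I did not record the exact sizes.

```cpp
#include <cstddef>

// Reported by deviceQuery for the GeForce 9400M (from the output above).
constexpr std::size_t kGlobalMemBytes = 265945088;

// Assumption: "67MB" is taken as 67 * 10^6 bytes per array.
constexpr std::size_t kArrayBytes = 67u * 1000u * 1000u;

// Hypothetical helper: true if `count` arrays of `bytes_each` bytes
// fit within `capacity` bytes of device memory.
constexpr bool fits_on_device(std::size_t count,
                              std::size_t bytes_each,
                              std::size_t capacity) {
    return count * bytes_each <= capacity;
}

// Three arrays (201,000,000 bytes) fit; four (268,000,000 bytes) do not.
static_assert(fits_on_device(3, kArrayBytes, kGlobalMemBytes),
              "three 67MB arrays fit in global memory");
static_assert(!fits_on_device(4, kArrayBytes, kGlobalMemBytes),
              "four 67MB arrays exceed global memory");
```

In real CUDA code the usable amount is smaller than the total, so it is probably better to ask the runtime with cudaMemGetInfo and to check the error code returned by every cudaMalloc and cudaMemcpy call instead of relying on a fixed number.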
Monday, June 21, 2010
Tuesday, June 15, 2010
Learned Something New (or actually a review of something old)!
Since I'm forcing myself to think in terms of OO (Object Orientation), I forgot that computers are still 1's and 0's! As I'm reading code to understand the design patterns, algorithms, and methods other folks are using, I came across something I've never used before (at least in my own code): shift operators.
- << (left shift)
- >> (right shift)
(1 << 24) == 0001 0000 0000 0000 0000 0000 0000 (binary) == 16777216
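Here is a small sketch of both operators in plain C++. The wrapper names shift_left and shift_right are only for illustration; the checks run at compile time.

```cpp
#include <cstdint>

// Thin illustrative wrappers around the built-in shift operators.
constexpr std::uint32_t shift_left(std::uint32_t value, unsigned bits) {
    return value << bits;
}
constexpr std::uint32_t shift_right(std::uint32_t value, unsigned bits) {
    return value >> bits;
}

// Shifting 1 left by 24 places sets only bit 24: 2^24.
static_assert(shift_left(1u, 24) == 16777216u, "1 << 24 is 2^24");
static_assert(shift_left(1u, 24) == 0x01000000u, "same value in hex");

// Subtracting 1 gives a mask with the low 24 bits set.
static_assert(shift_left(1u, 24) - 1u == 0x00FFFFFFu, "low 24 bits set");

// Shifting right by the same amount undoes the left shift.
static_assert(shift_right(0x01000000u, 24) == 1u, "right shift undoes it");
```

A left shift by n multiplies an unsigned value by 2^n (as long as no bits fall off the top), and a right shift by n divides it by 2^n, discarding the remainder.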
CUDA + THRUST + Eclipse
Quickstart
Assumptions: A working CUDA environment (I'm using OS X for this example).
- nvcc --version --> should display the CUDA version, build date, and version of the tools installed.
- ./deviceQuery from /Developer/GPU Computing/C/bin/darwin/release (for OS X) should produce output for your device
2. Select a location and unzip the thrust library. You can unzip the library into the default cuda include location (/usr/local/cuda/include). I prefer to unzip the library in my home directory (specifically the Downloads directory), but it's up to the user!
- unzip thrust-v1.3.0.zip
3. Add the libraries within your project in Eclipse
- Project Name --> Properties
- C/C++ Build --> Settings
- CUDA NVCC Compiler --> Includes
- Add (On the same line as Include Paths - green + button)
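The code compiled after removing /thrust from the end of the -I path on the command line. The include path should be the absolute path up to, but not including, the thrust directory, because the headers are included as <thrust/...>.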
--
References:
1. Thrust QuickStartGuide
- http://code.google.com/p/thrust/wiki/QuickStartGuide
Thursday, June 10, 2010
CUDA Quick Tips, Reference, and Cheat Sheets
Here are some quick tips and references I strung together while learning CUDA.
A. Size of a Grid:
- gridDim.x (1Dimensional)
- gridDim.x (2Dimensional, assuming an N x N Grid, so gridDim.x == gridDim.y)
B. Size of a Block:
- blockDim.x (1Dimensional)
- blockDim.x (2Dimensional, assuming an N x N Block, so blockDim.x == blockDim.y)
C. Thread Local Index within its block (assuming a 1Dimensional Block):
- threadIdx.x
D. Block Local Index
- blockIdx.x (1Dimensional)
- blockIdx.x (2Dimensional) --> Current Column Index (Length) of the Block within an N x N Grid
- blockIdx.y (2Dimensional) --> Current Row Index (Height) of the Block within an N x N Grid
E. Thread Global Index across the entire grid (assuming a 1 Dimensional Grid):
- (blockDim.x * blockIdx.x) + threadIdx.x
F. Thread Local Index within its block (assuming a 2Dimensional Block):
- threadIdx.x (column), threadIdx.y (row)
F-1. Obtain the current global column and row index (assuming you have N x N Blocks):
- Column: (blockIdx.x * blockDim.x) + threadIdx.x
- Row: (blockIdx.y * blockDim.y) + threadIdx.y
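To make formula E (the 1 Dimensional global index) concrete, here is a host-side sketch in plain C++. It is not a kernel: the function parameters stand in for CUDA's built-in blockDim.x, blockIdx.x, and threadIdx.x, and the launch shape of 4 blocks of 256 threads is just an assumption for the example.

```cpp
// Host-side stand-in for the 1D global index formula:
//   (blockDim.x * blockIdx.x) + threadIdx.x
// The parameters play the role of CUDA's built-in variables.
constexpr unsigned global_index_1d(unsigned block_dim,
                                   unsigned block_idx,
                                   unsigned thread_idx) {
    return block_dim * block_idx + thread_idx;
}

// Assumed launch: 4 blocks of 256 threads, covering indices 0..1023.
static_assert(global_index_1d(256, 0, 0) == 0, "first thread of block 0");
static_assert(global_index_1d(256, 0, 255) == 255, "last thread of block 0");
static_assert(global_index_1d(256, 1, 0) == 256, "first thread of block 1");
static_assert(global_index_1d(256, 3, 255) == 1023, "last thread of block 3");
```

Each block picks up exactly where the previous one ended, so every element gets one thread and no two threads share an index.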
Quick Example
N = 1024. You have to process N x N elements (1024 x 1024). You could decompose the problem as so: set the gridSize to 64 blocks per side. Then blockSize = N / gridSize --> blockSize = 1024 / 64 = 16 threads per side. Maybe not the most efficient way, but since it's only an example it will do!
So your grid is composed of 4096 Blocks (64 x 64), and each Block is composed of 256 threads (16 x 16).
Total Blocks * Total Threads per Block = 4096 * 256 = 1,048,576 = N * N = 1024 * 1024.
To process each element serially, you would probably have a nested for loop:
for (each col)
    for (each row)
        process element
To access each element for processing in CUDA (assuming you are storing results in a 1D array):
- (Global Row * Number of Elements) + Global Column
- Global Row = (blockIdx.y * blockDim.y + threadIdx.y)
- Global Column = (blockIdx.x * blockDim.x + threadIdx.x)
- Number of Elements = N = Number of elements Length wise (1024 in my example)
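The example can be checked on the host with a few lines of plain C++. Again this is a sketch and not a kernel: the function arguments stand in for blockIdx and threadIdx, and the names global_col, global_row, and flat_index are made up for illustration.

```cpp
// Numbers from the example: N = 1024, 16 x 16 threads per block,
// which gives a 64 x 64 grid of blocks.
constexpr unsigned kN = 1024;
constexpr unsigned kBlockDim = 16;            // threads per block side
constexpr unsigned kGridDim = kN / kBlockDim; // blocks per grid side

// Global Column = blockIdx.x * blockDim.x + threadIdx.x
constexpr unsigned global_col(unsigned block_x, unsigned thread_x) {
    return block_x * kBlockDim + thread_x;
}
// Global Row = blockIdx.y * blockDim.y + threadIdx.y
constexpr unsigned global_row(unsigned block_y, unsigned thread_y) {
    return block_y * kBlockDim + thread_y;
}
// Position in the 1D result array: (Global Row * N) + Global Column
constexpr unsigned flat_index(unsigned row, unsigned col) {
    return row * kN + col;
}

static_assert(kGridDim == 64, "64 blocks per side");
static_assert(kGridDim * kGridDim == 4096, "4096 blocks in total");
static_assert(kBlockDim * kBlockDim == 256, "256 threads per block");
static_assert(4096u * 256u == kN * kN, "one thread per element");

// First thread of the first block maps to element 0.
static_assert(flat_index(global_row(0, 0), global_col(0, 0)) == 0,
              "top-left element");
// Last thread of the last block maps to the last element.
static_assert(flat_index(global_row(63, 15), global_col(63, 15)) == kN * kN - 1,
              "bottom-right element");
```

In an actual kernel the same three expressions would use the built-in blockIdx, blockDim, and threadIdx variables directly.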
More quick tips in the future ...
Tuesday, June 1, 2010
Quickstart: CUDA using Bayreuth University CUDA Toolchain for Eclipse
I've been trolling through Google for a simple way to integrate CUDA with Eclipse, and found a University which built an Eclipse plugin. This is a fantastic solution because my previous attempts required me to create my own Makefile (which partially defeats the purpose of using Eclipse!).
Here is my Quickstart for the plugin
Assumptions:
- A fully functional C/C++ working environment (within the Eclipse IDE and on the command line)
- A fully functional CUDA environment (including the CUDA Driver, Toolkit, and SDK)
- This assumes you are using OS X (Linux should be quite similar)
- Help --> Install New Software -->
- Name = ( for me)
- Location = http://www.ai3.inf.uni-bayreuth.de/software/eclipsecudaqt/updates
- Click on Uncategorized --> Toolchains for CUDA and QT Development
- Accept the License Agreement
- Restart Eclipse
- Go to Eclipse --> Preferences
- Click on C/C++ --> Environment
- Under Environment variables to set --> click Add
- Name = PATH (Note: Make sure PATH is all upper case)
- Value = /usr/local/cuda/bin
- Apply and OK
- Ctrl + mouse click --> New --> C++ project
- Under Project type box --> Executable --> select Empty Project
- Name your project
- Uncheck the following: Show project types and toolchains only if they are supported on the platform
- Under Toolchains --> select CUDA Toolchain
- Click Next
- Click on Advanced Settings
- Under C/C++ Build --> Environment --> Confirm PATH is set from the previous step (should be USER: PREFS under the Origin column)
- Under C/C++ Build --> Settings --> Tool Settings Tab --> CUDA NVCC Compiler --> Includes --> add /usr/local/cuda/include
- Under C/C++ Build --> C++ Linker --> change Command from g++ to nvcc
- Under C/C++ Build --> C++ Linker --> Libraries --> add cudart to Libraries (-l) and add /usr/local/cuda/lib to Library search path (-L)
- Apply and OK
*** UPDATE ***
When attempting to build my project, I was getting the following error message during the build phase:
make all
Building target: CUDAToolchainProject
ld: unknown option: -oCUDAToolchainProject
I tracked the problem down to missing whitespace between the following variables:
- ${OUTPUT_FLAG}${OUTPUT_PREFIX}${OUTPUT}
- This is located at --> Properties --> C/C++ Build --> Settings --> C++ Linker --> Expert Settings --> Command line pattern
- With whitespace added, the pattern reads: ${COMMAND} ${FLAGS} ${OUTPUT_FLAG} ${OUTPUT_PREFIX} ${OUTPUT} ${INPUTS}
With the whitespace fixed, the link step then failed with a different error:
Invoking: C++ Linker
g++ -L/usr/local/cuda/lib -o "CUDAToolchainProject" ./src/cu_mandelbrotCUDA_D.o ./src/cu_mandelbrotCUDA_H.o -lcudart
ld: warning: in ./src/cu_mandelbrotCUDA_D.o, file is not of required architecture
ld: warning: in ./src/cu_mandelbrotCUDA_H.o, file is not of required architecture
ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture
Undefined symbols:
"_main", referenced from:
start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [CUDAToolchainProject] Error 1
To mitigate this problem ... I changed the C++ Linker from g++ to nvcc
- Properties --> C/C++ Build --> Settings --> C++ Linker
- Command --> change from g++ to nvcc
The build phase completed successfully and an executable was generated!
The next steps are optional. If you want to follow Eclipse's general project structure, follow them:
4. Create Source Folders (Trivial)
- Ctrl + mouse click --> New --> Source Folder
- Name your folder
Resources
1. Bayreuth University Website
- http://www.ai3.inf.uni-bayreuth.de/software/eclipsecudaqt/updates
- http://forums.nvidia.com/index.php?showtopic=160564
- http://lifeofaprogrammergeek.blogspot.com/2008/07/using-eclipse-for-cuda-development.html