Calling CUDA Functions from Fortran

Author: Austen Duffy, Florida State University

CUDA functions can be called directly from Fortran programs by using a kernel wrapper, as long as some simple rules are followed.

1. Data Types: Make sure you use equivalent data types; these basically follow the usual Fortran --> C conventions. Make sure to specify the sizes of your Fortran integers and reals. Note that integer*2 is a short int in C; I have had a lot of problems trying to use these, so I would suggest using integer*4 instead.

   integer*4 --> int
   real*4 --> float
   real*8 --> double
   etc.

2. Function Names: Fortran compilers such as gfortran append an underscore (_) to procedure names, so you need to account for this in the name of your CUDA function. For example, a call to 'kernel_wrapper( )' in Fortran becomes a reference to 'kernel_wrapper_( )' in the compiled code, so your CUDA function should be named 'kernel_wrapper_( )' instead. This does not apply to the CUDA kernels themselves, since they are not called from the Fortran code.

3. Arrays: Fortran and C use different storage orders, essentially the opposite of each other, i.e. Fortran array(i,j,k) is equivalent to C array[k][j][i]. Since it is simplest to work on 1-D arrays in CUDA, it may be easier to convert multidimensional arrays to large vectors before calling the kernel wrapper. For example, if you are sending 3-D arrays (say array1 and array2) to the GPU, copy them to temporary vectors in the main Fortran program.
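As a hedged illustration of rules 1 to 3 (a sketch, not the author's original listing), the C code below shows how such a flattened vector might be indexed and received on the C side. It assumes the copy keeps Fortran's element order, so that the first index varies fastest. The names fortran_offset and scale_vector_ are hypothetical, and the loop runs on the CPU as a stand-in for a real kernel launch. In a .cu file compiled with nvcc, the wrapper would additionally need to be declared extern "C" so that its symbol name is not mangled.

```c
#include <stddef.h>

/* Offset of Fortran element array(i,j,k) (1-based indices) inside a flat
   vector holding an NX x NY x NZ array in Fortran (column-major) order.
   NZ is not needed to compute the offset. */
size_t fortran_offset(int i, int j, int k, int NX, int NY)
{
    return (size_t)(i - 1)
         + (size_t)NX * ((size_t)(j - 1) + (size_t)NY * (size_t)(k - 1));
}

/* Hypothetical wrapper, callable from Fortran as
       call scale_vector(vec, NX, NY, NZ, factor)
   with real*4 vec(NX*NY*NZ), integer*4 NX, NY, NZ and real*4 factor.
   Note the trailing underscore in the C name, and that every argument
   arrives as a pointer because Fortran passes arguments by reference. */
void scale_vector_(float *vec, const int *NX, const int *NY, const int *NZ,
                   const float *factor)
{
    size_t total = (size_t)(*NX) * (size_t)(*NY) * (size_t)(*NZ);
    for (size_t n = 0; n < total; ++n) {
        vec[n] *= *factor;  /* CPU stand-in for the work a kernel would do */
    }
}
```

With NX = 2 and NY = 3, for instance, Fortran element (1,2,1) sits at offset 2 and element (1,1,2) at offset 6 of the flat vector.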
Here NX, NY and NZ are the sizes of the x, y and z dimensions respectively. The arrays can be copied back after the kernel call in the same manner if necessary.

4. Compilation: To compile, first use the nvcc compiler with the -c option to create an object file from the .cu file, e.g. 'nvcc -c cudatest.cu' will create a cudatest.o file. Then compile your Fortran code, making sure to link to the CUDA libraries (-L) and includes (-I) on your machine, e.g.

   nvcc -c cudatest.cu
   gfortran -L /usr/local/cuda/lib -I /usr/local/cuda/include -lcudart -lcuda fortest.f95 cudatest.o

The CUDA libraries may be in a different location on your machine. Note that if your code runs in double precision, you will need to add the nvcc compiler option -arch sm_13, which requires a GPU with compute capability 1.3.

A sample code set, complete with makefile, is given below demonstrating 1, 2 and 4 above.

fortest.f95
cudatest.cu
Makefile