Calling CUDA Functions from Fortran



Author: Austen Duffy, Florida State University



CUDA functions can be called directly from Fortran programs through a kernel wrapper, as long as a few simple rules are followed.

1. Data Types: Use equivalent data types on both sides; these follow the usual Fortran --> C conventions. Declare Fortran integers and reals with explicit sizes. Note that integer*2 corresponds to a short int in C; I have had a lot of problems trying to use these, so I would suggest using integer*4 instead.

integer*4 --> int
real*4 --> float
real*8 --> double
etc.
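
One way to guard against a mismatch is to check the sizes on the C side at compile time. The following is a minimal sketch in plain C (C11) and is not part of the sample code below; the helper name fortran_kinds_match is hypothetical, and the sketch assumes the usual mapping in which integer*2, integer*4, real*4 and real*8 occupy 2, 4, 4 and 8 bytes.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* Fail the build if the C types do not have the sizes implied by the
   Fortran declarations integer*2, integer*4, real*4 and real*8. */
static_assert(CHAR_BIT == 8, "bytes are assumed to be 8 bits wide");
static_assert(sizeof(short) == 2, "integer*2 <-> short");
static_assert(sizeof(int) == 4, "integer*4 <-> int");
static_assert(sizeof(float) == 4, "real*4 <-> float");
static_assert(sizeof(double) == 8, "real*8 <-> double");

/* Hypothetical run-time helper: returns 1 when all sizes match. */
int fortran_kinds_match(void)
{
    return sizeof(short) == 2 && sizeof(int) == 4 &&
           sizeof(float) == 4 && sizeof(double) == 8;
}
```

If one of the assertions fails, the file does not compile, which is easier to diagnose than corrupted data at run time.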


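2. Function Names: Compilers such as gfortran append an underscore to the names of external Fortran procedures, so you need to account for this in your CUDA code. For example, a call to 'kernel_wrapper( )' in Fortran is compiled into a reference to the symbol 'kernel_wrapper_( )', so the CUDA wrapper function should be named 'kernel_wrapper_( )' instead. The wrapper must also be declared extern "C" so that nvcc does not apply C++ name mangling, and since Fortran passes arguments by reference, every wrapper argument is a pointer. None of this applies to the CUDA kernels themselves, since they are never called from the Fortran code.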
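
As an illustration of the naming rule, here is a minimal sketch in plain C of a routine that Fortran could call as 'CALL scale_vector(x, n, factor)'. The routine scale_vector_ and its arguments are hypothetical and do not appear in the sample code below; the sketch assumes a compiler such as gfortran that appends a single underscore and passes every argument by reference. Inside a .cu file the definition would additionally need extern "C".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine callable from Fortran as
 *     CALL scale_vector(x, n, factor)
 * with  real*4 :: x(n), factor  and  integer*4 :: n.
 * Note the trailing underscore and the pointer arguments. */
void scale_vector_(float *x, const int *n, const float *factor)
{
    /* Fortran passes everything by reference, so no pointer should be NULL. */
    assert(x != NULL && n != NULL && factor != NULL);

    /* Scalars arrive as pointers; dereference them before use. */
    int count = *n;
    float f = *factor;

    for (int i = 0; i < count; ++i)
        x[i] *= f;
}
```

On the Fortran side the name is written without the underscore; the compiler adds it.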


3. Arrays: Fortran and C use different storage orders, essentially the opposite of each other: Fortran is column-major and C is row-major, so Fortran array(i,j,k) corresponds to C array[k][j][i] (apart from Fortran's default 1-based indexing). Since the wrapper passes data to the GPU as flat 1-D buffers, it may be easier to convert multidimensional arrays to large vectors before calling the kernel wrapper. For example, if you are sending 3-D arrays (say array1 and array2) to the GPU, copy them to temporary vectors in the main Fortran program with


       DO i=1,NX
         DO j=1,NY
           DO k=1,NZ
             tempvect1((i-1)*NY*NZ+(j-1)*NZ+k)=array1(i,j,k)
             tempvect2((i-1)*NY*NZ+(j-1)*NZ+k)=array2(i,j,k)
           END DO
         END DO
       END DO


Here NX, NY and NZ are the sizes of the x, y and z dimensions, respectively. In the resulting vectors k varies fastest, which is the order C expects for array[i][j][k]. The arrays can be copied back after the kernel call in the same manner if necessary.
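
The index arithmetic in the loop above can also be written as a small helper on the C side, which is convenient when a wrapper or kernel needs to address element (i,j,k) of the flattened vector. This is a sketch in plain C with hypothetical names (flat_index, column_major_index); it uses 0-based indices, so Fortran's (i,j,k) becomes (i-1,j-1,k-1).

```c
#include <assert.h>
#include <stddef.h>

/* 0-based position of element (i,j,k) in the temporary vector built by
   the copy loop: k varies fastest, matching C's array[i][j][k]. */
size_t flat_index(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    assert(j < ny && k < nz);
    return (i * ny + j) * nz + k;
}

/* For comparison: 0-based position of the same element in the original
   Fortran array, where i varies fastest (column-major order). */
size_t column_major_index(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    assert(i < nx && j < ny);
    return i + nx * (j + ny * k);
}
```

For NX=2, NY=3, NZ=4, the Fortran element (2,1,1) sits at position 12 of the temporary vector, flat_index(1, 0, 0, 3, 4), but at position 1 of the original Fortran array, column_major_index(1, 0, 0, 2, 3).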


4. Compilation: To compile, first use the nvcc compiler with the -c option to create an object file from the .cu file, e.g. 'nvcc -c cudatest.cu' will create a cudatest.o file. Then compile your Fortran code, making sure to link to the CUDA libraries (-L) and includes (-I) on your machine, e.g.

nvcc -c cudatest.cu
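gfortran fortest.f95 cudatest.o -L /usr/local/cuda/lib -I /usr/local/cuda/include -lcudart -lcuda

The CUDA libraries may be in a different location on your machine; on 64-bit Linux installations they are often under /usr/local/cuda/lib64. The libraries are listed after the source and object files because the linker typically resolves undefined symbols only against libraries that appear later on the command line. Note that if your code runs in double precision, you will need to add the nvcc compiler option -arch sm_13, which requires a GPU of compute capability 1.3. Newer CUDA toolkits no longer accept sm_13 and support double precision by default.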



A sample code set, complete with makefile, is given below, demonstrating points 1, 2 and 4 above.







fortest.f95


PROGRAM fortest

! simple program which creates 2 vectors and adds them in a 
! cuda function

IMPLICIT NONE

integer*4 :: i
integer*4, parameter :: N=8
real*4, Dimension(N) :: a, b

DO i=1,N
  a(i)=i*1.0
  b(i)=2.0
END DO

 print *, 'a = ', (a(i), i=1,N)

  CALL kernel_wrapper(a, b, N)

 print *, 'a + 2 = ', (a(i), i=1,N)

END PROGRAM 




cudatest.cu


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda.h>
#include <cuda_runtime.h>


// simple kernel function that adds two vectors
__global__ void vect_add(float *a, float *b, int N)
{
   int idx = threadIdx.x;
   if (idx<N) a[idx] = a[idx] + b[idx];
}

// function called from main fortran program
extern "C" void kernel_wrapper_(float *a, float *b, int *Np)
{
   float  *a_d, *b_d;  // device (GPU) copies of a and b

   int blocks = 1;     // launch a single block of N threads, so N must
   int N = *Np;        // not exceed the per-block thread limit

   // Allocate memory on GPU
   cudaMalloc( (void **)&a_d, sizeof(float) * N );
   cudaMalloc( (void **)&b_d, sizeof(float) * N );

   // copy vectors from CPU to GPU
   cudaMemcpy( a_d, a, sizeof(float) * N, cudaMemcpyHostToDevice );
   cudaMemcpy( b_d, b, sizeof(float) * N, cudaMemcpyHostToDevice );

   // call function on GPU
   vect_add<<< blocks, N >>>( a_d, b_d, N);

   // copy vectors back from GPU to CPU
   cudaMemcpy( a, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost );
   cudaMemcpy( b, b_d, sizeof(float) * N, cudaMemcpyDeviceToHost );

   // free GPU memory
   cudaFree(a_d);
   cudaFree(b_d);
   return;
}




Makefile


Test: fortest.f95 cudatest.o
	gfortran fortest.f95 cudatest.o -L /usr/local/cuda/lib -I /usr/local/cuda/include \
	         -lcudart -lcuda
cudatest.o: cudatest.cu
	nvcc -c -O3 cudatest.cu
clean:
	rm -f a.out cudatest.o cudatest.linkinfo