How to build a Simulink block using NVIDIA CUDA

Building Simulink blocks that exploit the parallelism of NVIDIA GPUs

First, write a Simulink S-function, which serves as the interface between the NVIDIA CUDA code and the Simulink model. An example S-function with one input port and one output port, each carrying a 120x160 matrix of uint8 values, is given below.

#define S_FUNCTION_NAME simuCuda /* Defines and Includes */

#define S_FUNCTION_LEVEL 2

#include "simstruc.h"

static void mdlInitializeSizes(SimStruct *S)

{

ssSetNumSFcnParams(S, 0);

if (ssGetNumSFcnParams(S) != ssGetSFcnParamsCount(S)) {

return; /* Parameter mismatch reported by the Simulink engine*/

}

if (!ssSetNumInputPorts(S, 1)) return;

if (!ssSetNumOutputPorts(S,1)) return;

ssSetInputPortDirectFeedThrough(S, 0, 1);

ssSetInputPortRequiredContiguous(S,0,1);

ssSetInputPortMatrixDimensions(S,0,120,160);

ssSetOutputPortMatrixDimensions(S,0,120,160);

ssSetInputPortDataType(S, 0, SS_UINT8);

ssSetOutputPortDataType(S, 0, SS_UINT8);

ssSetNumSampleTimes(S, 1);

ssSetOptions(S, SS_OPTION_EXCEPTION_FREE_CODE);

}

static void mdlInitializeSampleTimes(SimStruct *S)

{

ssSetSampleTime(S, 0, INHERITED_SAMPLE_TIME);

ssSetOffsetTime(S, 0, 0.0);

//one time CUDA initialization

if(!InitCUDA()) {

ssPrintf("cuda initialization failed !!!\n");

}

}

//this function is called once for each input sample

static void mdlOutputs(SimStruct *S, int_T tid)

{

uint8_T *x = (uint8_T*) ssGetInputPortSignal(S,0);

uint8_T *y = (uint8_T*) ssGetOutputPortSignal(S,0);

processNewFrame(x, y);

}

static void mdlTerminate(SimStruct *S)

{}

/* Simulink/Real-Time Workshop Interface */

#ifdef MATLAB_MEX_FILE /* Is this file being compiled as a MEX-file? */

#include "simulink.c" /* MEX-file interface mechanism */

#else

#include "cg_sfun.h" /* Code generation registration function */

#endif

In the code above, the one-time CUDA initialization is performed inside mdlInitializeSampleTimes (it could equally be placed in mdlInitializeSizes). The processNewFrame CUDA function is then called for each data sample; it passes the input and output buffers to the GPU for processing.
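The original InitCUDA and processNewFrame are not listed here, so the following is only a minimal sketch of what the CUDA side might look like. The signatures are inferred from how the S-function calls them. The device buffer names, the return convention (1 on success, 0 on failure) and the invertKernel placeholder, which simply inverts each 8-bit sample, are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

/* Frame geometry taken from the S-function port dimensions above. */
#define FRAME_ROWS 120
#define FRAME_COLS 160
#define FRAME_SIZE (FRAME_ROWS * FRAME_COLS)

/* Device buffers (hypothetical names), allocated once and reused per frame. */
static unsigned char *d_in  = NULL;
static unsigned char *d_out = NULL;

/* Placeholder kernel (an assumption): inverts each 8-bit sample. */
__global__ void invertKernel(const unsigned char *in, unsigned char *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = (unsigned char)(255 - in[i]);
    }
}

/* One-time setup. Returns 1 on success, 0 on failure. Safe to call more
   than once, since Simulink may invoke the initialization callbacks
   repeatedly. */
int InitCUDA(void)
{
    int deviceCount = 0;

    if (d_in != NULL && d_out != NULL) return 1;  /* already initialized */

    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        return 0;  /* no usable CUDA device */
    }
    if (cudaSetDevice(0) != cudaSuccess) return 0;

    if (cudaMalloc((void **)&d_in, FRAME_SIZE) != cudaSuccess) {
        d_in = NULL;
        return 0;
    }
    if (cudaMalloc((void **)&d_out, FRAME_SIZE) != cudaSuccess) {
        cudaFree(d_in);
        d_in = NULL;
        d_out = NULL;
        return 0;
    }
    return 1;
}

/* Per-frame work: copy host -> device, run the kernel, copy device -> host. */
void processNewFrame(const unsigned char *x, unsigned char *y)
{
    const int threads = 256;
    const int blocks  = (FRAME_SIZE + threads - 1) / threads;

    if (d_in == NULL || d_out == NULL) return;  /* InitCUDA failed or not called */

    cudaMemcpy(d_in, x, FRAME_SIZE, cudaMemcpyHostToDevice);
    invertKernel<<<blocks, threads>>>(d_in, d_out, FRAME_SIZE);
    /* This blocking copy also waits for the kernel to finish. */
    cudaMemcpy(y, d_out, FRAME_SIZE, cudaMemcpyDeviceToHost);
}
```

Because a .cu file is compiled as C++, these definitions (or at least their prototypes) would need to appear before the mdl* functions that call them. The kernel treats the frame as a flat array of 19200 bytes, which works here because the input port is declared contiguous.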

The CUDA code required to process the Simulink samples can be included in the same source file. This combined source file (CUDA + Simulink S-function), named “simuCuda.cu”, can be compiled from inside MATLAB using the following command line, which creates a MEX file named simuCuda.mexw32.

nvmex -f nvmexopts.bat simuCuda.cu -IF:\cuda\include -I'F:\cuda\cuda sdk\common\inc' -LF:\cuda\lib -L'F:\cuda\cuda sdk\common\lib' -lcudart -lcutil32

Note: You have to list all the CUDA “include” and “library” directories using the -I and -L flags, and the libraries to link against using the -l flag.

Copy all the CUDA DLLs that correspond to the library files you are using (e.g. cudart.dll, cutil32.dll) to the directory that contains the simuCuda.mexw32 file.

Create a Simulink model and add an S-function block. In the block’s parameters, enter simuCuda as the S-function name. You can now run the model and observe the performance gain obtained by using the GPU.

The tools required for the above compilation are:

· nvmex.pl – copy this file to the MATLAB “bin” directory

· nvmex_helper.m

· nvmex.m

· nvmexopts.bat

All of the above compilation tools and the combined source file, “simuCuda.cu”, are included in the zip file below (for MATLAB 2008a and up).

Source and Tools