Tuesday 7 March 2017

Secrets of Parallelisation using OpenCL - Intel's SDK

Well, this is exciting! With my newly installed Scientific Fedora running on the Acer Aspire One Cloudbook (low cost Parallel computing), I have discovered how to use the Parallel features of its Intel N3050 chipset.

If you read my previous post, you will know that although the chipset is capable of Parallel processing, there are limitations on number precision, which can be solved by using Boost Multiprecision. There are libraries developed to use OpenCL (Intel's SDK) for writing Parallel Programs. But I prefer not to use Boost. I like to keep things in-house!

The main problem I had initially was that all my Algos are written in C++ and use Objects with member functions. I needed a way to get the Arrays contained in each Object, or spread across many Objects, processed by OpenCL so that each function could run in Parallel.

A good example is the Neuron Centered Algo, in which each Neuron is an instance of the Neuron Class. So I did some testing and discovered that OpenCL uses C and will accept Structs, which are the C equivalent of C++ Objects or Classes. A simple conversion from an Array of Objects to an Array of Structs lets you do this easily and pass Object Variables and Arrays of Objects to be processed in Parallel using OpenCL.

A copy of the functions used by each Object in the Hierarchy of Objects in your program can then be rewritten in C and listed in a separate file with the .cl extension. This file is then loaded as the Kernel source, together with a definition of the Struct for each Object and its Variables.
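Getting the .cl file into the program starts with reading it into a string on the host side. The following is a minimal sketch of that step only, not the code from this post: the helper name load_kernel_source is hypothetical, and it assumes the resulting string is later handed to OpenCL's clCreateProgramWithSource.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the post): read a whole .cl file into a string.
// The returned text is what would later be passed to clCreateProgramWithSource.
std::string load_kernel_source(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open kernel file: " + path);
    }
    std::ostringstream contents;
    contents << file.rdbuf();   // copy the entire file into the string stream
    return contents.str();
}
```

Calling load_kernel_source("functions.cl") would return the Struct declarations and kernel functions as one string. Throwing on a missing file makes a wrong path show up before OpenCL reports a less obvious build error.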

After condensing some of the routines required to do this, I had a program that could update each Object's variables in Parallel using a function listed in this file.

With this technique under my belt, I can now examine my Algo and find all the functions, and the processes within functions, that I can list and run in Parallel. OpenCL simply needs to know the Struct type declaration at the start.

In the accompanying C++ code for the Algorithm, a Struct is declared for each Object like so:

class myclass{

public:

    int valuex;
    int valuey;
    int valuez;

    void function_x_y(){         //This is the function that is rewritten externally in functions.cl
        valuez = valuex * valuey;
    }

};

typedef struct{              //Here is the Struct that defines the Object
    int valuex;
    int valuey;
    int valuez;
}myclass_;

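int main(){

    myclass* thisclass = new myclass[100];      //Create an Array of Objects

    //Point a Struct pointer at the same memory (no second allocation needed).
    //This relies on myclass having the same layout as myclass_:
    //same members in the same order and no virtual functions.
    myclass_* p_struct = (myclass_*)thisclass;  //Type Convert Object to Struct

    //... hand p_struct to OpenCL here ...

    delete[] thisclass;
    return 0;
}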


Because the Struct pointer refers to the same memory as the Objects, updating the Structs in Parallel also updates the Objects. Fantastico!
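The conversion only works while the Class and the Struct occupy memory in exactly the same way. A minimal sketch of how that could be checked at compile time follows. The static_assert checks and the helper round_trip_valuez are my own illustration, not part of the original program, and the helper uses a conservative memcpy round trip (roughly what copying to and from a device buffer does) rather than the pointer cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same members as the post's myclass; the member function does not change the layout.
class myclass {
public:
    int valuex;
    int valuey;
    int valuez;
    void function_x_y() { valuez = valuex * valuey; }
};

// Plain Struct mirror, as declared in the post.
typedef struct {
    int valuex;
    int valuey;
    int valuez;
} myclass_;

// Compile-time checks: the build fails if the two layouts ever drift apart.
static_assert(std::is_standard_layout<myclass>::value, "myclass must be standard-layout");
static_assert(std::is_trivially_copyable<myclass>::value, "myclass must be trivially copyable");
static_assert(sizeof(myclass) == sizeof(myclass_), "size mismatch");
static_assert(offsetof(myclass, valuey) == offsetof(myclass_, valuey), "valuey offset mismatch");
static_assert(offsetof(myclass, valuez) == offsetof(myclass_, valuez), "valuez offset mismatch");

// Hypothetical helper: copy Objects into Structs byte for byte, do the work on
// the Structs, copy the bytes back, and return what the Object now holds.
int round_trip_valuez(int x, int y) {
    myclass objects[4] = {};
    objects[2].valuex = x;
    objects[2].valuey = y;

    myclass_ structs[4];
    std::memcpy(structs, objects, sizeof(objects));              // Objects -> Structs
    structs[2].valuez = structs[2].valuex * structs[2].valuey;   // work done on the Struct
    std::memcpy(objects, structs, sizeof(objects));              // Structs -> Objects

    return objects[2].valuez;
}
```

If a virtual function or an extra member is ever added to the Class, the assertions stop the build instead of letting the kernel silently read the wrong bytes.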

The Struct is defined both in the Open_CL.cc file and in the accompanying functions.cl file, which is written in standard C:

//Declare Structs for Classes
typedef struct{
    int valuex;
    int valuey;
    int valuez;
}myclass_;

//Example function: each work item updates one Struct
__kernel void function_x_y_(__global myclass_* input, long n)
{
    int gid = get_global_id(0);     //Index of this work item
    if(gid < n)                     //Skip work items past the end of the Array
        input[gid].valuez = input[gid].valuey * input[gid].valuex * 8736;
}
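To check what the kernel should produce, its body can be replayed serially on the CPU. The sketch below is an assumption-laden illustration and not the original host code: function_x_y_reference, round_up_global_size and run_reference are hypothetical names, and gid stands in for get_global_id(0). It also shows why the gid < n guard matters when the global work size is rounded up to a multiple of the work-group size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef struct {
    int valuex;
    int valuey;
    int valuez;
} myclass_;

// Body of the post's kernel for one work item; gid replaces get_global_id(0).
void function_x_y_reference(myclass_* input, long n, long gid) {
    if (gid < n)
        input[gid].valuez = input[gid].valuey * input[gid].valuex * 8736;
}

// Round the item count up to a multiple of the work-group size, as a host
// program typically does before enqueueing a kernel.
std::size_t round_up_global_size(std::size_t n, std::size_t local_size) {
    return ((n + local_size - 1) / local_size) * local_size;
}

// Replay every work item in the padded range one after another.
void run_reference(std::vector<myclass_>& items, std::size_t local_size) {
    const long n = static_cast<long>(items.size());
    const std::size_t global = round_up_global_size(items.size(), local_size);
    for (std::size_t gid = 0; gid < global; ++gid)
        function_x_y_reference(items.data(), n, static_cast<long>(gid));
}
```

With 100 Structs and an assumed work-group size of 64, the padded range has 128 work items, and the guard keeps the last 28 from writing past the end of the Array. Comparing this serial result with the values read back from the device is a cheap way to confirm the Parallel version behaves.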

It's early days, and I have yet to bring all my Algos over, beginning with the ANt and the fast CNN Condor.




