Allocate 2D Array on Device Memory in CUDA
Accepted answer
I found a solution to this problem. I didn't have to flatten the array. The inbuilt cudaMallocPitch() function did the job, and I could transfer the array to and from the device using the cudaMemcpy2D() function.
For example
cudaMallocPitch((void**) &array, &pitch, a*sizeof(float), b);
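This allocates a 2D array with b rows of a floats each. The pitch is not an input: cudaMallocPitch() writes the actual row width in bytes, including any alignment padding, into the pitch argument.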
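The transfer to and from the device with cudaMemcpy2D() could look roughly like the sketch below. The 4x3 size and the names host_in, host_out and dev are assumptions made for illustration, not part of the original answer. The host arrays are tightly packed, so their pitch is simply width * sizeof(float), while the device pitch is whatever cudaMallocPitch() reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int width = 4;   // columns (hypothetical size)
    const int height = 3;  // rows (hypothetical size)
    const size_t rowBytes = width * sizeof(float);  // bytes of real data per row

    float host_in[height][width];
    float host_out[height][width];
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            host_in[r][c] = static_cast<float>(r * width + c);

    float* dev = nullptr;
    size_t pitch = 0;  // filled in by cudaMallocPitch, in bytes
    cudaError_t err = cudaMallocPitch((void**)&dev, &pitch, rowBytes, height);
    if (err != cudaSuccess) {
        printf("cudaMallocPitch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Host -> device: destination uses the device pitch, source uses rowBytes.
    err = cudaMemcpy2D(dev, pitch, host_in, rowBytes,
                       rowBytes, height, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Device -> host: the two pitch arguments swap roles.
        err = cudaMemcpy2D(host_out, rowBytes, dev, pitch,
                           rowBytes, height, cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        printf("cudaMemcpy2D failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    // Check that the data survived the round trip.
    bool same = true;
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            if (host_in[r][c] != host_out[r][c]) same = false;
    printf(same ? "round trip ok\n" : "round trip mismatch\n");

    cudaFree(dev);
    return same ? 0 : 1;
}
```

Note that the width argument of cudaMemcpy2D() is given in bytes, while the height is given in rows.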
The following code creates a 2D array and loops over the elements. It compiles readily, so you may use it.
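#include <stdio.h>
#include <cuda_runtime.h>

#define height 50
#define width 50

// Device code: a single thread walks the whole pitched array.
__global__ void kernel(float* devPtr, size_t pitch)
{
    for (int r = 0; r < height; ++r) {
        // pitch is in bytes, so step between rows through a char* offset
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c) {
            float element = row[c];
            (void)element;  // placeholder: do real work with element here
        }
    }
}

// Host code
int main()
{
    float* devPtr;
    size_t pitch;
    cudaMallocPitch((void**)&devPtr, &pitch, width * sizeof(float), height);
    // One thread is enough because the kernel loops over every element itself.
    kernel<<<1, 1>>>(devPtr, pitch);
    cudaDeviceSynchronize();  // wait for the kernel to finish
    cudaFree(devPtr);
    return 0;
}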
Your device code could be faster. Try utilizing the threads more, for example by giving each thread one row:
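__global__ void kernel(float* devPtr, size_t pitch)
{
    int r = threadIdx.x;      // one thread per row
    if (r >= height) return;  // ignore threads past the last row
    float* row = (float*)((char*)devPtr + r * pitch);
    for (int c = 0; c < width; ++c) {
        float element = row[c];
        (void)element;  // placeholder: do real work with element here
    }
}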
Here each thread handles one row. You can then go further and calculate the block and thread allocation appropriately so that each thread deals with a single element.
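A minimal sketch of the one-thread-per-element layout follows. The kernel name scaleElements, the 16x16 block shape and the scaling operation are assumptions chosen for illustration. The grid is rounded up so that it covers the whole array, and the bounds check discards the extra threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread multiplies exactly one element by factor.
__global__ void scaleElements(float* devPtr, size_t pitch,
                              int width, int height, float factor)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (r < height && c < width) {
        // pitch is in bytes, so locate the row through a char* offset
        float* row = (float*)((char*)devPtr + r * pitch);
        row[c] *= factor;
    }
}

int main()
{
    const int width = 50;
    const int height = 50;

    float* devPtr = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void**)&devPtr, &pitch,
                        width * sizeof(float), height) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }
    // Zero the array so the kernel reads defined values.
    cudaMemset2D(devPtr, pitch, 0, width * sizeof(float), height);

    dim3 block(16, 16);  // assumed block shape, 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,    // round up in x
              (height + block.y - 1) / block.y);  // round up in y
    scaleElements<<<grid, block>>>(devPtr, pitch, width, height, 2.0f);

    cudaError_t err = cudaDeviceSynchronize();  // surfaces launch and runtime errors
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(devPtr);
    return err == cudaSuccess ? 0 : 1;
}
```

Mapping threadIdx.x to the column means neighbouring threads in a warp read neighbouring floats within a row, which is the access pattern the pitched allocation is meant to support.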
Source:
stackoverflow.com