parallel processing - How to parallelize a DCT (for loops) in CUDA
How can I parallelize 4 nested loops in CUDA?
In my DCT code I have 4 nested loops, and I want to implement the DCT function in CUDA:
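    for (y = 0; y < height; y += block_h) {            /* rows of blocks */
        for (x = 0; x < width; x += block_w) {         /* columns of blocks */
            for (i = 0; i < block_h; i++) {            /* rows inside one block */
                for (j = 0; j < block_w; j++) {        /* columns inside one block */
                    block_in[i][j] = cur_frame[(x + j) + (width * (y + i))];
                }
            }
            /* block_in now holds one block (e.g. the input to the DCT) */
        }
    }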
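A minimal sketch of one common mapping, assuming 8x8 blocks, 8-bit pixels, and a frame whose width and height are multiples of 8: the two outer loops over blocks become the CUDA grid, and the two inner loops over pixels become the threads of each thread block, so every thread copies exactly one pixel. The kernel name `extract_blocks`, the helper `extract_blocks_check`, and the contiguous `blocks_out` layout are hypothetical choices for illustration, not part of the original code.

```cuda
#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_W 8  // assumed block width  (block_w in the question)
#define BLOCK_H 8  // assumed block height (block_h in the question)

// Hypothetical kernel: one CUDA thread block per image block, one thread per pixel.
// Each image block is written contiguously into blocks_out (BLOCK_H * BLOCK_W bytes).
__global__ void extract_blocks(const unsigned char *cur_frame, int width, int height,
                               unsigned char *blocks_out)
{
    int x = blockIdx.x * BLOCK_W;  // replaces: for (x = 0; x < width; x += block_w)
    int y = blockIdx.y * BLOCK_H;  // replaces: for (y = 0; y < height; y += block_h)
    int j = threadIdx.x;           // replaces: for (j = 0; j < block_w; j++)
    int i = threadIdx.y;           // replaces: for (i = 0; i < block_h; i++)
    if (x + j >= width || y + i >= height) return;  // guard against out-of-frame pixels

    int block_id = blockIdx.y * gridDim.x + blockIdx.x;  // row-major block index
    blocks_out[block_id * BLOCK_W * BLOCK_H + i * BLOCK_W + j] =
        cur_frame[(x + j) + width * (y + i)];
}

// Hypothetical host helper: runs the kernel on a synthetic frame and returns the
// number of values that differ from the original sequential loops.
int extract_blocks_check(int width, int height)
{
    assert(width % BLOCK_W == 0 && height % BLOCK_H == 0);
    int n = width * height;
    unsigned char *h_frame = (unsigned char *)malloc(n);
    unsigned char *h_blocks = (unsigned char *)calloc(n, 1);
    for (int k = 0; k < n; ++k) h_frame[k] = (unsigned char)(k % 251);

    unsigned char *d_frame = NULL, *d_blocks = NULL;
    cudaMalloc(&d_frame, n);
    cudaMalloc(&d_blocks, n);
    cudaMemcpy(d_frame, h_frame, n, cudaMemcpyHostToDevice);

    dim3 threads(BLOCK_W, BLOCK_H);                // inner loops: i, j
    dim3 grid(width / BLOCK_W, height / BLOCK_H);  // outer loops: y, x
    extract_blocks<<<grid, threads>>>(d_frame, width, height, d_blocks);
    cudaMemcpy(h_blocks, d_blocks, n, cudaMemcpyDeviceToHost);

    // Reference: the question's four nested loops, run sequentially on the host.
    int mismatches = 0, blocks_x = width / BLOCK_W;
    for (int y = 0; y < height; y += BLOCK_H)
        for (int x = 0; x < width; x += BLOCK_W)
            for (int i = 0; i < BLOCK_H; i++)
                for (int j = 0; j < BLOCK_W; j++) {
                    int id = (y / BLOCK_H) * blocks_x + x / BLOCK_W;
                    if (h_blocks[id * BLOCK_W * BLOCK_H + i * BLOCK_W + j] !=
                        h_frame[(x + j) + width * (y + i)])
                        ++mismatches;
                }

    cudaFree(d_frame);
    cudaFree(d_blocks);
    free(h_frame);
    free(h_blocks);
    return mismatches;
}
```

Calling `extract_blocks_check(32, 16)` from a host `main` should return 0 if the kernel reproduces the sequential loops. In a full DCT the separate copy is usually unnecessary: each thread block can load its 8x8 tile directly into `__shared__` memory and transform it there.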
There is a white paper from NVIDIA by Obukhov and Kharlamov, "Discrete Cosine Transform for 8x8 Blocks with CUDA", which goes with the dct8x8 example in the CUDA Samples. You should have a look at both.
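As a rough illustration of the separable idea (a naive sketch, not the implementation from the dct8x8 sample or the white paper), here is a kernel that assumes 8-bit pixels, width and height that are multiples of 8, and the orthonormal 2D DCT-II F(u,v) = c(u) c(v) Σ_i Σ_j f(i,j) cos((2i+1)uπ/16) cos((2j+1)vπ/16), with c(0) = √(1/8) and c(k) = 1/2 for k > 0. Each 8x8 image block gets one thread block of 64 threads: the tile is loaded into shared memory, transformed along rows, then along columns. The names `dct8x8_naive`, `dct_basis`, and `dct8x8_max_error` are hypothetical.

```cuda
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <cuda_runtime.h>

// Orthonormal 8-point DCT-II basis value: c(k) * cos((2n + 1) * k * pi / 16),
// with c(0) = sqrt(1/8) and c(k) = 1/2 for k > 0.
__host__ __device__ inline float dct_basis(int k, int n)
{
    float c = (k == 0) ? 0.35355339f : 0.5f;
    return c * cosf((2 * n + 1) * k * 3.14159265f / 16.0f);
}

// Hypothetical naive kernel: one 8x8 thread block per 8x8 image block.
// Coefficient F(u, v) is stored where pixel (row u, column v) of the block was.
__global__ void dct8x8_naive(const unsigned char *img, int width, float *coeffs)
{
    __shared__ float tile[8][8];  // input pixels of this block
    __shared__ float tmp[8][8];   // after the 1D DCT along each row
    int tx = threadIdx.x, ty = threadIdx.y;
    int x0 = blockIdx.x * 8, y0 = blockIdx.y * 8;

    tile[ty][tx] = img[(x0 + tx) + width * (y0 + ty)];
    __syncthreads();

    // Pass 1: thread (tx = v, ty = i) computes horizontal frequency v of row i.
    float s = 0.0f;
    for (int j = 0; j < 8; ++j) s += tile[ty][j] * dct_basis(tx, j);
    tmp[ty][tx] = s;
    __syncthreads();

    // Pass 2: thread (tx = v, ty = u) computes vertical frequency u of column v.
    s = 0.0f;
    for (int i = 0; i < 8; ++i) s += dct_basis(ty, i) * tmp[i][tx];
    coeffs[(x0 + tx) + width * (y0 + ty)] = s;
}

// Hypothetical host helper: runs the kernel on a synthetic image and returns the
// largest absolute difference from a direct double-precision evaluation of F(u, v).
double dct8x8_max_error(int width, int height)
{
    assert(width % 8 == 0 && height % 8 == 0);
    const double pi = 3.14159265358979323846;
    int n = width * height;
    unsigned char *h_img = (unsigned char *)malloc(n);
    float *h_out = (float *)calloc(n, sizeof(float));
    for (int k = 0; k < n; ++k) h_img[k] = (unsigned char)((k * 37) % 256);

    unsigned char *d_img = NULL;
    float *d_out = NULL;
    cudaMalloc(&d_img, n);
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_img, h_img, n, cudaMemcpyHostToDevice);
    dct8x8_naive<<<dim3(width / 8, height / 8), dim3(8, 8)>>>(d_img, width, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    double max_err = 0.0;
    for (int y0 = 0; y0 < height; y0 += 8)
        for (int x0 = 0; x0 < width; x0 += 8)
            for (int u = 0; u < 8; ++u)
                for (int v = 0; v < 8; ++v) {
                    double cu = (u == 0) ? std::sqrt(0.125) : 0.5;
                    double cv = (v == 0) ? std::sqrt(0.125) : 0.5;
                    double sum = 0.0;
                    for (int i = 0; i < 8; ++i)
                        for (int j = 0; j < 8; ++j)
                            sum += h_img[(x0 + j) + width * (y0 + i)] *
                                   std::cos((2 * i + 1) * u * pi / 16) *
                                   std::cos((2 * j + 1) * v * pi / 16);
                    double err = std::fabs(cu * cv * sum - h_out[(x0 + v) + width * (y0 + u)]);
                    if (err > max_err) max_err = err;
                }

    cudaFree(d_img);
    cudaFree(d_out);
    free(h_img);
    free(h_out);
    return max_err;
}
```

Because the cosine factors separate, each thread does 16 multiply-adds (8 per pass) instead of the 64 a direct 2D sum would need per coefficient. `dct8x8_max_error(16, 16)` compares the GPU result against the direct formula and should stay small if the kernel is correct; the optimized kernels in the sample go much further than this sketch.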