parallel processing - How to parallelize DCT (for loops) in CUDA


How can I parallelize 4 nested loops in CUDA?
My DCT code has 4 nested loops, and I want to write the DCT function as CUDA code:

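    /* outer two loops walk over the blocks of the frame */
    for (y = 0; y < height; y += block_h) {
        for (x = 0; x < width; x += block_w) {
            /* inner two loops copy one block out of the frame */
            for (i = 0; i < block_h; i++) {
                for (j = 0; j < block_w; j++) {
                    block_in[i][j] = cur_frame[(x + j) + (width * (y + i))];
                }
            }
        }
    }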

There is a white paper from NVIDIA by Obukhov and Kharlamov, "Discrete Cosine Transform for 8x8 Blocks with CUDA", which goes with the dct8x8 sample in the CUDA samples. You should have a look at both.
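
Before digging into the white paper, it may help to see how the four loops from the question could map onto CUDA in the simplest way: the two outer loops (x, y) become the grid of thread blocks, and the two inner loops (i, j) become the threads inside one block. The sketch below is only a naive illustration, not the optimized method from the dct8x8 sample. It assumes float pixels, a width and height that are multiples of 8, and an orthonormal DCT-II scaling; the names dct8x8_naive and dct8x8_frame are made up for this example.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stddef.h>

#define BLOCK_W 8
#define BLOCK_H 8

// One CUDA thread block handles one 8x8 image block. Each thread loads one
// pixel into shared memory, then computes one DCT coefficient.
__global__ void dct8x8_naive(const float *cur_frame, float *dct_out, int width)
{
    __shared__ float block_in[BLOCK_H][BLOCK_W];

    int j = threadIdx.x;            // replaces the loop over j
    int i = threadIdx.y;            // replaces the loop over i
    int x = blockIdx.x * BLOCK_W;   // replaces the loop over x
    int y = blockIdx.y * BLOCK_H;   // replaces the loop over y

    // Same copy as in the question, done by 64 threads at once.
    block_in[i][j] = cur_frame[(x + j) + width * (y + i)];
    __syncthreads();                // wait until the whole block is loaded

    // Thread (i, j) computes the coefficient with vertical frequency i and
    // horizontal frequency j (direct O(N^2) sum per coefficient).
    const float pi = 3.14159265358979f;
    float sum = 0.0f;
    for (int r = 0; r < BLOCK_H; r++) {
        for (int c = 0; c < BLOCK_W; c++) {
            sum += block_in[r][c]
                 * cosf((2 * r + 1) * i * pi / (2.0f * BLOCK_H))
                 * cosf((2 * c + 1) * j * pi / (2.0f * BLOCK_W));
        }
    }
    // Orthonormal scaling (an assumption; codecs may use another convention).
    float cu = (i == 0) ? sqrtf(1.0f / BLOCK_H) : sqrtf(2.0f / BLOCK_H);
    float cv = (j == 0) ? sqrtf(1.0f / BLOCK_W) : sqrtf(2.0f / BLOCK_W);
    dct_out[(x + j) + width * (y + i)] = cu * cv * sum;
}

// Hypothetical host helper: copies the frame to the GPU, runs the kernel,
// and copies the coefficients back. Returns 0 on success, -1 on failure.
int dct8x8_frame(const float *h_in, float *h_out, int width, int height)
{
    if (width <= 0 || height <= 0 ||
        width % BLOCK_W != 0 || height % BLOCK_H != 0)
        return -1;                  // this sketch does not handle partial blocks

    size_t bytes = (size_t)width * height * sizeof(float);
    float *d_in = NULL, *d_out = NULL;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 threads(BLOCK_W, BLOCK_H);                 // i, j loops
        dim3 grid(width / BLOCK_W, height / BLOCK_H);   // x, y loops
        dct8x8_naive<<<grid, threads>>>(d_in, d_out, width);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return (err == cudaSuccess) ? 0 : -1;
}
```

Calling dct8x8_frame(in, out, width, height) on host buffers of width * height floats writes each block's coefficients to the same positions the block's pixels came from. Each thread here does a full 64-term sum, so this version is far from fast; the white paper and the dct8x8 sample show how to do it with separable row and column passes.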

