Compiling CUDA PTX to binary for an older target


From this question, I know that PTX is portable across various architectures. I believe this allows migration going forward, e.g. from sm_20 to sm_30. I have a special use case where I need to go from sm_20 to sm_10. Is it possible to generate a binary such as a cubin for an sm_10 target from PTX that was compiled for an sm_20 target?

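PTX is forward compatible when compiled against a specific architecture (i.e., using the sm_* flag), but it is not backward compatible: PTX built for one compute capability can be compiled for devices of that capability or newer, not for older ones. One way around this is to specify a particular virtual architecture and then generate binary images for the real architectures you want to target. For example,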

nvcc -arch=compute_20 -code=sm_20,sm_30,sm_35 

This generates PTX for the compute 2.0 virtual architecture and then generates binary images for 2.0, 3.0, and 3.5 devices. Please note that compute 1.0 is deprecated as of CUDA 7.0. This is known as the fat binary approach.

See the nvcc code generation options for the difference between real and virtual architectures.


Edit: Actually, it's a bit redundant to specify -arch=compute_35 and -code=sm_35, because the JIT compiler would have intervened and built it for you. As long as you don't mind a little fat in your fat binary, I suppose it doesn't matter much.

Edit2: The code target must be greater than or equal to the arch target, because PTX is not backwards compatible. Thanks to Robert Crovella for pointing out my stupid mistake.
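The rule that every -code target must be at least as new as the -arch target can be checked before invoking the compiler. The following is only a rough sketch of that check: check_targets is a hypothetical helper name, not part of nvcc or the CUDA toolkit, and it assumes plain numeric names such as compute_20 and sm_35.

```shell
# Hypothetical helper (not part of the CUDA toolkit).
# Usage: check_targets compute_XY sm_AB [sm_CD ...]
# Succeeds only if every real target is at least as new as the virtual one.
# Assumes plain numeric names, e.g. compute_20 and sm_35.
check_targets() {
    arch_num=${1#compute_}      # "compute_20" -> "20"
    shift
    for code in "$@"; do
        code_num=${code#sm_}    # "sm_35" -> "35"
        if [ "$code_num" -lt "$arch_num" ]; then
            echo "invalid: $code is older than compute_$arch_num"
            return 1
        fi
    done
    echo "ok"
}
```

With this helper, the targets from the example above (compute_20 with sm_20, sm_30, sm_35) pass, while the combination asked about in the question (compute_20 with sm_10) is rejected.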

