Compiling CUDA PTX to binary for an older target
From a related question, I know that PTX is portable across various architectures. I believe this allows migration going forward, e.g. from sm_20 to sm_30. I have a special use case where I need to go from sm_20 to sm_10. Is it possible to generate a binary, such as a cubin for an sm_10 target, from PTX that was compiled for an sm_20 target?
PTX is forward compatible when compiled against a specific architecture (i.e., using the sm_* flag), but it is not backward compatible. One way around this is to specify a particular virtual architecture and generate binary images for the real architectures you want to target. For example,
nvcc -arch=compute_20 -code=sm_20,sm_30,sm_35
This generates PTX for the compute 2.0 virtual architecture and then generates binary images for 2.0, 3.0, and 3.5 devices. Please note that compute 1.0 is deprecated as of CUDA 7.0. This is known as the fat binary approach.
See the code generation options for the difference between real and virtual architectures.
Edit: Actually, it's a bit redundant to specify -arch=compute_35 and -code=sm_35, because the JIT compiler would have intervened and built it for you. As long as you don't mind a little fat in your fat binary, I suppose it doesn't matter much.
Edit 2: code must be greater than or equal to arch, because PTX is not backwards compatible. Thanks to Robert Crovella for pointing out my stupid mistake.
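The rule that code must be greater than or equal to arch can be sketched as a small pre-flight check before calling nvcc. This is only an illustration: the helper name incompatible_codes is hypothetical, and it assumes plain names of the form compute_NN and sm_NN whose numeric parts can be compared directly, which is a simplification of how nvcc itself decides compatibility.

```python
import re


def _version(name, prefix):
    """Return the numeric part of a name such as 'compute_20' or 'sm_35'."""
    match = re.fullmatch(rf"{prefix}_(\d+)", name)
    if match is None:
        raise ValueError(f"expected {prefix}_NN, got {name!r}")
    return int(match.group(1))


def incompatible_codes(arch, codes):
    """List the real targets in `codes` that are older than the virtual `arch`.

    Assumption: comparing the numeric suffixes is enough to decide whether
    PTX for `arch` can be compiled for a given real target.
    """
    arch_version = _version(arch, "compute")
    return [code for code in codes if _version(code, "sm") < arch_version]


# The combination from the answer: every real target is >= compute_20.
print(incompatible_codes("compute_20", ["sm_20", "sm_30", "sm_35"]))  # []

# The case from the question: sm_10 is older than compute_20.
print(incompatible_codes("compute_20", ["sm_10"]))  # ['sm_10']
```

An empty list means every requested binary target is at least as new as the PTX; anything returned is a target that the PTX cannot be compiled down to, which is exactly the sm_20 to sm_10 case asked about.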