python - Ignore nested structures in numpy's array creation
I want to write a vlen HDF5 dataset, using h5py.Dataset.write_direct to speed up the process. Suppose I have a list of numpy arrays (e.g. as given by cv2.findContours) and a dataset:
    dataset = h5file.create_dataset('dataset',
                                    shape=...,
                                    dtype=h5py.special_dtype(vlen='int32'))
    contours = [numpy array, ...]
To write contours to a destination given by a slice dest, I must first convert contours to a numpy array of numpy arrays:
    contours = numpy.array(contours)  # shape=(len(contours),); dtype=object
    dataset.write_direct(contours, None, dest)
But this only works if the numpy arrays in contours have different shapes. If they all share the same shape, numpy stacks them into a regular 2-D array instead, e.g.:
    contours = [np.zeros((10,), 'int32'), np.zeros((10,), 'int32')]
    contours = numpy.array(contours)  # shape=(2,10); dtype='int32'
The question is: how can I tell numpy to create an array of objects?
Possible solutions:
Manual creation:
    contours_np = np.empty((len(contours),), dtype=object)
    for i, contour in enumerate(contours):
        contours_np[i] = contour
But since loops are super slow, using map:
    # Python 2 only: tuple-unpacking lambda parameters were removed in Python 3
    map(lambda (i, contour): contours_np.__setitem__(i, contour),
        enumerate(contours))
I have tested a second option, which is twice as fast as the above, but also super ugly:
    contours = np.array(contours + [None])[:-1]
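On recent NumPy releases the sentinel trick needs an explicit dtype=object, because NumPy no longer infers an object dtype from ragged input on its own. A minimal sketch, assuming two same-shape stand-in arrays in place of real contours:

```python
import numpy as np

# Stand-in data (assumption): two same-shape arrays, the case that
# numpy.array would otherwise stack into a single (2, 10) array.
contours = [np.zeros((10,), 'int32'), np.ones((10,), 'int32')]

# The trailing None stops NumPy from stacking the inputs; the slice
# then drops that sentinel again.
contours_np = np.array(contours + [None], dtype=object)[:-1]

print(contours_np.shape, contours_np.dtype)
```

Each element of contours_np is still the original array object, so no contour data is copied.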
Here are some micro benchmarks:
    l = [np.random.normal(size=100) for _ in range(1000)]
Option 1:
    $ start = time.time(); l_array = np.zeros(shape=(len(l),), dtype='O'); map(lambda (i, c): l_array.__setitem__(i, c), enumerate(l)); end = time.time(); print("%fms" % ((end - start) * 10**3))
    0.950098ms
Option 2:
    $ start = time.time(); np.array(l + [None])[:-1]; end = time.time(); print("%fms" % ((end - start) * 10**3))
    0.409842ms
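For anyone wanting to repeat the comparison on Python 3, here is a rough sketch using timeit instead of a single time.time() pair. The function names option_1 and option_2 are made up for illustration, option_1 uses a plain for loop because the tuple-unpacking lambda is not valid Python 3, and option_2 passes dtype=object explicitly, which recent NumPy versions require. Timings will differ by machine and version, so none are quoted here.

```python
import timeit
import numpy as np

l = [np.random.normal(size=100) for _ in range(1000)]

def option_1(arrays):
    """Fill a preallocated object array element by element."""
    out = np.empty((len(arrays),), dtype=object)
    for i, a in enumerate(arrays):
        out[i] = a
    return out

def option_2(arrays):
    """Append a None sentinel so NumPy keeps the inputs as objects."""
    return np.array(arrays + [None], dtype=object)[:-1]

for fn in (option_1, option_2):
    # Best of 5 runs of 100 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(lambda: fn(l), number=100, repeat=5)) / 100
    print("%s: %fms" % (fn.__name__, best * 10**3))
```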
This looks kind of ugly. Any other suggestions?
In this version
    contours_np = np.empty((len(contours),), dtype=object)
    for i, contour in enumerate(contours):
        contours_np[i] = contour
you can replace the loop with a single statement
    contours_np[...] = contours
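A small self-contained sketch of this approach, wrapped in a hypothetical helper called to_object_array (the name is not from NumPy or h5py). It checks both the same-shape case from the question and a ragged case, and assumes a reasonably recent NumPy:

```python
import numpy as np

def to_object_array(arrays):
    """Pack a list of arrays into a 1-D object array without stacking."""
    out = np.empty((len(arrays),), dtype=object)
    out[...] = arrays  # one element per input array
    return out

# Same-shape inputs: numpy.array(...) alone would give shape (2, 10).
same = to_object_array([np.zeros((10,), 'int32'), np.zeros((10,), 'int32')])
# Ragged inputs: different lengths per element.
ragged = to_object_array([np.zeros((3,), 'int32'), np.zeros((7,), 'int32')])

print(same.shape, same.dtype)
print(ragged.shape, ragged.dtype)
```

Because the destination array is created first with a fixed one-dimensional shape and object dtype, the assignment has no freedom to reinterpret the inner arrays as extra dimensions.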