Python: Downloading and caching XML files - how to handle encoding declaration?
from urllib.request import urlopen
from lxml import objectify
I am trying to write a program to download XML files, cache them, and then open them using objectify. If I download the files using urlopen(), I can read them in using objectify.fromstring() just fine:
r = urlopen(my_url)
o = objectify.fromstring(r.read())
However, if I download them and write them to a file, I end up with an encoding declaration at the top of the file that objectify doesn't like. To wit:
# download the file
my_file = 'foo.xml'
r = urlopen(my_url)

# save it locally
with open(my_file, 'wb') as fp:
    fp.write(r.read())

# open the saved copy (text mode, which is what triggers the error)
with open(my_file, 'r') as fp:
    o1 = objectify.fromstring(fp.read())
This results in ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
If I use objectify.parse(fp), it works fine, so I could go through and change all the client code to use parse() instead, but I feel that is not the right approach. I have other XML files stored locally for which .fromstring() works fine; based on a cursory review, they appear to have utf-8 encoding.
I don't know what the right resolution is here. Should I change the encoding when I save the file? Should I strip the encoding declaration? Should I fill my code with try..except ValueError clauses? Please advise.
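The file needs to be opened in binary mode rather than text mode. In text mode, fp.read() returns a decoded str instead of bytes, and that is what fromstring() rejects when the document carries an encoding declaration.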
open(my_file, 'rb')  # b stands for binary
As suggested by the exception: ... Please use bytes input ...
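For the download-and-cache workflow described in the question, a minimal sketch might look like the following. The helper name read_cached_xml_bytes and its cache_path parameter are hypothetical, introduced only for illustration. The point is simply that the file is written and read back as bytes, so the encoding declaration never passes through a decoded str.

```python
from pathlib import Path
from urllib.request import urlopen


def read_cached_xml_bytes(url, cache_path):
    """Return the XML document at `url` as bytes, downloading it only once.

    `read_cached_xml_bytes` and `cache_path` are hypothetical names used
    for illustration; they do not come from the original question.
    """
    cache_file = Path(cache_path)
    if not cache_file.exists():
        # Download and save the raw bytes exactly as they were received.
        with urlopen(url) as response:
            cache_file.write_bytes(response.read())
    # Read the cached copy back in binary mode, so the result is bytes
    # rather than a decoded str.
    with open(cache_file, 'rb') as fp:
        return fp.read()


# Usage (requires lxml; my_url is the URL from the question):
#     from lxml import objectify
#     o1 = objectify.fromstring(read_cached_xml_bytes(my_url, 'foo.xml'))
```

Because the helper always hands back bytes, existing client code that calls objectify.fromstring() should not need to switch to parse() or wrap calls in try..except ValueError.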