Python: Downloading and caching XML files - how to handle the encoding declaration?
I'm trying to write a program that downloads XML files, caches them, and then opens them using objectify. My imports are:

    from urllib.request import urlopen
    from lxml import objectify

If I download the files using urlopen(), I can read them in using objectify.fromstring() just fine:
    r = urlopen(my_url)
    o = objectify.fromstring(r.read())

However, if I download them and write them to a file, I end up with an encoding declaration at the top of the file, which objectify doesn't like. To wit:
    # Download the file
    my_file = 'foo.xml'
    r = urlopen(my_url)

    # Save it locally
    with open(my_file, 'wb') as fp:
        fp.write(r.read())

    # Open the saved copy
    with open(my_file, 'r') as fp:
        o1 = objectify.fromstring(fp.read())

This results in:

    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
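The message is about the type of the argument, not the file's contents: in Python 3, read() on a file opened with 'r' returns str, while 'rb' returns bytes. The sketch below shows the difference using only the standard library; the sample document and temporary file are made up for illustration, and encoding='utf-8' is passed explicitly (the question's code relies on the platform default).

```python
import os
import tempfile

# Hypothetical sample document with an encoding declaration, as a server might send it
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?>\n<root><item>caf\xc3\xa9</item></root>\n'

def read_both_ways(path):
    """Return the file contents read in text mode and in binary mode."""
    with open(path, 'r', encoding='utf-8') as fp:
        as_text = fp.read()   # str: the form lxml's fromstring() rejects when a declaration is present
    with open(path, 'rb') as fp:
        as_bytes = fp.read()  # bytes: exactly what was written
    return as_text, as_bytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.xml')
    with open(path, 'wb') as fp:
        fp.write(SAMPLE)
    text, data = read_both_ways(path)

print(type(text).__name__, type(data).__name__)  # str bytes
print(data == SAMPLE)  # True
```

Writing with 'wb' keeps the downloaded bytes intact; it is only reading back with 'r' that turns them into a str.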
If I use objectify.parse(fp), it works fine, so I could go through and change the client code to use parse() instead, but I feel that's not the right approach. I have other XML files stored locally for which .fromstring() works fine; based on a cursory review, they appear to be UTF-8 encoded.
I don't know what the right resolution is here. Should I change the encoding when I save the file? Should I strip the encoding declaration? Should I fill my code with try...except ValueError clauses? Please advise.
The file needs to be opened in binary mode rather than text mode:
    with open(my_file, 'rb') as fp:  # 'b' stands for binary
        o1 = objectify.fromstring(fp.read())

This is what the exception suggests: "... Please use bytes input ..."
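Applied to the download-and-cache workflow from the question, a minimal sketch might keep the data as bytes end to end: write with 'wb', read with 'rb', and never decode. The helper name fetch_cached, its opener parameter (a hook so the demo can run offline) and the example.com URL are hypothetical; they are not part of urllib or lxml.

```python
import io
import os
import tempfile
from urllib.request import urlopen

def fetch_cached(url, cache_path, opener=urlopen):
    """Return the raw bytes behind url, downloading only on a cache miss.

    Bytes are written with 'wb' and read back with 'rb', so the document
    (including its encoding declaration) reaches the parser unchanged.
    """
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as fp:
            return fp.read()
    with opener(url) as response:
        data = response.read()
    with open(cache_path, 'wb') as fp:
        fp.write(data)
    return data

# Offline demo: a stand-in opener that records calls instead of hitting the network
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?>\n<root/>\n'
calls = []

def fake_opener(url):
    calls.append(url)
    return io.BytesIO(SAMPLE)  # BytesIO supports read() and the with statement

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.xml')
    first = fetch_cached('http://example.com/foo.xml', path, opener=fake_opener)   # miss: downloads
    second = fetch_cached('http://example.com/foo.xml', path, opener=fake_opener)  # hit: reads the file

print(first == second == SAMPLE, len(calls))  # True 1
```

With lxml installed, the result can be handed straight to the parser, e.g. objectify.fromstring(fetch_cached(my_url, 'foo.xml')). There is no need to re-encode the file or strip the declaration.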