python 2.7 - How to use StreamXmlRecordReader to parse single-line and multiline XML records within a single file
I have the input file (txt) below:

<a><b><c>val1</c></b></a>||<a><b><c>val2</c></b></a>||<a><b> <c>val3</c></b></a>||<a></b><c>val4-c-1</c><c>val4-c-2</c></b><d>val-d-1</d></a>

If you look at the input carefully, the XML record after the third '||' is split across 2 lines.

I want to use the StreamXmlRecordReader of Hadoop Streaming to parse the file:

-inputreader "org.apache.hadoop.streaming.StreamXmlRecordReader,begin=<a>,end=</a>,slowmatch=true"

This is unable to parse the 3rd record. I am getting the error below:

Traceback (most recent call last):
  File "/home/rsome/test/code/m1.py", line 13, in <module>
    root = et.fromstring(xml_str.getvalue())
  File "/usr/lib64/python2.6/xml/etree/ElementTree.py", line 964, in XML
    return parser.close()
  File "/usr/lib64/python2.6/xml/etree/ElementTree
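
For reference, here is a minimal sketch of the mapper-side handling in question. It assumes the mapper reads its input line by line, so a record that spans two lines can arrive in pieces. The helper names iter_records and parse_records are hypothetical (they are not part of Hadoop or of m1.py). The sketch buffers text until a complete <a>...</a> chunk is available and only then hands it to ElementTree.

```python
import xml.etree.ElementTree as ET


def iter_records(lines, begin="<a>", end="</a>"):
    """Yield complete begin...end chunks from an iterable of text lines.

    Hypothetical helper: text is buffered across lines, so a record
    that is split over two lines is still returned as one chunk.
    """
    buf = ""
    for line in lines:
        buf += line
        while True:
            start = buf.find(begin)
            if start == -1:
                # No begin tag yet. Keep a short tail in case the begin
                # tag itself is cut in half by the line break.
                keep = len(begin) - 1
                buf = buf[-keep:] if keep else ""
                break
            stop = buf.find(end, start)
            if stop == -1:
                # Record is incomplete: drop the text before it and
                # wait for the next line.
                buf = buf[start:]
                break
            stop += len(end)
            yield buf[start:stop]
            buf = buf[stop:]


def parse_records(lines):
    """Parse every complete record; skip chunks that are not well-formed."""
    roots = []
    for chunk in iter_records(lines):
        try:
            roots.append(ET.fromstring(chunk))
        except ET.ParseError:
            # ET.ParseError is available from Python 2.7 onward.
            # Skipping is an assumption; logging the chunk may be preferable.
            continue
    return roots
```

In a streaming mapper this would be called as parse_records(sys.stdin). Note that a chunk such as <a></b>...</a>, where a closing tag has no matching opening tag, is not well-formed XML and is skipped by this sketch rather than parsed.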