Describe the bug
When calling the content method on a FileAttachment object, if the XML returned from EWS contains a character that XML does not support, such as the following:
Then upon loading the FileIO into memory as a stream, the XML fails to parse and throws an exception.
I believe the line causing the error is here in exchangelib/util.py:
def feed(self, data, isFinal=0):
    """Yield the current content of the character buffer."""
    DefusedExpatParser.feed(self, data=data, isFinal=isFinal)
    return self._decode_buffer()
The thrown error is here:
Error: Got an error entry for fetch incidents [Stderr: Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: reference to invalid character number: line 1, column 1062
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "<string>", line 14044, in process_main
  File "<string>", line 13931, in sub_main
  File "<string>", line 13733, in fetch_emails_as_incidents
  File "<string>", line 13566, in parse_incident_from_item
  File "/usr/local/lib/python3.10/site-packages/exchangelib/attachments.py", line 154, in content
    self._content = fp.read()
  File "/usr/local/lib/python3.10/site-packages/exchangelib/attachments.py", line 258, in readinto
    chunk = self._overflow or next(self._stream)
  File "/usr/local/lib/python3.10/site-packages/exchangelib/services/get_attachment.py", line 104, in stream_file_content
    yield from self._get_response_xml(payload=payload, stream_file_content=True)
  File "/usr/local/lib/python3.10/site-packages/exchangelib/util.py", line 366, in parse
    yield from self.feed(buffer)
  File "/usr/local/lib/python3.10/site-packages/exchangelib/util.py", line 377, in feed
    DefusedExpatParser.feed(self, data=data, isFinal=isFinal)
  File "/usr/local/lib/python3.10/xml/sax/expatreader.py", line 221, in feed
    self._err_handler.fatalError(exc)
  File "/usr/local/lib/python3.10/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:1062: reference to invalid character number
] (66)
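For anyone who wants to trigger the same class of error without an Exchange server, here is a minimal sketch. It is only a stand-in: it uses the standard library's xml.sax directly rather than exchangelib's streaming parser, `&#x5;` is a hypothetical example of a forbidden character reference (the actual character from the report is not shown here), and `parse_error_message` is an illustrative helper name.

```python
import xml.sax


def parse_error_message(data):
    """Parse XML bytes and return the parser's error message, or None on success."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as e:
        return e.getMessage()
    return None


# A well-formed document parses cleanly.
print(parse_error_message(b"<a>ok</a>"))

# A numeric reference to a control character is rejected by expat.
print(parse_error_message(b"<a>&#x5;</a>"))
```

The second call should report an error of the same kind as the one in the traceback above, since U+0005 is outside the set of characters XML 1.0 allows.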
This is more or less an abbreviated version of the code we are using to load the attachment:
if item.attachments:
    for attachment in item.attachments:
        if isinstance(attachment, FileAttachment):
            try:
                if attachment.content:
                    # file attachment
                    label_attachment_type = "attachments"
                    label_attachment_id_type = "attachmentId"
                    # save the attachment
                    file_name = get_attachment_name(attachment.name)
            except TypeError as e:
                if str(e) != "must be string or buffer, not None":
                    raise
                continue
            except xml.sax.SAXParseException as e:
                print("Error during XML parsing:")
                print("Message:", e.getMessage())
                continue
I cannot include the .msg file, but I can include a sample XML file which reproduces the issue - data.xml.txt
Expected behavior
It really just depends on how the library should handle issues like this. If it's something that Microsoft would normally correct when sending the XML (doubtful), then we should raise the issue there and better handle the error here. If not, then we could try stripping control chars from the xml prior to attempting to parse.
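To make the stripping idea concrete, here is a rough sketch of what it could look like. This is not exchangelib's implementation and not necessarily how a fix would be written. `strip_invalid_char_refs` and `_is_valid_xml_char` are hypothetical helper names, and the sketch assumes the offending data arrives as a numeric character reference (as the "reference to invalid character number" message suggests) inside a complete bytes buffer.

```python
import re
import xml.sax

# Matches numeric character references such as &#5; or &#x1F;
_CHAR_REF_RE = re.compile(rb"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_valid_xml_char(code_point):
    """Return True if code_point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def strip_invalid_char_refs(data):
    """Drop numeric character references that XML 1.0 forbids (hypothetical helper)."""

    def _replace(match):
        ref = match.group(1)
        # Hex references start with a lowercase "x"; everything else is decimal.
        code_point = int(ref[1:], 16) if ref[:1] == b"x" else int(ref)
        # Keep valid references untouched, remove invalid ones entirely.
        return match.group(0) if _is_valid_xml_char(code_point) else b""

    return _CHAR_REF_RE.sub(_replace, data)


cleaned = strip_invalid_char_refs(b"<a>x&#x5;y&#65;&#0;</a>")
# The cleaned document now parses without raising.
xml.sax.parseString(cleaned, xml.sax.ContentHandler())
```

One caveat with this approach: when the response is parsed as a stream, a reference could be split across two chunks, so a per-chunk substitution like this would miss it unless the tail of each chunk is buffered.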
Log output
If applicable, add relevant output from debug logging. - If I can redact sensitive info, I can provide.
Additional context
Python 3.10
exchangelib==5.0.3 but does reproduce in earlier versions.
Congratulations on breaking the XML parser! That's not an easy task😃
That's pretty much been my experience with Microsoft Exchange lately with them deprecating RPS.
I think we should be able to handle this gracefully in exchangelib. I'll just need some time to write a test case and come up with a fix.
If there is anything you need, feel free to let me know. I have a replicating environment and can try sending a similar email if you need. In the meantime, I'll just catch and log the error 😄
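A catch-and-log workaround along those lines might look like the sketch below. `safe_attachment_content` is a hypothetical wrapper name, and the only assumption about the attachment is that reading its `content` attribute can raise `xml.sax.SAXParseException`, as the traceback shows.

```python
import logging
import xml.sax

log = logging.getLogger(__name__)


def safe_attachment_content(attachment):
    """Return attachment.content, or None if the response XML cannot be parsed."""
    try:
        return attachment.content
    except xml.sax.SAXParseException as e:
        # Record where the parser gave up so the message can be found later.
        log.warning(
            "Skipping attachment %r: XML parse error at line %s, column %s: %s",
            getattr(attachment, "name", None),
            e.getLineNumber(),
            e.getColumnNumber(),
            e.getMessage(),
        )
        return None
```

Returning None lets the calling loop skip the attachment and carry on with the rest of the item instead of aborting the whole fetch.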
To Reproduce
We are an open source repo and the full code can be seen here if needed - https://github.com/demisto/content/blob/master/Packs/MicrosoftExchangeOnline/Integrations/EWSO365/EWSO365.py#L2092