
Can't use PullParser to parse several valid messages in a row #214

Closed
DmitryPogrebnoy opened this issue Oct 17, 2024 · 4 comments · Fixed by #220


@DmitryPogrebnoy
Contributor

I have the following use case:

require 'socket'
require 'rexml/parsers/pullparser'

# Server setup for demonstration purposes
server = TCPServer.new 2000
Thread.new do
  client = server.accept
  client.puts "<message>First valid and complete message</message>"
  sleep 1
  client.puts "<message>Second valid and complete message</message>"
  sleep 1
  client.puts "<message>Third valid and complete message</message>"
  client.close
end

# Client setup
socket = TCPSocket.new 'localhost', 2000

# REXML PullParser using socket
parser = REXML::Parsers::PullParser.new(socket)

begin
  while parser.has_next?
    event = parser.pull
    if event.start_element? and event[0] == 'message'
      text = parser.pull
      if text.text?
        puts "Received message: #{text[0]}"
      end
    end
  end
ensure
  socket.close
end

# Closing the server in this demonstration
server.close

Here I have a socket, and I pull messages from the server and parse them. Every message is valid, complete XML.
Before release v3.3.2 (https://github.com/ruby/rexml/releases/tag/v3.3.2), this use case worked like a charm. Since that release, it fails with the following exception:

'REXML::Parsers::BaseParser#pull_event': Malformed XML: Extra tag at the end of the document (got '<message') (REXML::ParseException)

This is a side effect of the changes in #161.
There seems to be no proper solution for this use case now, so I can't use the current version of this gem.

Possible solutions:

  1. Add a parameter or flag to ignore the error above.
  2. Add a special call to refresh the parser after each complete message.
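Until the parser itself can be reused across documents, one workaround that needs no new API is to give each message its own parser. The sketch below is a minimal illustration, assuming each message arrives on its own line (as `client.puts` produces in the example above) and that no message contains a newline. `each_message_text` is a hypothetical helper, not part of REXML; it only uses the same `PullParser` calls as the original snippet.

```ruby
require 'rexml/parsers/pullparser'
require 'stringio'

# Hypothetical helper (not part of REXML): reads newline-delimited messages
# from any IO (a TCPSocket, a StringIO, ...) and parses each line with a
# fresh PullParser, so no single parser ever sees a second root element.
def each_message_text(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty? # skip blank lines between messages

    parser = REXML::Parsers::PullParser.new(line)
    while parser.has_next?
      event = parser.pull
      next unless event.start_element? && event[0] == 'message'

      text = parser.pull
      yield text[0] if text.text? # raw text content of <message>
    end
  end
end

# Usage with an in-memory stream standing in for the socket:
stream = StringIO.new("<message>First</message>\n<message>Second</message>\n")
each_message_text(stream) { |text| puts "Received message: #{text}" }
```

The trade-off is that message framing now depends on the newline convention rather than on the XML itself; messages that span several lines would need a different delimiter.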
@DmitryPogrebnoy DmitryPogrebnoy changed the title "Can't use PullParser to parse several valid message in a row" to "Can't use PullParser to parse several valid messages in a row" Oct 17, 2024
@kou
Member

kou commented Oct 18, 2024

Adding a PullParser#reset method that users call explicitly is acceptable.
Do you want to work on this?

@DmitryPogrebnoy
Contributor Author

I'll be able to take it on in about a week, if that's okay.

@kou
Member

kou commented Oct 18, 2024

No problem. We aren't in a hurry.

@DmitryPogrebnoy
Contributor Author

@kou Here is a PR #220. Please take a look.

DmitryPogrebnoy pushed a commit to DmitryPogrebnoy/rexml that referenced this issue Nov 7, 2024
@kou kou closed this as completed in #220 Nov 8, 2024
kou pushed a commit that referenced this issue Nov 8, 2024
GitHub: Fix GH-214 

This is for parsing a stream of XML documents. With this feature, one parser
can parse multiple XML documents.

Co-authored-by: Dmitry Pogrebnoy <[email protected]>
Labels
None yet

2 participants