Fix performance issue caused by using repeated `>` characters inside `<?xml` (#170)

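A `>` is treated as a string delimiter: the source reads input up to the next `>` before retrying a failed match.
In certain cases, when `>` is used in succession, read and match are repeated once per `>`, and each retry
scans the buffer again from the same position, which slows down the process. Therefore, `match` now accepts
a `term:` keyword, and the processing-instruction parser passes `"?>"` so that the source reads ahead to the
end of the instruction in advance.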
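As a rough count, take the input shape used by the new test, `'<?xml version="1.0" ' + ">" * n + ' ?>'`, whose prefix before the first `>` is 20 characters long. Assume (a simplified model, not a measurement of REXML) that every failed match re-examines the whole buffer read so far. Reading one `>`-terminated chunk at a time then takes n + 1 reads, and the characters examined add up to:

$$
\underbrace{\sum_{k=1}^{n} (20 + k)}_{\text{failed attempts}} + \underbrace{(n + 23)}_{\text{final match}} = \frac{n(n+1)}{2} + 21n + 23 = \Theta(n^2)
$$

Reading ahead to `?>` instead needs a single read and a single match over all n + 23 characters, which is linear in n.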
Watson1978 authored Jul 16, 2024
1 parent 4ebf21f commit b8a5f4c
Showing 3 changed files with 16 additions and 4 deletions.
lib/rexml/parsers/baseparser.rb (3 changes: 2 additions, 1 deletion)

@@ -125,6 +125,7 @@ class BaseParser

       module Private
         INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
+        INSTRUCTION_TERM = "?>"
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -639,7 +640,7 @@ def parse_id_invalid_details(accept_external_id:,
       end

       def process_instruction(start_position)
-        match_data = @source.match(Private::INSTRUCTION_END, true)
+        match_data = @source.match(Private::INSTRUCTION_END, true, term: Private::INSTRUCTION_TERM)
         unless match_data
           message = "Invalid processing instruction node"
           @source.position = start_position
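Why `?>` is a safe place to stop reading ahead: in XML, the content of a processing instruction may not contain `?>` (the spec's `PI` production excludes it), so the first `?>` after `<?` always closes the instruction, and the lazy `.*?` in `INSTRUCTION_END` stops there as well. The sketch below checks that a buffer cut right after a `>` cannot match, while one that reaches `?>` matches in a single scan. The ASCII-only `NAME` pattern is a simplifying assumption; REXML's real `NAME` covers the full XML Name production.

```ruby
require "strscan"

# Assumption: ASCII-only stand-in for REXML's NAME pattern, which really
# covers the full XML Name production.
NAME = /[A-Za-z_:][-A-Za-z0-9_:.]*/
INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
INSTRUCTION_TERM = "?>"

# Remaining input once "<?" has been consumed: first cut right after a ">"
# (what a ">"-terminated read can leave in the buffer), then complete.
partial  = 'xml version="1.0" >>>'
complete = 'xml version="1.0" >>> ?>'

p StringScanner.new(partial).scan(INSTRUCTION_END)   # prints nil
p StringScanner.new(complete).scan(INSTRUCTION_END)  # prints "xml version=\"1.0\" >>> ?>"

# The first INSTRUCTION_TERM in the input is exactly where the instruction
# ends, so reading up to it hands the regex the whole instruction at once.
p complete.index(INSTRUCTION_TERM) + INSTRUCTION_TERM.size == complete.size  # prints true
```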
lib/rexml/source.rb (6 changes: 3 additions, 3 deletions)

@@ -117,7 +117,7 @@ def read_until(term)
     def ensure_buffer
     end

-    def match(pattern, cons=false)
+    def match(pattern, cons=false, term: nil)
       if cons
         @scanner.scan(pattern).nil? ? nil : @scanner
       else
@@ -240,7 +240,7 @@ def ensure_buffer
     # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
     # - ">"
     # - "XXX>" (X is any string excluding '>')
-    def match( pattern, cons=false )
+    def match( pattern, cons=false, term: nil )
       while true
         if cons
           md = @scanner.scan(pattern)
@@ -250,7 +250,7 @@ def match( pattern, cons=false )
         break if md
         return nil if pattern.is_a?(String)
         return nil if @source.nil?
-        return nil unless read
+        return nil unless read(term)
       end

       md.nil? ? nil : @scanner
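To make the read/match loop concrete, here is a minimal sketch of an IO-backed source whose `match` has the same shape as the patched signature. Everything in it is hypothetical and simplified: `ToyIOSource`, the `reads`/`bytes_examined` counters, the `PI_END` pattern and the `">"` default terminator stand in for REXML internals rather than reproducing them. It relies only on Ruby's standard `StringIO#gets(separator)` and `StringScanner`.

```ruby
require "stringio"
require "strscan"

# Hypothetical, simplified pattern for an XML declaration (illustration only).
PI_END = /<\?xml(\s+.*?)?\?>/m

# Hypothetical model of an IO-backed source (not REXML's implementation).
# Input is pulled in chunks ending at `term` (default ">"), and the pattern
# is retried from the current position after every read.
class ToyIOSource
  DEFAULT_TERM = ">"

  attr_reader :reads, :bytes_examined

  def initialize(string)
    @io = StringIO.new(string)
    @scanner = StringScanner.new(+"")
    @reads = 0
    @bytes_examined = 0
  end

  # Same shape as the patched signature: `term:` is optional, nil by default.
  def match(pattern, cons = false, term: nil)
    loop do
      # Model cost: each attempt may look at every unconsumed byte.
      @bytes_examined += @scanner.rest_size
      md = cons ? @scanner.scan(pattern) : @scanner.check(pattern)
      return @scanner if md
      return nil unless read(term)
    end
  end

  private

  # Append input up to and including the next `term` (">" when nil).
  def read(term)
    chunk = @io.gets(term || DEFAULT_TERM)
    return false if chunk.nil?
    @reads += 1
    @scanner << chunk
    true
  end
end

input = '<?xml version="1.0" ' + ">" * 4 + ' ?>'

slow = ToyIOSource.new(input)
slow.match(PI_END, true)
p [slow.reads, slow.bytes_examined]  # prints [5, 117]

fast = ToyIOSource.new(input)
fast.match(PI_END, true, term: "?>")
p [fast.reads, fast.bytes_examined]  # prints [1, 27]
```

With four `>` characters, the default terminator needs five reads and examines 117 bytes, while `term: "?>"` needs one read and 27 bytes. Leaving `term:` unset keeps the old one-`>`-at-a-time behaviour, mirroring the patch, where `term:` is an optional keyword defaulting to `nil`.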
test/parse/test_processing_instruction.rb (11 changes: 11 additions, 0 deletions)

@@ -1,8 +1,12 @@
 require "test/unit"
+require "core_assertions"
+
 require "rexml/document"

 module REXMLTests
   class TestParseProcessinInstruction < Test::Unit::TestCase
+    include Test::Unit::CoreAssertions
+
     def parse(xml)
       REXML::Document.new(xml)
     end
@@ -69,5 +73,12 @@ def test_after_root

       assert_equal("abc", events[:processing_instruction])
     end
+
+    def test_gt_linear_performance
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<?xml version="1.0" ' + ">" * n + ' ?>')
+      end
+    end
   end
 end
