-
Notifications
You must be signed in to change notification settings - Fork 16
Nested attribute selectors #21
Nested attribute selectors #21
Conversation
I've added a commit with an attempt to fix the problem. I don't know if it's ideal though. |
src/Document/Query.php
Outdated
@@ -88,7 +88,7 @@ public static function cssToXpath($path) | |||
|
|||
// Arbitrary attribute value contains whitespace | |||
$path = preg_replace_callback( | |||
'/\[\S+["\'](.+)["\']\]/', | |||
'/\[\S+["\']([\w\s]+)["\']\]/', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... what about .+?
(non greedy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that (first), but it didn't work. Maybe [^"\']
? The tests would pass, but that also seems pretty fragile to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I had a look on it, and .+?
doesn't work because \S+
is greedy. So I would suggest then:
'/\[\S+?["\'](.+?)["\']\]/'
or with modifier U
(see: http://php.net/manual/en/reference.pcre.pattern.modifiers.php):
'/\[\S+["\'](.+)["\']\]/U'
@webimpress I've updated the PR with your first suggestion. Thank you for your help! |
@@ -88,7 +88,7 @@ public static function cssToXpath($path) | |||
|
|||
// Arbitrary attribute value contains whitespace | |||
$path = preg_replace_callback( | |||
'/\[\S+["\'](.+)["\']\]/', | |||
'/\[\S+?["\'](.+?)["\']\]/', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking about this regexp here, and I think we should use backreference here to match match always "..."
or '...'
and not "...'
or '..."
. But it's separate issue/improvement ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think that may be necessary. The problem is that if you have a "
, and then a '
later, this will match... which will give you an incomplete match for the attribute value. Consider something like this: "javascript:alert('foo')". The above will return
"javascript:alert'` as the match. So a backreference is definitely needed here to ensure that we search until we find a matching closing character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weierophinney It is, yes. But that has nothing to with this PR. That problem already existed and should probably be fixed in a separate PR. If you want it to be fixed right here, you're welcome to update this one. I'm not good enough with regular expressions.
We need to match the initial quote, and then all characters up to but not including a matching final quote, taking into account escaped quotes as well. This patch uses a technique learned from http://blog.stevenlevithan.com/archives/match-quoted-string to do so, and updates existing regular expressions, including the one modified for #21, to properly match quoted strings, introducing several new test cases to verify the changes.
Thanks, @MidnightDesign! When merging, I added more tests, and included changes to ensure we properly identify quoted strings based opening quote. |
6ae9938/2.7.0 introduced a bug where nested attribute selectors (
div[class="foo"] div[class="bar"]
) aren't transformed to XPath properly anymore. Does anyone have an idea how we could fix this?