findall working incorrectly #5

Closed
Arti3DPlayer opened this issue Feb 19, 2016 · 19 comments

@Arti3DPlayer

http://stackoverflow.com/questions/35466584/swift-regex-find-all?noredirect=1#comment58632890_35466584

Hi, I have the following issue. I tried your library, but it returns incorrect results :(

@cezheng
Owner

cezheng commented Feb 19, 2016

Hi, thanks for reporting the issue.

However, I think the library is working as it is supposed to.

Take a look at the documentation of this method:

  Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

  See https://docs.python.org/2/library/re.html#re.findall

  - parameter pattern: regular expression pattern string
  - parameter string:  string to be searched
  - parameter flags:   NSRegularExpressionOptions value

  - returns: Array of matched substrings

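Note that this is not the same as the documentation of Python's re.findall, which returns a list of tuples when the pattern contains more than one group.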

As well as the declaration of the method:

public static func findall(pattern: String, _ string: String, flags: RegexObject.Flag = []) -> [String] 

Swift is different from Python because it is statically and strongly typed (which makes Swift a much faster and safer language). The method is declared to return an array of Strings, so you cannot expect it to return an array of tuples.

If you do want to get the captured groups from it, try re.finditer to get all the MatchObjects then map them to tuples.
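To make the difference between whole matches and captured groups concrete, here is a minimal sketch that uses Foundation's NSRegularExpression directly instead of PySwiftyRegex. The helper name `captureGroups` and the shortened pattern are assumptions made for illustration; they are not part of the library or of the original question. It uses current Swift naming rather than the Swift 2 spelling used elsewhere in this thread.

```swift
import Foundation

// Hypothetical helper (not part of PySwiftyRegex): for every match, return
// groups 1...n as optional strings. nil means the group did not participate.
func captureGroups(pattern: String, in text: String) -> [[String?]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { match in
        (1..<match.numberOfRanges).map { index in
            // Range(_:in:) yields nil for a group whose location is NSNotFound.
            Range(match.range(at: index), in: text).map { String(text[$0]) }
        }
    }
}

// Simplified stand-in for the alternation pattern discussed in this thread.
let groups = captureGroups(
    pattern: "Your Price \\$(\\d+\\.\\d+)|\\$(\\d+\\.\\d+)",
    in: "Total $239.00"
)
print(groups) // prints [[nil, Optional("239.00")]]
```

Only the second alternative matches here, so the first group is nil and the second holds the price, which is the shape of result an array of plain Strings cannot express.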

Remember, PySwiftyRegex only lets you work with regular expressions through an interface modeled on Python's re module, which keeps your code cleaner. It is still written in Swift, not Python, and there are cases where the two do not behave exactly the same.

@Arti3DPlayer
Author

Thanks for the answer.

@Arti3DPlayer
Author

Oh, I tried what you said, and a lot of other functions. The problem is that with the pattern <div class="cl_price">[\s\.\w<>="$\/]*Your Price\s*\$(\d+.\d+)|<div class="cl_price">[\s\.\w<>="$\/]*\$(\d+.\d+) none of the regex functions group the results...

@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer try this

re.finditer(pattern, htmlString).map { $0.groups() }

@Arti3DPlayer
Author

It returns:
[[nil, Optional(" </d")]]

But should return:

[nil, "239.00"]

@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer How did you write the regex string in Swift? It has to be written a little differently from Python, and you might need to add some backslashes (\).

@Arti3DPlayer
Author

It comes from the web, in a variable.

@cezheng
Owner

cezheng commented Feb 22, 2016

that might be the problem

@Arti3DPlayer
Author

Yes, but I printed it, and it looks like what I showed.

@Arti3DPlayer
Author

If I do it without the alternation (|):

re.finditer("<div class=\"cl_price\">[\\s\\.\\w<>=\"$\\/]*\\$(\\d+.\\d+)", value).map { $0.groups() }

it still doesn't work. It returns [[Optional(" </d")]]

@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer I have tried your regex. It seems to be a bug when matching a string that contains multiple lines. I'll let you know after I have fixed it.

@Arti3DPlayer
Author

Maybe it needs some flag?

@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer You are definitely right, it's not a bug. Try this

re.finditer("<div class=\"cl_price\">[\\s\\.\\w<>=\"$\\/]*\\$(\\d+.\\d+)", string, flags: [.DotMatchesLineSeparators]).map { $0.groups() }
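For reference, this is a small sketch of what that option changes, written against Foundation's NSRegularExpression with current Swift naming (`.dotMatchesLineSeparators`; the thread uses the Swift 2 spelling). The helper `matchCount` and the sample string are hypothetical. The flag only affects `.`, whereas a `\s` inside a pattern already matches line breaks on its own.

```swift
import Foundation

// Hypothetical helper: counts matches of `pattern` in `text` under `options`.
func matchCount(_ pattern: String, _ text: String,
                options: NSRegularExpression.Options = []) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return 0
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: [], range: fullRange)
}

let twoLines = "a\nb"

// By default "." does not match a line separator.
let withoutFlag = matchCount("a.b", twoLines)

// With the option set, "." also matches the newline.
let withFlag = matchCount("a.b", twoLines, options: [.dotMatchesLineSeparators])

// "\s" matches the newline with or without the option.
let whitespaceClass = matchCount("a\\sb", twoLines)

print(withoutFlag, withFlag, whitespaceClass) // prints 0 1 1
```

In the pattern from this thread the line breaks fall inside the character class that contains `\s`, and the only `.` sits between digits, so this flag would not be expected to change the captured group.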

@Arti3DPlayer
Author

Does this work for you? It doesn't for me... I tried a lot of flags.

@Arti3DPlayer
Author

Oh, I found a solution in Swift:

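// Compile the pattern; return no matches if it is invalid.
let reg: NSRegularExpression
do {
    reg = try NSRegularExpression(pattern: regexValue, options: [])
} catch {
    return []
}
// Work in NSString terms so that NSRange offsets (UTF-16 units) stay valid.
let nstext = value as NSString
let matches = reg.matchesInString(value, options: [], range: NSMakeRange(0, nstext.length))
var collectMatches: [String] = []
for match in matches {
    // Range 0 is the whole match; captured groups start at index 1.
    // 1..<n is used because 1...n-1 traps when the pattern has no groups.
    for n in 1..<match.numberOfRanges {
        // Groups that did not participate have length 0 and are skipped.
        if match.rangeAtIndex(n).length > 0 {
            collectMatches.append(nstext.substringWithRange(match.rangeAtIndex(n)))
        }
    }
}
return collectMatches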

@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer

I figured out the reason.

For the HTML string from your URL:

print(value.characters.count) // prints 5486
print((value as NSString).length) // prints 5585

This inconsistency produces incorrect results when I convert from NSRange to Swift's Range.

I guess I will first have to figure out why casting from String to NSString changes its length.
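One plausible way such a mismatch arises: a Swift String counts Characters (grapheme clusters), while NSString's length and every NSRange count UTF-16 code units. A CRLF line break or an emoji is one Character but two UTF-16 units. The sketch below uses a made-up sample string, not the HTML from this issue, and is not the fix that went into the library. It shows how treating NSRange offsets as Character offsets shifts the captured substring, and how Foundation's `Range(_:in:)` initializer converts them correctly.

```swift
import Foundation

// Sample text (an assumption for illustration): emoji + CRLF before the price.
let text = "😀\r\n$239.00 USD"

let characterCount = text.count                    // Characters (grapheme clusters)
let utf16Count = NSString(string: text).length     // UTF-16 code units

// The pattern is known to be valid and to match, so force-unwrapping is
// acceptable in this sketch; real code should handle both failures.
let regex = try! NSRegularExpression(pattern: "\\$(\\d+\\.\\d+)")
let match = regex.firstMatch(in: text, options: [],
                             range: NSRange(text.startIndex..., in: text))!
let groupRange = match.range(at: 1)                // offsets in UTF-16 units

// Naive conversion: treats the UTF-16 offsets as Character offsets.
let naiveStart = text.index(text.startIndex, offsetBy: groupRange.location)
let naiveEnd = text.index(naiveStart, offsetBy: groupRange.length)
let naive = String(text[naiveStart..<naiveEnd])

// Conversion that interprets the offsets as UTF-16 units.
let correct = Range(groupRange, in: text).map { String(text[$0]) }

print(characterCount, utf16Count) // prints 13 15
print(naive)                      // prints 9.00 U
print(correct ?? "nil")           // prints 239.00
```

The naive result starts two Characters too late, which is the same kind of shifted capture as the `" </d"` reported earlier in this thread.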

cezheng added the bug label Feb 22, 2016
@cezheng
Owner

cezheng commented Feb 22, 2016

@Arti3DPlayer
I have fixed this bug in version 0.2.2 (commit 518590c).

Now

re.finditer(pattern, string).map { $0.groups() }

will return the correct value.

@Arti3DPlayer
Author

thanks a lot
