Commit
Fix logic that mistakenly decoded an input domain name passed as a tuple of bytes; fixes issue #31.

Added some test cases and a note about this in the README.

Before this change, a tuple-of-bytes input matched the PSL whenever the bytes were valid UTF-8.

```python
psl = PublicSuffixList("例.example")
psl.publicsuffix("例.example")  # "例.example"
psl.publicsuffix("xn--fsq.example")  # "xn--fsq.example"
psl.publicsuffix((b"xn--fsq", b"example"))  # (b"xn--fsq", b"example")
psl.publicsuffix((b"\xe4\xbe\x8b", b"example"))  # (b"\xe4\xbe\x8b", b"example")
```

The expected behavior is:

```python
psl.publicsuffix((b"\xe4\xbe\x8b", b"example"))  # (b"example",)
```

The last case should not match in its entirety, because a bytes object carries no information about its encoding. Binary input should be evaluated as-is, except for the ASCII case conversion defined in the evaluation rule.

The previous behavior can be problematic if the encoding of arbitrary input cannot be enforced and/or the input must be decoded from bytes to str using punycode. Assuming UTF-8 is incorrect in this context.

Where evaluating binary input as UTF-8 is required, callers should re-encode the input to a tuple of punycoded bytes, or use a scalar str.

Signed-off-by: ko-zu <[email protected]>
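For callers that do hold UTF-8 labels, the re-encoding suggested above might look like the following minimal sketch. The helper name `to_punycoded_labels` is hypothetical and not part of this library. It assumes the caller knows the input encoding and uses Python's built-in `idna` codec, which implements IDNA 2003 and may differ from IDNA 2008 for some labels.

```python
def to_punycoded_labels(labels, encoding="utf-8"):
    """Hypothetical helper: re-encode a tuple of byte labels as punycoded bytes.

    Assumes every label is valid in `encoding`; the caller must know this,
    because the bytes themselves carry no encoding information.
    """
    # Decode each label with the known encoding, then let the built-in
    # "idna" codec (IDNA 2003) produce the ASCII "xn--" form.
    return tuple(label.decode(encoding).encode("idna") for label in labels)


utf8_labels = (b"\xe4\xbe\x8b", b"example")

# Option 1: a tuple of punycoded bytes, evaluated as-is by the library.
punycoded = to_punycoded_labels(utf8_labels)
print(punycoded)  # (b'xn--fsq', b'example')

# Option 2: a scalar str, built by joining and decoding the labels.
scalar = b".".join(utf8_labels).decode("utf-8")
print(scalar)  # 例.example
```

Either result can then be passed to `publicsuffix()`, so the choice of encoding stays with the caller rather than being guessed by the library.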