Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegExMatch output variable causes incorrect transformations #118

Open
Lexikos opened this issue Jul 22, 2023 · 8 comments
Open

RegExMatch output variable causes incorrect transformations #118

Lexikos opened this issue Jul 22, 2023 · 8 comments

Comments

@Lexikos
Copy link

Lexikos commented Jul 22, 2023

Input:

        if RegExMatch(uri, "^\[url=")
            RegExMatch(uri, "\G[^\]]*", uri, 6)
        else
        {
            MsgBox 1,, URI appears invalid:`n%uri%
            IfMsgBox Cancel
                return
        }

Incorrect output:

        if RegExMatch(uri, "^\[url=")
            RegExMatch(uri[0], "\G[^\]]*", &uri, 6)
        else
        {
            msgResult := MsgBox("uri[0] appears invalid:`n" uri[0], "", 1)
            if (msgResult = "Cancel")
                return
        }

There are multiple issues:

  • The call which replaces the string in uri with a RegExMatchInfo incorrectly has its input changed from uri to uri[0]. On input, the variable still contains a string.
  • uri still contains a string in the else branch, so should not have been transformed.
  • The text "URI" inside the quoted string should not have been transformed even if it was in the if branch.

I think a simpler, less error-prone approach would be for uri := uri && uri[0] to be inserted after the RegExMatch call. Appending && uri := uri[0] to the call would also work if the return value is not being stored.

@Lexikos
Copy link
Author

Lexikos commented Jul 22, 2023

Properties with the same name are also erroneously transformed.

Input:

RegExMatch(haystack, regex, output)
x := task.output

Incorrect output:

RegExMatch(haystack, regex, &output)
x := task.output[0]

@Lexikos
Copy link
Author

Lexikos commented Jul 22, 2023

Unrelated variables and parameter names elsewhere in the script (perhaps hundreds of lines away?) are also transformed.

Input:

a(s, r) {
    RegExMatch(s, r, m)
    return m
}
b(m) {
    return m
}

Incorrect output:

a(s, r) {
    RegExMatch(s, r, &m)
    return m[0]
}
b(m[0]) {
    return m[0]
}

@safetycar
Copy link
Contributor

For the maintainers, I will attempt to cleanup and clarify parts about _RegExMatch and ConvertPseudoArray.
I'll write back about things that might require more changes.

@mmikeww
Copy link
Owner

mmikeww commented Jul 22, 2023

For the maintainers, I will attempt to cleanup and clarify parts about _RegExMatch and ConvertPseudoArray. I'll write back about things that might require more changes.

Thanks for your note, its helpful so people don't do duplicate work. If you work in a branch on your own repo, you can just tag this issue and then we should see a notification here.

Remember to add the associated test cases. You could just use the examples provided by Lexikos

safetycar added a commit to safetycar/AHK-v2-script-converter that referenced this issue Jul 23, 2023
safetycar added a commit to safetycar/AHK-v2-script-converter that referenced this issue Jul 23, 2023
@safetycar
Copy link
Contributor

The current way of updating the names is a simple replacement on a full line. I see it difficult to improve much without some bigger changes.

The changes made are:

  • Putting the reason for name replacements in a less ambiguous way.
  • A small change to not replace properties.
  • Restrict recognition of named captures to mitigate intrusive replacements. The replacement will only be done with names that can be gathered in the function conversion (this leaves out contents that could be inside variables).

But sadly:

  • There are still replacements inside quoted text.
  • There is still no limit on the lifetime of the name conversion.
  • Attempting a simpler replacement for uri && uri[0] I noticed strange repetitions in other cases, things similar to uri && uri[0] && uri[0]. To add it as a line definitions it needs to be done more carefully. Attempting to add the definition inside the same line I needed to add wrapping parenthesis but this was causing subLoopFunctions code to get stuck.

To fix those it seems better to try to find a better approach than the current full line replacement, instead of creating more workarounds.

safetycar added a commit to safetycar/AHK-v2-script-converter that referenced this issue Jul 24, 2023
safetycar added a commit to safetycar/AHK-v2-script-converter that referenced this issue Jul 24, 2023
mmikeww pushed a commit that referenced this issue Jul 24, 2023
* Update tests related to RegExMatch and #118

* Update renaming of PseudoArrays to be more concrete (#118)

* Move failing tests about known issues with RegExMatch

---------

Co-authored-by: SafetyCar <[email protected]>
@mmikeww
Copy link
Owner

mmikeww commented Jul 24, 2023

partially completed in 69269b3

@andymbody
Copy link
Contributor

Unrelated variables and parameter names elsewhere in the script (perhaps hundreds of lines away?) are also transformed.

Input:

a(s, r) {
    RegExMatch(s, r, m)
    return m
}
b(m) {
    return m
}

Incorrect output:

a(s, r) {
    RegExMatch(s, r, &m)
    return m[0]
}
b(m[0]) {
    return m[0]
}

This will be corrected within Masking pull request coming soon (already tested and corrected).

@andymbody
Copy link
Contributor

andymbody commented Jun 25, 2024

I think PR #208 will have some affect on this issue. It is still not the full fix, but progress. More work will be needed to address the global scope of the issue outlined in this thread. Possibly a rewrite of how RegexMatch (and other structures are) is handled in general, rather than multiple band-aids (as @safetycar mentioned). The upcoming PR that will introduce class/function/string masking (and separate handling) may come in handy for this as well. It can be extended to include other structures such as IF, Regex, etc, which may provide alternate fronts of attack.

See commit #213 for partial solution to this thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants