Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

Closed
blyxyas opened this issue Oct 16, 2024 · 4 comments · Fixed by #14747
Closed

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

blyxyas opened this issue Oct 16, 2024 · 4 comments · Fixed by #14747
Labels
C-bug Category: bug Command-fix S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review

Comments

@blyxyas
Copy link
Member

blyxyas commented Oct 16, 2024

Problem

rust-lang/rust-clippy#13549

Two suggestions all making changes to the same span have the potential to cause a syntax error.

Steps

  1. cargo new --lib test-crate
  2. Write the following code into src/lib.rs:
struct Foo(i64); 
impl Into<i64> for Foo {
    fn into(self) -> i64 {
        self.0.clone()
    }
}`
  1. cargo clippy --fix

Possible Solution(s)

Making sure that a spans don't overlap each other when having made a suggestion. See more context in the linked issue

Notes

No response

Version

cargo version --verbose

@blyxyas blyxyas added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Oct 16, 2024
@saites
Copy link
Contributor

saites commented Oct 19, 2024

The issue in the clippy repo suggestions the problem lies with apply_suggestions, but I think the real issue is with apply_solution, or perhaps with how cargo fix uses apply:

cargo/src/cargo/ops/fix.rs

Lines 983 to 989 in cf53cc5

match fixed.apply(suggestion) {
Ok(()) => fixed_file.fixes_applied += 1,
// As mentioned above in `rustfix_crate`, we don't immediately
// warn about suggestions that fail to apply here, and instead
// we save them off for later processing.
Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()),
}

/// Applies a suggestion to the code.
pub fn apply(&mut self, suggestion: &Suggestion) -> Result<(), Error> {
for solution in &suggestion.solutions {
self.apply_solution(solution)?;
}
Ok(())
}
/// Applies an individual solution from a [`Suggestion`].
pub fn apply_solution(&mut self, solution: &Solution) -> Result<(), Error> {
for r in &solution.replacements {
self.data
.replace_range(r.snippet.range.clone(), r.replacement.as_bytes())?;
self.modified = true;
}
Ok(())
}

When a Solution consists of multiple Replacements, calling apply_solution results in multiple calls to replace_range, and a failing call may have been preceded by one or more successful calls, ultimately leading to a partially applied Solution. The same principle is true with respect to apply if the Suggestion consists of multiple Solutions.

In this particular case, the Suggestion for from_over_into has a Solution that consists of multiple Replacements1. One of those overlaps with the (single) Replacement in the (single) Solution for clone_on_copy. The suggestion for clone_on_copy is applied first, replacing self.0.clone() with self.0. Then when attempting to apply the suggestion for from_over_into, the first several Replacements are applied successfully, but when reaching the overlapping span, apply returns Err(Error::AlreadyReplaced). The partially-applied Suggestion is written, cargo fix sees that the overall process returned an error, and so it attempts a second round -- now with code that cannot compile.

It seems to me that most reasonable fixes are:

  • Change apply and apply_solution to work atomically, so that if it returns a Result::Err, self.data is in the state it had been before the call.
  • Change rustfix_and_fix to handle the lack of atomicity2.

The advantage of the former is that it would prevent similar mistakes in the future, but it would require all callers pay the cost associated with whatever solution provides the atomic application -- it is reasonable that a caller might only wish to discard attempts that result in errors.

Making the change in rustfix_and_fix allows trading complexity for performance: the simplest version would simply clone the content before calling apply, and only keep the result if there were no errors. This could be made more efficient by only cloning if there are multiple Replacements to be applied. The logic could get even more detailed by checking whether Suggestions involve overlapping ranges -- though whether that would be more efficient is better left to profiling.

Footnotes

  1. The generated spans are:

    {"message":"replace the `Into` implementation with `From<Foo>`","spans":[[23,27],[28,31],[37,40],[50,54],[55,59],[64,67],[78,82]]}
    {"message":"try removing the `clone` call","spans":[[78,92]]
    

    as generated by

    cargo clippy --message-format=json | \
      jq -c '.message.children[]? | select(.spans | any) 
          | {message, spans: [ .spans[] | [.byte_start, .byte_end ]]}'
    
  2. Also, there maybe should be some documentation on apply/apply_solution warning that if it returns an Err, then Solutions/Suggestions may have been partially applied.

@weihanglo
Copy link
Member

Thanks @saites for this. I planned to draft out a similar reply and your writing is much better than what I had in my mind!

Perhaps the easiest approach to make it "just work" is this patch

diff --git a/src/cargo/ops/fix.rs b/src/cargo/ops/fix.rs
index ba3c9f2666c..571f7603369 100644
--- a/src/cargo/ops/fix.rs
+++ b/src/cargo/ops/fix.rs
@@ -980,12 +980,14 @@ fn rustfix_and_fix(
             {
                 continue;
             }
-            match fixed.apply(suggestion) {
-                Ok(()) => fixed_file.fixes_applied += 1,
-                // As mentioned above in `rustfix_crate`, we don't immediately
-                // warn about suggestions that fail to apply here, and instead
-                // we save them off for later processing.
-                Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()),
+            for solution in &suggestion.solutions {
+                match fixed.apply_solution(solution) {
+                    Ok(()) => fixed_file.fixes_applied += 1,
+                    // As mentioned above in `rustfix_crate`, we don't immediately
+                    // warn about suggestions that fail to apply here, and instead
+                    // we save them off for later processing.
+                    Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()),
+                }
             }
         }
         if fixed.modified() {

Though this needs more designs in rustfix_and_fix for reporting to users about what suggestion has failed.

We could also extend the CodeFix::apply to have an extra MergeStrategy argument, having variants like FailFast (the current) and PrioritizeFirst (first come first served), and so on.

@weihanglo weihanglo added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-triage Status: This issue is waiting on initial triage. labels Oct 19, 2024
@saites
Copy link
Contributor

saites commented Oct 21, 2024

No problem -- glad to help.

Perhaps the easiest approach to make it "just work" is this patch [...]

I think that'll have the same issue, as in this case, the Solution consists of multiple Replacements. Some of the Replacements will get applied, but when one fails, the code will be left in an invalid state.

I implemented a very lazy fix that successfully addresses the issue by adjusting the code in rustfix/lib.rs

     /// Applies a suggestion to the code.
     pub fn apply(&mut self, suggestion: &Suggestion) -> Result<(), Error> {
+        let mut prev = self.clone();
         for solution in &suggestion.solutions {
-            self.apply_solution(solution)?;
+            prev.apply_solution(solution)?;
         }
+        self.data = prev.data;
+        self.modified = prev.modified;
         Ok(())
     }

This can be made more efficient by first checking if the Suggestion has only Solution with only one Replacement, in which case no clone is necessary.

But I think a better and more efficient approach is to introduce transactional semantics directly on replace::Data. If it records changes as "pending", it could have a pair of commit() and restore() methods to keep or discard those changes. Using it could look like this:

     pub fn apply_solution(&mut self, solution: &Solution) -> Result<(), Error> {
         for r in &solution.replacements {
             self.data
-                .replace_range(r.snippet.range.clone(), r.replacement.as_bytes())?;
+                .replace_range(r.snippet.range.clone(), r.replacement.as_bytes())
+                .inspect_err(|_| self.data.restore())?;
-            self.modified = true;
         }
+        self.data.commit();
+        self.modified = true;
         Ok(())
     }

As for implementation, one approach could be to just extend the existing State enumeration with a pair of Uncommitted variants (Initial would need no such variation):

enum State {
     /// The initial state. No change applied.
     Initial,
+    /// Intends to be replaced.
+    UncommittedReplace(Rc<[u8]>),
+    /// Intends to be inserted.
+    UncommittedInsert(Rc<[u8]>),
     /// Has been replaced.
     Replaced(Rc<[u8]>),
     /// Has been inserted.
     Inserted(Rc<[u8]>),
}

impl State {
     fn is_inserted(&self) -> bool {
-        matches!(*self, State::Inserted(..))
+        matches!(*self, State::Inserted(..) | State::UncommittedInsert(..))
     }
}

I think it's most sensible for to_vec to only make use of committed changes, but if you wanted this change to be transparent to callers, you'd simply have the Uncommitted* variants work as the existing ones:

    pub fn to_vec(&self) -> Vec<u8> {
        if self.original.is_empty() {
            return Vec::new();
        }

        self.parts.iter().fold(Vec::new(), |mut acc, d| {
            match d.data {
                State::Initial => acc.extend_from_slice(&self.original[d.start..d.end]),
+               State::UncommittedReplace(ref d) | State::UncommittedInsert(ref d) |
                State::Replaced(ref d) | State::Inserted(ref d) => acc.extend_from_slice(d),
            };
            acc
        })
    }

As for replace_range, it would only insert uncommitted changes:

            // New part
            new_parts.push(Span {
                 start: range.start,
                 end: range.end,
                 data: if insert_only {
-                    State::Inserted(data.into())
+                    State::UncommittedInsert(data.into())
                 } else {
-                    State::Replaced(data.into())
+                    State::UncommittedReplace(data.into())
                 },
             });

If you are OK with a behavior change to Data (as I'm pretty sure it's only used by rustfix within lib.rs), then since each Suggestion is either applied successfully or leaves the data in its previous state, we can get rid of the special handling for duplicate replacements, and it'll still work for rust-lang/rust/issues/51211 and even fix /issues/13027.

             let part_to_split = &self.parts[index_of_part_to_split];

-            // If this replacement matches exactly the part that we would
-            // otherwise split then we ignore this for now. This means that you
-            // can replace the exact same range with the exact same content
-            // multiple times and we'll process and allow it.
-            //
-            // This is currently done to alleviate issues like
-            // rust-lang/rust#51211 although this clause likely wants to be
-            // removed if that's fixed deeper in the compiler.
-            if part_to_split.start == range.start && part_to_split.end == range.end {
-                if let State::Replaced(ref replacement) = part_to_split.data {
-                    if &**replacement == data {
-                        return Ok(());
-                    }
-                }
-            }
-
             if part_to_split.data != State::Initial {
                 return Err(Error::AlreadyReplaced);
             }

The commit method converts Uncommitted* parts to the original variants:

    /// Commit pending changes.
    pub fn commit(&mut self) {
        use State::*;
        for part in &mut self.parts {
            part.data = match part.data {
                UncommittedInsert(ref data) => Inserted(data.clone()),
                UncommittedReplace(ref data) => Replaced(data.clone()),
                Initial | Inserted(..) | Replaced(..) => continue,
            }
        }
    }

The restore would discard uncommitted inserts. Because replace_range currently expects no two Initial parts are adjacent, it has to convert uncommitted replacements into Initial parts and merge them when necessary:

    pub fn restore(&mut self) {
        use State::*;

        let mut restored = Vec::with_capacity(self.parts.len());

        for part in self.parts.drain(..) {
            match part.data {
                Replaced(..) | Inserted(..) => { 
                    // Keep committed changes.
                    restored.push(part); 
                },
                UncommittedInsert(..) => {
                    // Drop uncommitted inserts.
                },
                UncommittedReplace(..) | Initial => {
                    // Convert uncommitted replacements back to Initial parts.
                    // Dropping uncommitted replacements can leave adjacent Initial parts,
                    // so if the previously retained part was also Initial, merge this one with it.
                    if let Some(last @ Span{ data: Initial, ..  }) = restored.last_mut() {
                        last.end = part.end;
                    } else {
                        restored.push(Span { start: part.start, end: part.end, data: Initial });
                    }
                },
            }
        }

        self.parts = restored;
    }

To me, this set of changes seems like the best trade-off among fixing the issue, reducing changes, and maintaining performance. That said, if you're up for something more radical, I also think it'd be reasonable to change the implementation for replace.rs almost entirely -- instead of tracking parts the way it currently does, it could just push incoming changes into a collection sorted by start position, and check for conflicts at that moment (if we want to insert incoming just after the element prior, it conflicts if and only if $prior.start \le incoming.start \le prior.end$). Rendering the result would become a merge operation. Tracking committed versus uncommitted could just be a bool on the Span, and there wouldn't be a need for the State enumeration, since Initial data would be implicit (as the gaps between changes) and the difference between Replacement and Insertion is already implicitly based on whether start == end.

In any case, I've gone on long enough 😅. I'd be happy to submit a PR for (any or all of) the changes I discussed in this comment, if these ideas seem reasonable.

@weihanglo
Copy link
Member

@saites Thanks for the inspiring solution! I think I quite like the idea of transaction.

I think that'll have the same issue, as in this case, the Solution consists of multiple Replacements. Some of the Replacements will get applied, but when one fails, the code will be left in an invalid state.

I just realized that I was targeting at the wrong level of the code. The current Cargo never generates broken code because it'll revert when compile error happened, unless --broken-code was given, or ctrl-C was signaled before reverting. This is done at per-file level.

While your patch is at a lower Solution level, the fact that the entire solution rolls back makes the previous applied Suggestion stays valid. As a result Cargo is happy to try on next Suggestion, which could maximize the number of Suggestions cargo-fix applies in one iteration. This is exactly the PrioritizeFirst strategy I proposed but dealing it in a smarter way and lower level. Brilliant!

Really looking forward to your pull request :)

@weihanglo weihanglo added S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review and removed S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. labels Oct 22, 2024
@bors bors closed this as completed in 3dedb85 Nov 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug Command-fix S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants