[rustfix] Two suggestions applying to the same span cause syntax errors #14699

blyxyas · 2024-10-16T22:50:39Z

Problem

Two suggestions all making changes to the same span have the potential to cause a syntax error.

Steps

cargo new --lib test-crate
Write the following code into src/lib.rs:

struct Foo(i64); 
impl Into<i64> for Foo {
    fn into(self) -> i64 {
        self.0.clone()
    }
}`

cargo clippy --fix

Possible Solution(s)

Making sure that a spans don't overlap each other when having made a suggestion. See more context in the linked issue

Notes

No response

Version

cargo version --verbose

The text was updated successfully, but these errors were encountered:

saites · 2024-10-19T02:12:07Z

The issue in the clippy repo suggestions the problem lies with apply_suggestions, but I think the real issue is with apply_solution, or perhaps with how cargo fix uses apply:

cargo/src/cargo/ops/fix.rs

Lines 983 to 989 in cf53cc5

    
           match fixed.apply(suggestion) { 
        
               Ok(()) => fixed_file.fixes_applied += 1, 
        
               // As mentioned above in `rustfix_crate`, we don't immediately 
        
               // warn about suggestions that fail to apply here, and instead 
        
               // we save them off for later processing. 
        
               Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()), 
        
           }

cargo/crates/rustfix/src/lib.rs

Lines 236 to 252 in cf53cc5

    
           /// Applies a suggestion to the code. 
        
           pub fn apply(&mut self, suggestion: &Suggestion) -> Result<(), Error> { 
        
               for solution in &suggestion.solutions { 
        
                   self.apply_solution(solution)?; 
        
               } 
        
               Ok(()) 
        
           } 
        
           /// Applies an individual solution from a [`Suggestion`]. 
        
           pub fn apply_solution(&mut self, solution: &Solution) -> Result<(), Error> { 
        
               for r in &solution.replacements { 
        
                   self.data 
        
                       .replace_range(r.snippet.range.clone(), r.replacement.as_bytes())?; 
        
                   self.modified = true; 
        
               } 
        
               Ok(()) 
        
           }

When a Solution consists of multiple Replacements, calling apply_solution results in multiple calls to replace_range, and a failing call may have been preceded by one or more successful calls, ultimately leading to a partially applied Solution. The same principle is true with respect to apply if the Suggestion consists of multiple Solutions.

In this particular case, the Suggestion for from_over_into has a Solution that consists of multiple Replacements¹. One of those overlaps with the (single) Replacement in the (single) Solution for clone_on_copy. The suggestion for clone_on_copy is applied first, replacing self.0.clone() with self.0. Then when attempting to apply the suggestion for from_over_into, the first several Replacements are applied successfully, but when reaching the overlapping span, apply returns Err(Error::AlreadyReplaced). The partially-applied Suggestion is written, cargo fix sees that the overall process returned an error, and so it attempts a second round -- now with code that cannot compile.

It seems to me that most reasonable fixes are:

Change apply and apply_solution to work atomically, so that if it returns a Result::Err, self.data is in the state it had been before the call.
Change rustfix_and_fix to handle the lack of atomicity².

The advantage of the former is that it would prevent similar mistakes in the future, but it would require all callers pay the cost associated with whatever solution provides the atomic application -- it is reasonable that a caller might only wish to discard attempts that result in errors.

Making the change in rustfix_and_fix allows trading complexity for performance: the simplest version would simply clone the content before calling apply, and only keep the result if there were no errors. This could be made more efficient by only cloning if there are multiple Replacements to be applied. The logic could get even more detailed by checking whether Suggestions involve overlapping ranges -- though whether that would be more efficient is better left to profiling.

The generated spans are:

{"message":"replace the `Into` implementation with `From<Foo>`","spans":[[23,27],[28,31],[37,40],[50,54],[55,59],[64,67],[78,82]]}
{"message":"try removing the `clone` call","spans":[[78,92]]

as generated by

cargo clippy --message-format=json | \
  jq -c '.message.children[]? | select(.spans | any) 
      | {message, spans: [ .spans[] | [.byte_start, .byte_end ]]}'

↩

Also, there maybe should be some documentation on apply/apply_solution warning that if it returns an Err, then Solutions/Suggestions may have been partially applied. ↩

weihanglo · 2024-10-19T03:06:52Z

Thanks @saites for this. I planned to draft out a similar reply and your writing is much better than what I had in my mind!

Perhaps the easiest approach to make it "just work" is this patch

diff --git a/src/cargo/ops/fix.rs b/src/cargo/ops/fix.rs
index ba3c9f2666c..571f7603369 100644
--- a/src/cargo/ops/fix.rs
+++ b/src/cargo/ops/fix.rs
@@ -980,12 +980,14 @@ fn rustfix_and_fix(
             {
                 continue;
             }
-            match fixed.apply(suggestion) {
-                Ok(()) => fixed_file.fixes_applied += 1,
-                // As mentioned above in `rustfix_crate`, we don't immediately
-                // warn about suggestions that fail to apply here, and instead
-                // we save them off for later processing.
-                Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()),
+            for solution in &suggestion.solutions {
+                match fixed.apply_solution(solution) {
+                    Ok(()) => fixed_file.fixes_applied += 1,
+                    // As mentioned above in `rustfix_crate`, we don't immediately
+                    // warn about suggestions that fail to apply here, and instead
+                    // we save them off for later processing.
+                    Err(e) => fixed_file.errors_applying_fixes.push(e.to_string()),
+                }
             }
         }
         if fixed.modified() {

Though this needs more designs in rustfix_and_fix for reporting to users about what suggestion has failed.

We could also extend the CodeFix::apply to have an extra MergeStrategy argument, having variants like FailFast (the current) and PrioritizeFirst (first come first served), and so on.

saites · 2024-10-21T19:05:36Z

No problem -- glad to help.

Perhaps the easiest approach to make it "just work" is this patch [...]

I think that'll have the same issue, as in this case, the Solution consists of multiple Replacements. Some of the Replacements will get applied, but when one fails, the code will be left in an invalid state.

I implemented a very lazy fix that successfully addresses the issue by adjusting the code in rustfix/lib.rs

     /// Applies a suggestion to the code.
     pub fn apply(&mut self, suggestion: &Suggestion) -> Result<(), Error> {
+        let mut prev = self.clone();
         for solution in &suggestion.solutions {
-            self.apply_solution(solution)?;
+            prev.apply_solution(solution)?;
         }
+        self.data = prev.data;
+        self.modified = prev.modified;
         Ok(())
     }

This can be made more efficient by first checking if the Suggestion has only Solution with only one Replacement, in which case no clone is necessary.

But I think a better and more efficient approach is to introduce transactional semantics directly on replace::Data. If it records changes as "pending", it could have a pair of commit() and restore() methods to keep or discard those changes. Using it could look like this:

     pub fn apply_solution(&mut self, solution: &Solution) -> Result<(), Error> {
         for r in &solution.replacements {
             self.data
-                .replace_range(r.snippet.range.clone(), r.replacement.as_bytes())?;
+                .replace_range(r.snippet.range.clone(), r.replacement.as_bytes())
+                .inspect_err(|_| self.data.restore())?;
-            self.modified = true;
         }
+        self.data.commit();
+        self.modified = true;
         Ok(())
     }

As for implementation, one approach could be to just extend the existing State enumeration with a pair of Uncommitted variants (Initial would need no such variation):

enum State {
     /// The initial state. No change applied.
     Initial,
+    /// Intends to be replaced.
+    UncommittedReplace(Rc<[u8]>),
+    /// Intends to be inserted.
+    UncommittedInsert(Rc<[u8]>),
     /// Has been replaced.
     Replaced(Rc<[u8]>),
     /// Has been inserted.
     Inserted(Rc<[u8]>),
}

impl State {
     fn is_inserted(&self) -> bool {
-        matches!(*self, State::Inserted(..))
+        matches!(*self, State::Inserted(..) | State::UncommittedInsert(..))
     }
}

I think it's most sensible for to_vec to only make use of committed changes, but if you wanted this change to be transparent to callers, you'd simply have the Uncommitted* variants work as the existing ones:

    pub fn to_vec(&self) -> Vec<u8> {
        if self.original.is_empty() {
            return Vec::new();
        }

        self.parts.iter().fold(Vec::new(), |mut acc, d| {
            match d.data {
                State::Initial => acc.extend_from_slice(&self.original[d.start..d.end]),
+               State::UncommittedReplace(ref d) | State::UncommittedInsert(ref d) |
                State::Replaced(ref d) | State::Inserted(ref d) => acc.extend_from_slice(d),
            };
            acc
        })
    }

As for replace_range, it would only insert uncommitted changes:

            // New part
            new_parts.push(Span {
                 start: range.start,
                 end: range.end,
                 data: if insert_only {
-                    State::Inserted(data.into())
+                    State::UncommittedInsert(data.into())
                 } else {
-                    State::Replaced(data.into())
+                    State::UncommittedReplace(data.into())
                 },
             });

If you are OK with a behavior change to Data (as I'm pretty sure it's only used by rustfix within lib.rs), then since each Suggestion is either applied successfully or leaves the data in its previous state, we can get rid of the special handling for duplicate replacements, and it'll still work for rust-lang/rust/issues/51211 and even fix /issues/13027.

             let part_to_split = &self.parts[index_of_part_to_split];

-            // If this replacement matches exactly the part that we would
-            // otherwise split then we ignore this for now. This means that you
-            // can replace the exact same range with the exact same content
-            // multiple times and we'll process and allow it.
-            //
-            // This is currently done to alleviate issues like
-            // rust-lang/rust#51211 although this clause likely wants to be
-            // removed if that's fixed deeper in the compiler.
-            if part_to_split.start == range.start && part_to_split.end == range.end {
-                if let State::Replaced(ref replacement) = part_to_split.data {
-                    if &**replacement == data {
-                        return Ok(());
-                    }
-                }
-            }
-
             if part_to_split.data != State::Initial {
                 return Err(Error::AlreadyReplaced);
             }

The commit method converts Uncommitted* parts to the original variants:

    /// Commit pending changes.
    pub fn commit(&mut self) {
        use State::*;
        for part in &mut self.parts {
            part.data = match part.data {
                UncommittedInsert(ref data) => Inserted(data.clone()),
                UncommittedReplace(ref data) => Replaced(data.clone()),
                Initial | Inserted(..) | Replaced(..) => continue,
            }
        }
    }

The restore would discard uncommitted inserts. Because replace_range currently expects no two Initial parts are adjacent, it has to convert uncommitted replacements into Initial parts and merge them when necessary:

    pub fn restore(&mut self) {
        use State::*;

        let mut restored = Vec::with_capacity(self.parts.len());

        for part in self.parts.drain(..) {
            match part.data {
                Replaced(..) | Inserted(..) => { 
                    // Keep committed changes.
                    restored.push(part); 
                },
                UncommittedInsert(..) => {
                    // Drop uncommitted inserts.
                },
                UncommittedReplace(..) | Initial => {
                    // Convert uncommitted replacements back to Initial parts.
                    // Dropping uncommitted replacements can leave adjacent Initial parts,
                    // so if the previously retained part was also Initial, merge this one with it.
                    if let Some(last @ Span{ data: Initial, ..  }) = restored.last_mut() {
                        last.end = part.end;
                    } else {
                        restored.push(Span { start: part.start, end: part.end, data: Initial });
                    }
                },
            }
        }

        self.parts = restored;
    }

To me, this set of changes seems like the best trade-off among fixing the issue, reducing changes, and maintaining performance. That said, if you're up for something more radical, I also think it'd be reasonable to change the implementation for replace.rs almost entirely -- instead of tracking parts the way it currently does, it could just push incoming changes into a collection sorted by start position, and check for conflicts at that moment (if we want to insert incoming just after the element prior, it conflicts if and only if $prior.start \le incoming.start \le prior.end$). Rendering the result would become a merge operation. Tracking committed versus uncommitted could just be a bool on the Span, and there wouldn't be a need for the State enumeration, since Initial data would be implicit (as the gaps between changes) and the difference between Replacement and Insertion is already implicitly based on whether start == end.

In any case, I've gone on long enough 😅. I'd be happy to submit a PR for (any or all of) the changes I discussed in this comment, if these ideas seem reasonable.

weihanglo · 2024-10-22T01:27:32Z

@saites Thanks for the inspiring solution! I think I quite like the idea of transaction.

I think that'll have the same issue, as in this case, the Solution consists of multiple Replacements. Some of the Replacements will get applied, but when one fails, the code will be left in an invalid state.

I just realized that I was targeting at the wrong level of the code. The current Cargo never generates broken code because it'll revert when compile error happened, unless --broken-code was given, or ctrl-C was signaled before reverting. This is done at per-file level.

While your patch is at a lower Solution level, the fact that the entire solution rolls back makes the previous applied Suggestion stays valid. As a result Cargo is happy to try on next Suggestion, which could maximize the number of Suggestions cargo-fix applies in one iteration. This is exactly the PrioritizeFirst strategy I proposed but dealing it in a smarter way and lower level. Brilliant!

Really looking forward to your pull request :)

blyxyas added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Oct 16, 2024

weihanglo added the Command-fix label Oct 17, 2024

weihanglo added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-triage Status: This issue is waiting on initial triage. labels Oct 19, 2024

weihanglo added S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review and removed S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. labels Oct 22, 2024

saites mentioned this issue Oct 30, 2024

Add transactional semantics to rustfix #14747

Merged

bors closed this as completed in 3dedb85 Nov 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

blyxyas commented Oct 16, 2024

saites commented Oct 19, 2024

weihanglo commented Oct 19, 2024

saites commented Oct 21, 2024

weihanglo commented Oct 22, 2024

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

[rustfix] Two suggestions applying to the same span cause syntax errors #14699

Comments

blyxyas commented Oct 16, 2024

Problem

Steps

Possible Solution(s)

Notes

Version

saites commented Oct 19, 2024

Footnotes

weihanglo commented Oct 19, 2024

saites commented Oct 21, 2024

weihanglo commented Oct 22, 2024