Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[idea] Improving accuracy of change application #7

Open
gveres opened this issue Oct 27, 2024 · 4 comments
Open

[idea] Improving accuracy of change application #7

gveres opened this issue Oct 27, 2024 · 4 comments

Comments

@gveres
Copy link
Contributor

gveres commented Oct 27, 2024

Hi all!
I've been having issues with the llm messing up the line numbers for the suggested changes (working with Claude Sonnet 3.5, the same model that Cursor.ai seems to be using). I have a version that clearly tags all lines with their original line numbers to make it more explicit to the LLM, which improves the situation somewhat, but issues still remain especially for large files.

After giving this some thought and discussing with Claude (of course) about possible improvements, I'm thinking about implementing something like this:

  • before sending code region to the llm, I'd assign random non-numeric IDs to each line and store those as text properties on the first character of each line
  • send the ids together with the code to the llm
  • instruct the llm to use these id-s when referring to lines
  • on change application, use a word-distance tolerance to find the referred lines even in case of slight id-messing/hallucination

It's quite a bit more involved than the line-number based solution, but should possibly lead to more accurate results maybe...
Any thoughts on this?

@lanceberge
Copy link
Owner

lanceberge commented Oct 27, 2024

Hey, I really appreciate the fact that you're using the package, that you've brought this issue to my attention, and that you've put all of this thought into improving it!

The current heuristic is to calculate the line change offset of each change. This is the difference between the lines added and the lines removed for each change. We then add 3 to represent the git merge >>>>, <<<<, ==== lines.

In theory, the current approach should be sufficient - just having the line numbers is likely to be the simplest and most minimal solution. My understanding of your approach is that we would need something like a map of line IDs to the previous line number. But then we still need to maintain these lines offsets to map changes to the previous line numbers - because the previous line numbers will be changing as changes get added to the buffer. Is there a different approach you have in mind to map these line IDs to the lines themselves?

Before implementing this, could you provide some detail on how to reproduce this issue? You mentioned that it's an issue mostly with large files. Were you calling elysium-query with a region selected or without it? If it's without it, then it sends the entire buffer to the query, otherwise, it only sends the selected region. If a region was selected, was it a rather large region (~100 lines)? Around how many sets of changes (how many git diff hunks) did elysium add to the buffer when you saw the issue

I suspect that the issue is a bug in the logic where I parse the feedback of the LLM and calculate the line offsets. If you can provide a repro, then I can try to replicate it with my unit tests and look into a solution. If that doesn't work, then I'd love to give further consideration to what you've proposed.

Again, thanks!

@gveres
Copy link
Contributor Author

gveres commented Oct 28, 2024

Hi Lance!

I'll put together a reproduction, but here's what my debugging shows:

  • the change extraction and application logic (in elysium itself) is solid
  • the LLM itself (Claude Sonnet in my case), despite very clear instructions seems to have line numbers wrong sometimes in the recommendations
  • I have an experimental branch, where I label all lines with line numbers before sending to the llm (so that it doesn't need to count) and it does improve the situation, but it still makes random +/-1 errors, sometimes a bit bigger
  • reading up on this I have found that LLMs are not strong in working with numbers (or even echoing them back), but are much more precise with "word-like" things
  • so this gave me the idea that if the LLM can refer to alphabetical IDs when talking about lines, it might actually become more accurate

I hope the above makes some sense - I'll do a bit of prototyping and see where it takes me.
I really like the package, I ultimately would like to have the recommend-diff-accept workflow working well in Emacs the way it does in cursor.ai and I think it's a very good foundation as-is.

@lanceberge
Copy link
Owner

Cool, that makes a lot of sense that a model optimized for parsing language would be better at handling alphabetical IDs than numeric. Is the overall idea to have a map of these alphabetical IDs to line numbers, then to maintain the same line-offset logic as the changes get patched in? Also I'd be happy to take a look at the branch that puts numbers before each line. This does seem like a much better technique than just sending the model the overall line numbers in the initial prompt then trusting it to maintain that.

Regarding improving the diff workflow, I haven't used cursor too much. I did have an idea of having a transient buffer with keybindings - something like a hydra that pops up after Elysium changes get applied. It would have bindings like n (next hunk) p (previous) a (accept) r (reject) A (accept all), etc. Would this level the playing field with cursor's diff-accepting or is there something else you had in mind?

@mavenlin
Copy link

I did have an idea of having a transient buffer with keybindings - something like a hydra that pops up after Elysium changes get applied. It would have bindings like n (next hunk) p (previous) a (accept) r (reject) A (accept all), etc. Would this level the playing field with cursor's diff-accepting or is there something else you had in mind?

Very emacsy idea! One idea I have is to reuse magit's popup for some of the functionalities. For example, we can simply apply all the suggested diffs and then use magit to interactively stage the hunks. Or maybe the conflict resolving mode of magit, where three buffers: ours, theirs and merged is shown. Maybe we could reuse some of these components to build a smoother flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants