
🐛 Sub-optimal diff highlighting in a simple change #246

Open
waldyrious opened this issue Jul 11, 2020 · 2 comments

Comments

@waldyrious
Contributor

Opening a separate issue for the topic originally raised in #245:

Here's the delta output (notice the within-line highlights of removed/added text):

[Screenshot 2020-07-11 at 22.37.38]

And here's what diff-so-fancy/diff-highlight shows (not the best possible result, but a better default):

Finally, here's delta with --word-diff-regex=. (i.e. treating every single character as a word), which gives the most accurate highlighting:

I'd expect delta to provide better highlighting by default. It doesn't need to be the best possible result shown above; doing the same as diff-highlight would have been satisfactory.
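To make the --word-diff-regex comparison concrete, here is a minimal Python sketch of token-level within-line diffing: each line is split into regex matches plus the text between them, and tokens that are not part of a longest common subsequence (computed with a dynamic-programming table) are flagged as changed. This is only a simplified stand-in for the dynamic-programming alignment delta uses, not delta's Rust implementation; tokenize and changed_tokens are hypothetical names, and \w+ is assumed here to be the default word regex.

```python
import re


def tokenize(line: str, word_regex: str) -> list[str]:
    """Split a line into regex matches plus the non-matching text between them."""
    tokens, pos = [], 0
    for m in re.finditer(word_regex, line):
        if m.start() > pos:
            tokens.append(line[pos:m.start()])  # separator run between two "words"
        if m.group():
            tokens.append(m.group())
        pos = m.end()
    if pos < len(line):
        tokens.append(line[pos:])  # trailing separator
    return tokens


def changed_tokens(old: list[str], new: list[str]) -> tuple[set[int], set[int]]:
    """Indices of tokens NOT on a longest common subsequence (LCS) of the two lines."""
    n, m = len(old), len(new)
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the start to recover which tokens are kept.
    i = j = 0
    kept_old, kept_new = set(), set()
    while i < n and j < m:
        if old[i] == new[j]:
            kept_old.add(i)
            kept_new.add(j)
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return set(range(n)) - kept_old, set(range(m)) - kept_new
```

For "total = count + 1" versus "total = counter + 1", the \w+ tokenization flags the whole words "count" and "counter", while "." as the word regex flags only the inserted "er". That is why the single-character regex looks more precise on small edits, at the cost of sometimes matching unrelated characters.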

@waldyrious
Contributor Author

Responding to @dandavison's comment in #245:

> Do you think diff-highlight and diff-so-fancy in delta should default to using the original, simpler, diff highlight algorithm? On the one hand there is something appealing in being able to say that delta --diff-highlight aims for a pixel-for-pixel emulation, and OTOH I do find that the dynamic programming algorithm often gives more helpful results.

I would prefer the highlighting algorithm to be kept orthogonal to the display style. diff-highlight's algorithm won't be better than delta's all the time (or, I imagine, even most of the time), so it would be a disservice to users to default to an algorithm that is worse on average just because they prefer a given visual style.
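For reference, the "original, simpler" algorithm works roughly like this, as far as I understand diff-highlight (a Perl script in git's contrib directory): take a removed line and its added counterpart, find their common prefix and common suffix, and highlight only what lies in between. Below is a rough Python sketch of that idea, not a port of the script: it ignores ANSI colour handling, and the [- -] and {+ +} markers are hypothetical stand-ins for the removed/added colours.

```python
def common_prefix_suffix(old: str, new: str) -> tuple[int, int]:
    """Return the lengths of the common prefix and the non-overlapping common suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix on the shorter line.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return prefix, suffix


def highlight_pair(old: str, new: str) -> tuple[str, str]:
    """Mark the changed middle of each line; [- -] and {+ +} stand in for colours."""
    prefix, suffix = common_prefix_suffix(old, new)
    if prefix == 0 and suffix == 0:
        # The lines share nothing at either end: leave both unhighlighted.
        return old, new

    def mark(line: str, open_: str, close: str) -> str:
        end = len(line) - suffix
        return line[:prefix] + open_ + line[prefix:end] + close + line[end:]

    return mark(old, "[-", "-]"), mark(new, "{+", "+}")
```

For example, "foo = bar(1)" versus "foo = baz(1)" yields "foo = ba[-r-](1)" and "foo = ba{+z+}(1)". Because only one contiguous span is marked, two small edits at opposite ends of a line ("a = f(x, y)" versus "b = f(x, z)") highlight almost the entire line, which is where an alignment-based approach tends to do better.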

@waldyrious
Contributor Author

I just came across a similar case, where diff-highlight's algorithm produces better output than delta's:

delta:
[Screenshot 2020-07-12 at 15.30.15]

diff-highlight:
[Screenshot 2020-07-12 at 15.30.47]

In semantic terms there is indeed no shared content between the two larger hunks, so diff-highlight is "right"* not to highlight anything in them. To be fair, though, markup characters and whitespace are shared between the hunks, by the very nature of HTML. Maybe delta's algorithm could ignore such markup characters (and perhaps whitespace) when calculating the similarity between two blocks?

* Not on its merits, but for the same reason a stopped clock is right twice a day: it simply doesn't try to match differently-sized hunks.
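A minimal sketch of the suggestion above, under clearly stated assumptions: lines are compared with Python's difflib.SequenceMatcher ratio (a stand-in, not delta's actual distance metric) after stripping a hypothetical set of HTML markup characters and all whitespace. It also contrasts the equal-size gate from the footnote with a hypothetical pairing step that only links lines whose stripped content is similar enough. MARKUP_CHARS, strip_noise, similarity, pair_equal_sized, pair_by_similarity and the 0.5 threshold are all illustrative choices.

```python
from difflib import SequenceMatcher

# Hypothetical choice of "markup" characters for HTML-like content.
MARKUP_CHARS = frozenset('<>/="')


def strip_noise(text: str) -> str:
    """Drop markup characters and all whitespace before comparing lines."""
    return "".join(ch for ch in text if ch not in MARKUP_CHARS and not ch.isspace())


def similarity(old: str, new: str, ignore_noise: bool = True) -> float:
    """SequenceMatcher ratio in [0, 1]; 1.0 means the (stripped) lines are identical."""
    if ignore_noise:
        old, new = strip_noise(old), strip_noise(new)
    return SequenceMatcher(None, old, new, autojunk=False).ratio()


def pair_equal_sized(removed: list[str], added: list[str]) -> list[tuple[str, str]]:
    """diff-highlight-style gate: pair lines in order only when the counts match."""
    if not removed or len(removed) != len(added):
        return []
    return list(zip(removed, added))


def pair_by_similarity(removed: list[str], added: list[str],
                       min_ratio: float = 0.5) -> list[tuple[str, str]]:
    """Hypothetical alternative: pair each removed line with the next added line
    whose stripped similarity reaches min_ratio (an arbitrary illustrative
    threshold), keeping pairs in order. Unpaired lines get no within-line highlights."""
    pairs, start = [], 0
    for old in removed:
        for j in range(start, len(added)):
            if similarity(old, added[j]) >= min_ratio:
                pairs.append((old, added[j]))
                start = j + 1
                break
    return pairs
```

On "<p>  cat  </p>" versus "<p>  dog  </p>", the stripped similarity is 0.4, while the raw strings score about 0.79 because of the shared tags and spaces. With the 0.5 threshold the two lines are left unpaired, giving the "no shared content" outcome by design rather than by accident, while a one-line edit inside an otherwise unchanged tag still gets paired even when the hunk sizes differ.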
