Reintroduce header patterns for filetype detection #3208
Merged
The original meaning of foundDef was: "we already found the final syntax definition in a user's custom syntax file". After signatures were introduced, its meaning became: "we found some potential syntax definition in a user's custom syntax file, but we don't know yet if it's the final one". This makes the code confusing and actually buggy. At least one bug: if we find some potential filename matches in the user's custom syntax files, we don't search for more matches in the built-in syntax files. That is wrong: we should keep searching for as many potential matches as possible, in both the user's and the built-in syntax files, to select the best one among them. Fix that by restoring the original meaning of foundDef and updating the logic accordingly.
No need to parse a syntax YAML file if we are not going to use it; that is a waste of CPU cycles.
As a preparation for reintroducing header matches.
Replacing header patterns with signature patterns was a mistake, since both have their own uses. So restore support for header regex, while keeping support for signature regex as well.
Replacing header patterns with signature patterns was a mistake, since both are quite different from each other, and both have their uses. In fact, this caused a serious regression: for such files as shell scripts without *.sh extension but with #!/bin/sh inside, filetype detection does not work at all anymore.

Since both header and signature patterns are useful, reintroduce support for header patterns while keeping support for signature patterns as well and make both work nicely together.

Also, unlike in the old implementation (before signatures were introduced), ensure that filename matches take precedence over header matches, i.e. if there is at least one filename match found, all header matches are ignored. This makes the behavior more deterministic and prevents previously observed issues like zyedidia#2894 and zyedidia#3054: wrongly detected filetypes caused by some overly general header patterns.

Precisely, the new behavior is:
1. if there is at least one filename match, use filename matches only
2. if there are no filename matches, use header matches
3. in both cases, try to use signatures to find the best match among multiple filename or header matches
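The three rules above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: `syntaxDef`, `newDef`, `detect` and `signatureLineLimit` are hypothetical names, the 100-line limit is an assumed value, and falling back to the first candidate when no signature matches is an assumption as well.

```go
package main

import "regexp"

// signatureLineLimit is an assumed value for how many leading lines are
// scanned for signature patterns.
const signatureLineLimit = 100

// syntaxDef is a hypothetical, simplified stand-in for a parsed syntax file.
type syntaxDef struct {
	name      string
	filename  *regexp.Regexp // matched against the file name; nil if absent
	header    *regexp.Regexp // matched against the first line only; nil if absent
	signature *regexp.Regexp // matched against the leading lines; nil if absent
}

func compileOrNil(pattern string) *regexp.Regexp {
	if pattern == "" {
		return nil
	}
	return regexp.MustCompile(pattern)
}

// newDef builds a syntaxDef from pattern strings; "" means "no such pattern".
func newDef(name, filename, header, signature string) syntaxDef {
	return syntaxDef{
		name:      name,
		filename:  compileOrNil(filename),
		header:    compileOrNil(header),
		signature: compileOrNil(signature),
	}
}

// detect returns the name of the chosen definition, or "" if nothing matches.
func detect(defs []syntaxDef, fname string, lines []string) string {
	first := ""
	if len(lines) > 0 {
		first = lines[0]
	}

	var byName, byHeader []syntaxDef
	for _, d := range defs {
		if d.filename != nil && d.filename.MatchString(fname) {
			byName = append(byName, d)
		} else if d.header != nil && d.header.MatchString(first) {
			byHeader = append(byHeader, d)
		}
	}

	// Rules 1 and 2: any filename match makes all header matches irrelevant.
	candidates := byName
	if len(candidates) == 0 {
		candidates = byHeader
	}
	if len(candidates) == 0 {
		return ""
	}

	// Rule 3: with several candidates, prefer one whose signature matches.
	if len(candidates) > 1 {
		for _, d := range candidates {
			if d.signature == nil {
				continue
			}
			for i, line := range lines {
				if i >= signatureLineLimit {
					break
				}
				if d.signature.MatchString(line) {
					return d.name
				}
			}
		}
	}

	// Assumed fallback: the first candidate wins when signatures do not decide.
	return candidates[0].name
}
```

With definitions for C and C++ that both match `*.h` by filename, a shebang in the first line is ignored for a file named `foo.h`, while an extensionless script starting with `#!/bin/sh` is still detected through the header pattern of a shell definition.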
Turning `header` patterns into `signature` patterns in all syntax files was a mistake. The two are different things. In almost all syntax files those patterns are things like shebangs or <?xml ... ?> or <!DOCTYPE html5> i.e. things that:
1. can be (and should be) used for detecting the filetype when there is no `filename` match (and that is actually the purpose of those patterns, so it's a regression that it doesn't work anymore).
2. should only occur in the first line of the file, not in the first 100 lines or so.

In other words, the old `header` semantics was exactly what was needed for those filetypes, while the new `signature` semantics makes little sense for them.

So replace `signature` back with `header` in most syntax files. Keep `signature` only in C++ and Objective-C syntax files, for which it was actually introduced.
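To illustrate the difference in semantics described above, here is a small Go sketch. The helper names `matchesHeader` and `matchesSignature` and the constant `signatureLineLimit` are hypothetical and only serve the illustration; the limit of 100 lines follows the "first 100 lines or so" wording and is otherwise an assumption.

```go
package main

import "regexp"

// signatureLineLimit is an assumed value, based on "the first 100 lines or so".
const signatureLineLimit = 100

// matchesHeader applies the pattern to the first line of the file only.
func matchesHeader(pattern string, lines []string) bool {
	if len(lines) == 0 {
		return false
	}
	return regexp.MustCompile(pattern).MatchString(lines[0])
}

// matchesSignature applies the pattern to each of the leading lines
// until one matches or the limit is reached.
func matchesSignature(pattern string, lines []string) bool {
	re := regexp.MustCompile(pattern)
	for i, line := range lines {
		if i >= signatureLineLimit {
			break
		}
		if re.MatchString(line) {
			return true
		}
	}
	return false
}
```

For a text file whose second line happens to contain `<?xml`, the pattern `<\?xml` matches under signature semantics but not under header semantics, which is why patterns of this kind fit `header` better.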
To make it more clear. Why Buffer?
Purely cosmetic change: make the code a bit more readable by reducing its visual "density".
JoeKar
reviewed
Mar 24, 2024
Thank you for taking care and cleaning it up that far!
JoeKar
approved these changes
Mar 24, 2024
Couldn't find a problem so far and my test files are supported in the expected way.
Replacing `header` patterns with `signature` patterns in #2819 was a mistake, since both are quite different from each other, and both have their uses. In fact, this caused a serious regression: for such files as shell scripts without `*.sh` extension but with `#!/bin/sh` inside (and other similar common use cases that have no `filename` matches and thus were relying on `header` matches), filetype detection does not work at all anymore.

Since both header and signature patterns are useful, reintroduce support for header patterns while keeping support for signature patterns as well, and make both work nicely together.

Also, unlike in the old implementation (before signatures were introduced), ensure that `filename` matches take precedence over `header` matches, i.e. if there is at least one filename match found, all header matches are ignored. This makes the behavior more deterministic and prevents previously observed issues like #2894 and #3054: wrongly detected filetypes caused by some overly general header patterns.

Precisely, the new behavior is:
1. if there is at least one filename match, use filename matches only
2. if there are no filename matches, use header matches
3. in both cases, try to use signatures to find the best match among multiple filename or header matches

Changed `signature` patterns back to `header` patterns in almost all syntax files, except C++ and Objective-C (for which `signature` was actually introduced).

Also did a bit of refactoring and bugfixing of the code in `UpdateRules()` (see commit messages for the details).

Fixes #3201