Automatically detect (TDL) file encoding from first line #169
Comments
@mcmillanmajora I think this is pretty straightforward, but let me know if anything is unclear.
For extracting the encoding from the line, can I assume that the encoding will be bounded by |
Not quite. The
Also, I think those |
Okay, I think I've got a solution. What's the best way to go about setting up tests for this? I'm not entirely sure how to test it without making a bunch of test case files.
Create a pytest fixture that sets up a temporary directory and writes some files to it, e.g., one with the And this might be helpful if the documentation is confusing: https://stackoverflow.com/questions/36070031/creating-a-temporary-directory-in-pytest
Btw I already have a solution for #172, so don't worry about that part.
Sorry I rushed those comments before I went for dinner. To clarify, the pytest fixture should create a temporary directory and write the files each time the tests are run, and they will be cleaned up afterwards. Except for creating the files, much of that should be automatically handled by pytest. There will not be any files that are permanently created and checked into the repo.
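For anyone following along, here is a minimal sketch of what such a fixture could look like. It assumes pytest's built-in `tmp_path` fixture; the names `SAMPLES`, `write_samples`, and `tdl_files`, the file names, and the sample first lines are all made up for illustration and are not taken from the repo.

```python
from pathlib import Path

import pytest

# Hypothetical sample files: {filename: (encoding, text)}.
# The first line imitates an Emacs-style header; the second holds a non-ASCII
# character inside a TDL comment so a wrong encoding would be noticeable.
SAMPLES = {
    "utf8.tdl": ("utf-8", ";;; -*- Mode: TDL; Coding: utf-8 -*-\n; ä\n"),
    "latin1.tdl": ("latin-1", ";;; -*- Mode: TDL; Coding: latin-1 -*-\n; ä\n"),
}


def write_samples(directory: Path) -> dict:
    """Write each sample using its own encoding and return {filename: path}."""
    paths = {}
    for name, (encoding, text) in SAMPLES.items():
        path = directory / name
        # Pass the encoding explicitly instead of relying on the platform default.
        path.write_text(text, encoding=encoding)
        paths[name] = path
    return paths


@pytest.fixture
def tdl_files(tmp_path):
    """Create the sample files in a fresh temporary directory for each test."""
    return write_samples(tmp_path)


def test_samples_round_trip(tdl_files):
    # Each file should decode back to its original text with its own encoding.
    for name, (encoding, text) in SAMPLES.items():
        assert tdl_files[name].read_text(encoding=encoding) == text
```

Keeping the file-writing logic in a plain helper means the fixture itself stays a one-liner, and nothing is written into the repository.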
Here's some more info on using You can also see how it's used in the mrs_semi_test.py unit test file. It looks like magic because the Here's some example test files:
@mcmillanmajora did you have some code to commit? If so, please submit a PR and we can fix remaining issues there.
Note: temporary files opened with pytest appear to default to the ASCII codec, so there's a Unicode error when trying to write non-ASCII characters to the temporary files.
Unicode error fixed; was missing a fixture declaration.
TDL files seem to follow the emacs convention of having the first line specify the encoding, e.g.:
But often there's this odd string:
If we could detect these encoding statements, we could (re)open the file with the specified encoding, instead of making the user specify it themselves. Other settings on this line can probably be ignored, such as `Mode: TDL` and `indent-tabs-mode: nil`.

This may be useful for non-TDL files as well, so it could be implemented in `delphin.util`. Here's some pseudocode:

Also see here: https://www.python.org/dev/peps/pep-0263/#defining-the-encoding
Edit: changed function to return encoding instead of an open file object
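A rough sketch of the idea, not the actual implementation: read the first line as bytes, look for a `coding:` or `coding=` declaration in the spirit of PEP 263, and return the encoding name. The function name `detect_encoding`, the regular expression, and the `utf-8` default are assumptions made for illustration.

```python
import codecs
import re

# Loosely modeled on the PEP 263 pattern; matches e.g. "Coding: utf-8" anywhere
# on the line, ignoring case. Hypothetical pattern, not taken from the project.
_CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)", re.IGNORECASE)


def detect_encoding(path, default="utf-8"):
    """Return the encoding declared on the first line of *path*, else *default*."""
    # Read bytes so no encoding has to be guessed up front. This only works
    # for ASCII-compatible encodings (it would not find a UTF-16 header).
    with open(path, "rb") as f:
        first_line = f.readline()
    match = _CODING_RE.search(first_line)
    if match is None:
        return default
    name = match.group(1).decode("ascii")
    try:
        # Reject names Python has no codec for and fall back to the default.
        codecs.lookup(name)
    except LookupError:
        return default
    return name
```

A caller could then reopen the file with `open(path, encoding=detect_encoding(path))`. Other settings on the line are skipped simply because the pattern only looks at the `coding` entry.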