-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use OsPath in NormalizedFilePath #446
Conversation
✅ Pull request refreshed |
@mergify refresh |
✅ Pull request refreshed |
I added a dimension to the test matrix, but Mergify is waiting for the old one. Strange. |
|
||
toNormalizedFilePath :: FilePath -> NormalizedFilePath | ||
toNormalizedFilePath fp = normalizedFilePath nuri nfp | ||
toNormalizedFilePath :: MonadThrow m => OsPathCompat -> m NormalizedFilePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelpj This change leads to about 70 compile errors in HLS. Do we have to fix them one by one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the failure here ultimately comes from decodeUtf
, right? Which ultimately represents the fact that a filepath is bytes and a URI should be text. So this is a meaningful failure. We could offer a partial version that really throws, but that also seems like a footgun for us in future.
The only other alternative I can see is to have URI/NormalizedUri also contain an OsPath
i.e. bytes rather than text. Unclear how bad that would be.
bea0b16
to
02fe4a8
Compare
@michaelpj After some rewrites, I'm finally happy with the current version. The most notable change is that now we always store a UTF-8 encoded And please only approve if you feel this is good. Let Mergify squash and merge it for us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking plausible! I worry that maybe this extra encoding/decoding might lose some of the benchmark improvements you were getting?
- name: Cabal configure | ||
shell: bash | ||
run: | | ||
if [ ${{ matrix.ospath }} = "true" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should include this as a flag in the package definition? force-ospath
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea!
Right v' -> | ||
return (NormalizedFilePath (internalNormalizedFilePathToUri v') v) | ||
|
||
encodeFilePath :: String -> ShortByteString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haddock?
fromNormalizedFilePath :: NormalizedFilePath -> FilePath | ||
fromNormalizedFilePath (NormalizedFilePath _ fp) = fp | ||
-- | Extracts 'FilePath' from 'NormalizedFilePath'. | ||
-- The function is total. The 'HasCallStack' constraint is added for debugging purpose only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little confusing as it immediately calls error! I get the argument: the invariant of NFP is that it contains a correctly encoded string so this should never fail.
However, I also bet we call this a lot. What would happen if we also cached the decoded filepath in NormalizedFilePath
? Would it blow out memory usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, memory usage is the point of using ShortByteString
or OsPath
instead of FilePath
. Despite the encoding/decoding, both CPU time and memory usage decreased in benchmarks!
I saw most use sites of toNormalizedFilePath
just extracts the FilePath
and pass it to some IO function, so the CPU is not the bottleneck.
normalizedFilePathToOsPath :: MonadThrow m => NormalizedFilePath -> m OsPath | ||
normalizedFilePathToOsPath = unsafePerformIO' . encodeFS . fromNormalizedFilePath | ||
|
||
unsafePerformIO' :: (MonadThrow m, NFData a) => IO a -> m a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc?
@@ -249,7 +265,7 @@ genValidUnicodeChar = arbitraryUnicodeChar `suchThat` isCharacter | |||
where isCharacter x = x /= '\65534' && x /= '\65535' | |||
|
|||
normalizedFilePathSpec :: Spec | |||
normalizedFilePathSpec = do | |||
normalizedFilePathSpec = beforeAll (setFileSystemEncoding utf8) $ do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does something actually fail if you don't do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, default file system encoding ignores the failure. See: https://hackage.haskell.org/package/base-4.17.0.0/docs/src/GHC.IO.Encoding.html#initFileSystemEncoding
So in the real world, it won't throw any exception. (But I want to test the exception)
Constructs 'NormalizedFilePath' from 'OsPath'. Throws 'IOException' if the conversion fails. | ||
-} | ||
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath | ||
osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting a little confused, would you mind writing a comment somewhere explaining the deal with the various encodings? The idea is something like: we always decode OsPaths using the system encoding into a string, and then we re-encode them as UTF8 to store them in NFP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have missed all the discussion in the PR, so I need a comment explaining why we don't do the obvious thing (which would be storing the OsPath
inside NormalizedFilePath
).
Oops, I missed the label! oh well, if you want to address any of the comments you can do a follow-up |
Does this still use old directory/base for file operations? Because all of them do encoding/decoding at the FFI boundary. The only thing you'll gain is lower memory footprint due to avoiding String. Can't say which one has the bigger impact. |
|
normalizedFilePath :: NormalizedUri -> FilePath -> NormalizedFilePath | ||
normalizedFilePath nuri nfp = NormalizedFilePath nuri nfp | ||
decodeFilePath :: ShortByteString -> Either UnicodeException String | ||
decodeFilePath = fmap T.unpack . T.decodeUtf8' . BS.fromShort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always speed this up with something like utf8-string
or rolling our own utf-8 decoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath | ||
osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of decodeFS
you could use the pure decodeWith
and isolate the unsafePerformIO
in a single global:
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> Either EncodingException NormalizedFilePath
osPathToNormalizedFilePath = fmap toNormalizedFilePath . decodeWith systemEnc
{-# NOINLINE systemEnc #-}
systemEnc :: TextEncoding
systemEnc = unsafePerformIO getFileSystemEncoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied in #453. Thank you!
* use filepath-compat * make NormalizedFilePath use OsPath * remove filepath-compat, use ShortByteString instead * fix ci * enable OS_PATH * upgrade ghc & don't set OS_PATH for ghc 8.6.5 in CI * remove outdated comment * skip a bad ci combination * extract OsPath related CPP to a standalone module * fix OsPath.Compat * add empty NormalizedUri and NormalizedFilePath * add partial function for compatibility & use decodeFS * run stylish-haskell * lock index-state * bump versions of lsp and lsp-types * fix self version bound * always use utf8 ShortByteString in NormalizedFilePath * revert test changes * remove OsPath.Compat * bump lsp-test version according to pvp * add test cases for OsPath * fix test case * add some doc comments
Closes #445. Current status:
filepath-compat
to avoid CPP, and always useOsPath
inNormalizedFilePath
NormalizedFilePath
. One forFilePath
, another forOsPath