Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use OsPath in NormalizedFilePath #446

Merged
merged 27 commits into from
Aug 31, 2022
Merged

Conversation

kokobd
Copy link
Collaborator

@kokobd kokobd commented Jul 25, 2022

Closes #445. Current status:

  • existing tests pass
  • breaks backward compatibility due to the necessity to add MonadThrow to the type of some functions
  • use filepath-compat to avoid CPP, and always use OsPath in NormalizedFilePath
  • provide two sets of functions for NormalizedFilePath. One for FilePath, another for OsPath

lsp-types/src/Language/LSP/Types/Uri.hs Show resolved Hide resolved
lsp-types/src/Language/LSP/Types/Uri.hs Show resolved Hide resolved
lsp-types/src/Language/LSP/Types/Uri.hs Outdated Show resolved Hide resolved
lsp-types/src/Language/LSP/Types/Uri.hs Outdated Show resolved Hide resolved
@kokobd kokobd marked this pull request as ready for review August 11, 2022 21:53
@mergify
Copy link
Contributor

mergify bot commented Aug 14, 2022

refresh

✅ Pull request refreshed

@kokobd
Copy link
Collaborator Author

kokobd commented Aug 14, 2022

@mergify refresh

@mergify
Copy link
Contributor

mergify bot commented Aug 14, 2022

refresh

✅ Pull request refreshed

@kokobd
Copy link
Collaborator Author

kokobd commented Aug 14, 2022

I added a dimension to the test matrix, but Mergify is waiting for the old one. Strange.


toNormalizedFilePath :: FilePath -> NormalizedFilePath
toNormalizedFilePath fp = normalizedFilePath nuri nfp
toNormalizedFilePath :: MonadThrow m => OsPathCompat -> m NormalizedFilePath
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelpj This change leads to about 70 compile errors in HLS. Do we have to fix them one by one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the failure here ultimately comes from decodeUtf, right? Which ultimately represents the fact that a filepath is bytes and a URI should be text. So this is a meaningful failure. We could offer a partial version that really throws, but that also seems like a footgun for us in future.

The only other alternative I can see is to have URI/NormalizedUri also contain an OsPath i.e. bytes rather than text. Unclear how bad that would be.

@kokobd kokobd force-pushed the feat/os-path branch 2 times, most recently from bea0b16 to 02fe4a8 Compare August 28, 2022 05:38
@kokobd
Copy link
Collaborator Author

kokobd commented Aug 29, 2022

@michaelpj After some rewrites, I'm finally happy with the current version. The most notable change is that now we always store a UTF-8 encoded ShortByteString in NormalizedFilePath, so that toNormalizedFilePath :: FilePath -> NormalizedFilePath is still total.


And please only approve if you feel this is good. Let Mergify squash and merge it for us.

Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking plausible! I worry that maybe this extra encoding/decoding might lose some of the benchmark improvements you were getting?

- name: Cabal configure
shell: bash
run: |
if [ ${{ matrix.ospath }} = "true" ]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should include this as a flag in the package definition? force-ospath or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

cabal.project Show resolved Hide resolved
lsp-types/src/Language/LSP/Types/Uri.hs Show resolved Hide resolved
Right v' ->
return (NormalizedFilePath (internalNormalizedFilePathToUri v') v)

encodeFilePath :: String -> ShortByteString
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haddock?

lsp-types/src/Language/LSP/Types/Uri.hs Show resolved Hide resolved
fromNormalizedFilePath :: NormalizedFilePath -> FilePath
fromNormalizedFilePath (NormalizedFilePath _ fp) = fp
-- | Extracts 'FilePath' from 'NormalizedFilePath'.
-- The function is total. The 'HasCallStack' constraint is added for debugging purpose only.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing as it immediately calls error! I get the argument: the invariant of NFP is that it contains a correctly encoded string so this should never fail.

However, I also bet we call this a lot. What would happen if we also cached the decoded filepath in NormalizedFilePath? Would it blow out memory usage?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, memory usage is the point of using ShortByteString or OsPath instead of FilePath. Despite the encoding/decoding, both CPU time and memory usage decreased in benchmarks!

I saw most use sites of toNormalizedFilePath just extracts the FilePath and pass it to some IO function, so the CPU is not the bottleneck.

normalizedFilePathToOsPath :: MonadThrow m => NormalizedFilePath -> m OsPath
normalizedFilePathToOsPath = unsafePerformIO' . encodeFS . fromNormalizedFilePath

unsafePerformIO' :: (MonadThrow m, NFData a) => IO a -> m a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc?

@@ -249,7 +265,7 @@ genValidUnicodeChar = arbitraryUnicodeChar `suchThat` isCharacter
where isCharacter x = x /= '\65534' && x /= '\65535'

normalizedFilePathSpec :: Spec
normalizedFilePathSpec = do
normalizedFilePathSpec = beforeAll (setFileSystemEncoding utf8) $ do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does something actually fail if you don't do this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, default file system encoding ignores the failure. See: https://hackage.haskell.org/package/base-4.17.0.0/docs/src/GHC.IO.Encoding.html#initFileSystemEncoding
image

So in the real world, it won't throw any exception. (But I want to test the exception)

Constructs 'NormalizedFilePath' from 'OsPath'. Throws 'IOException' if the conversion fails.
-}
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath
osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting a little confused, would you mind writing a comment somewhere explaining the deal with the various encodings? The idea is something like: we always decode OsPaths using the system encoding into a string, and then we re-encode them as UTF8 to store them in NFP?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have missed all the discussion in the PR, so I need a comment explaining why we don't do the obvious thing (which would be storing the OsPath inside NormalizedFilePath).

@mergify mergify bot merged commit 78dc807 into haskell:master Aug 31, 2022
@michaelpj
Copy link
Collaborator

Oops, I missed the label! oh well, if you want to address any of the comments you can do a follow-up

@hasufell
Copy link
Member

Looking plausible! I worry that maybe this extra encoding/decoding might lose some of the benchmark improvements you were getting?

Does this still use old directory/base for file operations? Because all of them do encoding/decoding at the FFI boundary. The only thing you'll gain is lower memory footprint due to avoiding String.

Can't say which one has the bigger impact.

@kokobd
Copy link
Collaborator Author

kokobd commented Aug 31, 2022

Does this still use old directory/base for file operations?

NormalizedFilePath in lsp-types doesn't use directory or some FilePath based IO functions. In HLS, we still use the old directory/base. (HLS can not upgrade before GHC 9.6)

@kokobd kokobd deleted the feat/os-path branch August 31, 2022 17:41
normalizedFilePath :: NormalizedUri -> FilePath -> NormalizedFilePath
normalizedFilePath nuri nfp = NormalizedFilePath nuri nfp
decodeFilePath :: ShortByteString -> Either UnicodeException String
decodeFilePath = fmap T.unpack . T.decodeUtf8' . BS.fromShort
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always speed this up with something like utf8-string or rolling our own utf-8 decoding

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +28 to +29
osPathToNormalizedFilePath :: MonadThrow m => OsPath -> m NormalizedFilePath
osPathToNormalizedFilePath = fmap toNormalizedFilePath . unsafePerformIO' . decodeFS
Copy link
Collaborator

@pepeiborra pepeiborra Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of decodeFS you could use the pure decodeWith and isolate the unsafePerformIO in a single global:

osPathToNormalizedFilePath :: MonadThrow m => OsPath -> Either EncodingException NormalizedFilePath
osPathToNormalizedFilePath = fmap toNormalizedFilePath . decodeWith systemEnc

{-# NOINLINE systemEnc #-}
systemEnc :: TextEncoding
systemEnc = unsafePerformIO getFileSystemEncoding

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in #453. Thank you!

thomasjm pushed a commit to codedownio/lsp that referenced this pull request Nov 3, 2022
* use filepath-compat

* make NormalizedFilePath use OsPath

* remove filepath-compat, use ShortByteString instead

* fix ci

* enable OS_PATH

* upgrade ghc & don't set OS_PATH for ghc 8.6.5 in CI

* remove outdated comment

* skip a bad ci combination

* extract OsPath related CPP to a standalone module

* fix OsPath.Compat

* add empty NormalizedUri and NormalizedFilePath

* add partial function for compatibility & use decodeFS

* run stylish-haskell

* lock index-state

* bump versions of lsp and lsp-types

* fix self version bound

* always use utf8 ShortByteString in NormalizedFilePath

* revert test changes

* remove OsPath.Compat

* bump lsp-test version according to pvp

* add test cases for OsPath

* fix test case

* add some doc comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to OsPath
4 participants