Running Haskell script with Stack uses multiple versions of one package #1957

Closed
pharpend opened this issue Mar 28, 2016 · 45 comments · Fixed by #2497

Comments

@pharpend
Contributor

Hello, there

In Snowdrift, we have a database setup script called sdb.hs. It uses
stack along with the turtle package to run as a quasi-shell script.

When I tried to run it today to start the database server, I got this
error message:


sdb.hs:70:47:
    Couldn't match type ‘T.Text’
                   with ‘Text’
    NB: ‘T.Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
        ‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
    Expected type: String -> Text
      Actual type: String -> T.Text
    In the first argument of ‘map’, namely ‘T.pack’
    In the second argument of ‘procs’, namely ‘(map T.pack as')’

sdb.hs:140:37:
    Couldn't match type ‘T.Text’
                   with ‘Text’
    NB: ‘T.Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
        ‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
    Expected type: Char -> Text
      Actual type: Char -> T.Text
    In the first argument of ‘(<$>)’, namely ‘T.singleton’
    In the second argument of ‘(>>)’, namely
      ‘(T.singleton <$> (char '#' <|> newline))’

sdb.hs:221:41:
    Couldn't match type ‘T.Text’
                   with ‘Text’
    NB: ‘T.Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
        ‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
    Expected type: Char -> Text
      Actual type: Char -> T.Text
    In the second argument of ‘(.)’, namely ‘T.singleton’
    In the first argument of ‘fmap’, namely ‘(Just . T.singleton)’

As you can see, stack is using multiple versions of text, and coming up with this ridiculous error.

Bulleted information:

  • Stack version: Version 1.0.5, Git revision 51492566184b4f98106de0e8199c8962dace567e x86_64
  • Steps to reproduce:

git clone https://git.gnu.io/pharpend/snowdrift.git -b mech-sibling
cd snowdrift
./sdb.hs init

@silky
Contributor

silky commented Mar 28, 2016

interestingly following your steps i can't reproduce this error, and i have the same version of stack

03:02 PM noon ∈ dev>stack --version
Version 1.0.5, Git revision 51492566184b4f98106de0e8199c8962dace567e x86_64

is there something else going on? can you provide more information about your system?

@pharpend
Contributor Author

It's reasonably up-to-date (updated yesterday) Arch Linux. uname -a:

 Linux valentine 4.4.5-1-custom #1 SMP PREEMPT Sun Mar 27 13:50:03 MDT 2016 x86_64 GNU/Linux

PostgreSQL version 9.5.1-2.

~/.stack/global/stack.yaml:

flags: {}
packages: []
extra-deps: []
resolver: lts-3.7

@pharpend
Contributor Author

I tried it on a fresh Ubuntu VM, couldn't reproduce it either. I was able to reproduce it in a separate directory on Arch. I'm going to work some more to do so. Maybe add a stack setup && stack build step somewhere? I'm currently sitting through the world's longest stack build command.

@pharpend
Contributor Author

Updating the resolver in stack.yaml seemed to fix it on my Arch machine, somehow.

I don't want to close this until we figure out what went wrong, though. there is clearly a bug somewhere.

I opened a merge request at Snowdrift to update the resolver: https://git.gnu.io/snowdrift/snowdrift/merge_requests/177.

@mgsloan
Contributor

mgsloan commented Mar 28, 2016

I don't think there's necessarily a bug here. The root cause is that runghc doesn't hide all the packages and tell ghc to explicitly use specific versions (see #1208). So, usually the existence of multiple versions of the same package is invisible to the user.

AFAIK, stack only has duplicate versions of the package when it exists in the global DB. Have you used cabal with your package databases? The shadowing of package DBs should cause a global DB version of text to fail, though. What does stack exec -- describe text say? It should give multiple descriptions of the package. I'm particularly interested in the package installation location.

Have you used cabal-install with your package DBs? That's one way this could happen.

@mgsloan mgsloan added this to the Support milestone Mar 28, 2016
@chreekat
Member

As the author of the sdb.hs in question, let me know if there's something I could do differently.

Can I force stack to use the package-local sandbox when run as a shebang interpreter? This script is only designed to be used with the Snowdrift project anyway. (Though I suppose it might be useful to other Yesoders... it makes a ton of sense to me that people would use a local cluster for dev, rather than depending on a system-level instance of Postgres.)

  • Edit: of course, it's not useful to anyone currently since it has a bunch of Snowdrift-specific stuff baked in. :)

@mitchellwrosen
Contributor

I just ran into this myself.

[1 of 1] Compiling Main             ( insert.hs, insert.o )

insert.hs:11:40:
    Couldn't match type ‘System.Random.MWC.Gen
                           (Control.Monad.Primitive.PrimState IO)’
                   with ‘mwc-random-0.13.3.2:System.Random.MWC.Gen
                           (Control.Monad.Primitive.PrimState IO)’
    NB: ‘System.Random.MWC.Gen’
          is defined in ‘System.Random.MWC’ in package ‘mwc-random-0.13.4.0’
        ‘mwc-random-0.13.3.2:System.Random.MWC.Gen’
          is defined in ‘System.Random.MWC’ in package ‘mwc-random-0.13.3.2’
    Expected type: mwc-random-0.13.3.2:System.Random.MWC.GenIO
      Actual type: System.Random.MWC.GenIO

Verbose output of stack -v runghc insert.hs

Version 1.0.4 x86_64
2016-04-04 18:09:53.131443: [debug] Checking for project config at: /home/mitchell/sentenai/stream-simulation/stack.yaml @(stack_COhPD0SUWBs7s1VUFLBNcf:Stack.Config src/Stack/Config.hs:761:9)
2016-04-04 18:09:53.131906: [debug] Loading project config file stack.yaml @(stack_COhPD0SUWBs7s1VUFLBNcf:Stack.Config src/Stack/Config.hs:779:13)
2016-04-04 18:09:53.135487: [debug] Run process: ldd /home/mitchell/.local/bin/stack @(stack_COhPD0SUWBs7s1VUFLBNcf:System.Process.Read src/System/Process/Read.hs:269:3)
2016-04-04 18:09:53.151454: [debug] Trying to decode /home/mitchell/.stack/build-plan-cache/x86_64-linux/lts-5.1.cache @(stack_COhPD0SUWBs7s1VUFLBNcf:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-04-04 18:09:53.172857: [debug] Success decoding /home/mitchell/.stack/build-plan-cache/x86_64-linux/lts-5.1.cache @(stack_COhPD0SUWBs7s1VUFLBNcf:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
2016-04-04 18:09:53.173615: [debug] Trying to decode /home/mitchell/.stack/indices/Hackage/00-index.cache @(stack_COhPD0SUWBs7s1VUFLBNcf:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-04-04 18:09:53.443407: [debug] Success decoding /home/mitchell/.stack/indices/Hackage/00-index.cache @(stack_COhPD0SUWBs7s1VUFLBNcf:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
2016-04-04 18:09:53.467318: [debug] Run process: ghc --info @(stack_COhPD0SUWBs7s1VUFLBNcf:System.Process.Read src/System/Process/Read.hs:269:3)
2016-04-04 18:09:53.534058: [debug] Run process: ghc --numeric-version @(stack_COhPD0SUWBs7s1VUFLBNcf:System.Process.Read src/System/Process/Read.hs:269:3)
2016-04-04 18:09:53.575019: [debug] Run process: ghc-pkg --no-user-package-db field --simple-output Cabal version @(stack_COhPD0SUWBs7s1VUFLBNcf:System.Process.Read src/System/Process/Read.hs:269:3)
2016-04-04 18:09:53.631077: [debug] Run process: ghc-pkg --no-user-package-db list --global @(stack_COhPD0SUWBs7s1VUFLBNcf:System.Process.Read src/System/Process/Read.hs:269:3)
2016-04-04 18:09:53.687518: [debug] Run process: runghc insert.hs @(stack_COhPD0SUWBs7s1VUFLBNcf:Stack.Exec src/Stack/Exec.hs:51:5)

Weirdly, stack runghc looks at my stack.yaml and proceeds to do nothing with it? That is,

$ stack list-dependencies | grep mwc-random
mwc-random 0.13.3.2

yet

$ ghc-pkg --no-user-package-db list --global | grep mwc-random
    mwc-random-0.13.4.0

Now, I have no idea why I have mwc-random-0.13.4.0 installed in the global package database, but isn't the resolver lts-5.1 supposed to override that?

Thanks.

@mgsloan
Contributor

mgsloan commented Apr 5, 2016

It really should. On the surface, it looks like ghc is preferring the global DB, for some bizarre reason.

I asked some diagnostic questions in an early comment, and got no reply. I will reiterate them. Anyone affected by this, feel free to provide the output:

The shadowing of package DBs should cause a global DB version of text to fail, though. What does stack exec -- describe text say? It should give multiple descriptions of the package. I'm particularly interested in the package installation location.

Have you used cabal-install with your package DBs?

Furthermore, are you using a system install of ghc? I have a suspicion that this is much less likely to happen with stack managed ghc installs, because you're less likely to use cabal-install on the global DB

@mitchellwrosen
Contributor

Ah, I didn't catch that comment. What is describe? I'm on arch linux and there is no describe package.

@mgsloan
Contributor

mgsloan commented Apr 5, 2016

Oh I meant stack exec -- ghc-pkg describe text

@mitchellwrosen
Contributor

Ah, here you go:

name: mwc-random
version: 0.13.3.2
import-dirs: /home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3/lib/x86_64-linux-ghc-7.10.3/mwc-random-0.13.3.2-0SEz8XRK7wOF4DF0uaF48y
library-dirs: /home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3/lib/x86_64-linux-ghc-7.10.3/mwc-random-0.13.3.2-0SEz8XRK7wOF4DF0uaF48y
data-dir: /home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3/share/x86_64-linux-ghc-7.10.3/mwc-random-0.13.3.2
hs-libraries: HSmwc-random-0.13.3.2-0SEz8XRK7wOF4DF0uaF48y
depends:
    base-4.8.2.0-0d6d1084fbc041e1cded9228e80e264d
    primitive-0.6.1.0-b2a7b9f8d5591c0d4ce7ef238a3217d2
    time-1.5.0.1-edbd1a50e7922b396ada189ab8e8523b
    vector-0.11.0.0-299aefb173ce5a731565d31f609a0cfd
haddock-interfaces: /home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3/doc/mwc-random-0.13.3.2/mwc-random.haddock
haddock-html: /home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3/doc/mwc-random-0.13.3.2
pkgroot: "/home/mitchell/.stack/snapshots/x86_64-linux/lts-5.1/7.10.3"
---
name: mwc-random
version: 0.13.4.0
import-dirs: /usr/lib/ghc-7.10.3/site-local/mwc-random-0.13.4.0
library-dirs: /usr/lib/ghc-7.10.3/site-local/mwc-random-0.13.4.0
data-dir: /usr/share/x86_64-linux-ghc-7.10.3/mwc-random-0.13.4.0
hs-libraries: HSmwc-random-0.13.4.0-0lrQ1SkkNA85sa8eZ98xQk
depends:
    base-4.8.2.0-0d6d1084fbc041e1cded9228e80e264d
    primitive-0.6.1.0-b2a7b9f8d5591c0d4ce7ef238a3217d2
    time-1.5.0.1-edbd1a50e7922b396ada189ab8e8523b
    vector-0.11.0.0-299aefb173ce5a731565d31f609a0cfd
haddock-interfaces: /usr/share/doc/haskell-mwc-random/html/mwc-random.haddock
haddock-html: /usr/share/doc/haskell-mwc-random/html
pkgroot: "/usr/lib/ghc-7.10.3"

I do have a system ghc install, but only because the Arch Linux stack package depends on it. I've never used cabal-install to install anything globally.

@pharpend
Contributor Author

pharpend commented Apr 6, 2016

Here's what happens when I run stack exec -- ghc-pkg describe text:

name: text
version: 1.2.2.1
id: text-1.2.2.1-d0adb978563e9f52dc308d1d0db7212c
key: text_HmqVQnZSpjaC156ABqPhne
license: BSD3
copyright: 2009-2011 Bryan O'Sullivan, 2008-2009 Tom Harper
maintainer: Bryan O'Sullivan <[email protected]>
homepage: https://github.com/bos/text
synopsis: An efficient packed Unicode text type.
description:
    .
    An efficient packed, immutable Unicode text type (both strict and
    lazy), with a powerful loop fusion optimization framework.
    .
    The 'Text' type represents Unicode character strings, in a time and
    space-efficient manner. This package provides text processing
    capabilities that are optimized for performance critical use, both
    in terms of large data quantities and high speed.
    .
    The 'Text' type provides character-encoding, type-safe case
    conversion via whole-string case conversion functions. It also
    provides a range of functions for converting 'Text' values to and from
    'ByteStrings', using several standard encodings.
    .
    Efficient locale-sensitive support for text IO is also supported.
    .
    These modules are intended to be imported qualified, to avoid name
    clashes with Prelude functions, e.g.
    .
    > import qualified Data.Text as T
    .
    To use an extended and very rich family of functions for working
    with Unicode text (including normalization, regular expressions,
    non-standard encodings, text breaking, and locales), see
    the @text-icu@ package:
    <http://hackage.haskell.org/package/text-icu>
category: Data, Text
author: Bryan O'Sullivan <[email protected]>
exposed: True
exposed-modules:
    Data.Text Data.Text.Array Data.Text.Encoding
    Data.Text.Encoding.Error Data.Text.Foreign Data.Text.IO
    Data.Text.Internal Data.Text.Internal.Builder
    Data.Text.Internal.Builder.Functions
    Data.Text.Internal.Builder.Int.Digits
    Data.Text.Internal.Builder.RealFloat.Functions
    Data.Text.Internal.Encoding.Fusion
    Data.Text.Internal.Encoding.Fusion.Common
    Data.Text.Internal.Encoding.Utf16 Data.Text.Internal.Encoding.Utf32
    Data.Text.Internal.Encoding.Utf8 Data.Text.Internal.Functions
    Data.Text.Internal.Fusion Data.Text.Internal.Fusion.CaseMapping
    Data.Text.Internal.Fusion.Common Data.Text.Internal.Fusion.Size
    Data.Text.Internal.Fusion.Types Data.Text.Internal.IO
    Data.Text.Internal.Lazy Data.Text.Internal.Lazy.Encoding.Fusion
    Data.Text.Internal.Lazy.Fusion Data.Text.Internal.Lazy.Search
    Data.Text.Internal.Private Data.Text.Internal.Read
    Data.Text.Internal.Search Data.Text.Internal.Unsafe
    Data.Text.Internal.Unsafe.Char Data.Text.Internal.Unsafe.Shift
    Data.Text.Lazy Data.Text.Lazy.Builder Data.Text.Lazy.Builder.Int
    Data.Text.Lazy.Builder.RealFloat Data.Text.Lazy.Encoding
    Data.Text.Lazy.IO Data.Text.Lazy.Internal Data.Text.Lazy.Read
    Data.Text.Read Data.Text.Unsafe
hidden-modules: Data.Text.Show
trusted: False
import-dirs: /usr/lib/ghc-7.10.3/site-local/text-1.2.2.1
library-dirs: /usr/lib/ghc-7.10.3/site-local/text-1.2.2.1
data-dir: /usr/share/x86_64-linux-ghc-7.10.3/text-1.2.2.1
hs-libraries: HStext-1.2.2.1-HmqVQnZSpjaC156ABqPhne
depends:
    array-0.5.1.0-960bf9ae8875cc30355e086f8853a049
    base-4.8.2.0-0d6d1084fbc041e1cded9228e80e264d
    binary-0.7.5.0-5784fd031a720c3b84e73006e444c7ca
    bytestring-0.10.6.0-c60f4c543b22c7f7293a06ae48820437
    deepseq-1.4.1.1-614b63b36dd6e29d2b35afff57c25311
    ghc-prim-0.4.0.0-6cdc86811872333585fa98756aa7c51e
    integer-gmp-1.0.0.0-3c8c40657a9870f5c33be17496806d8d
haddock-interfaces: /usr/share/doc/haskell-text/html/text.haddock
haddock-html: /usr/share/doc/haskell-text/html
pkgroot: "/usr/lib/ghc-7.10.3"

I'm also on Arch Linux, FWIW. I have used this installation since before stack existed, so there are several packages installed with cabal-install.

For me, updating the resolver fixed this for some reason. No idea why.

@mitchellwrosen
Contributor

@pharpend Doesn't seem to be a problem there; ghc-pkg only sees one global installation of text. You may want to trim down that output (or delete it) for that reason - it's noise :)

@luigy
Contributor

luigy commented Apr 20, 2016

For me, updating the resolver fixed this for some reason. No idea why.

Hmm I believe your compiler changed from 7.10.2 to 7.10.3 when updating the resolver from lts-3.7 to lts-5ish and therefore different global pkgdb

testing instead stack --resolver lts-3.7 exec -- ghc-pkg describe text would probably reveal more about what was going on before

It really should. On the surface, it looks like ghc is preferring the global DB, for some bizarre reason.

indeed! curious to see if describe reveals anything different this time

@pharpend
Contributor Author

On Tue, Apr 19, 2016 at 10:16:03PM -0700, Luigy Leon wrote:

For me, updating the resolver fixed this for some reason. No idea why.

Hmm I believe your compiler changed from 7.10.2 to 7.10.3 when updating the
resolver from lts-3.7 to lts-5ish and therefore different global pkgdb

testing instead stack --resolver lts-3.7 exec -- ghc-pkg describe text would
probably reveal more about what was going on before

Oh, that's probably it! I managed to fix the issue, though. It seems like a
rather extreme edge case, so I'm okay with closing this issue.

@sid-kap
Contributor

sid-kap commented Jul 19, 2016

I was just lurking this thread, and it doesn't seem like this got resolved. (Well, @pharpend seems to have fixed their problem but it seems like @mitchellwrosen did not.) Am I misreading something?

Personally, I've run into this many times. I'm in a stack project, and I write a quick script to test some of my library functions, and then run it with stack runghc. I get errors like the above, where stack seems to be using the global package version instead of the version in the current project. For me, the solution is always to turn my script into a Cabal executable (by adding it to the .cabal file). However, it would be nice for stack runghc to work here.

@pharpend
Contributor Author

On 07/18/2016 11:42 PM, Sid Kapur wrote:

I was just lurking this thread, and it doesn't seem like this got
resolved. (Well, @pharpend https://github.com/pharpend seems to have
fixed their problem but it seems like @mitchellwrosen
https://github.com/mitchellwrosen did not.) Am I misreading something?

Personally, I've run into this many times. I'm in a stack project, and I
write a quick script to test some of my library functions, and then run
it with |stack runghc|. I get errors like the above, where stack seems
to be using the global package version instead of the version in the
current project. For me, the solution is always to turn my script into a
Cabal executable (by adding it to the |.cabal| file). However, it would
be nice for |stack runghc| to work here.



It might be worth looking into Snowdrift's SDB:
https://git.snowdrift.coop/sd/snowdrift/blob/master/sdb.hs. It's an
elaborate local postgres database management script, which requires a
number of dependencies, but has to play nice alongside the rest of
Snowdrift.

They use Stack for everything.

@harendra-kumar
Collaborator

This is pretty trivial to reproduce. I have a global project using lts-6.7 as the resolver, with a different version of text as an extra dep. The lts-6.7 snapshot has text-1.2.2.1, so I put text-1.2.2.0 as an extra dep.

resolver: lts-6.7
extra-deps:
- text-1.2.2.0

Now run this script:

import Turtle
import qualified Data.Text as T

main :: IO ()
main = sh $ echo $ T.pack "hello"

$ stack --verbosity silent runghc version-conflict.hs

version-conflict.hs:4:20:
    Couldn't match expected type ‘Text’
                with actual type ‘T.Text’
    NB: ‘Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
        ‘T.Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
    In the second argument of ‘($)’, namely ‘T.pack "hello"’
    In the second argument of ‘($)’, namely ‘echo $ T.pack "hello"’

In fact we can just run ghc with appropriate GHC_PACKAGE_PATH (retrieved by running stack path) and get the same result:

$ env GHC_PACKAGE_PATH=/vol/hosts/cueball/.stack/global-project/.stack-work/install/x86_64-linux/lts-6.7/7.10.3/pkgdb:/vol/hosts/cueball/.stack/snapshots/x86_64-linux/lts-6.7/7.10.3/pkgdb:/vol/hosts/cueball/.stack/programs/x86_64-linux/ghc-7.10.3/lib/ghc-7.10.3/package.conf.d /vol/hosts/cueball/.stack/programs/x86_64-linux/ghc-7.10.3/bin/ghc version-conflict.hs 
[1 of 1] Compiling Main             ( version-conflict.hs, version-conflict.o )

version-conflict.hs:4:20:
    Couldn't match expected type ‘Text’
                with actual type ‘T.Text’
    NB: ‘Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
        ‘T.Text’
          is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
    In the second argument of ‘($)’, namely ‘T.pack "hello"’
    In the second argument of ‘($)’, namely ‘echo $ T.pack "hello"’

This is what seems to be happening:

  • ghc picks turtle from global project, which has a dependency on text-1.2.2.0
  • ghc picks text package from lts-6.7 snapshot and not from global project even though global project comes before the snapshot in GHC_PACKAGE_PATH. Not sure why it is doing that. If we force ghc to use correct text version via -package text-1.2.2.0 option it works fine.

Another weird behavior that I see is that using the -package turtle option (notice no version specified) makes ghc pick turtle from the lts-6.7 snapshot instead of the global project, even though the global project comes first in GHC_PACKAGE_PATH.

Are these bugs in GHC or something that I am missing? There seems to be something more to picking packages from databases than just GHC_PACKAGE_PATH.

@harendra-kumar
Collaborator

We can fix this by passing specific versions of packages to runghc instead of relying on GHC_PACKAGE_PATH, but the ghc behavior here still needs to be explained.
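
As a rough sketch of that idea (not Stack's actual implementation): once the exact installed package ids are known, for example from the `id:` field that `ghc-pkg describe` prints, they can be expanded into `-package-id` flags for ghc. The helper name `packageIdArgs` is made up for illustration, and the id used in `main` is simply the one that appears in the `ghc-pkg describe text` output earlier in this thread.

```haskell
-- Hypothetical helper (not part of Stack): expand installed package ids
-- into the flags that pin GHC to exactly those packages.
packageIdArgs :: [String] -> [String]
packageIdArgs = concatMap (\ipid -> ["-package-id", ipid])

main :: IO ()
main = do
  -- Illustrative id, copied from the ghc-pkg output quoted in this thread.
  let ids = ["text-1.2.2.1-d0adb978563e9f52dc308d1d0db7212c"]
  -- Print the ghc command line that would result.
  putStrLn (unwords ("ghc" : packageIdArgs ids ++ ["version-conflict.hs"]))
```

With the ids pinned this way, GHC no longer has to choose between several installed versions of the same package.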

@mitchellwrosen
Contributor

So, you've compiled turtle against text-1.2.2.0, yet stack runghc is picking out text-1.2.2.1 even though your stack.yaml suggests otherwise.

Looking at the output of stack exec -- ghc-pkg list:

/home/mitchell/... -- snapshot database
   text-1.2.2.1
/home/mitchell/... -- project database
   text-1.2.2.0

So text-1.2.2.1 is preferred. I wonder if this is a stack exec bug because it should list the project database before the snapshot database, or if this is just a misuse of stack exec.

@harendra-kumar
Collaborator

In my second invocation stack is not in the picture at all; I am using ghc directly with an explicit GHC_PACKAGE_PATH. So there is no question of whether stack is doing something wrong.

One possibility is that ghc is trying to pick a newer version of a package from all databases without considering that another package is depending on an older version. So it uses text-1.2.2.1 from the second db in path even though the first db has text-1.2.2.0.

About the ghc-pkg list command, the ghc manual says:

To check whether your GHC_PACKAGE_PATH setting is doing the right thing, ghc-pkg list will
list all the databases in use, in the reverse order they are searched.

So you might be interpreting the ghc-pkg list output incorrectly.

@mitchellwrosen
Contributor

Ah yes, I was interpreting the output of ghc-pkg list backwards.

So, looking at the output of stack -v runghc, I see that it only uses the global package database. No snapshot, no project. This is probably on purpose but I think it could be better documented.

@harendra-kumar
Collaborator

stack -v runghc does not give you much info about the databases being used. Use stack runghc -- -v instead:

$ stack runghc -- -v version-conflict.hs 2>&1|grep database
Using binary package database: /vol/hosts/cueball/.stack/programs/x86_64-linux/ghc-7.10.3/lib/ghc-7.10.3/package.conf.d/package.cache
Using binary package database: /vol/hosts/cueball/.stack/snapshots/x86_64-linux/lts-6.7/7.10.3/pkgdb/package.cache
Using binary package database: /vol/hosts/cueball/.stack/global-project/.stack-work/install/x86_64-linux/lts-6.7/7.10.3/pkgdb/package.cache

harendra-kumar added a commit that referenced this issue Aug 16, 2016
Current mechanism of using GHC_PACKAGE_PATH for runghc and ghc commands does
not seem to work well when we have multiple versions of the same package. GHC
does not always pick up the packages in the same order as GHC_PACKAGE_PATH.

This fix determines the package-ids using ghc-pkg and then passes
package-ids on command line of ghc or runghc invocation. This works only when
the user explicitly passes --package to runghc or ghc commands. When --package
is not specified we have no easy way to determine what all packages will be
used by the file being compiled.

This will make sure that scripts which explicitly list all or multi-instance
packages will always run reliably.

fixes #1957 (Requires all packages to be listed explicitly)
@Blaisorblade
Collaborator

@ezyang Would you know if GHC_PACKAGE_PATH behaves as documented? If a package appears in multiple package DBs, the order of DBs in GHC_PACKAGE_PATH appears to be ignored—here we conjecture GHC picks the most recent version instead. See quote below. (We can also file a GHC ticket if instructed).
Meanwhile, we're working around this with -package-id.

This is what seems to be happening:

ghc picks turtle from global project, which has a dependency on text-1.2.2.0
ghc picks text package from lts-6.7 snapshot and not from global project even though global project comes before the snapshot in GHC_PACKAGE_PATH. Not sure why it is doing that. If we force ghc to use correct text version via -package text-1.2.2.0 option it works fine.
Another weird behavior that I see is that using the -package turtle option (notice no version specified) makes ghc pick turtle from the lts-6.7 snapshot instead of the global project, even though the global project comes first in GHC_PACKAGE_PATH.

@harendra-kumar
Collaborator

I sent an email to ghc-devs as well.

@ezyang

ezyang commented Aug 16, 2016

Relevant code from ghc-pkg:


  e_pkg_path <- tryIO (System.Environment.getEnv "GHC_PACKAGE_PATH")
  let env_stack =
        case e_pkg_path of
                Left  _ -> sys_databases
                Right path
                  | not (null path) && isSearchPathSeparator (last path)
                  -> splitSearchPath (init path) ++ sys_databases
                  | otherwise
                  -> splitSearchPath path

Relevant GHC code


getPackageConfRefs :: DynFlags -> IO [PkgConfRef]
getPackageConfRefs dflags = do
  let system_conf_refs = [UserPkgConf, GlobalPkgConf]

  e_pkg_path <- tryIO (getEnv $ map toUpper (programName dflags) ++ "_PACKAGE_PATH")
  let base_conf_refs = case e_pkg_path of
        Left _ -> system_conf_refs
        Right path
         | not (null path) && isSearchPathSeparator (last path)
         -> map PkgConfFile (splitSearchPath (init path)) ++ system_conf_refs
         | otherwise
         -> map PkgConfFile (splitSearchPath path)

Haven't read the rest of this ticket yet.
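
For anyone who wants to poke at this parsing rule in isolation, here is a small standalone sketch of the same logic. It is a simplification for illustration, not the GHC source: the function name `packageDbStack` and the placeholder database names are invented, and it works on plain `FilePath` values instead of GHC's `PkgConfRef`.

```haskell
import Data.List (intercalate)
import System.FilePath (isSearchPathSeparator, searchPathSeparator, splitSearchPath)

-- Hypothetical pure version of the rule in the snippets above:
--   * variable unset            -> only the system databases
--   * value ends in a separator -> listed databases, then the system ones
--   * otherwise                 -> only the listed databases
packageDbStack :: [FilePath] -> Maybe String -> [FilePath]
packageDbStack systemDbs Nothing = systemDbs
packageDbStack systemDbs (Just path)
  | not (null path) && isSearchPathSeparator (last path) =
      splitSearchPath (init path) ++ systemDbs
  | otherwise = splitSearchPath path

main :: IO ()
main = do
  let sep  = [searchPathSeparator]
      path = intercalate sep ["/project/pkgdb", "/snapshot/pkgdb"]
  print (packageDbStack ["<system>"] Nothing)
  print (packageDbStack ["<system>"] (Just path))
  print (packageDbStack ["<system>"] (Just (path ++ sep)))
```

Note that nothing here decides which version of a package wins; it only builds the list of databases.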

@ezyang

ezyang commented Aug 16, 2016

OK, this has nothing to do with GHC_PACKAGE_PATH. If you say -package foo, GHC will always pick the latest non-broken version of foo. If there are several, something happens which I don't remember, but maybe the manual says. Shadowing (when the -package-id of two pkgs is the same) follows different behavior.

I think there may have been behavior changes in 7.10 (and again in 8.0) because 7.10 introduced package keys and 8.0 got rid of them.

harendra-kumar added a commit that referenced this issue Aug 16, 2016
Current mechanism of using GHC_PACKAGE_PATH for runghc and ghc commands does
not seem to work well when we have multiple versions of the same package. GHC
does not always pick up the packages in the same order as GHC_PACKAGE_PATH.

This fix determines the package-ids using ghc-pkg and then passes
package-ids on command line of ghc or runghc invocation. This works only when
the user explicitly passes --package to runghc or ghc commands. When --package
is not specified we have no easy way to determine what all packages will be
used by the file being compiled.

This will make sure that scripts which explicitly list all packages will
always run reliably even in presence of packages which have multiple instances
of the same version or multiple versions installed.

fixes #1957 (Requires all packages to be listed explicitly)
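
For illustration, a script that lists its packages explicitly could look roughly like the sketch below. The resolver and the choice of text as the only dependency are assumptions made for this example; a real script such as sdb.hs would list every package it imports.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-6.7 runghc --package text
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- The only non-base dependency is text, which is named via --package above.
greeting :: T.Text
greeting = T.pack "hello"

main :: IO ()
main = TIO.putStrLn greeting
```

Because every non-base import is covered by a --package flag, the package-id lookup described in the commit message has something to work with.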
@harendra-kumar
Collaborator

@ezyang -package foo picking the latest version explains only part of the puzzle. There are more questions to be answered:

  1. What if foo has the same version in more than one database? Which one would be picked? Is GHC_PACKAGE_PATH considered?
  2. The manual does not document this aspect of -package behavior in either the -package or the GHC_PACKAGE_PATH section. We will have to update that. But is this the right behavior for this option? Should it honor the GHC_PACKAGE_PATH order instead when it is specified? That would keep things consistent and without exceptions.
  3. What if no -package option is specified? Should it pick using the GHC_PACKAGE_PATH order, or still pick the latest version?

The manual does not seem to have answers for these or is incorrect. I have not looked at the code yet. Will take a look soon.

@mgsloan
Contributor

mgsloan commented Aug 17, 2016

Very interesting, this explains a lot. It's too bad that ghc doesn't have what we need. I guess this is just one more way scripts can break if they lack --standalone (to-be-implemented, should be straightforward)

@ezyang

ezyang commented Aug 17, 2016

@harendra-kumar

  1. Assuming that none of them shadow one another (have the same IPID), in GHC HEAD which one is picked is unspecified.
  2. This behavior for -package has existed a long time. We could change it, but that's a BC-breaking change, and my attitude is that you should use -package-id, and -package is strictly a "best effort" deal. I'll also remark that there's nothing really special about GHC_PACKAGE_PATH compared with passing -package-db flags.
  3. Once again, GHC_PACKAGE_PATH is just a way to get -package-db flags to GHC. Other than that, we inherit the default behavior. If no package options are specified GHC will just pick the latest exposed packages. No guarantee that they're consistent; that's not GHC's job.
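
The difference between the two readings can be shown with a toy model. This is only a sketch of the behavior described in the answers above, not GHC's real selection code: the `Pkg` type and both function names are invented, versions are plain integer lists, and "broken" or unexposed packages are ignored entirely.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Toy package entry: name, version, and the database it lives in.
data Pkg = Pkg { pkgName :: String, pkgVersion :: [Int], pkgDb :: String }
  deriving (Eq, Show)

-- Databases in GHC_PACKAGE_PATH order (first = leftmost).
type DbStack = [[Pkg]]

-- What a PATH-like reading of the manual suggests: the first database wins.
pickFirstInPath :: String -> DbStack -> Maybe Pkg
pickFirstInPath name dbs =
  case filter ((== name) . pkgName) (concat dbs) of
    []      -> Nothing
    (p : _) -> Just p

-- The behavior described in this thread: the latest version wins, whichever
-- database it comes from. Ties are broken arbitrarily in this sketch.
pickLatest :: String -> DbStack -> Maybe Pkg
pickLatest name dbs =
  case filter ((== name) . pkgName) (concat dbs) of
    [] -> Nothing
    ps -> Just (maximumBy (comparing pkgVersion) ps)

main :: IO ()
main = do
  let project  = [Pkg "text" [1, 2, 2, 0] "project"]
      snapshot = [Pkg "text" [1, 2, 2, 1] "snapshot"]
      dbs      = [project, snapshot]
  print (pickFirstInPath "text" dbs)  -- the project's text-1.2.2.0
  print (pickLatest "text" dbs)       -- the snapshot's text-1.2.2.1
```

In the reproduction above, turtle was built against the project's text-1.2.2.0 while the script's own import resolved to the snapshot's text-1.2.2.1, which is exactly the mismatch the second function produces.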

@harendra-kumar
Collaborator

Thanks a lot @ezyang for explaining the behavior in detail. I think what we need to do is to fix the documentation to make it explicit. I read the documentation carefully but did not get any clear answers from that.

The documentation of GHC_PACKAGE_PATH gives an impression that the order is important which does not seem to be the case as per your explanation.

This list of package databases is used by GHC and ghc-pkg, with earlier databases in the list
overriding later ones. This order was chosen to match the behaviour of the PATH environment
variable; think of it as a list of package databases that are searched left-to-right for packages.

In fact it should not have been called PATH because that has a connotation of order.

@ezyang

ezyang commented Aug 17, 2016

The order is important for shadowing (packages on the top of the stack shadow packages lower down). But shadowing only occurs when two packages have the same installed package ID. In the common case, this never happens.

So, I think there was a point in time (7.8 and earlier) when this explanation did make sense, because back then shadowing was computed by package ID rather than package key (7.10) or installed package ID (8.0). But this was quite miserable because you basically could never have multiple copies of the same version of a package in the same database; to manage it, you'd need separate databases for each package. (There was a technical reason behind this too: the symbol name only incorporated package name and version, so GHC could get REALLY confused if you had two of these in the database and visible; it'd think types were equal when they shouldn't be.)
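
To make the shadowing rule concrete, here is a minimal model of it. It is a sketch under simplifying assumptions, not GHC's implementation: the `Pkg` type, the function name `visiblePackages`, and the installed package IDs are all made up, and the database stack is given top-first.

```haskell
import Data.Function (on)
import Data.List (nubBy)

-- Toy package entry: installed package ID plus the database it came from.
data Pkg = Pkg { pkgIpid :: String, pkgDb :: String }
  deriving (Eq, Show)

-- Flatten a stack of databases (top of the stack first). A package is
-- dropped only if the same IPID already appeared higher in the stack;
-- different versions have different IPIDs and therefore all stay visible.
visiblePackages :: [[Pkg]] -> [Pkg]
visiblePackages = nubBy ((==) `on` pkgIpid) . concat

main :: IO ()
main = do
  let top    = [Pkg "text-1.2.2.0-aaa" "project"]
      bottom = [ Pkg "text-1.2.2.0-aaa" "snapshot"   -- same IPID: shadowed
               , Pkg "text-1.2.2.1-bbb" "snapshot" ] -- different IPID: kept
  mapM_ print (visiblePackages [top, bottom])
```

So database order only settles conflicts between identical IPIDs; two different versions of text both remain candidates.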

@ezyang

ezyang commented Aug 17, 2016

For what it's worth, I think one could plausibly argue that GHC 8.2 should implement behavior along the lines of, "If there are multiple packages with the same name and version, if a user says -package pn, we should use the one that is on TOP of the stack." Do you really want this? There are plenty of other ways using -package can go wrong (for example, GHC can pick a bunch of packages whose deps don't actually coincide with each other.) It seems much safer to use a dep solver / Stackage index to decide precisely which packages you want.

@harendra-kumar
Collaborator

Yeah, we can externally determine the set of exact package-ids to use, but that requires invoking other programs like ghc-pkg. That is not usually a big deal, but it would be more convenient if ghc itself had a simple way of achieving the same thing, since it goes through the dbs anyway.

For stack, given the way pkg databases are stacked, the rule is really simple: we just need to choose packages in order in the pkgdb stack. So your proposed behavior for 8.2 should be sufficient for that. I need a few clarifications on that:

  • What does stack mean in LAST in stack? When GHC_PACKAGE_PATH is specified would it be that? When package dbs are specified on the command line, the order in which they are specified? When both are specified?
  • Will it work the same way even when no -package option is specified?

One of the problems in running scripts reliably is to specify the consistent versions of all packages used in the script. One way to achieve that would be to first determine all dependencies of the script and then pass correct versions of each package to GHC by first determining those externally. I do not know of a convenient way to determine the packages used (GHC API?, ghc -M?) in the script. On the other hand, if we have a way to tell ghc to always pick the packages in a certain order (e.g. GHC_PACKAGE_PATH) then this problem will be solved nicely for stack. We do not have to determine dependencies or right versions of packages externally. That's the reason I asked the second question above. Maybe we can pass a new flag to ghc to tell it to pick them in order instead of the old behavior.

@ezyang

ezyang commented Aug 17, 2016

What does stack mean in LAST in stack? When GHC_PACKAGE_PATH is specified would it be that? When package dbs are specified on the command line, the order in which they are specified? When both are specified?

I edited my comment to s/LAST/TOP/. For command line -package-db, the top of the stack is RHS. For GHC_PACKAGE_PATH, the top is LHS.

The way they are combined is specified by this function:

getPackageConfRefs :: DynFlags -> IO [PkgConfRef]
getPackageConfRefs dflags = do
  let system_conf_refs = [UserPkgConf, GlobalPkgConf]

  e_pkg_path <- tryIO (getEnv $ map toUpper (programName dflags) ++ "_PACKAGE_PATH")
  let base_conf_refs = case e_pkg_path of
        Left _ -> system_conf_refs
        Right path
         | not (null path) && isSearchPathSeparator (last path)
         -> map PkgConfFile (splitSearchPath (init path)) ++ system_conf_refs
         | otherwise
         -> map PkgConfFile (splitSearchPath path)

  return $ reverse (extraPkgConfs dflags base_conf_refs)
  -- later packages shadow earlier ones.  extraPkgConfs
  -- is in the opposite order to the flags on the
  -- command line.

So I believe -package-db arguments get applied on top of GHC_PACKAGE_PATH.
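As a sanity check on that reading, here is a small pure model of the ordering. It is only a sketch: `DbRef`, `baseDbs`, and `dbStack` are hypothetical names, it assumes each -package-db flag simply pushes one database, and it ignores flags such as -clear-package-db and -no-user-package-db.

```haskell
import System.FilePath (isSearchPathSeparator, splitSearchPath)

-- Hypothetical stand-in for GHC's PkgConfRef.
data DbRef = GlobalDb | UserDb | DbFile FilePath
  deriving (Eq, Show)

systemDbs :: [DbRef]
systemDbs = [UserDb, GlobalDb]

-- Same case analysis as the quoted function: a trailing separator
-- means "append the default databases".
baseDbs :: Maybe String -> [DbRef]
baseDbs Nothing = systemDbs
baseDbs (Just path)
  | not (null path) && isSearchPathSeparator (last path)
  = map DbFile (splitSearchPath (init path)) ++ systemDbs
  | otherwise
  = map DbFile (splitSearchPath path)

-- Result is bottom of the stack first, so the last element is the top.
-- The second argument is the -package-db flags in command-line order.
dbStack :: Maybe String -> [FilePath] -> [DbRef]
dbStack envPath flagDbs = reverse (baseDbs envPath) ++ map DbFile flagDbs
```

On a POSIX system, `dbStack (Just "/a:/b") ["/x", "/y"]` gives `/b`, `/a`, `/x`, `/y` from bottom to top: the leftmost GHC_PACKAGE_PATH entry is the highest of the environment entries, and every -package-db sits above all of them.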

Will it work the same way even when no -package option is specified?

It could.

I do not know of a convenient way to determine the packages used (GHC API?, ghc -M?) in the script.

I don't think this question is well-formed. What do you have to work with? Do you have a Stackage revision and a list of package names? Just some package names? Some package names with dependency bounds? If you have a Stackage revision, shouldn't stack know what the -package-id is?

On the other hand, if we have a way to tell ghc to always pick the packages in a certain order (e.g. GHC_PACKAGE_PATH) then this problem will be solved nicely for stack.

Where can I learn about Stack's package database organization? I don't know how it works so it is difficult for me to interpret this statement.

@harendra-kumar
Collaborator

I don't think this question is well-formed. What do you have to work with?

Sorry about that, let me elaborate a bit. This is in the context of running a script using stack runghc. When we run a script which is not inside a stack project dir, the global project config is used. The global config uses a package database stacked on top of a snapshot package database, e.g. lts-6.12. The snapshot package db is fixed, and the global package db can install custom package versions to override the snapshot.

When we run a script we set GHC_PACKAGE_PATH=<global package db>:<snapshot package db> before calling stack runghc <script>. We expect the script to use a package from the global pkg db if available even if it is a lower version and if it is not found in that db then use it from the next (snapshot) db.

Now, since GHC does not pick packages in the db order, what I have done is to figure out the package-ids of the packages that we want to use and pass them on the command line to force GHC to use the right packages. But to be able to do that we need to know all the package names that the script needs, so that we can pass all the package-ids on the command line. We do not have that information in general; we have it only when the user explicitly specifies all packages using the stack-specific --packages option.

When the user does not explicitly specify the packages, how do we figure out the package names? Remember this is just a plain Haskell source file and there is no associated cabal file. So either we extract the package names from the script so that we can determine and pass correct package-ids or ghc does the work for us in choosing the packages in the order we want it to. With what you proposed above ghc itself should be able to pick the right packages for us the way we want.
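The workaround described above can be sketched roughly as follows. This is not Stack's actual code: `Pkg`, `resolve`, and `packageIdArgs` are hypothetical names, and the list of package names is assumed to be supplied by the user, which is exactly the limitation under discussion.

```haskell
import Data.List (find)

-- Hypothetical, simplified view of an installed package.
data Pkg = Pkg
  { pkgName :: String
  , pkgIpid :: String   -- installed package ID
  } deriving (Eq, Show)

type PkgDb = [Pkg]

-- Databases are listed in GHC_PACKAGE_PATH order (top of the stack first).
-- The first database that contains the name wins, whatever its version.
resolve :: [PkgDb] -> String -> Maybe Pkg
resolve dbs name = find ((== name) . pkgName) (concat dbs)

-- Build the GHC arguments for the packages the user named.
-- Names that cannot be resolved are silently skipped in this sketch.
packageIdArgs :: [PkgDb] -> [String] -> [String]
packageIdArgs dbs names =
  concat [ ["-package-id", pkgIpid p] | Just p <- map (resolve dbs) names ]
```

With a global db holding an older text and a snapshot db holding a newer one, `packageIdArgs` picks the older copy from the global db, which is the override behavior we expect. It can only do so for names it is given.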

@ezyang

ezyang commented Aug 17, 2016

So, GHC got a new feature called "environment files" on January 5th this year (commit aa699b94e3a8ec92bcfa8ba3dbd6b0de15de8873), which I think is specifically targeted at your use case. They were released with 8.0, it seems, and are documented here: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/packages.html#package-environments

I think the model is, as Stack builds and installs packages into the snapshot / other package db, it updates the corresponding environment file. Then a user simply gets to see all the packages in the environment file. I guess this is not exactly what you are asking for. But is it close enough?
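For illustration, a tool could write such a file along these lines. This is a sketch under assumptions: `renderEnv` is a hypothetical helper, the directive names (`clear-package-db`, `global-package-db`, `package-db`, `package-id`) are the ones I understand the users guide to document for 8.0, and the precedence between multiple `package-db` lines should be checked against the guide rather than taken from this example.

```haskell
-- Render the contents of a package environment file.
-- The first argument is extra package databases, the second is the
-- installed package IDs to expose. Both are supplied by the caller.
renderEnv :: [FilePath] -> [String] -> String
renderEnv dbs ipids = unlines $
     ["clear-package-db", "global-package-db"]
  ++ map ("package-db " ++) dbs
  ++ map ("package-id " ++) ipids
```

The output could then be written to a file and handed to GHC with -package-env.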

@harendra-kumar
Collaborator

This is pretty similar to what we are doing. The environment file allows us to put the command line stuff in a file, which can be useful if the command line gets too long for the shell to handle for example.

The unsolved problem as of now is what I wrote in the last para of my previous update. Since we have no good way of extracting imported packages from a Haskell file, we would want the problem to be solved in a way so that we do not have to do that. That is, influence GHC to choose packages in the way we want. I think what you proposed here will solve this.

@mgsloan
Contributor

mgsloan commented Aug 19, 2016

Is having an environment file equivalent to passing in a bunch of -package-id flags? If so, then is that a good alternative? Are packages loaded lazily (by demand of import)? We could have it pass in every single available package.

Another option would be to just decide that it's rather unprincipled to run ghc without explicitly restricting the set of packages to what's needed for the script. In #1388 , a proposed --standalone flag is discussed. We could just make it the default. If people want this squirrely might-kinda-work behavior, they can use stack exec -- runghc

@harendra-kumar
Collaborator

We could have it pass in every single available package.

That's like specifying the whole db in the environment file. It might work if the implementation is designed for, or is efficient enough for, thousands of packages.

Making --standalone as default also sounds like a good idea to me, especially since we already have a way to do the opposite via stack exec -- runghc. That way stack runghc will become consistent with stack's reproducible results philosophy. That means we do not need to fix #2486, instead we can just change the default. Sounds good to me.

@mgsloan
Contributor

mgsloan commented Aug 20, 2016

I like it too! The main issue is backwards compatibility, as this will break people's scripts. I'm not sure if that should hold it back, though, as the current behavior has all of the following downsides:

  1. The script likely requires that you already built something else, such as the project associated with the stack.yaml. Otherwise dependencies won't be available.

  2. As demonstrated in this issue, you can get your environment in a state where it's using the wrong packages. This may not even cause compilation issues; instead, you may happily use a different version than you think you are using.

  3. It is generally difficult to be confident that a runghc script will work for others, because you may not realize it has a dep beyond those specified in the stack.yaml

@harendra-kumar
Collaborator

I agree with all your points.

I was thinking of a solver-like command for scripts: given a script, it will automatically dump the stack runghc shebang comment required to run the script, with all required packages explicitly specified on the command line. We can also provide an option like --update-script which will automatically update the shebang comment in the source file. This will make it easy to fix broken scripts and of course will also be useful in regular script-writing workflows.
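To show the kind of output such a command might dump, here is a sketch. The command and --update-script are only proposals in this thread; `renderHeader` is a hypothetical helper, and the resolver and package names passed to it are supplied by the caller rather than discovered from the script.

```haskell
-- Render the two-line header that stack's script interpreter reads:
-- the shebang line, then a "-- stack ..." comment carrying the options.
renderHeader :: String -> [String] -> String
renderHeader resolver pkgs = unlines
  [ "#!/usr/bin/env stack"
  , unwords (["-- stack", "--resolver", resolver, "runghc"]
             ++ concatMap (\p -> ["--package", p]) pkgs)
  ]
```

For example, `renderHeader "lts-6.12" ["turtle", "text"]` yields a header whose second line is `-- stack --resolver lts-6.12 runghc --package turtle --package text`.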

Also once #1944 is fixed then PackageImports can be used as an alternative way to specify package dependencies in scripts.

@ezyang

ezyang commented Aug 21, 2016

Hey @harendra-kumar, could you please open a ticket on GHC Trac (CC me) with a spec of the functionality requested (based on the discussion on this thread)? In particular, I am not sure if you wanted (1) GHC automatically prefers the package on top if the versions are the same, or (2) a new mode for handling -package flags where the instance of the package on the topmost stack is always the most preferred. You claim to want (1), but in the next sentence you stated that you want the topmost package picked always (even if the version is older). Which is totally not what (1) is about.
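To spell out the difference between the two options, here is a sketch of both selection policies over a simplified model. None of this is GHC code: `Pkg`, `candidates`, `pickNewestThenTop`, and `pickTopmost` are hypothetical names, and versions are modelled as plain lists of numbers.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical, simplified view of an installed package.
data Pkg = Pkg
  { pkgName    :: String
  , pkgVersion :: [Int]
  , pkgIpid    :: String
  } deriving (Eq, Show)

-- Databases are listed top of the stack first; depth 0 is the top.
candidates :: [[Pkg]] -> String -> [(Int, Pkg)]
candidates dbs name =
  [ (d, p) | (d, db) <- zip [0 :: Int ..] dbs, p <- db, pkgName p == name ]

-- Option (1): the newest version wins; the stack position only breaks ties.
pickNewestThenTop :: [[Pkg]] -> String -> Maybe Pkg
pickNewestThenTop dbs name = case candidates dbs name of
  [] -> Nothing
  cs -> Just (snd (maximumBy (comparing (\(d, p) -> (pkgVersion p, negate d))) cs))

-- Option (2): the topmost database wins, even if its version is older.
pickTopmost :: [[Pkg]] -> String -> Maybe Pkg
pickTopmost dbs name = case candidates dbs name of
  [] -> Nothing
  cs -> Just (snd (maximumBy (comparing (\(d, p) -> (negate d, pkgVersion p))) cs))
```

With text-1.2.2.0 in the top database and text-1.2.2.1 in the one below, option (1) selects 1.2.2.1 and option (2) selects 1.2.2.0. The two agree only when both databases hold the same version.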

@harendra-kumar
Collaborator

I will raise a GHC trac ticket with specific details and cc you.

@harendra-kumar
Collaborator

The GHC issue is now tracked via a GHC trac ticket.
