
Module listing discovery (wildcard glob) #5343

Open
parsonsmatt opened this issue May 24, 2018 · 44 comments

Comments

@parsonsmatt
Collaborator

Now that cabal has common stanzas, the only thing keeping me using hpack is that hpack will, by default, autodiscover all of your modules and stuff them into exposed-modules (you can disable this by explicitly defining that stanza). This feature is really convenient to me.

Mimicking that behavior seems like it might be against the Cabal philosophy, as it's implicit -- "If exposed-modules is not defined, collect all modules not listed in other-modules and make them exposed-modules."

A middle ground solution that provides convenience with a degree of explicitness is glob patterns for the module listings. Consider this syntax:

exposed-modules: *

This would find all modules in the source directory and add them as exposed-modules. Kinda like hpack's autodiscovery, but we're explicitly writing "please find all the modules for me."

You could also write:

exposed-modules:
    Control.Monad.*

which would only collect modules under the Control.Monad namespace into exposed-modules -- any other modules would need to be either explicitly added to exposed-modules or other-modules.
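To make the proposed syntax concrete, here is a minimal Python sketch of how such entries might be matched against the module names found in a source directory. It assumes two things the comment does not pin down: a bare `*` matches every module, and `Prefix.*` matches any module strictly below `Prefix` at any depth (so `Control.Monad.*` covers `Control.Monad.State.Lazy` but not `Control.Monad` itself). `matches` and `expand` are hypothetical helpers, not Cabal APIs.

```python
def matches(pattern: str, module: str) -> bool:
    """Return True if a module-listing entry (explicit name or glob) covers `module`."""
    if pattern == "*":
        # Bare wildcard: every discovered module.
        return True
    if pattern.endswith(".*"):
        # Namespace wildcard: strict descendants of the prefix, at any depth.
        # The trailing "." stops "Control.Monad.*" from matching "Control.MonadFoo".
        return module.startswith(pattern[:-2] + ".")
    # No wildcard: an ordinary explicit module name.
    return pattern == module


def expand(patterns: list[str], discovered: list[str]) -> list[str]:
    """Keep the discovered modules covered by at least one entry, in discovery order."""
    return [m for m in discovered if any(matches(p, m) for p in patterns)]


discovered = ["Control.Monad.Foo", "Control.Monad.Foo.Bar", "Control.Monad", "Data.Baz"]
print(expand(["Control.Monad.*"], discovered))
# ['Control.Monad.Foo', 'Control.Monad.Foo.Bar']
```

Whether `Control.Monad.*` should also pick up `Control.Monad` itself is a design choice; the examples later in this thread list `Foo` and `Foo.*` separately, which suggests it should not.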

@23Skidoo
Member

It'd be also convenient to be able to specify module names to be treated as exceptions from auto-exposing with glob patterns: other-modules: Control.Monad.Internal.*.

@gbaz
Collaborator

gbaz commented May 24, 2018

Right. I'd think we'd want some logic relating the two. Explicit always takes precedence over glob. So other-modules explicit takes precedence over glob in exposed-modules, and exposed-modules explicit takes precedence over glob in other-modules. If both are globs or both are explicit, I'd suggest an error, rather than trying to be too clever and pick a potentially nonobvious direction to resolve the ambiguity?

@gbaz
Collaborator

gbaz commented May 24, 2018

I also don't know if there are interaction effects with backpack that we'd have to worry about -- i.e., is it genuinely always the case that given a glob we can simply scan a file listing without inspecting any contents to determine what it expands to?

@parsonsmatt
Collaborator Author

Yeah, given the following modules:

Foo
Foo.Bar
Baz
Baz.Internal

I'd expect the following:

exposed-modules: *
-- other-modules: [] -- no other-modules clause

-- expands to:

exposed-modules:
    Foo
    Foo.Bar
    Baz
    Baz.Internal

exposed-modules:
    Foo
    Foo.*
other-modules:
    Baz.*

-- expands to:

exposed-modules:
    Foo
    Foo.Bar
other-modules:    
    Baz.Internal

and a warning/error that module Baz is not in either listing, as happens now

exposed-modules:
    *
other-modules:
    Baz.Internal

-- expands to:

exposed-modules:
    Foo
    Foo.Bar
    Baz
other-modules:
    Baz.Internal

Because the explicit other-modules overrides the * globbing.

exposed-modules:
    Foo
other-modules:
    *

-- expands to:
exposed-modules:
    Foo
other-modules:
    Foo.Bar
    Baz
    Baz.Internal

For the same reason as above.

I have no idea how this would interact with Backpack 🤔
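One way to check that gbaz's precedence rule and the four expansions above agree is a small Python sketch. It assumes the same glob semantics as before (`*` matches everything, `Prefix.*` matches strict descendants only, which is why `Baz.*` leaves `Baz` unlisted), lets an explicit entry beat a glob, and raises an error when both fields claim a module with equal strength. The function names are illustrative only; nothing here is an existing Cabal API.

```python
from typing import Optional


def matches(pattern: str, module: str) -> bool:
    # Assumed rule: "*" covers everything, "Prefix.*" covers strict descendants.
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return module.startswith(pattern[:-2] + ".")
    return pattern == module


def strength(entries: list[str], module: str) -> Optional[str]:
    """How a field claims a module: "explicit", "glob", or not at all (None)."""
    if module in entries:
        return "explicit"
    if any(matches(p, module) for p in entries):
        return "glob"
    return None


def expand(modules: list[str], exposed: list[str], other: list[str]):
    """Return (exposed, other, unlisted) with every glob replaced by module names."""
    new_exposed, new_other, unlisted = [], [], []
    for m in modules:
        e, o = strength(exposed, m), strength(other, m)
        if e is None and o is None:
            unlisted.append(m)  # the comment above expects a warning/error here
        elif e == o:
            # Both explicit or both glob: refuse to guess, as gbaz suggests.
            raise ValueError(f"{m} is claimed by {e} entries in both fields")
        elif o is None or e == "explicit":
            new_exposed.append(m)
        else:
            new_other.append(m)
    return new_exposed, new_other, unlisted


modules = ["Foo", "Foo.Bar", "Baz", "Baz.Internal"]
print(expand(modules, ["Foo", "Foo.*"], ["Baz.*"]))
# (['Foo', 'Foo.Bar'], ['Baz.Internal'], ['Baz'])
```

Calling `expand(modules, ["*"], ["Baz.Internal"])` and `expand(modules, ["Foo"], ["*"])` reproduces the third and fourth expansions, while `expand(modules, ["*"], ["*"])` raises, since neither glob should silently win.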

@cdepillabout
Contributor

It would be really nice to get this change into cabal.

This is the sole reason we are still using hpack at work.

@domenkozar
Collaborator

domenkozar commented Aug 5, 2019

I'd be happy to chip in to get this one done. It's the last annoyance I have that hpack fixed :)

@phadej
Collaborator

phadej commented Aug 5, 2019 via email

@domenkozar
Collaborator

domenkozar commented Aug 5, 2019

What if globs were allowed for local development but rejected by Hackage? For Hackage, sdist could substitute the globs with the actual files.

You kind of lose the git-repository-to-Hackage mapping, but that was never really a thing with revisions anyway.

@hasufell
Member

hasufell commented Aug 6, 2019

> You kind of lose the git-repository-to-Hackage mapping, but that was never really a thing with revisions anyway.

Yep, revisions are already a wart (knowing that they are a sad necessary fix), break freeze files if you don't freeze the index, cause problems for distributions, are infrastructure specific, ...

Let's not take this as a design argument for allowing more semi-defined behavior.

I feel like hpack is exactly what fills the gap here, and careful maintainers commit both the hpack and the .cabal files to their repository. Is this really cabal's job? If not hpack, then it sounds more like an IDE feature.

@phadej
Collaborator

phadej commented Aug 10, 2019

FWIW, I released https://hackage.haskell.org/package/cabal-fmt-0.1 which in addition to (opinionatedly) formatting your .cabal file can expand exposed-modules, e.g.

  hs-source-dirs: src
  -- cabal-fmt: expand src
  exposed-modules:
    Foo
    Foo.Bar

i.e. in the case when new modules are added, I run cabal-fmt --inplace my.cabal and exposed-modules are re-populated.

The functionality is bare-bones; let's see what will be needed. In one project, I simply moved the non-exposed modules into other hs-source-dirs (specifically the other-modules; main-is: Main.hs now lives in another directory too).
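For comparison, the directory scan behind an `expand src`-style directive has roughly the following shape. This is a hedged Python sketch of the general idea, not cabal-fmt's actual implementation: it walks `src_dir`, keeps `.hs` and `.lhs` files, and turns each relative path into a dotted module name, skipping paths whose components cannot form a module name. Preprocessor inputs such as `.hsc`, `.y` (Happy) or `.x` (Alex) files would need extra handling.

```python
from pathlib import Path


def is_module_component(name: str) -> bool:
    # Each part of a module name starts with an upper-case letter and
    # contains only letters, digits, underscores and primes.
    return name[:1].isupper() and all(c.isalnum() or c in "_'" for c in name)


def discover_modules(src_dir: str, extensions: tuple[str, ...] = (".hs", ".lhs")) -> list[str]:
    """List the module names implied by the Haskell source files under src_dir."""
    root = Path(src_dir)
    modules = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        parts = path.relative_to(root).with_suffix("").parts
        # Skip files such as scripts/setup.hs that cannot be a module in this tree.
        if all(is_module_component(p) for p in parts):
            modules.append(".".join(parts))
    return sorted(modules)
```

The result is only the raw list of candidates; deciding which of them belong in exposed-modules versus other-modules is the part this thread keeps coming back to.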

@chshersh
Member

cabal-fmt looks helpful and useful. However, the UX of calling a command to discover modules is not the same as automatic module discovery by the build tool. In that case, I might as well write the module names manually.

hpack showed the benefits of package configuration features like metadata deduplication and automatic module discovery. And while you can now reduce duplication of stanza information within a single package using common stanzas, a lot of people still use hpack because they find the automatic module discovery feature handy. For me, the drawbacks of using hpack outweigh the benefits of its features, and I'm not using it in my projects. However, this particular feature of module listing discovery is very useful. It makes the Haskell development experience smoother, more pleasant and more beginner-friendly, which is very important for Haskell.

As a maintainer of multiple open-source Haskell libraries and applications, I welcome every contribution, and I want various people to be able to contribute to my projects with as little hassle as possible. That's why I support both build tools, cabal and stack. And I find the number of configuration files required for a single Haskell repository too big for my taste:

package-name.cabal
cabal.project
cabal.project.local
stack.yaml
stack.yaml.lock
package.yaml

and there could be more, the list is not exhaustive

I think that reducing the number of configuration files required for a project, and the number of tools and formats people need to know to develop Haskell projects, while preserving the same features, is a goal worth trying to achieve. It seems to me that this particular minor (in terms of implementation) feature of module listing discovery can push cabal much further and help the already fractured community move closer to a consensus on package development.

@phadej
Collaborator

phadej commented Aug 15, 2019

> It seems to me that this particular minor (in terms of implementation) feature of module listing discovery can push ...

I'll be happy to see a PR.

My experience with Cabal and cabal-install (and downstream tooling using Cabal as a library) says the very opposite. This change is not minor, as .cabal file interpretation becomes dependent on file-system state / tarball contents. (extra-source-files is different, at least for now, as those are only interesting for the sdist command.)

EDIT: note that hpack "works" because there is a clear stage separation: package.yaml is compiled to pkgname.cabal, and pkgname.cabal is used as input to the actual build tooling. Blurring that boundary, requiring .cabal files to be interpreted before further consumption, is tricky.

@gbaz
Collaborator

gbaz commented Aug 21, 2019

I think the idea could be that sdist should expand globs before producing the tgz, as domenkozar suggested. This would mean that downstream tooling making use of sdist'ed packages would not have to worry about such things.

@chshersh
Member

In @kowainik we've implemented an alternative experimental approach for automatic module listing discovery using the custom setup Cabal feature — the autopack Haskell library that automatically finds all Haskell files in the corresponding hs-source-dirs and populates exposed-modules.

So, if you don't want to maintain the list of exposed modules manually, you can give it a try.

@szabi

szabi commented May 15, 2020

I’d be 100% against allowing files with globs to be uploaded to Hackage, as “which package exposes module XYZ?” would be impossible to answer by considering only the index.

A solution would be for Cabal to autogenerate an explicit module-listing metadata file at the same time, similar to the Paths_<package_name> module. Hackage and other discovery tools could use that.

@finlaydotb

I created #7016 to mention how tedious it was to manually update the .cabal file upon addition or removal of modules.

Since then I tried Stack and I will not be going back to cabal-install again! Chiefly because of the fact that with Stack I do not need to manually update modules in cabal files...that alone is a killer UX feature worth the switch.

I don't know if this counts for much as a data point for this feature request, but I thought I'd share.

@hasufell
Member

hasufell commented Oct 25, 2020

> Since then I tried Stack and I will not be going back to cabal-install again!

It isn't really stack, it's still hpack you are interfacing with. Stack just runs it automatically for you. It seems this confuses users already about what their tooling is really doing. So I'm not even sure this is a feature or misfeature.

That said, it's very easy to add a patch to cabal-install to run hpack just like stack prior to doing anything. Hpack can be used as a library. But I'm confident this patch will get rejected.

@finlaydotb

> It isn't really stack, it's still hpack you are interfacing with.

I know. But the fact that Stack provides a unified UX, i.e. I do not have to go tinker with hpack myself, is a win.

@jwoudenberg

jwoudenberg commented Dec 29, 2020

I found this issue while searching for ways to auto-generate exposed-modules. For me too it's the one showstopper for switching from hpack to cabal files in my personal projects and pushing for it in my company.

Would a patch be accepted that would add an option to Cabal to have it automatically run hpack prior to reading a *.cabal file (updating the real file on the filesystem)? Or, alternatively, to have Cabal automatically run cabal-fmt? As far as I can tell that would cover all concerns raised in this conversation.

  • It would not require cabal file interpretation.
  • It'd be as-good a solution for discovery as the one offered using stack+hpack.
  • It'd allow everyone to choose between the performance of manually managing modules in the cabal file and the convenience of auto-generation.

Sorry if this is already discussed at length somewhere else!

@ezyang changed the title from "Module listing discovery" to "Module listing discovery (wildcard glob)" on Feb 24, 2021
@hasufell
Member

hasufell commented May 31, 2021

> Would a patch be accepted that would add an option to Cabal to have it automatically run hpack prior to reading a *.cabal file (updating the real file on the filesystem)? Or, alternatively, to have Cabal automatically run cabal-fmt? As far as I can tell that would cover all concerns raised in this conversation.

Afaik this was discussed somewhere behind "closed doors" and there was an idea to provide hooks, similar to git, which you could easily use to run hpack, install GHC via ghcup and whatnot.

I believe it's not cabal's job to integrate with every single tool some developer uses. Instead, it's cabal's job to define clear APIs, so you can realize workflows that haven't even been thought of yet.

@cartazio
Contributor

Well said Julian.

Also: explicitly listing modules either programmatically when doing code gen or explicitly in .cabal seems best. Tools that blindly auto-find modules seem to create more headaches than they solve in my experience. Or maybe I’ve just had a really wide but eclectic range of bad experiences with implicit module discovery in various tools.

@cdsmith

cdsmith commented Jun 1, 2021

As long as the conversation is happening, I think there's a real mistake being made in ignoring convenience as a benefit.

I am a committed cabal user (unlike others talking about swapping to stack earlier), but I am constantly frustrated by how difficult cabal has to make common tasks, and this is a really great example. Instead of letting someone say what they mean (which is almost always "everything in this directory is a module in my package") we're talking about how to expose some knobs that let certain advanced users who know about these things integrate some third-party tools, which aren't installed by default, and the setup for which is about as much work and much harder to know about than just rolling your eyes one more time and listing all the damn modules in the damn cabal file. And I don't really understand why, except vague claims that it's not "cabal's job". That's not a reason, though. If cabal did it, then it would be cabal's job.

@AlistairB

I agree with @cdsmith .

I think cabal should decide to either be:

  • A low level build tool designed to be used by high level build tools (i.e. stack) and IDEs.
  • A fully featured build tool that has everything you need to build + maintain your project in a convenient package.

I strongly think it should be the latter, however if we are saying it is a low level build tool, then it should focus around that. It should not be used directly by users unless they want to manually rig together a highly custom setup. Instead users should be directed to stack or some other high level tool.

Whether it is cabal or not, something should automate adding new modules in a seamless fashion that the user doesn't need to think about (and that should be the default). Haskell development is already high setup cost + friction IMO.

@hasufell
Member

hasufell commented Jun 1, 2021

> I think cabal should decide to either be:
>
>   • A low level build tool designed to be used by high level build tools (i.e. stack) and IDEs.
>   • A fully featured build tool that has everything you need to build + maintain your project in a convenient package.

There is a third possibility: cabal-install should be a project management tool that allows to realize more workflows than any other tool, without making odd choices for the user and without additional maintenance overhead.

I don't think anyone here is advocating for FreeBSD style command line tools.

I personally feel that there is some force in the community trying to drive cabal-install in stack's direction, for the sake of "unification". I believe this will do more harm than good and would very likely lead to a 3rd build tool.

I agree that cabal-install should maybe formulate its philosophy and goals, although I'm not sure who would do that (and I think it's not the business of HF, to be clear). This would also save time for contributors, who have different expectations, give a starting point to discussions such as this one and serve as a non-strict evaluation point.

@szabi

szabi commented Jun 1, 2021

> 3. It's against unix philosophy, where you focus on pipelining tools instead of integrating everything with everything. It's functional programming for the command line.

The main difference I see is that *nixes come with a set of basic tools that work together with which you can be quite effective already.

Others have mentioned friction in the usage and setup of Haskell. Enabling hooks is modular, but not having the tools that enable frictionless development available by default is a major turn-off.

I really liked the suggestion in either this or the closed duplicate issue to have a mutually exclusive cabal-file keyword which would list excluded files (the list can be empty) and if that keyword is present, everything is included by default (and only the explicitly excluded not).
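A minimal Python sketch of that exclusion-based idea, under assumptions the comment leaves open: the keyword is called `excluded-modules` here (a hypothetical name, not an existing Cabal field), it is mutually exclusive with `exposed-modules`, and excluded modules are simply left out of the exposed set (whether they should become other-modules is not specified). `fields` stands in for an already-parsed component.

```python
def select_modules(discovered: list[str], fields: dict[str, list[str]]) -> list[str]:
    """Return the exposed modules for a component, honouring the exclusion keyword."""
    has_listing = "exposed-modules" in fields
    has_exclusions = "excluded-modules" in fields  # hypothetical field name
    if has_listing and has_exclusions:
        raise ValueError("exposed-modules and excluded-modules are mutually exclusive")
    if has_exclusions:
        excluded = set(fields["excluded-modules"])
        # An empty exclusion list means "expose every discovered module".
        return [m for m in discovered if m not in excluded]
    # Without the keyword, keep today's behaviour: only what is listed is exposed.
    return list(fields.get("exposed-modules", []))
```

Making the two fields mutually exclusive keeps the rule simple: a component is either fully explicit, as today, or "everything except these".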

@hasufell
Member

hasufell commented Jun 1, 2021

> > 3. It's against unix philosophy, where you focus on pipelining tools instead of integrating everything with everything. It's functional programming for the command line.
>
> The main difference I see is that *nixes come with a set of basic tools that work together with which you can be quite effective already.
>
> Others have mentioned friction in the usage and setup of Haskell. Enabling hooks is modular, but not having the tools that enable frictionless development available by default is a major turn-off.

ghcup was built with that in mind, to be a modular counterpart to cabal. It could install hpack and anything else that's needed to bootstrap a development environment (of course there are boundaries to be considered as well). It could be invoked by hooks as well.

@cdsmith

cdsmith commented Jun 1, 2021

> The main point that was brought up was that if this is done sloppily, we're rendering cabal files as static source of information void. That's pretty big.

> So I think (?) the consensus is that generation of cabal files is the right approach and there exist 2 such tools already.

I agree with the first part. The big problem here seems to be that cabal files are BOTH a user interface for users, and a source of information for tools. It's quite understandable that these tools want a bunch of detailed information in the package index. But having humans produce that exact same file is limiting. In this sense, I agree that some kind of generated file is the right choice.

I think what you're getting wrong, though, is the question of which file should be generated. Telling people that the cabal file is no longer their user interface to cabal as a build system is a big change. You have to admit that, on face value, telling people to migrate all of their configuration to YAML and use hpack just because they want to use a wildcard in the other-modules field is a bit extreme. cabal-fmt looks a bit less intrusive, but it's still expressing things in special comments, which then don't get checked, etc.

The earlier proposal by @domenkozar and @gbaz and @szabi was to have cabal generate the file with the full module list. Whether that's an "elaborated" cabal file (that even gets substituted for the original when uploading), or some other format entirely, doesn't matter much from the user point of view. I imagine the people building these tools would prefer the elaborated cabal file, so they don't have to change things. Over time, these tools might even be simplified if the elaboration eliminated some other UI-focused features like common sections.

> I don't understand your concern about advanced use cases. Hooks are simple, could be done globally and per-project, are extendable to any tool and workflow you like, could even be pre-installed.

My concern is that I don't think it's sufficient to make it possible to configure this. Only a small portion of Haskell users will ever set up these hooks. It adds one more thing that people have to be taught to do before they can use Haskell easily. They have to do it every single time they set up a Haskell development environment, which I've done easily dozens of times myself, or even set up a project, which I've done hundreds of times. Multiply that by the number of Haskell programmers... and for what? We're not talking about some huge change to how development works. We're just talking about using a wildcard in the other-modules field, when that almost always says what you want.

I don't disagree with your comments about making something like hpack a part of the cabal workflow. That is, indeed, a very opinionated change and shouldn't be forced onto everyone. But we're not talking about that. The fact that wanting to use a wildcard has turned into which cabal file generator to use... that's where this process has broken down, IMO.

@hasufell
Member

hasufell commented Jun 1, 2021

> I think the idea could be that sdist should expand globs before producing the tgz, as domenkozar suggested. This would mean that downstream tooling making use of sdist'ed packages would not have to worry about such things.

We already have projects that pull very large parts of their dependencies from git, example: https://git.io/JGExt

Do we know we don't break any tooling without doing the glob-expansion explicitly for all of them? What about nix, for instance. There's a cabal2nix that fetches a cabal file from a repo and transforms it. May it now potentially have to fetch the entire repo and run the conversion, before it can make use of the cabal metadata?

Hackage, for many use cases, isn't the only API anymore.

@domenkozar
Collaborator

My big hope is for haskell/haskell-language-server#155 to use the new cabal API for manipulating cabal files.

In my editor I'd create a new module and let an editor macro add it to the .cabal file.

I don't think this is a general solution, but it works well for my use case.

@gbaz
Collaborator

gbaz commented Jun 2, 2021

"Do we know we don't break any tooling without doing the glob-expansion explicitly for all of them?" This is a great question -- that said, I would hope that we could at most just say "sdist is now a part of repo fetching" and other tools would be able to adapt accordingly without much work, if necessary.

@philderbeast
Collaborator

> Hpack can be used as a library.

@hasufell that is how hpack-dhall works.

@hdgarrood
Contributor

So it seems like the design where globs are permitted in exposed-modules and expanded during cabal sdist is promising and solves all of the problems we are aware of but is blocked on exact printing for cabal files? Is that an accurate summary of where this issue is?

I’d still really really like to see this happen; in my mind the evidence that people generally just don’t want to explicitly list out every module in their project is extremely strong:

  • the existence of various tools like hpack (including the stack integration), autopack, and cabal-auto-expose (which seems to be the same idea as autopack)
  • the 👍/👎 ratio on this issue
  • the fact that few other languages/build systems require you to do this

@gbaz
Collaborator

gbaz commented Nov 17, 2022

@hdgarrood is this you volunteering to work on exactprint? :-)

@hdgarrood
Contributor

Probably not, sorry! I think the best I can do is bug some of my colleagues about it :(

@Mikolaj
Member

Mikolaj commented Dec 15, 2022

Please do bug them. :) There's quite a bit of design discussion in the github project about it (or scattered among the tickets listed there), so it should be ready to actually implement.

@Martinsos
Collaborator

It would be awesome to see this implemented -> in bigger projects, when moving modules around and refactoring, updating exposed-modules becomes so much of a pain that it deters me a bit from even making changes in the code. I will be looking at autopack now, but it would be great if cabal supported this natively -> it is weird having to explain to junior devs why they need to do this in cabal when popular languages don't require it.

@Mikolaj design discussion that is scattered -> how hard would it be to gather that in one spot, to make it easier for somebody to contribute?

@Mikolaj
Member

Mikolaj commented Mar 9, 2023

> @Mikolaj design discussion that is scattered -> how hard would it be to gather that in one spot, to make it easier for somebody to contribute?

That, and making a summary of what similar components are ready and what needs to be done, would likely be a major part of the work needed for the whole task, including coding.

@domenkozar
Collaborator

https://github.com/tfausak/cabal-gild

@michaelpj
Collaborator

> So it seems like the design where globs are permitted in exposed-modules and expanded during cabal sdist is promising and solves all of the problems we are aware of but is blocked on exact printing for cabal files?

I'm interested as to whether this is the only thing we'd accept or whether we would also be okay with Domen's suggestion:

> What if globs were allowed for local development but rejected by Hackage?

Hackage already has higher standards than cabal for what cabal files it will accept. It would be quite reasonable for it to reject module wildcards. It also cleaves the userbase at a natural fault line:

  1. People who publish libraries on Hackage will need to have explicit module listings, probably also during development, like today
  2. People who work on large industrial applications that are not published on Hackage will never need to have explicit module listings.
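The upload-side half of this split could be a simple check on an already-parsed component. The Python sketch below is hypothetical: `wildcard_entries` and `check_for_upload` are not existing Hackage or Cabal functions, the check only looks at the two module-listing fields, and it treats any entry containing `*` as a wildcard.

```python
MODULE_FIELDS = ("exposed-modules", "other-modules")


def wildcard_entries(component: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return every (field, entry) pair that still contains a module wildcard."""
    return [
        (field, entry)
        for field in MODULE_FIELDS
        for entry in component.get(field, [])
        if "*" in entry
    ]


def check_for_upload(component: dict[str, list[str]]) -> None:
    """Reject a component whose module listings have not been expanded."""
    bad = wildcard_entries(component)
    if bad:
        listed = ", ".join(f"{field}: {entry}" for field, entry in bad)
        raise ValueError(f"module wildcards are not accepted on upload ({listed})")
```

Locally, the same component would be accepted and expanded at build time; only the upload path would run this check.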

I guess the objection would be that this (probably?) requires support for module wildcards in Cabal-the-library and not just cabal-install? On the other hand, it doesn't rely on exact-printing, which is perpetually just out of reach...

@Ericson2314
Collaborator

I think it would be good to put this in Cabal the library for a different reason, which is an alternative implementation.

Instead of doing the globbing ourselves, we could have GHC simply tell Cabal about all the modules it found. (We can get this from ghc -M today, even.)

Never mind Backpack, there are things like Happy / Alex too where we end up with autogenerated modules. Globbing would have to reproduce all that logic, but GHC knows what it is going to build.


I would also like a version where we can specify different source directories for exposed and hidden modules. This works with either the we-glob or the GHC-tells-us approach (since we can see where the modules are located). I think it's good practice to put different sorts of things in different directories anyway, and this sidesteps the question of exactly what sorts of globs we want, and what the priority order between different glob stanzas would be, since directories are trivially disjoint (and nesting source dirs should already be an error, right?).
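A minimal Python sketch of the directory-based variant, assuming a hypothetical mapping from each source directory to the listing it feeds (no such field exists in Cabal today). Files are passed in as relative paths rather than read from disk, module names are derived from the path below the owning directory, and nested source directories are rejected up front, which is what makes the assignment unambiguous.

```python
from pathlib import PurePosixPath


def check_disjoint(source_dirs: dict[str, str]) -> None:
    """Reject nested source directories, so every file has exactly one owner."""
    dirs = [PurePosixPath(d) for d in source_dirs]
    for a in dirs:
        for b in dirs:
            if a != b and b.parts[: len(a.parts)] == a.parts:
                raise ValueError(f"{b} is nested inside {a}")


def classify_files(source_dirs: dict[str, str], files: list[str]) -> dict[str, list[str]]:
    """Assign each source file's module to the listing its directory maps to."""
    check_disjoint(source_dirs)
    result = {"exposed-modules": [], "other-modules": []}
    for f in files:
        path = PurePosixPath(f)
        for d, field in source_dirs.items():
            root = PurePosixPath(d)
            if path.parts[: len(root.parts)] == root.parts:
                rel = path.relative_to(root).with_suffix("")
                result[field].append(".".join(rel.parts))
                break
        # Files outside every listed directory are ignored in this sketch.
    return result


dirs = {"src": "exposed-modules", "internal": "other-modules"}
print(classify_files(dirs, ["src/Foo.hs", "src/Foo/Bar.hs", "internal/Foo/Internal.hs"]))
# {'exposed-modules': ['Foo', 'Foo.Bar'], 'other-modules': ['Foo.Internal']}
```

Because each file belongs to exactly one directory, no precedence rule between competing globs is needed.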

@hasufell
Member

> Instead of doing the globbing ourselves, we could have GHC simply tell Cabal about all the modules it found. (We can get this from ghc -M today, even.)

I don't think this is a good suggestion.

First: we want less entanglement between GHC and Cabal.

> Second: I doubt that even works properly with cabal conditionals around hs-source-dirs and the like.

@Ericson2314
Collaborator

> First: we want less entanglement between GHC and Cabal.

Yes, but it is entirely a matter of opinion whether this is more or less entanglement. The installed .conf file already contains information that is computed with GHC's aid, after all.

> Second: I doubt that even works properly with cabal conditionals around hs-source-dirs and the like.

Hmm? If you have conditional source dirs, then which modules are provided is also conditional. That is true with globs too.
