Add a check for volatile source-urls #512

Closed
wants to merge 1 commit into from

Conversation

patrickbkr
Member

I know of one single module that actually meets the requirement (Inline::Perl5). I'm not sure there are any others. So if this change actually reflects a best practice, then the current state of the ecosystem is grave.

So I'd like some feedback on whether this change actually is a best practice, or whether I still don't understand how the ecosystem works and have missed the point again.

Volatile source-urls will result in an inconsistent module state on the user's computer as soon as the contents behind that source-url change without a matching change in the version number. Setting source-url to a git master branch is the typical faux pas.
Instead, one should change the source-url to link to a never-changing target (a tag, a revision, or some zip file).
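
As an illustration of what such a check might look like, here is a minimal Python sketch. It is not the check proposed in this PR; the function name `is_volatile`, the `MOVING_REFS` set, and the example URLs are all assumptions made for illustration, and the heuristic will produce false positives (for instance, a repository that happens to be named `main`).

```python
import re
from urllib.parse import urlparse

# Hypothetical heuristic: ref names that usually move over time.
MOVING_REFS = {"master", "main", "HEAD", "develop", "trunk"}


def is_volatile(source_url: str) -> bool:
    """Return True if the URL looks like it can change without a version bump."""
    parsed = urlparse(source_url)
    path = parsed.path
    # A bare git repository URL resolves to whatever the default branch is today.
    if parsed.scheme == "git" or path.endswith(".git"):
        return True
    # Archive or raw links that name a moving branch, e.g. /archive/master.zip
    segments = [s for s in re.split(r"[/.]", path) if s]
    return any(seg in MOVING_REFS for seg in segments)


if __name__ == "__main__":
    # Example URLs are made up for this sketch.
    for url in (
        "git://github.com/example/Foo.git",
        "https://github.com/example/Foo/archive/master.zip",
        "https://github.com/example/Foo/archive/v1.2.3.tar.gz",
    ):
        print(url, "->", "volatile" if is_volatile(url) else "looks pinned")
```

A tag can still be moved by its owner, so a URL that passes this check is only likely to be stable, not guaranteed to be.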

Related to Raku/problem-solving#72.

@niner @ugexe @JJ

@patrickbkr patrickbkr requested review from niner and ugexe June 25, 2020 19:24
@Altai-man
Member

Altai-man commented Jun 25, 2020

Can we have a test for this? I am fairly sure this template goes unread most of the time, just like a lot of other texts.
Other than this, I am not really sure we can demand any particular kind of URLs before providing users with simple instructions on possible ways to have non-volatile releases.

UPD: never mind the test, I think about 99.9% of our modules will fail it anyway. Ouch.

@patrickbkr
Member Author

@Altai-man I actually plan to also add a Travis configuration to check for evil source-urls.

JJ is working on the docs - see Raku/doc#3481.

I do understand that we can't forbid people from doing one thing without telling them how else to do it. So maybe this PR needs to be held back until that doc issue is solved.

@niner
Contributor

niner commented Jun 25, 2020

Oh, yes, I think the intention is right. But we have some serious re-education to do...

@JJ
Contributor

JJ commented Jun 26, 2020

@ugexe mentioned a few in the comments to the problem-solving repo; Ddt would be one of them. The question is not only the change of source URL, it's also that you need to add an entry here every time you release a new version. So we badly need an education campaign.
One of the things we could/should do is to start by re-releasing the community-maintained modules. That's something we can control, and it would set an example. Bottom line is, we need a roadmap for this...

@patrickbkr
Member Author

Ddt looks fine at first sight, but is actually broken just like the others: all of the linked META files still have a source-url that links to the master branch.

Also I think we need some tool to automate the insertion of the META files into the ecosystem based on changes in the repo.

Reasoning: When the best we can come up with is a guide that instructs users to manually create a PR to the ecosystem repo for each new version, then from a usability perspective this is about as cumbersome as releasing on CPAN. Also, the number of PRs to the ecosystem repo will increase several-fold if we still want to go with PRs.

How could such a tool access the ecosystem repo? Should it just create PRs that are still manually reviewed and merged? Should there be a bot account that just directly commits stuff?

@JJ
Contributor

JJ commented Jun 26, 2020

The thing is, I'm not sure changing the template is the best first step. Going back to the roadmap idea, it should rather be the last...

@ugexe
Contributor

ugexe commented Jun 26, 2020

In the grand scheme of things I'd still expect some sort of bleeding-edge ecosystem, and that is essentially how p6c functions (even if that is not how people understand it). I'm not saying it couldn't be a different ecosystem, just that its current behavior does actually serve a purpose. If you wanted to make it easy for newcomers I'd probably require them to add some GitHub hook such that automation would be notified of new GitHub releases, but having a command line tool to automatically do everything after minimal user interaction would seem fairly user friendly as well. There is the technique of just scanning their repos for diffs in their META6.json version line and recording the commit, but I have always viewed that as a hacky way to demonstrate an ecosystem automation MVP (I'd not promote it over the previously mentioned methods).
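
The "scan for version-line diffs and record the commit" technique mentioned above can be sketched roughly as follows. This is a minimal Python illustration, not an existing ecosystem tool: the function name `record_releases` and the shape of the `history` input (commit id plus the META6.json text at that commit, oldest first) are assumptions, and fetching that history from git is left out.

```python
import json


def record_releases(history):
    """Map each META6.json version to the first commit that declared it.

    `history` is an iterable of (commit_id, meta_json_text) pairs, oldest first.
    """
    releases = {}
    for commit_id, meta_text in history:
        try:
            meta = json.loads(meta_text)
        except json.JSONDecodeError:
            continue  # skip commits where META6.json does not parse
        version = meta.get("version") if isinstance(meta, dict) else None
        # Only the first commit that introduces a version is recorded.
        if version is not None and version not in releases:
            releases[version] = commit_id
    return releases
```

Given a history where commits `a1` and `b2` both declare version `0.1` and commit `c3` declares `0.2`, the result maps `0.1` to `a1` and `0.2` to `c3`. The sketch also shows why the approach is fragile: the recorded commit is simply the first one where the version string changed, whether or not the author considered that commit a release.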

The one issue with automatic releases that I'm wary of is the size of the index (an inevitable problem regardless, but exacerbated by automation). We need a more performant way to make available all the metadata necessary for the internal recommendation manager to decide what some long name references (including the ability to understand NYI S22 bits like emulates, so not just providing provides/api/auth/version). Presumably that format also needs a way to link to a secondary file with all the JSON (maybe via an offset recorded in the index) so we can also get at large (but usually not needed) things like the description for e.g. zef search Foo output. Note a new performant index does not need to be user friendly or considered spec (maybe spec to whatever ecosystem is using it, but as rakudo itself does not need to understand it we do not need to control it), it just needs to be something that can be parsed/read using OS-agnostic core Raku.
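
To make the "compact index plus offset into a secondary JSON file" idea concrete, here is a minimal Python sketch. The tab-separated line layout, the field selection, and the function names `build_index` and `load_meta` are assumptions chosen for illustration; no such format has been specified, and a real implementation would have to be readable from core Raku.

```python
import io
import json


def build_index(dists, blob):
    """Append each full META record to `blob`; return compact index lines."""
    lines = []
    for meta in dists:
        offset = blob.tell()
        record = json.dumps(meta).encode("utf-8")
        blob.write(record)
        # Only the fields needed to resolve a long name go into the index,
        # followed by where to find the full record in the secondary file.
        fields = [
            meta["name"],
            meta.get("version", ""),
            meta.get("auth", ""),
            meta.get("api", ""),
            ",".join(sorted(meta.get("provides", {}))),
            str(offset),
            str(len(record)),
        ]
        lines.append("\t".join(fields))
    return lines


def load_meta(index_line, blob):
    """Fetch the full META record that an index line points at."""
    *_, offset, length = index_line.split("\t")
    blob.seek(int(offset))
    return json.loads(blob.read(int(length)))
```

With `blob = io.BytesIO()` standing in for the secondary file, resolving a name only touches the short index lines, while something like a search command can call `load_meta` to pull the description on demand.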

@patrickbkr
Member Author

I think I (we?) need to work out which role p6c should play. What use case do we want it to solve?

In my opinion the primary strengths of p6c and CPAN are ease of use and robustness respectively. To a limited extent I see value in both ecosystems. Creating a CPAN release of an average Raku library usually requires initial setup (to create a PAUSE account) and a single command, given a suitable tool such as Mi6, Ddt or App::Assixt. So I don't think having a p6c that also requires initial setup (creating some GitHub hook) and a single command to create a release has much value, since we already have CPAN.

In my opinion every ecosystem that is readily available for public consumption should be reliable. So I think modules that are meant for normal public use and modules that are depended on by other modules absolutely must always have stable source-urls.

@ugexe You mentioned the phrase "bleeding-edge ecosystem". Can you elaborate on what use case and usage pattern you envision in that respect? Maybe that provides a more valuable role for p6c to play.

@patrickbkr
Member Author

@ugexe Did you notice my above question?

@ugexe
Contributor

ugexe commented Jul 2, 2020

Someone can keep pushing their Foo::NotYetReleased that depends on other ::NotYetReleased stuff. The fact everyone uses this mechanism for "official" releases is beside the point.

> In my opinion every ecosystem that is readily available for public consumption should be reliable.

What is more important is that every default ecosystem is reliable. If we could make p6c disabled by default in zef then users could just do zef install Foo::Maybe::NotYetReleased --p6c to include the unreliable p6c ecosystem.

> So I don't think having a p6c that also requires initial setup (creating some GitHub hook)

There is already user interaction -- the user has to add their module to META.list. I am saying one option is to remove that step and instead have them set up the hook (presuming a hook can even be used to do what we need).

> In my opinion the primary strengths of p6c and CPAN are ease of use and robustness respectively.

But it's not just an ecosystem thing -- if users aren't doing use Foo:ver<...> then it's mostly a moot point. At that point doing one command for a release doesn't seem so bad.

While I've mentioned possibly making automatic releases by scanning git repos, I've also not convinced myself that this would be reliable in the sense CPAN is (i.e. some script that will evolve and change over time will be acting as the authority of when some version is exposed to the public as an immutable version). I'm not saying it's a bad idea either though.

@patrickbkr
Member Author

@ugexe Thanks for your comments!

> Someone can keep pushing their Foo::NotYetReleased that depends on other ::NotYetReleased stuff. The fact everyone uses this mechanism for "official" releases is beside the point.

I'm struggling to understand. Do you mean: "P6C is not meant to be used for official releases. That people do that anyways is sad, but not central to this discussion." ? (If that's what you mean, then what is p6c meant to be used for? - That's basically my last question at the bottom.)

> In my opinion every ecosystem that is readily available for public consumption should be reliable.

> What is more important is that every default ecosystem is reliable. If we could make p6c disabled by default in zef then users could just do zef install Foo::Maybe::NotYetReleased --p6c to include the unreliable p6c ecosystem.

I think we agree and actually mean the same thing here.

> So I don't think having a p6c that also requires initial setup (creating some GitHub hook)

> There is already user interaction -- the user has to add their module to META.list. I am saying one option is to remove that step and instead have them set up the hook (presuming a hook can even be used to do what we need).

> In my opinion the primary strengths of p6c and CPAN are ease of use and robustness respectively.

> But it's not just an ecosystem thing -- if users aren't doing use Foo:ver<...> then it's mostly a moot point. At that point doing one command for a release doesn't seem so bad.

One command for a release isn't bad. But if we aim for the one-command-per-release solution, then I think the best way forward is to deprecate p6c and push people to release on CPAN instead.

> While I've mentioned possibly making automatic releases by scanning git repos, I've also not convinced myself that this would be reliable in the sense CPAN is (i.e. some script that will evolve and change over time will be acting as the authority of when some version is exposed to the public as an immutable version). I'm not saying it's a bad idea either though.

I don't yet understand what concept you had in mind when you mentioned a "bleeding-edge ecosystem". Can you elaborate on that? Maybe the best way forward is to force people to release on CPAN by default and have p6c serve a different purpose. That's why I'm interested in your thoughts on such a different purpose.

@ugexe
Contributor

ugexe commented Jul 2, 2020

> But if we aim for the one-command-per-release solution

The version absolutely must be bumped every release -- this cannot be avoided. So maybe not a CLI command, but the user has to do something specifically to say 'ahem new version'. A naive solution might just bump the minor version every commit automatically (assuming the distribution is even using semver), but that is not reliable because it's going to end up mis-versioning major version changes. The nice thing about the git hook solution is the hook presumably passes along the version the user explicitly declared has just been created.
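
The weakness of the naive approach can be seen in a few lines of Python. This is a sketch only: `bump_minor` is a made-up helper, and it assumes a dotted numeric version with at least a major and a minor part.

```python
def bump_minor(version: str) -> str:
    """Naively derive the next version by incrementing the minor component."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


# A commit that breaks the API should move 1.4.2 to 2.0.0 under semver,
# but the automation has no way of knowing that and still produces 1.5.0.
print(bump_minor("1.4.2"))
```

The function cannot tell a bug fix from an incompatible change, which is why some explicit signal from the author is still needed.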

> I don't yet understand what concept you have in mind when you mentioned "bleeding-edge ecosystem".

Know how you can add testing or unstable to apt-get? Or add different casks to homebrew? Similar to that... of course these need to be co-operative so it's slightly different. One might argue it is reasonable for such a use case to just manually pass the url to it and all its bleeding-edge deps a la zef install $url1 $url2, which also makes sense.

@patrickbkr
Member Author

@ugexe The bleeding-edge ecosystem idea sounds interesting. If we want to go down that road the following milestones come to mind:

  1. Adapt the docs to very prominently instruct module authors to put final releases on CPAN. Explain the nature of p6c as an ecosystem for distributing unstable things.
  2. Make sure zef does not install from p6c without being explicitly asked to do so. Maybe rename it to more obviously state the "testing" aspect. This will break backwards compatibility for every script installing a module that is only available in p6c.
  3. Think up and create some tool to ease the process of releasing to p6c.
  4. Install measures to prevent people from putting bad stuff in source-url.

I imagine the use of p6c for final / real releases will drop significantly after we are done with point 2. Points 3 and 4 are then lower priority, because people will start using CPAN for real releases instead.

The above is only an idea. I'd like to have some feedback on this.

We need to have consensus on how we want to move forward with p6c in general before anything can move forward.

@nxadm
Contributor

nxadm commented Jul 2, 2020

> 1. Adapt the docs to very prominently instruct module authors to put final releases on CPAN. Explain the nature of p6c as an ecosystem for distributing unstable things.

Don't forget that many authors, including some core devs, don't want to use CPAN and prefer the github model. Enforcing the Perl CPAN may result in the opposite of what you want to achieve.

@patrickbkr
Member Author

@nxadm Can you elaborate on the "don't want to use CPAN and prefer the github model" bit? I'm interested in the reasons so they can be addressed.

This is especially interesting as there currently is only a single distribution I'm aware of that uses p6c in a safe way. It's Inline::Perl5. And that ironically links to the CPAN release file in its source-url.

@Altai-man
Member

> Can you elaborate on the "don't want to use CPAN and prefer the github model" bit? I'm interested in the reasons so they can be addressed

Just my 2 cents: when I asked folks how to get an account to use CPAN, I was told something like "So you e-mail some folks and after a week or two they may reply", which sounded like 1993 to me, so I dropped the idea.

For the sake of an example, I (as a user who has never used CPAN and has no related experience) will now try to get an account to work with CPAN, writing down below what I am seeing...

1) Google "how to get cpan account"
2) First link is https://www.cpan.org/misc/cpan-faq.html
3) Skipping questions like "Where can I find the current release of the Perl source code?", "Where can I find/join/create Perl mailing lists?". Is this Raku or what?
4) Get to section "VI. - Contributing modules, patches, and bug reports" somewhere further down, with the question "How do I contribute modules to CPAN?"
5) "If you would like to learn more about PAUSE and how to go about contributing your module to CPAN please read the PAUSE FAQ at http://www.cpan.org/modules/04pause.html which will tell you how to go about getting a PAUSE ID and the steps needed to upload your code. Also, perldoc perlmodlib and perldoc perlmod are a good introduction to Perl modules." <- oh, so I need another FAQ to go from this FAQ, I see... Perlmodlib is so related to Raku... Ok, next FAQ
6) https://www.cpan.org/modules/04pause.html <- a page with no CSS, "Your duties, traps"...
7) https://www.cpan.org/modules/04pause.html#registering <- "Registering as a developer".
8) "If you have written a module, script, or documentation you would like to contribute to the archive, visit pause.perl.org Registration (Non-SSL version) and fill in the form. You will be notified by email about your registration. Please allow three weeks for proceeding, which should be the maximum during vacation time. Normally we hope to register you within a week. The resulting email traffic will run through [email protected] and will be archived at http://www.nntp.perl.org/group/perl.modules/. [email protected] isn't a mailing list, just an alias for the maintainers of the Perl 5 modules database" <- this is not how you register accounts in 2020.
9) Ok, so I go to yet another page again... "A PAUSE account is only required to distribute and manage Perl module distributions on CPAN. You do not need a PAUSE account to submit bug reports to RT or participate in many Perl community sites." <- very Raku.
10) "This trivial expectation was then coded into the server side sanity check of this form and it turned out to be a super efficient spam protection because bots often did not try to enter a space in the middle of the field. It was about the year 2003 when people started to complain that they had tried Peter and it did not work. Poor Peter, please remember you do have a second name." <- what is all this small print from 2003?
11) "A short description of why you would like a PAUSE ID:" <- I want to upload modules, dammit, why is it so hard? Do I need to pass an interview with a tech lead and HR to upload my module?
12) So assuming I filled this in, I then wait for people to set up my account by hand. If they are not on vacation, of course; otherwise there may be a three-week delay.

Just to be clear: I do not want to offend anyone. But the clear issues I see with CPAN usage for Raku are:

1) Perl everywhere.
2) Look and feel from 2003, which is an awesome thing that works in just about any environment, browser, etc! This is really cool. The downside is that it looks unpleasant and as if it was abandoned in 2003.
3) We need our own clear and concise documentation about this: pages, managing, all that. I can hardly believe a lot of people today would be happy to write a 10-sentence essay on why CPAN should accept their modules.

For the record, I do not want to look like a pretentious kid who needs at least 5 megabytes of JavaScript to consider a website cool, who does not respect awesome universal solutions like simple web pages, and who doesn't like to read a lot of text. But even so I found the mere account registration process, which should be dead simple, not so simple, and I am scared to think how many pages I would have to read to learn how to upload my module. Even if there is a single command to do so, as a user from outside the Perl community, after the steps above I don't have much desire to look it up. It should be simpler to share useful code.

@nxadm
Contributor

nxadm commented Jul 5, 2020

@patrickbkr,

I see the importance and usefulness of CPAN for Perl. It has been there for decades, it works very well and is 100% identified with Perl. CPAN is Perl and Perl is CPAN.

Even ignoring the terrible error of running user-visible Raku infra on something that's clearly Perl-centric after the rename and the announcement of Perl 7, I fail to see the point of CPAN for Raku. It's 2020 and we live in a world with free and/or self-hosted repos (GitHub, GitLab and Gitea) and zillions of CI solutions. The CPAN flow feels like reinventing the wheel as if it were 2000.

2c and all.

@patrickbkr
Member Author

@Altai-man I agree. So deriving some action items from your observations:

  • Improve our documentation to provide a very clear guideline on how to reach PAUSE, that it's an infrastructure shared with the Raku ecosystem, and what steps need to be taken to register an account. Uploading modules can be performed by one of the multiple dist helper Raku utility modules. The documentation should recommend doing so instead of using PAUSE directly. The idea is to have our documentation guide the users through the entire process, so no googling and reading up on third-party websites should be involved.
  • Prettify the PAUSE website a little. I think there isn't that much that really needs to change. In essence it's a single form to register and another one to upload a module. Two possible approaches:
    • Collaborate with the PAUSE maintainers and improve the PAUSE website itself. Given the history of struggles with improving that website this could be difficult.
    • Write our own prettier PAUSE frontend that delegates to the current one behind the scenes. This should not be done in a "hostile takeover" manner. The current maintainers should be contacted and the best way to proceed discussed.

@nxadm I do see value in CPAN in contrast to a repo-based approach, mostly because of its reliability and single point of authority. p6c currently does not provide a similarly reliable service. p6c is inherently decentralized: modules can be located anywhere and no guarantees can be given that they don't disappear or change. We could implement our own module data store. But that is coupled to our own hardware and software infrastructure, which costs money and needs to be maintained. I think the worst part about CPAN is that we need to cooperate with people outside our Raku bubble. But I don't mind doing that much.

Should this discussion be taken over to problem-solving?

@JJ
Contributor

JJ commented Jul 12, 2020

The proposal for native dependencies by @samcv in #334 is probably relevant here too...

@patrickbkr patrickbkr added the WIP Work In Progress, do not merge (yet) label Jul 21, 2020
@patrickbkr
Member Author

Now that a new indexer is in development that is meant to work with volatile source URLs, in an ecosystem where they make a lot of sense (the zef-p6c ecosystem), we should not proceed with this PR. Closing.

@patrickbkr patrickbkr closed this Jan 22, 2021
@patrickbkr patrickbkr deleted the no-volatile-source-url branch January 22, 2021 08:05