This repository has been archived by the owner on Mar 25, 2018. It is now read-only.

Decentralized module delivery #26

Closed
scriptjs opened this issue Nov 21, 2015 · 31 comments

Comments

@scriptjs

I am asking that the TSC begin to explore a future of module delivery for node that is decentralized. Along with this, it should assume control of determining future standards around the use of the package.json. A decentralized approach (as opposed to what today is centralized) for module distribution is safest at scale. The Web itself was designed around this philosophy and it is a useful model for the scale of node.

Further, the node community and TSC must retain control over the user experience of node. The future of module delivery can and should involve multiple players that deliver according to standards that are developer driven and in consultation with the TSC that can assure this interest.

Moving forward can and should involve the community in determining possible approaches and tests, and the process should be given careful consideration over time, since we are now on an LTS approach to releases. This future can evolve with ES6, with the aim of moving away from the legacy of a single upstream service.

This is being referenced from nodejs/node#3955

@mikeal
Contributor

mikeal commented Nov 24, 2015

I'd like to correct an assumption you may have about how the project and TSC operate.

The TSC & CTC don't direct the project in the manner you suggest. They might sometimes set goals or state mid-term objectives but, for the most part, the TSC/CTC allow the contributors (which is a much larger and broader group than the TSC or CTC) to drive the project. Once something is actionable (like there's an active pull request) the CTC would weigh in. If something is being discussed in the abstract but could easily turn into a Pull Request they might also weigh in. If a consensus can't be reached among the contributors the CTC will come in and make a final decision.

The TSC/CTC don't do anything like "explore a future of module delivery for node" or even direct people to do so. The TSC/CTC would only get in the way of that kind of work. In fact, this kind of work has been happening in the community for some time, it just hasn't reached a state of maturity where those contributors would suggest replacing the current package delivery system with it.

As you can see by the prior art there aren't any changes necessary to core or the module system in order to accomplish what you are suggesting, so there isn't much for the core project to do until this matures.

@scriptjs
Author

@mikeal Thank you for this explanation. I would like this issue to remain as a potential target for NG. From my perspective the only tie I see being needed in the core would be in the bootstrap to more tightly integrate the CLI and module management. Beyond this, it can be independently developed.

@Martii

Martii commented Dec 1, 2015

+1

@ChALkeR
Member

ChALkeR commented Jan 23, 2016

Ok, my opinion on this topic: this seems like a rather broadly defined issue.

At the moment, the npm client is not completely centralized: it can operate with other registries, it can download packages directly over HTTP, etc.

@scriptjs Are there any specific technical steps that you want to see happening? Changes to package.json format? Maintaining another npm-compatible registry? Creating a distributed p2p storage for npm packages backed up by some maintained server that will seed all of them for rare packages and be used for moderation?

And yes, as @mikeal noted: the TSC/CTC is not going to tell anyone «go and implement this». This should be implemented by someone who is concerned enough to do that (presumably very large) amount of work themselves.

Note that moderation of the registry is currently needed, because there could be harmful packages. Also there could be (theoretical) situations when the whole registry must be stopped for moderation (I could describe such a situation a bit later), and that should be achievable. I am not saying that this restriction must be absolute, though.

@scriptjs
Author

@ChALkeR I agree, it is broadly defined and I will come back with specifics and further rationale.

@joshmanders

I think the biggest issue is that npm is both bundled with Node.js and the AUTHORITY package registry, which puts too much control in the hands of a single company. The split to io.js was because the community didn't want a single company telling the community what they can and can't do or dictating everything. The mere fact that npm is bundled makes it hard for an alternative to get off the ground.

Yes npm does allow anyone to have their own registry, but then again, what if someone creates a registry that's compatible and then npm is like "whoa, we don't like this. Blocked." now what?

It's just too concerning to me, especially in light of recent activities on Isaac's twitter.

@ChALkeR
Member

ChALkeR commented Jan 23, 2016

@joshmanders

Yes npm does allow anyone to have their own registry, but then again, what if someone creates a registry that's compatible and then npm is like "whoa, we don't like this. Blocked." now what?

I don't think that's technically possible. Node.js comes with a pre-bundled npm and most people just use that. To block some other registry, it would be required to get that into some npm version and then get that npm version into Node.js somehow (which would raise questions), then wait for everyone to update (which would take ages). Also, it would be completely crazy to do something like that without serious reasons.

@joshmanders

@ChALkeR there are ways around it. The npm version bundled with node isn't the only version you can use. If they wanted to be sneaky, they could say a security issue was discovered in npm and that you need to run npm install --global npm to get the latest version with the fix. Boom, block in place.

@joshmanders

I don't know, just in light of recent twitter activities I'm very concerned with the fact that a single company has so much control over the users of node.

@ChALkeR
Member

ChALkeR commented Jan 23, 2016

@joshmanders Let's keep the discussion technical here and discuss more specific proposals once they are prepared.

Upd: I initially posted that twitter link as a reference, I did not want that turning into a huge off-topic discussion of it in these two issues. Perhaps that has been a mistake on my part.

@jasnell
Member

jasnell commented Jan 23, 2016

so accountability is a powerful motivator here. While it is certainly possible for npm to be a bad actor here, is it really within their best interests to do so knowing that there is an active and vibrant community that would call them on it the second they do? That's not to diminish the concern in any way @joshmanders , it's just to say that the community does play a level of oversight here that should not be underestimated. That said, what I'd personally be most interested in discussing here are the technical options for improving things for everyone in the community. How do we further improve a registry ecosystem in which we all benefit?

@ashleygwilliams

i second @ChALkeR's comments. i think that discussing a decentralized system is a very interesting idea (even as an npm employee) and would like to see the conversation focused on that :)

@scriptjs
Author

@joshmanders I have expressed my concerns about the control of the client and registry in nodejs/node#3959 when I discovered that the Artistic License was modified in Node. This seemed to bring everyone into NPM's terms of service without their knowledge. Millions of downloads of Node had this included – including the LTS. The license that appeared in installers and the node site differed from what landed on your system. Some of this was corrected, but the Node Foundation's lawyers still need to come back with an opinion on the impact this has had for those users and whether any of this could be considered enforceable. I don't want to get into this any further here, but in my opinion this kind of issue also speaks to the need for a decentralized registry and generic client software for accessing it that is not controlled by NPM or distributed with node.

@joshmanders @ChALkeR to your question about whether technically it is possible – indeed it is. There is currently nothing in the node module system specific to NPM. The NPM client is the experience we have atm by default – only as a result of the legacy of node. There are a couple of solutions that can provide the peer to peer capabilities. That said, module delivery is only one part of the story. Module resolution and client capabilities analogous to what exists with the NPM client are another.

Currently NPM 3 while providing a flatter structure and deduplication of modules has demonstrated itself to be significantly slower. This has created a lot of complaints and there is a large thread that deals with this specifically on the NPM repo where you can read about it.

An alternative, simpler and faster solution is under development: http://gugel.io/ied. It has a few features missing but is currently approaching something usable and it is very fast. It also creates a flat structure. The ied client can already handle connections to any registry. I only raise this as an illustration that the module resolution piece is already being done by alternative software. ied is much smaller than the npm client as far as code goes. It is easily examined and anyone can hack on it.

I think ied will be the first viable contender for an alternate client to NPM. Not peer to peer, but competing favourably with the client bundled with node. http://jspm.io was designed as a package manager for the frontend but can also manage npm fetches currently as an alternative. duo.js https://github.com/duojs/duo is another alternative that can be coded to work with npm with an alternate provider. The resolving methods are what is important in these solutions for those interested. Peer to peer is a transport that the resolving mechanisms can be matched with in the future.

Peer to peer will deliver better speed and better global distribution. With the uptake of node, colleges, universities and organizations that currently provide endpoints for apache licensed software can also be used to seed the registry along with many thousands of developers. I believe a better future is one where organizations participate as peers to persist and seed modules but with no direct control over the registry. Anyone with access to modules can provide front end search capability. At present there are only 250K modules. Even with versions, this is not a large amount of metadata for a search engine.

I believe this is more in the spirit of open source and would match the scale of development taking place in node today. None of us can manage without access to modules. At the same time, no one wants our access to modules controlled by a single commercial entity.

One issue that was raised by someone was code vetting. Currently the security and quality of modules is not vetted by NPM. There are certain warnings emitted by the client but that does not equate to vetting. https://nodesecurity.io today is the authority on these matters and peer to peer transport of signed code would not change this. In fact, only a very small percentage of the software in the registry is used. I don't have figures in front of me, but between 10 and 15% of the modules account for the vast majority of downloads.

I will provide some updates to this thread on specifics of the peer to peer solutions coming. https://github.com/mafintosh/dat-npm was an early attempt to transfer registry data on a peer to peer basis. Dat is reaching a 1.0 quite soon and its simple re-engineered architecture makes it a strong candidate for a decentralized registry. Currently the dat team has a proposal in for publicbits where Google and others have expressed interest in supporting persistent peers for public data.
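
As a rough illustration of the point about signed code: a signature made by the author can be checked no matter which peer delivered the bytes. The sketch below uses Node's built-in crypto module with an Ed25519 key pair. The function name acceptFromPeer and the sample module contents are hypothetical, and how consumers would obtain and come to trust the author's public key is assumed to be solved elsewhere.

```javascript
const crypto = require('crypto');

// Author side (hypothetical): create a key pair and sign the module bytes.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const tarball = Buffer.from('module.exports = 42;');
const signature = crypto.sign(null, tarball, privateKey);

// Consumer side: the bytes may come from any peer, but they are only
// accepted if the signature matches the author's public key.
function acceptFromPeer(bytes, sig, authorKey) {
  if (!crypto.verify(null, bytes, authorKey, sig)) {
    throw new Error('signature mismatch: refusing module from peer');
  }
  return bytes;
}

// An honest peer passes the check.
acceptFromPeer(tarball, signature, publicKey);

// A peer that altered the contents does not.
const tampered = Buffer.from('module.exports = 43;');
let rejected = false;
try {
  acceptFromPeer(tampered, signature, publicKey);
} catch (err) {
  rejected = true;
}
console.log('tampered copy rejected:', rejected);
```

Note that this only shows the check itself. It says nothing about vetting the quality of the code that was signed.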

I encourage anyone reading this to offer their feedback and identify solutions that can help achieve this goal.

@formula1

Issues with peer to peer and possible solutions

  • duplication/synchronization of data - This can be resolved by having each node_module provider provide only a subset of modules (the ones under the umbrella of the company) and keep a few selected 'trusted peers' that will resolve names that it cannot resolve. Otherwise that's a ton of data that servers may or may not reliably be hosting.
  • naming conflicts - As an end user I can choose 'trusted hosts', similar to apt-get. If the name cannot be resolved by the trusted host, the host would then notify peers of the name requirement until one is discovered and/or the next trusted host will be requested.
  • free-ness - As it stands npm is nice because anyone can upload anything and take any name. Once it becomes decentralized, whom do you send the module to for hosting if you don't belong to a company? In turn, this will create a greater vetting process for unfunded projects related to funded projects. I highly doubt strongloop would host a broken JSON Stream Parser under the name JSONStream when there are very strong alternatives, while currently npm doesn't really care.
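
One possible reading of the 'trusted hosts' idea above, as a minimal sketch. Hosts are modelled as in-memory objects rather than network services. makeHost, askHost, resolveName and the example URLs are all hypothetical, and peers are consulted only one level deep to keep it short.

```javascript
// Hypothetical model: each host knows a subset of modules and a few peers.
function makeHost(name, modules, peers = []) {
  return { name, modules: new Map(Object.entries(modules)), peers };
}

// Ask one host for a module. If it cannot resolve the name itself,
// it asks its own trusted peers.
function askHost(host, moduleName) {
  if (host.modules.has(moduleName)) {
    return { location: host.modules.get(moduleName), resolvedBy: host.name };
  }
  for (const peer of host.peers) {
    if (peer.modules.has(moduleName)) {
      return { location: peer.modules.get(moduleName), resolvedBy: peer.name };
    }
  }
  return null;
}

// Walk the user's trusted hosts in priority order, similar to a list of
// apt sources. The first host that can resolve the name wins.
function resolveName(moduleName, trustedHosts) {
  for (const host of trustedHosts) {
    const hit = askHost(host, moduleName);
    if (hit) return hit;
  }
  return null; // nobody in the trust chain knows this name
}

const community = makeHost('community', {
  canvas: 'https://community.example/canvas.tgz',
});
const companyA = makeHost(
  'company-a',
  { express: 'https://a.example/express.tgz' },
  [community]
);
const companyB = makeHost('company-b', {
  express: 'https://b.example/express.tgz',
});

// Both companies host "express"; the order of trusted hosts settles the conflict.
console.log(resolveName('express', [companyA, companyB]));
// "canvas" is only known to a peer of company-a.
console.log(resolveName('canvas', [companyA, companyB]));
```

In this model a naming conflict is settled purely by the order of the user's trusted hosts, which is one design choice among several.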

@jasnell
Member

jasnell commented Jan 23, 2016

So one idea that's been bouncing around at the back of my brain for a while (that I would love to get @othiym23 and @isaacs thoughts on as well) is this... I have absolutely no position yet on whether this would be a good thing or bad but I'd like to at least surface it here and get the arguments for and against: there's a certain amount of minimal work that any package manager for node would need to do. That would include things like parsing the package.json, determining the tree and laying out the packages on disk. I know the algorithms involved can actually be quite complex but... what if those minimal core pieces were actually part of node core while the remaining higher level functions such as the wire protocol, the registry interactions, etc were kept separate. Would that help or hurt things? (I'm honestly asking because I simply don't know). Or would it not make a difference one way or the other?
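
To make the 'minimal core pieces' concrete, here is a small sketch of the three steps mentioned: parsing a package.json, determining the tree, and planning where packages would go on disk. It is a toy under stated assumptions: the registry is an in-memory object, versions are exact strings rather than semver ranges, and planLayout only returns a plan instead of writing files. All function names are hypothetical and the sketch does not claim to mirror how npm or ied work.

```javascript
// Hypothetical in-memory registry: name -> exact version -> manifest.
const registry = {
  a: { '1.0.0': { name: 'a', version: '1.0.0', dependencies: { c: '1.0.0' } } },
  b: { '1.0.0': { name: 'b', version: '1.0.0', dependencies: { c: '2.0.0' } } },
  c: {
    '1.0.0': { name: 'c', version: '1.0.0' },
    '2.0.0': { name: 'c', version: '2.0.0' },
  },
};

// Stand-in for a network fetch: returns package.json text for name@version.
function fetchManifestText(name, version) {
  const manifest = registry[name] && registry[name][version];
  if (!manifest) throw new Error(`unknown package ${name}@${version}`);
  return JSON.stringify(manifest);
}

// Step 1: parse a package.json and keep only the fields the planner needs.
function parseManifest(text) {
  const { name, version, dependencies = {} } = JSON.parse(text);
  return { name, version, dependencies };
}

function installPath(dir, name) {
  return dir === '' ? `node_modules/${name}` : `${dir}/node_modules/${name}`;
}

// Steps 2 and 3: walk the dependency tree breadth-first and decide where
// each package goes. Packages are hoisted to the top level when possible
// and nested under their dependent when a different version is already
// visible from there.
function planLayout(rootManifestText, fetchText) {
  const layout = new Map(); // install path -> "name@version"
  const queue = [{ manifest: parseManifest(rootManifestText), dirs: [''] }];

  while (queue.length > 0) {
    const { manifest, dirs } = queue.shift();
    for (const [name, version] of Object.entries(manifest.dependencies)) {
      const id = `${name}@${version}`;
      // Mimic Node's lookup: nearest node_modules first, project root last.
      const found = [...dirs]
        .reverse()
        .map((dir) => layout.get(installPath(dir, name)))
        .find((entry) => entry !== undefined);
      if (found === id) continue; // already satisfied by an existing copy

      const dir = found === undefined ? '' : dirs[dirs.length - 1];
      const target = installPath(dir, name);
      layout.set(target, id);
      queue.push({
        manifest: parseManifest(fetchText(name, version)),
        dirs: dir === '' ? ['', target] : [...dirs, target],
      });
    }
  }
  return layout;
}

const rootText = JSON.stringify({
  name: 'app',
  version: '0.0.1',
  dependencies: { a: '1.0.0', b: '1.0.0' },
});
const plan = planLayout(rootText, fetchManifestText);
for (const [path, id] of plan) console.log(`${path} -> ${id}`);
```

Here the two versions of c cannot both sit at the top level, so the second one is planned under b. The wire protocol and registry interaction are reduced to the single fetchText callback, which is roughly the split the comment asks about.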

@scriptjs
Author

@formula1 npm created a serious problem with naming. A better strategy involves combining a registry name with the module name. This is the approach taken by duo. There is no vetting process right now which is a problem for everyone using NPM atm.

@scriptjs
Author

@jasnell Truly that would be a good direction. The result would then only require a small piece of software: a provider to connect the transport to a source of modules. This could be a CDN, a peered source, etc., making clients easy to construct. A further benefit is that it opens this up for private registries, where you are doing little more than pointing to a source of modules for your internal consumption. Solutions like Sinopia https://github.com/rlidwka/sinopia are making use of this scenario and could be fitted with a client to only retrieve what is wanted from public sources.

The key point here is the resolving pieces are known and ied is an attempt to keep that part of things as small as possible. Simple clients can be created leaving people to choose module sources to onboard as their endpoints.

@formula1

@scriptjs I don't disagree that combining location with naming is a stronger approach, but I can't help but find myself quite happy with npm install canvas. git clone [email protected]:nodejs/NG.git or npm install [email protected]:nodejs/NG.git is probably more reliable (and really isn't that much more difficult) and ied looks like a fantastic project which I hope becomes the new standard. But I certainly am also very much in favor of readable names. At the very least, allow servers to act as name->location so that lazy/visceral people like myself don't have to write [email protected]:strongloop/express.git every time. The ol' npm install express feels so good.

@scriptjs
Author

@formula1 component.js, duo.js's predecessor, already had this kind of thing worked out so that fetching did not mean anything different from what you do today. Repos were resolved and fetched directly from github at the time. It was just as easy as:

component install myrepo/mypackage

You did not have to provide a github path. This was a default and you could set what you wanted there.
You could also require it in your package with just the mypackage name as it was internally aliased.
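
A minimal sketch of how such a shorthand could be expanded, assuming a configurable default host and an alias equal to the package name. The expandShorthand function, the URL shape, and the alternate host are hypothetical illustrations and are not taken from component's source.

```javascript
// Hypothetical default. The comment above only says GitHub was the
// default and that it could be changed.
const DEFAULT_HOST = 'https://github.com';

// Turn "owner/package" into a fetch location plus the short alias
// that code would use to require the package.
function expandShorthand(spec, host = DEFAULT_HOST) {
  const match = /^([\w.-]+)\/([\w.-]+)$/.exec(spec);
  if (!match) {
    throw new Error(`expected "owner/package", got "${spec}"`);
  }
  const [, owner, name] = match;
  return { owner, name, alias: name, source: `${host}/${owner}/${name}` };
}

console.log(expandShorthand('myrepo/mypackage'));
// The same shorthand against a different, made-up host.
console.log(expandShorthand('myrepo/mypackage', 'https://git.example.org'));
```

The short spec stays readable while the location stays explicit, which is the trade-off being discussed in the surrounding comments.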

@formula1 ied fetches quickly. Once you have modules on your system it is insanely fast compared to npm. What it illustrates is that we already know the algorithms necessary to do these things. Right now, it is a matter of the will to change everything for the better to drive this forward

@jasnell NPM has plenty of experience to bring to a discussion and I hope they will contribute here. I think it is inevitable that a peer to peer solution will be available later this year. ied will be given a different name but will also be a viable client this year for the existing registry. I am hoping that what is important is what benefits the community and to encourage solutions. I think the type of thing you suggested is an excellent direction. What I also like about this is that it would also mean the community being engaged in some development of standards, particularly with the package.json. This was also something of a concern in the recent past. In my opinion, it should not be left to a private entity to determine metadata standards for the community.

nodejs/node#3949
npm/npm#10479
npm/npm#8918
and others.

@formula1

I'm sold, though if gitlab becomes a standard this may have to change. But that's a problem for when we get there.

@Qard
Member

Qard commented Jan 23, 2016

I can't imagine most enterprise businesses trusting a distributed package repository. It's hard enough just getting them to trust the single entity of npm. Making a publicly distributed package repo would just add risk of 50% attacks. Even with package signing, you can't trust the signature if the verifier is susceptible to network dominance attacks.

Enterprise already makes use of alternate npm clients and self-hosted repos. I don't think of npm as the command-line tool or the repo, I think of it as the organization that helps to define the spec for how node modules are described and linked together. The repo and CLI tool are really just reference implementations. I've seen npm express several times before that they are totally okay with these alternatives. Until they start trying to sue people for circumventing their product by creating alternate repos or something like that, I'll continue to trust them.

@jasnell
Member

jasnell commented Jan 23, 2016

@Qard ... indeed. For most enterprises, the issues tend to be more about the provenance and reliability of modules as opposed to discovery or even where those sit out in the cloud. Package signing provides only part of the solution and distributing the repository over multiple providers does not provide any additional assurance -- it might be part of the solution, but it is not the entire solution... but that's what we're here to figure out, right? :-)

@scriptjs
Author

@Qard I also agree. Distributed can mean a few things. Peer to peer alternatives can co-exist with the same packages in mirrors across trusted sources, analogous to how apache software is distributed. I think we should see where this discussion goes. The dat people are working with the scientific community on large volumes of data, often much larger than what we have in npm.

Their top priority is ensuring the reproducibility of data. These are institutions and people that are similarly concerned with receiving untainted data, so this is something of general concern that will be sorted out as they move ahead. In addition, peer to peer can simply mean accepting data from trusted seeds and sources.

@scriptjs
Author

@Qard NPM is not the organization that defines the spec for how modules are described and linked together. The module API is a node API and any system can use it. The CLI and repo are not reference implementations, they have been put in play in a deliberate way and with monopolistic consequences. Even the current mirrors feed solely from NPM which is a problem.

If you want to speak of trust, you may want to read nodejs/node#3959 or nodejs/node#3955 from yesterday where @isaacs made reference to blocking access to the registry by ip, or the increasing volume of policy and terms being developed that is changing developers' relationship to software that was deliberately and openly provided to a community https://github.com/npm/policies. I think everyone needs to draw their own conclusions. Clearly the relationship between NPM and Node is unlike any other relationship to modules for any other language providing these in open source. It is appearing increasingly opportunistic.

@Qard
Member

Qard commented Jan 23, 2016

I'm not talking about how a module file is loaded, I'm talking about how dependency versions work to link modules together. Node itself has no concept of semver, it has no concept of module lifecycle scripts like node-gyp building native modules on install, it has no concept of linking any sort of identity/ownership to specific modules. These are all things that npm has defined for us, and a lot of effort has gone into making npm a good product.

As for the IP blocking comments, I have already read all of that. I personally agree with the approach @isaacs has made. It's not just IP blocking, that is one of many methods being used to keep toxic people from infecting the community. I'm not going to get too deep into my opinions on that, but I think enforcing the CoC is very important to having a safe and inclusive community. You seem to view the stance of npm as more "aggressive" than repository operators of other languages, but I and many others consider that a failing of those other languages.

@formula1

@scriptjs Clearly you are quite passionate about this issue and honestly it can possibly be serious, but right now the water isn't boiling so the frogs aren't jumping to solve the problem. Consider the TC39 process

As it stands you are at the proposal point

  • you have made the case for the addition (and I believe no one will disagree with you)
  • Describing the Solution - I don't think that anyone would disagree that ied is an appropriate replacement. However, 'moving things to github' creates a new centralization problem. So I challenge you to get deeper into this
  • Potential Challenges - Consumers' lack of familiarity, enabling a clean switch with package.jsons, developer hosting, abandoned/highly depended upon packages. These are aspects you have not mentioned

On the enhancement Aspect

  • Who will champion this? The Nodejs team? I would if it means I get a job out of it, but really though someone has to start the building.
  • Outline Need - It's not an immediate need, but just like Microsoft went from 'the bad guys' to 'we are supporting node in a big way', NPM may change. The Licensing change is an example of that. You've gone into this
  • Illustrate examples and usage - ied is your example, but it's far more back end than consumer facing
  • High Level API - Well, yes, distributors of npm packages would like to know how they can distribute and/or resolve package requests.
  • Algorithms - I presented one that I thought was fair based off the apt-get model. But using github as the quick names is fine too I guess
  • cross-cutting - NPM is probably a funded company. Servers to hold what are likely terabytes of data and load balance all the downloads/uploads/searches are not free. Paying developers to maintain their infrastructure and ensure the npm API is top notch is not free. They have 'private repos', but that's a convenience thing. Node also is funded. Node is not a bunch of fun/awkward/passionate individuals making cool projects and somehow making money off free software. It is fun/awkward/passionate individuals funded by some of the best companies in the world who are leaning on each other for the good of the whole. So even if this is a good idea from a libertarian, open source, independent community standpoint, the reality is that there is real money and real progress going into npm right now to ensure the best possible package manager available. By breaking a cog, you have not only pissed off npm, you've pissed off people that pay to make javascript great. People that risk millions of dollars a year with the expectation it's all worth it. Perhaps the Node TSC needs to make it clear that node survives foremost because of people's love of javascript (or a single language once webassembly allows for any language) and second most because of people funding the projects we love. Pissing off where the money comes from helps few people, especially when npm is doing a fine job as it is.

That's probably a reason why this may not get traction and/or may get ignored.

@Qard Define Toxic. That is an evolving term and means different things to different groups. Ads are viewed as toxic by many content consumers, ad blocking is viewed as toxic by content producers. Similarly, we used to view kickstarter as an awesome tool that creates new exciting products that normally would never see market. After some time, it was considered a place where you might get ripped off or disappointed. Firefox, as an example, now has pocket integrated in their browser, which spurred some backlash about privacy despite Mozilla's claimed commitment to privacy. There are many examples of 'toxic' being an ever evolving concept. NPM has shown that it changes. The movement towards the Artistic License is an example of this. Decentralization allows content producers to define toxic for themselves and/or be free from npm's choices. Though most people probably can agree on what toxic means, ignoring this problem because 'it's not good enough to be a problem' is ignorant and lacks foresight. There are very good reasons to ignore the problem (see cross-cutting above), but the fact you agree with them now, in my opinion, should not be one.

@bnoordhuis
Member

I think enforcing the CoC is very important to having a safe and inclusive community. You seem to view the stance of npm as more "aggressive" than repository operators of other languages, but I and many others consider that a failing of those other languages.

I disagree. npm is a dumb content provider and should not pretend to be anything more. If it starts discriminating, it's time to look for alternatives.

@jasnell
Member

jasnell commented Jan 23, 2016

@bnoordhuis ... can you clarify a bit further what you mean?

@mikeal
Contributor

mikeal commented Jan 23, 2016

This is long past the point where it is productive.

I'm going to agree with #26 (comment) and suggest that new issues be created that are more specific in scope. As it stands this thread is just a lightning rod of pet issues with npm and not about any problem specifically or even a specific way to fix it (which I believe was its intention).

I'm going to suggest we close and lock it.

If there are things people would like to continue to discuss the best way to do that is probably to create new issues with clearer scope.

@jasnell
Member

jasnell commented Jan 23, 2016

@mikeal ... sadly I have to agree. I think we absolutely should talk about future technical directions around the module ecosystem but those need to be grounded in concrete technical ideas and proposals or discussions about what may or may not be a good technical direction to follow. This conversation, however, is simply devolving into the same repetitive points being made over and over. I'd say let's close the issue with a request: for those involved in this conversation, please work on a concrete proposal and bring it to the table.

@nodejs nodejs locked and limited conversation to collaborators Jan 23, 2016
@mikeal mikeal closed this as completed Jan 23, 2016
@mikeal
Contributor

mikeal commented Jan 23, 2016

Closed and locked.

This is not meant as a statement about this work, only a statement as to the current status of this thread.

I believe that the intention of the thread was a good one and that, unfortunately, this thread no longer serves those intentions.

10 participants