-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Must require specifying the type of version constraint #391
Comments
@sdboyer First, I'm really glad this happened because it means we get to catch the issue and work it out before end users have to deal with this. Thanks for bringing this up explicitly. Here is the start of me trying to think it through (it's a little stream of thought)... The first thing we have to do is be pragmatic and build a system with the least about of WTF. To put it another way, we need to build a system that minimizes the number of support requests (because many users will walk away at issues and not file a request and those that do we don't have piles of time to help). This makes me relate to the But, there were numerous times people expected the People wanted to automation to detect common use cases. I think adding an additional property to specify the type of version falls when it comes to making users think. Today it can figure out the type but requires some extra computational work (including talking with the repos). Asking people to add something extra will both not be compatible with But, let's continue but digging into the technical... Glide today checks if something is an existing reference (tag, branch, or commit id). If it's not it then goes down the SemVer path. Internally, a VCS reference is prioritized higher than SemVer. The systems, like Crates or NPM, that have a central repo of packages are able to enforce SemVer in the packages that are pushed. So, we can't look to them for guidance because we can't control things to the same level. Packagist is a hybrid solution. It talks to the systems, such as GitHub, for the code while holding a metadata store in the registry. Composer will let me specify branches using the syntax To keep the metadata up to date Packagist does daily polling or allows web hooks to be setup when code is pushed. This allows the central repo to keep up to date. A client side thing handling versions has to pull to get the latest versions. If there is going to be a local cache of possible values it will need to be derived from repos. If that cache holds the versions why can't it hold all the branches, tags, and versions knowing which is which? Why can't that be calculated automatically and stored in some local cache for the software to use later? This could be useful because branches or unknown versions would force going out to the network to check for updates. Thoughts? |
indeed!
I don't think it'd be incompatible with existing It can certainly be a pain to have to specify extra data. But in this case, we're not actually talking about something the tool can figure out. So...
It's not that we couldn't do this. We can, and a fair bit of it's already in place. It's that when we rely on the information available from upstream repositories in order to divine the meaning of a local constraint declaration (like Here's the not-unlikely nightmare scenario: you're working away on a large project, and you have some dep, A, which in turn has a dep on B constrained to In the middle of your workday, the author of B pushes some new branch, called From here, there are several possible outcomes:
There's some more, too, but that gives a flavor. (Remember, also, that having this issue makes the hash stored in the lock file meaningless. That is a huge loss.) I could easily see people spending hours tearing their hair out over this; these issues could easily be experienced as schroedinbugs, which are...a very special circle of hell. The only real solution, as I initially suggested, is to make the constraints fully self-contained, without the need to contact any external system in determining their meaning. That's really not hard to do, but its effect on sane system design is enormous. Having/not having a central registry becomes irrelevant (on this specific issue). The drawback of that, as you noted, is the extra work for users. I think, though, that that's quite surmountable. I'll write a second comment for that... |
When it comes to a solution, I can't see a way around needing to have the user indicate the type of version, as well as the version itself, in an unambiguous way. The goal is to not have any distributed system-like failure conditions:
IMO, encoding the type within the version string itself is never really going to be sufficient - e.g. a tag named With a separate field as the goal, I think there are three issues to address:
User easeThe big issue here, I think, is not requiring users to manually enter a version type all the time. One way to do this is with autodetection. The change I'd propose from right now is that that autodetection is done once by the initial stages of The advantage of this is that we can go with autodetection most of the time, but when there's an ambiguity like in our
The user is only confronted with this message if we actually find a branch (or tag) named thusly; otherwise, it proceeds with the semver assumption. And, for the naked get - Obviously this doesn't cover the case where people are hand-editing the file, but as long as the rules are consistent, then it's a small cognitive leap, and the extra typing is...well, hopefully negligible. At the very least, the way it works would not be non-obvious, which I think is really the biggest risk? The best defaultAs I noted in the initial issue, we could take the approach where there's a default vtype that's applied when one isn't explicitly provided. This would be handy for the most common case if you're hand-editing the file. (It does have the drawback of making the behavior here less obvious, though.) I initially suggested that giving semver/versions the default slot. There are two reasons for this:
"Eventual" is the key word there, though. We could also optimize for the shorter term, where people are still tracking branches a lot, and have branch be the default type. Legacy manifestsThis is probably the trickiest part. Having a default vtype actually makes this harder, because it would assign potentially erroneous meaning to existing manifests. It's also hard because we know we have to support whatever exists now in perpetuity - while we can require that the user fix an old manifest in their project, we have to be able to parse in data from old TBH, this is its whole own discussion. The cleanest thing to do is not have any defaults, because it lets us draw a line between the new and old type of data. But that default option...well, it seems compelling to me. We could also entertain introducing a manifest version number into |
Oh, another alternative - instead of always using the import:
- package: github.com/foo/bar
version: ~1.0.0
- package: github.com/masterminds/semver
branch: 2.x |
@sdboyer Thanks for going so deep on this. First, User ease. I have already been planning a bunch of work here. We can do a lot better when it comes to this. I'm glad to see you're thinking about something similar. I have plans to even go beyond this but that's a conversation for another time. Second, I'm very wary of adding another config field no matter how we try to ease it. A configuration option for every possibility isn't the path to take. That ends up in user confusion and pain. Yet, you do need to have just enough. And we can have automation try to fill in the rest. The goal is to target the 80% and provide sane default solutions (often not exposing them). If we have to add a configuration option for this we'll have failed our users. I don't want to do that. So, let's exhaust all other options for automation or flexibility. So, let's look into other options.
This is by no means exhaustive. But, I don't want to fail to find an option that keeps the UX simple. We need to hide the complexity. Risk, complexity, and user experience needs to be weighed. My stance is the best user experience possible, hide as much complexity as we can, and keep the risk low. Risk can never be eliminated (if that's even possible) without making the experience painful for the long tail which isn't acceptable. |
Thanks, these are helpful ideas. Now, I'm not quite clear from your response if you're suggesting we continue relying on checking against external repositories at solve time (as opposed to At solve time, either the system can figure out from pure user input what the intended constraints are, or it can't. If it can, we get:
If it can't...well... User delight is important. Avoiding hitting people with a confusing number of concepts is important. But I'm not convinced it's worth the cost of intentionally allowing the user to get stuck in unresolvable situations. So, I need to be clear on where you're coming from - are you saying:
Your feeling on this really informs the rest of my answer. I'm gonna mostly assume your answer is the former - sorry in advance if I misunderstood.
This is already going to be happening - lock data emitted by
Sure, that could take care of some of it. But it's still just tiptoeing around the core problem. Someone could just as easily make a Now, if what you're saying here is that you'd consider a branch or tag literally named
I assume the latter is supposed to be If we're already detecting that there's a problem, why allow an operation to continue? This is a case where soft failure injects bad data into the system in a way that poisons not only the current user, but other users, later. That's a much higher cost than hard failure.
We're also failing our users right now - see the nightmare case described above, or even just the problem that prompted opening this issue. I don't particularly want to add another field, either. Especially after having done it provisionally in #384. It's...gross. At this particular moment, I'm more a fan of actually adding the four fields -
There's a lot of complexity in this domain to hide, to be sure. But I actually believe this bit is reasonable to ask people to think about: do you want to depend on a branch, a rev, a plain version, or a semver [range]? In my article, I even had my little visual metaphors for this: This isn't one of those bottomlessly complex problems where new layers just keep on appearing. The choice of constraint type for a dep is meaningful, has easy metaphors, and the basic guidelines can be taught in a few minutes. People can casually disagree about it over beers. It's essential, not incidental, complexity. And, as I pointed out wrt
Of course, there'll always be risk. Just like there'll always be bugs. But arguing that we shouldn't try to close off this entire class of bugs because "there will always be risk" is like arguing against seatbelts because car accidents will still happen. This is a single, specific problem, with a single, specific architectural requirement: making pure user input sufficient to determine the meaning of constraints. We're very close to having this, and it's the kind of thing that could cost us literally months or years of work later (if it's fixable at all); when the ecosystem becomes more complex and interconnected, a popular project deep in the dep chain somewhere doing something wrong in this regard could have immediate, viral ripple effects. Our very own left-pad, but worse. |
I realize, I should clarify one thing - we could take an approach where glide hits the network prior to calling This would solve the useless input hash problem, at least...though it wouldn't insulate users from much craziness, and it would (obviously) make any action around the manifest -> lock step pretty slow, too. also, could use @technosophos's perspective on this |
just noticed that cargo, when it's interacting directly with git repositories rather than using crates.io, uses something similar to the multiple named-field approach:
|
OK, I've reflected on this since our debate on Tuesday, and I've come up with some things. To help, I looked through six analogous systems for comparison: cargo, bundler/gems, mix (erlang/elixir), pub (dart), npm, and composer.
Looking through all this has helped me to see the real kernel of the problem here. Hopefully, it can help focus discussion. The base issue is that we have several different namespaces for different types of versions, all of which potentially overlap. That overlap entails that, if all we have is a "version" string from the user, we may not know which type of version they actually intended. Autodetection can tell us what's available, but that's not the same as user intent. Loosely, the three types are what I identified in my article with the three images I dropped in earlier:
All six of the other tools have rules to clearly disambiguate how user input corresponds to these version types. User input alone is sufficient to determine exactly one version type to allow (with only one, specific, carefully thought-out exception). That all six tools adhere hew to this line suggests how important it is that the user be in clear and complete control over whether they're relying on a branch or a version. Some of these disambiguation rules rely on explicitness from the user - e.g., the Composer is an outlier in that it allows slightly more overlap between the branch namespace and the version namespace. (Packagist allows semver tags OR branches). This is all in service of its goal of allowing dependees to specify a constraint that admits EITHER a tag series (e.g. (In fact, the way that composer achieves this is to classify branches as existing in semver's specified prerelease space, which composer classifies using its 'stability' system, then allow users to select stability levels. In other words, they've formally defined a semver relationship between branches and tags, which means the namespaces are no longer 'overlapping,' but 'intersecting'.) Now, reading through these various approaches has made me think that I may need to make some changes to That said, there is still a major problem here. Every single other system I've reviewed takes extensive steps to ensure that branch and tag namespaces don't get mixed up, and many take even more steps than that. Composer is the only real exception, and does so in a very specific limited circumstance which is still visibly obvious to anyone scanning a glide's current approach does not rise to this standard. It does not go the full distance of disambiguating semver/plain tags, branches, and revs, nor the lesser disambiguation of branch and tag. Now, having spent some more time reading glide's current code related to this, it makes more sense to me, and I understand better why the current approach has worked well so far. That's great. If I've been disparaging of that, I'm sorry. But we have a problem to solve that is qualitatively harder than every one of those other tools: there's no registry that can guarantee only semver to match against, and we have more vcs types to support than any other tool. glide has to rise to at least the level of rigor that the rest of the tools do on this front. This could, probably, be done by keeping the single Now, given that these six other solidly successful tools have encoded this same information and everything's been OK, I don't think it should be a stretch for glide. However, here are some points directly on the UX front:
Finally, let's please remember that the risks here is not to one person at a time. Ambiguous constraints mean people cannot defend against upstream craziness. That makes it an ecosystem-level threat. That is, it's not "Person A makes a mistake, and only Person A is affected." It's "Person A makes a choice, and persons B, C, D, and E's builds break, they have to take drastic steps to fix it, which in turn might even cause further breakages." (e.g., they each fork A's project and alias in the fork, making B, C, D, and E become mutually incompatible. This screws over F, who relied on the previously-compatible B and C.) |
From an ops perspective the most important attributes of a package management system are safety and predictability. If a fresh install on a clean server could possibly end up with different source code than what my developer had intended then I need to essentially ignore the package manager and build another process to ensure reproduciblity. It doesn't matter if it's because my developer used an ambiguous string in his or her yaml file. Or if the dev has a lock file that's been around for weeks and I'm starting fresh. Or if upstream changes cause the package manager to install something else for me then it did for the dev. Any of those cases break the tool from a systems perspective. |
It sounds like we're getting close to agreeing on this:
The resolution would be something like this (in order):
I'm leaning away from allowing
The remains a notable unsolved problem:
/cc @mattfarina @sdboyer |
sounds pretty good to me. couple notes:
when you say "use," you mean what ends up in the lock file, right? vsolver emits both a tag and the underlying revision in solutions; you can keep both, or discard the tag.
well...eh, timing. maybe this just ends up being semantics, but because i favor an approach where the user's input gets turned into a well-formed constraint, with well-understood boundaries, independent of what happens to be in the repository, let me try rephrasing this in those terms and see if it still meets with your expectations:
It would be possible (and I don't think harmful), to allow 2 and 3 to coexist - semver range, but also allow a literal tag as a fallback. That's only safe, though, if the literal match is used as a fallback - if it supercedes the semver constraint, then in effect, we're changing the semantics of semver ranges. How does that sound?
Sure, though I think it might make the files easier to understand if glide were to reject having both
Yeah, there's a bunch more detail here. Default branch is one way to go, and the simplest one. Just spitballing, but, there is another option: it could be treated as a total, any-type wildcard. So, it could match a branch, or a tag, or any rev. In
Branches could also be put before non-semver tags...that might be a good change to make in general, anyway.
I have a few thoughts on this, but yeah, a strategy must be had. We ALSO have to retain support for old glide.yaml files, and have a way for interpreting them on-the-fly into the new system, so that deps using the old form still more or less work.
That seems pretty straightforward to me: it's an unresolvable conflict. [re]solving fails. Now, there's one avenue I haven't yet explored - you could try to check and see if |
Well...I mean, there are other approaches. But I think they necessarily involve ignoring version constraints in some way. There's two I can readily think of: First, you can try to just ignore the conflict: see if you can safely encapsulate Second, there's always the static analysis approach: use static analysis to determine if |
Per discussion in Masterminds#391.
Per discussion in Masterminds#391.
Per discussion in Masterminds#391.
So, in working on #384 (holy crap, it's coming along even better than I thought!), I ran into this little problem:
This, resulting from the following (abbreviated)
glide.yaml
for glide itself:I knew this was gonna be an issue in general when I started implementing
vsolver
's interfaces on glide'sConfig
andLock
types, but I didn't realize it was gonna crop up so soon.The problem here is that
2.x
- a valid semver string - is actually intended to be a literal branch name. But, because we have no guidance as to the type of constraint it's supposed to be, we just have to guess. I assume that right now, glide gets away with this by just checking to see what exists in the target repository, and assuming that if there's a direct string match, that's what the user wants. That's problematic, though - for one, it requires touching network and/or disk in order to verify.More importantly, though, it means that the pure, locally-controlled inputs (manifest and static analysis of local project files) are no longer sufficient to govern the behavior of the algorithm. For example, in this particular case, if we rely on inferring from the list of branches in
semver
that2.x
is actually a branch, then that constraint becomes a branch - if, however, at some later time that branch goes away and we solve again, then we'd interpret that constraint as a semver range. Externalities are affecting the way we interpret inputs. And when the meaning of inputs are dependent on external factors, memoizing becomes pointless...so, the hash digest comparison that glide already does becomes kinda pointless.The only solution here, really, is to add a new field that allows the user to specify the type of constraint they intend to provide. We can make this less onerous by letting the preferred constraint type be an assumed default. The approach that makes the most sense to me is:
branch
must be specified as the constraint type if you want to track a branchrevision
must be specified as the constraint type if you want to fix to an immutable revisionWe could also throw a little magic in there like what I already wrote that tries to infer if it's a revision by seeing if it's 40 hex chars. I think the cost might outweigh the benefit there, though.
The text was updated successfully, but these errors were encountered: