Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support relative hashes across the entire API v2 surface #7268

Closed
colan-dremio opened this issue Jul 19, 2023 · 22 comments · Fixed by #7308
Closed

Support relative hashes across the entire API v2 surface #7268

colan-dremio opened this issue Jul 19, 2023 · 22 comments · Fixed by #7308

Comments

@colan-dremio
Copy link

Description:

PR 6932 added support for relative hashes (e.g. main*10000000000) to many APIs. It would be good to expose this concept across the entire API surface.

For example, the commit log API already supports relative hashes, e.g. GET http://localhost:19120/api/v2/trees/main*2023-07-17T17:44:36.463564257Z/history. Since the GET api/v2/{ref}/history endpoint is logically a child of the getReference endpoint GET api/v2/{ref}, it makes sense to support the same semantics in both. Today, GET http://localhost:19120/api/v2/trees/main*2023-07-17T17:44:36.463564257Z does not work, but it should.

Create reference POST /api/v2/trees is another example of a place where this would be useful. This would support operations like "Create branch dev at the state of main as it existed on 2023-07-17".

Since all relative hashes evaluate to a real hash, semantically it is similar / the same as specifying a raw commit hash directly (Detached). In some cases, they will evaluate to the commit at the tip of a branch. IMO committing to a relative hash should never be allowed even if it resolves to the tip of a branch. Treating them as resolving to Detached will give us this behavior.

Another edge case is timestamps in the future (barring clock drift). It would be good to throw an error if a timestamp is provided too far into the future. Perhaps this is better as a separate issue.

@dimas-b
Copy link
Member

dimas-b commented Jul 19, 2023

Create reference POST /api/v2/trees is another example of a place where this would be useful. This would support operations like "Create branch dev at the state of main as it existed on 2023-07-17".

IIRC, the new reference hash is in the POST payload, which is a JSON representation of a reference object. Relative hashes are not supported in the hash property of a reference JSON object in any API v2 endpoints.

Adding relative hash resolution to JSON payloads has a much bigger scope than making the {ref} path parameter work uniformly. I'd suggest to fork this into a separate issue.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

I think we need to agree on what we mean by "the entire API surface":

  1. Do we want relative hashes in commits, merges, transplants? This was disallowed recently by Forbid relative hashes in commits, merges and transplants #7193 and @colan-dremio seems to be onboard with that ("committing to a relative hash should never be allowed").
  2. Do we want / need relative hashes in path parameters, where the hash is meant to be the expected hash on the branch? I don't see a lot of interest in that. Currently, path params accept relative hashes only for endpoints reading the contents / commitlog / diffs.
  3. For reference creation and assignment, indeed relative hashes are currently disallowed, and it would be good to have them supported there.

But are there other cases that need any change apart from item 3 above?

Here is a breakdown of all the parameters dealing with hashes:

Endpoint Parameter Relative hash support Remarks
getAllReferences Query param filter No CEL filter does not accept relative hashes.
getReferenceByName Path param ref No Service layer doesn't take any hash (relative or absolute) into consideration.
createReference Request body Reference.hash No ‼️ Reference object only accepts absolute hashes.
assignReference Path param ref No Service layer only accepts absolute hashes.
Request body Reference.hash No ‼️ Reference object only accepts absolute hashes.
deleteReference Path param ref No Service layer only accepts absolute hashes.
commitMultipleOperations Path param branch No Relative hashes recently disallowed.
mergeRefIntoBranch Path param branch No Relative hashes recently disallowed.
Request body Merge.fromHash No Merge object only accepts absolute hashes.
transplantCommitsIntoBranch Path param branch No Relative hashes recently disallowed.
Request body param Transplant.hashesToTransplant[] No Transplant object only accepts absolute hashes.
getSeveralContents Path param ref Yes
getMultipleContents Path param ref Yes
getContent Path param ref Yes
getEntries Path param ref Yes
getCommitLog Path param ref Yes
getDiff Path param from-ref Yes
Path param to-ref Yes

I marked with ‼️ the only 2 places where I think we need to change something.

Let me know if you agree before I can start working on this.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

If the above is agreed upon, the core change would be to amend Reference.checkHash().

I'd note that the current checkHash() implementation uses HASH_REGEX to validate the hash, but that is in contradiction with the hash field, which is also validated with HASH_OR_RELATIVE_COMMIT_SPEC_REGEX. The former rejects relative hashes, the latter accepts them.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

My reading of @colan-dremio 's request is that getReferenceByName is specifically asked to support relative hashes.

From my POV it generally makes sense, although I do see some rough edges in terminology. For example, the getReferenceByName endpoint name suggests that we're dealing with reference names only. At the same time, the REST URL path is api/v2/trees/{ref} where {ref} generally identifies a "state of the content tree as a particular point in history" (mostly described as such in other end point docs). In the latter context a request to GET api/v2/trees/main@12345678~3 makes sense to me. It will provide info about the main contents three commits prior to 12345678, which will include the effective exact commit hash and (optionally) commit meta, ahead/behind counts, etc.

Re: assignReference - this end point requires the {ref} parameter (e.g. main@12345678) to specify the exact current HEAD commit hash. That is to ensure predictable behaviour when two or more clients race to alter the reference. I'd say we should accept relative hash spec at the syntax level (validation RegEx) for the sake of uniformity of URL mappings, but produce an explicit 400 error if a relative hashes is used in practice (because a relative hash can never resolve to HEAD).

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

committing to a relative hash should never be allowed

I see the positive intent behind this (preventing mistakes), but do not feel this restriction is technically necessary.

Especially if we allow relative hashes in the api/v2/trees/{ref} endpoint, users will always be able to resolve a relative hash via that end point and use the resultant exact hash in commit/transplant. endpoints. In the end if the user is set upon using relative hashes in commits, it will be possible, albeit with an extra round-trip. So why not permit relative hashes in {ref} path parameters of committing operations for the sake of convenience for well thought-through use cases?

The "expected" hash in committing operations is used as the basis for conflict detection. So, using main@12345678~3 is meaningful because it identifies a specific base commit unambiguously.

As for merges, indeed, the base is detected automatically. The commit hash in the {ref} parameter is used only as a sanity check that the request is indeed meant for the branch that the user is aware of. In that regard I do not see any harm of still accepting relative hashes for the sake of API uniformity.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

My reading of @colan-dremio 's request is that getReferenceByName is specifically asked to support relative hashes.

Indeed that was asked, but I'm not sure I agree here. This endpoint returns information about an existing reference, designated by the path parameter. What would a hash, be it absolute or relative, have to do with that?

a request to GET api/v2/trees/main@12345678~3 makes sense to me. It will provide info about the main contents three commits prior to 12345678, which will include the effective exact commit hash and (optionally) commit meta, ahead/behind counts, etc.

Isn't this a resurrection of the RefLog API? I'm not sure this is something we want to/can support.

I do think, however, that it's confusing that this endpoint gladly accepts hashes (both absolute and relative), but silently ignores them. This could be improved by using a more restrictive validation regex. And as you pointed out, the docs also need some rewording.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

Re: assignReference [...] I'd say we should accept relative hash spec at the syntax level (validation RegEx) for the sake of uniformity of URL mappings, but produce an explicit 400 error if a relative hashes is used in practice (because a relative hash can never resolve to HEAD).

That's pretty much what's being done already. If we do a PUT api/v2/trees/main@12345678~2, the regex validation passes, but the service layer rejects it with HTTP 400:

{
	"status": 400,
	"reason": "Bad Request",
	"message": "Illegal hex character '~'",
	"errorCode": "BAD_REQUEST",
	"serverStackTrace": "java.lang.IllegalArgumentException: Illegal hex character '~'\n\tat org.projectnessie.versioned.Hash.nibble(Hash.java:112)\n\tat org.projectnessie.versioned.Hash$GenericHash.<init>(Hash.java:276)\n\tat org.projectnessie.versioned.Hash$GenericHash.<init>(Hash.java:269)\n\tat org.projectnessie.versioned.Hash.of(Hash.java:76)\n\tat org.projectnessie.services.impl.TreeApiImpl.toHash(TreeApiImpl.java:1051)\n\tat 
...

I'm not sure, however, if "the sake of uniformity of URL mappings" buys us any good: I'd rather change the regex to reject anything that the service layer is going to reject anyways. Wdyt?

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

I see the positive intent behind this (preventing mistakes), but do not feel this restriction is technically necessary.

Are you suggesting that we should revert #7193? I'm not against it, but that considerably widens the scope: not only we need to revert that PR, but also amend both Merge and Transplant objects since neither accepts relative hashes currently.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

I'm not sure, however, if "the sake of uniformity of URL mappings" buys us any good: I'd rather change the regex to reject anything that the service layer is going to reject anyways. Wdyt?

We have two levels of RegEx's here:

  1. The URL mapping RegEx - this controls which java method receives the HTTP request. If this RegEx does not match the path parameters, the user usually gets a very confusing 404 error.
  2. The Validation RegEx in java interface parameters.

I think (1) should be permissive to allow HTTP requests to be routed to the right method even if parameters are invalid.

We can certainly use (2) to provide specific validation in each of the endpoints.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

the user usually gets a very confusing 404 error.

Good point, I take back my proposal to restrict URL mapping regexes. But that makes it even more pressing for getReferenceByName to reject requests that contain a hash (absolute or relative), instead of silently ignoring them.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

Are you suggesting that we should revert #7193? I'm not against it, but that considerably widens the scope: not only we need to revert that PR, but also amend both Merge and Transplant objects since neither accepts relative hashes currently.

Not exactly. I mean only the {ref} path parameter, which defines the "expected" hash. Merge and Trasplant objects define the new data to be committed to the branch identified by {ref}. I think they do not need to change.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

a relative hash can never resolve to HEAD

There is the case where the relative hash is a timestamp in the future. That would resolve to HEAD. Probably needs some special handling.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

Yes, timestamps are quite special.

I think we should certainly reject all future timestamps and perhaps have a time window in the recent past where timestamps are rejected too. This would be similar to QuarkusVersionStoreAdvancedConfig.getAssumedWallClockDriftMicros().

I believe in most relevant use cases the timestamp would be provided by the end user and would not be a very recent timestamp.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

To summarize and hopefully clarify my stance:

  • I think we should accept the same structure of {ref} parameters in all endpoint URLs.
  • I do not feel very strongly about actually allowing relative hashes in commits, while it makes sense to me personally
  • I agree with @adutra that we should error our whenever a relative hash is accepted but not used. This can be via validation RegEx or by explicit checks in java code.

As for GET api/v2/trees/{ref}. I'm fine with reporting errors if {ref} has any hash. Perhaps should should suggest in API docs that users should use the getCommitLog endpoint for resolving relative hashes.

@dimas-b
Copy link
Member

dimas-b commented Jul 26, 2023

As a matter of consistency, I think if we accept relative hashes in Reference.hash we'll have to accept relative hashes in GET api/v2/trees/{ref} because both essentially refer to the same logical entity.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

I think we should accept the same structure of {ref} parameters in all endpoint URLs.

Yes I can live with that, and I do see the benefit of having the same treatment applied to all path params that denote a reference with or without a hash.

That would boil down to also changing TreeApiImpl.toHash() since that's the method the service layer uses to validate most of the path params. It currently does not accept relative hashes.

Maybe this method can be replaced completely by BaseApiImpl#namedRefWithHashOrThrow() which does accept relative hashes.

In any case the expected changes do not look too complex to me.

@adutra
Copy link
Contributor

adutra commented Jul 26, 2023

Tentative scope of changes, revisited with suggestions from @dimas-b. Let's wait for @colan-dremio's input.

EDIT 2023-07-27: added @snazy's suggestions.

Endpoint Parameter Relative hash support Action items
getAllReferences Query param filter No
getReferenceByName Path param ref No ‼️ Reject any hash (absolute and relative), instead of silently ignoring them
createReference Request body Reference.hash No ‼️ Accept non-ambiguous relative hashes
assignReference Path param ref No ‼️ Accept non-ambiguous relative hashes
Request body Reference.hash No ‼️ Accept non-ambiguous relative hashes
deleteReference Path param ref No ‼️ Accept non-ambiguous relative hashes
commitMultipleOperations Path param branch No ‼️ Accept non-ambiguous relative hashes
mergeRefIntoBranch Path param branch No ‼️ Accept non-ambiguous relative hashes
Request body Merge.fromHash No ‼️ Accept non-ambiguous relative hashes
transplantCommitsIntoBranch Path param branch No ‼️ Accept non-ambiguous relative hashes
Request body param Transplant.hashesToTransplant[] No ‼️ Accept non-ambiguous relative hashes
getSeveralContents Path param ref Yes
getMultipleContents Path param ref Yes
getContent Path param ref Yes
getEntries Path param ref Yes
getCommitLog Path param ref Yes
getDiff Path param from-ref Yes
Path param to-ref Yes

@snazy
Copy link
Member

snazy commented Jul 26, 2023

Not generally opposed to support relative hash-specs per-se, but (possibly) ambiguous ones should be prevented for operations that write (create/assign/delete refs, commiting ops). With ambiguous I mean specs that reference HEAD (or derived ones like HEAD~3, because a concurrent commit would change the expected result, or that use a timestamp, even with some "allowed drift", because we do not know when a caller has actually looked into the branch - it can be hours ago.

@snazy
Copy link
Member

snazy commented Jul 26, 2023

Amending that - non ambiguous specs are for example:

  • 12345678~2
  • 12345678^1
  • 12345678

@adutra
Copy link
Contributor

adutra commented Jul 27, 2023

ambiguous ones should be prevented for operations that write

Good idea, I will amend the table above for create/assign/delete refs, and commiting ops.

But that also brings the question: why not support non-ambiguous relative hashes in Merge.fromHash and Transplant.hashesToTransplant?

E.g. one could be in the below situation and want to merge c4 from branch etl into main:

---
title: Merging from non-HEAD hash
---
gitGraph
   commit id: "c1"
   commit id: "c2"
   branch etl
   checkout main
   commit id: "c3"
   checkout etl
   commit id: "c4"
   checkout main
   merge etl
   checkout etl
   commit id: "c5"
Loading

This could be achieved with the below request:

POST http://localhost:19120/api/v2/trees/main@c3/history/merge

{
  "fromHash": "c4",
  "fromRefName": "etl",
  ...
}

But it could also be achieved with the non-ambiguous relative hash c5~1 in request body field fromHash as shown below:

POST http://localhost:19120/api/v2/trees/main@c3/history/merge

{
  "fromHash": "c5~1",
  "fromRefName": "etl",
  ...
}

@dimas-b
Copy link
Member

dimas-b commented Jul 27, 2023

"fromHash": "c5~1",
"fromRefName": "etl",

This LGTM 👍

@dimas-b
Copy link
Member

dimas-b commented Jul 27, 2023

Re: hash lookup by timestamp: I think those can be disambiguated if a base commit is specified, e.g. 12345678*2021-04-07T14:42:25.534748Z.

Since the base commit hash sets a specific point in history, the timestamp can be resolved strictly from that point on, and that resolution does not depend on local clocks either in the client or in the server.

I agree that HEAD-based relative hashes are probably too ambiguous to support for write operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants