-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid relative hashes in commits, merges and transplants #7193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM - one comment about quotes passing is-hash tests
@Path("{branch}/history/commit") | ||
@jakarta.ws.rs.Path("{branch}/history/commit") | ||
@Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit") | ||
@jakarta.ws.rs.Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to do similar changes for merge and transplant endpoints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For merge & transplant a relative hash doesn't really make any sense,
The old data model was relatively strict wrt expectedHash
- unnecessarily actually.
It feels, that the merge-operation, as it exists today using a squashed commit based on the diff between the merge-base and the merge-source, does not need any expected-hash. The "expected state" and the "applied change" are implicit (present in the Nessie repo), not explicit (only the new value provided by the caller). Older implementations, that supported "merges w/ individual commits", the expected-hash was important, because it was effectively a transplant.
The implementation for the new data model doesn't use the expected-hash for merge nor for transplant. For the latter (transplant) that seems wrong though. @adutra do you have bandwidth to check whether the expected-hash is required for transplant (I think it is, but I might be wrong).
In any case, every committing operation that takes an expected-hash, should not allow relative commit IDs. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see, in the new model:
- For all committing operations (commit, merge, transplant): if an expected-hash is provided, an expected
CommitObj
is loaded, if the object does not exist, an exception is thrown:
Lines 136 to 142 in 795ba1d
if (referenceHash.isPresent()) { | |
ObjId referenceObjId = hashToObjId(referenceHash.get()); | |
if (!referenceObjId.equals(headId())) { | |
RefMapping refMapping = new RefMapping(persist); | |
e = refMapping.commitInChain(branch, head, referenceHash, emptyList()); | |
} | |
} |
-
Other than that, the expected-hash itself is only used for reporting purposes (when creating error messages, and
MergeResult
instances). -
For commits only: the expected
CommitObj
is further used to load the so-called expected-index:
Lines 117 to 124 in c6388c3
this.expectedIndex = | |
expected == head | |
? headIndex | |
: lazyStoreIndex( | |
() -> { | |
IndexesLogic indexesLogic = indexesLogic(persist); | |
return indexesLogic.buildCompleteIndexOrEmpty(expected); | |
}); |
This index is then used to compute key conflicts.
- For merges and transplants: I do not see any further usage of expected-hash or the corresponding expected
CommitObj
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation for the new data model doesn't use the expected-hash for merge nor for transplant. For the latter (transplant) that seems wrong though.
Why is it not wrong for merges? If you are merging into some branch, expecting its head to be a certain hash, shouldn't the system raise an error if the head of the branch has moved away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "not wrong" I meant that a diff-based squash-merge does not need it to work correctly.
Users may want to only apply a squash, if the HEAD is exactly at that hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should investigate the current and "right" behavior in a separate issue. Mind opening one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IcebergTable table1 = IcebergTable.of("loc1", 1, 2, 3, 4); | ||
NessieError error = | ||
prepareCommitV2(ref, key1, table1, 2, true).statusCode(400).extract().as(NessieError.class); | ||
assertThat(error.getMessage()).startsWith("Relative hash not allowed here"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from curl
it looks like this:
$ curl -X 'POST' "http://localhost:19120/api/v2/trees/main@1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1/history/commit" [...]
{
"status" : 400,
"reason" : "Bad Request",
"message" : "Relative hash not allowed here: 1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1",
"errorCode" : "BAD_REQUEST",
"serverStackTrace" : null
}
This is much better than before, but still "here" is a bit obscure, IMHO... Could we say something like Relative hash not allowed in commit operations: 1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1
?
@snazy you already approved, but should I go ahead and forbid relative hashes in merges and transplants as well? |
Hm, yea, if that can go into this PR, that would be great. |
@@ -42,38 +42,18 @@ public interface ApiDoc { | |||
+ "at different times depending on the state of the system. Using the full 'name@hash' form is recommended " | |||
+ "to avoid ambiguity.\n"; | |||
|
|||
String BRANCH_DESCRIPTION = | |||
String COMMIT_BRANCH_DESCRIPTION = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed these constants for clarity.
+ "This reference can be specified in these forms:\n" | ||
+ "- \\- (literal minus character) - Identifies the HEAD of the default branch \n" | ||
+ "- name - Identifies the HEAD commit on the named branch\n" | ||
+ "This reference is specified in this form:\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I opened a can of worms :-) In fact, even if the REST API accepts references without hashes, the Service layer does not. IOW the only form of reference that works is the full name@hash
form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the client API allows the commit operation to be sent without hash:
api()
.commitMultipleOperations()
.branch(Branch.of("main", null)) // null hash allowed here
.operation(Put.of(a, ta))
.commitMeta(CommitMeta.fromMessage("commit 1"))
.commit();
This results in the following error:
Bad Request (HTTP/400): commitMultipleOperations.expectedHash: must not be null
org.projectnessie.error.NessieBadRequestException: Bad Request (HTTP/400): commitMultipleOperations.expectedHash: must not be null
at org.projectnessie.error.ErrorCode.lambda$asException$1(ErrorCode.java:66)
at [email protected]/java.util.Optional.map(Optional.java:265)
at org.projectnessie.error.ErrorCode.asException(ErrorCode.java:66)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC older versions allowed to commit without an expected hash. Probably a relict from that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can of worms... perhaps :) I believe -
(default branch) should also be supported... the original idea was to make it easier for hand-written curl
commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, you can use the dash... if you also provide the hash, e.g. -@cafebabe
:-) Which, I suppose, defeats the initial purpose of the dash. But unfortunately any references without explicit hashes are currently refused (and were already before this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, here is the status of expectedHash
parameter across different layers and for different operations.
Operation | REST layer (TreeResource) | Service layer (TreeService) | Version Store layer |
---|---|---|---|
Commit | Not required (path param), defaults to HEAD | Required | Not required, defaults to HEAD |
Merge | Required (request body) | Required | Not required, defaults to HEAD |
Transplant | Required (request body) | Required | Not required, defaults to HEAD |
+ "\n" | ||
+ FULL_REF_INFO; | ||
+ "The 'hash' commit must be reachable from the current HEAD of the branch.\n" | ||
+ "In this case 'hash' indicates the state of contents that should be used for validating incoming changes.\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the original sentence: In this case 'hash' indicates the state of contents that should be used for validating incoming changes
– since this is indeed roughly what happens in CommitImpl
.
+ "- name@hash^2 - The merge parent of the 'hash' commit of a branch or tag.\n" | ||
+ "- -*2021-04-07T14:42:25.534748Z - The predecessor commit closest to the HEAD of the default branch for the given ISO-8601 timestamp.\n" | ||
+ "- name*2021-04-07T14:42:25.534748Z - The predecessor commit closest to the HEAD of a branch or tag valid for the given ISO-8601 timestamp.\n" | ||
+ "- name*1685185847230 - The predecessor commit closest to the HEAD of a branch or tag valid for the given timestamp in milliseconds since epoch.\n" | ||
+ "\n" | ||
+ "The 'hash' commit must be reachable from the current HEAD of the branch.\n" | ||
+ "In this case 'hash' indicates the state of contents known to the client and serves to ensure that the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also kept the original sentence here but to be fair, the new model does not appear to "ensure that the operation is performed on the contents that the client expects" – if the hashes differ, as long as the expected hash exists and is reachable, the operation will be carried out.
@@ -575,6 +576,7 @@ public MergeResponse transplantCommitsIntoBranch( | |||
try { | |||
checkArgument(!hashesToTransplant.isEmpty(), "No hashes given to transplant."); | |||
validateCommitMeta(commitMeta); | |||
validateNoRelativeSpec(expectedHash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we get here, hashes were already validated in ImmutableMerge
and ImmutableTransplant
objects (see their check methods) – but I figured it doesn't hurt to re-validate.
@@ -53,6 +53,7 @@ dependencies { | |||
testImplementation(project(":nessie-versioned-storage-jdbc")) | |||
|
|||
testImplementation(project(":nessie-jaxrs-testextension")) | |||
testImplementation(libs.hibernate.validator.cdi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required, otherwise none of the @NotNull
annotations and alike are actually enforced in this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now - waiting for @dimas-b now ;)
@Path("{branch}/history/commit") | ||
@jakarta.ws.rs.Path("{branch}/history/commit") | ||
@Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit") | ||
@jakarta.ws.rs.Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should investigate the current and "right" behavior in a separate issue. Mind opening one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
+ "This reference can be specified in these forms:\n" | ||
+ "- \\- (literal minus character) - Identifies the HEAD of the default branch \n" | ||
+ "- name - Identifies the HEAD commit on the named branch\n" | ||
+ "This reference is specified in this form:\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can of worms... perhaps :) I believe -
(default branch) should also be supported... the original idea was to make it easier for hand-written curl
commands.
Fixes #7119