Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid relative hashes in commits, merges and transplants #7193

Merged
merged 7 commits into from
Jul 7, 2023

Conversation

adutra
Copy link
Contributor

@adutra adutra commented Jul 6, 2023

Fixes #7119

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM - one comment about quotes passing is-hash tests

@Path("{branch}/history/commit")
@jakarta.ws.rs.Path("{branch}/history/commit")
@Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit")
@jakarta.ws.rs.Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to do similar changes for merge and transplant endpoints?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For merge & transplant a relative hash doesn't really make any sense,

The old data model was relatively strict wrt expectedHash - unnecessarily actually.

It feels, that the merge-operation, as it exists today using a squashed commit based on the diff between the merge-base and the merge-source, does not need any expected-hash. The "expected state" and the "applied change" are implicit (present in the Nessie repo), not explicit (only the new value provided by the caller). Older implementations, that supported "merges w/ individual commits", the expected-hash was important, because it was effectively a transplant.

The implementation for the new data model doesn't use the expected-hash for merge nor for transplant. For the latter (transplant) that seems wrong though. @adutra do you have bandwidth to check whether the expected-hash is required for transplant (I think it is, but I might be wrong).

In any case, every committing operation that takes an expected-hash, should not allow relative commit IDs. Good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see, in the new model:

  • For all committing operations (commit, merge, transplant): if an expected-hash is provided, an expected CommitObj is loaded, if the object does not exist, an exception is thrown:

if (referenceHash.isPresent()) {
ObjId referenceObjId = hashToObjId(referenceHash.get());
if (!referenceObjId.equals(headId())) {
RefMapping refMapping = new RefMapping(persist);
e = refMapping.commitInChain(branch, head, referenceHash, emptyList());
}
}

  • Other than that, the expected-hash itself is only used for reporting purposes (when creating error messages, and MergeResult instances).

  • For commits only: the expected CommitObj is further used to load the so-called expected-index:

this.expectedIndex =
expected == head
? headIndex
: lazyStoreIndex(
() -> {
IndexesLogic indexesLogic = indexesLogic(persist);
return indexesLogic.buildCompleteIndexOrEmpty(expected);
});

This index is then used to compute key conflicts.

  • For merges and transplants: I do not see any further usage of expected-hash or the corresponding expected CommitObj.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation for the new data model doesn't use the expected-hash for merge nor for transplant. For the latter (transplant) that seems wrong though.

Why is it not wrong for merges? If you are merging into some branch, expecting its head to be a certain hash, shouldn't the system raise an error if the head of the branch has moved away?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "not wrong" I meant that a diff-based squash-merge does not need it to work correctly.
Users may want to only apply a squash, if the HEAD is exactly at that hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we should investigate the current and "right" behavior in a separate issue. Mind opening one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IcebergTable table1 = IcebergTable.of("loc1", 1, 2, 3, 4);
NessieError error =
prepareCommitV2(ref, key1, table1, 2, true).statusCode(400).extract().as(NessieError.class);
assertThat(error.getMessage()).startsWith("Relative hash not allowed here");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from curl it looks like this:

$ curl -X 'POST' "http://localhost:19120/api/v2/trees/main@1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1/history/commit" [...]

{
  "status" : 400,
  "reason" : "Bad Request",
  "message" : "Relative hash not allowed here: 1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1",
  "errorCode" : "BAD_REQUEST",
  "serverStackTrace" : null
}

This is much better than before, but still "here" is a bit obscure, IMHO... Could we say something like Relative hash not allowed in commit operations: 1f7ec3eb98eef851c5e4348144ec13ce658673fabd155aefbd862be5cf07f88e~1?

snazy
snazy previously approved these changes Jul 7, 2023
@adutra
Copy link
Contributor Author

adutra commented Jul 7, 2023

In any case, every committing operation that takes an expected-hash, should not allow relative commit IDs

@snazy you already approved, but should I go ahead and forbid relative hashes in merges and transplants as well?

@snazy
Copy link
Member

snazy commented Jul 7, 2023

should I go ahead and forbid relative hashes in merges and transplants as well?

Hm, yea, if that can go into this PR, that would be great.

@@ -42,38 +42,18 @@ public interface ApiDoc {
+ "at different times depending on the state of the system. Using the full 'name@hash' form is recommended "
+ "to avoid ambiguity.\n";

String BRANCH_DESCRIPTION =
String COMMIT_BRANCH_DESCRIPTION =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed these constants for clarity.

+ "This reference can be specified in these forms:\n"
+ "- \\- (literal minus character) - Identifies the HEAD of the default branch \n"
+ "- name - Identifies the HEAD commit on the named branch\n"
+ "This reference is specified in this form:\n"
Copy link
Contributor Author

@adutra adutra Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I opened a can of worms :-) In fact, even if the REST API accepts references without hashes, the Service layer does not. IOW the only form of reference that works is the full name@hash form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the client API allows the commit operation to be sent without hash:

api()
    .commitMultipleOperations()
    .branch(Branch.of("main", null)) // null hash allowed here
    .operation(Put.of(a, ta))
    .commitMeta(CommitMeta.fromMessage("commit 1"))
    .commit();

This results in the following error:

Bad Request (HTTP/400): commitMultipleOperations.expectedHash: must not be null
org.projectnessie.error.NessieBadRequestException: Bad Request (HTTP/400): commitMultipleOperations.expectedHash: must not be null
	at org.projectnessie.error.ErrorCode.lambda$asException$1(ErrorCode.java:66)
	at [email protected]/java.util.Optional.map(Optional.java:265)
	at org.projectnessie.error.ErrorCode.asException(ErrorCode.java:66)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC older versions allowed to commit without an expected hash. Probably a relict from that time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can of worms... perhaps :) I believe - (default branch) should also be supported... the original idea was to make it easier for hand-written curl commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, you can use the dash... if you also provide the hash, e.g. -@cafebabe :-) Which, I suppose, defeats the initial purpose of the dash. But unfortunately any references without explicit hashes are currently refused (and were already before this PR).

Copy link
Contributor Author

@adutra adutra Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, here is the status of expectedHash parameter across different layers and for different operations.

Operation REST layer (TreeResource) Service layer (TreeService) Version Store layer
Commit Not required (path param), defaults to HEAD Required Not required, defaults to HEAD
Merge Required (request body) Required Not required, defaults to HEAD
Transplant Required (request body) Required Not required, defaults to HEAD

+ "\n"
+ FULL_REF_INFO;
+ "The 'hash' commit must be reachable from the current HEAD of the branch.\n"
+ "In this case 'hash' indicates the state of contents that should be used for validating incoming changes.\n";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the original sentence: In this case 'hash' indicates the state of contents that should be used for validating incoming changes – since this is indeed roughly what happens in CommitImpl.

+ "- name@hash^2 - The merge parent of the 'hash' commit of a branch or tag.\n"
+ "- -*2021-04-07T14:42:25.534748Z - The predecessor commit closest to the HEAD of the default branch for the given ISO-8601 timestamp.\n"
+ "- name*2021-04-07T14:42:25.534748Z - The predecessor commit closest to the HEAD of a branch or tag valid for the given ISO-8601 timestamp.\n"
+ "- name*1685185847230 - The predecessor commit closest to the HEAD of a branch or tag valid for the given timestamp in milliseconds since epoch.\n"
+ "\n"
+ "The 'hash' commit must be reachable from the current HEAD of the branch.\n"
+ "In this case 'hash' indicates the state of contents known to the client and serves to ensure that the "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also kept the original sentence here but to be fair, the new model does not appear to "ensure that the operation is performed on the contents that the client expects" – if the hashes differ, as long as the expected hash exists and is reachable, the operation will be carried out.

@@ -575,6 +576,7 @@ public MergeResponse transplantCommitsIntoBranch(
try {
checkArgument(!hashesToTransplant.isEmpty(), "No hashes given to transplant.");
validateCommitMeta(commitMeta);
validateNoRelativeSpec(expectedHash);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we get here, hashes were already validated in ImmutableMerge and ImmutableTransplant objects (see their check methods) – but I figured it doesn't hurt to re-validate.

@adutra adutra changed the title Forbid relative hashes in commit operations Forbid relative hashes in commits, merges and transplants Jul 7, 2023
@@ -53,6 +53,7 @@ dependencies {
testImplementation(project(":nessie-versioned-storage-jdbc"))

testImplementation(project(":nessie-jaxrs-testextension"))
testImplementation(libs.hibernate.validator.cdi)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required, otherwise none of the @NotNull annotations and alike are actually enforced in this module.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now - waiting for @dimas-b now ;)

@Path("{branch}/history/commit")
@jakarta.ws.rs.Path("{branch}/history/commit")
@Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit")
@jakarta.ws.rs.Path("{branch:" + REF_NAME_PATH_ELEMENT_REGEX + "}/history/commit")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we should investigate the current and "right" behavior in a separate issue. Mind opening one?

Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

+ "This reference can be specified in these forms:\n"
+ "- \\- (literal minus character) - Identifies the HEAD of the default branch \n"
+ "- name - Identifies the HEAD commit on the named branch\n"
+ "This reference is specified in this form:\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can of worms... perhaps :) I believe - (default branch) should also be supported... the original idea was to make it easier for hand-written curl commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent parsing of relative hash references in different API v2 endpoints
3 participants