Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should a blank / empty copyrightText be interpreted? #655

Closed
sschuberth opened this issue Jun 30, 2023 · 20 comments · Fixed by #688
Closed

How should a blank / empty copyrightText be interpreted? #655

sschuberth opened this issue Jun 30, 2023 · 20 comments · Fixed by #688
Labels
documentation Improvements or additions to documentation Profile:Licensing Licensing Profile and related matters Profile:Security Security Profile and related matters
Milestone

Comments

@sschuberth
Copy link
Member

In the context of ORT parsing SPDX files the question arose how a given "copyrightText": "" for files should be interpreted. None of the specs seem to be crystal clear on this. IMO all of these options have some merit:

  • interpret a blank / empty copyrightText as NONE
  • interpret a blank / empty copyrightText as NOASSERTION
  • interpret a blank / empty copyrightText as a liternal copyright declaration that is empty (even if that does not really add any value)

Currently, ORT simply refuses to parse SPDX files with such copyrightText to avoid ambiguity.

Any advice how to handle this case?

PS: The same question basically arises for a set (i.e. present) licenseConcluded that is set to an empty / blank string.

@goneall
Copy link
Member

goneall commented Jul 2, 2023

@sschuberth Good question. In the SPDX Java Library, I'm treating it as a valid blank string. For the license fields, it would be invalid since a blank string would not parse into the license model properly.

@sschuberth
Copy link
Member Author

In the SPDX Java Library, I'm treating it as a valid blank string.

The online validation tool (which is based on the Python implementation IIRC) seems to behave differently. Would it make sense to align the implementations?

@goneall
Copy link
Member

goneall commented Jul 3, 2023

The online validation tool (which is based on the Python implementation IIRC) oss-review-toolkit/ort#7222 (comment). Would it make sense to align the implementations?

I just checked and they do behave the same - the difference was due to a different version of the SPDX spec - version 2.2 required they copyrightText.

@sschuberth
Copy link
Member Author

Quoting from here as I believe it's easier to stick to this issue:

If you try an empty copyright in the 2.3 examples, it should return valid.

@goneall, so you're saying an empty copyright actually is valid in version 2.3 of the spec? Because I though that the version 2.3 specs just do not demand copyrightText to be a mandatory field anymore, but it must not be set to an empty string (to avoid the originally mentioned ambiguity).

@goneall
Copy link
Member

goneall commented Jul 3, 2023

@goneall, so you're saying an empty copyright actually is valid in version 2.3 of the spec?

Not necessarily - I'm only commenting on the tool consistency. When the change was made in the validator to not require copyright text, it also allowed blank copyright text. I'm undecided if this is correct or not - I agree with your original issue that it is ambiguous and should be clarified. If the clarification is different than the tools implementation, we should open an issue for the tools to correct the verify behavior.

@MP91
Copy link

MP91 commented Feb 16, 2024

Hey @sschuberth, @goneall,
can we take any decision about that one?

@sschuberth
Copy link
Member Author

I'm probably not in the position to make any decision here, but I'm awaiting the clarification from the SPDX working group / specification side.

@goneall
Copy link
Member

goneall commented Feb 16, 2024

We can put this on the agenda for an upcoming tech call once SPDX 3.0 RC2 is released

@goneall
Copy link
Member

goneall commented Feb 16, 2024

I just checked the spec, and I think it is clearly defined for the 2.3 version:

If the Copyright Text field is not present for a file, it implies an equivalent meaning to NOASSERTION. The metadata for the copyright text field is shown in Table 43.

Reference section 8.8.

For versions prior to 2.3, the copyright field is required.

Version 2.2.2 section 8.8 doesn't explicitly cover the case of a blank string, but the fact it is required and there is a NONE and NOASSERTION defined, I believe it can be concluded that a blank string would be invalid - the same as not having the field at all.

@sschuberth @MP91 - Let me know if you think further clarification is needed and which version of the spec needs further clarification. Note that we probably will not change the documentation for versions prior to 2.3 since they are already release, but we can provide some guidance through these comments.

@sschuberth
Copy link
Member Author

sschuberth commented Feb 18, 2024

I actually believe that both spec versions still need clarification, as too many implicit assumption are being made to be crystal clear.

For version 2.3, I find the wording "If the Copyright Text field is not present for a file" in itself already to be confusing, as it's unclear what "file" refers to. Originally, I though it would refer to the contents of the file being documented in SPDX, but actually I now believe it refers to the SPDX file entry, as otherwise the "field is not present" part would not make much sense.

But even if this said "If the Copyright Text field is not present for an SPDX File Information entry", it would still be unclear what a present but blank field like copyrightText: "" or copyrightText: " " would mean.

but the fact it is required and there is a NONE and NOASSERTION defined, I believe it can be concluded that a blank string would be invalid

I don't follow that conclusion. Having a field present but blank is not the same as not having the field (which is the only case that would violate the mandatory requirement, IMO).

Maybe the root cause of this confusion is that the Markdown spec primarily has tag-value and XML examples, whereas I'm exclusively looking at JSON and YAML examples, where it is more common that string fields may have quoted values (which implies that having present but blank strings is possible).

So long story short, please advise whether copyrightText: "" and / or copyrightText: " " should map to NONE and / or NOASSERTION under the 2.2 and 2.3 specs respectively, i.e. complete the following table

copyrightText field Spec 2.2 meaning Spec 2.3 meaning
n/a invalid NOASSERTION
"" ? (Gary: invalid) ?
" " ? (Gary: invalid) ?

@goneall
Copy link
Member

goneall commented Feb 18, 2024

@sschuberth - I see your point - I may be reading too much into what "is not present for a file" in the 2.3 spec.

I would interpret an empty string as not being present. The spaces is less clear, but I would propose we also treat any string with is "trimmed" to an empty string as not being present.

Here's my proposed table:

copyrightText field Spec 2.2 meaning Spec 2.3 meaning
n/a invalid NOASSERTION
"" invalid NOASSERTION
" " invalid NOASSERTION

I'll bring this up on the tech call on Tuesday and see if there is a consensus on the call.

@sschuberth
Copy link
Member Author

sschuberth commented Feb 18, 2024

I'll bring this up on the tech call on Tuesday and see if there is a consensus on the call.

Thanks. The important clarification indeed would be if the term "not present" also refers to a defined but blank string. So far, I was interpreting this in the computer science way which distinguishes String == null from String == "".

@MP91
Copy link

MP91 commented Feb 19, 2024

Here's my proposed table:
copyrightText field Spec 2.2 meaning Spec 2.3 meaning
n/a invalid NOASSERTION
"" invalid NOASSERTION
" " invalid NOASSERTION

I'm not that familiar with all this specification things, but IMHO the definitions shouldn't be different in these 2 versions since 2.3 is just a MINOR update. My interpretation is that it just improves/clarifies minor issues from the last version. That's why I assumed, what is specified for 2.3 should also apply for 2.2, if it is unclear there.

@sschuberth
Copy link
Member Author

I'm not that familiar with all this specification things, but IMHO the definitions shouldn't be different in these 2 versions since 2.3 is just a MINOR update.

The problem is, if you think about semantic versioning, then you can't fix the issue in version 2.2 without at least bumping the minor version, ending up at version 2.3 again. And actually, fixing invalid to NOASSERTION could even be considered a breaking change, requiring a major version update.

I agree it's an unfortunate situation, but there are a ton of more unclarities like this in version 2.2. So I'm just looking forward to get things addressed in version 2.3. However, the bad thing is that I've seen many parties refuse to implement version 2.3, as version 2.2 is the ISO standard (also see this discussion), and they want to stay ISO conform... which of course is a fallacy, because how can you be conform to something that's not clearly defined? 😞

@MP91
Copy link

MP91 commented Feb 19, 2024

The problem is, if you think about semantic versioning, then you can't fix the issue in version 2.2 without at least bumping the minor version, ending up at version 2.3 again. And actually, fixing invalid to NOASSERTION could even be considered a breaking change, requiring a major version update.

That's exactly what I tried to explain. Since this clarification was just a minor update, it should be backwards compatible. In other words what is set for 2.3 should also apply for 2.2.

@zvr
Copy link
Member

zvr commented Feb 19, 2024

I completely agree with @goneall's interpretation as written in #655.

If there is no copyright text, it should be considered a NOASSERTION on v2.3 (and was not allowed in v2.2).
Having empty strings, spaces, etc. is not having copyright text, as these do not " Identify the copyright holder of the file, as well as any dates present".

v2.3 was an update on v2.2: everything that is valid in v2.2 is also valid in v2.3. The reverse obviously does not hold.

@goneall
Copy link
Member

goneall commented Feb 20, 2024

Discussed on tech call on 20 Feb 2024 and there was consensus that we treat any string with is "trimmed" to an empty string as not being present.

We typically give 10 days for others to comment, but I think it is pretty safe to go with this interpretation.

@goneall
Copy link
Member

goneall commented Mar 3, 2024

Since we have not received any additional comments, we will conclude on the above interpretation.

A pull request to clarify the wording for the 3.0 release would be much appreciated: https://github.com/spdx/spdx-3-model/blob/main/model/Software/Properties/copyrightText.md

@goneall
Copy link
Member

goneall commented Mar 3, 2024

Transferring this issue over to the SPDX model repository since that is where the copyright text is recorded for the 3.0 release.

@goneall goneall transferred this issue from spdx/spdx-spec Mar 3, 2024
@goneall goneall added the Profile:Security Security Profile and related matters label Mar 3, 2024
@goneall goneall added this to the 3.0-rc3 milestone Mar 3, 2024
@goneall goneall added the documentation Improvements or additions to documentation label Mar 3, 2024
@swinslow
Copy link
Member

swinslow commented Apr 2, 2024

Apologies for being slow to weigh in, I hadn't seen this one since it was tagged as Security.

I think the consensus above was to treat an empty (or "trims-to-empty") copyright text string as being equivalent to NOASSERTION? If so, yes, I'm also +1 to that outcome.

I'll put together a quick draft PR to make this explicit.

@swinslow swinslow added the Profile:Licensing Licensing Profile and related matters label Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation Profile:Licensing Licensing Profile and related matters Profile:Security Security Profile and related matters
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants