Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Using pattern with "copyright" always yields a (?i).*.* regex #11218

Closed
3 tasks done
antgamdia opened this issue Aug 14, 2023 · 3 comments · Fixed by apache/skywalking-eyes#166
Closed
3 tasks done
Assignees
Labels
bug Something isn't working and you are sure it's a bug! license eye

Comments

@antgamdia
Copy link

Search before asking

  • I had searched in the issues and found no similar issues.

Apache SkyWalking Component

License Tools (apache/skywalking-eyes)

What happened

TL;DR: if the pattern includes "Copyright", then the whole regex will be replaced by "" by the OneLineNormalizer due to a wrong (IMHO) text replacement.

My copyright header is supposed to be:

// Copyright 2022-2023 the Kubeapps contributors.
// SPDX-License-Identifier: Apache-2.0

In the .licenserc.yaml file, I have set pattern: Copyright: (?:\d{4}-\d{4}|\d{4}) the Kubeapps contributors\.'.
Then I run license-eye -c .\.licenserc.yaml header check -v debug and it returns OK, that's OK.

However, when I play around with my header and change it to Copyright whatever... the "header check" result is still OK, when it clearly shouldn't be.

What you expected to happen

I'd rather expect the check to fail when passing Copyright whatever and setting pattern: Copyright: (?:\d{4}-\d{4}|\d{4}) the Kubeapps contributors\.'.

How to reproduce

  1. Install the tool with go install github.com/apache/skywalking-eyes/cmd/license-eye@latest
  2. Run license-eye header check in a directory containing the files below:
.licenserc.yaml
header:
  license:
    spdx-id: Apache-2.0
    copyright-owner: the Kubeapps contributors
    pattern: |
      Copyright (?:\d{4}-\d{4}|\d{4}) the Kubeapps contributors.
      SPDX-License-Identifier: Apache-2.0
main.go
// Copyright whatever.
// SPDX-License-Identifier: Apache-2.0

package main

import (
	"fmt"
)

func main() {

	fmt.Println("Hello world")
}

Anything else

Inspecting the code, I guess this is caused by a wrong regex replacement when normalizing the regex. Let me explain:

All the patterns are normalized here (this is where we add (?i).*" + pattern + ".*")

https://github.com/apache/skywalking-eyes/blob/16b9726be37536a05279e061f0da02d205a2af77/pkg/header/config.go#L113

Then, NormalizePattern applies all the normalizers:

https://github.com/apache/skywalking-eyes/blob/3a6d3090d78b7c104cb55ce4cc63a4333d66ecd0/pkg/license/norm.go#L268

One of them is the OneLinerNormalizer, which replaces the string with the corresponding replacement string:

https://github.com/apache/skywalking-eyes/blob/3a6d3090d78b7c104cb55ce4cc63a4333d66ecd0/pkg/license/norm.go#L296-L301

And... here comes the issue: there are two replacements that match til the end of the line, which is erasing the whole regex in the pattern (change added in apache/skywalking-eyes@3a6d309)

https://github.com/apache/skywalking-eyes/blob/3a6d3090d78b7c104cb55ce4cc63a4333d66ecd0/pkg/license/norm.go#L237C8-L247

So the the regex becomes now "" and the yielded normalized regex is therefore (?i).*" + ""+ ".*", that is (?i).*.*... which always would return a match :S

See how this regex is matching the whole line:

image

Note this OneLinerNormalizer not only affects the pattern, but also the files (see how the copyright line disappears)

DEBUG Checking file: main.rs
DEBUG After normalized by github.com/apache/skywalking-eyes/pkg/license.CommentIndicatorNormalizer:
DEBUG  Copyright 2023 the Kubeapps contributors.
 SPDX-License-Identifier: Apache-2.0

....
DEBUG After normalized by github.com/apache/skywalking-eyes/pkg/license.OneLineNormalizer:
DEBUG   SPDX-License-Identifier: Apache-2.0 use clap::Parser; ....

So, if I just replace the pattern: Copyright: foo'. with pattern: foo' (removing the word "copyright"), I'd have yet another problem: the check will always fail since foo would have been erased.

In short, I think the regexes (?m)^\s*([cC©])?\s*Copyright (\([cC©]\))?.+$ and (?m)^\s*Portions Copyright (\([cC©]\))?.+$, should be removed the .+$ part to avoid undesired matches. What do you think? Am I missing something?

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@antgamdia antgamdia added the bug Something isn't working and you are sure it's a bug! label Aug 14, 2023
@wu-sheng wu-sheng added this to the license-eye 0.5.0 milestone Aug 14, 2023
@kezhenxu94
Copy link
Member

Hi @antgamdia , thank you for reporting this 🙇 this turns out to be fixed in apache/skywalking-eyes#142, if you don't mind, please use go install github.com/apache/skywalking-eyes/cmd/license-eye@main to install or use the commit sha in GitHub Actions uses: apache/skywalking-eyes/header@16b9726be37536a05279e061f0da02d205a2af77, I shall initiate 0.5.0 release for it soon, thanks again for reporting and the detailed reproducing steps.

@antgamdia
Copy link
Author

antgamdia commented Aug 16, 2023

Thank you, @kezhenxu94, for your quick response and, especially, for triggering the new release. I can confirm it worked for us.

However, I don't think the issue is fully gone: https://github.com/apache/skywalking-eyes/pull/142/files solely removes one of the two problematic regexes. Citing my initial message:

In short, I think the regexes (?m)^\s*([cC©])?\s*Copyright (\([cC©]\))?.+$ and (?m)^\s*Portions Copyright (\([cC©]\))?.+$, should be removed the .+$ part to avoid undesired matches (...)

In fact, I can confirm the issue still occurs when adding Portions to the copyright line. For example, given this file:

// Portions Copyright 2023 foo.
// SPDX-License-Identifier: Apache-2.0

use clap::Parser;
use log;
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tonic::{transport::Server, Request, Response, Status};

The OneLineNormalizer now converts it to:

DEBUG After normalized by github.com/apache/skywalking-eyes/pkg/license.OneLineNormalizer:
DEBUG   SPDX-License-Identifier: Apache-2.0 use clap::Parser; use log; use tokio::sync::mpsc; use tokio_stream::wrappers::ReceiverStream; 

Note that the whole Portions Copyright 2023 foo. is now gone.

In short, in v0.5.0, if the copyright header contains Portions it will always yield a validation error since the normalizer is always
removing the copyright line :(

Do you want me to send a PR also removing the problematic regex?

https://github.com/xiaoyawei/skywalking-eyes/blob/006a15186f1961abd6b37f886a282df0e4a8e4a0/pkg/license/norm.go#L236-L240

@kezhenxu94
Copy link
Member

Hi, @antgamdia I appreciate if you can file a pull request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working and you are sure it's a bug! license eye
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants