Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TravisCI to enforce copyright header rules for source files #9452

Closed

Conversation

isle2983
Copy link
Contributor

As the title indicates, this PR adds the enforcement of MIT Licence copyright headers. The idea is that missing headers are caught ASAP and any deviation from the normal The Bitcoin Core developers header needs to be explicitly marked as such in the script.

In addition this PR does a significant iteration on contrib/devtools/copyright_header.py. Like the preceding version, this script is able to parse through the source files and report on the state of the copyright headers as well as update the year.

The first major improvement in this new iteration is that it now has a ruleset specific for each subtree to detect the local proper header style (the subtrees being secp256k1, leveldb, univalue and ctaes).

The second major improvement in this iteration makes more elegant use of Python's regular expressions to detect and manipulate the header text. This approach should be maintainable and extensible while keeping the script's quick execution time.

To provide the TravisCI integration, a ci_check subcommand is implemented. This executes the logic of the script to catch submissions early where a header was not added pre-merge. A non-zero status is returned to the shell if a problem is found. If everything is fine, TravisCI will look like this:

noissuesfound

If, for example, a source file is added with no copyright header it will be errored in TravisCI it like this:

novalidheader

If a file has a proper copyright, but is added with some other copyright text such as "Copyright (c) 2016 Some Other Entity", the error will be:

otherheaderinstance

In total there are six failure conditions as per what is loaded into the FAILURE_REASONS list of the script. They each have a 'resolution' string attached to them to point the reader to the appropriate resolution - either fix the file (because it is a problem) or update the script (because it is a legitimate change involving non-standard copyright that the script must tolerate). The rules and exceptions for each subtree are loaded into the HEADER_RULES list.

There is a commit preceding the update to the script which makes small adjustments to the set of MIT copyright headers to make them uniform in instances where they are not. Also, this commit adds the MIT header to contrib/dev/tools/gen-manpages.sh, and hence requires an ACK from @nomnombtc.

@fanquake fanquake added the Docs label Jan 1, 2017
@TheBlueMatt
Copy link
Contributor

Hmm, do we care about enforcing the listed copyright holders? As long as it appears in the MIT header, I'm not sure it matters all that much.
Also, would it be simple to enforce that all of the copyright date of files modified (according to git timestamp) after date X (ie ignore the issues from #9450) have the right copyright year on them?

@isle2983
Copy link
Contributor Author

isle2983 commented Jan 1, 2017

If we want it to not care about the holder's name, that would be simple to add in the regex. However, there are two side effects I can think of:

  1. It would then have to tolerate bad spelling/punctuation/capitalization.
  2. It would require some more sophisticated logic in the update part of the script to make sure that the last holder is the one that is incremented and not necessarily the one with The Bitcoin Core developers.

It would be simple to enforce that the end year gets bumped upon file modification. There are a couple downsides, so I left it out (for now):

  1. It takes a minute or two to pull the file modification history info from git. In CI that probably isn't much of a problem though. In other codebases, I have seen similar checks as part of make test scripting so a check that is reasonably fast is desirable.
  2. It might take a period of adjustment to train everyone to update the year in sync with changes and this might create demand for "increment every file on January 1st dammit!".

@nomnombtc
Copy link
Contributor

... and hence requires an ACK from @nomnombtc

Sure whatever. ACK :)

@laanwj
Copy link
Member

laanwj commented Jan 2, 2017

To be honest I'd prefer not to add to many of these extra bureaucratic rules to Travis. It's reason for existence is to test the code. If too many auxiliary checks are added it becomes too pesky it increases the chance people will just skip it and merge anyway, as sometimes something needs to be merged.

Copyright is a grey area but please don't go on to add code style checks and such...

@isle2983
Copy link
Contributor Author

isle2983 commented Jan 2, 2017

Thanks @laanwj - getting your take is most important. Keeping header adjustments periodic and asynchronous is perfectly reasonable.

The script improvements here are still useful for assessing and fixing up the headers as they drift over time. I have re-submitted as #9459 with the CI part and 'enforcement' language taken out.

Copyright is a grey area but please don't go on to add code style checks and such...

Could you clarify your perspective on automated code style checks? Some sense of what is and isn't on the wish list would be helpful.

The way I see it, there are a couple levels of similar things that can be done for improving code style:

  1. a script to assess and highlight problems with basic style (spaces over tabstops, no trailing whitespace, don't end lines with double semicolons, etc...)
  2. attaching this basic style script to CI for enforcement
  3. a script wrapper to run clang-format/pylint/whatever and highlight deviations from a preferred style
  4. attaching this script wrapper CI to lock down a preferred style in the codebase (probably starting out with an include/exclude list to manage pain, but then ratchet up)

1 and 3 can be done in a similar way to copyright_header.py to assess and periodically nudge the code in a good direction asynchronous to day-to-day development PRs.

2 and 4 are more for automating away style nits to better focus on PR contents. Also, ratcheting towards and achieving a consistent code style helps make reading the code easier and bugs shallower. I would expect this to be more controversial since it requires some irritation and short-term pain. However if 1 and 3 are well-built and mature first, demand for CI integration may naturally follow.

@TheBlueMatt
Copy link
Contributor

@laanwj On the flip side, everything that travis enforces is one less thing we need to check in review...Obviously code style checks aren't coming anytime soon, but I think this is reasonable.

@TheBlueMatt
Copy link
Contributor

Regarding code style: we're a long way from really enforcing anything regarding code style...would be cool if we could detect indentation errors (since those are likely to introduce real bugs, whereas other code style issues may or may not be as likely), but I'm not sure if we could do ONLY that with many tools.

@maflcko
Copy link
Member

maflcko commented Jan 7, 2017

Well, considering white space errors that caused bugs such as #9319 (comment) or de8980d, I think having an indicator that the touched code is clang formatted really makes sense. Some months ago I hacked up a script for travis, which fails when the new code is not formatted, but a failing travis is obviously not something we want in pull requests.

Would it be possible to have another service run that provides this indicator, but is still not too prevalent so that a "failing" format check can be ignored?

@isle2983
Copy link
Contributor Author

isle2983 commented Jan 7, 2017

I am not deeply familiar with Travis CI, but from looking at their docs there doesn't seem to be an obvious way to 'warn' instead of 'fail'.

One interesting thing in the docs is a workflow they discuss for coverity scans:

https://docs.travis-ci.com/user/coverity-scan/#Build-Submission-Frequency

Similarly, we could have a collection of tools and corresponding side branches that do thorough static code checks in CI. Any found issues can be fixed on those branches and then periodically cherry picked back into the mainline.

@isle2983 isle2983 force-pushed the PR-travisci-copyright-enforce branch from 98fa124 to 28f8a36 Compare January 7, 2017 18:10
@isle2983
Copy link
Contributor Author

isle2983 commented Jan 7, 2017

Rebased. The conflict was in the files:

contrib/linearize/linearize-data.py
contrib/linearize/linearize-hashes.py
share/rpcuser/rpcuser.py

For three files, the conflict was with 27765b6 which incremented the end year to 2016.

For linearize-data.py and linearize-hashes.py it also conflicted with 3c8f63b which changed the shebang at the top of the file to point to python3

@TheBlueMatt
Copy link
Contributor

Heh, #9487 is pretty much exactly what I was referring to that it'd be ideal to fix in #9452 (comment).

@maflcko
Copy link
Member

maflcko commented Jan 7, 2017

It should be possible to hack something up with cpplint (c.f. https://github.com/google/styleguide/blob/b282a74fea1455f4648d7f3098c954cce46e3a8d/cpplint/cpplint.py#L246)

@fanquake
Copy link
Member

Closing this in favour of #9603. Enforcing copyright header styling (to a sub-tree level) is currently overkill.

@fanquake fanquake closed this Jan 21, 2017
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants