Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Add ability to ignore folders in scm attribute #6605

Closed
DoDoENT opened this issue Mar 2, 2020 · 11 comments
Closed

[feature] Add ability to ignore folders in scm attribute #6605

DoDoENT opened this issue Mar 2, 2020 · 11 comments
Assignees

Comments

@DoDoENT
Copy link
Contributor

DoDoENT commented Mar 2, 2020

Currently, when scm attribute is used, the entire folder is copied to the local cache on the invocation of conan create. This can take a long time if the source folder contains some temporary files or folders that are under .gitignore. This also includes the .git folder, which contains the entire git history and should not be copied to source folder in the local cache, as the revision information is extracted before copying the source.

I propose adding the ignore attribute to the scm attribute so that files and folders matching the entries in the ignore attribute will not get copied to the source folder on conan create invocation.

For example, it would look like:

class MyPackage(ConanFile):
    ...
    scm = {
        "type": "git",
        "url": "auto",
        "revision": "auto",
        "ignore": [ ".clangd/*", "*/build", "test-data/*", ".git" ]
    }

Also, I think that .git should be ignored by default (i.e. it should not be required to add it to the list.

@DoDoENT
Copy link
Contributor Author

DoDoENT commented Mar 2, 2020

For example, I noticed that after testing my package with conan test ./my-test-package MyPackageName/version manually it creates folder ./my-test-package/build which later gets copied to local cache when invoking conan create.

I know that you expect to run conan test after conan create, and this is OK, but sometimes you have multiple iterations (especially during development), so it happens like this:

conan create ... # created package
conan test ...    # testing package, something fails
# fix the recipe
conan create ... # created a fixed package, test/build gets copied
conan test ...

I know that the ideal way of package development would be to use conan install, conan build and conan package but it's nowhere near as convenient(*) as conan create and conan test with your local cache.

(*) The conan build and conan package require a lot of command-line parameters (build folder, install folder, package folder, source folder) and you still cannot perform conan test on the folder created with conan package, let alone test the calculation of package_id. Therefore, it's much more convenient to simply conan create ./conanfile.py myuser/temp, let it do all the things and then test it with conan test. Of course, the temp packages are never uploaded to the Artifactory (only the CI has the write access to it), so there is no risk that temp packages will end-up on Artifactory.

@DukeOfKirkcaldy
Copy link

DukeOfKirkcaldy commented Mar 2, 2020

I'd also like to see ".git" ignored.

When developing with the "package development" commands, subsequent calls of conan source result in an error due to the git index file being read-only, meaning that it must be manually deleted before pulling changes to sources into the build folder.

@jgsogo jgsogo self-assigned this Mar 3, 2020
@jgsogo
Copy link
Contributor

jgsogo commented Mar 3, 2020

Let's focus on the original issue, we know that there are flaws related to the development commands (we would love to hear your feedback about them), but it is better not to mix them with this issue.

I agree with you that we should avoid copying the ignored files, we can ask Git to tell us which files/directories are in the .gitignore file (we already have that functionality implemented in Git::excluded_files() method). About not copying the .git directory I need to check it, because some functionality in the recipes can rely on the repository being available and it can fail.

Besides taking into account what it is said in the .gitignore file, probably it's not needed to add a feature to ignore more files/folders, right? Probably it adds a lot of complexity (Conan should know that the repo is not pristine because of this feature instead of actually missing files).

Thanks for reporting!

@DoDoENT
Copy link
Contributor Author

DoDoENT commented Mar 3, 2020

Asking the git for files in .gitignore is only one part of the solution. But sometimes you want to ignore files that are in git but are not relevant for the final package, such as testing files (in our cases those are images and videos that take several gigabytes).

Those files are not in .gitignore (in our specific case they are also in git submodule, therefore I've also filed issue #6606 to prevent even cloning that submodule when building by dependency).

The other cases when test files are relevant for testing the code, but not relevant for the package could be:

  • images/videos for packages that contain code that perform some image/video recognition
  • test textures/models for packages that contain some part of the game engine
  • test vectors and blobs for a package that contain cryptographic algorithms

You generally want those files in git (usually in git LFS) because you need them for your unit tests and during development, but the final package will never depend on them.

@memsharded
Copy link
Member

The point here, is that when using the SCM, what you get when the package is going to be built in the local cache, using a git clone from the SCM data, should be as close as possible to what is being copied from the user folder to the cache, so when the thing is built and package, the result is the same.

The best way to achieve that is using the information from the .gitignore, using Conan to do extra filtering on that local copy, will produce failures later. Lets say that you exclude a local "header.h", but that is not .gitignored. When the package is built from sources in the cache, and does the git clone, it is very likely that such "header.h" will now get packaged, and that was not intended, probably the original reason you wanted to exclude it from the SCM copy.

@DoDoENT
Copy link
Contributor Author

DoDoENT commented Mar 3, 2020

When the package is built from sources in the cache, and does the git clone, it is very likely that such "header.h" will now get packaged, and that was not intended, probably the original reason you wanted to exclude it from the SCM copy.

From my point of view this just means that the filtering should be done after git clone even in by-dependency build, i.e. that the auto-generated source method (generated behind the scenes by the scm attribute) should perform the same filtering procedure after git clone as while copying the user folder.

@memsharded
Copy link
Member

From my point of view this just means that the filtering should be done after git clone even in by-dependency build, i.e. that the auto-generated source method (generated behind the scenes by the scm attribute) should perform the same filtering procedure after git clone as while copying the user folder.

Removing those big files when the package is built in the cache as a dependency might be even slower, because they will be cloned anyway from the repo. The value of removing those is very marginal to be a feature to be added with the added complexity.

Regarding the other optimization (avoiding the copy), lets discuss about it, I don't think this is a good first issue.

@DoDoENT
Copy link
Contributor Author

DoDoENT commented Mar 3, 2020

Specifically, in my case, those big files are actually in test-data submodule, which doesn't get cloned in the first place (as described in #6606). So in case of the build by dependency, those files actually do not exist. But during local development, they do and they slow down the development iterations a lot (especially on slower disks).

I agree that it is problematic to exclude some files that are actually part of the source (local header, as in your example). One possible idea would be to ignore certain submodules during the copy (this is essentially the same as #6606), but someone will certainly have a submodule in which only some files need to be ignored, so this is also not a good solution.

Maybe we can go the ignore path as described in my initial comment, but with a note that it only applies for local development (i.e. that it does not filter those files during by-dependency build)? It's not ideal (it introduces inconsistency between the case of local and by-dependency build), but would solve my issue (and probably introduce a lot of new ones 😁 :trollface: ).

Of course, if anyone has some better idea, let's discuss it.

@jgsogo
Copy link
Contributor

jgsogo commented Mar 3, 2020

I've opened a different issue (#6617) for taking into account the .gitignore when copying files from the user directory to the cache. This is just an optimization and we shouldn't copy those files if they are not part of the repo.

As for the submodules, we should do the same: if the SCM attribute says that submodules=None we shouldn't copy them (same reason: without adding complexity we should copy to that folder the same files that would be there if cloning the repo).

We can open one issue for that and keep this to discuss about ignoring specific files ⬇️


About ignoring files: IMHO we shouldn't make the syntax more complex to add a feature that works only for an optimization. Those ignore_paths would apply only when running the SCM optimization that copies the files from the local folder to the cache (conan export or conan create). I think that the underlying issue here is that the local development commands don't work as expected and the user prefers the conan create + conan test instead of conan build + conan package + conan test (maybe the test command should be able to use a local package folder, and make it a local development command).

@DoDoENT
Copy link
Contributor Author

DoDoENT commented Mar 3, 2020

IMHO we shouldn't make the syntax more complex to add a feature that works only for an optimization.

Wasn't just that the reason for adding move semantics to C++ :trollface: ?

maybe the test command should be able to use a local package folder, and make it a local development command

I've requested that feature back in July 😛

But in general, the conan build + conan package is quite complex for daily use (a lot of parameters must be given (install folder, build folder, package folder, profile, options, settings), which is easy to do wrong, and conan test does not work with that flow (see issue #5512).

And, based on this comment from @jgsogo, it seems that local development flow is on special codepath in conan, i.e. it appears that different code gets executed in conan create vs conan build + conan package, at least in terms of dependency resolution (correct me if I'm wrong).

Also, the conan build + conan package will not work for complex cases like this one - in order to test the OcrEngine package from the linked example, you need to have OcrModelBuilderTool in your local cache to simulate usage from the perspective of EuropeanOcrModels.

My current flow for testing this package is to first conan create the OcrModelBuilder, then OcrModelBuilderTool, then OcrEngine and then I can conan test the OcrEngine package. I see no way of doing the same purely with conan build + conan package + conan test (even if conan test supported using local folder generated by conan package instead of local cache).

@memsharded
Copy link
Member

scm deprecated in 1.X, removed in 2.0, closing this ticket as outdated.

@memsharded memsharded closed this as not planned Won't fix, can't repro, duplicate, stale Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants