17 MB / 90 MB repo size!? #96
Hi @nanoant, any idea how to reorganize without fragmenting stuff around into other repos or to other sites? I understand your concern, but at least now everything is in one place. In particular, the benchmark helped to improve at least one pull request. Cheers, |
@nlohmann My concern is just the huge benchmark input files that blow up the repo size. These could simply be moved out of the main repo into some other repo, and then cloned on the first benchmark run (in the Makefile). Everything else is already pretty small. Right now I am probably going for embedding |
Would a special development branch help? Then I could move most of the stuff from there and leave the master branch small. (Sorry, I have little experience with Git.) |
@nlohmann I think the problem is that once the object (a huge file) is in the repo, it makes no difference which branch it resides in: a clone will still fetch the whole repo, including the huge file. I was able to reduce the repo size to 3.4 MB with a recipe taken from: http://manuel.manuelles.nl/blog/2011/12/22/shrinking-your-git-repository/ But this is still a lot of effort, so you should probably skip that and leave the repo as is. |
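Before deciding whether a history rewrite is worth the effort, it can help to see which files actually account for the size. The following is a minimal sketch, not the recipe from the linked post: it only parses the text printed by the git pipeline named in the docstring and ranks blobs by size. The function name `largest_blobs` and the sample data are made up for illustration.

```python
from typing import List, Tuple


def largest_blobs(batch_check_output: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` biggest blobs as (size_in_bytes, path) pairs.

    Expects text produced by (run inside the repository):
      git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'
    """
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)
        # Skip commits, trees, tags and any line without a path.
        if len(parts) < 4 or parts[0] != "blob":
            continue
        _type, _sha, size, path = parts
        blobs.append((int(size), path))
    # Largest first; ties are ordered by path.
    return sorted(blobs, reverse=True)[:top]


# Hypothetical sample output; the hashes and paths are invented for illustration.
sample = """\
commit 1111111111111111111111111111111111111111 250
tree 2222222222222222222222222222222222222222 120
blob 3333333333333333333333333333333333333333 52428800 benchmarks/files/big.json
blob 4444444444444444444444444444444444444444 20480 src/json.hpp
"""

for size, path in largest_blobs(sample):
    print(f"{size / 1024 / 1024:8.2f} MiB  {path}")
```

The listing covers every blob reachable from any ref, which matches the point above: a large file counts toward the clone size no matter which branch it lives on.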
@nlohmann
the library repo will be easy to be integrated into other 3rdparty codes (by git submodules, subtree ...) to prevent radical changes to the current workflow, you can create another repo (a single header) as release repo and let people use that repo as dependency simply by |
This split would make the development a bit inconvenient on my side. As I am doing this alone in my spare time, I would like to keep it as is. |
Why would you include a single-header project as submodule? |
That's how I use it. That way the link is explicit, and it tracks a particular version, rather than manually extracting the one header file from the repository with no indication that it came from another repo, and have to deal with the possibility that someone may modify the local copy. However, I have no problem with the size of this project as a submodule. |
@nlohmann you can consider using |
@nanoant To me, this splitting would be inconvenient, and I like having everything in one repository. |
Guys. I think the best solution is to have a package manager that will download whole repo, insert |
To add to the discussion I will just add my argumentation here. I want to use "json" as a submodule to my project, because:
However, the current repository is extremely large, due to the tests and benchmarks. This hinders again the distribution of my code. Therefore it would be great if a minimal repository can be maintained. |
I'll also chime in here - I also like using even single-header libs as a submodule, to be able to track what version I'm using and to update easily by just pulling the submodule. And yes, for me it is also problematic that 90 MB is downloaded. It takes ages, even on my fast connection, and this is going to hit everyone using my repo as well, including autobuilders. This memory/bandwidth cost of 50+ MB coming from just a single test file is ridiculous. |
just tried to pack every released
# size in KB
$> curl https://api.github.com/repos/nlohmann/json | grep size
"size": 136956,
$> curl https://api.github.com/repos/azadkuh/nlohmann_json_release | grep size
"size": 100, |
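To put the two numbers above side by side in megabytes, here is a small sketch that reads the `size` field (reported in KB, as noted in the comment above) from a repository API response. The function name `repo_size_mb` is made up, and the JSON strings are trimmed-down stand-ins containing only the field quoted above, not full API responses.

```python
import json


def repo_size_mb(api_response_text: str) -> float:
    """Convert the `size` field (kilobytes) of a repository API response to megabytes."""
    size_kb = json.loads(api_response_text)["size"]
    return size_kb / 1024


# Stand-in responses holding only the `size` values quoted in the comment above.
full_repo = '{"size": 136956}'
release_repo = '{"size": 100}'

print(f"full repo:    {repo_size_mb(full_repo):.1f} MB")
print(f"release repo: {repo_size_mb(release_repo):.1f} MB")
```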
I would prefer not to maintain my own minified fork (or depend on anyone else's). Wouldn't a submodule solution meet everyone's requirements?
|
I believe this would make everybody happy indeed. |
I know this is already closed, but as the discussion is still going on, I want to share my thoughts. I wonder why this is needed. I mean, @nlohmann is right: it is nice to have everything in one place for development, and IMHO this is what this repo is for. But to have the luxury of a git submodule, I would propose a different approach.
PRO
CON
|
Just to add to the discussion: I have created a repository at https://github.com/astoeckel/json which tracks the released Note however that this repository is bare-bones and tailored towards my own needs. It just contains |
nlohmann/json commit: 21516f2bae552a49cc1ba1c11746be3730361d8d Reference: nlohmann/json#96
As of now, the size of the repository has increased to a whopping 435 MB. The biggest offenders are
To reduce the size of
I believe using We can later deal with the size of |
Thanks for investigating! I have little time at the moment, but this is definitely a way to move forward! |
I had a PR closed by the stale bot which used |
@nlohmann: Sure! I've created a PR and the preliminary tests seem promising. @theodelrieu: Thanks for pointing that out! I will look into it. Since this change can affect the workflow of the users, there is the possibility that I've overlooked some use cases. Therefore, besides passing the CI tests, hearing about others' experience would be important to make sure this change will not have negative effects. |
@nickaein @theodelrieu @gregmarr Hi all, I finally had time to think about this issue and think we should move forward with it. As a first working draft, I created a branch https://github.com/nlohmann/json/tree/tests_external which loads the tests from a separate repository via
include(FetchContent)
FetchContent_Declare(nlohmann_json_tests
GIT_REPOSITORY https://github.com/nlohmann/json_tests_tmp.git
GIT_SHALLOW TRUE
)
FetchContent_GetProperties(nlohmann_json_tests)
if(NOT nlohmann_json_tests_POPULATED)
message(STATUS "Download tests")
FetchContent_Populate(nlohmann_json_tests)
message(STATUS "Download tests - done")
add_subdirectory(${nlohmann_json_tests_SOURCE_DIR}/test)
endif()
As a result, the However, I am not yet sure whether this is the best approach. In particular, I am afraid that tests and code get out of sync as a PR in the original repo cannot simply add tests on its own, but would need a PR in the tests repo first. Alternatively, I could only move the test files to the separate repo, but leave the JSON files in the separate repo. Any ideas on this? |
I think splitting code and tests into separate repositories is a bad idea. Newer git versions have "sparse checkout" support, see https://www.git-scm.com/docs/git-sparse-checkout, so it may be possible to not check out the tests folder. For general users, the various package managers should also help. |
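As a rough illustration of the sparse-checkout idea, the sketch below only builds the git command lines and does not run them. One caveat worth stating: sparse checkout by itself limits the working tree, not what is downloaded, so the sketch pairs it with a partial clone filter. It assumes git 2.25 or newer and a server that supports partial clone; the helper name `sparse_clone_commands` and the directory name passed to it are assumptions for illustration.

```python
from typing import List


def sparse_clone_commands(url: str, target: str, keep_dirs: List[str]) -> List[List[str]]:
    """Build git commands for a partial, sparse clone that checks out only `keep_dirs`.

    Assumes git 2.25 or newer (for `git sparse-checkout`) and a server that
    supports partial clone filters.
    """
    return [
        # --filter=blob:none defers downloading file contents until they are needed;
        # --sparse starts with only the top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, target],
        # Restrict the working tree to the listed directories.
        ["git", "-C", target, "sparse-checkout", "set", *keep_dirs],
    ]


# The directory name below is an assumption for illustration.
commands = sparse_clone_commands(
    "https://github.com/nlohmann/json.git", "json", ["single_include"]
)
for cmd in commands:
    print(" ".join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects; they could be passed to `subprocess.run` one by one if desired.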
@nlohmann it may be a better idea to have a separate repository for released versions of BTW every solution has its own pros/cons |
@nlohmann Library and tests can get out of sync
I agree. There is also the problem that after a while a user may want to checkout an older version of the library while the latest Possible solution: We might have to (internally) tie the library version to a
I believe you meant to keep the source code for tests here and only move the large JSON files into the separate repo. I totally agree with this approach. The changes in the library and test source code are related in a meaningful way, and it makes sense to see them in a single commit (or a single PR). On the other hand, since the large JSON data for tests doesn't carry much meaning on its own and rarely changes, we can keep it in an external repo.
Which one? test or benchmark data
There is also ~58 MB worth of data for benchmarks. While they have a smaller impact (compared to ~194 MB of test data), we can move them if you want to be conservative. In the future, though, both of them can be moved to an external repo.
Why?
Putting aside the technicalities, I believe we should consider the use cases that make users raise this issue. In some cases the user would be happy if along the single-header release, Currently, they find the contents of the releases inadequate and fall back to cloning the whole repo. This can be avoided if we add such files to the zip releases. I'm not sure if there are other considerations, though. |
@t-b Nevertheless, this solution can probably help some of the users. We can recommend this (as a temporary solution at least), until further notice.
There are already single-header files available for such users on the Releases page. Does providing them in a separate repo have any benefits? |
Thanks for the comments!
|
Hi all, please have a look at #2081 where I implemented the approach of having a separate test data repository. |
Even though a 17 MB download is not really that big, it is quite unexpected to download that much for a single-header JSON library. Moreover, once cloned it takes 90 MB on disk, which is even more surprising.
The reason is the benchmark test files introduced by cb873a4. Is it really a good idea to put such large files in the repo!? Unfortunately, once you have committed them, it is no longer easy to reduce the repo size. You will need to rewrite the repo.
Anyway, please consider rewriting the repo and keeping it clean and lean; then everyone can likely use it as a submodule.