Sort out models, data, and tools #142
Decided to have a separate repository for data, model definition files, and example models, with scripts in this repo to download them as needed. The separate repo will be stable against master, not dev. @shelhamer will do this.
@sergeyk that will be nice; currently the repo is big and slow to download because of the synsets. Can these also be removed from the git history?
@mavenlin the synsets are on my list for this reorganization. Filtering them from the history is necessary to save space, and a simple command, but it breaks history. We'll consider such house cleaning when it comes time to release Caffe 1.0, but we're not going to rewrite history on …
@sergeyk I added you as a collaborator on my fork so that we can jointly take care of the documentation updates triggered by this PR. All you have to do is push to my fork's …
Instead of scripts to pull models and data, a …
This isn't going to work with GitHub's (not unreasonable) file size and traffic limits. Git is a drag with large binary files too, so perhaps it's for the best. The alternative is self-hosting from campus or ICSI.
Let's host as many models, sample data, and model def files as possible in …
Oh, sorry, I wasn't clear: this isn't going to work at all. Not even the Caffe reference ImageNet model fits on its own, as there's a file size cap of 100 MB. My fallback plan is ICSI hosting and versioning the fetch URLs of the scripts in …
I know that the reference ImageNet model won't fit. I still think that prototxt files and small sample data should be hosted on GitHub: everything that fits under 100 MB, which is going to be basically everything except the ImageNet models.
Resolution: keep model definitions in the repo, drop included data, and add scripts to fetch learned models and data as needed. Auxiliary data and model weights will live on Dropbox for the moment, and will find their permanent home on a Berkeley server after March 7. Our group will be bringing a demo server online after that date which can hold the data.
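As a rough illustration of the fetch-script approach, here is a minimal Python sketch that downloads a file from a version-pinned URL only when it is missing or fails a checksum. It is not Caffe's actual code (the real scripts in this PR are shell scripts such as get_mnist.sh); the host `BASE_URL`, the `VERSION` tag, the SHA-1 checksums, and the helper names `versioned_url`, `sha1_of`, and `fetch` are all hypothetical.

```python
import hashlib
import os
import urllib.request

# Hypothetical host and version tag; the thread only says the files will live on
# Dropbox for now and move to a Berkeley server later.
BASE_URL = "https://example.org/caffe-data"
VERSION = "v1"


def versioned_url(name, base=BASE_URL, version=VERSION):
    """Pin the fetch URL to a version so the scripts and hosted files stay in sync."""
    return f"{base}/{version}/{name}"


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch(url, dest, expected_sha1=None):
    """Download url to dest unless a file with the expected checksum already exists.

    Returns True if a download happened, False if the existing file was reused.
    Raises ValueError (and removes the partial file) on a checksum mismatch.
    """
    if os.path.exists(dest) and (expected_sha1 is None or sha1_of(dest) == expected_sha1):
        return False
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = dest + ".part"  # download to a temporary name so dest is never half-written
    urllib.request.urlretrieve(url, tmp)
    if expected_sha1 is not None and sha1_of(tmp) != expected_sha1:
        os.remove(tmp)
        raise ValueError(f"checksum mismatch for {url}")
    os.replace(tmp, dest)
    return True
```

Because the version tag is part of the URL, a commit that changes a model or dataset would bump `VERSION` (and the checksum) in the same commit, which keeps the repo lean while still tying each commit to a specific set of hosted files.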
A demo (#78) didn't seem to be a high priority. May I ask what demo the server will host?
Any suggestions on the directory structure or names are welcome; this is the time to arrange everything neatly. @kloudkl re: demo, there will in fact be a Caffe demo along the lines of the DeCAF demo, along with demos of our research group's other projects. @Yangqing was not so much against a demo as against spending too much time engineering a simple illustration of the framework instead of focusing on research hacking.
What about this dir structure?
That looks right, but I'm torn about how examples fit in. Packing example code, model, and data together makes the example clear, but makes reuse awkward. I'll package purely example files up together, but keep data on its own.
Collect core Caffe tools like train_net, device_query, etc. together in tools/ and include helper scripts under tools/extra.
Data, models, and examples should not be versioned by default. Reference versions of these are not to be casually committed. Plus this makes for a better playground in examples without having to worry about data, intermediate files, or experiments being accidentally tracked.
OK everyone, feast your eyes and let me know. Speak now or forever hold your peace.
Looks great to me! Thanks for the reorganization work @shelhamer.
- fix paths
- replace shell command blocks with scripts
- file IPython notebooks in examples
- proofread
It looks good, but I cannot compile it due to the HDF5 dependency introduced in #147. There are also some small errors in get_data.sh.
Perhaps we should shortlist this for …
That will solve the problems for now. We still need to figure out a way …
Looks good to me; we'll fix any remaining mistakes once we merge.
@shelhamer I meant all the scripts, such as get_ilsvrc_aux.sh and get_mnist.sh.
@shelhamer Besides that, it's great and ready to merge.
@sguada you might have some kind of weird shell. Check …
Sort out models, data, examples, and tools
… caffe-mug repo … Dropbox for now (and a suitable server later). Orchestrating updates between commits in Caffe and uploaded models and data has overhead, but it is worth it for the separation of concerns and for keeping the repo lean.