Clean input files argument #494
Replies: 4 comments · 2 replies
-
It depends on what kind of branching you need. If you want to dynamically branch over individual files, …
-
Thanks for the tips. I've taken a look, and I'm not sure which option, if any, would best fit what I'm looking for. I don't really want to branch over the files individually, at least not indefinitely through the pipeline. Rather, I just want a clean argument of input files, because at the beginning of the pipeline I consistently follow the same pattern: define the file, read it in, then apply different functions to different data sets before eventually combining them. I think the suggestions you put forward are for applying an identical workflow to different files? Perhaps the network visualization below helps.
-
Thanks a lot, and sorry for the unclear example and explanation! Yes, you are correct: I have many input files at the start of my project, I do some processing, and eventually end up with clean data, where I really take advantage of the dynamic branching functionality as in drake. The beginning targets are simply:
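The code block that followed did not survive extraction. A minimal sketch of the pattern described above (one file target plus one read target per dataset); the file names, target names, and use of `readr` here are hypothetical:

```r
# _targets.R (illustrative sketch; names are invented for this example)
library(targets)

list(
  # Track each raw file so edits to it invalidate downstream targets
  tar_target(file_a, "data/dataset_a.csv", format = "file"),
  tar_target(raw_a, readr::read_csv(file_a)),

  tar_target(file_b, "data/dataset_b.csv", format = "file"),
  tar_target(raw_b, readr::read_csv(file_b))
)
```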
I guess I was just wondering if there was a cleaner way to read in a whole bunch of input files, because it seemed like a lot of very similar code. So in your example, I could do something like this:
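The example referred to here was also lost. A sketch of the "something like this" being described, assuming the standard dynamic-branching idiom in `targets` (paths and the `readr` reader are placeholders):

```r
# Illustrative: one target holding all file paths, read one branch per file
library(targets)

list(
  tar_target(input_files,
             c("data/dataset_a.csv", "data/dataset_b.csv"),
             format = "file"),
  # pattern = map() creates one dynamic branch per file path
  tar_target(raw_data,
             readr::read_csv(input_files),
             pattern = map(input_files))
)
```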
But I was wondering if there is functionality for files to have names so that the targets can be easily accessed? For example:
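The inline example did not survive, but the wished-for behavior can be sketched with a named vector of paths; the names and paths here are hypothetical, and note that subsetting the vector downstream makes every consumer depend on the whole file set:

```r
# Hypothetical sketch of the behavior being asked about:
# a named set of input files, referenced by name downstream
library(targets)

files <- c(clients = "data/clients.csv",
           orders  = "data/orders.csv")

list(
  tar_target(input_files, files, format = "file"),
  # Access an individual file by name rather than by position
  tar_target(clients_raw, readr::read_csv(input_files["clients"]))
)
```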
Hopefully that makes more sense.
-
Here's a better example with a potential solution using the …
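The name of the suggested tool was cut off above, and the example itself was lost. One plausible reading, offered purely as an assumption, is `tarchetypes::tar_files()`, which creates a pair of targets that track a vector of paths with per-file branching:

```r
# Sketch under the ASSUMPTION that tarchetypes::tar_files() was meant;
# paths are placeholders
library(targets)
library(tarchetypes)

list(
  # tar_files() expands into two targets: one listing the paths and one
  # tracking each file's contents via dynamic branching
  tar_files(input_files, c("data/dataset_a.csv", "data/dataset_b.csv")),
  tar_target(raw_data,
             readr::read_csv(input_files),
             pattern = map(input_files))
)
```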
-
I have a project with several input files, and I was thinking it would be quite clean to have just one target for the input files and then call that target to read them as necessary. Right now I essentially have a target for each file and then another target for the data itself, which reads quite messy.

For example, borrowing from the `minimal_example`. The problem with the above is that `input_files_check` does not feed into `raw_data`. Also, if I later update the input files function with more files, then of course `raw_data` becomes outdated. I assume once `raw_data` is re-run, `targets` sees that nothing has changed and its downstream targets are not run, but is even this computation expensive for largish files? Or does it just see that the argument has not changed, so it doesn't even need to run the read/`fread` function (in that case, not expensive at all)? Do you have any suggestions for this type of workflow? Ideally I want one nice clean target for all input files that I can easily refer to. But if it changes late in the project, perhaps it's resource-expensive to be updating it all the time. What do you think?