Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usability Fixes #10

Closed
4 tasks done
HenriKajasilta opened this issue Mar 5, 2024 · 9 comments
Closed
4 tasks done

Usability Fixes #10

HenriKajasilta opened this issue Mar 5, 2024 · 9 comments
Assignees

Comments

@HenriKajasilta
Copy link

HenriKajasilta commented Mar 5, 2024

Followed the code junks in the process flow and visualisation sections: in both, the bottleneck seemed to be the load_from_cache() function, which produced an error. If I understand correctly, the primary problem is with the driver_drake() function, as it produced the error mentioned in this #8 issue. Since driver_drake() is not working as expected, there is also problem what is stored in cache. Calling a driver() function seems to work and downloads a bunch of different files, but doesn't seem to produce a data trace.

> MODULE_INPUTS %>% load_from_cache() -> all_data
Error in (function (target, character_only = FALSE, path = NULL, search = NULL,  : 
  cannot find drake cache.
  • Fix the aforementioned workflow.

I would prefer examples in functions. Especially since the driver() function has many options, it would be beneficial to have an option set, that is easy and fast to run, so that the user gets a better illustration of what data has been processed.

  • Suggestion: Create an example of how to use the function(s), that is/are easy to perform and light to run.

Changing the output directory to where the csv files are written (as mentioned here, will produce some of the outputs in the specified location, but will also produce the same files under the ./output. I assume this is not intended.

  • Fix data storing to multiple locations.

There are packages, which are not mentioned in DESCRIPTIONS. They may not be necessary for the package to work in some parts, but when working for example with the visualizations more packages were necessary to install. Take care that at least devtools::check() will not produce any errors and warnings.

  • Fix the issue so that building the package won't produce an error.
@realxinzhao
Copy link
Collaborator

realxinzhao commented Mar 16, 2024

Thank you for the valuable comments.

It seems there was some confusion caused by the driver() function, which was not a preferred approach compared to driver_drake(). This is now clarified in Getting Started. And only driver_drake was included in the paper and Quick Start.
But in short, both functions were inherited from gcamdata and they were documented. For gcamfaostat, it is recommended to use driver_drake() so the data tracing and exploring after the drive will be possible, even though it may take longer to run and additional space (for drake cache) compared to driver().

Fix the aforementioned workflow.

Please note that driver_drake() needs to be run successfully to build a drake cache. And the load_from_cache() function loads the data from the drake cache. It appears the error you shared was cannot find drake cache, which is likely because driver_drake() was not run successfully. Please check this can let us know if the issue remains.
The issue in #8 was a little different I believe.

Suggestion: Create an example of how to use the function(s), that is/are easy to perform and light to run.

As explained, the driver() function, while working, won't be the main approach for gcamfaostat. Many of the options in the driver() function can be replaced by the use of the drake cache generated by running driver_drake(). Thus, the examples we provided focused on utilizing drake cache to trace intermediate data flows (e.g., ``load_from_cache()`).

Fix data storing to multiple locations.

Good catch! Writing out intermediate data to csv files was a default option in driver(). The following sentence is added to Getting Started:
"The driver() approach has a default option (write_outputs = T) to write the intermediate csv outputs to ./outputs"

Fix the issue so that building the package won't produce an error.

Package checking errors have been addressed. Thank you!

@HenriKajasilta
Copy link
Author

I installed the package from scratch following the instructions in the documentation. I am not very familiar with the renv package beforehand, so this could also be a user error. However, when I follow the instructions for using renv here I am not able to get the packages needed to run the application. Can you please check if following these steps gives you the expected outcome? I can make a new issue about this, but first I would like to clarify how it should work.

@realxinzhao
Copy link
Collaborator

@HenriKajasilta Thank you!
Did you see a specific error message when running driver_drake?

I believe renv is only to ensure the version of the dependency packages is identical to the ones specified in renv files. It is not really needed unless there are package version conflicts that could affect the running of the package.
We tried different machines before usually without needing renv.

@HenriKajasilta
Copy link
Author

Yes @realxinzhao , I have the same error as mentioned earlier on issue 8, that is why I wanted to proceed with renv to overcome this.

ℹ Loading gcamfaostat
> driver_drake()
Loading required namespace: drake
GCAM Data System v1.0.0
Found 19 chunks
Found 77 chunk data requirements
Found 43 chunk data products
37 chunk data input(s) not accounted for
Error in `mutate()`:
ℹ In argument: `command = paste0(PACKAGE_NAME, "::PREBUILT_DATA[['", target, "']]")`.
Caused by error:
! `command` must be size 0 or 1, not 99.

@realxinzhao
Copy link
Collaborator

@HenriKajasilta Thank you, and apologies for the inconvenience.
I did try a clean install again and didn't see the error. However, I realize devtools::install_deps() only updates local package per the specifications in the DESCRIPTION file. And we only required lower versions of the packages mostly following the original settings in gcamdata.
I have now updated the package version there so that devtools::install_deps() (or Rstudio) will check those versions.

I think the issue came from either tibble or dplyr package versions. But please let us know if the error remains. Thank you so much!

@HenriKajasilta
Copy link
Author

@realxinzhao Thank you for taking a look.

It seems that the problem still remains for me. I started with the renv approach and ran renv::init( bare = TRUE), which I assume worked as intended. Then running renv::restore() doesn’t really do anything and informs that the library is already synchronized with the lockfile. I must be missing something simple here.

Nevertheless, if I continue with devtools::install_deps(dependencies = TRUE), I get the to install the listed imports and suggests packages from the DESCRIPTION file (and in my case, those went to the Project Library due to the renv), but now I again pump in to the error message, which was mentioned above, when I run the driver_drake().

@realxinzhao
Copy link
Collaborator

Thank you so much! @HenriKajasilta
I'm testing this on another machine. I did see requests of updating packages
image

And after updating packages, I did see the error... which indeed because your packages (likely dplyr or tibble) were newer not older. Sorry about that (I thought I used the most recent versions).
image

I'm running a driver() test, which seems to be fine. I will debug the mutate error in driver_drake in a bit. Thank you!

@realxinzhao
Copy link
Collaborator

Dear @HenriKajasilta
Thanks again for careful checking the package.

We have now fixed the bug which seems to be environment related (tibble won't call data in parent env if it is not exist in the current package env). The small fixe is pushed up. In addition, we also re-initialized the renv to update the lock file. I think renv won't be needed in your case (since you have the newer packages).
Please let us know if the issues remain or there are other issues. Thank you! And apologies about the inconvenience.

@HenriKajasilta
Copy link
Author

Hi @realxinzhao

Thank you for taking care of the issues. Everything seems to work as expected now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants