
Mac OS current error #5

Closed · henrykironde opened this issue Oct 22, 2021 · 15 comments

@henrykironde (Contributor)

> model = df_model()
Reading config file: /Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/deepforest/data/deepforest_config.yml
> model$use_release()
Model from DeepForest release https://github.com/weecology/DeepForest/releases/tag/1.0.0 was already downloaded. Loading model from file.
Loading pre-built model: https://github.com/weecology/DeepForest/releases/tag/1.0.0
> 
> annotations_file = get_data("testfile_deepforest.csv")
> model$config$cpus = 1L
> model$config$workers = 1L
> model$config$epochs = 1
> model$config["save-snapshot"] = FALSE
> model$config$train$csv_file = annotations_file
> model$config$train$root_dir = get_data(".")
> 
> model$config$train$fast_dev_run = TRUE
> 
> model$create_trainer()
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
> model$trainer$fit(model)

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)
Epoch 0:   0%|          | 0/1 [00:00<00:00, 4152.78it/s]  /Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:106: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:327: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:382: UserWarning: One of given dataloaders is None and it will be skipped.
  rank_zero_warn("One of given dataloaders is None and it will be skipped.")
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
@henrykironde (Contributor, Author)

Some more error output, this time from the terminal (the output above was from RStudio):

> model$trainer$fit(model)

  | Name  | Type      | Params
------------------------------------
0 | model | RetinaNet | 32.1 M
------------------------------------
31.9 M    Trainable params
222 K     Non-trainable params
32.1 M    Total params
128.592   Total estimated model params size (MB)
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:327: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
/Users/henrysenyondo/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py:382: UserWarning: One of given dataloaders is None and it will be skipped.
  rank_zero_warn("One of given dataloaders is None and it will be skipped.")
Epoch 0:   0%|                                                                                                | 0/1 [00:00<00:00, 4782.56it/s][W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:212] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort      R

@henrykironde (Contributor, Author)

Looks like there is a crash in the libiomp binary:
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
Some references:

  1. OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized. dmlc/xgboost#1715
  2. https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial

This worked for me after setting Sys.setenv("KMP_DUPLICATE_LIB_OK"="TRUE").
We have to be careful, though, since libiomp5.dylib and libomp.dylib may give us different results.
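
For anyone else hitting this, a minimal sketch of that workaround (the important part is setting the variable before reticulate loads the Python runtime; everything else is as used above):

Sys.setenv("KMP_DUPLICATE_LIB_OK" = "TRUE")  # unsafe OpenMP workaround, see the OMP hint above

library(deepforestr)
model = df_model()  # should no longer abort with the duplicate OpenMP runtime error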

@spono commented Aug 18, 2022

Same OMP issue on Windows 10 when running model = df_model():

  • a plain crash in RStudio
  • a somewhat more informative message in R:

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

How would you suggest we "ensure that only a single OpenMP runtime is linked into the process"?

Your solution of setting Sys.setenv("KMP_DUPLICATE_LIB_OK"="TRUE") seems risky for actual use in a production environment, since there is no way to know if and when it may cause issues.
Thanks in advance.

@ethanwhite (Member) commented Mar 5, 2023

I've now fixed the OMP issue via a change in the installation instructions that removes the mkl package, which was causing this issue: e10c158

Can someone using macOS follow the new installation instructions and see if the rest of the issues reported here remain? I'm still seeing training issues on Windows, but predicting from the release model now works properly.

@mirandateats

Using macOS, I ran into the following issues during installation:

  1. reticulate::conda_remove('r-reticulate', packages = 'mkl') returned the following:

"+ '~/Library/r-miniconda/bin/conda' 'remove' '--yes' '--name' 'r-reticulate' 'mkl'
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are missing from the target environment:

  • mkl

Error: Error 1 occurred removing conda environment r-reticulate"

After this error, I continued with the installation anyway.

  2. I had to run install.packages('devtools') (not included in the installation code) before running devtools::install_github('weecology/deepforestr'); see the sketch after this list.

  3. It seems that any code calling the df_model() function crashes RStudio. Examples that have caused a crash:
    model <- df_model()
    deepforestr::df_model()
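
For reference, a sketch of the install sequence I ended up running (exactly the calls mentioned in point 2):

install.packages('devtools')                       # extra step, not in the published installation code
devtools::install_github('weecology/deepforestr')  # as in the installation instructions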

@ethanwhite (Member)

Thanks for the report @mirandateats! Unfortunately we've had ongoing stability issues with reticulate (which is how we run the core Python package from within R) on non-Linux systems. We'll keep trying to address those issues, but at the moment my recommendation is to do the core DeepForest work using the Python package directly and then import the results to R for further analysis and visualization.
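
For the import step, something along these lines should work (the file name is hypothetical and assumes you write the DeepForest predictions out to CSV from Python first, e.g. with pandas' to_csv()):

predictions = read.csv("deepforest_predictions.csv")  # hypothetical file exported from the Python package
head(predictions)  # bounding boxes plus label and score, ready for further analysis and plotting in R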

@ethanwhite (Member)

@mirandateats - it looks like some of the upstream issues have been resolved now and I have things running properly on Windows 10. Can you try a fresh install and let me know if you're still running into issues?

@ethanwhite (Member)

@spono - after some upstream fixes everything seems to be working on Windows now. Can you try a fresh install and then see if the test code below runs?

library(deepforestr)

model = df_model()
model$use_release()

annotations_file = get_data("testfile_deepforest.csv")

model$config$train$csv_file = annotations_file
model$config$train$root_dir = get_data(".")

model$create_trainer()
model$trainer$fit(model)

@ethanwhite (Member)

@henrykironde - can you test again on macOS, since our upstream issues seem to be resolved now (at least on Windows)?

@robAndrus34

@henrykironde and @ethanwhite - I'm curious whether you've resolved this issue. I ran into the same problem on macOS yesterday. After a basic install according to the directions on the website, RStudio crashed when I ran model = df_model().

Thank you.

@ethanwhite (Member)

Thanks for the report @robAndrus34! We haven't managed to reproduce this locally, in part because we don't have many Macs in the lab. If you have time to work with us on debugging on macOS, we'd be happy to do that. If you need to get something up and running quickly, it's pretty easy to do in Python even if you don't normally do much Python work. Let us know which direction you'd like to go and we'll be happy to help.

@robAndrus34

Thanks @ethanwhite. I decided to go the Python route for now. At some future date I may be interested in troubleshooting the R issue. Thanks.

@ethanwhite (Member)

Sounds good @robAndrus34 - let us know if you have any questions as you get things up and running in Python.

@ethanwhite (Member)

This failure is now reflected in our failing macOS tests, which may help us explore this further.

@ethanwhite (Member)

Tests are now passing for macOS on non-M1 chips, and everything is working in local tests on Linux and Windows, including RStudio. Therefore I'm going to go ahead and close this issue. Please open a new issue with detailed information if you run into further problems.
