Extremely long run time on unit tests in GitHub actions Ubuntu environment. #409
Comments
It looks like, for any test that involves training a model, the model never starts training, but only on Ubuntu. Windows and macOS work fine. Not sure why yet.
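One way to narrow down where a training test stalls is to have Python dump every thread's stack if the wrapped code is still running after a deadline. The sketch below uses only the standard-library `faulthandler` module. `hang_watchdog` is a hypothetical helper named for illustration; it is not part of Casanovo or its test suite.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(seconds, file=None):
    """Dump all thread tracebacks if the wrapped block runs longer than `seconds`.

    Hypothetical helper for illustration. `file` must be a real file with a
    file descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    # Arm a watchdog that writes the tracebacks once the deadline passes.
    # exit=False lets the program keep running after the dump.
    faulthandler.dump_traceback_later(seconds, file=target, exit=False)
    try:
        yield
    finally:
        # Disarm the watchdog when the block finishes (or raises) in time.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the training call, for example `with hang_watchdog(300): runner.train([mgf_small], [mgf_small])` (the 300-second deadline is an arbitrary assumption), should show which call is stuck instead of leaving the job silent until it is killed. Under pytest, output capture may hide the dump, so running with `-s` or passing an explicit log file is probably needed.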
I only ran the very first ModelRunner test on the PR for
This was the output from the MacOS and Windows unit test action, both of which fully executed in under 1.5 minutes:
I used the
def test_initialize_model(tmp_path, mgf_small):
    """Test initializing a new or existing model."""
    print("Initializing test configuration")
    config = Config()
    config.model_save_folder_path = tmp_path

    # Test: Initializing model without initializing tokenizer raises an error
    print("Testing initialization without tokenizer (expecting RuntimeError)")
    with pytest.raises(RuntimeError):
        ModelRunner(config=config).initialize_model(train=True)

    # Test: No model filename given, so train from scratch
    print("Initializing tokenizer and model for training (train from scratch)")
    runner = ModelRunner(config=config)
    runner.initialize_tokenizer()
    runner.initialize_model(train=True)

    # Test: No model filename given during inference = error
    print("Testing inference with no model filename (expecting ValueError)")
    with pytest.raises(ValueError):
        runner = ModelRunner(config=config)
        runner.initialize_tokenizer()
        runner.initialize_model(train=False)

    # Test: Non-existing model filename during inference = error
    print(
        "Testing inference with non-existing model filename (expecting FileNotFoundError)"
    )
    with pytest.raises(FileNotFoundError):
        runner = ModelRunner(config=config, model_filename="blah")
        runner.initialize_tokenizer()
        runner.initialize_model(train=False)

    # Train a quick model
    print("Training a quick model with minimal configuration")
    config.max_epochs = 1
    config.n_layers = 1
    ckpt = tmp_path / "existing.ckpt"
    with ModelRunner(config=config, output_dir=tmp_path) as runner:
        runner.train([mgf_small], [mgf_small])
        runner.trainer.save_checkpoint(ckpt)
    print(f"Quick model trained and checkpoint saved at {ckpt}")

    # Test: Resume training from previous model
    print(f"Resuming training from checkpoint {ckpt}")
    runner = ModelRunner(config=config, model_filename=str(ckpt))
    runner.initialize_tokenizer()
    runner.initialize_model(train=True)

    # Test: Inference with previous model
    print(f"Initializing model for inference with checkpoint {ckpt}")
    runner = ModelRunner(config=config, model_filename=str(ckpt))
    runner.initialize_tokenizer()
    runner.initialize_model(train=False)

    # Test: Spec2Pep model tries to load weights and throws EOFError
    print("Testing Spec2Pep model weight loading (expecting EOFError)")
    weights = tmp_path / "blah"
    weights.touch()
    with pytest.raises(EOFError):
        runner = ModelRunner(config=config, model_filename=str(weights))
        runner.initialize_tokenizer()
        runner.initialize_model(train=False)

    print("All tests for model initialization completed successfully")
I tried running the test on one of the cluster machines and got this error:
The unit tests take a long time to execute in a GitHub Actions Ubuntu environment, especially on the branch dev_latest_depthcharge. On this branch, the past two attempts to run the unit tests in a GitHub Actions Ubuntu environment were killed after 6 hours, but the Windows and macOS environments both ran the unit tests within 6 minutes.
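Since the Ubuntu jobs were only stopped at the 6-hour mark, a much shorter deadline around the test command would make a stalled run fail fast. Below is a minimal sketch using the standard-library `subprocess` module. `run_with_deadline` is a hypothetical helper for illustration and is not part of the project's CI configuration.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run `cmd`; return its exit code, or None if it exceeded `seconds`.

    Hypothetical helper for illustration only.
    """
    try:
        # On timeout, subprocess.run kills the child process and then
        # raises TimeoutExpired.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_deadline([sys.executable, "-m", "pytest", "tests"], 15 * 60)` would return `None` if the suite is still running after 15 minutes. The `tests` path and the 15-minute figure are assumptions, chosen because the Windows and macOS runs finished within 6 minutes.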