xtb does not close file descriptors used in .set_output(filename) #113

Open
coltonbh opened this issue Apr 11, 2024 · 0 comments
Labels
unconfirmed This report has not yet been confirmed by the developers

coltonbh commented Apr 11, 2024

Describe the bug

xtb (the underlying library) does not close the file descriptors it opens for writing logs. This causes several problems with logging. If you perform a large number of calculations, you will get an OS error once you have run more calculations than the `ulimit -n` limit. `ulimit -n` reports the maximum number of file descriptors a process may have open.
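To illustrate this failure mode in plain Python, independent of xtb, the sketch below lowers the soft `RLIMIT_NOFILE` limit (the value `ulimit -n` reports) and then opens files without ever closing them until the OS refuses with `EMFILE` (errno 24, "Too many open files"). It relies on the standard-library `resource` module, so it assumes a Unix-like system; the helper name `leak_until_emfile` and the default limit of 64 are arbitrary choices for illustration.

```python
import errno
import os
import resource
import tempfile


def leak_until_emfile(soft_limit: int = 64) -> int:
    """Open files without ever closing them until the OS refuses with EMFILE.

    Returns the number of files opened before the error. Unix only.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit is what `ulimit -n` reports; it may not exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    leaked = []
    try:
        with tempfile.TemporaryDirectory() as tmp:
            try:
                for i in range(10 * soft_limit):
                    # Mimics the bug: each "log file" is opened and never closed.
                    leaked.append(open(os.path.join(tmp, f"logs-{i}.txt"), "w"))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                return len(leaked)
            finally:
                # Release the descriptors so the directory can be cleaned up.
                for f in leaked:
                    f.close()
    finally:
        # Restore the original limit for the rest of the process.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    raise RuntimeError("descriptor limit was never reached")
```

The count returned is always a little below the limit, because stdin, stdout, stderr and any other descriptors the interpreter already holds count against the same budget.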

Additionally, depending on how xtb is run, it appears to have some internal logic that tries to work around this bug, with strange results. xtb writes log files only for roughly the first `ulimit -n` minus a few calculations; after that it stops opening file descriptors, even when `calculator.set_output(f"logfile-{i}")` is passed, and starts dumping output to the console instead of writing it to disk. Also, if you run a calculation, call `time.sleep(10)` after it, and then check the log file, you will see that it is empty. In fact, a whole series of log files stays empty until the program exits and the buffered output is finally flushed to disk.
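The empty-log symptom is consistent with output sitting in a user-space write buffer that is only flushed when the file is flushed, closed, or the process exits. The following Python analogy does not call xtb (whether xtb's own I/O layer buffers in exactly this way is an assumption); it shows a file that stays empty on disk until it is flushed or closed. The function name is hypothetical.

```python
import os
import tempfile


def size_before_and_after_close(text: str = "xtb log line\n") -> tuple[int, int, int]:
    """Return the on-disk size of a log after write(), after flush(), and after close()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logs-0.txt")
        f = open(path, "w")  # default buffering: small writes stay in process memory
        f.write(text)
        after_write = os.path.getsize(path)  # nothing has reached the OS yet
        f.flush()
        after_flush = os.path.getsize(path)
        f.close()  # close() also flushes; without it, data waits until interpreter exit
        after_close = os.path.getsize(path)
    return after_write, after_flush, after_close
```

Calling `size_before_and_after_close()` reports a size of 0 right after the write and a non-zero size once the buffer has been flushed, which mirrors what the `time.sleep(10)` experiment shows for the xtb log files.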

xtb needs to close the file it opens here. Calling calculator.release_output() does not fix this problem.
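One way to confirm that a call leaks descriptors, offered as a sketch: count the process's open file descriptors before and after running a function many times. The helpers below are hypothetical and not part of xtb; `count_open_fds` reads `/proc/self/fd` on Linux and falls back to `/dev/fd` elsewhere (e.g. macOS), so it assumes a Unix-like system.

```python
import os
import tempfile
from typing import Callable


def count_open_fds() -> int:
    """Number of file descriptors currently open in this process (Unix only)."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # listdir() briefly opens the directory itself, but it does so on every call,
    # so that offset cancels out when two counts are compared.
    return len(os.listdir(fd_dir))


def fd_growth(func: Callable[[int], None], n: int = 50) -> int:
    """Run func(0) ... func(n - 1) and return how many descriptors were left open."""
    before = count_open_fds()
    for i in range(n):
        func(i)
    return count_open_fds() - before


def demo(n: int = 20) -> tuple[int, int]:
    """Compare a function that leaks one descriptor per call with one that does not."""
    leaked = []
    with tempfile.TemporaryDirectory() as tmp:

        def leaky(i: int) -> None:
            # The file object is kept alive in `leaked`, so it is never closed.
            leaked.append(open(os.path.join(tmp, f"leaky-{i}.txt"), "w"))

        def tidy(i: int) -> None:
            # `with` guarantees the descriptor is closed after writing.
            with open(os.path.join(tmp, f"tidy-{i}.txt"), "w") as f:
                f.write("done\n")

        try:
            return fd_growth(leaky, n), fd_growth(tidy, n)
        finally:
            for f in leaked:
                f.close()
```

`demo(20)` returns `(20, 0)`. Applied to the reproduction script, `fd_growth(calculate_xtb, 50)` would be expected to return about 50 if every `set_output` call leaks one descriptor, and 0 if `release_output` closed the file.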

To Reproduce

To show that logs get dumped to the console instead of being written to disk, and that logs are not written after a calculation (they are only flushed when the whole process exits):

Most systems set `ulimit -n` to 1024 by default. Save the following script as `script.py` and run it:

```python
import time
from pathlib import Path

import numpy as np
from xtb.interface import Calculator, Param
from xtb.libxtb import VERBOSITY_FULL

output_dir = Path("xtb-data")
output_dir.mkdir(exist_ok=True)

numbers = np.array([1, 1])
positions = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.4],
    ]
)


def calculate_xtb(i):
    calc = Calculator(Param.GFN2xTB, numbers, positions)
    calc.set_verbosity(VERBOSITY_FULL)
    calc.set_output(str(output_dir / f"logs-{i}.txt"))
    calc.singlepoint()
    calc.release_output()
    # time.sleep(10)


for i in range(2000):
    calculate_xtb(i)
    if i % 100 == 0:
        print(i)
```

```shell
python script.py
```

Now count the log files. There should be 2000, but there will be only 1021. Notice that toward the end the output started dumping to the terminal instead of being written to log files. This should not happen.

```shell
ls -1 xtb-data | wc -l
```

Remove all the log files:

```shell
rm -r xtb-data/*.txt
```

Set a lower ulimit and run the script again:

```shell
ulimit -n 75
python script.py
```

Count the log files again. Much more output was dumped to the console instead of being written to log files: there should be 2000 log files, but there are only 71.

```shell
ls -1 xtb-data | wc -l
```

xtb appears to be introspecting the `ulimit` setting and limiting how many files it opens so that the operating system does not terminate it. Instead, it should simply close each file handle after opening it and writing the logs, and it should flush the log output to disk at the end of each calculation.

You can see that the logs are not being flushed or written properly by uncommenting the `time.sleep(10)` line in `script.py` and inspecting `xtb-data/logs-0.txt` after the first calculation. It is empty, and it stays empty until all calculations are complete and the process exits, at which point the buffered data is flushed to the log files. This is not good, and it is a result of xtb not closing its file handles after writing logs.

If you run xtb calculations inside a process that opens temporary directories using Python, you'll get `OSError: [Errno 24] Too many open files`. Here is an alternative script that shows how xtb causes this issue:

```python
import time
from pathlib import Path
from tempfile import TemporaryDirectory

import numpy as np
from xtb.interface import Calculator, Param
from xtb.libxtb import VERBOSITY_FULL

output_dir = Path("xtb-data")
output_dir.mkdir(exist_ok=True)

numbers = np.array([1, 1])
positions = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.4],
    ]
)


def calculate_xtb(filepath):
    calc = Calculator(Param.GFN2xTB, numbers, positions)
    calc.set_verbosity(VERBOSITY_FULL)
    calc.set_output(filepath)
    calc.singlepoint()
    calc.release_output()
    # time.sleep(10)


outputs = []
for i in range(2000):
    # Each iteration creates a fresh temporary directory and deletes it on exit.
    with TemporaryDirectory() as tmpdir:
        logs = f"{tmpdir}/logs-{i}.txt"
        calculate_xtb(logs)
    if i % 100 == 0:
        print(i)
```

All of these issues go away if you do not call `.set_output(filepath)`, because xtb then just writes its output to the console.


Expected behaviour

xtb should properly open, write to, and close the file passed to `.set_output()`.
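A sketch of that expected lifecycle, using a hypothetical stand-in class rather than xtb's real implementation: `set_output` opens the log, `release_output` flushes and closes it, and a context manager guarantees the release even if the calculation raises. `FakeCalculator`, `logged_run` and `run_many` are illustrative names, not part of the xtb API.

```python
import contextlib
import os
import tempfile
from typing import Iterator, Optional, TextIO


class FakeCalculator:
    """Hypothetical stand-in for a calculator whose log output can be redirected."""

    def __init__(self) -> None:
        self._out: Optional[TextIO] = None

    def set_output(self, filename: str) -> None:
        self.release_output()  # never leak a log opened by an earlier call
        self._out = open(filename, "w")

    def release_output(self) -> None:
        if self._out is not None:
            self._out.close()  # flushes buffered text, then frees the descriptor
            self._out = None

    def singlepoint(self) -> None:
        # With no log file set, print() falls back to sys.stdout (the console).
        print("singlepoint finished", file=self._out)


@contextlib.contextmanager
def logged_run(calc: FakeCalculator, filename: str) -> Iterator[FakeCalculator]:
    """Open the log for one calculation and always release it afterwards."""
    calc.set_output(filename)
    try:
        yield calc
    finally:
        calc.release_output()


def run_many(n: int) -> list[int]:
    """Run n calculations, each with its own log file; return each log's size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"logs-{i}.txt") for i in range(n)]
        for path in paths:
            with logged_run(FakeCalculator(), path) as calc:
                calc.singlepoint()
            # The log was closed on leaving the block, so its contents are already visible.
        return [os.path.getsize(p) for p in paths]
```

With this pattern, `run_many(2000)` produces 2000 non-empty logs while holding at most one extra descriptor open at a time, which is the behaviour this report expects from `.set_output()` and `.release_output()`.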

coltonbh added the unconfirmed label (This report has not yet been confirmed by the developers) on Apr 11, 2024.