Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rcutils_list_directory function #249

Closed
wants to merge 1 commit into from
Closed

Conversation

cottsay
Copy link
Member

@cottsay cottsay commented May 13, 2020

This change adds a simple function for listing the contents of a directory, and returning the names of those files as a string_array object. The iteration method is largely based on the one used in rcutils_calculate_directory_size.

The signature for this new function is modeled after the existing rcutils_split function:

/// Split a given string with the specified delimiter
/**
* \param[in] str string to split
* \param[in] delimiter on where to split
* \param[in] allocator for allocating new memory for the output array
* \param[out] string_array with the split tokens
* \return `RCUTILS_RET_OK` if successful, or
* \return `RCUTILS_RET_INVALID_ARGUMENT` for invalid arguments, or
* \return `RCUTILS_RET_BAD_ALLOC` if memory allocation fails, or
* \return `RCUTILS_RET_ERROR` if an unknown error occurs
*/
RCUTILS_PUBLIC
rcutils_ret_t
rcutils_split(
const char * str,
char delimiter,
rcutils_allocator_t allocator,
rcutils_string_array_t * string_array);

Requires #247 and #248

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@cottsay cottsay added enhancement New feature or request in progress Actively being worked on (Kanban column) labels May 13, 2020
@cottsay cottsay self-assigned this May 13, 2020
{
RCUTILS_CHECK_FOR_NULL_WITH_MSG(
directory_path, "directory_path is null", return RCUTILS_RET_INVALID_ARGUMENT);
RCUTILS_CHECK_FOR_NULL_WITH_MSG(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rcutils_string_array_init also does this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. To be pedantic, the argument name could be different in that function so the error message might not be correct, but it might be worth the performance since this is likely to be a rare error case.

string_array, "string_array is null", return RCUTILS_RET_INVALID_ARGUMENT);

// Start with 8 entries
rcutils_ret_t ret = rcutils_string_array_init(string_array, 8, &allocator);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do check size 1st in the directory, and the allocate array what we need and store the data. this reduces array adjustment operation. I think this is cost effective compared to allocate/de-allocate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest problem with that is that it introduces a race with the filesystem. Entries could be removed or added during or after the initial enumeration and then our size would be wrong. To handle this, we'd essentially need to have all the code already here, plus the initial enumeration to give us the size hint.

I'm also not convinced that the performance would be better, but I could be convinced by some data on the topic.

goto fail;
}
struct dirent * entry;
for (; NULL != (entry = readdir(dir)); ++count) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counting . and .. directory included, is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I felt that we can leave it up to the consumer whether they want to ignore those entries rather than performing the comparisons here. The API proposed in #323 mentions this behavior in the API docs.

src/string_array.c Outdated Show resolved Hide resolved
This change adds a simple function for listing the contents of a
directory, and returning the names of those files as a string_array
object.

Signed-off-by: Scott K Logan <[email protected]>
@clalancette
Copy link
Contributor

@cottsay What's the status of this one? Is it something you are still interested in getting in?

@cottsay
Copy link
Member Author

cottsay commented Oct 26, 2020

What's the status of this one? Is it something you are still interested in getting in?

I really think some sort of directory iteration would be a great thing to have in rcutils, and I have at least one use case in mind in a downstream package.

The main reason I haven't pushed further on this is that I'm having doubts about the scalability of this solution. It's a really convenient API, but having an iteration-based API that doesn't need to allocate space for the entire directory listing might be nice to have. Really, we could do both.

I'd love to hear your thoughts on it.

@clalancette
Copy link
Contributor

The main reason I haven't pushed further on this is that I'm having doubts about the scalability of this solution. It's a really convenient API, but having an iteration-based API that doesn't need to allocate space for the entire directory listing might be nice to have.

Oh, that is a really good point. It would be much better not to allocate all of the memory at once, particularly for potentially large directories. I like your idea of a directory iterator; essentially I would think of something like the readdir API on Linux, but cross-platform.

My personal suggestion is that we close this out, and instead open an issue tracking the addition of a directory iterator. But if you still think this has value, we can leave it open.

@Karsten1987
Copy link
Contributor

While I agree with you that there might be better and more efficient methods out there, I wonder if it's not just worth finishing up the current solution in order to move forward and iterate over it in a point of time. Opposing the idea of closing this PR.
If @cottsay has a downstream package using it, the rosbag2 PR would also benefit from it from what I understand.

@clalancette
Copy link
Contributor

While I agree with you that there might be better and more efficient methods out there, I wonder if it's not just worth finishing up the current solution in order to move forward and iterate over it in a point of time. Opposing the idea of closing this PR.
If @cottsay has a downstream package using it, the rosbag2 PR would also benefit from it from what I understand.

The issue that @cottsay brings up is more than just being inefficient, though. If you happen to have a directory with many, many files, then calling this API inside of that directory could potentially use up all of the memory on your machine just storing the names of those files. Since this API is not recursive, this may be somewhat mitigated by the limits of the underlying filesystem on the number of files in a directory, but I'd really rather not depend on that.

I think that in order to move forward here we need to switch this to a directory iterator in order to avoid that problem (and that matches what the underlying OS APIs look like as well).

@cottsay
Copy link
Member Author

cottsay commented Jan 7, 2021

I've prototyped an iteration model, but it wasn't terribly elegant due to some fundamentally different approaches that Windows and Linux take. I'm not overly thrilled about the way errors would be handled, but I'll try to dig up that old branch and put it somewhere for review.

Another reason to take that approach, however, is that we can update the location in rcutils where we're already performing directory iteration to use the new public API and reduce duplication, where the approach presented here duplicates the cumbersome iteration logic.

@clalancette
Copy link
Contributor

@cottsay So with #323 open, can we close this one out?

@cottsay
Copy link
Member Author

cottsay commented Jan 12, 2021

Closed in favor of #323.

@cottsay cottsay closed this Jan 12, 2021
@cottsay cottsay deleted the list_directory branch January 12, 2021 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request in progress Actively being worked on (Kanban column)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants