Add rcutils_list_directory function #249

cottsay · 2020-05-13T23:25:12Z

This change adds a simple function for listing the contents of a directory, and returning the names of those files as a string_array object. The iteration method is largely based on the one used in rcutils_calculate_directory_size.

The signature for this new function is modeled after the existing rcutils_split function:

rcutils/include/rcutils/split.h

Lines 27 to 44 in 6acf8d8

    
           /// Split a given string with the specified delimiter 
        
           /** 
        
            * \param[in] str string to split 
        
            * \param[in] delimiter on where to split 
        
            * \param[in] allocator for allocating new memory for the output array 
        
            * \param[out] string_array with the split tokens 
        
            * \return `RCUTILS_RET_OK` if successful, or 
        
            * \return `RCUTILS_RET_INVALID_ARGUMENT` for invalid arguments, or 
        
            * \return `RCUTILS_RET_BAD_ALLOC` if memory allocation fails, or 
        
            * \return `RCUTILS_RET_ERROR` if an unknown error occurs 
        
            */ 
        
           RCUTILS_PUBLIC 
        
           rcutils_ret_t 
        
           rcutils_split( 
        
             const char * str, 
        
             char delimiter, 
        
             rcutils_allocator_t allocator, 
        
             rcutils_string_array_t * string_array);

Requires #247 and #248

Linux
Linux-aarch64
macOS
Windows

fujitatomoya · 2020-05-18T08:16:59Z

src/filesystem.c

+{
+  RCUTILS_CHECK_FOR_NULL_WITH_MSG(
+    directory_path, "directory_path is null", return RCUTILS_RET_INVALID_ARGUMENT);
+  RCUTILS_CHECK_FOR_NULL_WITH_MSG(


rcutils_string_array_init also does this check.

That's fair. To be pedantic, the argument name could be different in that function so the error message might not be correct, but it might be worth the performance since this is likely to be a rare error case.

fujitatomoya · 2020-05-18T08:31:17Z

src/filesystem.c

+    string_array, "string_array is null", return RCUTILS_RET_INVALID_ARGUMENT);
+
+  // Start with 8 entries
+  rcutils_ret_t ret = rcutils_string_array_init(string_array, 8, &allocator);


I would do check size 1st in the directory, and the allocate array what we need and store the data. this reduces array adjustment operation. I think this is cost effective compared to allocate/de-allocate.

I think my biggest problem with that is that it introduces a race with the filesystem. Entries could be removed or added during or after the initial enumeration and then our size would be wrong. To handle this, we'd essentially need to have all the code already here, plus the initial enumeration to give us the size hint.

I'm also not convinced that the performance would be better, but I could be convinced by some data on the topic.

fujitatomoya · 2020-05-18T08:33:22Z

src/filesystem.c

+    goto fail;
+  }
+  struct dirent * entry;
+  for (; NULL != (entry = readdir(dir)); ++count) {


counting . and .. directory included, is this intended?

Yeah, I felt that we can leave it up to the consumer whether they want to ignore those entries rather than performing the comparisons here. The API proposed in #323 mentions this behavior in the API docs.

src/string_array.c

This change adds a simple function for listing the contents of a directory, and returning the names of those files as a string_array object. Signed-off-by: Scott K Logan <[email protected]>

clalancette · 2020-10-23T13:43:10Z

@cottsay What's the status of this one? Is it something you are still interested in getting in?

cottsay · 2020-10-26T19:03:34Z

What's the status of this one? Is it something you are still interested in getting in?

I really think some sort of directory iteration would be a great thing to have in rcutils, and I have at least one use case in mind in a downstream package.

The main reason I haven't pushed further on this is that I'm having doubts about the scalability of this solution. It's a really convenient API, but having an iteration-based API that doesn't need to allocate space for the entire directory listing might be nice to have. Really, we could do both.

I'd love to hear your thoughts on it.

clalancette · 2020-10-26T19:57:45Z

The main reason I haven't pushed further on this is that I'm having doubts about the scalability of this solution. It's a really convenient API, but having an iteration-based API that doesn't need to allocate space for the entire directory listing might be nice to have.

Oh, that is a really good point. It would be much better not to allocate all of the memory at once, particularly for potentially large directories. I like your idea of a directory iterator; essentially I would think of something like the readdir API on Linux, but cross-platform.

My personal suggestion is that we close this out, and instead open an issue tracking the addition of a directory iterator. But if you still think this has value, we can leave it open.

Karsten1987 · 2021-01-07T20:57:12Z

While I agree with you that there might be better and more efficient methods out there, I wonder if it's not just worth finishing up the current solution in order to move forward and iterate over it in a point of time. Opposing the idea of closing this PR.
If @cottsay has a downstream package using it, the rosbag2 PR would also benefit from it from what I understand.

clalancette · 2021-01-07T21:58:08Z

While I agree with you that there might be better and more efficient methods out there, I wonder if it's not just worth finishing up the current solution in order to move forward and iterate over it in a point of time. Opposing the idea of closing this PR.
If @cottsay has a downstream package using it, the rosbag2 PR would also benefit from it from what I understand.

The issue that @cottsay brings up is more than just being inefficient, though. If you happen to have a directory with many, many files, then calling this API inside of that directory could potentially use up all of the memory on your machine just storing the names of those files. Since this API is not recursive, this may be somewhat mitigated by the limits of the underlying filesystem on the number of files in a directory, but I'd really rather not depend on that.

I think that in order to move forward here we need to switch this to a directory iterator in order to avoid that problem (and that matches what the underlying OS APIs look like as well).

cottsay · 2021-01-07T22:05:52Z

I've prototyped an iteration model, but it wasn't terribly elegant due to some fundamentally different approaches that Windows and Linux take. I'm not overly thrilled about the way errors would be handled, but I'll try to dig up that old branch and put it somewhere for review.

Another reason to take that approach, however, is that we can update the location in rcutils where we're already performing directory iteration to use the new public API and reduce duplication, where the approach presented here duplicates the cumbersome iteration logic.

clalancette · 2021-01-12T13:38:52Z

@cottsay So with #323 open, can we close this one out?

cottsay · 2021-01-12T17:02:51Z

Closed in favor of #323.

cottsay added enhancement New feature or request in progress Actively being worked on (Kanban column) labels May 13, 2020

cottsay self-assigned this May 13, 2020

fujitatomoya requested changes May 18, 2020

View reviewed changes

Add rcutils_list_directory function

290819c

This change adds a simple function for listing the contents of a directory, and returning the names of those files as a string_array object. Signed-off-by: Scott K Logan <[email protected]>

cottsay force-pushed the list_directory branch from 67af7a2 to 290819c Compare June 11, 2020 01:59

Karsten1987 mentioned this pull request Oct 23, 2020

Introduce ros2 bag reindex ros2/rosbag2#539

Closed

cottsay mentioned this pull request Jan 12, 2021

Add an API for directory iteration #323

Merged

cottsay closed this Jan 12, 2021

cottsay deleted the list_directory branch January 12, 2021 17:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rcutils_list_directory function #249

Add rcutils_list_directory function #249

cottsay commented May 13, 2020 •

edited

Loading

fujitatomoya May 18, 2020

cottsay Jan 12, 2021

fujitatomoya May 18, 2020

cottsay Jan 12, 2021

fujitatomoya May 18, 2020

cottsay Jan 12, 2021

clalancette commented Oct 23, 2020

cottsay commented Oct 26, 2020

clalancette commented Oct 26, 2020

Karsten1987 commented Jan 7, 2021

clalancette commented Jan 7, 2021

cottsay commented Jan 7, 2021

clalancette commented Jan 12, 2021

cottsay commented Jan 12, 2021

	/// Split a given string with the specified delimiter
	/**
	* \param[in] str string to split
	* \param[in] delimiter on where to split
	* \param[in] allocator for allocating new memory for the output array
	* \param[out] string_array with the split tokens
	* \return `RCUTILS_RET_OK` if successful, or
	* \return `RCUTILS_RET_INVALID_ARGUMENT` for invalid arguments, or
	* \return `RCUTILS_RET_BAD_ALLOC` if memory allocation fails, or
	* \return `RCUTILS_RET_ERROR` if an unknown error occurs
	*/
	RCUTILS_PUBLIC
	rcutils_ret_t
	rcutils_split(
	const char * str,
	char delimiter,
	rcutils_allocator_t allocator,
	rcutils_string_array_t * string_array);

Add rcutils_list_directory function #249

Add rcutils_list_directory function #249

Conversation

cottsay commented May 13, 2020 • edited Loading

fujitatomoya May 18, 2020

Choose a reason for hiding this comment

cottsay Jan 12, 2021

Choose a reason for hiding this comment

fujitatomoya May 18, 2020

Choose a reason for hiding this comment

cottsay Jan 12, 2021

Choose a reason for hiding this comment

fujitatomoya May 18, 2020

Choose a reason for hiding this comment

cottsay Jan 12, 2021

Choose a reason for hiding this comment

clalancette commented Oct 23, 2020

cottsay commented Oct 26, 2020

clalancette commented Oct 26, 2020

Karsten1987 commented Jan 7, 2021

clalancette commented Jan 7, 2021

cottsay commented Jan 7, 2021

clalancette commented Jan 12, 2021

cottsay commented Jan 12, 2021

cottsay commented May 13, 2020 •

edited

Loading