-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ros2 bag reindex #539
Conversation
f568738
to
eaabdc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I hope you don't mind that I changed your title. That way it's a bit more obvious what this PR is doing.
Thanks for the PR, a couple of remarks though:
- The reindexer has to be more generic. I found a couple of places where
.db3
andsqlite3
is hardcoded. Does it make more sense to have that reindex functionality live within the sqlite3 plugin then? - As mentioned inline, while I appreciate that you follow the style, I don't think we have to introduce interfaces for the reindexer. I don't think we need that flexibility on that level.
- Boost is currently kind of a show-stopper. I think we have to come up with a rcpputils solution which can iterate over files wo/ boost.
rosbag2_cpp/src/rosbag2_cpp/reindexers/sequential_reindexer.cpp
Outdated
Show resolved
Hide resolved
storage_.reset(); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata_ = rosbag2_storage::BagMetadata{}; | ||
|
||
// This reindexer will only work on SQLite files, so this can't change | ||
metadata_.storage_identifier = "sqlite3"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that has to be more generic. What happens if I try to reindex a ros1 bagfile? Or some custom storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought sequential_reindexer
was supposed to be the reindexer that worked on the sqlite
files. Not sure why. I guess I'm derpy like that.
Made a few comments on your suggestions, thank you for those! And no problem on the name change - I'm not spectacular with titles :) I did have one question that's been nagging me, though: Since it's theoretically possible to extract all the data that a |
The idea here is that the actual data payload can be huge compared to the metadata/index and distributed over multiple files. With a dedicated metadata file, we have all that data at hand without having to iterate over the payload. |
Ok. Well, the good news is I've gotten the Linux/Unix reindexer working again. The bad news is I don't really have a way of testing the Windows version right now, since ROS 2 refuses to build itself on my Windows VM... Anyhoo, now trying to figure out the best way to transplant the reindexer into the Otherwise, I'll try to wire up a new |
if you want to make your life a bit easier, I guess you could directly plant that reindex logic in a python script within ros2bag. There's a python API for sqlite3 and all it needed would be a bit of logic to fail gracefully when that script is not available for other storage plugins than sqlite3. The reason why I could see that as a feasible solution is that I don't think we need proper API for such a logic - I can't see a use-case other than just running a script to repair a broken record. |
That... Actually might fix a LOT of problems - We wouldn't need to worry about two code paths for finding out the files within a folder in either Windows or Linux, for example. And I've already updated the On a slightly different subject; as you've probably noticed, I realized I was using the wrong license blurb at the top of the reindex files I've added. However, the copyright lint doesn't seem to like what my employers wanted. Is there something I can do to fix this? EDIT: Just realized you meant I can query SQLite from python directly. Even better! Should I undo the changes to the |
I would recommend you'd use the regular license text and change the copyright to reflect your company name. That should pass the linters - as you can see Open Robotics, Amazon and Bosch have contributions as such. For all other legal hints, I would put them in a separate doc block below the copyright header. If that's okay with your employer. |
@jhdcs you might want to have a look at ros2/rcutils#249. It's still in a draft but might be the PR you're looking for the iterate through a directory. |
Got the python version probably working, albeit unlinted. I need to write unit tests for it, which means I need to find some test databases. Should I just copy them over from the other test packages? Also, I'm not sure how to get this to work with compression yet... |
I've gotten the reindex operation working for uncompressed and file-compressed SQLite bags, now. Should I squash the commits? There appears to be a commit somewhere in the rebase that's confusing DCO. |
d7cb035
to
99509f8
Compare
Figured out why the test is failing - in order to support the If we don't want to add new python packages to the workspace, I can comment out the areas of the code that reference it. However, reindexing operations will not be able to support |
Reindexing for message-compressed files has been implemented. This code is ready for review. However, we will need to figure out how we want to handle the inclusion of the |
May I please have another review on this? I'm not sure why the Rpr__rosbag2__ubuntu_focal_amd64 check is failing. |
Okay, got some time to look into it. The Rpr__rosbag2__ubuntu_focal_amd64 check is failing because the Do you have any recommendations for solving this? Without that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - sorry I have not been involved on this PR to date!
I don't think we want to install the python-based zstandard
package - because it may not match up with the version being used by the C++ code. Another reason is that we may support other compression implementations, and we don't want equivalent python bindings for every one of them. The real solution around this is probably to add an endpoint to the rosbag2_py
API to decompress bagfiles, having it choose the correct compression implementation under the hood to do so.
I think as a first step with this feature it would be acceptable to support reindexing uncompressed bags and print out an error message when trying to work on a compressed bag - we could have a followup ticket for Enable reindex on compressed bagfiles
If I am reading correctly, this doesn't need to actually worry about compressed messages, right? Since the type data is available in the sqlite table uncompressed.
rosbag2_cpp/CMakeLists.txt
Outdated
rosidl_typesupport_cpp | ||
rosidl_typesupport_introspection_cpp | ||
PUBLIC | ||
ament_index_cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't add extra indent, removes unrelated whitespace changes
rosbag2_cpp/CMakeLists.txt
Outdated
) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove unrelated whitespace change
rosbag2_cpp/package.xml
Outdated
@@ -10,7 +10,10 @@ | |||
|
|||
<buildtool_depend>ament_cmake</buildtool_depend> | |||
|
|||
<build_depend>libboost-filesystem-dev</build_depend> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be left from an old version of master
? You'll probably want to revert all the changes from rosbag2_cpp
, rosbag2_storage_default_plugins
, rosbag2_tests
, and rosbag2_transport
- they don't seem related to your PR.
Additionally, I've noticed you checked in a __pycache__
directory in rosbag2_py
- that should be removed as well.
parser.add_argument( | ||
'-s', '--storage-id', default='sqlite3', | ||
help="storage identifier to be used, defaults to 'sqlite3'") | ||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both compression-format
and compression-mode
choices shouldn't be duplicated. Perhaps they should be factored out of record.py
and this file into a common place - or you could at least pull the values out of record.py
's add_argument
calls into a global level variable for that module, and from ros2bag.verb.record import COMPRESSION_MODES
- something like that. The duplication will inevitable get desynchronized in the future
compression_fmt: Literal['', 'zstd'], | ||
compression_mode: Literal['', 'none', 'file', 'message']) -> DBMetadata: | ||
# Handle compression | ||
if compression_fmt != '': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if compression_fmt != '': | |
if compression_fmt: |
handles both None
and ''
. I would recommend using None
rather than empty-string for optional values.
Also - this block would be much more readable as
if compression_fmt in ['none', 'message']:
# open normally
elif compression_fmt == 'file':
# open compressed
else:
raise ValueError('unknown compression mode')
The "by process of elimintation !=" pattern is a lot less readable, and includes some duplicated lines.
|
||
def reindex(uri: str, | ||
storage_id: str, | ||
compression_fmt: Literal['', 'zstd'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is really a Literal
value - it'll require shotgun surgery if a new compression format is added. You should probably take an Optional[str]
(for both fmt and mode) - the underlying logic can decide whether or not it can handle this value - overkill and brittle to handle it in the type system.
return super(RosbagYamlDumper, self).increase_indent(flow, False) | ||
|
||
|
||
class MetadataWriter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering we already have a MetadataWriter
in C++, this duplicated code will be prone to rot if the metadata format changes at all - this should be a call into rosbag2_storage MetadataIO
for the logic in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I've seen ROS code in the past calling C++ libraries from within Python code. Is there precedent for this, or is this something that will change going forward?
If you are concerned about duplicating functions that are within C++, should I just go back to the C++ version of the code I had originally pushed? Perhaps with a little more logic on the Python side to grab the relative file paths, and thus avoid having to use Boost? Or should I just try to open up the C++ API to accept calls to both get the metadata, and write it out? |
I'm concerned with code duplication in general. I think when Karsten suggested a Python script, I don't think he intended to suggest that we rewrite existing logic in Python - just that a Python top level script would be easier to manage than fully-C++ logic. We definitely do not want to introduce a boost dependency, we intentionally removed it from the ros2 core and rosbag2 in most places. For path manipulation, the C++ stdlib handles a lot of that now, and there are a good number of shims for any missing functions in https://github.com/ros2/rcpputils I personally think that expanding the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that overall the changes look good, and it would be nice to get this functionality merged into future releases.
I could revert all of the python changes and go back to the C++ version. However, that still leaves us with the problem that I can't query what files are in a directory. The only other possibility I can think of that doesn't add a new dependency is to figure out how to have Python send a list of strings to C++, and have it converted into a |
Yes - I am currently in the process of refactoring the compression code pretty seriously. Like I said in an earlier comment on this review, though, I think it's perfectly acceptable to partially implement this feature (not supporting compressed bags for a first pass). As a note - I am finding this PR difficult to review due to the sheer size of it - and would prefer smaller PRs that implement individual atomic functionalities. For example - if you are introducing a |
25a6814
to
d499aeb
Compare
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/tooling-wg-breakout-meeting-2021-01-22-rosbag2-reindexing/18489/3 |
Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
5e5798a
to
a2d6321
Compare
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
Distro A, OPSEC #4584 Signed-off-by: Jacob Hassold <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, thanks a lot for keeping this PR alive, I know it's not trivial. A few things here:
- I would try to keep the change to a minimal set. That is, if possible, I would prefer if we'd started without compression and end-to-end integration to fix the internal architecture of reindexing. Similarly, I was wondering what's the purpose of introducing an interface for the indexer. Don't get me wrong here, there might be a legit reason for it, I just want to rule out that we don't introduce unnecessary complexity. That is, I believe a reindexing of the database will always yield the same result, I don't really see what's a sequential reindexing doing differently from a different method of reindexing.
- We've currently introduced a set of functions within
rcutils
(linked inline) which replaces much of the custom logic to iterate through files in a directory. I strongly advertise this. - I feel there's a lot of code around file handling, rosbag versioning and such which could be reduced or reorganized to be reused across the rosbag2 functionality. I believe there's code within the sequential_indexer wich can be reused within the sequential reader and writer.
- I don't really like that
rosbag2_transport
is involved in this. I know it's currently our only entry-point fromros2bag
torosbag2_cpp
, but would it make sense to rather go throughrosbag2_py
instead? I don't think it's a hard requirement here for me, but I think we should generally rethink the architecture on how to deal with python and cpp, now that we haverosbag2_py
#ifdef WIN32 | ||
{ | ||
// Code placed in scope so variables can't accidentally be used elsewhere | ||
WIN32_FIND_DATA FindFileData; | ||
HANDLE hFind; | ||
|
||
std::string search_dir = base_folder.string() + "\\*"; | ||
hFind = FindFirstFile(search_dir, &FindFileData); | ||
// If an error occurs, we want to abort | ||
if (hFind == INVALID_HANDLE_VALUE) { | ||
DWORD dwError = GetLastError(); | ||
std::error_code ec(dwError, std::system_category()); | ||
throw(std::system_error(ec)); | ||
} | ||
|
||
// Loop through the directory, collecting file names as we go | ||
do { | ||
if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) { | ||
// I guess it's a directory in the bag file? | ||
// Still, not interested in it. | ||
continue; | ||
} else { | ||
auto temp_path = rcpputils::fs::path(ffd.cFileName); | ||
|
||
// We are ONLY interested in database files | ||
if (temp_path.extension().string() != ".db3") { | ||
continue; | ||
} | ||
output.emplace_back(temp_path); | ||
} | ||
} while (FindNextFile(hFind, &ffd) != 0); | ||
|
||
FindClose(hFind); | ||
} | ||
#else | ||
{ | ||
// Code placed in scope so variables can't accidentally be used elsewhere | ||
auto dirp = opendir(base_folder.string().c_str()); | ||
// If an error occurs, we want to abort | ||
if (dirp == NULL) { | ||
throw std::system_error(errno, std::generic_category()); | ||
} | ||
dirent * dp; | ||
while ((dp = readdir(dirp)) != NULL) { | ||
auto non_const_folder = rcpputils::fs::path(base_folder); | ||
auto temp_path = non_const_folder /= rcpputils::fs::path(dp->d_name); | ||
|
||
// We are ONLY interested in database files | ||
if (temp_path.extension().string() != ".db3") { | ||
continue; | ||
} | ||
|
||
output.emplace_back(temp_path); | ||
} | ||
closedir(dirp); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've recently added code which supersedes this: ros2/rcutils#323
// We're on a UNIX system. Import their filesystem stuff instead | ||
#include <dirent.h> | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
reindexer_impl->reindex(storage_options); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -11,6 +11,7 @@ | |||
<buildtool_depend>ament_cmake</buildtool_depend> | |||
|
|||
<depend>ament_index_cpp</depend> | |||
<depend>libboost-filesystem</depend> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<depend>libboost-filesystem</depend> |
const std::vector<rcpputils::fs::path> & files, | ||
const rosbag2_storage::StorageOptions & storage_options); | ||
|
||
// Compairson function for std::sort with our filepath convention |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Compairson function for std::sort with our filepath convention | |
// Comparison function for std::sort with our filepath convention |
std::make_unique<rosbag2_storage::MetadataIo>()); | ||
|
||
virtual ~SequentialReindexer(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
namespace reindexers | ||
{ | ||
namespace details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the following two functions should be accessible independently from the reindexer. Aren't we having similar code in the reader already? I think we're introducing code duplication here.
Especially, given that version differentiation, I believe it's important to keep code as such to a minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a spot that you think this would fit better in?
Also, looking back on the code, not sure why I have the version parameter there. Huh. That should probably come out.
@@ -430,19 +430,19 @@ rosbag2_storage::BagMetadata SqliteStorage::get_metadata() | |||
|
|||
auto statement = database_->prepare_statement( | |||
"SELECT name, type, serialization_format, COUNT(messages.id), MIN(messages.timestamp), " | |||
"MAX(messages.timestamp) " | |||
"MAX(messages.timestamp), offered_qos_profiles " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we needing this change here for the reindexing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the original metadata file includes the QOS profiles of the topics, so I have to query them from the database. I did some searching around, and I don't think this change affects anything else in the project, though if you have an alternate way to query the QOS profiles from the database, that might be preferable.
Closing in favor of #641 |
This pull request addresses issue #395.
A new verb,
reindex
has been added, which attempts to reconstruct ametadata.yaml
file from the sqlite3 databases found within the bag.Functionality allowing different forms of
reindex
for bag types has been added as well.Readme has been updated to indicate new verb, as well as how to use it.
Unit tests for
reindex
have been added torosbag2_tests
.