Add support for compressing CLP IR streams. #152

Merged Aug 29, 2023 (128 commits)

Commits
02d2b48
Backup
haiqi96 May 4, 2023
21581c0
simple working version backup
haiqi96 May 5, 2023
5ff962e
backup
haiqi96 May 6, 2023
791cfee
Allow seeking beyond end of file
haiqi96 May 8, 2023
d542e47
Use BufferReader for Decoder
haiqi96 May 8, 2023
3cb8f7c
small clean up
haiqi96 May 8, 2023
21a37a0
Fixes but yet to verify
haiqi96 May 8, 2023
827c762
small clean up
haiqi96 May 8, 2023
1971ae4
Some refactoring and decompression looks ok
haiqi96 May 8, 2023
1764cfb
ok for compression
haiqi96 May 8, 2023
5319067
bug fix for proper compression
haiqi96 May 9, 2023
f616add
make error code interface matches with what's expected
haiqi96 May 9, 2023
8d2fce0
Support checkpoint in new file reader
haiqi96 May 10, 2023
d2fc530
Make existing clp code use checkpoint feature
haiqi96 May 10, 2023
921517a
updated some comments on potential improvements and add a temporary u…
haiqi96 May 11, 2023
38b0da7
remove eof flag
haiqi96 May 11, 2023
e663312
remove checkpoint related functions from readerbuffer
haiqi96 May 12, 2023
fd02a5f
refactor try_read logic
haiqi96 May 13, 2023
237bb3f
clean up
haiqi96 May 13, 2023
6cd7516
Allow more flexible reset and delay buffer loading for seek
haiqi96 May 14, 2023
eacad4a
refactor
haiqi96 May 14, 2023
dc0d91e
rename new file reader to something else
haiqi96 May 14, 2023
ccad421
More refactoring
haiqi96 May 15, 2023
2324a21
remove fstat dependency from libarchive reader
haiqi96 May 16, 2023
9e48e12
Remove code specific for BFR
haiqi96 May 16, 2023
57dd310
Remove BFR dependency on BR
haiqi96 May 16, 2023
d84276c
remove m_cursor and replace with buffer begin_pos
haiqi96 May 17, 2023
5845805
Add UTF8 utility to the FBR.
haiqi96 May 17, 2023
b90e0d6
replace UTF8 validation logic
haiqi96 May 18, 2023
e9230d0
Some personal preference
haiqi96 May 18, 2023
27af203
remove m_cursor and replace with buffer begin_pos
haiqi96 May 18, 2023
e82fbb4
int8_t -> char refactoring
haiqi96 May 19, 2023
1afc47c
further clean up
haiqi96 May 19, 2023
7868252
remove function that is unnecessary
haiqi96 May 19, 2023
10c1bad
checkpoint for profiling stuff
haiqi96 May 21, 2023
9119493
adding more comments and small refactoring
haiqi96 May 21, 2023
487bc49
Temporary Fix to disable string_view optimization
haiqi96 May 31, 2023
be7951a
Fix for code review
haiqi96 May 31, 2023
edc2acd
Missing fixes
haiqi96 May 31, 2023
e32d95e
Remove redundant constructor
haiqi96 May 31, 2023
fa0bb30
Address code review concern
haiqi96 Jun 1, 2023
6e5b013
BufferReader: Remove possibility for invalid internal buffer; Some cl…
kirkrodrigues Jul 17, 2023
ff52053
Initial change that utilizes BufferReader in the code.
haiqi96 Jul 25, 2023
47f9573
Small fix for read_to_delimiter
haiqi96 Jul 25, 2023
16637d1
simplification
haiqi96 Jul 25, 2023
3f8156f
Try to add some comments
haiqi96 Jul 26, 2023
e587427
Allow size=0 for BufferReader and simply part of the code
haiqi96 Jul 26, 2023
8bf1772
simplification by not requiring buffer to start from an aligned pos
haiqi96 Jul 30, 2023
d802310
My way of simplification
haiqi96 Jul 30, 2023
eae3cae
some other small refactoring
haiqi96 Jul 31, 2023
8f63dc2
Simplify away set buffer_size.
haiqi96 Jul 31, 2023
0776782
Simplify seek_from_begin
haiqi96 Jul 31, 2023
41b9bce
Maybe unnecessary simplification
haiqi96 Jul 31, 2023
0e6e00c
simplify BufferReader
haiqi96 Jul 31, 2023
ff0d016
optimize for buffer alignment
haiqi96 Jul 31, 2023
ca8c05a
Handle buffer combining case
haiqi96 Aug 1, 2023
84316d7
Small fixes
haiqi96 Aug 1, 2023
12ee687
First round of fixes
haiqi96 Aug 8, 2023
7877350
temporary test
haiqi96 Aug 8, 2023
ca7f48b
Fix and refactor for read_to_delim
haiqi96 Aug 8, 2023
f47613e
small touch
haiqi96 Aug 8, 2023
85f2fdc
Further refactoring
haiqi96 Aug 8, 2023
33689d3
Merge branch 'newFileReader' of https://github.com/haiqi96/clp_fork i…
haiqi96 Aug 8, 2023
749e652
small update to the temporary test
haiqi96 Aug 8, 2023
25e8dd0
small clean up
haiqi96 Aug 8, 2023
5aa4595
Refactored try_seek_from_begin function and made it consistent with d…
haiqi96 Aug 9, 2023
d74c631
Refill_buffer_reader refactor
haiqi96 Aug 9, 2023
0906e4d
Fix comments for test
haiqi96 Aug 9, 2023
504a030
fixes
haiqi96 Aug 9, 2023
66aa405
Apply clang-format to merge-conflict files
haiqi96 Aug 9, 2023
39d5af4
Merge branch 'main' into newFileReader
haiqi96 Aug 9, 2023
26b13f7
Apply clang-format to new classes
haiqi96 Aug 9, 2023
426dea3
Small cleanup
haiqi96 Aug 9, 2023
afff981
Some refactor for clang-tidy
haiqi96 Aug 10, 2023
4ca5bd8
Run clang-format on unit-test
haiqi96 Aug 10, 2023
f4a5cef
Replace unique ptr with vector
haiqi96 Aug 10, 2023
1e55966
Add error code to close function and some small refactoring
haiqi96 Aug 10, 2023
d8a50d4
fix
haiqi96 Aug 10, 2023
f3f9ca5
Manually cherrypick the changes
haiqi96 Aug 10, 2023
beb5640
Fix some clang-tidy issues
haiqi96 Aug 10, 2023
f49d7c1
more clean up
haiqi96 Aug 11, 2023
a3fbf31
Write missing docstrings and minor refactoring.
kirkrodrigues Aug 11, 2023
3e9f564
Return num_bytes_read correctly
haiqi96 Aug 12, 2023
8d7cc2c
merge
haiqi96 Aug 12, 2023
1c53b57
Clean-up BufferReader::try_read_to_delimiter
kirkrodrigues Aug 12, 2023
2fe5c77
BufferReader: Reorder methods according to guidelines.
kirkrodrigues Aug 12, 2023
36508b7
Libarchive*: Some clean-up.
kirkrodrigues Aug 13, 2023
47a5218
Undo unnecessary changes to ffi/ir_stream/encoding_methods.hpp
kirkrodrigues Aug 13, 2023
03991d7
Undo unnecessary changes and clean-up ffi/ir_stream/decoding_methods.*
kirkrodrigues Aug 13, 2023
a2ccc4d
Merge remote-tracking branch 'origin/newFileReader' into ir-decompres…
haiqi96 Aug 14, 2023
7bc93c9
Fixes
haiqi96 Aug 14, 2023
c1efd29
BufferedFileReader: Combine OperationFailed and OperationFailedWithMsg.
kirkrodrigues Aug 14, 2023
cf3c6e3
BufferedFileReader: Replace quantize_to_buffer_size with general method.
kirkrodrigues Aug 14, 2023
e5bc643
Remove include-grouping comments since clang-format handles it.
kirkrodrigues Aug 14, 2023
c8a9981
BufferedFileReader: Reset buffer reader even on EOF.
kirkrodrigues Aug 14, 2023
f28dea9
BufferedFileReader: Reduce indentation by rewriting branches.
kirkrodrigues Aug 14, 2023
8051ed2
BufferedFileReader: Return appropriate error code when trying to seek…
kirkrodrigues Aug 14, 2023
3aaf117
BufferedFileReader: Refactor refill_reader_buffer
kirkrodrigues Aug 14, 2023
9f97835
BufferedFileReader: Refactor resize_buffer_from_pos
kirkrodrigues Aug 14, 2023
53c5580
BufferedFileReader: Remaining refactoring
kirkrodrigues Aug 14, 2023
64d3a5d
BufferedFileReader: Fix docstrings
kirkrodrigues Aug 14, 2023
455b3cd
Replace off64_t with generic type.
kirkrodrigues Aug 14, 2023
08581d7
BufferedFileReader: Reorder methods according to guidelines.
kirkrodrigues Aug 14, 2023
c270b79
Make peek functions const
kirkrodrigues Aug 14, 2023
07fba97
Undo my mistake for close
kirkrodrigues Aug 14, 2023
1b5e51a
FileCompressor: Clean-up
kirkrodrigues Aug 14, 2023
9cdfe57
LibarchiveFileReader: Move new methods into the right section.
kirkrodrigues Aug 14, 2023
fd74ed9
LibarchiveReader: Space fix.
kirkrodrigues Aug 14, 2023
84028bb
BufferedFileReader: Basic refactor of unit tests.
kirkrodrigues Aug 14, 2023
ebf9d4b
East-const fixes.
kirkrodrigues Aug 14, 2023
492bd96
Fix macOS build issue.
kirkrodrigues Aug 14, 2023
e1787de
Add missing include.
kirkrodrigues Aug 14, 2023
b6c1c8a
Merge remote-tracking branch 'origin/newFileReader' into ir-decompres…
haiqi96 Aug 15, 2023
5e8cb40
clangformat
haiqi96 Aug 15, 2023
f76caf9
Merge remote-tracking branch 'origin/main' into ir-decompression-new
haiqi96 Aug 15, 2023
7c3dc03
remove unused functions
haiqi96 Aug 15, 2023
d45fd38
Undo formatting changes which should be they're own PR.
kirkrodrigues Aug 16, 2023
dd91a64
Fix includes
kirkrodrigues Aug 16, 2023
1e45fa6
another small clean up
haiqi96 Aug 16, 2023
6473e6e
Merge branch 'ir-decompression-new' of https://github.com/haiqi96/clp…
haiqi96 Aug 16, 2023
788bf36
small fixes
haiqi96 Aug 16, 2023
0629ea0
Refactoring/restructuring:
kirkrodrigues Aug 28, 2023
f8f587a
Address todos: Rename generic_parse_tokens -> deserialize_ir_message …
kirkrodrigues Aug 28, 2023
48d7b01
Add boost-outcome as submodule; LogEventDeserializer: Explicitly spec…
kirkrodrigues Aug 28, 2023
1170333
Clean-up LogEventDeserializer; Move template methods into cpp with ex…
kirkrodrigues Aug 29, 2023
7ae1e86
FileCompressor: Move template implementation into cpp with explicit s…
kirkrodrigues Aug 29, 2023
9e3e6b4
Archive: Move template implementation into cpp with explicit speciali…
kirkrodrigues Aug 29, 2023
8cbefa4
Remove unnecessary cAlwaysFalse const.
kirkrodrigues Aug 29, 2023
Fix comments for test
haiqi96 committed Aug 9, 2023

commit 0906e4dafe908efb8912502c9592c4dfbd38e57f
82 changes: 42 additions & 40 deletions components/core/tests/test-ir_encoding_methods.cpp
@@ -87,7 +87,7 @@ bool encode_message (epoch_time_ms_t timestamp, string_view message, string& log
* Helper function that decodes a message of encoding type = encoded_variable_t
* from the ir_buf
* @tparam encoded_variable_t Type of the encoded variable
- * @param ir_buf
+ * @param reader
* @param message
* @param decoded_ts Returns the decoded timestamp
* @return IRErrorCode_Success on success, otherwise
@@ -97,7 +97,7 @@ bool encode_message (epoch_time_ms_t timestamp, string_view message, string& log
* encoded_variable_t == four_byte_encoded_variable_t
*/
template <typename encoded_variable_t>
- IRErrorCode decode_next_message (BufferReader& ir_buf, string& message, epoch_time_ms_t& decoded_ts);
+ IRErrorCode decode_next_message (BufferReader& reader, string& message, epoch_time_ms_t& decoded_ts);

/**
* Struct to hold the timestamp info from the IR stream's metadata
@@ -184,15 +184,15 @@ bool encode_message (epoch_time_ms_t timestamp, string_view message, string& log
}

template <typename encoded_variable_t>
- IRErrorCode decode_next_message (BufferReader& ir_buf, string& message, epoch_time_ms_t& decoded_ts) {
+ IRErrorCode decode_next_message (BufferReader& reader, string& message, epoch_time_ms_t& decoded_ts) {
static_assert(is_same_v<encoded_variable_t, eight_byte_encoded_variable_t> ||
is_same_v<encoded_variable_t, four_byte_encoded_variable_t>);

if constexpr (is_same_v<encoded_variable_t, eight_byte_encoded_variable_t>) {
- return ffi::ir_stream::eight_byte_encoding::decode_next_message(ir_buf, message,
+ return ffi::ir_stream::eight_byte_encoding::decode_next_message(reader, message,
decoded_ts);
} else {
- return ffi::ir_stream::four_byte_encoding::decode_next_message(ir_buf, message, decoded_ts);
+ return ffi::ir_stream::four_byte_encoding::decode_next_message(reader, message, decoded_ts);
}
}

@@ -206,37 +206,40 @@ static void set_timestamp_info (const nlohmann::json& metadata_json, TimestampIn

TEST_CASE("get_encoding_type", "[ffi][get_encoding_type]") {
bool is_four_bytes_encoding;

+ // Test eight-byte encoding
vector<int8_t> eight_byte_encoding_vec{EightByteEncodingMagicNumber,
EightByteEncodingMagicNumber + MagicNumberLength};

- // Test eight-byte encoding
- BufferReader eight_byte_encoding_buffer{
+ BufferReader eight_byte_ir_buffer{
size_checked_pointer_cast<const char>(eight_byte_encoding_vec.data()),
eight_byte_encoding_vec.size()
};
- REQUIRE(get_encoding_type(eight_byte_encoding_buffer, is_four_bytes_encoding) ==
+ REQUIRE(get_encoding_type(eight_byte_ir_buffer, is_four_bytes_encoding) ==
IRErrorCode::IRErrorCode_Success);
REQUIRE(match_encoding_type<eight_byte_encoded_variable_t>(is_four_bytes_encoding));

// Test four-byte encoding
vector<int8_t> four_byte_encoding_vec{FourByteEncodingMagicNumber,
FourByteEncodingMagicNumber + MagicNumberLength};

- BufferReader four_byte_encoding_buffer{
+ BufferReader four_byte_ir_buffer{
size_checked_pointer_cast<const char>(four_byte_encoding_vec.data()),
four_byte_encoding_vec.size()
};
- REQUIRE(get_encoding_type(four_byte_encoding_buffer, is_four_bytes_encoding) ==
+ REQUIRE(get_encoding_type(four_byte_ir_buffer, is_four_bytes_encoding) ==
IRErrorCode::IRErrorCode_Success);
REQUIRE(match_encoding_type<four_byte_encoded_variable_t>(is_four_bytes_encoding));

- // Test error on incomplete ir_buffer
+ // Test error on empty and incomplete ir_buffer
+ BufferReader empty_ir_buffer(size_checked_pointer_cast<const char>(four_byte_encoding_vec.data()), 0);
+ REQUIRE(get_encoding_type(empty_ir_buffer, is_four_bytes_encoding) ==
+ IRErrorCode::IRErrorCode_Incomplete_IR);

BufferReader incomplete_buffer{
size_checked_pointer_cast<const char>(four_byte_encoding_vec.data()),
four_byte_encoding_vec.size() - 1
};

REQUIRE(get_encoding_type(incomplete_buffer, is_four_bytes_encoding) ==
IRErrorCode::IRErrorCode_Incomplete_IR);

@@ -264,45 +267,36 @@ TEMPLATE_TEST_CASE("decode_preamble", "[ffi][decode_preamble]", four_byte_encode
const size_t encoded_preamble_end_pos = ir_buf.size();

// Check if encoding type is properly read
- BufferReader encoding_buffer{
+ BufferReader ir_buffer{
size_checked_pointer_cast<const char>(ir_buf.data()), ir_buf.size()
};
bool is_four_bytes_encoding;
- REQUIRE(get_encoding_type(encoding_buffer, is_four_bytes_encoding) ==
+ REQUIRE(get_encoding_type(ir_buffer, is_four_bytes_encoding) ==
IRErrorCode::IRErrorCode_Success);
REQUIRE(match_encoding_type<TestType>(is_four_bytes_encoding));
- REQUIRE(MagicNumberLength == encoding_buffer.get_pos());
+ REQUIRE(MagicNumberLength == ir_buffer.get_pos());

// Test if preamble can be decoded correctly
TimestampInfo ts_info;
encoded_tag_t metadata_type{0};
size_t metadata_pos{0};
uint16_t metadata_size{0};
- REQUIRE(decode_preamble(encoding_buffer, metadata_type, metadata_pos, metadata_size) ==
+ REQUIRE(decode_preamble(ir_buffer, metadata_type, metadata_pos, metadata_size) ==
IRErrorCode::IRErrorCode_Success);
- REQUIRE(encoded_preamble_end_pos == encoding_buffer.get_pos());
-
- auto json_metadata_ptr = reinterpret_cast<char*>(ir_buf.data() + metadata_pos);
- string_view json_metadata_ref {json_metadata_ptr, metadata_size};
+ REQUIRE(encoded_preamble_end_pos == ir_buffer.get_pos());

- // Test if preamble can be decoded by the string copy method
- std::vector<int8_t> json_metadata_vec;
- encoding_buffer.seek_from_begin(MagicNumberLength);
- REQUIRE(decode_preamble(encoding_buffer, metadata_type, json_metadata_vec) ==
- IRErrorCode::IRErrorCode_Success);
- string_view json_metadata_copied { reinterpret_cast<const char*>(json_metadata_vec.data()),
- json_metadata_vec.size() };
- REQUIRE (json_metadata_copied == json_metadata_ref);
+ char* metadata_ptr{size_checked_pointer_cast<char>(ir_buf.data()) + metadata_pos};
+ string_view json_metadata{metadata_ptr, metadata_size};

- auto metadata_json = nlohmann::json::parse(json_metadata_ref);
+ auto metadata_json = nlohmann::json::parse(json_metadata);
REQUIRE(ffi::ir_stream::cProtocol::Metadata::VersionValue ==
metadata_json.at(ffi::ir_stream::cProtocol::Metadata::VersionKey));
REQUIRE(ffi::ir_stream::cProtocol::Metadata::EncodingJson == metadata_type);
set_timestamp_info(metadata_json, ts_info);
REQUIRE(timestamp_pattern_syntax == ts_info.timestamp_pattern_syntax);
REQUIRE(time_zone_id == ts_info.time_zone_id);
REQUIRE(timestamp_pattern == ts_info.timestamp_pattern);
- REQUIRE(encoded_preamble_end_pos == encoding_buffer.get_pos());
+ REQUIRE(encoded_preamble_end_pos == ir_buffer.get_pos());

if constexpr (is_same_v<TestType, four_byte_encoded_variable_t>) {
REQUIRE(reference_ts ==
@@ -311,6 +305,15 @@ TEMPLATE_TEST_CASE("decode_preamble", "[ffi][decode_preamble]", four_byte_encode
.get<string>()));
}

+ // Test if preamble can be decoded by the string copy method
+ std::vector<int8_t> json_metadata_vec;
+ ir_buffer.seek_from_begin(MagicNumberLength);
+ REQUIRE(decode_preamble(ir_buffer, metadata_type, json_metadata_vec) ==
+ IRErrorCode::IRErrorCode_Success);
+ string_view json_metadata_copied {size_checked_pointer_cast<const char>(json_metadata_vec.data()), json_metadata_vec.size()};
+ // Crosscheck with the json_metadata decoded previously
+ REQUIRE (json_metadata_copied == json_metadata);

// Test if incomplete IR can be detected
ir_buf.resize(encoded_preamble_end_pos - 1);
BufferReader incomplete_preamble_buffer{size_checked_pointer_cast<const char>(ir_buf.data()),
@@ -343,12 +346,11 @@ TEMPLATE_TEST_CASE("decode_next_message_general", "[ffi][decode_next_message]",
const size_t encoded_message_end_pos = ir_buf.size();
const size_t encoded_message_start_pos = 0;

- // Test if message can be decoded properly

BufferReader ir_buffer{size_checked_pointer_cast<const char>(ir_buf.data()), ir_buf.size()};
string decoded_message;
epoch_time_ms_t timestamp;

+ // Test if message can be decoded properly
REQUIRE(IRErrorCode::IRErrorCode_Success ==
decode_next_message<TestType>(ir_buffer, decoded_message, timestamp));
REQUIRE(message == decoded_message);
@@ -393,7 +395,7 @@ TEST_CASE("message_decode_error", "[ffi][decode_next_message]")
epoch_time_ms_t timestamp;

// Test if a trailing escape triggers a decoder error
- auto ir_with_extra_escape {ir_buf};
+ auto ir_with_extra_escape{ir_buf};
ir_with_extra_escape.at(logtype_end_pos - 1) = ffi::cVariablePlaceholderEscapeCharacter;
BufferReader ir_with_extra_escape_buffer{
size_checked_pointer_cast<const char>(ir_with_extra_escape.data()),
@@ -471,11 +473,11 @@ TEMPLATE_TEST_CASE("decode_ir_complete", "[ffi][decode_next_message]",
reference_messages.push_back(message);
reference_timestamps.push_back(ts);

- BufferReader complete_encoding_buffer{size_checked_pointer_cast<const char>(ir_buf.data()),
+ BufferReader complete_ir_buffer{size_checked_pointer_cast<const char>(ir_buf.data()),
ir_buf.size()};

bool is_four_bytes_encoding;
- REQUIRE(get_encoding_type(complete_encoding_buffer, is_four_bytes_encoding) ==
+ REQUIRE(get_encoding_type(complete_ir_buffer, is_four_bytes_encoding) ==
IRErrorCode::IRErrorCode_Success);
REQUIRE(match_encoding_type<TestType>(is_four_bytes_encoding));

@@ -484,11 +486,11 @@ TEMPLATE_TEST_CASE("decode_ir_complete", "[ffi][decode_next_message]",
encoded_tag_t metadata_type;
size_t metadata_pos;
uint16_t metadata_size;
- REQUIRE(decode_preamble(complete_encoding_buffer, metadata_type, metadata_pos, metadata_size) ==
+ REQUIRE(decode_preamble(complete_ir_buffer, metadata_type, metadata_pos, metadata_size) ==
IRErrorCode::IRErrorCode_Success);
- REQUIRE(encoded_preamble_end_pos == complete_encoding_buffer.get_pos());
+ REQUIRE(encoded_preamble_end_pos == complete_ir_buffer.get_pos());

- auto json_metadata_ptr = reinterpret_cast<char*>(ir_buf.data() + metadata_pos);
+ auto* json_metadata_ptr{size_checked_pointer_cast<char>(ir_buf.data() + metadata_pos)};
string_view json_metadata {json_metadata_ptr, metadata_size};
auto metadata_json = nlohmann::json::parse(json_metadata);
REQUIRE(ffi::ir_stream::cProtocol::Metadata::VersionValue ==
@@ -503,10 +505,10 @@ TEMPLATE_TEST_CASE("decode_ir_complete", "[ffi][decode_next_message]",
epoch_time_ms_t timestamp;
for (size_t ix = 0; ix < reference_messages.size(); ix++) {
REQUIRE(IRErrorCode::IRErrorCode_Success ==
- decode_next_message<TestType>(complete_encoding_buffer, decoded_message,
+ decode_next_message<TestType>(complete_ir_buffer, decoded_message,
timestamp));
REQUIRE(decoded_message == reference_messages[ix]);
REQUIRE(timestamp == reference_timestamps[ix]);
}
- REQUIRE(complete_encoding_buffer.get_pos() == ir_buf.size());
+ REQUIRE(complete_ir_buffer.get_pos() == ir_buf.size());
}
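
For readers unfamiliar with the pattern the `get_encoding_type` tests exercise (a reader over an in-memory buffer that reports an incomplete stream when the magic number is empty or truncated), here is a minimal standalone sketch. Every name in it (`SketchBufferReader`, `SketchErrorCode`, `sketch_get_encoding_type`) and the magic-number bytes are hypothetical placeholders chosen for illustration; they are not the `BufferReader` class, the `IRErrorCode` values, or the magic numbers from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins; NOT the CLP error codes or magic-number values.
enum class SketchErrorCode { Success, IncompleteIR, CorruptedIR };

constexpr std::size_t cSketchMagicNumberLength = 4;
constexpr std::array<std::int8_t, cSketchMagicNumberLength> cSketchFourByteMagic{
        0x01, 0x02, 0x03, 0x04};
constexpr std::array<std::int8_t, cSketchMagicNumberLength> cSketchEightByteMagic{
        0x01, 0x02, 0x03, 0x05};

// Minimal read-only view over a caller-owned buffer that tracks a read position.
class SketchBufferReader {
public:
    SketchBufferReader(char const* data, std::size_t size) : m_data{data}, m_size{size} {
        // A null pointer is only acceptable for an empty buffer.
        assert(nullptr != data || 0 == size);
    }

    // Copies exactly `num_bytes` into `dst` and advances the position.
    // Returns false, without advancing, if fewer than `num_bytes` remain.
    bool try_read_exact(char* dst, std::size_t num_bytes) {
        if (m_size - m_pos < num_bytes) {
            return false;
        }
        std::memcpy(dst, m_data + m_pos, num_bytes);
        m_pos += num_bytes;
        return true;
    }

    [[nodiscard]] std::size_t get_pos() const { return m_pos; }

private:
    char const* m_data;
    std::size_t m_size;
    std::size_t m_pos{0};
};

// Reads the magic number and reports which (placeholder) encoding it names.
SketchErrorCode
sketch_get_encoding_type(SketchBufferReader& reader, bool& is_four_bytes_encoding) {
    std::array<char, cSketchMagicNumberLength> magic{};
    if (false == reader.try_read_exact(magic.data(), magic.size())) {
        // Empty and truncated buffers both land here.
        return SketchErrorCode::IncompleteIR;
    }
    if (0 == std::memcmp(magic.data(), cSketchFourByteMagic.data(), magic.size())) {
        is_four_bytes_encoding = true;
    } else if (0 == std::memcmp(magic.data(), cSketchEightByteMagic.data(), magic.size())) {
        is_four_bytes_encoding = false;
    } else {
        return SketchErrorCode::CorruptedIR;
    }
    return SketchErrorCode::Success;
}
```

Because the read is all-or-nothing, a failed call leaves the position untouched, so a caller could retry once more bytes are available. Whether the PR's reader behaves the same way on a short read is not shown in this diff.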