Add support for compressing CLP IR streams. #152

Merged on Aug 29, 2023 (128 commits)
Commits
02d2b48
Backup
haiqi96 May 4, 2023
21581c0
simple working version backup
haiqi96 May 5, 2023
5ff962e
backup
haiqi96 May 6, 2023
791cfee
Allow seeking beyond end of file
haiqi96 May 8, 2023
d542e47
Use BufferReader for Decoder
haiqi96 May 8, 2023
3cb8f7c
small clean up
haiqi96 May 8, 2023
21a37a0
Fixes but yet to verify
haiqi96 May 8, 2023
827c762
small clean up
haiqi96 May 8, 2023
1971ae4
Some refactoring and decompression looks ok
haiqi96 May 8, 2023
1764cfb
ok for compression
haiqi96 May 8, 2023
5319067
bug fix for proper compression
haiqi96 May 9, 2023
f616add
make error code interface matches with what's expected
haiqi96 May 9, 2023
8d2fce0
Support checkpoint in new file reader
haiqi96 May 10, 2023
d2fc530
Make existing clp code use checkpoint feature
haiqi96 May 10, 2023
921517a
updated some comments on potential improvements and add a temporary u…
haiqi96 May 11, 2023
38b0da7
remove eof flag
haiqi96 May 11, 2023
e663312
remove checkpoint related functions from readerbuffer
haiqi96 May 12, 2023
fd02a5f
refactor try_read logic
haiqi96 May 13, 2023
237bb3f
clean up
haiqi96 May 13, 2023
6cd7516
Allow more flexible reset and delay buffer loading for seek
haiqi96 May 14, 2023
eacad4a
refactor
haiqi96 May 14, 2023
dc0d91e
rename new file reader to something else
haiqi96 May 14, 2023
ccad421
More refactoring
haiqi96 May 15, 2023
2324a21
remove fstat dependency from libarchive reader
haiqi96 May 16, 2023
9e48e12
Remove code specific for BFR
haiqi96 May 16, 2023
57dd310
Remove BFR dependency on BR
haiqi96 May 16, 2023
d84276c
remove m_cursor and replace with buffer begin_pos
haiqi96 May 17, 2023
5845805
Add UTF8 utility to the FBR.
haiqi96 May 17, 2023
b90e0d6
replace UTF8 validation logic
haiqi96 May 18, 2023
e9230d0
Some personal preference
haiqi96 May 18, 2023
27af203
remove m_cursor and replace with buffer begin_pos
haiqi96 May 18, 2023
e82fbb4
int8_t -> char refactoring
haiqi96 May 19, 2023
1afc47c
further clean up
haiqi96 May 19, 2023
7868252
remove function that is unnecessary
haiqi96 May 19, 2023
10c1bad
checkpoint for profiling stuff
haiqi96 May 21, 2023
9119493
adding more comments and small refactoring
haiqi96 May 21, 2023
487bc49
Temporary Fix to disable string_view optimization
haiqi96 May 31, 2023
be7951a
Fix for code review
haiqi96 May 31, 2023
edc2acd
Missing fixes
haiqi96 May 31, 2023
e32d95e
Remove redundant constructor
haiqi96 May 31, 2023
fa0bb30
Address code review concern
haiqi96 Jun 1, 2023
6e5b013
BufferReader: Remove possibility for invalid internal buffer; Some cl…
kirkrodrigues Jul 17, 2023
ff52053
Initial change that utilizes BufferReader in the code.
haiqi96 Jul 25, 2023
47f9573
Small fix for read_to_delimiter
haiqi96 Jul 25, 2023
16637d1
simplification
haiqi96 Jul 25, 2023
3f8156f
Try to add some comments
haiqi96 Jul 26, 2023
e587427
Allow size=0 for BufferReader and simply part of the code
haiqi96 Jul 26, 2023
8bf1772
simplification by not requiring buffer to start from an aligned pos
haiqi96 Jul 30, 2023
d802310
My way of simplification
haiqi96 Jul 30, 2023
eae3cae
some other small refactoring
haiqi96 Jul 31, 2023
8f63dc2
Simplify away set buffer_size.
haiqi96 Jul 31, 2023
0776782
Simplify seek_from_begin
haiqi96 Jul 31, 2023
41b9bce
Maybe unnecessary simplification
haiqi96 Jul 31, 2023
0e6e00c
simplify BufferReader
haiqi96 Jul 31, 2023
ff0d016
optimize for buffer alignment
haiqi96 Jul 31, 2023
ca8c05a
Handle buffer combining case
haiqi96 Aug 1, 2023
84316d7
Small fixes
haiqi96 Aug 1, 2023
12ee687
First round of fixes
haiqi96 Aug 8, 2023
7877350
temporary test
haiqi96 Aug 8, 2023
ca7f48b
Fix and refactor for read_to_delim
haiqi96 Aug 8, 2023
f47613e
small touch
haiqi96 Aug 8, 2023
85f2fdc
Further refactoring
haiqi96 Aug 8, 2023
33689d3
Merge branch 'newFileReader' of https://github.com/haiqi96/clp_fork i…
haiqi96 Aug 8, 2023
749e652
small update to the temporary test
haiqi96 Aug 8, 2023
25e8dd0
small clean up
haiqi96 Aug 8, 2023
5aa4595
Refactored try_seek_from_begin function and made it consistent with d…
haiqi96 Aug 9, 2023
d74c631
Refill_buffer_reader refactor
haiqi96 Aug 9, 2023
0906e4d
Fix comments for test
haiqi96 Aug 9, 2023
504a030
fixes
haiqi96 Aug 9, 2023
66aa405
Apply clang-format to merge-conflict files
haiqi96 Aug 9, 2023
39d5af4
Merge branch 'main' into newFileReader
haiqi96 Aug 9, 2023
26b13f7
Apply clang-format to new classes
haiqi96 Aug 9, 2023
426dea3
Small cleanup
haiqi96 Aug 9, 2023
afff981
Some refactor for clang-tidy
haiqi96 Aug 10, 2023
4ca5bd8
Run clang-format on unit-test
haiqi96 Aug 10, 2023
f4a5cef
Replace unique ptr with vector
haiqi96 Aug 10, 2023
1e55966
Add error code to close function and some small refactoring
haiqi96 Aug 10, 2023
d8a50d4
fix
haiqi96 Aug 10, 2023
f3f9ca5
Manually cherrypick the changes
haiqi96 Aug 10, 2023
beb5640
Fix some clang-tidy issues
haiqi96 Aug 10, 2023
f49d7c1
more clean up
haiqi96 Aug 11, 2023
a3fbf31
Write missing docstrings and minor refactoring.
kirkrodrigues Aug 11, 2023
3e9f564
Return num_bytes_read correctly
haiqi96 Aug 12, 2023
8d7cc2c
merge
haiqi96 Aug 12, 2023
1c53b57
Clean-up BufferReader::try_read_to_delimiter
kirkrodrigues Aug 12, 2023
2fe5c77
BufferReader: Reorder methods according to guidelines.
kirkrodrigues Aug 12, 2023
36508b7
Libarchive*: Some clean-up.
kirkrodrigues Aug 13, 2023
47a5218
Undo unnecessary changes to ffi/ir_stream/encoding_methods.hpp
kirkrodrigues Aug 13, 2023
03991d7
Undo unnecessary changes and clean-up ffi/ir_stream/decoding_methods.*
kirkrodrigues Aug 13, 2023
a2ccc4d
Merge remote-tracking branch 'origin/newFileReader' into ir-decompres…
haiqi96 Aug 14, 2023
7bc93c9
Fixes
haiqi96 Aug 14, 2023
c1efd29
BufferedFileReader: Combine OperationFailed and OperationFailedWithMsg.
kirkrodrigues Aug 14, 2023
cf3c6e3
BufferedFileReader: Replace quantize_to_buffer_size with general method.
kirkrodrigues Aug 14, 2023
e5bc643
Remove include-grouping comments since clang-format handles it.
kirkrodrigues Aug 14, 2023
c8a9981
BufferedFileReader: Reset buffer reader even on EOF.
kirkrodrigues Aug 14, 2023
f28dea9
BufferedFileReader: Reduce indentation by rewriting branches.
kirkrodrigues Aug 14, 2023
8051ed2
BufferedFileReader: Return appropriate error code when trying to seek…
kirkrodrigues Aug 14, 2023
3aaf117
BufferedFileReader: Refactor refill_reader_buffer
kirkrodrigues Aug 14, 2023
9f97835
BufferedFileReader: Refactor resize_buffer_from_pos
kirkrodrigues Aug 14, 2023
53c5580
BufferedFileReader: Remaining refactoring
kirkrodrigues Aug 14, 2023
64d3a5d
BufferedFileReader: Fix docstrings
kirkrodrigues Aug 14, 2023
455b3cd
Replace off64_t with generic type.
kirkrodrigues Aug 14, 2023
08581d7
BufferedFileReader: Reorder methods according to guidelines.
kirkrodrigues Aug 14, 2023
c270b79
Make peek functions const
kirkrodrigues Aug 14, 2023
07fba97
Undo my mistake for close
kirkrodrigues Aug 14, 2023
1b5e51a
FileCompressor: Clean-up
kirkrodrigues Aug 14, 2023
9cdfe57
LibarchiveFileReader: Move new methods into the right section.
kirkrodrigues Aug 14, 2023
fd74ed9
LibarchiveReader: Space fix.
kirkrodrigues Aug 14, 2023
84028bb
BufferedFileReader: Basic refactor of unit tests.
kirkrodrigues Aug 14, 2023
ebf9d4b
East-const fixes.
kirkrodrigues Aug 14, 2023
492bd96
Fix macOS build issue.
kirkrodrigues Aug 14, 2023
e1787de
Add missing include.
kirkrodrigues Aug 14, 2023
b6c1c8a
Merge remote-tracking branch 'origin/newFileReader' into ir-decompres…
haiqi96 Aug 15, 2023
5e8cb40
clangformat
haiqi96 Aug 15, 2023
f76caf9
Merge remote-tracking branch 'origin/main' into ir-decompression-new
haiqi96 Aug 15, 2023
7c3dc03
remove unused functions
haiqi96 Aug 15, 2023
d45fd38
Undo formatting changes which should be they're own PR.
kirkrodrigues Aug 16, 2023
dd91a64
Fix includes
kirkrodrigues Aug 16, 2023
1e45fa6
another small clean up
haiqi96 Aug 16, 2023
6473e6e
Merge branch 'ir-decompression-new' of https://github.com/haiqi96/clp…
haiqi96 Aug 16, 2023
788bf36
small fixes
haiqi96 Aug 16, 2023
0629ea0
Refactoring/restructuring:
kirkrodrigues Aug 28, 2023
f8f587a
Address todos: Rename generic_parse_tokens -> deserialize_ir_message …
kirkrodrigues Aug 28, 2023
48d7b01
Add boost-outcome as submodule; LogEventDeserializer: Explicitly spec…
kirkrodrigues Aug 28, 2023
1170333
Clean-up LogEventDeserializer; Move template methods into cpp with ex…
kirkrodrigues Aug 29, 2023
7ae1e86
FileCompressor: Move template implementation into cpp with explicit s…
kirkrodrigues Aug 29, 2023
9e3e6b4
Archive: Move template implementation into cpp with explicit speciali…
kirkrodrigues Aug 29, 2023
8cbefa4
Remove unnecessary cAlwaysFalse const.
kirkrodrigues Aug 29, 2023
Diff view
Allow size=0 for BufferReader and simply part of the code
haiqi96 committed Jul 30, 2023
commit e58742730d6d588c886cb7a93d1201003938ef61
components/core/src/BufferReader.cpp (2 changes: 1 addition & 1 deletion)
@@ -7,7 +7,7 @@
#include <string.h>

BufferReader::BufferReader (const char* data, size_t data_size) {
if (nullptr == data || 0 == data_size) {
if (nullptr == data) {
throw OperationFailed(ErrorCode_BadParam, __FILENAME__, __LINE__);
}
m_internal_buf = data;
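This commit relaxes the `BufferReader` constructor so that only a null `data` pointer is rejected and a zero-length buffer becomes valid. That is what lets `BufferedFileReader` always keep an emplaced reader (`m_buffer_reader.emplace(m_buffer.get(), 0)`) instead of guarding every use with `m_buffer_reader.has_value()`, as the removed branches below did. A minimal sketch of such a reader follows; `SimpleBufferReader` and `ReadResult` are hypothetical stand-ins, not CLP's actual classes, and the sketch assumes an exhausted (or empty) buffer reports end-of-file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical result codes standing in for ErrorCode_Success / ErrorCode_EndOfFile.
enum class ReadResult { Success, EndOfFile };

// Hypothetical, simplified stand-in for BufferReader: a read cursor over a
// caller-owned byte range. A size of 0 is allowed; only a null pointer is rejected.
class SimpleBufferReader {
public:
    SimpleBufferReader(char const* data, size_t data_size) : m_data{data}, m_size{data_size} {
        if (nullptr == data) {
            throw std::invalid_argument("data must not be null");
        }
    }

    // Copies up to num_bytes into buf; reports EndOfFile once the buffer is exhausted,
    // which for an empty buffer is immediately.
    ReadResult try_read(char* buf, size_t num_bytes, size_t& num_bytes_read) {
        num_bytes_read = 0;
        if (m_pos == m_size) {
            return ReadResult::EndOfFile;
        }
        size_t const available = m_size - m_pos;
        num_bytes_read = num_bytes < available ? num_bytes : available;
        std::memcpy(buf, m_data + m_pos, num_bytes_read);
        m_pos += num_bytes_read;
        return ReadResult::Success;
    }

    [[nodiscard]] size_t get_buffer_size() const { return m_size; }

private:
    char const* m_data;
    size_t m_size;
    size_t m_pos{0};
};
```

Because an empty reader behaves like an exhausted one, callers can treat "no data loaded yet" and "all buffered data consumed" through the same end-of-file path.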
components/core/src/BufferedFileReader.cpp (90 changes: 29 additions & 61 deletions)
@@ -66,11 +66,9 @@ ErrorCode BufferedFileReader::try_seek_from_begin (size_t pos) {
// adjust the buffer reader pos
m_buffer_reader->seek_from_begin(pos - m_buffer_begin_pos);
} else {
if (m_buffer_reader.has_value()) {
if (ErrorCode_Success == m_buffer_reader->try_seek_from_begin(pos - m_buffer_begin_pos)) {
m_file_pos = pos;
return ErrorCode_Success;
}
if (ErrorCode_Success == m_buffer_reader->try_seek_from_begin(pos - m_buffer_begin_pos)) {
m_file_pos = pos;
return ErrorCode_Success;
}
// Handle the case where buffer is empty or doesn't contain enough data for seek
if (false == m_checkpoint_pos.has_value()) {
@@ -80,9 +78,9 @@ ErrorCode BufferedFileReader::try_seek_from_begin (size_t pos) {
if (offset == -1) {
return ErrorCode_errno;
}
m_buffer_reader.reset();
m_buffer_reader.emplace(m_buffer.get(), 0);
} else {
auto data_size = get_data_size();
auto data_size = m_buffer_reader->get_buffer_size();
size_t num_bytes_to_refill = pos - (m_buffer_begin_pos + data_size);
size_t quantized_refill_size = quantize_to_buffer_size(num_bytes_to_refill);
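The `else` branch of `try_seek_from_begin` shown above picks one of three strategies. It can move the cursor within the buffered data. If that fails and no checkpoint is set, it can `lseek` the file descriptor and restart with an empty buffer. If a checkpoint is set, buffered data must be kept, so it refills by `pos - (m_buffer_begin_pos + data_size)` bytes, which the real code then rounds up with `quantize_to_buffer_size`. The sketch below models only that decision as a pure function. `SeekAction`, `SeekPlan`, and `plan_seek` are hypothetical names. It assumes `pos >= buffer_begin_pos` and that seeking to exactly the end of the buffered data succeeds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the branch structure in try_seek_from_begin (not CLP's API).
enum class SeekAction {
    SeekWithinBuffer,    // target lies inside the buffered bytes
    SeekFileDescriptor,  // no checkpoint: lseek() and restart with an empty buffer
    RefillThenSeek       // checkpoint set: keep buffered data and read more
};

struct SeekPlan {
    SeekAction action;
    size_t num_bytes_to_refill;  // only meaningful for RefillThenSeek (before quantizing)
};

// Assumes pos >= buffer_begin_pos.
SeekPlan plan_seek(size_t pos, size_t buffer_begin_pos, size_t data_size, bool has_checkpoint) {
    if (pos - buffer_begin_pos <= data_size) {
        return {SeekAction::SeekWithinBuffer, 0};
    }
    if (false == has_checkpoint) {
        return {SeekAction::SeekFileDescriptor, 0};
    }
    // Same arithmetic as the diff: bytes between the end of the buffered data and pos.
    return {SeekAction::RefillThenSeek, pos - (buffer_begin_pos + data_size)};
}
```

For example, with 64 bytes buffered starting at file offset 64, seeking to 100 stays inside the buffer, while seeking to 200 either jumps the descriptor or, under a checkpoint, needs 72 more bytes.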

@@ -113,18 +111,13 @@ ErrorCode BufferedFileReader::try_read (char* buf, size_t num_bytes_to_read,
}

num_bytes_read = 0;
if (false == m_buffer_reader.has_value()) {
// refill the buffer if not initialized
auto error_code = refill_reader_buffer(m_buffer_size);
if (ErrorCode_Success != error_code) {
return error_code;
}
}
// keep reading until enough data is read or an eof is seen
while (true) {
size_t bytes_read {0};
auto remaining_bytes_to_read = num_bytes_to_read - num_bytes_read;
auto error_code = m_buffer_reader->try_read(buf + num_bytes_read, remaining_bytes_to_read, bytes_read);
// here EOF is allowed because it simply means we have exhausted the
// buffer, but not necessarily the file itself
if (ErrorCode_Success != error_code && ErrorCode_EndOfFile != error_code) {
return error_code;
}
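In `try_read`, an `ErrorCode_EndOfFile` from the inner `BufferReader` only means the current buffer is exhausted, not the file. The loop therefore keeps reading until the request is satisfied or the file itself ends. The refill step is presumably in the part of the loop this hunk does not show. The sketch below models that behaviour with a `std::string` standing in for the file and a fixed chunk size standing in for the buffer size. `ChunkedReader` is a hypothetical name, and it returns `false` where the real code would return `ErrorCode_EndOfFile`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical model: `file` stands in for the file descriptor, `chunk_size` for the
// buffer size. An exhausted buffer only triggers a refill; an empty refill is real EOF.
class ChunkedReader {
public:
    ChunkedReader(std::string file, size_t chunk_size)
            : m_file{std::move(file)}, m_chunk_size{chunk_size} {}

    // Returns false only if the file is exhausted and no bytes could be read.
    bool try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read) {
        num_bytes_read = 0;
        while (num_bytes_read < num_bytes_to_read) {
            if (m_buf_pos == m_buffer.size()) {
                // Buffer exhausted: refill from the "file".
                m_buffer = m_file.substr(m_file_pos, m_chunk_size);
                m_file_pos += m_buffer.size();
                m_buf_pos = 0;
                if (m_buffer.empty()) {
                    break;  // the file itself is exhausted
                }
            }
            size_t const available = m_buffer.size() - m_buf_pos;
            size_t const remaining = num_bytes_to_read - num_bytes_read;
            size_t const n = remaining < available ? remaining : available;
            std::memcpy(buf + num_bytes_read, m_buffer.data() + m_buf_pos, n);
            m_buf_pos += n;
            num_bytes_read += n;
        }
        return num_bytes_read > 0 || 0 == num_bytes_to_read;
    }

private:
    std::string m_file;
    size_t m_chunk_size;
    size_t m_file_pos{0};
    std::string m_buffer;
    size_t m_buf_pos{0};
};
```

A request that spans several buffers is satisfied in one call, and a short read near the end of the file still succeeds.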
@@ -155,26 +148,19 @@ ErrorCode BufferedFileReader::try_read_to_delimiter (char delim, bool keep_delim
}

bool found_delim {false};
size_t read_size {0};
if (false == m_buffer_reader.has_value()) {
// refill the buffer if not initialized
auto error_code = refill_reader_buffer(m_buffer_size);
if (ErrorCode_Success != error_code) {
return error_code;
}
}
size_t total_append_length {0};
while (false == found_delim) {
size_t length {0};
if (ErrorCode_Success == m_buffer_reader->try_read_to_delimiter(delim, keep_delimiter, append, str, length)) {
found_delim = true;
}
m_file_pos += length;
read_size += length;
total_append_length += length;

if (false == found_delim) {
auto error_code = refill_reader_buffer(m_buffer_size);
if (ErrorCode_EndOfFile == error_code) {
if (read_size == 0) {
if (total_append_length == 0) {
return ErrorCode_EndOfFile;
}
return ErrorCode_Success;
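With `read_size` renamed to `total_append_length`, the rule is explicit. `try_read_to_delimiter` returns `ErrorCode_EndOfFile` only when the file ends before anything was appended. A final line without a trailing delimiter is still returned with `ErrorCode_Success`. The sketch below models this with a vector of strings standing in for successive buffer refills. `DelimReader` and `ReadStatus` are hypothetical names, and the `append`/`keep_delimiter` flags follow the parameters visible in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ReadStatus { Success, EndOfFile };

// Hypothetical model: each element of `refills` is what one buffer refill would load.
class DelimReader {
public:
    explicit DelimReader(std::vector<std::string> refills) : m_refills{std::move(refills)} {}

    ReadStatus try_read_to_delimiter(char delim, bool keep_delimiter, bool append,
                                     std::string& str) {
        if (false == append) {
            str.clear();
        }
        size_t total_append_length{0};
        while (m_refill_idx < m_refills.size()) {
            std::string const& buffer = m_refills[m_refill_idx];
            size_t const delim_pos = buffer.find(delim, m_pos);
            if (std::string::npos != delim_pos) {
                // Found: append up to (and optionally including) the delimiter.
                size_t const end = keep_delimiter ? delim_pos + 1 : delim_pos;
                str.append(buffer, m_pos, end - m_pos);
                m_pos = delim_pos + 1;
                return ReadStatus::Success;
            }
            // Not found: consume the rest of this buffer, then "refill" with the next one.
            str.append(buffer, m_pos, std::string::npos);
            total_append_length += buffer.size() - m_pos;
            ++m_refill_idx;
            m_pos = 0;
        }
        // The file ended without a delimiter: EOF only if nothing was appended.
        return 0 == total_append_length ? ReadStatus::EndOfFile : ReadStatus::Success;
    }

private:
    std::vector<std::string> m_refills;
    size_t m_refill_idx{0};
    size_t m_pos{0};
};
```

The second call below finds its delimiter only after a refill, and the third returns the unterminated last line as a success.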
@@ -200,7 +186,7 @@ ErrorCode BufferedFileReader::try_open (const string& path) {
m_path = path;
m_file_pos = 0;
m_buffer_begin_pos = 0;
m_buffer_reader.reset();
m_buffer_reader.emplace(m_buffer.get(), 0);
return ErrorCode_Success;
}

@@ -257,11 +243,6 @@ void BufferedFileReader::reset_checkpoint () {
if (false == m_checkpoint_pos.has_value()) {
return;
}
if (false == m_buffer_reader.has_value()) {
m_checkpoint_pos.reset();
return;
}

auto data_size = m_buffer_reader->get_buffer_size();
if (data_size != m_buffer_size) {
// allocate new buffer for buffered data that hasn't been seek passed
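When a checkpoint is reset, bytes the cursor has already passed no longer need to be kept. Only the unread tail of the buffer matters, per the comment above. The hunk stops before the implementation, so the following is only a hedged sketch of the idea. `ReaderBuffer` and `drop_consumed_bytes` are hypothetical, and the sketch erases consumed bytes in place rather than allocating a new buffer as the comment describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer state: data[0] corresponds to file offset begin_pos,
// and cursor is the read position within data.
struct ReaderBuffer {
    std::vector<char> data;
    size_t begin_pos{0};
    size_t cursor{0};
};

// Keeps only the bytes the cursor has not yet passed and rebases the file offset,
// so begin_pos + cursor (the current file position) is unchanged.
void drop_consumed_bytes(ReaderBuffer& buffer) {
    assert(buffer.cursor <= buffer.data.size());
    buffer.data.erase(buffer.data.begin(),
                      buffer.data.begin() + static_cast<std::ptrdiff_t>(buffer.cursor));
    buffer.begin_pos += buffer.cursor;
    buffer.cursor = 0;
}
```

The invariant to preserve is that the logical file position (`begin_pos + cursor`) is the same before and after compaction.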
@@ -313,7 +294,7 @@ ErrorCode BufferedFileReader::peek_buffered_data (size_t size_to_peek, const cha
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
// Refill the buffer if necessary
// Refill the buffer if it is not loaded yet
if (false == m_buffer_reader.has_value()) {
auto error_code = refill_reader_buffer(m_buffer_size);
if (ErrorCode_Success != error_code) {
@@ -326,13 +307,6 @@ ErrorCode BufferedFileReader::peek_buffered_data (size_t size_to_peek, const cha
return ErrorCode_Success;
}

size_t BufferedFileReader::get_data_size () const {
if (false == m_buffer_reader.has_value()) {
return 0;
}
return m_buffer_reader->get_buffer_size();
}

size_t BufferedFileReader::quantize_to_buffer_size (size_t size) {
return (1 + ((size - 1) >> m_buffer_exp)) << m_buffer_exp;
}
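`quantize_to_buffer_size` rounds a size up to the next multiple of the buffer size. This assumes the buffer size is `2^m_buffer_exp`, which the shift-based expression implies. The sketch reproduces the same expression as a free function with the exponent passed in, so it can be checked in isolation. A size of 0 maps to 0 only through unsigned wraparound of `size - 1`.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of 2^buffer_exp, using the same expression as
// BufferedFileReader::quantize_to_buffer_size (here as a free function).
size_t quantize_to_buffer_size(size_t size, size_t buffer_exp) {
    return (1 + ((size - 1) >> buffer_exp)) << buffer_exp;
}
```

With a 4096-byte buffer (`buffer_exp` = 12), requests of 1 to 4096 bytes quantize to 4096, and 4097 quantizes to 8192.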
@@ -346,7 +320,7 @@ ErrorCode BufferedFileReader::refill_reader_buffer (size_t refill_size,
size_t& num_bytes_refilled) {
num_bytes_refilled = 0;
if (false == m_checkpoint_pos.has_value()) {
auto data_size = get_data_size();
auto data_size = m_buffer_reader->get_buffer_size();
// recover from a previous reset if necessary
if (data_size > refill_size) {
m_buffer = make_unique<char[]>(refill_size);
@@ -359,38 +333,32 @@ ErrorCode BufferedFileReader::refill_reader_buffer (size_t refill_size,
m_buffer_begin_pos += data_size;
m_buffer_reader.emplace(m_buffer.get(), num_bytes_refilled);
} else {
if (false == m_buffer_reader.has_value()) {
auto error_code = try_read_into_buffer(m_fd, m_buffer.get(), refill_size,
num_bytes_refilled);
if (error_code != ErrorCode_Success) {
return error_code;
}
m_buffer_reader.emplace(m_buffer.get(), num_bytes_refilled);
} else {
// Messy way of copying data from old buffer to new buffer
auto data_size = m_buffer_reader->get_buffer_size();
auto new_buffer = make_unique<char[]>(data_size + refill_size);
memcpy(new_buffer.get(), m_buffer.get(), data_size);
auto error_code = try_read_into_buffer(m_fd, &new_buffer[data_size], refill_size,
num_bytes_refilled);
if (error_code != ErrorCode_Success) {
return error_code;
}
m_buffer = std::move(new_buffer);
data_size += num_bytes_refilled;
m_buffer_reader.emplace(m_buffer.get(), data_size);
// Messy way of copying data from old buffer to new buffer
auto data_size = m_buffer_reader->get_buffer_size();
auto new_buffer = make_unique<char[]>(data_size + refill_size);
memcpy(new_buffer.get(), m_buffer.get(), data_size);

// Read data to the new buffer, with offset = data_size
auto error_code = try_read_into_buffer(m_fd, &new_buffer[data_size], refill_size,
num_bytes_refilled);
if (error_code != ErrorCode_Success) {
return error_code;
}
m_buffer = std::move(new_buffer);
m_buffer_reader.emplace(m_buffer.get(), data_size + num_bytes_refilled);
}
// this line is here to handle if we have seek to a position
// before calling refill. if we
m_buffer_reader->seek_from_begin(m_file_pos - m_buffer_begin_pos);
return ErrorCode_Success;
}

static ErrorCode try_read_into_buffer(int fd, char* buffer, size_t num_bytes_to_read,
size_t& num_bytes_read) {
num_bytes_read = 0;
// keep reading from the fd until seeing a 0, which means eof
// keep reading from the fd until enough bytes are read
while (true) {
size_t remaining_bytes_to_read = num_bytes_to_read - num_bytes_read;
auto remaining_bytes_to_read = num_bytes_to_read - num_bytes_read;
auto bytes_read = ::read(fd, buffer + num_bytes_read, remaining_bytes_to_read);
if (bytes_read == -1) {
return ErrorCode_errno;
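While a checkpoint is set, `refill_reader_buffer` keeps the already-buffered bytes, grows the buffer by `refill_size`, and fills the new tail from the file descriptor. `try_read_into_buffer` loops on `::read` until enough bytes arrive or EOF is reached. A POSIX-only sketch of that path follows. It uses `std::vector` instead of `make_unique<char[]>` plus `memcpy`, a switch a later commit in this PR also makes ("Replace unique ptr with vector"). `read_fully` and `grow_and_refill` are hypothetical names, errors are returned as raw `errno` values, and unlike the visible code the sketch retries on `EINTR`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <unistd.h>
#include <vector>

// Reads from fd until num_bytes_to_read bytes arrive or read() reports EOF (returns 0).
// Returns 0 on success, or errno if read() fails.
int read_fully(int fd, char* buffer, size_t num_bytes_to_read, size_t& num_bytes_read) {
    num_bytes_read = 0;
    while (num_bytes_read < num_bytes_to_read) {
        ssize_t const bytes_read
                = ::read(fd, buffer + num_bytes_read, num_bytes_to_read - num_bytes_read);
        if (-1 == bytes_read) {
            if (EINTR == errno) {
                continue;  // interrupted by a signal: retry
            }
            return errno;
        }
        if (0 == bytes_read) {
            break;  // EOF
        }
        num_bytes_read += static_cast<size_t>(bytes_read);
    }
    return 0;
}

// Appends up to refill_size bytes from fd to buffer, preserving the existing bytes.
// On failure the buffer is left exactly as it was, as in the diff.
int grow_and_refill(int fd, std::vector<char>& buffer, size_t refill_size,
                    size_t& num_bytes_refilled) {
    size_t const data_size = buffer.size();
    buffer.resize(data_size + refill_size);
    int const error = read_fully(fd, buffer.data() + data_size, refill_size, num_bytes_refilled);
    if (0 != error) {
        buffer.resize(data_size);
        return error;
    }
    buffer.resize(data_size + num_bytes_refilled);  // drop the unfilled tail
    return 0;
}
```

`resize` preserves the existing prefix, so the explicit copy from the old buffer in the diff is no longer needed.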
components/core/src/BufferedFileReader.hpp (2 changes: 0 additions & 2 deletions)
@@ -186,8 +186,6 @@ class BufferedFileReader : public ReaderInterface {

private:
// Methods
[[nodiscard]] size_t get_data_size() const;

/**
* Quantize the given size to be the next integer multiple of buffer_size
* @param size