Simplify blocks yield logic for lines reader #6932

Merged (9 commits) on Jul 11, 2019
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@
- Fixed release link on Help & Info page ([#6854](https://github.com/pymedusa/Medusa/pull/6854))
- Fixed FreeMobile notifier message encode error ([#6867](https://github.com/pymedusa/Medusa/pull/6867))
- Fixed charset on API v2 responses with plain text content ([#6931](https://github.com/pymedusa/Medusa/pull/6931))
+- Fixed logger causing an exception in certain cases ([#6932](https://github.com/pymedusa/Medusa/pull/6932))

-----

46 changes: 15 additions & 31 deletions medusa/logger/__init__.py
@@ -203,7 +203,7 @@ def read_loglines(log_file=None, modification_time=None, start_index=0, max_line
yield formatter(logline)


-def blocks_r(filename, size=64 * 1024, reset_offset=True, encoding='utf-8'):
+def blocks_r(filename, size=64 * 1024):
"""
Yields the data within a file in reverse-ordered blocks of given size.

@@ -219,34 +219,23 @@ def blocks_r(filename, size=64 * 1024, reset_offset=True, encoding='utf-8'):
size (int|None): The block size.
If int, the file is yielded in blocks of the specified size.
If None, the file is yielded at once.
-        reset_offset (bool): Reset the file offset.
-            If True, starts reading from the end of the file.
-            Otherwise, starts reading from where the file current position is.
-        encoding (str|None): The encoding for correct block size computation.
-            If `str`, must be a valid string encoding.
-            If None, the default encoding is used.

Yields:
-        block (str): The data within the blocks.
+        block (bytes): The data within the blocks.

"""
-    with io.open(filename, 'r', encoding=encoding) as file_obj:
-        offset = 0
-        if reset_offset:
-            file_size = remaining_size = file_obj.seek(0, os.SEEK_END)
-        else:
-            file_size = remaining_size = file_obj.tell()
-        rounding = 0
+    with io.open(filename, 'rb') as file_obj:
+        remaining_size = file_obj.seek(0, os.SEEK_END)
         while remaining_size > 0:
-            offset = min(file_size, offset + size)
-            file_obj.seek(file_size - offset)
-            block = file_obj.read(min(remaining_size, size))
-            remaining_size -= size
-            yield block[:len(block) + rounding] if rounding else block
+            block_size = min(remaining_size, size)
+            file_obj.seek(remaining_size - block_size)
+            block = file_obj.read(block_size)
+            remaining_size -= block_size
+            yield block
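For anyone who wants to poke at the new helper outside Medusa, here is a minimal standalone sketch of the simplified `blocks_r`, reimplemented from the diff above (not imported from `medusa.logger`) and exercised on a throwaway file:

```python
import io
import os
import tempfile


def blocks_r(filename, size=64 * 1024):
    """Yield the file's bytes in reverse order, in blocks of at most `size`."""
    with io.open(filename, 'rb') as file_obj:
        remaining_size = file_obj.seek(0, os.SEEK_END)
        while remaining_size > 0:
            block_size = min(remaining_size, size)
            file_obj.seek(remaining_size - block_size)
            block = file_obj.read(block_size)
            remaining_size -= block_size
            yield block


# Tiny file and block size so the reverse ordering is easy to see.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abcdefgh')
    path = tmp.name

blocks = list(blocks_r(path, size=3))
os.remove(path)
print(blocks)  # [b'fgh', b'cde', b'ab'] -- tail block first, head block last
```

Note that the last block yielded (the file's head) may be shorter than `size`, and joining the blocks in reverse reproduces the original bytes.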


-def reverse_readlines(filename, skip_empty=True, append_newline=False, block_size=512 * 1024,
-                      reset_offset=True, encoding='utf-8'):
+def reverse_readlines(filename, skip_empty=True, append_newline=False,
+                      block_size=128 * 1024, encoding='utf-8'):
Contributor: Do you really think there's a reason to lower the block size from 512 KB to 128 KB?

medariox (Contributor, Author) replied on Jul 11, 2019: Considering that we only show at most 1000 lines in the log viewer, which is roughly 128 KB according to my tests (this can vary depending on the content, of course), I don't see why we should keep such a high block size. Also, this seems to suggest that 128 KB is the default MMAP_THRESHOLD (that doesn't necessarily mean it is more performant, though).

Contributor: Okay then, I don't know enough about it, but sure. I guess it's the same thing as the shutil monkeypatch, where if the block size is larger than X, it doesn't noticeably improve performance.
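One point implicit in this exchange: since `blocks_r` is a generator, the block size only bounds how much is read per step, and a consumer that stops early (like a viewer showing at most 1000 lines) never reads the rest of the file. A rough sketch of that laziness, with `blocks_r` reimplemented locally and illustrative file sizes:

```python
import io
import os
import tempfile


def blocks_r(filename, size=64 * 1024):
    # Same logic as the simplified helper in this PR.
    with io.open(filename, 'rb') as file_obj:
        remaining_size = file_obj.seek(0, os.SEEK_END)
        while remaining_size > 0:
            block_size = min(remaining_size, size)
            file_obj.seek(remaining_size - block_size)
            block = file_obj.read(block_size)
            remaining_size -= block_size
            yield block


# A ~1 MiB file of short "log" lines (sizes here are illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'log line\n' * 120000)
    path = tmp.name

gen = blocks_r(path, size=128 * 1024)
first_block = next(gen)  # only the final 128 KiB of the file is read
gen.close()              # stop early: earlier blocks are never touched
os.remove(path)

print(len(first_block))  # 131072
```

So the trade-off is mostly between syscall count for deep reads and memory per step for the common shallow read, which is why a smaller default is defensible here.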

"""
Flexible function for reversing read lines incrementally.

@@ -261,9 +250,6 @@ def reverse_readlines(filename, skip_empty=True, append_newline=False, block_siz
block_size (int|None): The block size.
If int, the file is processed in blocks of the specified size.
If None, the file is processed at once.
-        reset_offset (bool): Reset the file offset.
-            If True, starts reading from the end of the file.
-            Otherwise, starts reading from where the file current position is.
encoding (str|None): The encoding for correct block size computation.
If `str`, must be a valid string encoding.
If None, the default encoding is used.
@@ -275,20 +261,18 @@ def reverse_readlines(filename, skip_empty=True, append_newline=False, block_siz
     newline = '\n'
     empty = ''
     remainder = empty
-    block_generator_kws = dict(size=block_size, reset_offset=reset_offset,
-                               encoding=encoding)
     block_generator = blocks_r
-    for block in block_generator(filename, **block_generator_kws):
-        lines = block.split(newline)
+    for block in block_generator(filename, size=block_size):
+        lines = block.split(b'\n')
         if remainder:
             lines[-1] = lines[-1] + remainder
         remainder = lines[0]
         mask = slice(-1, 0, -1)
         for line in lines[mask]:
             if line or not skip_empty:
-                yield line + (newline if append_newline else empty)
+                yield line.decode(encoding) + (newline if append_newline else empty)
         if remainder or not skip_empty:
-            yield remainder + (newline if append_newline else empty)
+            yield remainder.decode(encoding) + (newline if append_newline else empty)
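Putting the two pieces together, the reworked reader can be sketched end to end. The snippet below reimplements both helpers from the diff, dropping the `append_newline` option for brevity and starting `remainder` as bytes; it is an approximation of the merged code, not an import of it. A deliberately tiny block size exercises the remainder join across block boundaries:

```python
import io
import os
import tempfile


def blocks_r(filename, size=64 * 1024):
    with io.open(filename, 'rb') as file_obj:
        remaining_size = file_obj.seek(0, os.SEEK_END)
        while remaining_size > 0:
            block_size = min(remaining_size, size)
            file_obj.seek(remaining_size - block_size)
            block = file_obj.read(block_size)
            remaining_size -= block_size
            yield block


def reverse_readlines(filename, skip_empty=True, block_size=128 * 1024, encoding='utf-8'):
    remainder = b''
    for block in blocks_r(filename, size=block_size):
        lines = block.split(b'\n')
        if remainder:
            # The leftover from the previously seen (later) block is the
            # tail of this block's last line.
            lines[-1] = lines[-1] + remainder
        remainder = lines[0]
        for line in lines[:0:-1]:  # all but the (possibly partial) first line, reversed
            if line or not skip_empty:
                yield line.decode(encoding)
    if remainder or not skip_empty:
        yield remainder.decode(encoding)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n'.encode('utf-8'))
    path = tmp.name

tail = list(reverse_readlines(path, block_size=4))  # tiny block to cross line boundaries
os.remove(path)
print(tail)  # ['third', 'second', 'first']
```

Because lines are only decoded after the byte fragments are rejoined, a multi-byte UTF-8 character split across two blocks is reassembled before `decode` runs.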


def filter_logline(logline, min_level=None, thread_name=None, search_query=None):