Simplify blocks yield logic for lines reader #6932
For reference, I found this: https://github.com/RobinNil/file_read_backwards
I'm very tired right now, but maybe this will help:

```python
# Put in project root
# coding=utf-8
from __future__ import unicode_literals, print_function

data = (
    b'AAAAAAAAAAAAAAA\n'
    b'AA\xe2\x80\xa0AAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
    b'AAAAAAAAAAAAAAA\n'
)

with open('test.log', 'wb') as fh:
    fh.write(data)

import sys
from medusa.logger import reverse_readlines

# Print block sizes that fail (easier to test split points)
for i in range(1, len(data)):
    try:
        lines = [
            line for line in reverse_readlines('test.log', block_size=i)
        ]
    except UnicodeDecodeError:
        print('failed with block size =', i)

for index, line in enumerate(lines):
    try:
        print('lines[' + str(index) + '/' + str(len(lines) - 1) + ']', line)
    except UnicodeEncodeError:
        if sys.version_info[0] == 2 and isinstance(line, unicode):
            line = line.encode('utf-8')
            print(line)
        else:
            raise
```

The idea is that it should succeed with any block size and never print a "failed" message. Results on
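The test above boils down to one property: the reader must return the same lines for every `block_size`. A minimal sketch of one way to get that, assuming the file is opened in binary mode, is to read blocks backwards as raw bytes and decode only complete lines. Every byte of a multi-byte UTF-8 character is 0x80 or higher, so a newline byte (0x0A) can never fall inside one, and splitting on it before decoding cannot cut a character in half. `reverse_lines_sketch` is a hypothetical name, not the code in this PR, and it ignores `skip_empty` and `append_newline`.

```python
import io
import os


def reverse_lines_sketch(fh, block_size=128 * 1024, encoding='utf-8'):
    """Yield the lines of a binary file object from last to first.

    Hypothetical sketch, not medusa.logger.reverse_readlines: raw blocks are
    read from the end of the file and only complete lines are decoded.
    """
    fh.seek(0, os.SEEK_END)
    offset = fh.tell()
    remainder = b''  # undecoded bytes of a line that began in an earlier block
    while offset > 0:
        size = min(block_size, offset)
        offset -= size
        fh.seek(offset)
        buf = fh.read(size) + remainder
        parts = buf.split(b'\n')
        # parts[0] may continue in the block before this one, so keep it raw
        remainder = parts[0]
        for part in reversed(parts[1:]):
            yield part.decode(encoding)
    yield remainder.decode(encoding)  # the first line of the file


# Same idea as the test above: every block size must give the same result.
sample = b'AAAA\nAA\xe2\x80\xa0AA\nAAAA\n'
expected = sample.decode('utf-8').split('\n')[::-1]
for bs in range(1, len(sample) + 1):
    assert list(reverse_lines_sketch(io.BytesIO(sample), block_size=bs)) == expected
print('all block sizes passed')
```

Concatenating `remainder` onto each new block is quadratic for extremely long lines; that is acceptable for a sketch but worth keeping in mind for real log files.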
This is simply awesome! Thank you so much @sharkykh
Yep. Looks like it. For me running
```diff
-def reverse_readlines(filename, skip_empty=True, append_newline=False, block_size=512 * 1024,
-                      reset_offset=True, encoding='utf-8'):
+def reverse_readlines(filename, skip_empty=True, append_newline=False,
+                      block_size=128 * 1024, encoding='utf-8'):
```
Do you really think there's a reason to lower the block size from 512KB to 128KB?
We only show a maximum of 1000 lines in the log viewer, which is roughly 128KB according to my tests (this can vary depending on the content, of course), so I don't see why we should keep such a high block size. Also, this seems to suggest that 128KB is the default MMAP_THRESHOLD (which doesn't necessarily mean it is more performant, though).
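To put rough numbers on the trade-off, here is a hypothetical helper (`backward_blocks` is not part of medusa) that lists the `(offset, size)` reads a reader makes when it scans a file from the end in fixed-size blocks. It assumes the reader reads a whole block before yielding any line and that the log viewer stops consuming lines after about 1000 of them; under those assumptions the first block is the minimum I/O per request, so a 512KiB block reads about four times more than the ~128KB the viewer needs.

```python
def backward_blocks(file_size, block_size):
    """Return the (offset, size) reads needed to scan a file from end to start."""
    blocks = []
    offset = file_size
    while offset > 0:
        size = min(block_size, offset)  # the read at the start of the file may be short
        offset -= size
        blocks.append((offset, size))
    return blocks


one_mib = 1024 * 1024
for block_size in (128 * 1024, 512 * 1024):
    blocks = backward_blocks(one_mib, block_size)
    # Number of reads for the whole file, and the first (largest-offset) read
    print(block_size, len(blocks), blocks[0])

# 1000 lines in ~128 KiB works out to roughly 131 bytes per line
print(128 * 1024 / 1000)
```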
Okay then, I don't know enough about it, but sure.
I guess it's the same as the shutil monkeypatch: once the block size is larger than X, there's no noticeable performance improvement.
I'll just leave this here: it's a different test, which the old code fails and the new one passes.

```python
# Put in project root
# coding=utf-8
from __future__ import unicode_literals, print_function

data = (
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ\n'
    b'\xe2\x80\xa0\n'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ\n'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ\n'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    b'ABCDEFG\xe2\x80\xa0HIJKLMNOPQRSTUVWXYZ'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ\n'
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
)
unicode_data = data.decode('utf-8').split('\n')
unicode_data.reverse()

with open('test.log', 'wb') as fh:
    fh.write(data)

from medusa.logger import reverse_readlines

# Print block sizes that fail (easier to test split points)
for i in range(1, len(data)):
    try:
        lines = [
            line for line in reverse_readlines('test.log', block_size=i)
        ]
        if lines != unicode_data:
            print('block_size = %d' % i)
            print(sum(len(l) for l in lines), lines)
            print(sum(len(l) for l in unicode_data), unicode_data)
    except UnicodeDecodeError:
        print('failed with block size = %d' % i)
```
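Both tests probe the same failure mode: a block boundary that lands inside a multi-byte character. The sketch below shows it with a naive strategy that decodes each backward block on its own; `failing_block_sizes` is hypothetical and only illustrates the problem, it is not the old medusa implementation. For the 11-byte sample, a boundary at byte 5 or 6 (inside the 3-byte dagger `\xe2\x80\xa0`) makes the decode fail.

```python
def failing_block_sizes(data, encoding='utf-8'):
    """Block sizes for which decoding each backward block independently fails.

    Hypothetical illustration of the bug, not the old medusa code.
    """
    failures = []
    for block_size in range(1, len(data) + 1):
        end = len(data)
        try:
            while end > 0:
                start = max(0, end - block_size)
                data[start:end].decode(encoding)  # raises if a character is cut
                end = start
        except UnicodeDecodeError:
            failures.append(block_size)
    return failures


data = b'ABC\n\xe2\x80\xa0\nDEF'
print(failing_block_sizes(data))  # [1, 2, 3, 5, 6]

# Splitting on b'\n' first is safe: every byte of a multi-byte UTF-8
# character is >= 0x80, so none of them can be 0x0A.
assert all(b >= 0x80 for b in '\u2020'.encode('utf-8'))
```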
General overview of the bug: #6927 (comment)
Fixes #6927
Fixes #5463