Error "Corruption: corrupted compressed block contents" #137

Closed
Avnsx opened this issue Dec 12, 2021 · 22 comments

Comments

@Avnsx

Avnsx commented Dec 12, 2021

I'm trying to read Google Chrome's Local Storage LevelDB, located in %LOCALAPPDATA%\Google\Chrome\User Data\Default\Local Storage\leveldb. If I delete the folder's contents, browse only a couple of websites, and then close Chrome, I can read it with plyvel. But when I browse many websites and then close Chrome, plyvel raises the error in the title. (Chrome has to be closed first: while it is running it locks the database, so other programs cannot read it and the most recent Local Storage changes are not yet saved, unless you make a temporary copy of the folder and read that instead.) How do I solve this issue?

used code:

import plyvel

db = plyvel.DB(r'C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Local Storage\leveldb', compression=None)
tok = db.get(b'mykey').decode('utf-8')  # errors here

I'm using the plyvel for windows 10 fork on python 3.8.10, but I'm very sure it's not the issue and that your most up to date repo, which I can't even install on windows - the most used operating system in the world -, will replicate the exact same behaviour.
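
The "temporary copy" approach mentioned above could look roughly like this. It is a minimal sketch, not a tested recipe for Chrome: `snapshot_leveldb` and `read_key` are hypothetical helper names, and skipping the `LOCK` file assumes LevelDB recreates it when the copy is opened. A copy taken while Chrome is still writing may also be inconsistent.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_leveldb(src_dir):
    """Copy a LevelDB directory into a fresh temp folder and return the copy's path.

    The LOCK file is skipped on purpose: the running browser may hold it,
    and (assumption) LevelDB creates a new one when the copy is opened.
    """
    dst = Path(tempfile.mkdtemp(prefix="leveldb-copy-")) / "leveldb"
    shutil.copytree(src_dir, dst, ignore=shutil.ignore_patterns("LOCK"))
    return dst


def read_key(src_dir, key):
    """Open a snapshot of src_dir with plyvel and return the value for key (or None)."""
    import plyvel  # imported lazily so snapshot_leveldb works without plyvel

    db = plyvel.DB(str(snapshot_leveldb(src_dir)))
    try:
        return db.get(key)
    finally:
        db.close()
```

Reading from the snapshot leaves the original folder untouched, which also makes it a safer place to experiment with repairs.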

@wbolster
Owner

i guess the contents are compressed?

@Avnsx
Author

Avnsx commented Dec 12, 2021

i guess the contents are compressed?

Yeah, that would make sense, since this only happens when a lot of Local Storage data is continuously saved into the leveldb folder. I'm very new to LevelDB. Do you know which compression algorithms are commonly used with it, or have you run into this issue yourself? In theory I should be able to decompress the data and then read it with plyvel again, but how would I figure out the compression algorithm?

@wbolster
Owner

don't specify anything and it will detect and use snappy if needed

@wbolster
Owner

which I can't even install on windows - the most used operating system in the world

also wondering what you're trying to imply here

@Avnsx
Author

Avnsx commented Dec 12, 2021

don't specify anything and it will detect and use snappy if needed

I just read through the documentation again and couldn't find any functionality to decompress. If I understood you correctly, not specifying anything means this(?):

db = plyvel.DB(dbdir)

That still makes the db.get afterwards error out. Even specifying compression='snappy' ends in the same error.

I also used repair_db, and that made my database lose around 80% of its contents, including the key I'm after, which I guess was stored in the compressed parts of the LevelDB?

@wbolster
Owner

wbolster commented Dec 12, 2021

just tried this on (a copy of) this directory on my machine:

$ ls ~/.config/google-chrome/Default/'Local Storage'/leveldb/
000005.ldb  003096.ldb  003097.ldb  003099.ldb  003101.log  003102.ldb  CURRENT  LOCK  LOG  LOG.old  MANIFEST-000001

with these versions:

>>> import plyvel
>>> plyvel.__version__
'1.3.0'
>>> plyvel.__leveldb_version__
'1.22'

which gives me

>>> db = plyvel.DB('db/')
>>> next(iter(db))
(b'META:chrome-extension://...', b'...')

and similarly:

>>> db.get(b'META:chrome://bookmarks')
b'...'

i see data from chrome, though it's chrome's internal binary format so good luck interpreting that.

@Avnsx
Author

Avnsx commented Dec 12, 2021

I have the exact same versions as you, but I can't run your code snippet without getting the error in the title. Why is this happening for me?

Using this repo: https://github.com/AustEcon/plyvel-wheels, with python 3.8.10 and Chrome 96.0.4664.93 for windows 10 64bit

@wbolster
Owner

🤔 perhaps your leveldb build lacks snappy support altogether? (ldd on the .so file will tell you on linux, no clue about other operating systems)
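
For platforms without `ldd`, one rough way to probe for Snappy support from Python is to write highly compressible data and check whether the table files shrink. This is only a sketch under two assumptions: that a LevelDB build without Snappy silently stores blocks uncompressed when asked to compress, and that `compact_range()` flushes the data into table files. `table_bytes`, `looks_compressed`, and `probe_snappy` are hypothetical helper names.

```python
import os
import tempfile


def table_bytes(db_dir):
    """Total size of the table files (.ldb, or .sst in older LevelDB) in db_dir."""
    return sum(
        os.path.getsize(os.path.join(db_dir, name))
        for name in os.listdir(db_dir)
        if name.endswith((".ldb", ".sst"))
    )


def looks_compressed(raw_bytes, on_disk_bytes, max_ratio=0.5):
    """True if the on-disk size is at most max_ratio of the raw payload size."""
    return on_disk_bytes <= raw_bytes * max_ratio


def probe_snappy(n_keys=1000, value=b"a" * 1000):
    """Write very compressible values and report whether the tables shrank."""
    import plyvel  # lazy import: the helpers above work without plyvel

    db_dir = os.path.join(tempfile.mkdtemp(prefix="snappy-probe-"), "db")
    db = plyvel.DB(db_dir, create_if_missing=True, compression="snappy")
    with db.write_batch() as batch:
        for i in range(n_keys):
            batch.put(b"key-%06d" % i, value)
    db.compact_range()  # assumption: this moves the data from the log into tables
    db.close()
    return looks_compressed(n_keys * len(value), table_bytes(db_dir))
```

If `probe_snappy()` returns `False`, the build probably cannot decompress Snappy blocks either, which would match the confusing corruption message.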

@Avnsx
Author

Avnsx commented Dec 12, 2021

🤔 perhaps your leveldb build lacks snappy support altogether? (ldd on the .so file will tell you on linux, no clue about other operating systems)

First off, thanks a lot for taking the time to help me, I really appreciate it 👍 I spent the entire time trying to reproduce this and eventually switched to a virtual machine with a fresh Windows 10, English Chrome, and Python 3.10.1.

There I could run your snippet without issues at first. But later I figured out that some kind of additional compression kicks in once the Local Storage folder grows above 800 KB: all the files that were in the folder before 800 KB was reached get merged into a single file, which ends up being between 200 and 300 KB.

You can reproduce this yourself if you delete all contents of the Local Storage leveldb folder and then browse enough websites until 800 KB is reached.

I prepared this short command for the Run dialog (Windows + R). It reliably fills the Local Storage LevelDB with enough data; once the additional compression kicks in and you try to read the database with plyvel, you get the error from the title.

chrome de-de.facebook.com ups.com www.asds.net twitter.com www.wattpad.com de.wikipedia.org www.facebook.com www.instagram.com www.reddit.com www.apple.com vimeo.com www.google.com www.twitch.tv www.youtube.com

python code I used afterwards to read the leveldb after additional compression:

import plyvel
db = plyvel.DB(r'C:\Users\MyUserName\Appdata\Local\Google\Chrome\user data\Default\Local Storage\leveldb')
for each in db.iterator():print(each)

I don't understand what compression is used for this. Wikipedia says LevelDB only uses Snappy compression, and Chrome is listed as using LevelDB, so what is Chrome doing after 800 KB to cause this error?

i see data from chrome, though it's chrome's internal binary format so good luck interpreting that.

Also, this was not the case for me: not everything uses the internal binary format. Around 90% of it was always visible to me as raw, human-readable strings.

@wbolster
Owner

do you have a stack trace? which call fails exactly and where?

never heard of two types of compression in leveldb. missing snappy lib leads to compression error messages that can be confusing. plyvel linux binary wheels accidentally suffered from that at some point in the past
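
To narrow down which call fails and where, something like the following sketch could record how far iteration gets before the error. `read_until_error` and `scan_keys` are hypothetical helpers; the only plyvel names used are `plyvel.DB`, `DB.iterator(include_value=False)`, and the base exception `plyvel.Error`.

```python
def read_until_error(iterable, exc_types=(Exception,)):
    """Consume iterable and return (items_read, error), where error is None on success."""
    items = []
    try:
        for item in iterable:
            items.append(item)
    except exc_types as exc:
        return items, exc
    return items, None


def scan_keys(db_dir):
    """Iterate over all keys in db_dir and report the last key read before any failure."""
    import plyvel  # lazy import so read_until_error can be used on its own

    db = plyvel.DB(db_dir)
    try:
        keys, err = read_until_error(db.iterator(include_value=False), (plyvel.Error,))
    finally:
        db.close()
    if err is not None:
        print("failed after", len(keys), "keys; last good key:", keys[-1] if keys else None)
        print("error:", err)
    return keys, err
```

The last good key hints at which table file holds the first unreadable block.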

@Avnsx
Author

Avnsx commented Dec 12, 2021

do you have a stack trace? which call fails exactly and where?

never heard of two types of compression in leveldb. missing snappy lib leads to compression error messages that can be confusing. plyvel linux binary wheels accidentally suffered from that at some point in the past

Short video I recorded: https://youtu.be/vBLqgjMJelw
It does not show how the files went from below 800 KB to a single file, because the links I selected in the first run did not store enough Local Storage data to get near the 800 KB limit; that run ended up at 7 KB.

The second time I ran the same code, after opening many more websites, which loaded more than 800 KB into the Local Storage leveldb folder, the extra compression kicked in and the total file size dropped to 247 KB. I also checked that the data must be compressed rather than lost: websites that only use Local Storage for account information such as a token still remembered me after the compression. So Chrome somehow decompresses it and feeds it back to Local Storage in the browser, which you can see under F12 > Application tab > Local Storage.

Here's the LevelDB that makes plyvel raise the error: https://easyupload.io/j2z8mr

Traceback (most recent call last):
  File "C:\Users\Rando\Desktop\dog.py", line 3, in <module>
    for each in db.iterator():print(each)
  File "plyvel\_plyvel.pyx", line 841, in plyvel._plyvel.Iterator.__next__
  File "plyvel\_plyvel.pyx", line 886, in plyvel._plyvel.Iterator.real_next
  File "plyvel\_plyvel.pyx", line 91, in plyvel._plyvel.raise_for_status
plyvel._plyvel.CorruptionError: b'Corruption: corrupted compressed block contents'

Since chrome is based on chromium, I guess this might help if you understand C++ because I don't https://github.com/chromium/chromium/search?q=.ldb

@wbolster

@wbolster
Owner

wbolster commented Dec 13, 2021

i tried this on my chrome profile's local storage database which is >10 mb large, and i cannot reproduce at all:

import plyvel
db = plyvel.DB('db')
print(list(db))

this dumps lots of stuff to the screen.

my plyvel is compiled with libsnappy support, as ldd on the installed .so files shows:

$ ldd .direnv/python-3.9.7+/lib/python3.9/site-packages/plyvel/_plyvel.cpython-39-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007ffc64941000)
	libleveldb-44f63a48.so.1.22.0 => /tmp/foo/.direnv/python-3.9.7+/lib/python3.9/site-packages/plyvel/../plyvel.libs/libleveldb-44f63a48.so.1.22.0 (0x00007f81c448b000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f81c424d000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f81c4109000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f81c40ee000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f81c40cd000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f81c3eff000)
	libsnappy-63ba3ec5.so.1.1.8 => /tmp/foo/.direnv/python-3.9.7+/lib/python3.9/site-packages/plyvel/../plyvel.libs/libsnappy-63ba3ec5.so.1.1.8 (0x00007f81c3ce1000)
	/usr/lib64/ld-linux-x86-64.so.2 (0x00007f81c492a000)

@wbolster
Owner

i tried the same on your sample database, and it also worked fine:

>>> import plyvel
>>> import pprint
>>> db = plyvel.DB('leveldb')
>>> pprint.pprint(list(db.iterator(include_value=False)))
[b'META:https://de.wikipedia.org',
 b'META:https://vimeo.com',
 b'META:https://www.apple.com',
 ...  # snip
 b'_https://www.youtube.com\x00\x01yt.innertube::nextId',
 b'_https://www.youtube.com\x00\x01yt.innertube::requests',
 b'_https://www.youtube.com\x00\x01ytidb::LAST_RESULT_ENTRY_KEY']
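
The keys in this listing follow a visible pattern: `META:<origin>` entries, and `_<origin>\x00<one byte><name>` entries. A small sketch for splitting them is below. The meaning of the byte after `\x00` is an assumption (0 taken as UTF-16-LE, 1 as Latin-1) and is not confirmed anywhere in this thread; `parse_local_storage_key` is a hypothetical helper.

```python
def parse_local_storage_key(raw):
    """Split a raw Local Storage key into (kind, origin, name).

    kind is "meta" for b"META:<origin>", "item" for b"_<origin>\\x00<fmt><name>",
    and "other" for anything that does not match either pattern.
    """
    if raw.startswith(b"META:"):
        return ("meta", raw[len(b"META:"):].decode("utf-8", errors="replace"), None)
    if raw.startswith(b"_") and b"\x00" in raw:
        origin, rest = raw[1:].split(b"\x00", 1)
        fmt, payload = rest[:1], rest[1:]
        if fmt == b"\x00":
            # Assumed format byte 0: UTF-16-LE text.
            name = payload.decode("utf-16-le", errors="replace")
        else:
            # Assumed format byte 1 (and fallback): Latin-1 text.
            name = payload.decode("latin-1")
        return ("item", origin.decode("utf-8", errors="replace"), name)
    return ("other", None, None)


print(parse_local_storage_key(b"_https://www.youtube.com\x00\x01yt.innertube::nextId"))
```

Values appear to carry a similar leading byte, but that is a guess as well, so treat decoded output with care.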

@Avnsx
Author

Avnsx commented Dec 13, 2021

i tried the same on your sample database, and it also worked fine:

>>> import plyvel
>>> import pprint
>>> db = plyvel.DB('leveldb')
>>> pprint.pprint(list(db.iterator(include_value=False)))
[b'META:https://de.wikipedia.org',
 b'META:https://vimeo.com',
 b'META:https://www.apple.com',
 ...  # snip
 b'_https://www.youtube.com\x00\x01yt.innertube::nextId',
 b'_https://www.youtube.com\x00\x01yt.innertube::requests',
 b'_https://www.youtube.com\x00\x01ytidb::LAST_RESULT_ENTRY_KEY']

Did you try this with the plyvel for windows version? Maybe @AustEcon did not compile it properly or do you have any idea why I can't run it, because if libsnappy or whatever didn't work at all, I should've been not able to build it / use plyvel on the smaller database in first place right? But as you see it works on the video, unfortunately as soon as it gets bigger and I try to read it with plyvel again I get the error 😨

@wbolster
Owner

my testing was on an up-to-date linux system using the official (built by myself 🙃) plyvel wheel packages. i have not tried on windows, and i cannot / will not either; i have not used windows at all for ~20 years now.

that said, technically, snappy is an optional dependency for leveldb, but not compiling leveldb against it is setting yourself up for nasty surprises… since it means databases using compression (most of them in the real world!) cannot be opened. i further suspect leveldb+snappy use opportunistic compression, meaning only data that benefits from it gets compressed. this could explain the ‘tipping point’ you see.
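
The opportunistic compression suspected here can be sketched as a simple decision rule. As far as I know, LevelDB's table builder keeps the Snappy output for a block only if it is at least 12.5% smaller than the raw block, and stores the block uncompressed otherwise. In this illustration `zlib` is only a stand-in for Snappy (which is not in the standard library), and `store_block` is a hypothetical function, not LevelDB's API.

```python
import os
import zlib


def store_block(raw):
    """Return (block_type, stored_bytes), keeping compression only if it saves >= 12.5%."""
    compressed = zlib.compress(raw)  # stand-in for Snappy
    if len(compressed) < len(raw) - len(raw) // 8:
        return "compressed", compressed
    return "raw", raw


print(store_block(b"a" * 4096)[0])        # repetitive data compresses well
print(store_block(os.urandom(4096))[0])   # random data does not
```

A build without Snappy can still read the "raw" blocks, so a database may open fine until the first compressed block is encountered.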

@wbolster
Owner

closing since this is very likely not an issue in this repo

@Avnsx
Author

Avnsx commented Jul 7, 2022

Since chrome is based on chromium, I guess this might help if you understand C++ because I don't https://github.com/chromium/chromium/search?q=.ldb

Just a side note: I think it's pretty funny how someone from the Chromium project / Google read through this issue ticket and removed every single line of code that was associated with .ldb. At this point I think they're intentionally trying to dodge decompression of Chrome's LevelDB.

@iamqiz

iamqiz commented Aug 18, 2022

@Avnsx same problem here when using the Windows plyvel build from AustEcon/plyvel-wheels 😂
Using LevelDB on Windows is very difficult 😂
I'm curious why Google doesn't compile LevelDB for Windows 😂

@iamqiz

iamqiz commented Aug 18, 2022

@Avnsx a workaround is to use LevelDB inside WSL on Windows,
see more here:
https://gist.github.com/Aceralon/d94a562840b858adc8585d7e44cbaa96
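
When going the WSL route, the Chrome profile path has to be translated to its Linux-side mount point. A minimal sketch, assuming WSL's default automount root `/mnt` (this is configurable and may differ on a given machine); `windows_to_wsl` is a hypothetical helper:

```python
from pathlib import PureWindowsPath


def windows_to_wsl(path, mount_root="/mnt"):
    """Translate an absolute drive-letter Windows path to its WSL mount path."""
    p = PureWindowsPath(path)
    drive = p.drive.rstrip(":").lower()
    if len(drive) != 1 or not p.root:
        raise ValueError("expected an absolute path with a drive letter: %r" % (path,))
    # p.parts[0] is the anchor (e.g. 'C:\\'); the rest are the path components.
    return "/".join([mount_root, drive, *p.parts[1:]])


print(windows_to_wsl(r"C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Local Storage\leveldb"))
```

Copying the folder into the Linux filesystem before opening it is probably safer than opening it in place under `/mnt`.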

@QGB

QGB commented Aug 31, 2022

Does plyvel have RepairDB?

@zmic

zmic commented Sep 14, 2022

Did you try this with the plyvel for windows version? Maybe @AustEcon did not compile it properly or do you have any idea why I can't run it, because if libsnappy or whatever didn't work at all, I should've been not able to build it / use plyvel on the smaller database in first place right? But as you see it works on the video, unfortunately as soon as it gets bigger and I try to read it with plyvel again I get the error 😨

FYI, I can confirm this problem occurs with a Chromium database if your build of leveldb does not link in the snappy library. I had to rebuild leveldb with snappy (on Windows), then the problem disappeared.

@Avnsx
Author

Avnsx commented Nov 21, 2022

had to rebuild leveldb with snappy (on Windows), then the problem disappeared.

Can you publish your build, so I can try it? @zmic
