Trouble parsing .mca region files #1

JeremyFlorence · 2018-04-08T06:17:00Z

I am trying to read the NBT data from region files, but the output for every file seems to be an empty dictionary.

Here's an example of what I'm getting using the CLI, though this also happens when reading .mca files from Python code.

$ nbt -r r.-1.-1.mca --plain
{}


vberlier · 2018-04-12T10:35:35Z

.mca files are not NBT files. The Region File Format uses several distinct regions of binary data to store Minecraft chunks, and some of these regions contain compressed NBT data. You can't use nbtlib to read .mca files directly. I'm planning to write a Python 3 alternative to pymclevel based on nbtlib, but for now you'll need to extract the NBT regions from the .mca files yourself and feed them to nbtlib manually.
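As a rough illustration of extracting the NBT regions by hand, the sketch below walks the location table of a region file and yields each chunk's decompressed NBT bytes. It assumes the layout given in the public Region/Anvil format documentation (a 4096-byte location table of 1024 four-byte entries, then a per-chunk header with a 4-byte length and a 1-byte compression type). `iter_chunk_nbt` is a hypothetical helper name, not part of nbtlib.

```python
import gzip
import struct
import zlib

SECTOR_SIZE = 4096


def iter_chunk_nbt(region_bytes):
    """Yield ((x, z), raw_nbt_bytes) for every chunk stored in a region file.

    Hypothetical helper: `region_bytes` is the full content of an .mca file.
    """
    for index in range(1024):
        # Location entry: 3-byte big-endian sector offset + 1-byte sector count.
        entry = region_bytes[index * 4:index * 4 + 4]
        offset = int.from_bytes(entry[:3], "big")
        sector_count = entry[3]
        if offset == 0 and sector_count == 0:
            continue  # this chunk has not been generated
        start = offset * SECTOR_SIZE
        # Chunk header: 4-byte length (compression byte + payload),
        # followed by a 1-byte compression type.
        length, compression = struct.unpack(">IB", region_bytes[start:start + 5])
        payload = region_bytes[start + 5:start + 4 + length]
        if compression == 1:
            data = gzip.decompress(payload)
        elif compression == 2:
            data = zlib.decompress(payload)
        else:
            raise ValueError(f"unsupported compression type {compression}")
        # Entries are ordered as index = x + z * 32 within the region.
        yield (index % 32, index // 32), data


# Usage sketch (file name taken from the report above):
# with open("r.-1.-1.mca", "rb") as f:
#     for pos, raw in iter_chunk_nbt(f.read()):
#         print(pos, len(raw))
```

Each yielded byte string is uncompressed NBT, so it could be wrapped in `io.BytesIO` and handed to nbtlib's parser (for example `nbtlib.File.parse`, assuming the installed version exposes it).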

MestreLion · 2019-12-08T08:30:17Z

First of all, huge thanks for this library, @vberlier! I was a long-time user of pymclevel for my hobby projects, but its API is bad and its code is even worse, not to mention it's stuck in Python 2. Your API looks clean and elegant, I love the way tags are subclasses of Python's builtins, and the code is very high-quality. I'll replace pymclevel and use this as my backend from now on. You're a godsend!

That said...

> .mca files are not nbt files. The Region File Format uses several distinct regions of binary data to store minecraft chunks and some of these regions contain compressed nbt data. You can't use nbtlib to read .mca files directly. I'm planning to write a python 3 alternative to pymclevel based on nbtlib but for now you'll need to extract the nbt regions from the .mca files yourself and feed them to nbtlib manually.

What's the status of this .mca reader? Is there any prototype or draft? Would you like me to implement one and send a PR? If so, any recommendations on the API? Any preferences on module, class and method names? Any suggestions on method / function signatures?

vberlier · 2019-12-10T09:49:25Z

Thanks for the feedback! It means a lot to me :)

I've been thinking about writing a pythonic alternative to pymclevel ever since I started working on nbtlib. I have a bunch of ideas when it comes to API design but I've had to work on other things and haven't had enough time left to get started with the project and truly commit to it.

I guess if there's enough interest I could try to finally begin working on it.

MestreLion · 2019-12-12T05:20:00Z

Take a look at this WIP, already fully functional for reading and saving region files:

https://github.com/MestreLion/mcworldlib/blob/master/mcworldlib/region.py

I tried my best to follow nbtlib's API: RegionFile has .load(filename) for file paths, .from_buffer(buff) for file-likes, .parse(buff) to do the actual parsing, .save(filename) with optional filename (to save as a different file) and .write(buff) to do the actual data write.

It also follows some great nbtlib concepts: RegionFile is a dict (actually an abc.MutableMapping, which has the same functionality but fewer caveats) of RegionChunks, and Chunk and RegionChunk are nbtlib.Compound. So you can do things like:

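import mcworldlib as mc  # assumed alias; the original snippet only shows `mc`
from nbtlib import Path

region = mc.load('somefile.mca')
for pos, chunk in region.items():
    chunk[Path('"".Level.x.y...')] = ...
del region[(0, 1)]
region.update(another_region)
if (3, 4) in region:
    print(region.get((5, 6)))  # keys are (x, z) tuples
    print(chunk['']['DataVersion'])
region.save()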

And so on...
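To illustrate why an abc.MutableMapping gives dict-like behavior with fewer caveats, here is a minimal sketch. It is not the mcworldlib implementation: `RegionSketch` and its coordinate check are hypothetical and only stand in for a container of chunks keyed by (x, z). Because the inherited `update()` is built on `__setitem__`, the validation below runs for every insertion path, which a plain `dict` subclass would not guarantee.

```python
from collections.abc import MutableMapping


class RegionSketch(MutableMapping):
    """Hypothetical region container: chunks keyed by (x, z) tuples."""

    def __init__(self, chunks=None):
        self._chunks = {}
        if chunks:
            self.update(chunks)  # inherited, goes through __setitem__

    def __getitem__(self, pos):
        return self._chunks[pos]

    def __setitem__(self, pos, chunk):
        x, z = pos
        # A region holds 32 x 32 chunks, so reject out-of-range coordinates.
        if not (0 <= x < 32 and 0 <= z < 32):
            raise ValueError(f"chunk position out of range: {pos}")
        self._chunks[pos] = chunk

    def __delitem__(self, pos):
        del self._chunks[pos]

    def __iter__(self):
        return iter(self._chunks)

    def __len__(self):
        return len(self._chunks)


region = RegionSketch({(0, 1): {"DataVersion": 1}})
region.update({(3, 4): {"DataVersion": 2}})
print((3, 4) in region, region.get((5, 6)))
```

Only the five abstract methods are written by hand; `items()`, `get()`, `update()`, `in` and the rest come from the base class.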

It was intentionally designed to integrate nicely with nbtlib, so if you want I can prepare a version for a PR. As part of nbtlib, I would adjust the imports from absolute to relative, possibly rename load() to load_region() (or maybe do so in your __init__.py), change the license to MIT, etc.

My goal with MCWorldLib is to resurrect my former PyMCToolsLib, now using Python 3 and nbtlib as its backend instead of the venerable but defunct pymclevel.

ch-yx · 2019-12-12T07:24:07Z

I heard that this method has been fixed by Mojang. https://youtu.be/uw7vEGhKoH8 Is that true? I don't think it can be fixed without changing the .mca format. 🤔

MestreLion · 2019-12-12T08:46:28Z

> I heard that this method has been fixed by Mojang. https://youtu.be/uw7vEGhKoH8 Is that true? I don't think it can be fixed without changing the .mca format. 🤔

I didn't watch the whole video, but it seems to be discussing the 1 MB limit per chunk, correct? Well, the only thing that really imposes this limit is the 1-byte unsigned chunk sector_count in the region header. The sector count is at most 255 and each sector is 4096 bytes, so each chunk would be at most 255 × 4096 = 1,044,480 bytes, just under 1 MB.

But... this sector_count is completely irrelevant for reading the region and its chunks. Because in the chunk header there's another size information: 4-byte unsigned length, which can express sizes of up to 4GB. And that's the size used to actually read the chunk data.

I've just created an .mca reader and didn't even use sector_count. It's completely redundant as the actual sector count could be derived from length. I only checked to see if both matched, but other than that it can be safely ignored. My tool could easily drop this check and it would read and write chunks over 1MB just fine. So other tools and Minecraft itself could do the same too, no format change required.
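The arithmetic above can be sketched as follows. This is only an illustration under the assumption that a chunk occupies its 4-byte length field plus `length` bytes on disk; `sectors_needed` and `fits_in_location_entry` are hypothetical helper names, not functions from mcworldlib or nbtlib.

```python
SECTOR_SIZE = 4096
MAX_SECTOR_COUNT = 0xFF  # largest value of the 1-byte sector_count field
MAX_CHUNK_BYTES = MAX_SECTOR_COUNT * SECTOR_SIZE  # 1,044,480 bytes


def sectors_needed(length):
    """Sectors occupied by a chunk, derived from its header `length` field.

    Assumes the on-disk size is the 4-byte length field plus `length` bytes,
    rounded up to a whole number of sectors.
    """
    return -(-(4 + length) // SECTOR_SIZE)  # ceiling division


def fits_in_location_entry(length):
    """True if the derived sector count still fits the 1-byte header field."""
    return sectors_needed(length) <= MAX_SECTOR_COUNT


print(MAX_CHUNK_BYTES)
print(sectors_needed(4092), sectors_needed(4093))
```

A reader that trusts only `length` never needs `fits_in_location_entry`; the check matters only to tools that insist on a consistent sector_count.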

vberlier · 2019-12-14T17:11:23Z

Your project looks really good! I'm trying to keep nbtlib focused on dealing with nbt itself instead of its various applications in the Minecraft ecosystem. I think it would definitely make sense for your project to be a package of its own on PyPI 👍

Don't hesitate to open other issues if you think there are things that could be tweaked in nbtlib to make your implementation easier.

ch-yx · 2019-12-17T03:58:22Z

MestreLion/mcworldlib@22e5897#r36412924

I think that nbtlib.File.root is good enough at hiding the unnamed root.

Compound has a lot of methods, and I am afraid that what you plan to do will bring side effects or make it no longer consistent.

Even if you can solve those problems, it's not worth it. 😥

MestreLion · 2019-12-19T12:30:12Z

> Your project looks really good! I'm trying to keep nbtlib focused on dealing with nbt itself instead of its various applications in the Minecraft ecosystem. I think it would definitely make sense for your project to be a package of its own on PyPI 👍

Thanks! It's been my pet project in the last few weeks. I'll surely publish it on PyPI once I have established a functional API.

That said, I still think nbtlib itself should have an mca parser, because Region/Anvil, along with NBT, are the only actual data formats in the Minecraft ecosystem; everything else can be considered data semantics. Adding region file parsing to nbtlib would enable it to handle all the raw data parsing required by any Minecraft tool, so it would become a Python library equivalent of, for example, NBTExplorer. It's a good spot to draw the line and set the scope.

All you need is my RegionFile, you don't even need Chunk, and RegionChunk can be considered an implementation detail, after all a chunk is a Compound tag.

> Don't hesitate to open other issues if you think there are things that could be tweaked in nbtlib to make your implementation easier.

Thanks for such a welcoming response! Right now I've only found 3-4 issues that would definitely make my life easier. In order of importance:

1. Split File functionality into Root (as in root compound tag), inheriting from Compound, making File inherit from Root.
2. In String, do not use the default serialize_tag() behavior from Base.__str__(). str(foo) should give the un-quoted and un-escaped string value, which is really ugly to obtain otherwise.
3. For Root tags, hide the unnamed tag on access so self['a'] acts like the current self.root['a'], as if [''] didn't exist, adding it back only on .write(). @ch-yx already dissuaded me from that, but it's worth pointing it out here.
4. The read/write methods API is somewhat weird.

I'll add issues and PRs on each of them, elaborating more on the rationale and proposals. I'll also add more as I keep using nbtlib as the backend of my MCWorldLib project. So far it's been a huge pleasure to use!

ch-yx · 2019-12-19T15:58:19Z

> In String, do not use the default serialize_tag() behavior from Base.__str__(). str(foo) should give the un-quoted and un-escaped string value, which is really ugly to obtain otherwise.

I have noticed that before. I'm using nbtlib.String("ab\"c") + "" to bypass it.

ch-yx · 2019-12-19T16:02:50Z

But I did not say it because I think that consistency > convenience.
Let @vberlier decide.

ch-yx · 2019-12-19T16:21:53Z

Since we have come to serialize_tag, I did some testing and found something wrong.
str(nbtlib.Byte(4)) == 'Byte(4)b'
int.__str__(nbtlib.Byte(4)) == 'Byte(4)'

nbtlib/nbtlib/literal/serializer.py

Lines 128 to 131 in 0b454b7

    def serialize_numeric(self, tag):
        """Return the literal representation of a numeric tag."""
        str_func = int.__str__ if isinstance(tag, int) else float.__str__
        return str_func(tag) + tag.suffix

Probably due to Python 3.8?

Yes, it is.

float.__str__ is int.__str__ is object.__str__  ==> True

Even int.__str__(4.0) is accepted now.

Python 3.8:

bpo-36793: Removed __str__ implementations from builtin types bool, int, float, complex and few classes from the standard library. They now inherit __str__() from object.

python/cpython@96aeaec
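One possible workaround, shown here only as a sketch and not necessarily the fix nbtlib adopted, is to call the builtin `__repr__` explicitly, since `int.__repr__` and `float.__repr__` were not affected by that change. The `Byte` and `Double` classes below are simplified stand-ins for the nbtlib tags, and `serialize_numeric` is written as a free function for brevity.

```python
class Byte(int):
    """Simplified stand-in for an integer tag with a custom repr."""
    suffix = "b"

    def __repr__(self):
        return f"Byte({int.__repr__(self)})"


class Double(float):
    """Simplified stand-in for a floating-point tag with a custom repr."""
    suffix = "d"

    def __repr__(self):
        return f"Double({float.__repr__(self)})"


def serialize_numeric(tag):
    """Return the literal representation of a numeric tag.

    Uses the builtin __repr__ directly, so the result does not depend on
    whether int.__str__ is inherited from object (Python 3.8+) or not.
    """
    repr_func = int.__repr__ if isinstance(tag, int) else float.__repr__
    return repr_func(tag) + tag.suffix


print(serialize_numeric(Byte(4)))      # 4b
print(serialize_numeric(Double(1.5)))  # 1.5d
```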

MestreLion · 2019-12-20T04:53:11Z

> I have noticed that before. I'm using nbtlib.String("ab\"c") + "" to bypass it.

You can use str.__str__(tag), and that's what I will suggest in my PR to use as String.__str__()
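A small sketch of the difference, using a hypothetical `QuotedString` class as a stand-in for a str subclass whose `__str__` returns a quoted, escaped literal (it is not nbtlib's actual String implementation):

```python
class QuotedString(str):
    """Hypothetical str subclass whose __str__ returns a quoted literal."""

    def __str__(self):
        escaped = self.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


tag = QuotedString('ab"c')

print(str(tag))           # quoted and escaped form from the override
print(str.__str__(tag))   # raw value, bypassing the override
print(tag + "")           # concatenation also yields the raw value
```

Both `str.__str__(tag)` and `tag + ""` return a plain `str` holding the original characters, but the former states the intent more directly.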

MestreLion · 2019-12-20T05:00:16Z

> But I did not say it because I think that consistency > convenience.
> Let @vberlier decide.

I believe the opposite: convenience should trump consistency. But, regardless of our opinions, it seems that it was convenience, not consistency, that drove the decision to use tags' .__str__() as the means to convert them to their snbt representation instead of, say, a .to_snbt() or .serialize() method. It's handy and convenient (for those who care about snbt, at least), even if it breaks some consistency with Python conventions (for example, collections such as dict and list usually call repr(), not str(), on each member in their __str__()).

MestreLion · 2019-12-20T05:03:09Z

Let me create the issues so we can discuss each one separately without flooding this one, so this can focus solely on whether or not nbtlib should have a built-in mca parser.

vberlier · 2019-12-24T17:35:18Z

Alright, I'm going to look into all of this, thank you for contributing! And now that I think about it, a built-in mca parser could definitely belong in something like nbtlib.contrib. I think you're right: one of the strengths of the Python ecosystem is the batteries-included philosophy, and popular projects tend to include a contrib package for common use cases and utilities.

Examples:

$ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[{name: "Compound tag #0"}]'
{name: "Compound tag #0", created-on: 1264099775885L}

$ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[0]' --json
{"name": "Compound tag #0", "created-on": 1264099775885}

$ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[].name' --json
"Compound tag #0"
"Compound tag #1"

$ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[].name' --unpack
Compound tag #0
Compound tag #1

vberlier closed this as completed Apr 12, 2018

Worf2340 mentioned this issue Jul 14, 2019

Help with Minecraft Region Data #32

Closed

This was referenced Dec 20, 2019

Move some File functionality to an intermediate Root class #55

Merged

return the string itself in String.__str__(), without quotes or escaping #57

Merged

This was referenced Dec 24, 2019

Python 3.8 support #59

Closed

Add nbtlib.contrib package? #60

Open

ch-yx mentioned this issue Dec 24, 2019

serialize_tag, python 3.8 #61

Closed
