Trouble parsing .mca region files #1

Closed
JeremyFlorence opened this issue Apr 8, 2018 · 16 comments

@JeremyFlorence

I am trying to read the NBT data from region files, but the output for every file seems to be an empty dictionary.

Here's an example of what I'm getting using the CLI, though this also happens when reading .mca files from Python code.

$ nbt -r r.-1.-1.mca --plain
{}

@vberlier
Owner

.mca files are not NBT files. The Region File Format uses several distinct regions of binary data to store Minecraft chunks, and some of these regions contain compressed NBT data. You can't use nbtlib to read .mca files directly. I'm planning to write a Python 3 alternative to pymclevel based on nbtlib, but for now you'll need to extract the NBT regions from the .mca files yourself and feed them to nbtlib manually.
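A minimal sketch of the manual extraction described above, assuming the publicly documented Region file layout: an 8 KiB header whose first 4 KiB holds 1024 location entries (a 3-byte sector offset plus a 1-byte sector count each), and chunks stored as a 4-byte big-endian length, a 1-byte compression type, then the compressed NBT. The function name `read_chunk_payload` is hypothetical and not part of nbtlib; it uses only the standard library and returns the decompressed NBT bytes.

```python
import gzip
import io
import struct
import zlib

SECTOR_SIZE = 4096  # region files are organised in 4 KiB sectors


def read_chunk_payload(region, cx, cz):
    """Return the decompressed NBT bytes of chunk (cx, cz), or None if absent.

    `region` is a seekable binary file-like object opened on an .mca file.
    Hypothetical helper for illustration; not an nbtlib function.
    """
    # Each region holds 32x32 chunks; the location table is indexed by
    # the chunk coordinates modulo 32.
    index = (cx % 32) + (cz % 32) * 32
    region.seek(4 * index)
    entry = region.read(4)
    if len(entry) < 4:
        return None
    offset = int.from_bytes(entry[:3], "big")  # in sectors, from file start
    sector_count = entry[3]
    if offset == 0 and sector_count == 0:
        return None  # chunk has not been generated or saved

    region.seek(offset * SECTOR_SIZE)
    # `length` counts the compression byte plus the compressed data.
    length, compression = struct.unpack(">IB", region.read(5))
    data = region.read(length - 1)
    if compression == 1:
        return gzip.decompress(data)
    if compression == 2:
        return zlib.decompress(data)
    raise ValueError(f"unsupported compression type: {compression}")
```

The returned bytes are an uncompressed NBT stream, so they can presumably be handed to nbtlib through a file-like wrapper, for example `nbtlib.File.parse(io.BytesIO(payload))`; check the nbtlib version in use for the exact entry point.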

@MestreLion
Contributor

First of all, huge thanks for this library, @vberlier! I was a long-time user of pymclevel for my hobby projects, but its API is bad and its code is even worse, not to mention it's stuck on Python 2. Your API looks clean and elegant, I love the way tags are subclasses of Python's builtins, and the code is very high-quality. I'll replace pymclevel and use this as my backend from now on. You're a godsend!

That said...

> .mca files are not NBT files. The Region File Format uses several distinct regions of binary data to store Minecraft chunks, and some of these regions contain compressed NBT data. You can't use nbtlib to read .mca files directly. I'm planning to write a Python 3 alternative to pymclevel based on nbtlib, but for now you'll need to extract the NBT regions from the .mca files yourself and feed them to nbtlib manually.

What's the status of this .mca reader? Is there any prototype or draft? Would you like me to implement one and send a PR? If so, any recommendations on the API? Any preferences on module, class and method names? Any suggestions on method / function signatures?

@vberlier
Owner

Thanks for the feedback! It means a lot to me :)

I've been thinking about writing a pythonic alternative to pymclevel ever since I started working on nbtlib. I have a bunch of ideas when it comes to API design but I've had to work on other things and haven't had enough time left to get started with the project and truly commit to it.

I guess if there's enough interest I could try to finally begin working on it.

@MestreLion
Contributor

MestreLion commented Dec 12, 2019

Take a look at this WIP, already fully functional for reading and saving region files:

https://github.com/MestreLion/mcworldlib/blob/master/mcworldlib/region.py

I tried my best to follow nbtlib's API: RegionFile has .load(filename) for file paths, .from_buffer(buff) for file-likes, .parse(buff) to do the actual parsing, .save(filename) with optional filename (to save as a different file) and .write(buff) to do the actual data write.

It also follows some great nbtlib concepts: RegionFile is a dict (actually an abc.MutableMapping, which has the same functionality but fewer caveats) of RegionChunks, and Chunk and RegionChunk are nbtlib.Compound. So you can do things like:

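import mcworldlib as mc  # assumed import alias for MCWorldLib
from nbtlib import Path

region = mc.load('somefile.mca')
for pos, chunk in region.items():
    chunk[Path('"".Level.x.y...')] = ...
del region[(0, 1)]
region.update(another_region)  # another_region: a second, already loaded region
if (3, 4) in region:
    print(region.get((5, 6)))  # keys are (x, z) tuples
    print(chunk['']['DataVersion'])
region.save()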

And so on...
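The mapping-style design described above can be illustrated with a small, hypothetical sketch. `RegionSketch` is not MCWorldLib's `RegionFile`; it only shows why `collections.abc.MutableMapping` can have fewer caveats than subclassing `dict`: the mixin methods such as `update()` go through `__setitem__`, so a key check written once applies everywhere. The 32x32 bound assumes positions are region-relative chunk coordinates.

```python
from collections.abc import MutableMapping


class RegionSketch(MutableMapping):
    """Hypothetical stand-in for a region: maps (x, z) positions to chunks."""

    def __init__(self, chunks=None):
        self._chunks = {}
        if chunks:
            self.update(chunks)  # routed through __setitem__ by the mixin

    def __getitem__(self, pos):
        return self._chunks[pos]

    def __setitem__(self, pos, chunk):
        x, z = pos
        # Assumption: positions are relative to the region (32x32 chunks).
        if not (0 <= x < 32 and 0 <= z < 32):
            raise ValueError(f"chunk position out of range: {pos!r}")
        self._chunks[pos] = chunk

    def __delitem__(self, pos):
        del self._chunks[pos]

    def __iter__(self):
        return iter(self._chunks)

    def __len__(self):
        return len(self._chunks)


region = RegionSketch({(0, 1): {"DataVersion": 2230}})
region.update({(3, 4): {"DataVersion": 2230}})
del region[(0, 1)]
print((3, 4) in region, region.get((5, 6)))  # True None
```

With a plain `dict` subclass, `update()` and the constructor would bypass an overridden `__setitem__`, so the range check above would have to be repeated in each of them.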

It was intentionally designed to integrate nicely with nbtlib, so if you want I can prepare a version for a PR. As part of nbtlib, I would adjust the imports from absolute to relative, possibly rename load() to load_region() (or maybe do so in your __init__.py), change the license to MIT, etc.

My goal with MCWorldLib is to resurrect my former PyMCToolsLib, now using Python 3 and nbtlib as its backend instead of the venerable but defunct pymclevel.

@ch-yx

ch-yx commented Dec 12, 2019 via email

@MestreLion
Contributor

> I heard that this method has been fixed by Mojang. https://youtu.be/uw7vEGhKoH8 Is that true? I don't think it can be fixed without changing the .mca format.

I didn't watch the whole video, but it seems to be discussing the 1MB limit for chunks, correct? Well, the only thing that really imposes this limit is the 1-byte unsigned chunk sector_count in the region header. The sector count is at most 255, and each sector is 4096 bytes, so each chunk can take at most about 1MB (255 × 4096 = 1,044,480 bytes).

But... this sector_count is completely irrelevant for reading the region and its chunks, because the chunk header carries another size field: a 4-byte unsigned length, which can express sizes of up to 4GB. And that's the size used to actually read the chunk data.

I've just created an .mca reader and didn't even use sector_count. It's completely redundant as the actual sector count could be derived from length. I only checked to see if both matched, but other than that it can be safely ignored. My tool could easily drop this check and it would read and write chunks over 1MB just fine. So other tools and Minecraft itself could do the same too, no format change required.
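The arithmetic behind this argument can be sketched as follows. The helper names are hypothetical, and the sketch assumes the chunk's `length` field counts the compression-type byte plus the compressed data, so the on-disk size is `4 + length` bytes.

```python
SECTOR_SIZE = 4096      # bytes per region sector
MAX_SECTOR_COUNT = 255  # largest value a 1-byte unsigned sector_count can hold


def sectors_needed(length):
    """Sectors occupied by a chunk whose header `length` field is `length`.

    The 4-byte length field itself is stored in front of the `length`
    bytes it describes, hence the `4 +`. Rounds up to whole sectors.
    """
    return -(-(4 + length) // SECTOR_SIZE)


def fits_sector_count(length):
    """True if the derived sector count still fits the 1-byte header field."""
    return sectors_needed(length) <= MAX_SECTOR_COUNT


print(MAX_SECTOR_COUNT * SECTOR_SIZE)      # 1044480
print(sectors_needed(5000))                # 2
print(fits_sector_count(2 * 1024 * 1024))  # False
```

A reader that trusts `length` alone never needs `sector_count`; only a consistency check between the two fields would reject the 2 MiB case above.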

@vberlier
Owner

Your project looks really good! I'm trying to keep nbtlib focused on dealing with nbt itself instead of its various applications in the Minecraft ecosystem. I think it would definitely make sense for your project to be a package of its own on PyPI 👍

Don't hesitate to open other issues if you think there are things that could be tweaked in nbtlib to make your implementation easier.

@ch-yx

ch-yx commented Dec 17, 2019

MestreLion/mcworldlib@22e5897#r36412924

I think that nbtlib.File.root is good enough at hiding the unnamed root.

Compound has a lot of methods, and I am afraid that what you plan to do will bring side effects or make it inconsistent.

Even if you can solve those problems, it's not worth it. 😥

@MestreLion
Contributor

> Your project looks really good! I'm trying to keep nbtlib focused on dealing with nbt itself instead of its various applications in the Minecraft ecosystem. I think it would definitely make sense for your project to be a package of its own on PyPI 👍

Thanks! It's been my pet project in the last few weeks. I'll surely publish it on PyPI once I have established a functional API.

That said, I still think nbtlib itself should have an mca parser, because Region/Anvil, along with NBT, are the only actual data formats in the Minecraft ecosystem; everything else can be considered data semantics. Adding region file parsing to nbtlib would enable it to handle all the raw data parsing required by any Minecraft tool, making it a Python library equivalent of, for example, NBTExplorer. It's a good spot to draw the line and set the scope.

All you need is my RegionFile; you don't even need Chunk, and RegionChunk can be considered an implementation detail. After all, a chunk is a Compound tag.

> Don't hesitate to open other issues if you think there are things that could be tweaked in nbtlib to make your implementation easier.

Thanks for such a welcoming response! Right now I've only found 3-4 issues that would definitely make my life easier. In order of importance:

  • Split File functionality into Root (as in root compound tag), inheriting from Compound, making File inherit from Root.

  • In String, do not use the default serialize_tag() behavior from Base.__str__(). str(foo) should give the un-quoted and un-escaped string value, which is really ugly to obtain otherwise.

  • For Root tags, hide the unnamed tag on access so self['a'] acts like the current self.root['a'], as if [''] didn't exist, adding it back only on .write(). @ch-yx already dissuaded me from that, but it's worth pointing out here.

  • Read/Write methods API is somewhat weird.

I'll add issues and PRs for each of them, elaborating on the rationale and proposals. I'll also add more as I keep using nbtlib as the backend of my MCWorldLib project. So far it's been a huge pleasure to use!

@ch-yx

ch-yx commented Dec 19, 2019

> In String, do not use the default serialize_tag() behavior from Base.__str__(). str(foo) should give the un-quoted and un-escaped string value, which is really ugly to obtain otherwise.

I have noticed that before. I'm using nbtlib.String("ab\"c") + "" to bypass it.

@ch-yx

ch-yx commented Dec 19, 2019

But I did not bring it up because I think that consistency > convenience.
Let @vberlier decide.

@ch-yx

ch-yx commented Dec 19, 2019

Since we have come to serialize_tag, I did some testing and found something wrong:

str(nbtlib.Byte(4)) == 'Byte(4)b'
int.__str__(nbtlib.Byte(4)) == 'Byte(4)'

def serialize_numeric(self, tag):
    """Return the literal representation of a numeric tag."""
    str_func = int.__str__ if isinstance(tag, int) else float.__str__
    return str_func(tag) + tag.suffix

# probably due to Python 3.8?

Yes, it is. In Python 3.8:

float.__str__ is int.__str__ is object.__str__  # ==> True

Even int.__str__(4.0) is accepted now.

From the Python 3.8 changelog:

bpo-36793: Removed __str__ implementations from builtin types bool, int, float, complex and few classes from the standard library. They now inherit __str__() from object.

python/cpython@96aeaec
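The behaviour reported above can be reproduced without nbtlib. This is a minimal sketch using stand-in `Byte` and `Double` classes (hypothetical, not nbtlib's implementation) that override `__repr__` the way the output above suggests. Calling `int.__repr__` / `float.__repr__` explicitly is one possible way to get the bare number both before and after Python 3.8; it is not necessarily the fix nbtlib adopted.

```python
class Byte(int):
    """Stand-in numeric tag with a custom repr, for illustration only."""
    suffix = "b"

    def __repr__(self):
        return f"Byte({int.__repr__(self)})"


class Double(float):
    """Stand-in floating-point tag, for illustration only."""
    suffix = "d"

    def __repr__(self):
        return f"Double({float.__repr__(self)})"


def serialize_numeric(tag):
    """Return the literal representation of a numeric tag.

    Uses the builtin __repr__ slots directly, so the subclass's own
    __repr__ override is never consulted.
    """
    repr_func = int.__repr__ if isinstance(tag, int) else float.__repr__
    return repr_func(tag) + tag.suffix


print(serialize_numeric(Byte(4)))      # 4b
print(serialize_numeric(Double(1.5)))  # 1.5d
```

On Python 3.8 and later, `int.__str__(Byte(4))` resolves to `object.__str__`, which falls back to the subclass's `__repr__`; that matches the `'Byte(4)'` result shown above.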

@MestreLion
Contributor

> I have noticed that before. I'm using nbtlib.String("ab\"c") + "" to bypass it.

You can use str.__str__(tag), and that's what I will suggest in my PR as the implementation of String.__str__()
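A small sketch of the two workarounds mentioned here, using a stand-in `String` class (hypothetical, not nbtlib's) whose `__str__` returns a quoted and escaped form. Both `str.__str__(tag)` and `tag + ""` bypass the override and yield a plain `str` holding the raw value.

```python
class String(str):
    """Stand-in string tag whose str() is a quoted, escaped literal."""

    def __str__(self):
        escaped = self.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


tag = String('ab"c')

quoted = str(tag)             # goes through the overridden __str__
raw_slot = str.__str__(tag)   # calls the builtin slot directly
raw_concat = tag + ""         # concatenation returns a plain str

print(quoted)
print(raw_slot)
print(raw_concat)
```

The explicit `str.__str__(tag)` call states the intent more directly than the concatenation trick, which relies on `+` returning an exact `str`.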

@MestreLion
Contributor

MestreLion commented Dec 20, 2019

> But I did not bring it up because I think that consistency > convenience.
> Let @vberlier decide.

I believe the opposite: convenience should trump consistency. But, regardless of our opinions, it seems it was convenience, not consistency, that drove the decision to use the tags' .__str__() as a means of converting them to their snbt representation, instead of, say, a .to_snbt() or .serialize() method. It's handy and convenient (for those who care about snbt, at least), even if it breaks some consistency with Python conventions (for example, collections such as dict and list usually call repr(), not str(), on each member in their __str__()).

@MestreLion
Contributor

Let me create the issues so we can discuss each one separately without flooding this one, so this can focus solely on whether or not nbtlib should have a built-in mca parser.

@vberlier
Owner

Alright, I'm going to look into all of this. Thank you for contributing! And now that I think about it, a built-in mca parser could definitely belong in something like nbtlib.contrib. I think you're right: one of the strengths of the Python ecosystem is the batteries-included philosophy, and popular projects tend to include a contrib package for common use cases and utilities.

This was referenced Dec 24, 2019
vberlier added a commit that referenced this issue Dec 24, 2020
Examples:

    $ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[{name: "Compound tag #0"}]'
    {name: "Compound tag #0", created-on: 1264099775885L}

    $ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[0]' --json
    {"name": "Compound tag #0", "created-on": 1264099775885}

    $ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[].name' --json
    "Compound tag #0"
    "Compound tag #1"

    $ nbt -r tests/nbt_files/bigtest.nbt --path 'Level."listTest (compound)"[].name' --unpack
    Compound tag #0
    Compound tag #1
4 participants