B-tree v2 for links from LINK_INFO messages #47

Merged
merged 8 commits into from
May 9, 2022

Conversation

woutdenolf
Collaborator

Closes #46

@woutdenolf woutdenolf changed the title WIP: Prepare dataobjects for B-tree V2. B-Tree class inheritance. WIP: B-tree v2 for links from LINK_INFO messages Mar 18, 2021
@woutdenolf
Collaborator Author

woutdenolf commented Apr 6, 2021

I've been stuck for a while on reading a fractal heap

import h5py
import pyfive

filename = "test.h5"
n = 10
with h5py.File(filename, mode="w", track_order=True) as f:
    for i in range(n):
        f.create_group(str(i)*10)

with pyfive.File(filename) as f:
    assert len(f.keys()) == n

prints this

FRACTAL HEAP DIRECT BLOCK
 h5debug test.h5 8909 6917 512
OrderedDict([('signature', b'FHDB'),
             ('version', 0),
             ('heap_header_adddress', 6917),
             ('block_offset', 0),
             ('checksum', 436527222)])
 header size 21
 data size 491
 managed object ID size 4
 block data  010400000000000000000a3030303030 ...
 first managed object (0, 0, 1) address 4 size 0

B-TREE V2 RECORDS
[{'namehash': 1032326176, 'objectid': 31885837240576},
 {'namehash': 1328140396, 'objectid': 31885837248000},
 {'namehash': 1679198546, 'objectid': 31885837255424},
 {'namehash': 1779768826, 'objectid': 31885837210880},
 {'namehash': 2418919233, 'objectid': 31885837233152},
 {'namehash': 2529727048, 'objectid': 31885837262848},
 {'namehash': 3052067229, 'objectid': 31885837270272},
 {'namehash': 3140234048, 'objectid': 31885837218304},
 {'namehash': 3744768751, 'objectid': 31885837277696},
 {'namehash': 3976419996, 'objectid': 31885837225728}]
h5debug test.h5 7221 7063 10

B-TREE V2 RECORDS
[{'creationorder': 0, 'objectid': 31885837210880},
 {'creationorder': 1, 'objectid': 31885837218304},
 {'creationorder': 2, 'objectid': 31885837225728},
 {'creationorder': 3, 'objectid': 31885837233152},
 {'creationorder': 4, 'objectid': 31885837240576},
 {'creationorder': 5, 'objectid': 31885837248000},
 {'creationorder': 6, 'objectid': 31885837255424},
 {'creationorder': 7, 'objectid': 31885837262848},
 {'creationorder': 8, 'objectid': 31885837270272},
 {'creationorder': 9, 'objectid': 31885837277696}]
h5debug test.h5 7733 7101 10

I'm assuming that the data of a direct block in a fractal heap is a sequence of object IDs, but that doesn't make sense when I see this:

block data 010400000000000000000a3030303030 ...

which results in first managed object (0, 0, 1) address 4 size 0 (3030303030... is the start of the first object's data: the name "0000000000" of an HDF5 group).

Also, I have no idea what the "objectid" in the B-tree is supposed to represent (it's not an address; the numbers are too big). It refers to an object managed in the fractal heap, but I'm not sure how. Each record (and hence each fractal heap object) corresponds to one HDF5 group (there are 10 HDF5 groups in the example).

Any help @jjhelmus would be appreciated.

@bmaranville
Collaborator

bmaranville commented Jul 9, 2021

I think the objectid should not be parsed as an int (as is currently being done in BTreeV2GroupNames._parse_record and BTreeV2GroupOrders._parse_record)

For example, for the first (name=='0000000000') group in test.h5 the objectid as bytes is [0, 21, 0, 0, 0, 29, 0], where 21 is the offset of the data in the fractal heap and 29 is the length of the data.

For the second group the objectid is [0, 50, 0, 0, 0, 29, 0]...
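As a quick check of the byte view described above, the integers printed in the "creationorder" B-tree records can be turned back into 7 bytes. This is only a sketch: the helper name `objectid_to_bytes` is made up for illustration, and it assumes the integer was read in little-endian order, which is what reproduces the byte lists quoted here.

```python
def objectid_to_bytes(value, width=7):
    """Return the heap ID bytes behind an objectid that was parsed as an int.

    Assumes a little-endian read of a 7-byte field (hypothetical helper,
    not part of pyfive).
    """
    return value.to_bytes(width, "little")


# objectid values of creation order 0 and 1 from the records printed above
for value in (31885837210880, 31885837218304):
    print(list(objectid_to_bytes(value)))
```

The two printed lists are the ones quoted in this comment: offset 21 and length 29 for the first group, offset 50 and length 29 for the second.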

I think the referenced data is not a fractal heap object ID, but seems to be 0x01 and 0x04 followed by a unique id in the third byte (0, 1, 2, 3... for the 10 objects) and the data size == 10 in byte 11, followed by the data (and a checksum after the data?) with some more zero padding after that for unknown reasons. I was not able to match this up to anything in the HDF5 specification yet.

EDIT: it does match the Link Message as far as I can see:
byte 0: version = 1
byte 1: flags, indicating hard link with "size of the Length of Link Name field is 1 byte" and "creation order is present"
bytes 2-9: 64-bit int indicating creation order
byte 10: length of link name (1 byte, indicating a length of 10)
bytes 11-20: link name (non-null-terminated string) = "0000000000"
bytes 21-28: link info, "The address of the object header for the object that the link points to." = 347
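The hand decoding above can be sketched in Python. This is an illustration under assumptions, not pyfive's implementation: the function name `parse_hard_link_message` is invented, the file's "size of offsets" is assumed to be 8 bytes, only hard links are handled, and the optional link type and character set fields (flag bits 3 and 4) follow my reading of the Link Message layout rather than anything stated in this thread.

```python
def parse_hard_link_message(buf, offset_size=8):
    """Decode a version-1 Link Message describing a hard link (sketch)."""
    version, flags = buf[0], buf[1]
    if version != 1:
        raise ValueError(f"unsupported link message version {version}")
    pos = 2

    link_type = 0  # hard link unless a link type field says otherwise
    if flags & 0x08:  # bit 3: link type field present (assumed from the spec)
        link_type = buf[pos]
        pos += 1

    creation_order = None
    if flags & 0x04:  # bit 2: creation order present (64-bit integer)
        creation_order = int.from_bytes(buf[pos:pos + 8], "little")
        pos += 8

    if flags & 0x10:  # bit 4: character set field present (skipped here)
        pos += 1

    length_size = 1 << (flags & 0x03)  # bits 0-1: 1, 2, 4 or 8 bytes
    name_length = int.from_bytes(buf[pos:pos + length_size], "little")
    pos += length_size
    name = buf[pos:pos + name_length].decode("utf-8")
    pos += name_length

    if link_type != 0:
        raise NotImplementedError("only hard links are handled in this sketch")
    address = int.from_bytes(buf[pos:pos + offset_size], "little")
    return {"creation_order": creation_order, "name": name, "address": address}


# The 29-byte message decoded by hand above, rebuilt field by field.
message = (
    b"\x01\x04"                    # version 1, flags: creation order present
    + (0).to_bytes(8, "little")    # creation order
    + b"\x0a"                      # link name length (10)
    + b"0000000000"                # link name
    + (347).to_bytes(8, "little")  # object header address
)
print(len(message), parse_hard_link_message(message))
```

The rebuilt message is 29 bytes long, which matches the length stored in the heap ID for this group.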

@bmaranville
Collaborator

In FractalHeap, self._managed_object_offset_size is being calculated incorrectly. The "maximum_heap_size" value in the header is log2 of the real value (the spec says so), so self._managed_object_offset_size can be calculated in exactly the same way as block_offset_size = n // 8 + min(n % 8, 1). Since n is a number of bits, this gives the number of bytes needed to represent n bits.

If these fixes are made, then self._managed_object_offset_size = 4 for test.h5, and the objectid returned in _read_node can be parsed correctly as a "Fractal Heap ID for Managed Objects".
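A minimal sketch of that byte-count calculation follows. The helper name `bytes_for_bits` is made up, and the value 32 is only an assumed example of a header value that yields 4 bytes; the thread reports the result for test.h5 but not the header value itself.

```python
def bytes_for_bits(nbits):
    """Number of whole bytes needed to hold nbits bits (rounds up)."""
    return nbits // 8 + min(nbits % 8, 1)


# Assumed example: a header that stores log2(maximum heap size) == 32
print(bytes_for_bits(32))
```

The expression is equivalent to the more common rounding idiom `(nbits + 7) // 8`.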

@woutdenolf
Collaborator Author

I very much appreciate your help @bmaranville ! I'll come back to this asap and try out your correction.

@woutdenolf
Collaborator Author

woutdenolf commented Jul 12, 2021

This indeed fixes one problem. However, what is still unclear from the HDF5 spec is what the "heap ID" actually is:

Version 2 B-tree, Type 5 Record Layout - Link Name for Indexed Group
ID | This is a 7-byte sequence of bytes and is the heap ID for the link record in the group’s fractal heap.

Then when looking at the fractal heap

Fields: Fractal Heap Direct Block
Object Data | This section of the direct block stores the actual data for objects in the heap. The size of this section is determined by the direct block’s size minus the size of the other fields stored in the direct block (for example, the Signature, Version, and others including the Checksum if it is present).

How is the "heap ID" from the b-tree record related to the direct block data of the fractal heap?

I asked on the HDF5 forum: https://forum.hdfgroup.org/t/hdf5-specs-unclear-b-tree-v2-and-fractal-heap/8723

@bmaranville
Collaborator

bmaranville commented Jul 12, 2021

The fractal heap ID in the spec has different forms depending on the type of object: for "tiny" objects the ID contains the data itself; for "huge" objects it contains either a B-tree key or a direct pointer; for "managed" objects (like the ones we've been looking at) it contains the offset (within the fractal heap) and the length of the data. For example, the first complete "link message" is found at exactly offset 21 with length 29 in the fractal heap, which is within the direct data block (though the addressing is from the beginning of the heap, not the data block).
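For the managed case, splitting the ID into offset and length could look like the sketch below. The function name is hypothetical, and the field widths (a 4-byte offset and a 2-byte length after the leading byte) are assumptions taken from the 7-byte IDs seen for test.h5; a real reader would derive both widths from the fractal heap header.

```python
def decode_managed_heap_id(heap_id, offset_size=4, length_size=2):
    """Split a managed-object heap ID into (offset, length).

    Byte 0 carries the version/type bits and is ignored here; the
    offset and length are little-endian integers that follow it.
    """
    start = 1
    offset = int.from_bytes(heap_id[start:start + offset_size], "little")
    start += offset_size
    length = int.from_bytes(heap_id[start:start + length_size], "little")
    return offset, length


# Heap ID bytes of the first group, as quoted earlier in this thread
print(decode_managed_heap_id(bytes([0, 21, 0, 0, 0, 29, 0])))
```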

One thing that remains unclear to me is how you are supposed to know which type of heap ID you are reading - are you supposed to try to match the structure to the various sub-types and then use the first one that matches?

@woutdenolf
Collaborator Author

If we know the offset of the fractal heap and the "heap ID" itself contains an offset (with respect to the heap offset) and a size, then a pure reader doesn't need to analyze the fractal heap at all it seems ...

@bmaranville
Collaborator

But how do we know that the 7-byte "heap ID" in the "Version 2 B-tree, Type 5 Record Layout - Link Name for Indexed Group" corresponds to a managed object ID? I don't see that explicitly stated anywhere in the spec.

@woutdenolf
Collaborator Author

From the first byte of the heap ID (bit ranges on the left):

6-7 | The current version of ID format. This document describes version 0.
4-5 | The ID type. Managed objects have a value of 0.
0-3 | Reserved.

ID type is 0: managed, 1: huge, 2: tiny
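Reading those bit fields from the first byte takes two shifts and masks. The names below are invented for this sketch, and only the managed type value (0) is given a constant, since that is the case the B-tree records in this PR rely on.

```python
HEAP_ID_TYPE_MANAGED = 0  # from the bit table above


def heap_id_version_and_type(first_byte):
    """Return (version, id_type) from the first byte of a heap ID."""
    version = (first_byte >> 6) & 0x03  # bits 6-7
    id_type = (first_byte >> 4) & 0x03  # bits 4-5
    return version, id_type


heap_id = bytes([0, 21, 0, 0, 0, 29, 0])
version, id_type = heap_id_version_and_type(heap_id[0])
print(version, id_type, id_type == HEAP_ID_TYPE_MANAGED)
```

A reader can branch on `id_type` before interpreting the remaining bytes, which avoids having to guess the ID sub-type from its structure.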

@bmaranville
Collaborator

Ah! Wonderful. Somehow I didn't notice that the first byte of all the ID sub-types had those same bits defined.

@woutdenolf
Collaborator Author

woutdenolf commented Jul 12, 2021

Revisiting the size issue, in the spec I see two definitions

  1. The number of bytes used to encode this field is the Maximum Heap Size (in the heap’s header) divided by 8 and rounded up to the next highest integer, for values that are not a multiple of 8.
  2. This field’s size is the minimum number of bytes necessary to encode the Maximum Heap Size value (from the Fractal Heap Header). For example, if the value of the Maximum Heap Size is less than 256 bytes, this field is 1 byte in length, a Maximum Heap Size of 256-65535 bytes uses a 2 byte length, and so on.

So in Python, for me this is

  1. n//8 + min(n%8,1)
  2. value.bit_length()//8 + min(value.bit_length()%8,1)

These are not the same thing.

@woutdenolf
Collaborator Author

Ah, OK, but it is saved as log2 in the header (sigh), so both definitions give the same size.

@woutdenolf
Collaborator Author

woutdenolf commented Jul 12, 2021

We can now retrieve the fractal heap data used by the b-tree that stores sorted HDF5 groups:

FRACTAL HEAP DIRECT BLOCK
 h5debug test.h5 8959 6967 512
OrderedDict([('signature', b'FHDB'),
             ('version', 0),
             ('heap_header_adddress', 6967),
             ('block_offset', 0),
             ('checksum', 991944457)])
 header size 21
 data size 491
 block data  b'\x01\x04\x00\x00\x00\x00\x00\x00\x00\x00\x0b00000000000[\x01\x00\x00\x00\x00\x00\x00' ...

B-TREE V2 RECORDS
h5debug test.h5 7783 7151 10
TODO: decode fractal heap data
b'\x01\x04\x00\x00\x00\x00\x00\x00\x00\x00\x0b00000000000[\x01\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x01\x00\x00\x00\x00\x00\x00\x00\x0b11111111111\x1b\x04\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x02\x00\x00\x00\x00\x00\x00\x00\x0b22222222222\xdb\x06\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x03\x00\x00\x00\x00\x00\x00\x00\x0b33333333333\x9b\t\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x04\x00\x00\x00\x00\x00\x00\x00\x0b44444444444[\x0c\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x05\x00\x00\x00\x00\x00\x00\x00\x0b55555555555G\x0f\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x06\x00\x00\x00\x00\x00\x00\x00\x0b66666666666W\x12\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x07\x00\x00\x00\x00\x00\x00\x00\x0b77777777777g\x15\x00\x00\x00\x00\x00\x00'
b'\x01\x04\x08\x00\x00\x00\x00\x00\x00\x00\x0b88888888888w\x18\x00\x00\x00\x00\x00\x00'
b'\x01\x04\t\x00\x00\x00\x00\x00\x00\x00\x0b99999999999\x1b\x0f\x00\x00\x00\x00\x00\x00'

I didn't find any specification of how to decode that information. You can recognize the group index and name so we probably have the correct data blobs.

@bmaranville
Collaborator

In the link info message definition, it specifies that the data found in the heap should be decoded as a Link Message - I decoded one by hand in one of the comments above.

@woutdenolf woutdenolf force-pushed the feature_btreev2_issue_46 branch from aa7e879 to f3683c0 Compare July 12, 2021 18:46
@woutdenolf woutdenolf force-pushed the feature_btreev2_issue_46 branch from c9441d6 to 4dd0bcc Compare July 12, 2021 18:52
@woutdenolf woutdenolf marked this pull request as ready for review July 12, 2021 18:53
@woutdenolf woutdenolf changed the title WIP: B-tree v2 for links from LINK_INFO messages B-tree v2 for links from LINK_INFO messages Jul 12, 2021
@woutdenolf
Collaborator Author

Perfect, thanks @bmaranville ! This can be reviewed.

Collaborator

@bmaranville bmaranville left a comment

Overall this looks good! I had only minor comments.

pyfive/dataobjects.py (outdated, resolved)
tests/test_new_style_groups.py (outdated, resolved)
@bmaranville
Collaborator

@woutdenolf and @jjhelmus I already ported this enhancement to a new branch of the jsfive library... I want to make sure that project makes a proper attribution of your work. If you want me to alter the LICENSE.txt please let me know.

@woutdenolf
Collaborator Author

Oeph that's a lot of code duplication. I will probably add more things to pyfive in the future as we want to use it in combination with h5py for recovering data from corrupt files ( pyfive to browse the tree and skip corrupt parts, h5py to read the datasets).

@woutdenolf
Collaborator Author

@jjhelmus I guess you are no longer actively managing this project? Any objection if @bmaranville and myself merge things?

@kmuehlbauer kmuehlbauer mentioned this pull request Jan 24, 2022
@jjhelmus
Owner

jjhelmus commented May 9, 2022

> @jjhelmus I guess you are no longer actively managing this project? Any objection if @bmaranville and myself merge things?

I've let this project idle for too long, sorry about that.

@woutdenolf and @bmaranville, if you are still interested in helping maintain or taking over this project, I'd be happy to help make this possible. I've invited both of you as collaborators to this repository.

@jjhelmus jjhelmus merged commit 0b1aaba into jjhelmus:master May 9, 2022
Successfully merging this pull request may close these issues.

Fails for groups with order tracking
3 participants