-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iso9660 #82
Iso9660 #82
Conversation
… cannot parse valid ISO + rockridge anymore
…d which doc it references
filesystem/iso9660.ksy
Outdated
- id: rec_date_time | ||
doc-ref: ecma-119 9.1.5 | ||
type: recdatetime | ||
if: len_dr > 0x0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to move the comparison into an instance and use in if
that instance.
filesystem/iso9660.ksy
Outdated
- id: publisher_identifier | ||
doc-ref: ecma-119 8.5.14 | ||
type: str | ||
size: 0x80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving these stings of predefined lengths into own types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
filesystem/iso9660.ksy
Outdated
vol_desc_boot_record: | ||
doc-ref: ecma-119 8.1.3 | ||
contents: [0x01] | ||
- id: boot_record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch should be here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
filesystem/iso9660.ksy
Outdated
type: b1 | ||
- id: creation_short | ||
type: datetime_short | ||
if: ( creation == true ) and ( long_form == false ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see lots of long_form == true
and long_form == false
.
Consider moving long and short forms into separate types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
filesystem/iso9660.ksy
Outdated
if: long_form == false | ||
- id: datetime_long | ||
type: rrip_tf_long | ||
if: long_form == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess == true
is not needed. x == false
is not x
.
Use switch-on
and cases
instead of sequential if
s and make sure that the types have compatible interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thank you for the clarification, and especially for the spec. |
encoding: UTF-8 | ||
- id: bibliographic_file_id | ||
size: 0x2 | ||
- id: minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in another occurence it was min
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My variables are the same as in the document, but I follow multiple documents.
So the naming style does sometimes change.
If I rename things the validation against the original spec would get much harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-orig-id:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same discussion: #82 (comment)
- id: volume_flags_not_iso2375 | ||
doc-ref: ecma-119 8.5.3 b0 | ||
type: b1 | ||
- id: system_identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it. One is system other is volume, even the doc-ref is different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented these standards exactly!
The fact that ref. 8.4 & 8.5 partially contain the same items and not a reference to example a u4bi is not for me to fix.
primary:
doc-ref: ecma-119 8.4
- id: system_identifier
doc-ref: ecma-119 8.4.5
type: text32
- id: volume_identifier
doc-ref: ecma-119 8.4.6
type: text32
supplementary:
doc-ref: ecma-119 8.5
- id: system_identifier
doc-ref: ecma-119 8.5.4
type: text32
- id: volume_identifier
doc-ref: ecma-119 8.5.5
type: text32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO the fact that it is written so in the original spec shouldn't prevent us from eliminating redundancy. So, it is for you to fix, their spec is their, but the ksy spec is yours!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I guess we conflict on 2 items... this and the naming of some identifiers as that topic came up 2 times.
This PR works and will greatly extent the support of the format.
If you don't want my code, then let me know and I will close the pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't close your PR, your PR is welcome here! The notices made are not requirements, but advices.
type: b1 | ||
- id: content | ||
type: str | ||
size: length - 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to move the 5 bytes into a subtype and replace 5
with sizeof<the_subtype>
. Thsre are probably other places it can be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well in the spec they give the total size of the block incl the header, so you have to reduce the size of the header.
And making breaking it into a subtype will just make more of a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And making breaking it into a subtype will just make more of a mess.
Yes, I have to agree - it probably will. However, I have a better idea - you use these specific types in header
:
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 513 to 535 in 619e1ba
- id: specific | |
type: | |
switch-on: signature | |
cases: | |
'signature::aaip_attribute_list': susp_unknown # AL | |
'signature::rras_amiga_specific': rras_as # AS | |
'signature::susp_continuation_area': susp_ce # CE | |
'signature::rrip_child_link': rrip_cl # CL | |
'signature::susp_extensions_reference': susp_er # ER | |
'signature::susp_extension_selector': susp_es # ES | |
'signature::rrip_alternate_name': rrip_nm # NM | |
'signature::susp_padding_field': susp_pd # PD | |
'signature::rrip_parent_link': rrip_pl # PL | |
'signature::rrip_posix_device_number': rrip_pn # PN | |
'signature::rrip_posix_file_attributes': rrip_px # PX | |
'signature::rrip_relocated_directory': rrip_re # RE | |
'signature::rrip_extensions_in_use_indicator': susp_unknown # RR | |
'signature::rrip_sparse_file': rrip_sf # SF | |
'signature::rrip_symbolic_link': susp_sl # SL | |
'signature::susp_indicator': susp_sp # SP | |
'signature::susp_terminator': susp_st # ST | |
'signature::rrip_time_file': rrip_tf # TF | |
'signature::rrzf_zisofs': rrzf_zf # ZF |
and in fact, all of them start with length
and version
.
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 958 to 964 in 619e1ba
susp_unknown: # default for now | |
seq: | |
- id: length | |
type: u1 | |
- id: version | |
type: u1 | |
- id: data |
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 558 to 566 in 619e1ba
rras_as: | |
doc-ref: rras | |
seq: | |
- id: length | |
type: u1 | |
- id: version | |
type: u1 | |
valid: 1 | |
- id: reserved |
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 645 to 653 in 619e1ba
susp_ce: | |
doc-ref: susp 5.1 | |
seq: | |
- id: length | |
type: u1 | |
- id: version | |
type: u1 | |
valid: 1 | |
- id: ca_location |
and so on. I checked that this pair appears in the beginning of all fields, with the only exception that susp_unknown
doesn't have the valid: 1
constraint on version
(if this is intentional and not an oversight, we can easily get around it by using if
or valid/expr
).
It would be much better if you would extract these common fields right into header
(because in fact, why not, it makes sense in my opinion - the names length
and version
sound like they totally fit into a header structure):
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 508 to 513 in 619e1ba
header: | |
seq: | |
- id: signature | |
type: u2be | |
enum: signature | |
- id: specific |
header:
seq:
- id: signature
type: u2be
enum: signature
+ - id: length
+ type: u1
+ - id: version
+ type: u1
+ valid: 1
- id: specific
The nice thing about it is that now you have the length
available in the parent struct, so you can simply put the field specific
in a substream. Then you will be able to throw out all the size: length - 5
calculations and just use size-eos: true
. Also, it won't allow the specific types to possibly get out of the designated substream and the parsing will fail on the first field that doesn't fit in there (not by ending up with a negative size from those size: length - 5
calculations, which will also lead to an error, but fields with these calculations are not even present in all types - not sure if it's explicitly written in the reference spec that the length
should be disregarded in some specific types, but if it's not, I'd definitely confine every specific type within that substream).
header:
seq:
@@ ... @@
- id: specific
+ size: length - (signature._sizeof + length._sizeof + version._sizeof) # note: this `_sizeof` sum should probably be in a `value` instance
type:
switch-on: signature
cases:
'signature::aaip_attribute_list': susp_unknown # AL
@@ ... @@
'signature::rrzf_zisofs': rrzf_zf # ZF
Now, you will just take each specific type, for example:
kaitai_struct_formats/filesystem/iso9660.ksy
Lines 722 to 731 in 619e1ba
susp_pd: | |
doc-ref: susp 5.2 | |
seq: | |
- id: length | |
type: u1 | |
- id: version | |
type: u1 | |
valid: 1 | |
- id: padding_area | |
size: length - 4 |
..., simply throw out length
and version
, and replace size: length - 4
with size-eos: true
:
susp_pd:
doc-ref: susp 5.2
seq:
- - id: length
- type: u1
- - id: version
- type: u1
- valid: 1
- id: padding_area
- size: length - 4
+ size-eos: true
You see how much better it is? You'll replace 18 identical length
, version
pairs in every single specific type with 1 common pair in header
, you'll treat the length
consistently and correctly, and eliminate the magic numbers in manual size: length - 5
calculations. What more could we possibly wish for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"What more could we possibly wish for?"
For the feeling to go away that this PR is eternally unmergeable.
You can have it as-is and improve on it in your own time.
type: b1 | ||
- id: current | ||
type: b1 | ||
- id: continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend changing this to continued
as continue
is a reserved keyword in Python and it is causing some issues.
type: b1 | ||
- id: current | ||
type: b1 | ||
- id: continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also recommend changing this to continued
as continue
is a reserved keyword in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe continuation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have read it and what you are suggesting is exactly what I am doing. I am using it (in another project) and verifying and adapting it. I am just leaving the comments here for future reference.
- id: header | ||
type: header | ||
if: _io.size - _io.pos >= 2 | ||
repeat: eos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to IEEE P1281 a valid SUSP entry is at least 4 bytes long:
If the remaining allocated space following the last recorded System Use Entry in a System Use field or Continuation Area is less than four bytes long, it cannot contain a System Use Entry and shall be ignored.
I have adapted susp
to the following:
susp:
doc: |
We check if we have at least 4 bytes left.
The SUSP magic is 2 bytes. A complete SUSP entry
is at least 4 bytes long.
seq:
- id: header
type: header
if: _io.size - _io.pos >= 4
repeat: until
repeat-until: _io.size - _io.pos < 4
type: u1 | ||
- id: body | ||
type: body | ||
if: len_dr > 0x0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size: len_dr - len_dr._sizeof
The size of the system_use is defined as: | ||
( len_dr - size of directory_record = 33 bytes ) - len_fi | ||
type: susp | ||
size: ( _parent.len_dr - 33 ) - len_fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not correct: susp
is whatever it is from up to that point, to the end of the directory record. If there is a padding byte, then it should be decreased by 1.
By specifying the size it becomes really easy to avoid this problem:
size-eos: true
- id: directory_record | ||
type: directory_record | ||
repeat: until | ||
repeat-until: _.len_dr == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not correct, as there can be additional padding bytes. Section 6.8.1.1 of the ISO9660 spec says:
Each Directory Record shall end in the Logical Sector in which it begins. Unused byte positions after the last Directory Record in a Logical Sector shall be set to (00).
So this means that you also need to look at the remaining bytes in the logical sector as well as the total length of the directory records.
This is far from done, but it the review should not become too big