Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iso9660 #82

Closed
wants to merge 73 commits into from
Closed

Iso9660 #82

wants to merge 73 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Feb 3, 2018

This is far from done, but it the review should not become too big

filesystem/iso9660.ksy Outdated Show resolved Hide resolved
filesystem/iso9660.ksy Outdated Show resolved Hide resolved
- id: rec_date_time
doc-ref: ecma-119 9.1.5
type: recdatetime
if: len_dr > 0x0
Copy link
Contributor

@KOLANICH KOLANICH Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to move the comparison into an instance and use in if that instance.

- id: publisher_identifier
doc-ref: ecma-119 8.5.14
type: str
size: 0x80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving these stings of predefined lengths into own types

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

vol_desc_boot_record:
doc-ref: ecma-119 8.1.3
contents: [0x01]
- id: boot_record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch should be here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

type: b1
- id: creation_short
type: datetime_short
if: ( creation == true ) and ( long_form == false )
Copy link
Contributor

@KOLANICH KOLANICH Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see lots of long_form == true and long_form == false.
Consider moving long and short forms into separate types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

if: long_form == false
- id: datetime_long
type: rrip_tf_long
if: long_form == true
Copy link
Contributor

@KOLANICH KOLANICH Feb 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess == true is not needed. x == false is not x.
Use switch-on and cases instead of sequential ifs and make sure that the types have compatible interface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@KOLANICH
Copy link
Contributor

KOLANICH commented Apr 4, 2021

This ISO9660 spec works, the file in kaitai-io/kaitai_struct_formats does not.

Thank you for the clarification, and especially for the spec.

filesystem/iso9660.ksy Outdated Show resolved Hide resolved
encoding: UTF-8
- id: bibliographic_file_id
size: 0x2
- id: minute
Copy link
Contributor

@KOLANICH KOLANICH Apr 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in another occurence it was min

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My variables are the same as in the document, but I follow multiple documents.
So the naming style does sometimes change.

If I rename things the validation against the original spec would get much harder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-orig-id:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same discussion: #82 (comment)

- id: volume_flags_not_iso2375
doc-ref: ecma-119 8.5.3 b0
type: b1
- id: system_identifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplication

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it. One is system other is volume, even the doc-ref is different

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented these standards exactly!
The fact that ref. 8.4 & 8.5 partially contain the same items and not a reference to example a u4bi is not for me to fix.

      primary:
        doc-ref: ecma-119 8.4
          - id: system_identifier
            doc-ref: ecma-119 8.4.5
            type: text32
          - id: volume_identifier
            doc-ref: ecma-119 8.4.6
            type: text32
      supplementary:
        doc-ref: ecma-119 8.5
          - id: system_identifier
            doc-ref: ecma-119 8.5.4
            type: text32
          - id: volume_identifier
            doc-ref: ecma-119 8.5.5
            type: text32

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the fact that it is written so in the original spec shouldn't prevent us from eliminating redundancy. So, it is for you to fix, their spec is their, but the ksy spec is yours!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess we conflict on 2 items... this and the naming of some identifiers as that topic came up 2 times.

This PR works and will greatly extent the support of the format.
If you don't want my code, then let me know and I will close the pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't close your PR, your PR is welcome here! The notices made are not requirements, but advices.

type: b1
- id: content
type: str
size: length - 5
Copy link
Contributor

@KOLANICH KOLANICH Apr 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to move the 5 bytes into a subtype and replace 5 with sizeof<the_subtype>. Thsre are probably other places it can be done.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in the spec they give the total size of the block incl the header, so you have to reduce the size of the header.
And making breaking it into a subtype will just make more of a mess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And making breaking it into a subtype will just make more of a mess.

Yes, I have to agree - it probably will. However, I have a better idea - you use these specific types in header:

- id: specific
type:
switch-on: signature
cases:
'signature::aaip_attribute_list': susp_unknown # AL
'signature::rras_amiga_specific': rras_as # AS
'signature::susp_continuation_area': susp_ce # CE
'signature::rrip_child_link': rrip_cl # CL
'signature::susp_extensions_reference': susp_er # ER
'signature::susp_extension_selector': susp_es # ES
'signature::rrip_alternate_name': rrip_nm # NM
'signature::susp_padding_field': susp_pd # PD
'signature::rrip_parent_link': rrip_pl # PL
'signature::rrip_posix_device_number': rrip_pn # PN
'signature::rrip_posix_file_attributes': rrip_px # PX
'signature::rrip_relocated_directory': rrip_re # RE
'signature::rrip_extensions_in_use_indicator': susp_unknown # RR
'signature::rrip_sparse_file': rrip_sf # SF
'signature::rrip_symbolic_link': susp_sl # SL
'signature::susp_indicator': susp_sp # SP
'signature::susp_terminator': susp_st # ST
'signature::rrip_time_file': rrip_tf # TF
'signature::rrzf_zisofs': rrzf_zf # ZF

and in fact, all of them start with length and version.

susp_unknown: # default for now
seq:
- id: length
type: u1
- id: version
type: u1
- id: data

rras_as:
doc-ref: rras
seq:
- id: length
type: u1
- id: version
type: u1
valid: 1
- id: reserved

susp_ce:
doc-ref: susp 5.1
seq:
- id: length
type: u1
- id: version
type: u1
valid: 1
- id: ca_location

and so on. I checked that this pair appears in the beginning of all fields, with the only exception that susp_unknown doesn't have the valid: 1 constraint on version (if this is intentional and not an oversight, we can easily get around it by using if or valid/expr).

It would be much better if you would extract these common fields right into header (because in fact, why not, it makes sense in my opinion - the names length and version sound like they totally fit into a header structure):

header:
seq:
- id: signature
type: u2be
enum: signature
- id: specific

 header:
   seq:
     - id: signature
       type: u2be
       enum: signature
+    - id: length
+      type: u1
+    - id: version
+      type: u1
+      valid: 1
     - id: specific

The nice thing about it is that now you have the length available in the parent struct, so you can simply put the field specific in a substream. Then you will be able to throw out all the size: length - 5 calculations and just use size-eos: true. Also, it won't allow the specific types to possibly get out of the designated substream and the parsing will fail on the first field that doesn't fit in there (not by ending up with a negative size from those size: length - 5 calculations, which will also lead to an error, but fields with these calculations are not even present in all types - not sure if it's explicitly written in the reference spec that the length should be disregarded in some specific types, but if it's not, I'd definitely confine every specific type within that substream).

 header:
   seq:
@@ ... @@
     - id: specific
+      size: length - (signature._sizeof + length._sizeof + version._sizeof) # note: this `_sizeof` sum should probably be in a `value` instance
       type:
         switch-on: signature
         cases:
           'signature::aaip_attribute_list': susp_unknown # AL
@@ ... @@
           'signature::rrzf_zisofs': rrzf_zf # ZF

Now, you will just take each specific type, for example:

susp_pd:
doc-ref: susp 5.2
seq:
- id: length
type: u1
- id: version
type: u1
valid: 1
- id: padding_area
size: length - 4

..., simply throw out length and version, and replace size: length - 4 with size-eos: true:

 susp_pd:
   doc-ref: susp 5.2
   seq:
-    - id: length
-      type: u1
-    - id: version
-      type: u1
-      valid: 1
     - id: padding_area
-      size: length - 4
+      size-eos: true

You see how much better it is? You'll replace 18 identical length, version pairs in every single specific type with 1 common pair in header, you'll treat the length consistently and correctly, and eliminate the magic numbers in manual size: length - 5 calculations. What more could we possibly wish for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"What more could we possibly wish for?"
For the feeling to go away that this PR is eternally unmergeable.

You can have it as-is and improve on it in your own time.

filesystem/iso9660.ksy Outdated Show resolved Hide resolved
filesystem/iso9660.ksy Outdated Show resolved Hide resolved
filesystem/iso9660.ksy Outdated Show resolved Hide resolved
filesystem/iso9660.ksy Outdated Show resolved Hide resolved
type: b1
- id: current
type: b1
- id: continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend changing this to continued as continue is a reserved keyword in Python and it is causing some issues.

type: b1
- id: current
type: b1
- id: continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also recommend changing this to continued as continue is a reserved keyword in Python

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe continuation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see

I have read it and what you are suggesting is exactly what I am doing. I am using it (in another project) and verifying and adapting it. I am just leaving the comments here for future reference.

- id: header
type: header
if: _io.size - _io.pos >= 2
repeat: eos
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to IEEE P1281 a valid SUSP entry is at least 4 bytes long:

If the remaining allocated space following the last recorded System Use Entry in a System Use field or Continuation Area is less than four bytes long, it cannot contain a System Use Entry and shall be ignored.

I have adapted susp to the following:

              susp:
                doc: |
                  We check if we have at least 4 bytes left.
                  The SUSP magic is 2 bytes. A complete SUSP entry
                  is at least 4 bytes long.
                seq:
                  - id: header
                    type: header
                    if: _io.size - _io.pos >= 4
                    repeat: until
                    repeat-until: _io.size - _io.pos < 4

type: u1
- id: body
type: body
if: len_dr > 0x0
Copy link
Collaborator

@armijnhemel armijnhemel Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

            size: len_dr - len_dr._sizeof

The size of the system_use is defined as:
( len_dr - size of directory_record = 33 bytes ) - len_fi
type: susp
size: ( _parent.len_dr - 33 ) - len_fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not correct: susp is whatever it is from up to that point, to the end of the directory record. If there is a padding byte, then it should be decreased by 1.

By specifying the size it becomes really easy to avoid this problem:

                size-eos: true

- id: directory_record
type: directory_record
repeat: until
repeat-until: _.len_dr == 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not correct, as there can be additional padding bytes. Section 6.8.1.1 of the ISO9660 spec says:

Each Directory  Record  shall  end  in  the  Logical  Sector  in  which  it  begins.  Unused  byte  positions  after  the  last Directory Record in a Logical Sector shall be set to (00).

So this means that you also need to look at the remaining bytes in the logical sector as well as the total length of the directory records.

@ghost ghost closed this by deleting the head repository Nov 23, 2024
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants