Special character in file name (subtitles) #74

Open
liloneum opened this issue Mar 17, 2018 · 4 comments

@liloneum

A special character in a subtitle file name (ç in my case) causes ingest to fail with a fatal error on Doremi servers, and the error is not reported.

@wolfgangw
Owner

@liloneum Thanks, noted - shall be fixed.

@matmat

matmat commented Dec 30, 2023

As for special characters in filenames in general, I put some notes here: https://dcpomatic.com/mantis/view.php?id=2465

Repeating below:

The Interop (UDF) constraints are a bit messy; I think it would be easier to just enforce the SMPTE rules for Interop as well. Mainly:

  • Each path segment shall consist only of characters from [a-zA-Z0-9_.-]
  • No path segment shall have more than 100 characters
  • The value of the Path element shall not exceed 100 characters in length
  • A Path element value shall have no more than 10 segments
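
The four rules above could be sketched in Python roughly as follows. This is only an illustration: the function name `check_smpte_path` and the message strings are hypothetical, and splitting the value on "/" to obtain segments is an assumption.

```python
import re

# Allowed segment characters: a-z, A-Z, 0-9, dash, underscore, period.
# "*" means an empty segment (as in "a//b") is not flagged by this sketch.
SEGMENT_RE = re.compile(r"[A-Za-z0-9_.-]*")

MAX_SEGMENT_LENGTH = 100
MAX_PATH_LENGTH = 100
MAX_SEGMENTS = 10


def check_smpte_path(path):
    """Return a list of rule violations for one Path element value (hypothetical helper)."""
    problems = []
    segments = path.split("/")  # assumption: "/" delimits segments
    if len(path) > MAX_PATH_LENGTH:
        problems.append("path longer than 100 characters")
    if len(segments) > MAX_SEGMENTS:
        problems.append("more than 10 segments")
    for segment in segments:
        if len(segment) > MAX_SEGMENT_LENGTH:
            problems.append(f"segment longer than 100 characters: {segment!r}")
        if not SEGMENT_RE.fullmatch(segment):
            problems.append(f"segment has characters outside the allowed set: {segment!r}")
    return problems


print(check_smpte_path("subs/français.xml"))
```

With a name like the one in the original report, the ç would be flagged as outside the allowed set.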

References:

SMPTE:

ST 429-9:2014

7.1 Path

The Path element indicates the complete path for the Chunk, represented as a URI per [RFC 3986]. Its semantics and format are delivery-medium dependent, and constrained by each Map Profile (see Section 9). The value is encoded as an xs:anyURI.
Note: Annex A presents a basic Map Profile.

Annex A Basic Map Profile v2 (Normative)

A.2 Path

Each Path element value shall be a relative-path reference as specified in RFC 3986. No query or fragment component shall be present.
Given a Path element in an Asset Map, the relative-path reference shall be resolved, as specified in RFC 3986, relative to a Base URI consisting of the location of the Asset Map.
(...)
Each path segment, as specified in IETF RFC 3986, shall consist of characters from the set a-z, A-Z, 0-9, “-” (dash), “_” (underscore) and “.” (period). No segment shall have more than 100 characters, and the value of the Path element shall not exceed 100 characters in length. A Path element value shall have no more than 10 segments. The Path element value shall preserve case (the path and the filename on the filesystem shall have identical case). No two paths in an Asset Map shall have identical value, regardless of case.

INTEROP:

https://interop-docs.cinepedia.com/Document_Release_2.0/mpeg_ii_am_spec.pdf

6.4 Chunk Path Format

The path and filename shall conform to the UDF specification.

http://www.osta.org/specs/pdf/udf201.pdf

Basic Restrictions & Requirements

File Name Length: Maximum of 255 bytes

4.2.2.1 char FileIdentifier

...
[this section and its subsections contain quite involved algorithms for translating "illegal" names for use on specific OSes]

@wolfgangw
Owner

@matmat thanks for the notes and reminders!

Added checks for characters outside the allowed set in AM asset paths. Depending on the AM type, the result will be an Error (SMPTE) or a Hint (Interop). What do you think? (4c3977c)

Also added a length check for AM asset paths that should have been in there 10 years ago -_-
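
For readers following along, the Error/Hint split described above might look something like the Python sketch below. It is not the code from the commit: the function name `report_asset_path`, the `"smpte"`/`"interop"` strings and the tuple return value are all hypothetical. The length check is left out because the comment does not say how it is reported.

```python
import re

# Allowed segment characters: a-z, A-Z, 0-9, dash, underscore, period.
ALLOWED_SEGMENT = re.compile(r"[A-Za-z0-9_.-]*")


def report_asset_path(path, am_type):
    """Return (severity, message) for an offending path, or None if it is clean.

    am_type is "smpte" or "interop" (hypothetical labels for the AM type).
    """
    bad_segments = [s for s in path.split("/") if not ALLOWED_SEGMENT.fullmatch(s)]
    if not bad_segments:
        return None
    # Same finding, different severity: Error for SMPTE, Hint for Interop.
    severity = "error" if am_type == "smpte" else "hint"
    return (severity, f"characters outside the allowed set in: {bad_segments}")


print(report_asset_path("subs/français.xml", "smpte"))
print(report_asset_path("subs/français.xml", "interop"))
```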

@matmat

matmat commented Jan 1, 2024

Looks good, thank you!

In practice I think lots of DCPs will fail this but still play back without problems (in most cases). But that's how it is, I guess.

The festival is coming up and I will battle test this in the coming weeks! :)

If/when you have time, these additional checks would be nice to have (though some of them may be unnecessary):

  • No two paths in an Asset Map shall have identical value, regardless of case.
  • Check that the path is relative
  • No more than 10 path segments
  • Check that it is a valid path according to RFC 3986
  • No query component
  • No fragment
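
A rough Python sketch of most of the checks in this list, using the standard library's `urllib.parse.urlsplit`. The function names are hypothetical. Note that `urlsplit` is lenient and does not fully validate a string against RFC 3986, so the "valid path" check is not covered here.

```python
from urllib.parse import urlsplit


def check_path_reference(path):
    """Return problems with one Path value as a relative-path reference (hypothetical helper)."""
    problems = []
    parts = urlsplit(path)
    # A relative-path reference has no scheme, no authority and no leading slash.
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        problems.append("not a relative-path reference")
    if parts.query:
        problems.append("query component present")
    if parts.fragment:
        problems.append("fragment present")
    if len(parts.path.split("/")) > 10:
        problems.append("more than 10 path segments")
    return problems


def find_case_insensitive_duplicates(paths):
    """Return (first, later) pairs of paths that are identical when case is ignored."""
    seen = {}
    duplicates = []
    for path in paths:
        key = path.casefold()
        if key in seen:
            duplicates.append((seen[key], path))
        else:
            seen[key] = path
    return duplicates


print(check_path_reference("subs/sub_01.xml?lang=fr"))
print(find_case_insensitive_duplicates(["Sub.xml", "video.mxf", "sub.xml"]))
```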
