No support for non-ASCII filenames #89

Open
maxmerben opened this issue Sep 18, 2024 · 0 comments

Hi there! I was trying to analyze an audio file using fasttrackpy. It went great with simple file names such as ZOOM0009_7_sodana.TextGrid but did not work with file names like ZOOM0003_2_sətva.TextGrid or ZOOM0001_1_+zaṭṭə.TextGrid. Such files give the following error:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[92], line 1
----> 1 results = process_audio_textgrid(
      2     audio_path, grid_path,
      3     entry_classes=["v"],
      4     target_tier="v",
      5     target_labels=VOWELS)

File ~\AppData\Roaming\Python\Python311\site-packages\fasttrackpy\patterns\audio_textgrid.py:155, in process_audio_textgrid(audio_path, textgrid_path, entry_classes, target_tier, target_labels, min_duration, min_max_formant, max_max_formant, nstep, n_formants, window_length, time_step, pre_emphasis_from, smoother, loss_fun, agg_fun)
    100 def process_audio_textgrid(
    101         audio_path: str|Path,
    102         textgrid_path: str|Path,
   (...)
    116         agg_fun: Agg = Agg()
    117 )->list[CandidateTracks]:
    118     """Process an audio and TextGrid file together.
    119 
    120     Args:
   (...)
    152         (list[CandidateTracks]): A list of candidate tracks.
    153     """
--> 155     if not is_audio(str(audio_path)):
    156         raise TypeError(f"The file at {str(audio_path)} is not an audio file")
    158     sound = pm.Sound(str(audio_path))

File ~\AppData\Roaming\Python\Python311\site-packages\fasttrackpy\patterns\just_audio.py:50, in create_audio_checker.<locals>.magic_checker(path)
     41 def magic_checker(path: str)->bool:
     42     """Checks whether a file is an audio file using libmagic
     43 
     44     Args:
   (...)
     48         (bool): Whether or not the file is an audio file
     49     """
---> 50     file_mime = magic.from_file(str(path), mime=True)
     51     return "audio" in file_mime

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:135, in from_file(filename, mime)
    126 """"
    127 Accepts a filename and returns the detected filetype.  Return
    128 value is the mimetype if mime=True, otherwise a human readable
   (...)
    132 'application/pdf'
    133 """
    134 m = _get_magic_type(mime)
--> 135 return m.from_file(filename)

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:89, in Magic.from_file(self, filename)
     87 with self.lock:
     88     try:
---> 89         return maybe_decode(magic_file(self.cookie, filename))
     90     except MagicException as e:
     91         return self._handle509Bug(e)

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:214, in maybe_decode(s)
    212     return s
    213 else:
--> 214     return s.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 44: invalid continuation byte

As far as I understand, the problem lies in the use of the magic library, which apparently does not handle non-ASCII characters in file paths. Frankly, I don’t understand why fasttrackpy needs this library, but I am not the creator of fasttrackpy :) Still, it would be great to have full Unicode support. For now, the only workaround I see is to rename the files automatically before calling process_corpus and then rename them back afterwards. Quite cumbersome!
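
A minimal sketch of that rename-and-restore workaround, assuming the non-ASCII characters are only in the file name itself and not in a parent directory. The helper `ascii_aliases` and the `tmp_ascii_...` naming scheme are hypothetical and not part of fasttrackpy:

```python
import contextlib
import os
from pathlib import Path


@contextlib.contextmanager
def ascii_aliases(*paths):
    """Temporarily rename files to ASCII-only names and restore them on exit.

    Hypothetical helper (not part of fasttrackpy). Yields a list of paths in
    the same order as the arguments; files whose names are already ASCII are
    left untouched.
    """
    renamed = []  # (original, alias) pairs, so the renames can be undone
    try:
        aliases = []
        for i, p in enumerate(paths):
            original = Path(p)
            if original.name.isascii():
                aliases.append(original)
                continue
            # Keep the extension when it is ASCII so file-type checks still work.
            suffix = original.suffix if original.suffix.isascii() else ""
            alias = original.with_name(f"tmp_ascii_{os.getpid()}_{i}{suffix}")
            original.rename(alias)
            renamed.append((original, alias))
            aliases.append(alias)
        yield aliases
    finally:
        # Restore the original names even if processing raised an error.
        for original, alias in reversed(renamed):
            if alias.exists():
                alias.rename(original)
```

With this in place, the call from the traceback could be wrapped as `with ascii_aliases(audio_path, grid_path) as (safe_audio, safe_grid):` followed by `process_audio_textgrid(safe_audio, safe_grid, ...)` inside the block. Any file names stored in the results would then presumably be the temporary ones, so they may need to be mapped back by hand.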
