No support for non-ASCII filenames #89

maxmerben · 2024-09-18T14:06:09Z

Hi there! I was trying to analyze an audio file using fasttrackpy. It went great with simple file names such as ZOOM0009_7_sodana.TextGrid but did not work with file names like ZOOM0003_2_sətva.TextGrid or ZOOM0001_1_+zaṭṭə.TextGrid. Such files give the following error:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[92], line 1
----> 1 results = process_audio_textgrid(
      2     audio_path, grid_path,
      3     entry_classes=["v"],
      4     target_tier="v",
      5     target_labels=VOWELS)

File ~\AppData\Roaming\Python\Python311\site-packages\fasttrackpy\patterns\audio_textgrid.py:155, in process_audio_textgrid(audio_path, textgrid_path, entry_classes, target_tier, target_labels, min_duration, min_max_formant, max_max_formant, nstep, n_formants, window_length, time_step, pre_emphasis_from, smoother, loss_fun, agg_fun)
    100 def process_audio_textgrid(
    101         audio_path: str|Path,
    102         textgrid_path: str|Path,
   (...)
    116         agg_fun: Agg = Agg()
    117 )->list[CandidateTracks]:
    118     """Process an audio and TextGrid file together.
    119 
    120     Args:
   (...)
    152         (list[CandidateTracks]): A list of candidate tracks.
    153     """
--> 155     if not is_audio(str(audio_path)):
    156         raise TypeError(f"The file at {str(audio_path)} is not an audio file")
    158     sound = pm.Sound(str(audio_path))

File ~\AppData\Roaming\Python\Python311\site-packages\fasttrackpy\patterns\just_audio.py:50, in create_audio_checker.<locals>.magic_checker(path)
     41 def magic_checker(path: str)->bool:
     42     """Checks whether a file is an audio file using libmagic
     43 
     44     Args:
   (...)
     48         (bool): Whether or not the file is an audio file
     49     """
---> 50     file_mime = magic.from_file(str(path), mime=True)
     51     return "audio" in file_mime

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:135, in from_file(filename, mime)
    126 """"
    127 Accepts a filename and returns the detected filetype.  Return
    128 value is the mimetype if mime=True, otherwise a human readable
   (...)
    132 'application/pdf'
    133 """
    134 m = _get_magic_type(mime)
--> 135 return m.from_file(filename)

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:89, in Magic.from_file(self, filename)
     87 with self.lock:
     88     try:
---> 89         return maybe_decode(magic_file(self.cookie, filename))
     90     except MagicException as e:
     91         return self._handle509Bug(e)

File ~\AppData\Roaming\Python\Python311\site-packages\magic\magic.py:214, in maybe_decode(s)
    212     return s
    213 else:
--> 214     return s.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 44: invalid continuation byte

As far as I understand, the problem lies in the use of the magic library, which apparently does not handle non-ASCII characters in file paths. Frankly, I don't understand why fasttrackpy needs this library, but I am not the creator of fasttrackpy :) Still, it would be great to have full Unicode support. For now, the only workaround I see is to rename the files automatically before calling process_corpus and then rename them back afterwards. Quite cumbersome!
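
The renaming workaround could be sketched roughly as below. This is only an illustration, not part of fasttrackpy: the helper name `ascii_named` is hypothetical, and it assumes that only the file name (not the containing directory or the extension) has non-ASCII characters and that the files may safely be renamed in place for the duration of the call.

```python
import contextlib
import uuid
from pathlib import Path


@contextlib.contextmanager
def ascii_named(*paths):
    """Temporarily rename files to ASCII-only names and restore them on exit.

    Hypothetical helper: it only inspects the final path component, so a
    non-ASCII directory name is not handled here.
    """
    renamed = []  # (path to use for now, original path) pairs
    try:
        for original in map(Path, paths):
            if original.name.isascii():
                # Nothing to do, hand the path back unchanged.
                renamed.append((original, original))
                continue
            # Random ASCII stem, same extension, same directory.
            temporary = original.with_name(
                f"tmp_{uuid.uuid4().hex}{original.suffix}"
            )
            original.rename(temporary)
            renamed.append((temporary, original))
        yield [temporary for temporary, _ in renamed]
    finally:
        # Restore the original names even if processing raised an error.
        for temporary, original in renamed:
            if temporary != original and temporary.exists():
                temporary.rename(original)
```

It would then wrap the call from the traceback, for example `with ascii_named(audio_path, grid_path) as (safe_audio, safe_grid): results = process_audio_textgrid(safe_audio, safe_grid, entry_classes=["v"], target_tier="v", target_labels=VOWELS)`. Any file names stored in the results would presumably be the temporary ones, so they may need to be mapped back afterwards.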
