Textgrid issues #752

stannam · 2021-02-16T20:53:31Z

PCT error when trying to create a corpus from a multi-tier textgrid file.

Traceback (most recent call last):
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\iogui.py", line 98, in run
    corpus = load_discourse_textgrid(**self.kwargs)
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\io\pct_textgrid.py", line 324, in load_discourse_textgrid
    data = textgrid_to_data(corpus_name, path, annotation_types, call_back=call_back, stop_check=stop_check)
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\io\pct_textgrid.py", line 203, in textgrid_to_data
    for si in spelling_tier:
TypeError: 'NoneType' object is not iterable

The TextGrid file is shared on Dropbox (TextGrid_sample folder).

PCT can create the corpus if every tier except transcription is ignored, but it then crashes when loading that corpus.

Traceback (most recent call last):
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 293, in do_check
    function(self)
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 398, in loadCorpus
    self.corpus.lexicon = self.compatibility_check(self.corpus.lexicon)
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 508, in compatibility_check
    word = corpus.random_word()
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\classes\lexicon.py", line 3217, in random_word
    word = random.choice(list(self.wordlist.keys()))
  File "C:\Users\Stanley\anaconda3\envs\PCT\lib\random.py", line 261, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
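
For reference, the last frames of this traceback can be reproduced in isolation: `random.choice` raises `IndexError` whenever it is handed an empty sequence, which is what happens if the wordlist ends up with no entries. The sketch below is only an illustration, not PCT's code. The function name mirrors `random_word` from the traceback, but the guard and the `None` return value are assumptions.

```python
import random


def random_word(wordlist):
    """Return a random key from wordlist, or None if it is empty.

    Hypothetical guard: the real method may need to raise a
    clearer error instead of returning None.
    """
    keys = list(wordlist.keys())
    if not keys:
        # random.choice([]) would raise IndexError here.
        return None
    return random.choice(keys)


empty_result = random_word({})
single_result = random_word({"cat": 1})
```

A guard like this only hides the symptom. The underlying question is why the wordlist is empty after the import.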


YuHsiangLo · 2021-04-14T03:08:13Z

The first error is caused by the mismatch between the displayed tier names and the names stored internally in Python. But I got another error even after I fixed this problem...

Traceback (most recent call last):
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/main.py", line 294, in do_check
    function(self)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/main.py", line 452, in loadCorpus
    self.corpusModel = CorpusModel(self.corpus, self.settings)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/models.py", line 363, in __init__
    BaseCorpusTableModel.__init__(self, corpus, settings, parent)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/models.py", line 112, in __init__
    self.rows = self.corpus.words
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/corpus/classes/lexicon.py", line 2739, in words
    return sorted(list(self.wordlist.keys()))
TypeError: '<' not supported between instances of 'NoneType' and 'str'

YuHsiangLo · 2021-04-15T17:49:45Z

Commit 00080ce should fix the first problem.
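
To illustrate the kind of mismatch behind the first error (this is not the content of the commit), here is a minimal sketch. It assumes tiers are looked up by name and that a failed lookup yields `None`, which is then iterated. The tier layout, `find_tier`, and `iter_spelling` are all hypothetical names made up for the example.

```python
def find_tier(tiers, name):
    """Return the intervals of the tier called name, or None if absent."""
    for tier in tiers:
        if tier["name"] == name:
            return tier["intervals"]
    return None


def iter_spelling(tiers, name):
    """Return the spelling intervals, failing loudly on a name mismatch."""
    spelling_tier = find_tier(tiers, name)
    if spelling_tier is None:
        # Without this check, `for si in spelling_tier` raises
        # TypeError: 'NoneType' object is not iterable.
        available = ", ".join(t["name"] for t in tiers)
        raise ValueError(f"No tier named {name!r}; available tiers: {available}")
    return list(spelling_tier)


# Hypothetical data: the stored name differs from the displayed one.
tiers = [
    {"name": "word", "intervals": ["the", "cat"]},
    {"name": "phones", "intervals": ["dh", "ah", "k", "ae", "t"]},
]

found = iter_spelling(tiers, "word")
try:
    iter_spelling(tiers, "Spelling")
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
```

Checking for `None` at the lookup site turns the opaque `TypeError` into a message that names the tiers actually present in the file.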

YuHsiangLo · 2021-04-16T21:59:38Z

Okay, apparently the second error is caused by the assumption that one of the tiers imported has to be orthography. Fixing this error requires more fundamental changes on the code side. Maybe we should wait until the next meeting to see if we really want to do this at this point.
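
The `sorted` failure from the traceback above can be reproduced outside PCT. This is a sketch under the assumption that a word without an orthography tier ends up stored under a `None` key. The `wordlist` contents and the `sorted_words` helper are hypothetical.

```python
def sorted_words(wordlist):
    """Return the wordlist keys in sorted order, skipping None keys."""
    return sorted(key for key in wordlist if key is not None)


# Hypothetical wordlist: the None key stands in for a word whose
# spelling was never set because no orthography tier was imported.
wordlist = {"cat": 1, None: 2, "apple": 3}

try:
    sorted(list(wordlist.keys()))
    sort_error = None
except TypeError as err:
    # Python 3 refuses to order None against str.
    sort_error = str(err)

safe = sorted_words(wordlist)
```

Filtering out `None` would avoid the crash, but it would also drop those words from the table, so it is not a substitute for the more fundamental change discussed above.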

kchall · 2021-04-26T19:29:24Z

Hmm, I think there may be some flaws with how we're trying to fix this, even in the short term. I just tried to import one of the WebMaus-generated TextGrids. These have two 'transcription' tiers (by default, one called KAN and one called MAU). The KAN tier shows the canonical pronunciation of each word, while the MAU tier shows the actual transcription in situ.

-It used to be the case that these files were read in without issue, no re-naming needed. Of course, that's the ideal situation.

-Currently, the user is prompted to re-name at least one of the tiers to actually be 'Transcription,' which is definitely a bit of a pain if you have a whole directory of auto-generated TextGrids.

-If both tiers are called "Transcription," PCT throws an error and says they need to have unique names.

-If one of the tiers is labelled 'transcription' and the other has an alternative name (e.g. leaving MAU as is and telling PCT that it should vary within lexical items) or even is set to be 'other / character', the following error is thrown:
Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/iogui.py", line 97, in run
    corpus = load_discourse_textgrid(**self.kwargs)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 355, in load_discourse_textgrid
    discourse.lexicon.specifier = modernize.modernize_specifier(discourse.lexicon.specifier)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/modernize.py", line 110, in modernize_specifier
    features = sorted(list(specifier.matrix[seg].keys()))
AttributeError: 'Segment' object has no attribute 'keys'
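
The last frame suggests that the code expects each matrix entry to be a plain dict, while this entry is a `Segment` object. A minimal sketch of a lookup that tolerates both shapes follows. The `Segment` class here is a hypothetical stand-in, and the `features` attribute is an assumption about how such an object might store its feature values.

```python
class Segment:
    """Hypothetical stand-in for a segment that keeps features in a dict."""

    def __init__(self, symbol, features):
        self.symbol = symbol
        self.features = features


def feature_names(entry):
    """Return sorted feature names for a dict entry or a Segment-like entry."""
    if isinstance(entry, dict):
        return sorted(entry.keys())
    # Assumed attribute: a Segment-like object exposes a `features` dict.
    return sorted(entry.features.keys())


old_style = {"voice": "+", "cont": "-"}
new_style = Segment("b", {"voice": "+", "cont": "-"})

old_names = feature_names(old_style)
new_names = feature_names(new_style)
```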

kchall · 2021-05-04T22:16:50Z

Some of these errors were actually caused by an outdated version of the feature file. The other functionality has been updated.

stannam added the bug label Feb 16, 2021

stannam mentioned this issue Mar 8, 2021

For PCT 1.5.0 #754

Closed

25 tasks

kchall closed this as completed May 4, 2021
