Textgrid issues #752

Closed
stannam opened this issue Feb 16, 2021 · 5 comments

@stannam
Member

stannam commented Feb 16, 2021

  • PCT throws an error when trying to create a corpus from a multi-tier TextGrid file:
Traceback (most recent call last):
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\iogui.py", line 98, in run
    corpus = load_discourse_textgrid(**self.kwargs)
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\io\pct_textgrid.py", line 324, in load_discourse_textgrid
    data = textgrid_to_data(corpus_name, path, annotation_types, call_back=call_back, stop_check=stop_check)
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\io\pct_textgrid.py", line 203, in textgrid_to_data
    for si in spelling_tier:
TypeError: 'NoneType' object is not iterable

This is what the TextGrid file looks like. The file is shared on Dropbox (TextGrid_sample folder).
[image]

  • PCT can create the corpus if all tiers except transcription are ignored, but it still crashes when loading that corpus:
Traceback (most recent call last):
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 293, in do_check
    function(self)
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 398, in loadCorpus
    self.corpus.lexicon = self.compatibility_check(self.corpus.lexicon)
  File "D:\PycharmProjects\CorpusTools\corpustools\gui\main.py", line 508, in compatibility_check
    word = corpus.random_word()
  File "D:\PycharmProjects\CorpusTools\corpustools\corpus\classes\lexicon.py", line 3217, in random_word
    word = random.choice(list(self.wordlist.keys()))
  File "C:\Users\Stanley\anaconda3\envs\PCT\lib\random.py", line 261, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
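
For reference, the `IndexError` above comes from calling `random.choice` on an empty list, which happens here because `wordlist` has no keys. A minimal sketch of a guard, assuming the caller can cope with "no word available" (the helper name `random_word_or_none` is hypothetical and is not part of PCT):

```python
import random


def random_word_or_none(wordlist):
    """Return a random key from `wordlist`, or None if it is empty.

    Hypothetical helper for illustration only; PCT's own
    `random_word` may be written differently.
    """
    keys = list(wordlist.keys())
    if not keys:
        # random.choice raises IndexError on an empty sequence,
        # so handle that case explicitly.
        return None
    return random.choice(keys)
```

With a guard like this, a compatibility check could skip its per-word test when the lexicon is empty instead of crashing.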
@stannam stannam added the bug label Feb 16, 2021
@stannam stannam mentioned this issue Mar 8, 2021
25 tasks
@YuHsiangLo
Contributor

The first error is caused by the mismatch between the displayed tier names and the names stored internally in Python. But I got another error even after I fixed this problem...

Traceback (most recent call last):
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/main.py", line 294, in do_check
    function(self)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/main.py", line 452, in loadCorpus
    self.corpusModel = CorpusModel(self.corpus, self.settings)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/models.py", line 363, in __init__
    BaseCorpusTableModel.__init__(self, corpus, settings, parent)
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/gui/models.py", line 112, in __init__
    self.rows = self.corpus.words
  File "/Users/YuHsiangLo/Documents/CorpusTools/corpustools/corpus/classes/lexicon.py", line 2739, in words
    return sorted(list(self.wordlist.keys()))
TypeError: '<' not supported between instances of 'NoneType' and 'str'

@YuHsiangLo
Contributor

Commit 00080ce should fix the first problem.
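
For readers landing here later: this is not the code from that commit, just a rough sketch of how a tier-name mismatch can produce `'NoneType' object is not iterable`, and one way to fail with a clearer message. It assumes the mismatch is a case or whitespace difference, which the thread does not actually specify; `find_tier` and the `tiers` dict are hypothetical.

```python
def find_tier(tiers, requested_name):
    """Look up a tier by name, ignoring case and surrounding whitespace.

    `tiers` is assumed to map tier names to lists of intervals.
    Raises KeyError with the available names instead of silently
    returning None, which would only fail later when iterated.
    """
    wanted = requested_name.strip().lower()
    for name, intervals in tiers.items():
        if name.strip().lower() == wanted:
            return intervals
    raise KeyError(
        f"No tier named {requested_name!r}; available: {sorted(tiers)}"
    )
```

Raising at lookup time points at the tier name directly, rather than at a `for` loop further down.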

@YuHsiangLo
Contributor

Okay, apparently the second error is caused by the assumption that one of the tiers imported has to be orthography. Fixing this error requires more fundamental changes on the code side. Maybe we should wait until the next meeting to see if we really want to do this at this point.
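
To illustrate the failure in isolation: if words end up keyed by a missing spelling (`None`) alongside ordinary strings, a plain `sorted()` over the keys raises the `TypeError` shown earlier. A small sketch of a `None`-tolerant sort follows. It only works around the symptom and does not address the orthography assumption itself; `sorted_words` is a hypothetical name.

```python
def sorted_words(wordlist):
    """Sort the keys of `wordlist`, placing None before all strings.

    The sort key is a tuple: the first element orders None ahead of
    real strings, the second compares the strings themselves.
    """
    return sorted(wordlist.keys(), key=lambda k: (k is not None, k or ""))


if __name__ == "__main__":
    # Hypothetical data: one entry has no spelling.
    words = {"b": 1, None: 2, "a": 3}
    print(sorted_words(words))
```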

@kchall
Member

kchall commented Apr 26, 2021

Hmm, I think there may be some flaws with how we're trying to fix this, even in the short term. I just tried to import one of the WebMaus-generated TextGrids. These have two 'transcription' tiers (by default, one called KAN and one called MAU). The KAN tier shows the canonical pronunciation of each word, while the MAU tier shows the actual transcription in situ.

- It used to be the case that these files were read in without issue, no re-naming needed. Of course, that's the ideal situation.

- Currently, the user is prompted to re-name at least one of the tiers to actually be 'Transcription,' which is definitely a bit of a pain if you have a whole directory of auto-generated TextGrids.

- If both tiers are called "Transcription," PCT throws an error and says they need to have unique names.

- If one of the tiers is labelled 'transcription' and the other has an alternative name (e.g. leaving MAU as is and telling PCT that it should vary within lexical items) or even is set to be 'other / character', the following error is thrown:
Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/iogui.py", line 97, in run
    corpus = load_discourse_textgrid(**self.kwargs)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 355, in load_discourse_textgrid
    discourse.lexicon.specifier = modernize.modernize_specifier(discourse.lexicon.specifier)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/modernize.py", line 110, in modernize_specifier
    features = sorted(list(specifier.matrix[seg].keys()))
AttributeError: 'Segment' object has no attribute 'keys'
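
The traceback suggests that `specifier.matrix[seg]` was a `Segment` object where the code expected a dict. A hedged sketch of a lookup that accepts both shapes is below. The `Segment` class here is a stand-in written for this example, and its `features` attribute is an assumption, not PCT's actual class layout.

```python
class Segment:
    """Hypothetical stand-in for a segment object (not PCT's class)."""

    def __init__(self, symbol, features):
        self.symbol = symbol
        self.features = features  # assumed: dict of feature name -> value


def feature_names(entry):
    """Return sorted feature names for a dict entry or a Segment-like entry."""
    if isinstance(entry, dict):
        return sorted(entry.keys())
    features = getattr(entry, "features", None)
    if isinstance(features, dict):
        return sorted(features.keys())
    raise TypeError(f"Cannot read features from {type(entry).__name__}")
```

Checking the entry type up front turns a confusing `AttributeError` into either a working lookup or an explicit error about the unexpected type.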

@kchall
Member

kchall commented May 4, 2021

Some of the errors were actually caused by an outdated version of the feature file. Other functionality has been updated.

@kchall kchall closed this as completed May 4, 2021

3 participants