Phonological search: Add option to list up target segments separately in summary results #801
Comments
GUI changes. #801 (neg/pos) search type overrides the present summary option.
So, that’s alarming that our machines are giving us different results! But you also seem to have deleted the comment that showed that example, so maybe you discovered something in the meantime?
BUT, if I do the same search but ask for the results to be separated out by targets, I get 5 for [s] and 9 for [ʃ]. While it’s fine in this case for words that have both [s] AND [ʃ] to get ‘double counted’ (one in each result row), that should give the result of 4 for [s] and 6 for [ʃ]. But instead, it’s doing the double-counting targets by word that you had originally reported, @stannam, so that e.g. [sasi] counts as two ‘types’ for [s] instead of 1. Note that if I do now go back to the summary results, it reverts completely to the not-separated-by-target option (as mentioned in point (1) above), i.e., that entails re-combining the type frequencies and not double-counting within words, i.e., I get 8 again and not 14 (which would be 9+5): I think the desired behaviour should be:
[As a side note, I’ll just comment to remind myself: I did check to see whether the behaviour was the same if the targets were specified with features instead of segments, i.e., [+continuant, -syllabic], and the results are identical to those described above.]
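A minimal sketch of the counting rule described above, assuming a toy corpus of (segments, token frequency) pairs. The corpus, the function name `summarize`, and its parameters are hypothetical and are not PCT's actual API. The point it illustrates: a word contributes at most one type to any given result row, no matter how many times a target occurs in it, while a word containing both targets may appear once in each separated row.

```python
from collections import defaultdict

# Hypothetical toy corpus: (segments, token frequency). Not PCT's example corpus.
CORPUS = [
    (("s", "a", "s", "i"), 10),   # two [s] in one word: still one type for [s]
    (("ʃ", "a", "s", "i"), 5),    # contains both targets
    (("ʃ", "i"), 3),
    (("t", "a"), 7),              # no target at all
]


def summarize(corpus, targets, split_by_target):
    """Return {row_label: (type_frequency, token_frequency)}.

    Each word adds at most 1 type (and its token frequency once) to every
    row it matches, however many times the target occurs inside the word.
    """
    types = defaultdict(int)
    tokens = defaultdict(int)
    for segments, freq in corpus:
        found = set(segments) & set(targets)  # a set, so repeats collapse
        if not found:
            continue
        # One row per target found, or a single combined row.
        rows = sorted(found) if split_by_target else [", ".join(targets)]
        for row in rows:
            types[row] += 1
            tokens[row] += freq
    return {row: (types[row], tokens[row]) for row in types}


combined = summarize(CORPUS, ["s", "ʃ"], split_by_target=False)
separate = summarize(CORPUS, ["s", "ʃ"], split_by_target=True)
```

In this toy data the combined row has 3 types, while the separated rows have 2 types each; the separated rows sum to more than the combined row only because the word with both targets is listed in each row, not because any word is counted twice within a row.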
I think I fixed the type frequency issue. I tested the following two use case scenarios, and both worked as expected.
1. Use case 1: Searching for [s, ʃ] (see above).
2. Use case 2: Searching for [s, ʃ], but with the environment specified as [_a, _i].
@kchall Can you update to the latest code and test on your machine whether the summary setting is static and the results are correct?
This should fix the issue that Kathleen reported.
cf. #356
Thanks, Stanley! This mostly seems to be fixed, but there are two separate follow-ups. I'll list them in two separate comments for clarity. (But more generally, I can confirm that the combined searches all seem to be working for me as expected, i.e., searching for [s, ʃ] gets me a result of 8 types and 446 tokens; asking PCT to separate the results out by target segments gives me 4 types and 179 tokens for [s], and 6 types and 275 tokens for [ʃ], and the results are now 'sticky' / 'static' rather than reverting to the unseparated view when toggling between individual and summary results).
The second issue is more about the clarity of what we mean when someone selects "List target segments separately in summary results." Currently, what is actually happening (and thanks for pointing this out through your second test case above!) is that both the targets AND the environments are getting split out. For example, if the basic search definition is to search for [s, ʃ] before [i, a], and the results are separated, PCT currently separates out both the targets and the environments, yielding four results rows (for [si], [sa], [ʃi], and [ʃa]) (and this seems to be working correctly). But with the way the option is worded, I would have expected it to separate out the targets and NOT the environments, so that you get just two results rows (for [s] before [i, a] and [ʃ] before [i, a]). I think we should proceed as follows:
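To make the two groupings discussed above concrete, here is a small sketch that contrasts splitting by target and environment (four rows) with splitting by target only (two rows) for a search for [s, ʃ] before [i, a]. The toy corpus, the helper names, and the row labels are assumptions made for illustration; they do not reflect how PCT stores or labels its results.

```python
from collections import defaultdict

# Hypothetical toy corpus of (segments, token frequency) pairs.
CORPUS = [
    (("s", "a", "s", "i"), 10),
    (("ʃ", "a"), 5),
    (("ʃ", "i", "s", "u"), 3),
]


def rows_for_word(segments, targets, following, split_env):
    """Return the set of result rows a word belongs to (each row at most once)."""
    rows = set()
    for seg, nxt in zip(segments, segments[1:]):
        if seg in targets and nxt in following:
            # Either one row per following segment, or one row for the whole set.
            env = nxt if split_env else ", ".join(following)
            rows.add((seg, env))
    return rows


def summarize(corpus, targets, following, split_env):
    """Return {(target, environment): (type_frequency, token_frequency)}."""
    types = defaultdict(int)
    tokens = defaultdict(int)
    for segments, freq in corpus:
        for row in rows_for_word(segments, targets, following, split_env):
            types[row] += 1
            tokens[row] += freq
    return {row: (types[row], tokens[row]) for row in types}


both_split = summarize(CORPUS, ["s", "ʃ"], ["i", "a"], split_env=True)
targets_only = summarize(CORPUS, ["s", "ʃ"], ["i", "a"], split_env=False)
```

With `split_env=True` the sketch yields four rows (one per target and environment pair); with `split_env=False` it yields two rows, one per target, each covering the whole environment set.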
This change partially undoes f359d5e, where I made the summary results include a 0 frequency row when the search has no hits. The introduction of 0 frequency rows was due to the new way of presenting the summary. It is not relevant to the old way.
Thanks, Kathleen, for looking into this.
re: One search with separate environments gives wrong numbers
I think I have fixed this error now, and I confirmed that the numbers are correct. It turned out that a conditional I added to the summary function (f359d5e) overapplies to some words. I simply made changes to bypass this conditional if the user selected the 'split' option.
Previously, when I was working on the summary results, I added a feature that creates 'zero frequency rows' if certain conditions are met. For example, if we search for a clearly illegal sequence like [gst] on the example corpus without the 'split' option, PCT still returns a row rather than an empty window. It depends on whether the main phonological search function returned None ('not found'). The logic is: if there is a value, then a search must have happened with those parameters, but None means nothing was found, so add a row but give zero values for the token and type frequencies.
However, for some reason, the phonological search function wrongly adds redundant None values if more than one environment is used. The summary side wrongly interprets these redundant None values as a need to add zero frequencies, which messes up the frequency counts for the second environment. For now, I do not understand why this is happening in the main function. Someone should investigate this eventually, but I think it does not need to be a top priority, since the final results are not affected in either the individual or the summary window.
re: Splitting the environment segments
I emailed the user to ask if the other option would be needed. Once I hear back, I will make changes accordingly.
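The zero-frequency-row logic described above can be sketched as follows. This is not the actual fix (which bypasses the conditional when the 'split' option is selected) and not PCT's code: the data layout, the `summarize` function, and the environment labels are all hypothetical. It shows one defensive alternative in which a None entry creates a zero row only when that environment has no row yet, so that a redundant None cannot disturb counts from real hits.

```python
def summarize(results):
    """Summarize search results per environment.

    `results` is a list of (env_label, hit) pairs, where `hit` is either
    (word, token_frequency) or None. None stands for 'searched, nothing
    found' and produces a zero-frequency row, but it never resets counts
    that real hits have already produced for that environment.
    """
    words_by_env = {}  # env_label -> {word: token_frequency}
    for env, hit in results:
        # setdefault creates the row on first sight; this is what makes a
        # zero-frequency row appear for an environment with no hits.
        seen = words_by_env.setdefault(env, {})
        if hit is not None:
            word, freq = hit
            seen[word] = freq  # a word counts once per row
    return {
        env: (len(seen), sum(seen.values()))
        for env, seen in words_by_env.items()
    }


# Hypothetical search output with one redundant None for "_i".
RESULTS = [
    ("_a", ("sa", 4)),
    ("_i", ("si", 2)),
    ("_i", None),   # redundant None: must not zero out the "_i" row
    ("_u", None),   # genuine miss: becomes a zero-frequency row
]
summary = summarize(RESULTS)
```

The design choice here is to treat None as "make sure the row exists" rather than "set the row to zero", which keeps the summary correct even if the upstream search function emits extra None values.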
Thanks, @stannam! But, weirdly, this doesn't seem to have fixed the error on my Mac -- I synced to the latest version and see your commit in the history, but when I do the two searches together, I still get the wrong result for the second one: (Interestingly, this does work correctly if I check the option for separating out the targets in the results!)
I have updated the https://github.com/PhonologicalCorpusTools/CorpusTools/blob/master/docs/source/phonological_search.rst documentation, including updating the existing search interface images, to include the "List separately" option. So far, though, it doesn't seem to be actually syncing with https://corpustools.readthedocs.io/en/latest/phonological_search.html -- the build is failing. But I'm not savvy enough to understand what the actual issue is.
Passed Tests 1, 2, 3.1, 3.2, and 3.3. Need more work to pass test 3.3, i.e., syllable search with the listing by seg set as False and [sa] and [ʃa] in a single env.
@kchall could you sync to the latest code and see if the problems are all fixed? re: documentation build failing (#801 (comment)) re: fixing the PS summary function
There are issues that went unnoticed because they raised only warnings, not errors. In the API reference, a couple of subsections are missing, for example: https://corpustools.readthedocs.io/en/v1.5.0/apireference.html#textgrids https://corpustools.readthedocs.io/en/v1.5.0/apireference.html#functional-load Those parts were not updated when the corresponding code was changed, causing 'warnings'. Well, the warnings have now evolved (?) into errors, and the newer version of Sphinx just refuses to build any new commits. This was the cause of the problem mentioned in this comment #801 (comment)
I can confirm that all of the above seems to be working correctly now on my end as well! With the (exception?) that in part 3 above, "[sɑ] [ʃɑ] Syllable search," I am able to do a negative search as long as I have the environments combined into a single set and don't try to separate out the results. My syllabified example corpus doesn't have [ga.ga] in it, but the results for the search do seem to be correct for my corpus, i.e., with type frequency of 11 and token of 362 for "words that do not contain either [sɑ] or [ʃɑ]." So, I think it is all working correctly...yay!
I think I was wrong in Part 3 (syllable search) about the negative search option. The documentation does not exclude the option to do a negative search with syllables, and I can also confirm that the negative search works correctly. However, selecting the 'syllables' button wrongly greys out 'negative.' The option reopens when 'New environment' is clicked. I think this is a very strange behaviour, so I'll disable it. After that, new executables will be created. yay! / Notes to (future) myself: why that behaviour? CorpusTools/corpustools/gui/psgui.py Lines 694 to 696 in adfb9a7
Out of concern that any changes would not end well, I dug into my notes to find out why I added the code in the first place. As shown above, I did intentionally disable negative syllable searches in 2021, but that edit should have been reverted already. Disabling neg + syll was a workaround for this problem #746 (comment) (part 4), where the target and environment columns became empty in the negative syllable search results. I think I just removed the problem as a makeshift fix (how ostrich!). When issue #746 was fixed, the code above was no longer needed and I should have deleted it. Instead, I added more code that re-enables the disabled option 🤦♂️. So when changing from the seg mode to syllable mode, the negative option got disabled -> enabled -> and then disabled again (with more than 1 env).
Currently, the phonological search result window repeats the user input. But it would be better to add an option to view the summary results in the old way, i.e., one row per segment.
Need to implement/solve/decide
Test the function
Update documentation