Question about reproducing the results #36
Hi, for the VoxConverse dataset, the DER result is 0.23988039048252469.
Hi @Shoawen0213, I recommend you read issue #15 for some background on the problem of expected outputs. First of all (and this relates to your second question), the default latency in the demo is 500ms (see here), so looking at the performance of the system in Figure 5 (the one you posted) for that latency, it looks like you got the expected performance.

Aside from that, note that the DER should be calculated from the total false alarm, missed detection and confusion over the entire test set, and not as the average of the per-file DERs. You can calculate this easily like this:

```python
from pyannote.metrics.diarization import DiarizationErrorRate

metric = DiarizationErrorRate()
for ref, hyp in zip(all_references, all_hypothesis):
    metric(ref, hyp)  # accumulates error components across files
final_der = abs(metric)  # single DER over the entire test set
```

As mentioned in #15, this implementation is a bit different from the one used in the paper, but normally the performance should be very close and possibly slightly better. I haven't had the time to measure this properly though, which is why that issue is still open. Could you please post the DER you obtain with the method I just described?
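To illustrate why accumulating differs from averaging, here is a small self-contained sketch with made-up error components for two files of very different length (all numbers are hypothetical); the global DER weights each file by its amount of reference speech, while the per-file average does not:

```python
# Hypothetical error components (seconds) for two files of very different length.
files = [
    # (false alarm, missed detection, confusion, total reference speech)
    (1.0, 2.0, 1.0, 20.0),    # short file: per-file DER = 4/20 = 0.20
    (5.0, 10.0, 5.0, 400.0),  # long file: per-file DER = 20/400 = 0.05
]

# Average of per-file DERs treats both files equally.
avg_der = sum((fa + miss + conf) / total for fa, miss, conf, total in files) / len(files)

# Global DER pools the components first, weighting files by duration.
global_der = (
    sum(fa + miss + conf for fa, miss, conf, _ in files)
    / sum(total for *_, total in files)
)

print(f"average of per-file DERs: {avg_der:.3f}")    # 0.125
print(f"global DER:               {global_der:.3f}")  # ~0.057
```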
Btw I recommend you replace these lines with:

```python
pipeline.from_source(audio_source).subscribe(RTTMWriter(path=output_dir / "output.rttm"))
```

That way you get rid of buffering and plotting, which will accelerate inference quite a bit. This would actually be a nice option to add to the demo in the future.
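For completeness, the same suggestion with its import spelled out; this is a sketch, and the module path is an assumption that may differ across diart versions:

```python
from diart.sinks import RTTMWriter  # assumed module path for the writer

# Stream pipeline outputs straight to an RTTM file, skipping the
# buffering and plotting observers used by the demo.
pipeline.from_source(audio_source).subscribe(
    RTTMWriter(path=output_dir / "output.rttm")
)
```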
Hi! Thanks for your quick and useful reply! I will delve into #15 more; I had actually already seen it before I asked. I have another question (below).
BTW, thanks for your recommendation!!!!!
I don't quite get it: are all_references and all_hypothesis RTTM files?
You can of course change the latency without retraining, that's one of the advantages of diart. You can modify this easily when running the demo, just add `--latency=<seconds>` to the command.

The segmentation model can be trained for any chunk duration, just keep in mind that the latency must lie between the step and the chunk duration.

Concerning the evaluation, you can take a look at the pyannote.metrics documentation to understand how it works.
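To make the RTTM question concrete, here is a minimal sketch (file paths and the hypothesis layout are placeholders) that loads reference and hypothesis annotations from RTTM files with pyannote and accumulates a single DER over the whole test set:

```python
from pathlib import Path

from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load_rttm returns a {uri: Annotation} dictionary for an RTTM file.
references = load_rttm("AMI/MixHeadset.test.rttm")

metric = DiarizationErrorRate()
for uri, reference in references.items():
    # Hypothetical output layout: one RTTM per file, named after its uri.
    hyp_file = Path("output") / f"{uri}.rttm"
    hypothesis = load_rttm(str(hyp_file))[uri]
    metric(reference, hypothesis)  # accumulates components

final_der = abs(metric)  # single DER over all files
print(f"Global DER: {final_der:.4f}")
```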
Hi @Shoawen0213, I just merged PR #46 into the development branch. I'm aiming to release this as part of version 0.3.0. For now you can use these features by installing from that branch.
Hi @juanmc2005! Thanks for your reply!!!
Hi! It's me again, sorry for bothering you. I have several questions...
Q1.
I tried to reproduce the results of the paper using the following hyper-parameters. I tested them on the AMI and VoxConverse test sets, but the results seem different.
The AMI test set contains 24 wav files. I wrote a Python script to run the testing; for each wav file it runs the command below (the loop is sketched after the command).

```
python -m diart.demo $wavpath --tau=0.507 --rho=0.006 --delta=1.057 --output ./output/AMI_dia_fintetune/
```
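A sketch of that driver loop; the wav directory and output layout are assumptions:

```python
# Sketch of the driver script described above; wav_dir and the
# output directory layout are assumptions.
import subprocess
from pathlib import Path

wav_dir = Path("AMI/test")
for wav in sorted(wav_dir.glob("*.wav")):
    subprocess.run(
        [
            "python", "-m", "diart.demo", str(wav),
            "--tau=0.507", "--rho=0.006", "--delta=1.057",
            "--output", "./output/AMI_dia_fintetune/",
        ],
        check=True,
    )
```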
After obtaining each RTTM file, I calculate the DER for each wav file, like `der = metric(reference, hypothesis)`.
The reference RTTM is from `AMI/MixHeadset.test.rttm`.
Then I compute the average, sum of DERs divided by the number of files (24 in this case), and I get 0.3408511916216123 (which means 34.085% DER).
Am I doing something wrong?
I can provide the RTTM or the DER for each wav file.
The VoxConverse dataset is still processing; I'm afraid I misunderstood something, so I'm asking about the problem first...
BTW, I used pyannote v1.1 to do the same thing, and I got 0.48516973769490174 as the final DER:
```python
# pyannote v1.1
import torch

pipeline = torch.hub.load("pyannote/pyannote-audio", "dia")
diarization = pipeline({"audio": "xxx.wav"})
```
So I'm afraid that I did something wrong...
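For reference, one way to dump that v1.1 result to RTTM so it can be scored with the same accumulated metric as above; a sketch using the standard `itertracks` API from pyannote.core (the uri and output path are placeholders):

```python
# Write the v1.1 diarization result as RTTM lines
# (uri and output path are placeholders).
uri = "xxx"
with open(f"{uri}.rttm", "w") as f:
    for segment, _, label in diarization.itertracks(yield_label=True):
        f.write(
            f"SPEAKER {uri} 1 {segment.start:.3f} {segment.duration:.3f} "
            f"<NA> <NA> {label} <NA> <NA>\n"
        )
```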
Q2.
At the same time, I have another question.
The figure shows that you tried several methods with different latencies.
Does `python -m diart.demo` use the 5.0s latency setting, which gives the best result in the paper?
If so, how do I switch to a different model for another latency at inference time?
And how do I train that part?
Again, thanks for your awesome project!!
Sorry for all those stupid questions...
Looking forward to your reply...