Replies: 1 comment
-
I am facing a similar issue. My input is 16-bit PCM (int16) audio at 16 kHz, and my application has a minimum step length of 0.02. Somehow, the probability is always around 0.57, regardless of whether there is speech or not. I have tried larger step lengths, but nothing changed.
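For reference, here is a rough sketch of the input handling for that setup. It assumes the step length of 0.02 is in seconds, and it assumes the model wants float32 samples scaled to [-1, 1], which should be checked against the notebook. The helper names are made up for illustration and are not part of NeMo.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, as stated in the comment
STEP = 0.02          # assumption: the step length is given in seconds


def samples_per_step(sample_rate=SAMPLE_RATE, step=STEP):
    """Number of audio samples covered by one step."""
    return int(round(sample_rate * step))


def int16_to_float32(pcm_bytes):
    """Convert raw int16 PCM bytes to float32 samples in [-1, 1).

    Assumption: native-endian, mono int16 data.
    """
    samples = np.frombuffer(pcm_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0


if __name__ == "__main__":
    n = samples_per_step()
    silence = np.zeros(n, dtype=np.int16).tobytes()
    chunk = int16_to_float32(silence)
    print(n, chunk.dtype, chunk.shape)
```

Under those assumptions, one step covers 320 samples, so each inference call sees very little audio unless the window around the step is longer.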
-
Hello,
I have been trying to use NVIDIA's MarbleNet for voice activity detection on real-time audio and have run into some trouble.
I am following the notebook from NeMo's GitHub repository, specifically the part on online microphone inference. When testing with some of my own data, I get inconsistent results. The speech and non-speech probabilities are very close to each other, so the verdict is reached by a very thin margin (around 0.01). Increasing the threshold to anything above 0.5 results in constant non-speech labels.
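To make the symptom concrete, here is a minimal sketch of the decision rule as I understand it: a two-class softmax followed by a threshold on the speech probability. The logit values are made up, the `[non-speech, speech]` ordering is an assumption, and `label_frame` is a hypothetical helper, not a NeMo function.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def label_frame(logits, threshold=0.5):
    """Return (label, speech_prob, margin) for one frame.

    Assumption: logits are ordered [non-speech, speech].
    """
    probs = softmax(np.asarray(logits, dtype=np.float64))
    speech_prob = float(probs[1])
    margin = abs(float(probs[1] - probs[0]))
    label = "speech" if speech_prob >= threshold else "non-speech"
    return label, speech_prob, margin


if __name__ == "__main__":
    # Made-up, nearly equal logits that reproduce a margin of about 0.01.
    for threshold in (0.5, 0.6):
        print(threshold, label_frame([0.0, 0.02], threshold))
```

With logits this close together, the label flips from speech to non-speech as soon as the threshold moves above the speech probability, which matches what I am seeing.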
Any insights are welcome!