MarbleNet VAD for real-time streaming applications #8326

arielrado · 2024-02-05T08:41:45Z

Hello,

I have been trying to use NVIDIA's MarbleNet for voice activity detection on real-time audio and have run into some trouble.

I am following the notebook from NeMo's GitHub repository, specifically the part on online microphone inference. When I test with some of my own data, I get inconsistent results. The speech and non-speech probabilities are very close to each other, so the verdict is reached by a very thin margin (around 0.01). Increasing the threshold to anything above 0.5 results in constant non-speech labels.
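To make the numbers above concrete, here is a minimal sketch of how a two-class softmax output interacts with a decision threshold. It is not the notebook's code: the `softmax` and `is_speech` helpers, the example logits, and the assumption that index 1 is the speech class are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_speech(logits, threshold=0.5, speech_index=1):
    """Return True when the speech probability reaches the threshold.

    speech_index=1 is an assumption about the label order,
    not something confirmed for any particular checkpoint.
    """
    return softmax(logits)[speech_index] >= threshold

# Hypothetical logits that differ by only 0.02, which puts the two
# probabilities about 0.01 apart (roughly 0.495 vs 0.505).
frame_logits = [0.0, 0.02]
probs = softmax(frame_logits)

print(probs)
print(is_speech(frame_logits, threshold=0.5))  # speech wins by a thin margin
print(is_speech(frame_logits, threshold=0.6))  # any higher threshold flips it
```

With outputs this close to 0.5, the threshold alone decides the label, which matches the behaviour I am seeing.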

Any insights are welcome!

smallsudarshan · 2024-02-18T08:04:02Z

I am facing a similar issue. My input is 16-bit PCM (int16) audio at 16 kHz, and my application has a minimum step length of 0.02. The probability is always around 0.57 regardless of whether there is speech or not. I have tried larger step lengths, but nothing changed.
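For reference, here is a minimal sketch of turning 16-bit PCM bytes into floats in [-1, 1) and working out how many samples one step covers. This is an illustration, not a confirmed fix: `pcm16_to_float` and `samples_per_step` are hypothetical helpers, little-endian byte order is assumed, and the 0.02 step is assumed to be in seconds. Whether the model expects scaled input depends on the preprocessing in the pipeline, so that is worth checking separately.

```python
import struct

def pcm16_to_float(pcm_bytes):
    """Convert little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    n = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % n, pcm_bytes[: n * 2])
    return [s / 32768.0 for s in samples]

def samples_per_step(sample_rate_hz, step_seconds):
    """Number of samples covered by one step of audio."""
    return int(round(sample_rate_hz * step_seconds))

# Three hypothetical samples: silence, half scale, negative full scale.
chunk = struct.pack("<3h", 0, 16384, -32768)
print(pcm16_to_float(chunk))

# Assuming the step length is in seconds, one step at 16 kHz spans:
print(samples_per_step(16000, 0.02))
```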
