Voting for the right anwser: Adversarial defense for speaker verification

Authors: Haibin Wu*, Yang Zhang*, Zhiyong Wu, Dong Wang, Hung-yi Lee

Abstract: Automatic speaker verification (ASV) is a well developed technology for biometric identification, and has been ubiquitous implemented in security-critic applications, such as banking and access control. However, previous works have shown that ASV is under the radar of adversarial attacks, which are very similar to their original counterparts from human’s perception, yet will manipulate the ASV render wrong prediction. Due to the very late emergence of adversarial attacks for ASV, effective countermeasures against them are limited. Given that the security of ASV is of high priority, in this work, we propose the idea of “voting for the right answer” to prevent risky decisions of ASV in blind spot areas, by employing random sampling and voting. Experimental results show that our proposed method improves the robustness against both the limited-knowledge attackers by pulling the adversarial samples out of the blind spots, and the perfect-knowledge attackers by introducing randomness and increasing the attackers’ budgets. The code for reproducing main results is available at https://github.com/thuhcsi/adsv voting.

ASV system setup (EER=2.52%)

Our system achieved 2.52% equal error rate in VoxCeleb1 test, and then we fix the model's decision thresholds τ for rest of experiments.

1. Raw audio samples from VoxCeleb1 test (FAR=2.24%, FRR=2.56%)

Enrollment	Test(target)	Test(nontarget)

2. Basic Iterative Method(BIM) adversarial attack samples (ε=5, iteration=5, FAR=71.83%, FRR=74.92%)

Enrollment	Test(target)	Test(nontarget)

3. Attack samples with random Gaussian noise for voting defense (σ=60, k=50, FAR=4.63%, FRR=10.74% )

Enrollment	Test(target)	Test(nontarget)

4. Attack samples with Gaussian filter for defense (FAR=56.86%, FRR=62.22%)

Enrollment	Test(target)	Test(nontarget)