Saurabh Kataria's Homepage

Biography

I am now a postdoctoral fellow at the Emory University School of Nursing, working on AI in healthcare. This website is not currently up to date. Stay tuned!

I was a PhD candidate in the Department of Electrical and Computer Engineering at Johns Hopkins University, working on deep-learning-based speech enhancement and automatic speaker recognition. I was fortunate to work with Prof. Najim Dehak and Prof. Jesús Villalba. I was also affiliated with the Center for Language and Speech Processing and the Human Language Technology Center of Excellence. Previously, I interned at Tencent America with Dr. Shi-Xiong Zhang in Dr. Dong Yu's speech group, at INRIA with Dr. Antoine Deleforge in Dr. Remi Gribonval's PANAMA group, and at New York University with Prof. Siddharth Garg. I obtained my bachelor's and master's degrees in Electrical Engineering from the Indian Institute of Technology (IIT) Kanpur, where I worked with Prof. Tanaya Guha and Prof. Rajesh Hegde. I also hold a minor in Artificial Intelligence from the Department of Computer Science and Engineering at IIT Kanpur.

During my PhD, I worked on building speech enhancement solutions with a focus on 1) state-of-the-art effectiveness (validated on real and noisy test sets from the NIST Speaker Recognition Evaluation), 2) speaker-identity preservation, and 3) transfer learning. I also worked on domain adaptation and bandwidth extension (super-resolution) to bridge the domain gap between (low-bandwidth) telephone and (high-fidelity) microphone audio. In addition, I collaborated on problems such as spoken language recognition, adversarial signals (attack and defense), and automatic speech recognition. In terms of techniques, we relied on Generative Adversarial Networks (GANs), self-supervised models, and perceptual losses.

I am interested in working more broadly on deep learning (self-supervised learning, multi-modal learning) and human language technology (speech processing, natural language processing) problems.

News

[Dec 2023] I joined Emory University School of Nursing as a postdoctoral fellow in the Center for Data Science (CDS).

Notable Publications (reverse chronological) [Google Scholar (all papers)] [ResearchGate]

Journal pre-print

Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, and Najim Dehak

Under review.

PDF BibTex

INTERSPEECH

Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition

Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, and Najim Dehak

INTERSPEECH 2023

PDF BibTex

INTERSPEECH

AdvEst: Adversarial perturbation estimation to classify and detect adversarial attacks against speaker identification

Sonal Joshi, Saurabh Kataria, Jesús Villalba, and Najim Dehak

INTERSPEECH 2022

PDF BibTex

INTERSPEECH

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, and Najim Dehak

INTERSPEECH 2022

PDF BibTex

INTERSPEECH

Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser

Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Żelasko, Jesús Villalba, Sanjeev Khudanpur, and Najim Dehak

INTERSPEECH 2022

PDF BibTex

Odyssey

Advances in Cross-Lingual and Cross-Source Audio-Visual Speaker Recognition: The JHU-MIT System for NIST SRE21

Jesús Villalba, Bengt J. Borgstrom, Saurabh Kataria, Magdalena Rybicka, Carlos D. Castillo, Jaejin Cho, L. Paola García-Perera, Pedro A. Torres-Carrasquillo, and Najim Dehak

Odyssey 2022

PDF BibTex

ICASSP

Perceptual loss based speech denoising with an ensemble of audio pattern recognition and self-supervised models

Saurabh Kataria, Jesús Villalba, and Najim Dehak

ICASSP 2021

PDF BibTex

INTERSPEECH

Deep feature CycleGANs: Speaker identity preserving non-parallel microphone-telephone domain adaptation for speaker verification

Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, and Najim Dehak

INTERSPEECH 2021

PDF BibTex

INTERSPEECH

Multi-Channel Speaker Verification for Single and Multi-talker Speech

Saurabh Kataria, Shi-Xiong Zhang, and Dong Yu

INTERSPEECH 2021

PDF BibTex

ICASSP

Feature enhancement with deep feature losses for speaker verification

Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola Garcia-Perera, and Najim Dehak

ICASSP 2020

PDF BibTex

Odyssey

Analysis of deep feature loss based enhancement for speaker verification

Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, and Najim Dehak

Odyssey 2020

PDF BibTex

Odyssey

Speaker detection in the wild: Lessons learned from JSALT 2019

Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, and Najim Dehak

Odyssey 2020

PDF BibTex

ASRU

Low-resource domain adaptation for speaker recognition using Cycle-GANs

Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, and Najim Dehak

ASRU 2019

PDF BibTex

ICASSP

Hearing in a shoe-box: binaural source position and wall absorption estimation using virtually supervised learning

Saurabh Kataria, Clément Gaultier, and Antoine Deleforge

ICASSP 2017

PDF BibTex

LVA-ICA

VAST: The virtual acoustic space traveler dataset

Clément Gaultier, Saurabh Kataria, and Antoine Deleforge

LVA-ICA 2017

PDF BibTex

Older (course) project reports (non-peer-reviewed)

Object Recognition and Object Counting using CNNs

L.S. Vishnu Sai Rao, Saurabh Kataria

PDF BibTex

SPIN

Dictionary Learning Based Applications in Image Processing using Convex Optimisation

Abhay Kumar, Saurabh Kataria

Int. Conf. on Signal Processing and Integrated Networks (SPIN), Noida, India

PDF BibTex

Scene Intensity Estimation and Ranking for Movie Scenes Through Direct Content Analysis

Saurabh Kataria, Abhay Kumar

PDF BibTex

Image segmentation using Dirichlet process mixture model

Anay Pattanaik, Anupreet Porwal, Saurabh Kataria

PDF

Foreground-background classification and ROI detection in institute surveillance footages

Dheeraj Mekala, Keerti Anand, Aakash Ghosh, Saurabh Kataria

PDF

Sound source counting using non-parametric statistical methods

Lakshay Garg, Narsupalli Navin Kumar, Prakhar Kulshreshtha, Saurabh Kataria

PDF

Representation learning for EEG signals

Saurabh Kataria, Sandeep Reddy Kothinti

PDF

Differential geometry in sensor array processing

PDF

Model compression for deep neural networks

End-to-End Speaker Verification system using Transformers

Lung sound analysis of children's speech