Slim Essid's research page

ADASP research group | S²A team | LTCI lab | Télécom Paris | Institut Polytechnique de Paris

Research interests

Machine Learning, Artificial Intelligence and Signal Processing, especially:

  • multimodal and multiview learning;
  • representation learning, in particular self-supervised learning;
  • structured prediction;

with applications to:

  • multimodal language models, especially audio-vision language models;
  • speech processing, machine listening, music content analysis (MIR);
  • multimodal perception, social and affective computing;
  • physiological, especially EEG, data analysis.

For more information about my research activities, see my publications. You can also read about the research projects I have been involved in, including those of the PhD students and post-docs I have advised.

News

  • Feb. 2nd 2025: Our PhD student David Perera successfully defended his thesis.
  • Dec. 12th 2024: 5 papers accepted at ICASSP 2025.
  • Nov. 6th 2024: Our PhD student Morgan Buisson successfully defended his thesis.
  • Sep. 25th 2024: 2 papers accepted at NeurIPS 2024.

Short bio

Slim Essid is a Full Professor at Télécom Paris and the coordinator of the Audio Data Analysis and Signal Processing (ADASP) group. He received the state engineering degree from the École Nationale d’Ingénieurs de Tunis in 2001; the M.Sc. (D.E.A.) degree in digital communication systems from the École Nationale Supérieure des Télécommunications, Paris, France, in 2002; the Ph.D. degree from the Université Pierre et Marie Curie (UPMC), in 2005; and the habilitation (HDR) degree from UPMC in 2015.

Over the past 15 years, he has been involved in various French and European research projects. He has collaborated with 14 post-docs and has graduated 15 PhD students; he is currently co-advising 10 others. He has published over 150 peer-reviewed conference and journal papers with more than 100 distinct co-authors. He regularly serves as a reviewer for machine learning, signal processing, audio and multimedia conferences and journals, including several IEEE transactions, and as an expert for research funding agencies.

Selected recent publications

  1. TACO: TRAINING-FREE SOUND PROMPTED SEGMENTATION VIA SEMANTICALLY CONSTRAINED AUDIO-VISUAL CO-FACTORIZATION
    H. Malard, M. Olvera, S. Lathuilière, and S. Essid
    2025
    Pre-print
  2. ANNEALED MULTIPLE CHOICE LEARNING: OVERCOMING LIMITATIONS OF WINNER-TAKES-ALL WITH ANNEALING
    D. Perera, V. Letzelter, T. Mariotte, A. Cortes, G. Richard, S. Essid, and M. Chen
    In Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Dec 2024
  3. AN EYE FOR AN EAR: ZERO-SHOT AUDIO DESCRIPTION LEVERAGING AN IMAGE CAPTIONER WITH AUDIO-VISUAL TOKEN DISTRIBUTION MATCHING
    H. Malard, M. Olvera, S. Lathuilière, and S. Essid
    In Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Dec 2024
  4. SPEECH SELF-SUPERVISED REPRESENTATIONS BENCHMARKING: A CASE FOR LARGER PROBING HEADS
    S. Zaiem, Y. Kemiche, T. Parcollet, S. Essid, and M. Ravanelli
    Computer Speech & Language, Dec 2024
  5. WINNER-TAKES-ALL LEARNERS ARE GEOMETRY-AWARE CONDITIONAL DENSITY ESTIMATORS
    V. Letzelter, D. Perera, C. Rommel, M. Fontaine, S. Essid, G. Richard, and P. Pérez
    In International Conference on Machine Learning (ICML 2024), Jul 2024
  6. COLLABORATING FOUNDATION MODELS FOR DOMAIN GENERALIZED SEMANTIC SEGMENTATION
    Y. Benigmim, S. Roy, S. Essid, V. Kalogeiton, and S. Lathuilière
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Jul 2024
  7. SELF-SUPERVISED LEARNING OF MULTI-LEVEL AUDIO REPRESENTATIONS FOR MUSIC SEGMENTATION
    M. Buisson, B. McFee, S. Essid, and H. Crayencour
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Mar 2024
  8. RESILIENT MULTIPLE CHOICE LEARNING: A LEARNED SCORING SCHEME WITH APPLICATION TO AUDIO SCENE ANALYSIS
    V. Letzelter, M. Fontaine, P. Pérez, G. Richard, S. Essid, and M. Chen
    In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Dec 2023
  9. PRETEXT TASKS SELECTION FOR MULTITASK SELF-SUPERVISED AUDIO REPRESENTATION LEARNING
    S. Zaiem, T. Parcollet, S. Essid, and A. Heba
    IEEE Journal of Selected Topics in Signal Processing, Dec 2022
  10. DNN-BASED MASK ESTIMATION FOR DISTRIBUTED SPEECH ENHANCEMENT IN SPATIALLY UNCONSTRAINED MICROPHONE ARRAYS
    N. Furnon, R. Serizel, S. Essid, and I. Illina
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Dec 2021
  11. WEAKLY SUPERVISED REPRESENTATION LEARNING FOR AUDIO-VISUAL SCENE ANALYSIS
    S. Parekh, S. Essid, A. Ozerov, N. Duong, P. Pérez, and G. Richard
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, Dec 2019

Contact

Télécom Paris - Room 5C
19, place Marguerite Perey 91120 Palaiseau - FRANCE
Directions for getting there can be found here.