DCASE2017 Workshop

Workshop on Detection and Classification of Acoustic Scenes and Events
16 - 17 November 2017, Munich, Germany
The workshop aims to provide a venue for researchers working on the computational analysis of sound events and acoustic scenes to present and discuss their results.

The DCASE 2017 Workshop is the second workshop on Detection and Classification of Acoustic Scenes and Events and, like the first, is organized in conjunction with the DCASE challenge. We aim to bring together researchers from universities and companies interested in the topic and to provide an opportunity for the scientific exchange of ideas and opinions.

The technical program will include talks by invited speakers on computational everyday sound analysis and recognition, as well as oral and poster presentations of accepted papers. In addition, a special poster session will be dedicated to the DCASE 2017 challenge entries.

The workshop is organized as a one-and-a-half-day event, to be held on 16-17 November 2017 at Hotel Maritim, Munich, Germany.


We invite submissions on the topics of computational analysis of acoustic scenes and sound events, including but not limited to:

Tasks in computational environmental audio analysis

  • Acoustic scene classification
  • Sound event detection and localization
  • Audio tagging
  • Challenges in real-life applications (e.g., rare events, overlapping sound events, weak labels)

Methods for computational environmental audio analysis

  • Signal processing methods
  • Machine learning methods
  • Auditory-motivated methods
  • Cross-disciplinary methods involving, e.g., acoustics, biology, psychology, geography, materials science, transport science

Resources, applications, and evaluation of computational environmental audio analysis

  • Publicly available datasets or software, taxonomies and ontologies, evaluation procedures
  • Applications
  • Description of systems submitted to the DCASE 2017 Challenge

The results of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge 2017 will also be announced at the workshop.

Preliminary schedule

Day 1, Thursday 16.11.2017

9:00 - 18:00

Lunch and coffee breaks included

  • Keynote
    General-Purpose Sound Event Recognition
    Dan Ellis, Google

  • Technical sessions (oral presentations and posters)
  • DCASE Challenge results analysis
  • Open discussion

Day 2, Friday 17.11.2017

9:00 - 12:30

Morning coffee break included

  • Keynote
    Sound Texture Perception via Summary Statistics
    Josh McDermott, MIT

  • Technical sessions (oral presentations and posters)
  • DCASE Challenge special session

Keynote speakers

General-Purpose Sound Event Recognition

Dan Ellis

Sound Understanding Group, Google Research
Day 1, Thursday 16.11.2017

Inspired by the success of general-purpose object recognition in images, we have been working on automatic, real-time systems for recognizing sound events regardless of domain. Our goal is a system that can tag or describe an arbitrary soundtrack, such as one found on a media-sharing site like YouTube, using terms that make sense to a human. I will cover the process of defining this task, our deep learning approach, our efforts to collect training data, and our current results. I'll discuss some factors important for accurate models, and some ideas about how to get the best return on investment in manual labeling.


Daniel P. W. Ellis received the Ph.D. degree in electrical engineering from MIT. He spent several years as a Research Scientist at ICSI in Berkeley, CA. In 2000, he took a faculty position with Columbia University, New York. In 2015, he left for his current position as a Research Scientist with Google in New York. His research is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing, and he has around 200 publications. He also runs the AUDITORY email list, which connects over 3000 researchers worldwide in the perception and cognition of sound.

Sound Texture Perception via Summary Statistics

Josh McDermott

Fred & Carole Middleton Career Development Assistant Professor, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Day 2, Friday 17.11.2017

Sound textures are produced by superpositions of large numbers of similar acoustic features (as in rain, swarms of insects, or galloping horses). Textures are noteworthy for being stationary, raising the possibility that time-averaged statistics might capture their structure. I will describe several lines of work testing this idea. I will show how the synthesis of textures from statistics of biological auditory models provides evidence for statistical texture representations. I will then describe experiments that characterize the process by which texture statistics are measured by the auditory system, and that explore their role in auditory scene analysis.


Josh McDermott is a perceptual scientist studying sound and hearing in the Department of Brain and Cognitive Sciences at MIT, where he is the Fred & Carole Middleton Career Development Assistant Professor and heads the Laboratory for Computational Audition. His research addresses human and machine audition using tools from experimental psychology, engineering, and neuroscience. McDermott obtained a BA in Brain and Cognitive Science from Harvard, an MPhil in Computational Neuroscience from University College London, a PhD in Brain and Cognitive Science from MIT, and postdoctoral training in psychoacoustics at the University of Minnesota and in computational neuroscience at NYU. He is the recipient of a Marshall Scholarship, a James S. McDonnell Foundation Scholar Award, and an NSF CAREER Award.

Accepted papers

Altogether, 26 papers were accepted for presentation at the workshop. The full list of accepted papers is published separately.

Full proceedings will be available shortly before the Workshop.


Hotel Maritim, Munich, Germany

