The workshop aims to provide a venue for researchers working on the computational analysis of sound events and acoustic scenes to present and discuss their results.
The DCASE 2017 Workshop is the second Workshop on Detection and Classification of Acoustic Scenes and Events, and is again organized in conjunction with the DCASE challenge. We aim to bring together researchers from many different universities and companies with an interest in the topic, and to provide an opportunity for the scientific exchange of ideas and opinions.
The technical program will include invited speakers on the topic of computational everyday sound analysis and recognition, and oral and poster presentations of accepted papers. In addition, a special poster session will be dedicated to the DCASE 2017 challenge entries.
The workshop is organized as a one-and-a-half-day event, to be held on 16-17 November 2017 at Hotel Maritim, Munich, Germany.
We invite submissions on the topics of computational analysis of acoustic scenes and sound events, including but not limited to:
Tasks in computational environmental audio analysis
- Acoustic scene classification
- Sound event detection and localization
- Audio tagging
- Challenges in real-life applications (e.g., rare events, overlapping sound events, weak labels)
Methods for computational environmental audio analysis
- Signal processing methods
- Machine learning methods
- Auditory-motivated methods
- Cross-disciplinary methods involving, e.g., acoustics, biology, psychology, geography, materials science, transport science
Resources, applications, and evaluation of computational environmental audio analysis
- Publicly available datasets or software, taxonomies and ontologies, evaluation procedures
- Description of systems submitted to the DCASE 2017 Challenge
The results of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge 2017 will also be announced at the workshop.
Day 1 Thursday 16.11.2017, 9:00 - 18:00
| Time | Session |
|---|---|
| 10:30 | Oral session I |
| 14:00 | Oral session II |
| 15:20 | Poster session I |
Day 2 Friday 17.11.2017, 9:00 - 12:10
| Time | Session |
|---|---|
| 9:50 | Oral session III |
| 10:50 | Poster session II |
General-Purpose Sound Event Recognition
Day 1, Thursday 16.11.2017, 9:20
Inspired by the success of general-purpose object recognition in images, we have been working on automatic, real-time systems for recognizing sound events regardless of domain. Our goal is a system that can tag or describe an arbitrary soundtrack - as might be found on a media sharing site like YouTube - using terms that make sense to a human. I will cover the process of defining this task, our deep learning approach, our efforts to collect training data, and our current results. I'll discuss some factors important for accurate models, and some ideas about how to get the best return from manual labeling investment.
Shawn Hershey is a software engineer at Google Research, working in the Machine Hearing Group on machine learning for speech and audio processing. He is currently working on soundtrack classification and audio event detection. Before Google he worked as the first Software Engineer at Lyric Semiconductors, building tools to aid the development of hardware accelerators for AI. On the side, Shawn travels the world teaching Lindy Hop and blues dancing and playing in swing and blues bands. Long ago Shawn graduated from the University of Rochester with a BA in Computer Science and half of a degree from the Eastman School of Music.
Sound Texture Perception via Summary Statistics
Day 2, Friday 17.11.2017, 9:00
Sound textures are produced by superpositions of large numbers of similar acoustic features (as in rain, swarms of insects, or galloping horses). Textures are noteworthy for being stationary, raising the possibility that time-averaged statistics might capture their structure. I will describe several lines of work testing this idea. I will show how the synthesis of textures from statistics of biological auditory models provides evidence for statistical texture representations. I will then describe experiments that characterize the process by which texture statistics are measured by the auditory system, and that explore their role in auditory scene analysis.
Josh McDermott is a perceptual scientist studying sound and hearing in the Department of Brain and Cognitive Sciences at MIT, where he is the Fred & Carole Middleton Career Development Assistant Professor and heads the Laboratory for Computational Audition. His research addresses human and machine audition using tools from experimental psychology, engineering, and neuroscience. McDermott obtained a BA in Brain and Cognitive Science from Harvard, an MPhil in Computational Neuroscience from University College London, a PhD in Brain and Cognitive Science from MIT, and postdoctoral training in psychoacoustics at the University of Minnesota and in computational neuroscience at NYU. He is the recipient of a Marshall Scholarship, a James S. McDonnell Foundation Scholar Award, and an NSF CAREER Award.
A total of 27 papers were accepted for presentation at the workshop. The full proceedings can be accessed here.
Hotel Maritim, Munich, Germany
Instructions for authors
Detailed instructions for preparation and submission of workshop papers can be found here.
Registration is currently open for authors and non-authors. The registration system (ConfTool) can be accessed here. The registration fee is 150 Euros and includes lunch and coffee breaks.
At least one author per accepted paper must register for the Workshop before 20th October 2017.
A personalized invitation letter (mentioning the name of the participant) is available in the ConfTool user account after registration and payment of the workshop fee. The document is in PDF format and can be printed for visa application purposes.
If you require a more detailed letter, please enter your passport information under personal details in your ConfTool user account, and send a request email to firstname.lastname@example.org. Please also mention whether you need an original signature on it.