Mahya Khazaei

  • BEng (Iran University of Science and Technology, 2018)

Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Real-Time Gesture-Based Sound Control System

Department of Computer Science

Date & location

  • Wednesday, December 11, 2024

  • 9:30 A.M.

  • Engineering Computer Science Building

  • Room 467

Reviewers

Supervisory Committee

  • Dr. George Tzanetakis, Department of Computer Science, University of Victoria (Supervisor)

  • Dr. Alex Thomo, Department of Computer Science, University of Victoria (Member)

External Examiner

  • Dr. Jentery Sayers, Department of English, University of Victoria

Chair of Oral Examination

  • Prof. Ajtony Csaba Szakacs, School of Music, University of Victoria


Abstract

This thesis presents a real-time, human-in-the-loop music control and manipulation system that dynamically adapts audio output based on the analysis of human movement captured from live-stream video. The project creates a responsive link between visual and auditory stimuli, fostering an interactive experience in which dancers not only respond to music but also influence it through their movements. The system enhances live performances, interactive installations, and personal entertainment, creating an immersive experience where users’ movements directly shape the music in real time. It demonstrates how machine learning and signal processing techniques can produce responsive audio-visual systems that evolve with each movement, bridging human interaction and machine response in a seamless loop.

The system leverages computer vision techniques and machine learning tools to track and interpret the motion of individuals dancing or moving, enabling them to participate actively in shaping audio adjustments such as tempo, pitch, effects, and playback sequence in real time. The system improves continually through ongoing training: by providing varied samples, users can generalize the models for user-independent use, and around 50–80 samples are typically sufficient to label a simple gesture. Through an integrated pipeline of gesture training, cue mapping, and audio manipulation, this human-centered system continuously adapts to user input. Gestures are trained as signals from the human to the model, mapped to sound control commands, and then used to manipulate audio elements naturally.
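
The sketch below illustrates the kind of pipeline the abstract describes: a small set of labelled movement-feature samples trains a gesture classifier, and each recognized gesture is mapped to a sound control command. It is not the thesis implementation; the libraries (NumPy, scikit-learn), the gesture names, the feature dimensions, and the cue map are all illustrative assumptions.

    # Minimal sketch, assuming pose landmarks from live video have already been
    # flattened into per-frame feature vectors. All names here are hypothetical.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Training data: roughly 50-80 labelled samples per gesture, as the abstract notes.
    # (Random placeholders stand in for real landmark features.)
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(160, 66))                 # 2 gestures x 80 samples, 33 (x, y) landmarks
    y_train = np.array(["raise_arm"] * 80 + ["spin"] * 80)

    # Gesture training: fit a simple classifier on the labelled samples.
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X_train, y_train)

    # Cue mapping: bind each recognized gesture to an audio adjustment.
    CUE_MAP = {
        "raise_arm": {"parameter": "tempo", "delta": +5},   # speed playback up
        "spin":      {"parameter": "pitch", "delta": +1},   # shift pitch up
    }

    def handle_frame(landmark_vector: np.ndarray) -> dict:
        """Classify one frame's landmark features and return the mapped sound control command."""
        gesture = clf.predict(landmark_vector.reshape(1, -1))[0]
        return CUE_MAP[gesture]

    # Audio manipulation step would consume the returned cue; here we just print it.
    print(handle_frame(rng.normal(size=66)))

In a live setting, the classifier's output for each incoming video frame would be forwarded to the audio engine, which applies the mapped adjustment (tempo, pitch, effect, or playback change) in real time.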