On the social and technical conditions of machine listening
This doctoral thesis investigates machine listening, defined as the computational automation of sound recognition and event detection, through two original conceptual frameworks: soundscape addressability and algorithmic filtering. Working against the tendency in cultural criticism to diagnose AI’s social biases without engaging its technical substrates, the research develops a critique that moves between signal processing, archival theory, and philosophy of perception.
Through artistic experimentation with a corpus of Australian Magpie vocalisations, the thesis demonstrates how machine listening systems classify and recombine sound at scales inaccessible to human audition. These systems do so not by imposing verbal categories onto pre-existing sonic data, but by synthesising categories from statistical structure and bioacoustic knowledge. Soundscape addressability theorises the archival and bibliographical dimensions of this process; algorithmic filtering challenges the sieve metaphor common in neural network discourse, arguing that filters produce rather than refine categories.
The central argument is that machine listening does not substitute for human perception but redistributes its labour: a technological doubling that retains a deep dependence on human judgement despite performing the appearance of autonomy. Social biases, in this account, are not intrinsic to algorithms but emerge contextually, through the operation of these systems within complex sociotechnical environments.