Blind Camera – Point-and-shoot sound to image

Created by Diego Trujillo Pisanty, Blind Camera is an AI-powered device that generates pictures from sound instead of light while preserving the experience of using a point-and-shoot camera. The user aims its horn at a sound and, with the push of a button, converts that sound into an image. The resulting ‘photo’ is indexical of the surrounding soundscape rather than of the scene in front of the camera.

The project uses a custom-made artificial neural network (ANN) that finds a common representation between a sound and an image, comparable to how the word ‘dog’ can refer both to a picture of a furry animal and to a barking noise.

The ANN was trained on bespoke videos taken in Mexico City. Each video frame was associated with its preceding second of sound. The network was then trained to (1) encode the sound into a vector, (2) decode it back into the matching image, and (3) try to convince another network that the resulting image is a photograph. Blind Camera thus combines elements of autoencoders and Generative Adversarial Networks.
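The three-part setup described above might be sketched in TensorFlow 2 roughly as follows. This is a minimal illustration only: the layer counts, the latent size, the one-second 16 kHz audio input, and the 64×64 output resolution are all assumptions, not the project's actual values.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

LATENT_DIM = 128   # assumed size of the shared sound/image vector
AUDIO_LEN = 16000  # assumed: one second of 16 kHz mono audio
IMG_SIZE = 64      # assumed output image resolution

def build_encoder():
    """(1) Encode one second of raw audio into a latent vector."""
    audio = layers.Input(shape=(AUDIO_LEN, 1))
    x = layers.Conv1D(32, 9, strides=4, activation="relu")(audio)
    x = layers.Conv1D(64, 9, strides=4, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    z = layers.Dense(LATENT_DIM)(x)
    return Model(audio, z, name="sound_encoder")

def build_decoder():
    """(2) Decode a latent vector back into the matching image."""
    z = layers.Input(shape=(LATENT_DIM,))
    x = layers.Dense(8 * 8 * 128, activation="relu")(z)
    x = layers.Reshape((8, 8, 128))(x)
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu")(x)
    img = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="sigmoid")(x)
    return Model(z, img, name="image_decoder")

def build_discriminator():
    """(3) Judge whether a decoded image looks like a real photograph."""
    img = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = layers.Conv2D(32, 4, strides=2, padding="same", activation="relu")(img)
    x = layersers_flatten = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    score = layers.Dense(1)(x)  # real/fake logit
    return Model(img, score, name="discriminator")
```

During training, the encoder–decoder pair would be optimized to reconstruct the frame paired with each sound (the autoencoder part), while the discriminator pushes the decoded images toward photographic realism (the adversarial part).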

Restricting the training data to the city where the camera was made imbues it with a local worldview. While most artificial intelligence attempts to reduce bias, Blind Camera instead embraces its position as an urban and Mexican object, interpreting the sounds it hears from that standpoint. The object’s local perspective invites speculation about other non-global and underrepresented views that AI could take.

Developing an ANN and its training dataset seemed daunting and even absurd for an artist. However, it provided the flexibility and control to tweak the resulting image’s aesthetics and narratives, and it allowed the code to be written with artistic intent in the foreground. The process of producing Blind Camera’s code prompted reflection on how AI written by artists would work and how it could differ from currently available models.

Blind Camera’s AI model was written in Python 3 and TensorFlow 2. It combines elements of variational autoencoders and GANs. The model was trained on a single RTX 3080 graphics card and then optimized with TFLite to run on a Raspberry Pi 3B.
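The TFLite optimization step might look roughly like the following sketch. The `generator` here is a tiny stand-in model and the file name is hypothetical; the point is the conversion-and-deploy pattern, not the project's actual model.

```python
import tensorflow as tf

# Stand-in for the trained sound-to-image generator (shapes are illustrative).
generator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(8 * 8 * 3, activation="sigmoid"),
    tf.keras.layers.Reshape((8, 8, 3)),
])

# Convert the Keras model to TFLite with default post-training optimizations,
# shrinking it enough to run on the Raspberry Pi's CPU.
converter = tf.lite.TFLiteConverter.from_keras_model(generator)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("blind_camera.tflite", "wb") as f:
    f.write(tflite_model)

# On the Pi, inference uses the lightweight TFLite interpreter
# instead of the full TensorFlow runtime.
interpreter = tf.lite.Interpreter(model_path="blind_camera.tflite")
interpreter.allocate_tensors()
```

Running the converted model through `tf.lite.Interpreter` avoids installing full TensorFlow on the Pi, which is what makes single-board deployment practical.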

Project Page | Diego Trujillo Pisanty