Fluks is a generative video installation that transforms speech into a living artwork using a number of cutting-edge AI technologies. It was first displayed at the 175th anniversary of the investment company Ferd AS, which featured a debate about the cultural, societal and environmental impacts of AI. As the participants spoke, the topics they discussed were visualized in real time on the giant cinema screen behind them.
The first version of the installation, built in 2023 and called Flux, was strictly visual, without the AI enhancements.
I designed and developed Fluks from scratch myself: everything from the AI integrations and the visual processing to the UI and MIDI interface.
Fluks transforms speech into live art by utilizing three different types of AI processing.
First, an audio signal from the stage is fed into OpenAI's Whisper, which transcribes the Norwegian speech into plain text. Next, a Large Language Model (LLM) turns this text into an English image prompt; the model I eventually chose for Fluks was Llama 3 by Meta. Finally, the prompt is piped into Stable Diffusion, which generates an image; the model I used was SDXL Lightning 4-step. On a powerful machine with an Nvidia RTX 4090 GPU, full HD images are generated in less than 1.7 seconds.
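The three stages chain together naturally. The following is a minimal sketch of that hand-off, not the actual Fluks code: the class name, the stage signatures and the choice to skip silent chunks are all assumptions made for illustration, and each stage is a plain callable so that Whisper, Llama 3 and SDXL Lightning (or stand-ins) can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechToImagePipeline:
    """Hypothetical wrapper around the three AI stages described above."""

    transcribe: Callable[[bytes], str]      # audio chunk -> Norwegian text (e.g. Whisper)
    write_prompt: Callable[[str], str]      # Norwegian text -> English image prompt (e.g. an LLM)
    generate_image: Callable[[str], bytes]  # prompt -> encoded image (e.g. Stable Diffusion)

    def run(self, audio_chunk: bytes) -> Optional[bytes]:
        text = self.transcribe(audio_chunk).strip()
        if not text:
            # Assumption: nothing was said, so no new image is requested.
            return None
        prompt = self.write_prompt(text)
        return self.generate_image(prompt)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any models installed.
    demo = SpeechToImagePipeline(
        transcribe=lambda audio: "kunstig intelligens og samfunn",
        write_prompt=lambda text: f"oil painting inspired by: {text}",
        generate_image=lambda prompt: prompt.encode("utf-8"),
    )
    print(demo.run(b"\x00\x01"))
```

Keeping the stages as injected callables also makes it straightforward to let an operator preview or override the output of any single step before it reaches the next one.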
The generated images are blended together using a novel method that makes them flow into each other in painterly swirls, emulating the ethereal, free-flowing nature of verbal communication. In simple terms, this is done by displacing the images using the RGB values of their own pixels, while at the same time gradually cross-fading between the current image and the next image in the queue.
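To make the idea concrete, here is a rough sketch of self-displacement combined with a cross-fade. It is an illustration under stated assumptions, not the method used in Fluks: the channel-to-axis mapping (red drives the horizontal offset, green the vertical), the `max_shift` parameter and the pure-Python image representation are all hypothetical, and a real-time version would run on the GPU.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in 0..1
Image = List[List[Pixel]]           # rows of pixels


def displace(img: Image, strength: float) -> Image:
    """Resample each pixel from an offset derived from its own colour."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            r, g, _b = img[y][x]
            # Assumption: a channel value of 0.5 means "no shift".
            dx = round((r - 0.5) * 2 * strength)
            dy = round((g - 0.5) * 2 * strength)
            sx = min(max(x + dx, 0), w - 1)  # clamp to the image edges
            sy = min(max(y + dy, 0), h - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out


def blend_frame(current: Image, nxt: Image, t: float, max_shift: float = 8.0) -> Image:
    """One frame of the transition, with t running from 0 (current) to 1 (next)."""
    # The outgoing image distorts more as t grows; the incoming one settles.
    a = displace(current, max_shift * t)
    b = displace(nxt, max_shift * (1.0 - t))
    return [
        [tuple((1.0 - t) * ca + t * cb for ca, cb in zip(pa, pb))
         for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]
```

Calling `blend_frame` with `t` stepped from 0 to 1 over the transition duration yields the intermediate frames; at the two endpoints the result is exactly the current and the next image respectively.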
To ensure stability and control in a high-stakes live setting, I put much effort into the UI that the operator uses to control the installation. It allows the operator to preview and override every step of the AI processing, enable text overlays, enable a ControlNet that uses the camera feed to influence the generations, fine-tune the visual transitions between images and their durations, and much more. All of it was mapped to a MIDI Fighter Twister controller for ease of use.
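Mapping a knob controller to installation parameters mostly comes down to scaling 7-bit MIDI control change values (0 to 127) into each parameter's range. The sketch below shows one way this could look; the control numbers, parameter names and ranges are invented for illustration and are not taken from the Fluks setup, and reading the messages from the device is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Knob:
    name: str    # hypothetical parameter name
    low: float   # value at CC 0
    high: float  # value at CC 127

    def scale(self, cc_value: int) -> float:
        cc_value = min(max(cc_value, 0), 127)  # MIDI CC values are 7-bit
        return self.low + (self.high - self.low) * cc_value / 127


# Assumed mapping from control number to parameter.
KNOBS: Dict[int, Knob] = {
    0: Knob("crossfade_seconds", 1.0, 30.0),
    1: Knob("displacement_strength", 0.0, 1.0),
    2: Knob("controlnet_weight", 0.0, 1.0),
}


def apply_cc(params: Dict[str, float], control: int, value: int) -> None:
    """Update params in place for one incoming control change message."""
    knob = KNOBS.get(control)
    if knob is not None:  # unmapped controls are ignored
        params[knob.name] = knob.scale(value)
```

For example, `apply_cc(params, 1, 127)` would set the hypothetical `displacement_strength` to its maximum, while a message on an unmapped control leaves `params` unchanged.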