AI Sculpting – The unpredictable strategies and outcomes of co-creation

Created by onformative, a studio for digital art and design based in Berlin, AI Sculpting is an exploration of a machine-learning process. Conceived as a tool to assist a conventional approach to sculpting, i.e. subtractive manufacturing, the project develops an AI model that seeks out strategies to continually improve how a given form is achieved. By feeding it different tools, rules and rewards through reinforcement learning, the team steers the process, revealing unpredictable outcomes.

The core aspect of this co-creation process is that we — to an extent — let go of control. We seek an equilibrium of setting a frame, giving instructions and observing the outcome: a simple block being transformed into an increasingly recognizable shape with every iteration.


With the goal of sculpting a 3D model, the AI was trained through reinforcement learning based on rewards and punishments. The agent, a machine-learning model, was programmed to seek maximum reward. The voxel-based environment it moves through provided effectively unlimited training data and a clear reward structure. The starting state of the environment was one big cube, from which the agent needed to remove mass to get closer to a predefined target state. With each step, the agent could decide where to go and whether and how to remove a mass of voxels around itself. To enable its learning, it was conditioned in a specific way: it was rewarded when extraneous mass was removed and penalised when mass that ought to be part of the final sculpture was removed. Through trial and error, the agent developed strategies to achieve the desired shape by removing mass from the original cube. Because many training runs could be performed in parallel, the process produced a great variety of sculptural output. Observing the evolution of the learning curve, including the strategies, behaviour and visual output, the team started to experiment with different parameters and predefined rules.
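The reward-and-penalty rule described above can be sketched in a few lines. This is an illustrative assumption, not the project's actual implementation: the coefficients and the boolean-grid layout are hypothetical.

```python
import numpy as np

# Hypothetical reward rule: removing extraneous (residual) voxels earns a
# reward, removing voxels that belong to the final sculpture earns a penalty.
REWARD_RESIDUAL = 1.0   # assumed value, not taken from the project
PENALTY_TARGET = -2.0   # assumed value, not taken from the project

def step_reward(removed_mask, target_mask):
    """removed_mask: voxels the agent removed this step (bool array).
    target_mask: voxels that ought to be part of the final sculpture."""
    residual_removed = np.logical_and(removed_mask, ~target_mask).sum()
    target_removed = np.logical_and(removed_mask, target_mask).sum()
    return residual_removed * REWARD_RESIDUAL + target_removed * PENALTY_TARGET
```

Under this kind of signal, a policy trained to maximise cumulative reward learns to carve away residual mass while steering clear of the sculpture itself.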

A first RL experiment in 2D. Top-left: the predefined state the agent needs to draw; top-right: the current state; bottom-left: the residual pixels (target – current); bottom-right: the agent’s view, which has only partial observability of the environment.

The environment is voxel-based, and the agent (a machine-learning model) can move through it. At each step, the agent can decide where to go and whether to remove a mass of voxels around itself. Its sculpting objective starts as one big cube, out of which it needs to remove mass to get closer to a predefined target state. The target state is a 3D shape (most often a posing human figure); comparing it with the current state puts each voxel in one of three states: (1) target → don’t remove, (2) residual → remove, (3) empty.
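Assuming boolean occupancy grids for the current and target states, that three-way classification might look like the following sketch (the labels and layout are hypothetical, not the project's code):

```python
import numpy as np

# Illustrative state labels for each voxel.
TARGET, RESIDUAL, EMPTY = 0, 1, 2

def classify_voxels(current, target):
    """current/target: bool arrays, True where mass is present.
    Returns per-voxel state: TARGET -> don't remove,
    RESIDUAL -> remove, EMPTY -> nothing there."""
    states = np.full(current.shape, EMPTY, dtype=np.int8)
    states[np.logical_and(current, target)] = TARGET
    states[np.logical_and(current, ~target)] = RESIDUAL
    return states
```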

The agent is rewarded for removing as many residual voxels in as few steps as possible. However, to avoid collateral damage to target voxels, that reward is scaled down, so that the drive for efficiency does not encourage reckless cuts. Similarly, to obtain a more visually interesting process, the team gave a higher reward for voxels further away from the cube’s centre, incentivising the agent to work from the outside in. The tools the agent uses also vary in size; because the larger ones are riskier, the agent tends to converge on using only its smaller tools. To encourage more balanced tool usage early on, the agent gets an extra reward bonus for using the bigger tool sizes.
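Putting those shaping ideas together gives something like the sketch below. The coefficients, tool names and the exact formula are guesses for illustration only; the project's real values are not published in this article.

```python
import numpy as np

# Assumed per-tool bonus to counter the agent's bias toward small, safe tools.
TOOL_BONUS = {"small": 0.0, "medium": 0.05, "large": 0.1}

def shaped_reward(removed_coords, target_mask, grid_shape, tool):
    """removed_coords: (N, 3) voxel indices removed this step.
    Rewards residual removal more the further it is from the cube's centre,
    penalises destroying target mass, and adds a bonus for bigger tools."""
    centre = (np.asarray(grid_shape) - 1) / 2.0
    reward = 0.0
    for idx in removed_coords:
        dist = np.linalg.norm(idx - centre)
        if target_mask[tuple(idx)]:
            reward -= 2.0               # collateral damage to the sculpture
        else:
            reward += 1.0 + 0.1 * dist  # outer voxels pay more
    return reward + TOOL_BONUS[tool]
```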

To learn the optimal balance, the agent was trained using deep reinforcement learning. Specifically, the team used a PPO implementation with Unity’s ml-agents library. During training, the agent was exposed to random samples from the following set of environments, each of which it must sculpt within n steps: Random complex, Orientation, Orientation-obstacle, Motorblock and Validation. The team also used signed distance functions and raymarching to render the environment, apply actions, import target shapes, and generate the complex shapes during training. Their custom SDF API was faster than instancing voxels every step, and it looks smoother as well. In the end they could let the agent sculpt in real time at an acceptable frame rate.
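The SDF side of this can be illustrated with textbook distance functions. The sketch below is generic, not the team's custom SDF API: a box stands in for the block, a sphere for a tool stroke, and the standard CSG subtraction `max(a, -b)` carves one out of the other.

```python
import numpy as np

def sdf_box(p, half_size):
    """Signed distance from point p to an axis-aligned box (negative inside)."""
    q = np.abs(p) - half_size
    return np.linalg.norm(np.maximum(q, 0.0)) + min(max(q[0], q[1], q[2]), 0.0)

def sdf_sphere(p, centre, radius):
    """Signed distance from point p to a sphere (negative inside)."""
    return np.linalg.norm(p - centre) - radius

def sdf_subtract(d_block, d_tool):
    """CSG subtraction of two SDF values: removes the tool volume
    from the block, as a carving stroke would."""
    return max(d_block, -d_tool)
```

Evaluating the combined field along camera rays (raymarching) then renders the carved shape smoothly, without instancing individual voxels every step.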

The Agent’s Training – Reinforcement Learning visualised in Houdini

For higher-quality output in terms of visual effects, and more artistic freedom for designers with regard to materials, lighting and physics, the team replicated the agent’s behaviour in Houdini. This replication was done by storing all of the agent’s actions to a JSON file and reading that into Houdini. To generate data-driven sound to accompany the renders, they used the same JSON file to transform the agent’s behaviour and reward signals into MIDI notes. The MIDI notes and velocities were then mapped in Ableton to generate sound: different tools drove different instruments, and reward was mapped to filter frequencies and loudness in the composition.
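A minimal sketch of such a mapping might look like this. The action-log schema (a `tool` name and a `reward` value per step) and the note choices are assumptions; the project's actual JSON format is not published in this article.

```python
# Hypothetical mapping from one logged agent action to a MIDI event:
# the tool selects the pitch (and hence the instrument track), the
# reward signal drives the note velocity.
TOOL_TO_NOTE = {"small": 60, "medium": 64, "large": 67}  # assumed pitches

def action_to_midi_event(action):
    """action: dict with 'tool' and 'reward' keys (assumed schema)."""
    note = TOOL_TO_NOTE.get(action["tool"], 60)
    # Clamp the reward-driven velocity into MIDI's valid 1..127 range.
    velocity = max(1, min(127, int(64 + action["reward"] * 32)))
    return {"note": note, "velocity": velocity}
```

Each returned event could then be emitted as a `note_on` message with a library such as Python's MIDO and written out as a MIDI file for Ableton.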

In an attempt to interpret the agent’s decision-making, the team experimented with visualising AI data such as confidence, or penalty and reward, for individual steps within the 3D environment. By highlighting the agent’s path through the block, for example, they gained another perspective on the process itself.

Confidence, or penalty and reward for individual Agent steps visualised
Highlighting the agent’s path through the block
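As an illustration of this kind of visualisation, each step's reward could be mapped to a colour along the agent's path. The gradient below is an assumption for the sake of the example, not the project's actual palette.

```python
# Illustrative only: map a per-step reward to an RGB colour so that
# penalty and reward can be painted along the agent's path in 3D.
def reward_to_rgb(reward, scale=2.0):
    """Returns an (r, g, b) tuple in 0..1: white at zero reward,
    green for positive reward, red for penalties."""
    t = max(-1.0, min(1.0, reward / scale))  # normalise into [-1, 1]
    if t >= 0:
        return (1.0 - t, 1.0, 1.0 - t)  # greener with higher reward
    return (1.0, 1.0 + t, 1.0 + t)      # redder with stronger penalty
```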

Finally, implementing different tools for the agent to choose from allowed more complexity in the process and resulted in a greater variety and depth of strategies as well as visual outcomes. The traces different tools leave on the surface of the emerging shapes make the sculpting process visible: each tool has its own unique fingerprint, from rough to fine and quick to slow, and each decision the agent made by choosing a different tool or orientation had a different visual outcome and impact.


For more information about the project, see the paper the team put together, which explains the above in much more detail.

Tools used: Unity3D, ML-Agents, Houdini, Ableton, the open-source Grid Sensor by MBaske (for an extra-efficient visual sensor implementation), the mesh-to-sdf Python pip package (for transforming mesh files into SDFs) and Python’s MIDO package (for converting data to MIDI).

Project Page | onformative | Detailed Paper

Credits: Cedric Kiefer (Creative Direction), Tobias Ziegler (Production), Mark Tensen (ML / Reinforcement Learning), Alexander Hahn (Lead 3D Artist), Norman Wassmuth, Piotr Rymer, Bernd Marbach (Code & Design).