
The final movement of the Phantom of Utopia trilogy, The Divergence transforms the act of Chinese calligraphy into an immersive, four-dimensional meditation on memory, embodiment, and poetic abstraction. As director and co-programmer of the audio-visual system, I guided the conceptual and technical development of this extended reality performance alongside a deeply collaborative team, including literary scholar Irina Kruchinina and programmer Matin Esmaeili. Together, we wove a living visual language from the strokes and radicals of Tao Yuan-ming’s ancient text The Peach Blossom Spring—a language that dances across time, screen, and body.
Through a combination of pre-recorded OptiTrack motion capture, real-time Pose AI tracking, and 3D-scanned poses rendered in Unreal Engine, the performer navigates between static calligraphic forms and animated gestures. Visual sequences morph from full Chinese characters to their constituent radicals, dissolving into human poses and particle flows that reconstruct meaning across space. These virtual elements do not merely illustrate the poem—they become it: animated glyphs emerging from the dancer’s body like ink from a brush.
In parallel, a generative audio engine responds to the dancer’s movements, sonifying embodied gestures and tracing the emotional arc of the text through spatialized sound. Cantonese recitation fragments and reassembles in real time, guiding the listener through landscapes of displacement, nostalgia, and philosophical return.
As image, motion, and sound converge and diverge in continuous transformation, The Divergence offers not a conclusion, but a dispersion—an invitation to rediscover meaning through fragmentation and to dwell in the ambiguity between language and sensation, tradition and reinvention.
Technical Innovation
The production explores interactivity from two perspectives: audio and movement, and visuals and movement. The audio-movement interaction was built in Max, running the MuBu library and MediaPipe. MediaPipe captured three-dimensional body motion data, including the wrists, shoulders, hips, knees, and feet. For gesture recognition, the data was deliberately reduced from three dimensions to two: traditional Chinese calligraphy is performed in two dimensions only, so recognizing the strokes of Chinese characters does not require three-dimensional data. The sonification of dance movement that incorporates non-categorized gestures, however, was programmed with the full three-dimensional motion data, because it transcribes more completely the motion of a dance that takes place in 3D space and provides more detail for the sonification, leading to better sonic effects.
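The reduction from three dimensions to two can be sketched in a few lines. This is a minimal Python illustration, not the Max patch used in the production. The landmark names and coordinate values are hypothetical, and the sketch assumes that each landmark arrives as an (x, y, z) tuple and that simply discarding z is an acceptable projection onto the calligraphic plane.

```python
def project_to_plane(landmarks):
    """Drop the depth (z) value of each landmark, keeping only (x, y)."""
    return {name: (x, y) for name, (x, y, _z) in landmarks.items()}


# Hypothetical frame of 3D landmarks (normalized coordinates, assumed layout).
frame_3d = {
    "right_wrist": (0.42, 0.61, -0.10),
    "right_shoulder": (0.55, 0.40, -0.02),
}

# The 2D frame would feed stroke recognition; the untouched 3D frame
# would remain available for the sonification of free movement.
frame_2d = project_to_plane(frame_3d)
print(frame_2d)
```

Keeping both versions of the frame mirrors the split described above: two-dimensional data for recognizing strokes, three-dimensional data for sonifying non-categorized gestures.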
The MuBu library performs gesture recognition on the basic strokes of Chinese characters, using the motion data captured by MediaPipe. The resulting gesture categories then trigger sonic events.
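The step from a recognized stroke to a sonic event can be pictured as a simple lookup. The Python sketch below is only an illustration under stated assumptions: the stroke labels (pinyin names of common basic strokes), the event names, and the likelihood threshold are all hypothetical and are not taken from the production's Max patch or from MuBu itself.

```python
# Hypothetical mapping from recognized stroke labels to sound events.
STROKE_EVENTS = {
    "heng": "event_horizontal",
    "shu": "event_vertical",
    "pie": "event_left_falling",
    "na": "event_right_falling",
}


def trigger_for(label, likelihood, threshold=0.7):
    """Return the sound event for a recognized stroke.

    Returns None when the label is unknown or when the recognizer's
    likelihood falls below the (assumed) threshold.
    """
    if likelihood < threshold:
        return None
    return STROKE_EVENTS.get(label)


print(trigger_for("heng", 0.9))   # a confident recognition fires an event
print(trigger_for("heng", 0.5))   # a weak recognition is ignored
```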
Apart from the sound triggered by gesture recognition, the sonification of non-categorized gestures is processed through an artificial intelligence system built with the FluCoMa library in Max. The system takes the audio file of the Cantonese recitation of the full poem, segments it at signal spikes, lists the start time (in audio samples) of each detected segment, calculates the mean of all segments, slices the segments by their start and end points, runs the segments through the database, and arranges them on a sonic map through plotting.
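The segmentation stage can be approximated outside Max with a plain threshold-based slicer. The sketch below is an assumption-laden stand-in, not the FluCoMa objects used in the production: the threshold, the minimum gap, and the toy signal are hypothetical, and the mean absolute amplitude is just one possible reading of "the mean" of a segment.

```python
def find_onsets(samples, threshold, min_gap):
    """Return indices where |amplitude| rises above the threshold.

    An onset is counted only when the previous sample was below the
    threshold and the last onset is at least min_gap samples away.
    """
    onsets = []
    for i, value in enumerate(samples):
        rising = abs(value) >= threshold and (i == 0 or abs(samples[i - 1]) < threshold)
        if rising and (not onsets or i - onsets[-1] >= min_gap):
            onsets.append(i)
    return onsets


def slice_segments(samples, onsets):
    """Pair each onset with the next one (or the end of the file)."""
    bounds = onsets + [len(samples)]
    return list(zip(bounds, bounds[1:]))


def mean_abs(samples, start, end):
    """Mean absolute amplitude of one segment (an assumed descriptor)."""
    return sum(abs(v) for v in samples[start:end]) / (end - start)


# Toy signal standing in for the recitation audio.
samples = [0.0, 0.0, 0.9, 0.5, 0.0, 0.0, 0.8, 0.1, 0.0, 0.0]
onsets = find_onsets(samples, threshold=0.4, min_gap=2)
segments = slice_segments(samples, onsets)
print(onsets, segments)
```

Each (start, end) pair, together with its descriptor, would then become one point in the dataset that is plotted as the sonic map.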
When the user moves their hand, a k-d tree looks up the nearest neighbors in the dataset and plays the corresponding sound segments. The segments then pass through live signal processing that takes the motion-sensing data as input to alter the sound effects.
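The nearest-neighbor lookup can be illustrated with a small two-dimensional k-d tree written in plain Python. This is a sketch of the general technique, not of the FluCoMa object used in Max. The map coordinates, the segment ids, and the hand position are hypothetical, and the sketch assumes the hand position has already been scaled to the same range as the sonic map.

```python
from math import dist


def build_kdtree(points, depth=0):
    """Recursively build a 2D k-d tree from (x, y, segment_id) tuples."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return the stored point closest to the (x, y) query."""
    if node is None:
        return best
    point = node["point"]
    if best is None or dist(query, point[:2]) < dist(query, best[:2]):
        best = point
    diff = query[node["axis"]] - point[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < dist(query, best[:2]):
        best = nearest(far, query, best)
    return best


# Hypothetical sonic map: (x, y, segment_id).
sonic_map = [(0.1, 0.2, 0), (0.8, 0.9, 1), (0.5, 0.5, 2), (0.9, 0.1, 3)]
tree = build_kdtree(sonic_map)

hand = (0.85, 0.2)                        # assumed normalized hand position
segment = nearest(tree, hand)
print("play segment", segment[2])
```

The tree is built once from the plotted segments, so each new hand position needs only a lookup, which suits a real-time performance.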
The visual component of the third movement is performed entirely in Unreal Engine. It draws on 3D scanning in Luma AI, FBX animation recording in OptiTrack, strokes, radicals, and sentences from the third section of The Peach Blossom Spring, and particle-system animation as the fabric of the materials. 3D scanning was applied to my own body poses, which simulate the pictorial meaning of the strokes and radicals and the semantic meaning of the Chinese characters. Particle systems are applied to the 3D meshes of both the static poses and the animations, and their animation is designed to stage the emergence and disintegration of the particles attached to the meshes. FBX animation extends the static 3D poses into series of three-dimensional movements, making the words come to life. For example, 桃 (peach) was embodied in the animation at the beginning.

The visual component is sequenced as a loop: a sentence from the poem appears and morphs into one word that highlights the essence of the poem. The word then disintegrates and splits into different strokes and radicals, which morph into static human body poses and animations. The different poses and animations serve as pivotal points in conveying the literary meaning of the poem.
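The looping order of the visual sequence can be written down as a repeating cycle of stages. The Python sketch below only illustrates that ordering. It is not Unreal Engine code, and the stage names are labels chosen here for illustration rather than identifiers from the project.

```python
from itertools import cycle, islice

# Stages of one visual cycle, in the order described above (assumed labels).
STAGES = ("sentence", "word", "strokes_and_radicals", "poses_and_animation")


def stage_sequence(count):
    """Return the first `count` stages of the endlessly repeating cycle."""
    return list(islice(cycle(STAGES), count))


# After the poses and animation, the loop returns to a new sentence.
print(stage_sequence(5))
```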

Demonstration of the Zhong pose animation recording in Motive, OptiTrack's software.
3D scanning of the poses is conducted in Luma AI, an artificial intelligence app. It requires 360-degree scans from the top, middle, and bottom views to complete the full three-dimensional image. Good lighting and a high-resolution camera on the recording device are required for better results.
3D scanning of a human body in various poses presents significant challenges due to the natural movement and breathing of the subject. The scanning process can capture small movements without fixed quantization, which can affect the quality of the scan. To address this, two solutions were implemented to improve scanning quality. First, for poses that were too difficult to hold for the average recording time of three to five minutes, pauses in recording were necessary. Resuming the recording required positioning the camera at an angle very close to where the pause occurred to maintain continuity. The second solution involved post-processing in Blender. This clean-up process included removing glitches attached to the body mesh, filling in missing parts, and smoothing lines and curves to enhance the overall presentation of the poses.

Demonstration of pose embodiment of a radical, 凵, in world (世 sai3)
Artistic Choice
While the dancer performs the parts of the choreography that do not involve gesture recognition, her three-dimensional motion data is captured and sonified to express the process of turning two-dimensional Chinese calligraphy into three-dimensional imagery or vision that contains the pictorial, ideological, and semantic facets of Chinese characters. A series of three-dimensional images, integrated to form meaning in the depth of the movement, leads to a four-dimensional performance. For instance, the choreography of water (水 seoi2) and forest (林 lam4) was expressed as streams of flowing water and a repeated pattern of woods in three-dimensional space, respectively. The dancer performs the repeating stroke patterns of the Chinese word 木 (wood, muk6) and circles the space, repeating the series of wood motions to complete the semantic and pictorial meaning of forest (林 lam4). This series of movements uses bodily embodiment to create depth in the time series, giving meaning to the pictorial, ideological, and semantic definitions of the words, and translates them into the four-dimensional space-time continuum (x, y, z, t). Here t is a unidirectional flow of time, measured in seconds, minutes, hours, years, and so on. The temporal dimension of the dance performance integrates smaller motion fabrics into larger meaning, and this integration creates depth.
This process is deliberately literal and straightforward to facilitate users’ learning and comprehension of calligraphic contours and movements through interaction. Users of the system or performers are encouraged not merely to replicate the poses but to infuse their personal emotions and interpretations into their interactions with the system, thereby engaging with the underlying message. An invitation is extended to users and collaborators to explore this system, anticipating a rich cultural exchange.
What does the word in the picture below mean? And what message does it convey?


Humane (人 jan4) is highlighted in the third movement. It originates from a concept in “The Five Aspects of Conduct,” which explains humane (人) as an internalized interaction with ourselves. According to Shusterman (2012), the Guodian translation of “The Five Aspects of Conduct” best illustrates this intimate relation:
Humane (ren) ideas are clear; clarity of thought leads to keen insight; keen insight leads to ease; ease motivates to gentleness; gentleness leads to happiness; happiness allows a pleasant demeanor; pleasantness yields intimacy; intimacy creates loving; loving results in a jade-like countenance; jade-like countenance generates formation (形); formation results in humaneness.
The word humane (ren) comes from the sentence 此人一一為具言所聞 (translation: a villager in utopia delineated his astonishment and lamentation at every story he hears), which depicts the Utopian inhabitants inquiring from the traveler about the external world. The 180-degree orbit from a distant to a close angle illustrates their eagerness for knowledge about the chaotic outside world. The Chinese character 人 (human) symbolizes the traveler and is spatially integrated inside the poses, highlighting the Daoist principle (from Laozi’s “Tao Te Ching,” Chapter 42) that the Dao leads to unity, from which duality emerges, followed by trinity, and subsequently the myriad phenomena (道生一,一生二,二生三,三生萬物). Yet all phenomena lead back to unity through self-cultivation, making us “human”. The 180-degree orbit around the pose of humane (人) embodies the process described in The Five Aspects of Conduct, as the camera’s change between close-up and long shot shows the formation from part to full body.