VRN: Notating the Voice the Way We Notate Movement

What Music Notation Actually Captures

Western music notation is, by any reasonable measure, one of the great achievements of human knowledge transmission. A score written in 1750 can be performed in 2026 with reasonable fidelity to what the composer intended, by performers who have never met the composer, never spoken their language, and never seen any other music from the same period. The notation captures pitch, duration, dynamics, articulation, and a layer of expressive markings that are precise enough for ensemble coordination and broad enough to leave room for individual interpretation.

What it does not capture, and was never designed to capture, is how the performer is supposed to produce those sounds. The score for a soprano aria specifies the notes. It does not specify whether to use head voice or chest voice, how to manage the passaggio, what kind of onset to use on each phrase, or how to shape the resonance for the vowel on a sustained high A. All of that — the whole field of vocal technique — has historically been transmitted from teacher to student through demonstration, imitation, and verbal coaching, with no notational layer at all.

This works because vocal pedagogy has been, for most of its history, an oral tradition. A student studies with a teacher. The teacher demonstrates. The student approximates. The teacher corrects. Over thousands of repetitions, the student internalizes a way of producing sound that the teacher is also using. The notation in the score never had to capture the production technique because the production technique was being transmitted in person, in real time, by a body in the same room.

The arrangement breaks down the moment the teacher is not in the room.

What the IPA Captures

For students of vocal music who study song texts in unfamiliar languages, the International Phonetic Alphabet is an essential second notation. An IPA transcription documents the speech sounds in a piece's text with enough precision to let a non-native speaker produce a recognizable approximation of the language. A soprano studying a Russian art song uses an IPA transcription to know that a particular vowel is the close central unrounded [ɨ] rather than the close front unrounded [i].

The IPA is excellent at what it does. It documents the targets — the specific articulator configurations that produce a given speech sound — with sufficient detail for any phonetician anywhere to read a transcription and understand it. For singers, this is a real and important capability that the score alone does not provide.

What the IPA barely addresses is anything beyond articulation. It says nothing about how the voice is supposed to be supported, how the resonance space is supposed to be shaped behind the consonant, what kind of register the singer is supposed to be in, or how the onset of a phrase is supposed to be initiated. It is precise about how speech sounds are articulated and largely silent about everything else.

The Missing Layer

What vocal pedagogy needs, but has never had a standard notation for, is the layer between "what notes" and "what phonemes" — the layer of vocal production technique. The dimensions that matter at this layer are reasonably well understood by any serious vocal teacher; they just have not been formalized into a notation that crosses traditions.

Roughly, the missing layer includes:

Onset — how a phrase begins. Is the breath initiated before the tone, simultaneously, or after? Is the onset hard, soft, breathy, or balanced?
Resonance placement — where the singer perceives the sound's primary resonance. Forward, in the mask. Back, in the dome of the soft palate. Mixed.
Register — chest, head, mixed, falsetto, whistle. Transitions between registers, and where they happen relative to the phrase.
Breath management — how the air is supplied to the phrase. The shape of the breath cycle. The timing of expansions and contractions in the abdominal and intercostal musculature.
Vibrato characteristics — presence or absence, rate, depth, and the timing of its onset within a sustained note.
Dynamic shape independent of volume — the perceived weight or lightness of a tone, which can vary independently of its measured loudness.

Every working vocal coach uses some informal subset of this vocabulary in lessons. Different traditions emphasize different elements. A bel canto teacher will spend much of a lesson on resonance placement and onset; a contemporary commercial music coach will spend much of a lesson on register transitions; a speech pathologist working with a transitioning patient will spend much of a lesson on breath management and resonance placement together.

None of these traditions has a written notation for what they are teaching. The teacher demonstrates, the student approximates, and what gets written down on the page (if anything) is a few words of margin annotation that mean nothing to anyone outside the lesson.

Vocal Resonance Notation

Vocal Resonance Notation, or VRN, is the AIUNITES network's attempt to formalize this missing layer. The name reflects the original motivation — resonance is the dimension that classical vocal pedagogy spends the most time on and the one with the least existing notation — but the format has expanded to cover the broader set of dimensions listed above.

VRN is a plain-text notation. It is designed to sit alongside a musical score rather than replace it; the score continues to specify pitch and rhythm, while VRN annotations specify how the voice is supposed to produce those pitches. A simple VRN string for the opening of a phrase looks like this:

ONSET[soft,H*] -> PITCH[A4,chest] -> RES[forward] -> DYN[mp,full]

This annotation specifies a soft onset (with an aspirate, marked H*) leading into an A4 produced in chest voice, with forward resonance placement, at mezzo-piano dynamic with a full tonal weight. The arrows indicate phases within a single phrase rather than separate pitches. A more elaborate phrase, with a register transition and a vibrato onset, can be expressed with additional sub-elements without changing the underlying grammar.
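As a rough illustration of how that string could be read by software, here is a minimal Python sketch. It assumes only what the example shows: elements of the form TAG[arg,arg] joined by "->". The names parse_vrn and Element are hypothetical, and the real VRN 1.0 grammar may allow forms this sketch does not handle.

```python
import re
from typing import NamedTuple


class Element(NamedTuple):
    """One phase of a phrase, e.g. ONSET[soft,H*] (hypothetical type)."""
    tag: str
    args: tuple


# Assumed shape, inferred from the single example above:
# an uppercase tag followed by bracketed, comma-separated arguments.
_ELEMENT = re.compile(r"([A-Z]+)\[([^\[\]]*)\]")


def parse_vrn(text: str) -> list:
    """Split a VRN string on '->' and parse each TAG[arg,...] element."""
    elements = []
    for part in text.split("->"):
        part = part.strip()
        match = _ELEMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"not a TAG[args] element: {part!r}")
        tag, raw_args = match.groups()
        args = tuple(a.strip() for a in raw_args.split(",") if a.strip())
        elements.append(Element(tag, args))
    return elements


phrase = parse_vrn(
    "ONSET[soft,H*] -> PITCH[A4,chest] -> RES[forward] -> DYN[mp,full]"
)
for element in phrase:
    print(element.tag, element.args)
```

The sketch keeps the phases in order, since the arrows describe a sequence within one phrase, and rejects anything that does not match the assumed element shape instead of guessing.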

The full VRN specification, currently at version 1.0, defines roughly seventy-five symbols organized into eight categories: onsets, pitch and register, resonance, dynamics, vibrato, articulation, breath, and transitions. The grammar is designed to be readable by humans, parseable by software, and writable by hand — a teacher can write a VRN annotation in a notebook during a lesson and a student can read it later without needing a special viewer.
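One way software might model the eight categories is sketched below in Python. The category names come from the paragraph above; the tag-to-category mapping covers only the four tags that appear in the example string and is an assumption, as are the names Category, TAG_CATEGORY, and categorize. The actual symbol tables are in the specification.

```python
from enum import Enum


class Category(Enum):
    """The eight VRN categories named in the text."""
    ONSET = "onsets"
    PITCH_REGISTER = "pitch and register"
    RESONANCE = "resonance"
    DYNAMICS = "dynamics"
    VIBRATO = "vibrato"
    ARTICULATION = "articulation"
    BREATH = "breath"
    TRANSITION = "transitions"


# Assumption: only the four tags seen in the example string are mapped here.
# Tags for the remaining categories are left out rather than guessed.
TAG_CATEGORY = {
    "ONSET": Category.ONSET,
    "PITCH": Category.PITCH_REGISTER,
    "RES": Category.RESONANCE,
    "DYN": Category.DYNAMICS,
}


def categorize(tag: str) -> Category:
    """Return the category for a known tag, or raise ValueError."""
    try:
        return TAG_CATEGORY[tag]
    except KeyError:
        raise ValueError(f"unknown tag: {tag!r}") from None


print(categorize("RES").value)
```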

Why a text format

An obvious objection to VRN is that vocal production is a continuous, real-time phenomenon and a discrete text notation will inevitably oversimplify. This is true, and it is also true of every other notation that has ever been useful. Music notation oversimplifies pitch and timing. The IPA oversimplifies articulation. Movement notation oversimplifies movement. The question is not whether a notation captures everything — it cannot — but whether it captures enough to be useful.

For VRN, "enough" means: enough that a vocal teacher can write down what they want a student to do and have the student read the same instructions in a later session and understand them; enough that two teachers using the same notation can agree about whether a student's production matches a target; enough that software can store, search, and operate on vocal training programs in a way that respects the production technique rather than just the notes.
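To make "store, search, and operate on" concrete, here is a small Python sketch that searches a library of exercises by production technique instead of by notes. The exercise names, the find_exercises helper, and the argument values other than those in the earlier example string are hypothetical; the sketch assumes each tag appears at most once per string.

```python
import re

# Assumed element shape: TAG[arg,arg], as in the example string.
ELEMENT = re.compile(r"([A-Z]+)\[([^\[\]]*)\]")


def elements(vrn: str) -> dict:
    """Map each tag in a VRN string to its list of arguments."""
    return {
        tag: [a.strip() for a in args.split(",")]
        for tag, args in ELEMENT.findall(vrn)
    }


def find_exercises(library: dict, tag: str, value: str) -> list:
    """Return names of exercises whose `tag` element includes `value`."""
    return [
        name
        for name, vrn in library.items()
        if value in elements(vrn).get(tag, [])
    ]


# Hypothetical exercise library; values such as "balanced", "head", and
# "back" are taken from the prose vocabulary, not from the specification.
library = {
    "warmup-1": "ONSET[soft,H*] -> PITCH[A4,chest] -> RES[forward] -> DYN[mp,full]",
    "warmup-2": "ONSET[balanced] -> PITCH[E5,head] -> RES[back] -> DYN[mp,full]",
    "warmup-3": "ONSET[hard] -> PITCH[C4,chest] -> RES[forward] -> DYN[mp,full]",
}

print(find_exercises(library, "RES", "forward"))  # ['warmup-1', 'warmup-3']
```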

A binary or graphical format would arguably capture more nuance, but at the cost of requiring specialized tools to write and read. Plain text is the lowest common denominator. It can be written by hand in a notebook, typed into any text editor, version-controlled with the same tools used for source code, and parsed by any programming language without specialized libraries. Choosing plain text means choosing accessibility over maximal expressiveness.
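The version-control point can be illustrated with nothing but the Python standard library. The sketch below diffs two revisions of an annotation element by element using difflib; the diff_vrn helper, the file labels, and the use of "mixed" as a register value are assumptions made for illustration.

```python
import difflib


def split_elements(vrn: str) -> list:
    """Break a VRN string into one element per line for diffing."""
    return [part.strip() for part in vrn.split("->")]


def diff_vrn(old: str, new: str) -> list:
    """Return a unified diff of two VRN strings, one element per line."""
    return list(
        difflib.unified_diff(
            split_elements(old),
            split_elements(new),
            fromfile="lesson-1",  # hypothetical labels
            tofile="lesson-2",
            lineterm="",
        )
    )


before = "ONSET[soft,H*] -> PITCH[A4,chest] -> RES[forward] -> DYN[mp,full]"
# Assumption: "mixed" is a valid register value in the symbol set.
after = "ONSET[soft,H*] -> PITCH[A4,mixed] -> RES[forward] -> DYN[mp,full]"

print("\n".join(diff_vrn(before, after)))
```

The same kind of change would show up in any ordinary source-control diff, which is the practical payoff of keeping the notation in plain text.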

The HMN Umbrella

VRN is not a standalone project. It is one of several notation systems within the AIUNITES Human Movement Notation (HMN) family. The other major member of the family is MNN — Muscular Neuro Notation — which addresses the same kind of problem for body movement. The umbrella spec defines shared syntactic primitives so that a system that can parse one HMN-family notation has most of the work done for parsing the others.

The unification matters because the underlying problems are structurally similar. The voice is, in physiological terms, a particular kind of muscular activity. The diaphragm, intercostals, abdominal wall, larynx, soft palate, and articulators are all muscles being controlled by the singer in coordinated patterns. A notation that captures coordinated muscular activity for the limbs and trunk has substantial overlap with one that captures coordinated muscular activity for vocal production. Designing the two notations under a shared umbrella makes that overlap explicit and reuses the design work.

It also produces a tooling benefit. An editor that already supports one HMN-family notation can add VRN with relatively little additional work. The UMN Studio demonstrates this directly: a single editor surface that handles both movement and vocal notation, sharing parser infrastructure between them.

What VRN Enables

The point of formalizing a notation is to make new things possible. For VRN, several categories of application are within reach as the notation matures.

Vocal exercises that move with the student

A student who studies with one teacher in one city, then moves and studies with a different teacher in a different city, currently has no good way to bring their vocal work with them. The new teacher has to start from listening, imitation, and trial and error to figure out what the student has been working on. With VRN, the previous teacher's exercise library, written in the standard notation, transfers directly. The new teacher reads what the student has been practicing, sees the resonance and onset goals, and can build on the existing work rather than reconstructing it.

Cross-tradition coaching

Different vocal traditions emphasize different production techniques, and the vocabulary used to describe them often differs even when the underlying technique is similar. A formalized notation gives the traditions a shared substrate. A classical voice teacher and a contemporary commercial music coach can compare notes about a student in a way that is not possible when the notation each uses is purely verbal and informal.

Voice agents and TTS systems

Modern text-to-speech systems and voice synthesis tools are getting genuinely good, but their training data is, almost without exception, raw audio paired with text. The systems do not have access to the resonance, onset, and register annotations that a human voice teacher would naturally add. A corpus of audio paired with VRN annotations would be a different kind of training resource — one that lets a system learn not just how a particular voice sounds, but how that voice is being produced.
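What such a corpus entry might look like is sketched below in Python, as one JSON object per line. The AnnotatedUtterance record, its field names, and the file path are hypothetical; no existing dataset or VRN tooling is implied.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnnotatedUtterance:
    """One corpus entry: audio, its text, and a VRN annotation (hypothetical)."""
    audio_path: str
    text: str
    vrn: str


def to_jsonl(records: list) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


def from_jsonl(blob: str) -> list:
    """Parse JSON Lines back into AnnotatedUtterance records."""
    return [
        AnnotatedUtterance(**json.loads(line))
        for line in blob.splitlines()
        if line.strip()
    ]


corpus = [
    AnnotatedUtterance(
        audio_path="clips/0001.wav",  # placeholder path
        text="ah",
        vrn="ONSET[soft,H*] -> PITCH[A4,chest] -> RES[forward] -> DYN[mp,full]",
    )
]

blob = to_jsonl(corpus)
print(blob)
```

Keeping the annotation as a plain string alongside the transcript means a training pipeline could ignore it, parse it, or filter on it without changing the record format.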

Speech therapy goals that follow patients

Speech-language pathology has many of the same continuity-of-care problems that vocal pedagogy has. A patient who is working on resonance shifts as part of voice feminization training, or who is learning to manage breath support after a partial laryngectomy, has goals that are partly captured in a clinical chart but are mostly stored in the practitioner's head. A formal notation gives those goals a written form that survives the patient's relocation, change of provider, or insurance-driven interruptions in care.

Where VRN Is Today

The VRN version 1.0 specification is published and freely available. The AIUNITES VoiceStry application uses VRN as its native format for vocal exercise programs and is the primary proving ground for the notation in real use. The specification is open; the parser is open-source; the symbols can be used by anyone, in any application, without licensing or attribution requirements.

The notation will continue to evolve as it accumulates real use. The current version is sufficient for the scope of vocal pedagogy that the classical and contemporary commercial traditions need. The areas most likely to require expansion lie outside those two traditions: throat singing, yodeling, qawwali, and the variety of vocal techniques in folk traditions worldwide, which use production techniques that the current symbol set may not yet capture with adequate fidelity.

For now, the goal is the same as it was for MNN, and the same as it has been for every successful open standard: provide a format that is genuinely useful for the case that motivated it, keep the specification freely available, and let the field of users develop the notation further as they encounter the limits of the current version.

The voice has always been notatable in the same sense that movement has always been notatable — the physical phenomena are real and reasonably well understood. What has been missing is the convention. VRN is one attempt at providing that convention.

Read the VRN Specification

The full Vocal Resonance Notation specification — symbol tables, grammar, and worked examples across vocal traditions — is freely available alongside the broader HMN umbrella spec.
