Creates a new VisemeElement instance.
The type of viseme animation data to generate.
Common values include:
- redlips_front: Front-facing lip animation data for standard avatars
- redlips_back: Back-facing lip animation data for alternative views
The type determines the format and characteristics of the viseme events
that will be generated during speech synthesis. Different types may
provide different levels of detail or be optimized for specific
animation systems or avatar types.
// Standard front-facing viseme for avatars
const frontViseme = new VisemeElement('redlips_front');
// Back-facing viseme for special camera angles
const backViseme = new VisemeElement('redlips_back');
// For use with animation systems
const animationViseme = new VisemeElement('redlips_front');
// The generated viseme events can be used with:
// - Unity 3D avatars
// - Unreal Engine characters
// - Web-based 3D animations (Three.js, Babylon.js)
// - Ready Player Me avatars [[1]](https://github.com/met4citizen/TalkingHead)
escapeXml
Protected
Escapes special XML characters in text content to ensure valid XML output.
This method replaces XML special characters with their corresponding entity references to prevent XML parsing errors and potential security issues (XML injection). It should be used whenever inserting user-provided or dynamic text content into XML elements.
The following characters are escaped:
- & becomes &amp; (must be escaped first to avoid double-escaping)
- < becomes &lt; (prevents opening of unintended tags)
- > becomes &gt; (prevents closing of unintended tags)
- " becomes &quot; (prevents breaking out of attribute values)
- ' becomes &apos; (prevents breaking out of attribute values)

This method is marked as protected, so it is only accessible to classes that extend SSMLElement, ensuring proper encapsulation while allowing all element implementations to use this essential functionality.
The text content to escape
The text with all special XML characters properly escaped
// In a render method implementation
class TextElement extends SSMLElement {
  private text: string = 'Hello & "world" <script>';

  render(): string {
    // Escapes to: Hello &amp; &quot;world&quot; &lt;script&gt;
    return `<text>${this.escapeXml(this.text)}</text>`;
  }
}

// Edge cases handled correctly
this.escapeXml('5 < 10 & 10 > 5');
// Returns: '5 &lt; 10 &amp; 10 &gt; 5'
this.escapeXml('She said "Hello"');
// Returns: 'She said &quot;Hello&quot;'
this.escapeXml("It's a test");
// Returns: 'It&apos;s a test'

// Prevents XML injection
this.escapeXml('</voice><voice name="evil">');
// Returns: '&lt;/voice&gt;&lt;voice name=&quot;evil&quot;&gt;'
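The escaping behavior shown above can be sketched as a standalone function. This is an illustrative version only; the actual implementation is a protected method on SSMLElement and may differ internally:

```typescript
// Illustrative sketch of the documented escaping logic; not the actual
// SSMLElement implementation.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')   // ampersand first, to avoid double-escaping
    .replace(/</g, '&lt;')    // prevents opening unintended tags
    .replace(/>/g, '&gt;')    // prevents closing unintended tags
    .replace(/"/g, '&quot;')  // protects double-quoted attribute values
    .replace(/'/g, '&apos;'); // protects single-quoted attribute values
}
```

Performing the ampersand replacement first is essential: if `<` were replaced before `&`, the `&` inside the freshly inserted `&lt;` would itself be escaped, producing `&amp;lt;`.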
Renders the viseme element as an SSML XML string.
Generates the Azure-specific <mstts:viseme>
element with the type attribute
specifying what kind of viseme data should be generated. This is a self-closing
element that doesn't contain any content. When processed by the speech synthesizer,
it enables the generation of viseme events that can be captured through the
Speech SDK for driving facial animations.
The XML string representation of the viseme element in the format:
<mstts:viseme type="type"/>
// Standard front-facing viseme
const front = new VisemeElement('redlips_front');
console.log(front.render());
// Output: <mstts:viseme type="redlips_front"/>
// Back-facing viseme
const back = new VisemeElement('redlips_back');
console.log(back.render());
// Output: <mstts:viseme type="redlips_back"/>
// Custom viseme type (if supported by service)
const custom = new VisemeElement('custom_avatar_type');
console.log(custom.render());
// Output: <mstts:viseme type="custom_avatar_type"/>
SSML element for generating viseme events for lip-sync animation (Azure-specific).
The <mstts:viseme> element enables the generation of viseme (visual phoneme) events during speech synthesis. Visemes represent the visual positions of the mouth, lips, and face that correspond to spoken phonemes. This Azure Speech Service-specific feature is essential for creating realistic lip-sync animations for avatars, animated characters, or virtual assistants.

When this element is included, the speech synthesizer generates time-aligned viseme data that can be used by 3D rendering engines or animation systems to synchronize facial movements with the audio output. Each viseme event includes timing information and blend shape data that defines how the face should be positioned at that moment in the speech.
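Putting the documented behavior together, a minimal sketch of how VisemeElement might be structured. The base class shape and member names here are assumptions drawn from this documentation, not the actual source:

```typescript
// Hypothetical sketch based on the documented API; the real SSMLElement
// base class has additional members (e.g. the protected escapeXml method).
abstract class SSMLElement {
  abstract render(): string;
}

class VisemeElement extends SSMLElement {
  // type selects the viseme data format, e.g. 'redlips_front' or 'redlips_back'
  constructor(private readonly type: string) {
    super();
  }

  // Renders the self-closing Azure-specific element; it carries no content,
  // only the type attribute.
  render(): string {
    return `<mstts:viseme type="${this.type}"/>`;
  }
}
```

A usage example, matching the render output documented above:

```typescript
const front = new VisemeElement('redlips_front');
console.log(front.render()); // <mstts:viseme type="redlips_front"/>
```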