Creates a new VoiceBuilder instance.
Voice configuration options
Configuration options for voice elements.
Defines the voice to use for speech synthesis and optional effects that can be applied to modify the voice output.
Optional
effect?: string
Optional audio effect to apply to the voice.
Modifies the voice output to simulate different audio environments or transmission methods.
Available effects:
- eq_car: Optimized for car speakers
- eq_telecomhp8k: Telephone quality (8kHz sampling)
- eq_telecomhp3k: Lower-quality telephone (3kHz sampling)
name: string
Voice identifier for text-to-speech synthesis. (Required)
Must be a valid voice name supported by the speech service. Format is typically language-REGION-NameNeural.
Common voices include en-US-JennyNeural and en-US-AndrewNeural.
Optional
parent: SSMLBuilder
Optional reference to the parent SSMLBuilder for chaining.
Inserts an audio file into the speech output. Supports fallback text if audio is unavailable.
URL of the audio file (must be publicly accessible HTTPS URL)
Optional
fallbackText: string
Optional text to speak if the audio fails to load.
This VoiceBuilder instance for method chaining
// With fallback text
voice.audio(
'https://example.com/sound.mp3',
'Sound effect here'
);
// Without fallback
voice.audio('https://example.com/music.mp3');
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-recorded-audio Audio Element Documentation
Controls the total audio duration (Azure Speech Service specific). Can speed up or slow down speech to fit a specific duration.
Target duration (e.g., '10s', '5000ms')
This VoiceBuilder instance for method chaining
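As an illustration, the duration is typically expressed through Azure's mstts:audioduration element; the sketch below is an assumption about the rendered markup, not this library's verified output:

```typescript
// Sketch only: one plausible way a builder could render the duration.
// The <mstts:audioduration> element is Azure-specific; the actual output
// of this library may differ.
function renderAudioDuration(value: string): string {
  return `<mstts:audioduration value="${value}"/>`;
}

const tag = renderAudioDuration('10s');
// '<mstts:audioduration value="10s"/>'
```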
Adds a bookmark marker in the SSML for event tracking. Does not produce any speech output.
Unique identifier for the bookmark
This VoiceBuilder instance for method chaining
voice
.text('Introduction')
.bookmark('intro_end')
.text('Main content')
.bookmark('main_start');
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#bookmark-element Bookmark Element Documentation
Adds a pause or break in speech. Can specify either duration or strength of the pause.
Optional
options: string | BreakOptions
Break configuration object or a duration string.
Configuration options for break/pause elements.
Defines pauses in speech either by strength (semantic) or explicit duration. If both are specified, time takes precedence.
Optional
strength?: BreakStrength
Semantic strength of the pause. Each strength corresponds to a typical pause duration:
- x-weak: 250ms (very short)
- weak: 500ms (short, like a comma)
- medium: 750ms (default, like a period)
- strong: 1000ms (long, like a paragraph break)
- x-strong: 1250ms (very long, for emphasis)
Ignored if time is specified.
Optional
time?: string
Explicit duration of the pause, specified in milliseconds (ms) or seconds (s). Valid range: 0-20000ms (20 seconds max); values above 20000ms are capped at 20000ms.
Takes precedence over strength if both are specified.
This VoiceBuilder instance for method chaining
// Using duration string
voice.break('500ms');
voice.break('2s');
// Using strength
voice.break({ strength: 'medium' });
// Using explicit time (overrides strength)
voice.break({ time: '750ms' });
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-a-break Break Element Documentation
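The 20-second cap described above can be sketched as a small helper (an illustration under the documented rule, not the library's actual code):

```typescript
// Sketch: clamp a break duration to the documented 20-second maximum.
// Accepts values with 'ms' or 's' suffixes; unrecognized strings pass through.
function clampBreakTime(time: string): string {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(time);
  if (!match) return time; // leave unrecognized values untouched
  const ms = match[2] === 's' ? parseFloat(match[1]) * 1000 : parseFloat(match[1]);
  return ms > 20000 ? '20000ms' : time;
}

clampBreakTime('500ms'); // '500ms' (within range, unchanged)
clampBreakTime('25s');   // '20000ms' (capped at the maximum)
```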
Builds the complete SSML document and returns it as a string. Delegates to the parent SSMLBuilder's build method.
The complete SSML document as an XML string
Adds emphasized speech with adjustable intensity. Changes the speaking style to emphasize certain words or phrases.
Text to emphasize
Optional
level: EmphasisLevel
Emphasis level: 'strong' | 'moderate' | 'reduced'. Default is 'moderate'.
This VoiceBuilder instance for method chaining
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#adjust-emphasis Emphasis Element Documentation
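For reference, here is a sketch of the standard SSML markup an emphasis call corresponds to (the renderEmphasis helper is hypothetical, not part of this library):

```typescript
// Sketch of the standard SSML <emphasis> element; level defaults to
// 'moderate' as documented above.
type EmphasisLevel = 'strong' | 'moderate' | 'reduced';

function renderEmphasis(text: string, level: EmphasisLevel = 'moderate'): string {
  return `<emphasis level="${level}">${text}</emphasis>`;
}

const markup = renderEmphasis('really important', 'strong');
// '<emphasis level="strong">really important</emphasis>'
```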
Protected
escapeXml
Escapes special XML characters in text content to ensure valid XML output.
This method replaces XML special characters with their corresponding entity references to prevent XML parsing errors and potential security issues (XML injection). It should be used whenever inserting user-provided or dynamic text content into XML elements.
The following characters are escaped:
- & becomes &amp; (must be escaped first to avoid double-escaping)
- < becomes &lt; (prevents opening of unintended tags)
- > becomes &gt; (prevents closing of unintended tags)
- " becomes &quot; (prevents breaking out of attribute values)
- ' becomes &apos; (prevents breaking out of attribute values)
This method is marked as protected so it is only accessible to classes that extend SSMLElement, ensuring proper encapsulation while allowing all element implementations to use this essential functionality.
The text content to escape
The text with all special XML characters properly escaped
// In a render method implementation
class TextElement extends SSMLElement {
private text: string = 'Hello & "world" <script>';
render(): string {
// Escapes to: Hello &amp; &quot;world&quot; &lt;script&gt;
return `<text>${this.escapeXml(this.text)}</text>`;
}
}
// Edge cases handled correctly
this.escapeXml('5 < 10 & 10 > 5');
// Returns: '5 &lt; 10 &amp; 10 &gt; 5'
this.escapeXml('She said "Hello"');
// Returns: 'She said &quot;Hello&quot;'
this.escapeXml("It's a test");
// Returns: 'It&apos;s a test'
// Prevents XML injection
this.escapeXml('</voice><voice name="evil">');
// Returns: '&lt;/voice&gt;&lt;voice name=&quot;evil&quot;&gt;'
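A minimal sketch of an implementation consistent with the rules above (not necessarily the library's exact code):

```typescript
// Sketch of an escapeXml implementation. Note the order: '&' is replaced
// first so that entities produced by the later replacements are not
// double-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

escapeXml('5 < 10 & 10 > 5');
// '5 &lt; 10 &amp; 10 &gt; 5'
```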
Expresses emotion or speaking style (Azure Speech Service specific). Only available for certain neural voices.
Text to express with style
Expression configuration
Configuration options for express-as elements (Azure-specific).
Controls emotional expression and speaking styles for neural voices that support these features. Allows for nuanced emotional delivery and role-playing scenarios.
Optional
role?: ExpressAsRole
Age and gender role for voice modification.
Simulates different speaker characteristics. Only supported by certain voices.
style
Emotional or speaking style to apply. (Required)
The available styles depend on the voice being used. Common categories include emotions (cheerful, sad, angry), professional styles (newscast, customerservice), and special effects (whispering, shouting).
Optional
styledegree?: string
Intensity of the style expression. Controls how strongly the style is applied. Range: "0.01" (minimal) to "2" (double intensity).
This VoiceBuilder instance for method chaining
// Express with emotion
voice.expressAs('I am so happy to see you!', {
style: 'cheerful',
styledegree: '2'
});
// Express with role
voice.expressAs('Once upon a time...', {
style: 'narration-professional',
role: 'OlderAdultMale'
});
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#express-as-element Express-As Element Documentation
Changes language for a portion of text. Useful for multilingual content.
Language code (e.g., 'es-ES', 'fr-FR', 'de-DE')
Function to build content in the specified language
This VoiceBuilder instance for method chaining
voice
.text('Hello! ')
.lang('es-ES', lang => lang
.text('¡Hola! ')
)
.lang('fr-FR', lang => lang
.text('Bonjour!')
);
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#use-multiple-languages Lang Element Documentation
References an external pronunciation lexicon file. Allows custom pronunciations to be defined externally.
URL of the lexicon file (must be publicly accessible HTTPS URL)
This VoiceBuilder instance for method chaining
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#pronunciation-lexicon Lexicon Element Documentation
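For reference, a lexicon reference renders as the standard SSML lexicon element with a uri attribute; the helper below is a hypothetical illustration, not this library's code:

```typescript
// Sketch of the standard SSML <lexicon> reference element.
function renderLexicon(uri: string): string {
  return `<lexicon uri="${uri}"/>`;
}

const lexicon = renderLexicon('https://example.com/lexicon.xml');
// '<lexicon uri="https://example.com/lexicon.xml"/>'
```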
Embeds MathML content for mathematical expressions. The math content will be spoken as mathematical notation.
MathML markup string
This VoiceBuilder instance for method chaining
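A usage sketch, assuming the method is named math as described above. The MathML string is passed through as-is, so it must already be valid MathML (the namespace shown is the standard MathML namespace):

```typescript
// Illustrative MathML for the fraction "one half".
const mathml =
  '<math xmlns="http://www.w3.org/1998/Math/MathML">' +
  '<mfrac><mn>1</mn><mn>2</mn></mfrac>' +
  '</math>';

// voice.math(mathml);  // hypothetical usage of the method documented above
```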
Adds a paragraph with structured content. Paragraphs help organize speech into logical blocks.
Function to build paragraph content
This VoiceBuilder instance for method chaining
voice.paragraph(p => p
.text('This is the first sentence. ')
.text('This is the second sentence.')
.sayAs('2025', { interpretAs: 'date' })
);
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#specify-paragraphs-and-sentences Paragraph Element Documentation
Specifies exact phonetic pronunciation using phonetic alphabets. Provides precise control over pronunciation.
Text to pronounce
Phoneme configuration
Configuration options for phoneme elements.
Provides exact phonetic pronunciation using standard phonetic alphabets. Essential for proper names, technical terms, or words with ambiguous pronunciation.
alphabet
Phonetic alphabet used for transcription. (Required)
Available alphabets:
- ipa: International Phonetic Alphabet (universal standard)
- sapi: Microsoft SAPI phonemes (English-focused)
- ups: Universal Phone Set (Microsoft's unified system)
ph
Phonetic transcription of the word. (Required)
The exact phonetic representation in the specified alphabet. Must be valid according to the chosen alphabet's rules.
This VoiceBuilder instance for method chaining
// IPA pronunciation
voice.phoneme('tomato', {
alphabet: 'ipa',
ph: 'təˈmeɪtoʊ'
});
// SAPI pronunciation
voice.phoneme('read', {
alphabet: 'sapi',
ph: 'r eh d'
});
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#use-phonemes-to-improve-pronunciation Phoneme Element Documentation
Modifies prosody (pitch, rate, volume, contour, range) of speech. Allows fine-grained control over how text is spoken.
Text to modify
Prosody configuration options
Configuration options for prosody (speech characteristics).
Controls various aspects of speech delivery including pitch, speaking rate, volume, and intonation contours. Multiple properties can be combined for complex speech modifications.
Optional
contour?: string
Pitch contour changes over time. Defines how pitch changes during speech using time-position pairs. Format: "(time1,pitch1) (time2,pitch2) ..." with time as a percentage and pitch as Hz or a percentage change.
Optional
pitch?: string
Pitch adjustment for the speech. Can be specified as a named value ('x-low', 'low', 'medium', 'high', 'x-high'), an absolute frequency (e.g., '200Hz'), or a relative change (e.g., '+5%', '-10%').
Optional
range?: string
Pitch range variation. Controls the variability of pitch (monotone vs. expressive). Can be a relative change or a named value.
Optional
rate?: string
Speaking rate/speed. Can be specified as a named value ('x-slow', 'slow', 'medium', 'fast', 'x-fast'), a relative multiplier (e.g., '0.8'), or a percentage change (e.g., '+20%').
Optional
volume?: string
Volume level of the speech. Can be specified as a named value ('silent', 'x-soft', 'soft', 'medium', 'loud', 'x-loud') or a relative change in decibels (e.g., '+10dB').
This VoiceBuilder instance for method chaining
// Slow and quiet speech
voice.prosody('Speaking slowly and quietly', {
rate: 'slow',
volume: 'soft',
pitch: 'low'
});
// Precise numeric values
voice.prosody('Precise control', {
rate: '0.8',
pitch: '+5%',
volume: '+10dB'
});
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#adjust-prosody Prosody Element Documentation
Controls how text is interpreted and pronounced. Useful for dates, numbers, currency, abbreviations, etc.
Text to interpret
Say-as configuration
Configuration options for say-as elements.
Controls interpretation and pronunciation of formatted text like dates, numbers, currency, and other specialized content.
Optional
detail?: string
Additional detail for interpretation. Provides extra context for certain interpretAs types, e.g., a currency code such as 'USD' when interpretAs is 'currency'.
Optional
format?: string
Format hint for interpretation. Available formats depend on the interpretAs value. For dates: digit orderings such as 'mdy', 'dmy', 'ymd'. For time: 'hms12' or 'hms24'.
interpretAs
How to interpret the text content. (Required)
Determines the pronunciation rules applied to the text. Each type has specific formatting requirements.
This VoiceBuilder instance for method chaining
// Date interpretation
voice.sayAs('2025-08-24', {
interpretAs: 'date',
format: 'ymd'
});
// Currency
voice.sayAs('42.50', {
interpretAs: 'currency',
detail: 'USD'
});
// Spell out
voice.sayAs('SSML', { interpretAs: 'spell-out' });
// Phone number
voice.sayAs('1234567890', { interpretAs: 'telephone' });
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#say-as-element Say-As Element Documentation
Adds a sentence with structured content. Sentences help define clear boundaries for intonation and pauses.
Function to build sentence content
This VoiceBuilder instance for method chaining
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#specify-paragraphs-and-sentences Sentence Element Documentation
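As an illustration, sentence content renders as the standard SSML s element, typically nested inside a p paragraph; the helper below is a hypothetical sketch:

```typescript
// Sketch of the standard SSML sentence element.
function renderSentence(text: string): string {
  return `<s>${text}</s>`;
}

const sentence = renderSentence('This sentence gets its own intonation boundary.');
// '<s>This sentence gets its own intonation boundary.</s>'
```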
Adds silence at specific positions in the speech. More precise than break element for controlling silence placement.
Silence configuration
Configuration options for silence elements.
Provides precise control over silence placement in speech output, with options for various positions and boundary types.
type
Position and type of silence to add. (Required)
Determines where silence is inserted, e.g., 'Leading', 'Tailing', 'Sentenceboundary', their '-exact' variants, and punctuation-based types such as 'Comma-exact'.
value
Duration of the silence. (Required)
Specified in milliseconds (ms) or seconds (s). Valid range: 0-20000ms (20 seconds max).
For non-exact types, this is added to natural silence. For exact types, this replaces natural silence.
This VoiceBuilder instance for method chaining
// Add silence between sentences
voice.silence({ type: 'Sentenceboundary', value: '500ms' });
// Add leading silence
voice.silence({ type: 'Leading', value: '200ms' });
// Add exact silence at comma
voice.silence({ type: 'Comma-exact', value: '150ms' });
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-silence Silence Element Documentation
Substitutes text with an alias for pronunciation. Useful for acronyms or text that should be pronounced differently.
Original text to display
How the text should be pronounced
This VoiceBuilder instance for method chaining
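For reference, substitution uses the standard SSML sub element: the alias is spoken while the original text is preserved in the markup. The helper below is a hypothetical sketch, not this library's code:

```typescript
// Sketch of the standard SSML <sub> substitution element.
function renderSub(text: string, alias: string): string {
  return `<sub alias="${alias}">${text}</sub>`;
}

const sub = renderSub('W3C', 'World Wide Web Consortium');
// '<sub alias="World Wide Web Consortium">W3C</sub>'
```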
Adds plain text to be spoken by the voice. Special characters (&, <, >, ", ') are automatically escaped.
The text to be spoken
This VoiceBuilder instance for method chaining
Uses a custom speaker profile for voice synthesis (Azure Speech Service specific). Requires a pre-trained speaker profile.
ID of the speaker profile
Text to speak with the custom voice
This VoiceBuilder instance for method chaining
Adds viseme information for lip-sync animations (Azure Speech Service specific). Used for avatar or character animation synchronization.
Viseme type (e.g., 'redlips_front', 'redlips_back')
This VoiceBuilder instance for method chaining
Switches to a different voice while maintaining the fluent API chain. Allows multiple voices in the same SSML document.
Name of the new voice (e.g., 'en-US-AndrewNeural')
Optional
effect: string
Optional voice effect for the new voice.
A new VoiceBuilder instance for the specified voice
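As an illustration of what voice switching enables, each voice opens its own standard SSML voice element in the same document. The helper and the en-US-JennyNeural voice name below are assumptions for the sketch; en-US-AndrewNeural is the example used above:

```typescript
// Sketch of the standard SSML <voice> element produced per voice.
function renderVoice(name: string, content: string): string {
  return `<voice name="${name}">${content}</voice>`;
}

const twoVoices =
  renderVoice('en-US-JennyNeural', 'Hello!') +
  renderVoice('en-US-AndrewNeural', 'Hi there!');
// '<voice name="en-US-JennyNeural">Hello!</voice><voice name="en-US-AndrewNeural">Hi there!</voice>'
```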
Builder class for creating voice-specific content within an SSML document. Provides a fluent API for adding text, pauses, emphasis, prosody, and other speech synthesis features.
This class encapsulates all the content that will be spoken by a specific voice, including text, audio, and various speech modification elements.
See
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#voice-element Azure Voice Element Documentation