Inserts an audio file within the paragraph. Supports fallback text if the audio file is unavailable.
Audio elements can be used to include pre-recorded sounds, music, or other audio content within the synthesized speech.
URL of the audio file (must be a publicly accessible HTTPS URL).
Optional
fallbackText: string
Optional text to speak if the audio fails to load.
This ParagraphBuilder instance for method chaining
paragraph
.text('And now, a word from our sponsor')
.break('500ms')
.audio(
'https://example.com/jingle.mp3',
'Sponsor message here'
)
.text('Back to our content.');
See Audio Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-recorded-audio
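As a rough illustration of what audio(url, fallbackText) might emit, the sketch below renders an SSML audio element whose content is the fallback text. renderAudio and the local escapeXml are hypothetical helpers, and rejecting non-HTTPS URLs up front is an assumption based on the requirement stated above, not confirmed library behavior.

```typescript
// Sketch only: renderAudio is a hypothetical helper, not part of the documented API.
const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

// Single-pass escape, so an '&' produced by a replacement is never escaped again.
function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

function renderAudio(src: string, fallbackText?: string): string {
  // Assumption: enforce the documented HTTPS requirement before rendering.
  if (!src.startsWith('https://')) {
    throw new Error(`Audio source must be an HTTPS URL: ${src}`);
  }
  const open = `<audio src="${escapeXml(src)}"`;
  // Without fallback text, emit an empty (self-closing) element.
  return fallbackText === undefined
    ? `${open}/>`
    : `${open}>${escapeXml(fallbackText)}</audio>`;
}

console.log(renderAudio('https://example.com/jingle.mp3', 'Sponsor message here'));
// <audio src="https://example.com/jingle.mp3">Sponsor message here</audio>
```

Escaping the src attribute matters for URLs with query strings: an unescaped & in a URL would make the surrounding SSML invalid XML.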
Adds a pause (break) within the paragraph. The pause can be specified either by duration or by strength.
Breaks are useful for adding natural pauses between phrases or ideas, improving the comprehension and naturalness of synthesized speech.
Optional
options: string | BreakOptions
Break configuration object, or a duration string such as '500ms'.
Configuration options for break/pause elements.
Defines pauses in speech either by semantic strength or by explicit duration. If both are specified, time takes precedence.
Optional
strength?: BreakStrength
Semantic strength of the pause.
Each strength corresponds to a typical pause duration:
x-weak: 250ms (very short)
weak: 500ms (short, like a comma)
medium: 750ms (default, like a period)
strong: 1000ms (long, like a paragraph break)
x-strong: 1250ms (very long, for emphasis)
Ignored if time is specified.
Optional
time?: string
Explicit duration of the pause, in milliseconds (ms) or seconds (s). Valid range: 0-20000ms (20 seconds maximum); values above 20000ms are capped at 20000ms.
Takes precedence over strength if both are specified.
This ParagraphBuilder instance for method chaining
// Using duration string
paragraph
.text('First point')
.break('1s')
.text('Second point');
// Using strength
paragraph
.text('Let me think')
.break({ strength: 'medium' })
.text('Yes, I remember now');
See Break Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-a-break
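The precedence rules for breaks (time wins over strength, durations are capped at 20000ms, medium is the default strength) can be sketched as a small function that computes the effective pause length. effectiveBreakMs and parseDurationMs are hypothetical names; the library may validate or round durations differently.

```typescript
// Hypothetical sketch: these types mirror the documented BreakOptions fields,
// but effectiveBreakMs is not part of the library's API.
type BreakStrength = 'x-weak' | 'weak' | 'medium' | 'strong' | 'x-strong';

interface BreakOptions {
  strength?: BreakStrength;
  time?: string;
}

// Typical durations listed for each strength.
const STRENGTH_MS: Record<BreakStrength, number> = {
  'x-weak': 250,
  weak: 500,
  medium: 750,
  strong: 1000,
  'x-strong': 1250,
};

const MAX_BREAK_MS = 20000;

// Parses strings such as '500ms' or '1.5s' into milliseconds.
function parseDurationMs(time: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(time.trim());
  if (match === null) {
    throw new Error(`Invalid break duration: ${time}`);
  }
  const value = Number(match[1]);
  return match[2] === 's' ? value * 1000 : value;
}

// time wins over strength and is capped at 20000ms.
// Assumption: with neither field set, fall back to 'medium' (listed as the default).
function effectiveBreakMs(options: string | BreakOptions = {}): number {
  const opts: BreakOptions = typeof options === 'string' ? { time: options } : options;
  if (opts.time !== undefined) {
    return Math.min(parseDurationMs(opts.time), MAX_BREAK_MS);
  }
  return STRENGTH_MS[opts.strength ?? 'medium'];
}

console.log(effectiveBreakMs('1s')); // 1000
console.log(effectiveBreakMs({ strength: 'medium' })); // 750
console.log(effectiveBreakMs({ strength: 'weak', time: '300ms' })); // 300
console.log(effectiveBreakMs('30s')); // 20000
```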
Adds emphasized speech with adjustable intensity to the paragraph. Changes the speaking style to emphasize certain words or phrases.
Text to emphasize
Optional
level: EmphasisLevel
Emphasis level: 'strong' | 'moderate' | 'reduced'. Default is 'moderate'.
This ParagraphBuilder instance for method chaining
paragraph
.text('This is ')
.emphasis('extremely important', 'strong')
.text(' for everyone to understand.')
.emphasis('Please pay attention', 'moderate');
See Emphasis Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#adjust-emphasis
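A minimal sketch of the markup that emphasis(text, level) could produce, assuming the level maps directly onto the SSML emphasis element's level attribute and that the documented 'moderate' default applies when no level is passed. renderEmphasis is a hypothetical helper.

```typescript
// Hypothetical renderer; the library's real output may differ in detail.
type EmphasisLevel = 'strong' | 'moderate' | 'reduced';

const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

// Defaults to 'moderate', matching the documented default level.
function renderEmphasis(text: string, level: EmphasisLevel = 'moderate'): string {
  return `<emphasis level="${level}">${escapeXml(text)}</emphasis>`;
}

console.log(renderEmphasis('extremely important', 'strong'));
// <emphasis level="strong">extremely important</emphasis>
console.log(renderEmphasis('Please pay attention'));
// <emphasis level="moderate">Please pay attention</emphasis>
```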
escapeXml (protected)
Escapes special XML characters in text content to ensure valid XML output.
This method replaces XML special characters with their corresponding entity references to prevent XML parsing errors and potential security issues (XML injection). It should be used whenever inserting user-provided or dynamic text content into XML elements.
The following characters are escaped:
& becomes &amp; (must be escaped first to avoid double-escaping)
< becomes &lt; (prevents opening of unintended tags)
> becomes &gt; (prevents closing of unintended tags)
" becomes &quot; (prevents breaking out of attribute values)
' becomes &apos; (prevents breaking out of attribute values)
This method is marked as protected, so it is only accessible to classes that extend SSMLElement. This keeps it encapsulated while still allowing every element implementation to use it.
The text content to escape
The text with all special XML characters properly escaped
// In a render method implementation
class TextElement extends SSMLElement {
private text: string = 'Hello & "world" <script>';
render(): string {
// Escapes to: Hello &amp; &quot;world&quot; &lt;script&gt;
return `<text>${this.escapeXml(this.text)}</text>`;
}
}
// Edge cases handled correctly
this.escapeXml('5 < 10 & 10 > 5');
// Returns: '5 &lt; 10 &amp; 10 &gt; 5'
this.escapeXml('She said "Hello"');
// Returns: 'She said &quot;Hello&quot;'
this.escapeXml("It's a test");
// Returns: 'It&apos;s a test'
// Prevents XML injection
this.escapeXml('</voice><voice name="evil">');
// Returns: '&lt;/voice&gt;&lt;voice name=&quot;evil&quot;&gt;'
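For readers who want to reuse the escaping rules outside an SSMLElement subclass, here is a standalone sketch of them as a plain function. It assumes &apos; is used for the apostrophe; the library's protected escapeXml method may choose a different but equivalent entity.

```typescript
// Standalone sketch of the documented escaping rules; the library's own
// escapeXml is a protected method on SSMLElement.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;'); // assumption: &apos; rather than &#39;
}

console.log(escapeXml('5 < 10 & 10 > 5')); // 5 &lt; 10 &amp; 10 &gt; 5
console.log(escapeXml("It's a test")); // It&apos;s a test
console.log(escapeXml('</voice><voice name="evil">'));
// &lt;/voice&gt;&lt;voice name=&quot;evil&quot;&gt;
```

The ordering is the important design choice: if & were replaced last, the &lt; produced for < would be turned into &amp;lt;, and the synthesizer would read the entity text aloud.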
Specifies exact phonetic pronunciation for text within the paragraph. Provides precise control over how specific words are pronounced.
This is particularly useful for proper names, technical terms, or words that the speech synthesizer might pronounce incorrectly by default.
Text to pronounce
Phoneme configuration
Configuration options for phoneme elements.
Provides exact phonetic pronunciation using standard phonetic alphabets. Essential for proper names, technical terms, or words with ambiguous pronunciation.
alphabet
Phonetic alphabet used for transcription. (Required)
Available alphabets:
ipa: International Phonetic Alphabet (universal standard)
sapi: Microsoft SAPI phonemes (English-focused)
ups: Universal Phone Set (Microsoft's unified system)
ph
Phonetic transcription of the word. (Required)
The exact phonetic representation in the specified alphabet. Must be valid according to the chosen alphabet's rules.
This ParagraphBuilder instance for method chaining
// Using IPA (International Phonetic Alphabet)
paragraph
.text('The word ')
.phoneme('schedule', {
alphabet: 'ipa',
ph: 'ˈʃɛdjuːl' // British pronunciation
})
.text(' has different pronunciations.');
// Using SAPI
paragraph.phoneme('Azure', {
alphabet: 'sapi',
ph: 'ae zh er'
});
See Phoneme Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#use-phonemes-to-improve-pronunciation
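A minimal sketch of the phoneme markup, assuming the alphabet and ph options map one-to-one onto the SSML phoneme element's attributes. renderPhoneme and the empty-ph check are hypothetical; the ph value is escaped because it is placed inside an attribute.

```typescript
// Hypothetical renderer; PhonemeOptions mirrors the documented alphabet/ph fields.
type PhonemeAlphabet = 'ipa' | 'sapi' | 'ups';

interface PhonemeOptions {
  alphabet: PhonemeAlphabet;
  ph: string;
}

const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

function renderPhoneme(text: string, options: PhonemeOptions): string {
  // Assumption: an empty transcription is treated as a caller error.
  if (options.ph.trim() === '') {
    throw new Error('ph must not be empty');
  }
  const attrs = `alphabet="${options.alphabet}" ph="${escapeXml(options.ph)}"`;
  return `<phoneme ${attrs}>${escapeXml(text)}</phoneme>`;
}

console.log(renderPhoneme('Azure', { alphabet: 'sapi', ph: 'ae zh er' }));
// <phoneme alphabet="sapi" ph="ae zh er">Azure</phoneme>
```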
Modifies prosody (pitch, rate, volume) of speech within the paragraph. Allows fine-grained control over how text is spoken.
Prosody modifications can make speech sound more natural and expressive, and can be used to convey emotion or emphasis.
Text to modify
Prosody configuration options
Configuration options for prosody (speech characteristics).
Controls various aspects of speech delivery including pitch, speaking rate, volume, and intonation contours. Multiple properties can be combined for complex speech modifications.
Optional
contour?: string
Pitch contour changes over time.
Defines how pitch changes during speech as a sequence of (time, pitch) pairs in the format "(time1,pitch1) (time2,pitch2) ...", where time is a percentage of the text's duration and pitch is given in Hz or as a percentage change.
Optional
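pitch?: string
Pitch adjustment for the speech.
Can be specified as:
an absolute value in Hz, such as 600Hz
a relative change, such as +80Hz, -2st, or +10%
a named value: x-low, low, medium, high, x-high, or default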
Optional
range?: string
Pitch range variation.
Controls the variability of pitch (monotone versus expressive). Can be a relative change or a named value.
Optional
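rate?: string
Speaking rate (speed).
Can be specified as:
a relative multiplier, such as 0.5 (half speed) or 2 (double speed)
a percentage change, such as +10% or -20%
a named value: x-slow, slow, medium, fast, x-fast, or default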
Optional
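volume?: string
Volume level of the speech.
Can be specified as:
an absolute level from 0.0 (quietest) to 100.0 (loudest), such as 75
a relative change, such as +10, -5.5, or +20%
a named value: silent, x-soft, soft, medium, loud, x-loud, or default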
This ParagraphBuilder instance for method chaining
// Whispered effect
paragraph.prosody('This is a secret', {
volume: 'x-soft',
rate: 'slow',
pitch: 'low'
});
// Excited speech
paragraph.prosody('Amazing news!', {
rate: 'fast',
pitch: 'high',
volume: 'loud'
});
See Prosody Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#adjust-prosody
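Writing contour strings by hand is error-prone, so a small helper that assembles the documented "(time,pitch) (time,pitch) ..." format from pairs can help. buildContour is a hypothetical name, and rejecting positions outside 0-100% is an assumption rather than documented library validation.

```typescript
// Hypothetical helper for building the documented contour format.
type ContourPoint = [position: number, pitch: string];

function buildContour(points: ContourPoint[]): string {
  return points
    .map(([position, pitch]) => {
      // Positions are percentages of the text's duration.
      if (position < 0 || position > 100) {
        throw new RangeError(`Contour position must be 0-100%, got ${position}`);
      }
      return `(${position}%,${pitch})`;
    })
    .join(' ');
}

const contour = buildContour([[0, '+20Hz'], [50, '-10%'], [100, '+10Hz']]);
console.log(contour); // (0%,+20Hz) (50%,-10%) (100%,+10Hz)
```

The resulting string would then be passed as the contour option, for example paragraph.prosody('Really?', { contour }).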
Controls how text is interpreted and pronounced within the paragraph. Useful for dates, numbers, currency, abbreviations, and other specialized text.
This method forwards to the underlying ParagraphElement's sayAs method, allowing proper pronunciation of special text formats.
Text to interpret
Say-as configuration options
Configuration options for say-as elements.
Controls interpretation and pronunciation of formatted text like dates, numbers, currency, and other specialized content.
Optional
detail?: string
Additional detail for interpretation.
Provides extra context for certain interpretAs types:
Optional
format?: string
Format hint for interpretation.
Provides additional formatting information. Available formats depend on the interpretAs value:
For dates: dmy, mdy, ymd, ydm, ym, my, md, dm, d, m, y
For time: hms12, hms24
interpretAs
How to interpret the text content. (Required)
Determines the pronunciation rules applied to the text. Each type has specific formatting requirements.
This ParagraphBuilder instance for method chaining
paragraph
.text('The meeting is on ')
.sayAs('2025-08-24', {
interpretAs: 'date',
format: 'ymd'
})
.text(' at ')
.sayAs('14:30', {
interpretAs: 'time',
format: 'hms24'
});
See Say-As Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#say-as-element
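A sketch of the say-as markup, assuming the camelCase interpretAs option is written out as SSML's interpret-as attribute and that format and detail are emitted only when provided. renderSayAs is a hypothetical helper, not the forwarding method described above.

```typescript
// Hypothetical renderer mapping SayAsOptions onto <say-as> attributes.
interface SayAsOptions {
  interpretAs: string;
  format?: string;
  detail?: string;
}

const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

function renderSayAs(text: string, options: SayAsOptions): string {
  // interpretAs becomes the SSML attribute interpret-as.
  const attrs = [`interpret-as="${escapeXml(options.interpretAs)}"`];
  // Optional attributes are omitted entirely when not set.
  if (options.format !== undefined) attrs.push(`format="${escapeXml(options.format)}"`);
  if (options.detail !== undefined) attrs.push(`detail="${escapeXml(options.detail)}"`);
  return `<say-as ${attrs.join(' ')}>${escapeXml(text)}</say-as>`;
}

console.log(renderSayAs('2025-08-24', { interpretAs: 'date', format: 'ymd' }));
// <say-as interpret-as="date" format="ymd">2025-08-24</say-as>
```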
Adds a sentence element to the paragraph with structured content. Sentences help define clear boundaries for intonation and natural pauses.
The <s> element explicitly marks sentence boundaries, which can improve the naturalness of speech synthesis by ensuring proper intonation patterns.
Function that receives a SentenceElement to build sentence content
This ParagraphBuilder instance for method chaining
// Simple sentence
paragraph.sentence(s => s
.text('This is a complete sentence.')
);
// Sentence with multiple elements
paragraph.sentence(s => s
.text('The price is ')
.sayAs('42.50', { interpretAs: 'currency', detail: 'USD' })
.text(' including tax.')
);
See Sentence Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#specify-paragraphs-and-sentences
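The callback style of sentence() can be illustrated with two deliberately tiny stand-in classes. SentenceSketch and ParagraphSketch are hypothetical and support only text(); the sketch assumes the builder creates a fresh sentence, lets the callback fill it, and appends the rendered <s> element to the paragraph.

```typescript
// Hypothetical stand-ins for SentenceElement and ParagraphBuilder, reduced to text() only.
const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

class SentenceSketch {
  private parts: string[] = [];

  text(value: string): this {
    this.parts.push(escapeXml(value));
    return this;
  }

  render(): string {
    return `<s>${this.parts.join('')}</s>`;
  }
}

class ParagraphSketch {
  private parts: string[] = [];

  // The callback fills a fresh sentence; its return value is ignored, so both
  // expression-bodied and block-bodied callbacks work.
  sentence(build: (s: SentenceSketch) => void): this {
    const s = new SentenceSketch();
    build(s);
    this.parts.push(s.render());
    return this;
  }

  render(): string {
    return `<p>${this.parts.join('')}</p>`;
  }
}

const p = new ParagraphSketch()
  .sentence((s) => s.text('First sentence.'))
  .sentence((s) => s.text('Second & final.'));
console.log(p.render());
// <p><s>First sentence.</s><s>Second &amp; final.</s></p>
```

Passing a callback rather than returning a sentence object keeps the outer chain on the paragraph, so calls after sentence() continue on the ParagraphBuilder.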
Substitutes text with an alias for pronunciation within the paragraph. Useful for acronyms, abbreviations, or text that should be pronounced differently than written.
The original text as written
How the text should be pronounced
This ParagraphBuilder instance for method chaining
paragraph
.text('The ')
.sub('WHO', 'World Health Organization')
.text(' released new guidelines.')
.sub('Dr.', 'Doctor')
.text(' Smith will present them.');
See Sub Element W3C Specification: https://www.w3.org/TR/speech-synthesis11/#S3.1.11
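A sketch of the sub markup: the alias is what gets spoken and the original text stays as the element content. renderSub is a hypothetical helper; escaping both values assumes the library treats the alias like any other attribute value.

```typescript
// Hypothetical renderer for the <sub> element; renderSub is not the library's API.
const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

function renderSub(text: string, alias: string): string {
  // alias is spoken in place of the written text.
  return `<sub alias="${escapeXml(alias)}">${escapeXml(text)}</sub>`;
}

console.log(renderSub('WHO', 'World Health Organization'));
// <sub alias="World Health Organization">WHO</sub>
console.log(renderSub('AT&T', 'A T and T'));
// <sub alias="A T and T">AT&amp;T</sub>
```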
Adds plain text content to the paragraph. Special characters (&, <, >, ", ') are automatically escaped.
The text to add to the paragraph
This ParagraphBuilder instance for method chaining
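The two properties described for text(), automatic escaping and returning the builder for chaining, can be shown with a trimmed-down fluent builder. MiniParagraphBuilder is hypothetical and supports only text() and a duration-only break(); it is not the library's ParagraphBuilder.

```typescript
// Hypothetical, trimmed-down fluent builder illustrating the chaining pattern.
const XML_ENTITIES: Record<string, string> = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;',
};

function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch] ?? ch);
}

class MiniParagraphBuilder {
  private parts: string[] = [];

  text(value: string): this {
    this.parts.push(escapeXml(value)); // special characters escaped automatically
    return this; // returning the builder enables chaining
  }

  break(time: string): this {
    this.parts.push(`<break time="${escapeXml(time)}"/>`);
    return this;
  }

  render(): string {
    return `<p>${this.parts.join('')}</p>`;
  }
}

const xml = new MiniParagraphBuilder()
  .text('Fish & chips')
  .break('500ms')
  .text('<served hot>')
  .render();
console.log(xml);
// <p>Fish &amp; chips<break time="500ms"/>&lt;served hot&gt;</p>
```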
Builder class for creating paragraph elements within an SSML document. Provides a fluent API for structuring content into logical paragraph blocks with sentences and speech modifications.
Paragraphs help organize speech content and provide natural pauses between text blocks. The <p> element is particularly useful for longer texts where you want to maintain proper speech flow and intonation patterns.
See Paragraph Element Documentation: https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#specify-paragraphs-and-sentences