Creates a new SSML document builder with the specified configuration.
The constructor sets up the SSML document with proper XML namespaces and version information required for speech synthesis services.
Configuration options for the SSML document
Configuration options for the SSML document root.
Defines the basic settings for an SSML document including version, language, and XML namespace declarations required for proper parsing by speech synthesis services.
Primary language/locale for the SSML document. (Required)
Sets the default language for all content. Can be overridden using the lang element for multilingual content.
Format: language-REGION (e.g., en-US, es-ES, fr-FR, zh-CN)
https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support Supported Languages
Optional
version?: string
SSML specification version.
Currently only version "1.0" is supported by the W3C specification and Azure Speech Service. This is typically set automatically.
Optional
xmlns?: string
XML namespace for SSML elements.
The standard W3C namespace for SSML. This should not be changed unless you have specific requirements.
Optional
xmlnsMstts?: string
Microsoft Text-to-Speech namespace for Azure-specific features.
Required for using Azure Speech Service extensions like express-as, backgroundaudio, and other mstts elements.
// US English
const builder = new SSMLBuilder({ lang: 'en-US' });
// Spanish (Spain)
const builderES = new SSMLBuilder({ lang: 'es-ES' });
// French (France)
const builderFR = new SSMLBuilder({ lang: 'fr-FR' });
// With custom namespace (rare)
const customBuilder = new SSMLBuilder({
lang: 'en-US',
version: '1.0',
xmlns: 'http://www.w3.org/2001/10/synthesis',
xmlnsMstts: 'https://www.w3.org/2001/mstts'
});
https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support Supported Languages
Adds background audio that plays throughout the entire SSML document.
Background audio is useful for adding ambient music, sound effects, or any audio that should play behind the spoken content. The audio automatically adjusts its duration to match the speech duration.
Only one background audio can be set per SSML document. Calling this method multiple times will replace the previous background audio setting.
Background audio configuration
Configuration options for background audio.
Defines audio that plays throughout the entire SSML document, behind the spoken content. Useful for adding music or ambient sounds.
Optional
fadein?: string
Duration of the fade-in effect at the start.
Gradually increases volume from 0 to the specified volume level. Specified in milliseconds (ms) or seconds (s).
Optional
fadeout?: string
Duration of the fade-out effect at the end.
Gradually decreases volume from the specified level to 0. Specified in milliseconds (ms) or seconds (s).
URL of the background audio file. (Required)
Requirements:
Optional
volume?: string
Volume level of the background audio.
Can be specified as:
This SSMLBuilder instance for method chaining
// Basic background music
builder
.backgroundAudio({
src: 'https://example.com/background-music.mp3'
})
.voice('en-US-AvaNeural')
.text('This speech has background music.')
.build();
// With volume and fade effects
builder
.backgroundAudio({
src: 'https://example.com/ambient.mp3',
volume: '0.3', // 30% volume
fadein: '3000ms', // 3-second fade in
fadeout: '2000ms' // 2-second fade out
})
.voice('en-US-AvaNeural')
.text('Welcome to our podcast!')
.build();
// Ambient sound for meditation app
builder
.backgroundAudio({
src: 'https://example.com/ocean-waves.mp3',
volume: '0.5',
fadein: '5000ms'
})
.voice('en-US-EmmaNeural')
.prosody('Take a deep breath', { rate: 'slow', volume: 'soft' })
.build();
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#add-background-audio Background Audio Documentation
Builds and returns the complete SSML document as an XML string.
This method assembles all the elements added to the builder into a valid SSML document that can be sent to a text-to-speech service. The output follows the SSML 1.0 specification with Azure Speech Service extensions.
The generated XML includes:
<speak> element with proper attributes
<mstts:backgroundaudio> element if configured
<mstts:voiceconversion> element if configured
The complete SSML document as a formatted XML string
// Simple build
const ssml = new SSMLBuilder({ lang: 'en-US' })
.voice('en-US-AvaNeural')
.text('Hello, world!')
.build();
console.log(ssml);
// Output:
// <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
// xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
// <voice name="en-US-AvaNeural">Hello, world!</voice>
// </speak>
// Complex build with all features
const complexSSML = new SSMLBuilder({ lang: 'en-US' })
.backgroundAudio({
src: 'https://example.com/music.mp3',
volume: '0.3'
})
.voiceConversion('https://example.com/model.json')
.voice('en-US-AvaNeural')
.text('First voice')
.voice('en-US-AndrewNeural')
.text('Second voice')
.build();
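The exact output formatting depends on the builder, but based on the elements listed above, the complex document should contain roughly the following (attribute names follow Azure's SSML markup and are illustrative):
// Approximate output (attribute names are illustrative):
// <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
//   xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
//   <mstts:backgroundaudio src="https://example.com/music.mp3" volume="0.3"/>
//   <mstts:voiceconversion url="https://example.com/model.json"/>
//   <voice name="en-US-AvaNeural">First voice</voice>
//   <voice name="en-US-AndrewNeural">Second voice</voice>
// </speak>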
// Use the generated SSML with Azure Speech SDK
const ssmlForAzure = builder.build();
synthesizer.speakSsmlAsync(ssmlForAzure);
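For context, the synthesizer above would come from the Azure Speech SDK; a minimal setup sketch, assuming the microsoft-cognitiveservices-speech-sdk package and placeholder credentials:
// Minimal Azure Speech SDK wiring (key and region are placeholders)
import * as speechSdk from 'microsoft-cognitiveservices-speech-sdk';
const speechConfig = speechSdk.SpeechConfig.fromSubscription('<your-key>', '<your-region>');
const synthesizer = new speechSdk.SpeechSynthesizer(speechConfig);
synthesizer.speakSsmlAsync(
ssmlForAzure,
result => synthesizer.close(),
error => synthesizer.close()
);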
https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup#speak-root-element Speak Element Documentation
Adds a voice element to the SSML document and returns a VoiceBuilder for adding content.
Each voice element can contain text, audio, and various speech modification elements. You can add multiple voice elements to create conversations or switch between different voices within the same document.
The VoiceBuilder returned from this method provides a fluent API for adding all types of content that should be spoken by this voice.
The voice name identifier (e.g., 'en-US-AvaNeural', 'en-US-AndrewMultilingualNeural'). Must be a valid voice name supported by your speech service.
Optional
effect: string
Optional voice effect to apply. Available effects include:
- 'eq_car' - Optimized for car speakers
- 'eq_telecomhp8k' - Telephone audio (8 kHz sampling)
- 'eq_telecomhp3k' - Telephone audio (3 kHz sampling)
A VoiceBuilder instance for adding content to be spoken by this voice
// Single voice
builder
.voice('en-US-AvaNeural')
.text('Hello, I am Ava.')
.build();
// Multiple voices for conversation
builder
.voice('en-US-AvaNeural')
.text('Hello, Andrew!')
.voice('en-US-AndrewNeural')
.text('Hi Ava, how are you?')
.voice('en-US-AvaNeural')
.text('I am doing great, thanks!')
.build();
// Voice with effect
builder
.voice('en-US-AvaNeural', 'eq_telecomhp8k')
.text('This sounds like a phone call.')
.build();
// Multilingual voice
builder
.voice('en-US-AvaMultilingualNeural')
.text('Hello! ')
.lang('es-ES', lang => lang.text('¡Hola! '))
.lang('fr-FR', lang => lang.text('Bonjour!'))
.build();
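The same chaining also works when the conversation comes from data; a minimal sketch (the turns array is invented for illustration, and it relies on the voice()/text() chaining shown above):
// Hypothetical conversation data (speakers and lines are illustrative)
const turns = [
{ speaker: 'en-US-AvaNeural', line: 'Hello, Andrew!' },
{ speaker: 'en-US-AndrewNeural', line: 'Hi Ava, how are you?' }
];
let chain: any = new SSMLBuilder({ lang: 'en-US' });
for (const turn of turns) {
// Each iteration adds another voice element, as in the conversation example above
chain = chain.voice(turn.speaker).text(turn.line);
}
const conversationSsml: string = chain.build();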
Experimental
Adds voice conversion to transform the voice characteristics using a custom model.
Voice conversion allows you to modify the voice output using a pre-trained voice conversion model. This is an Azure Speech Service specific feature that requires a custom voice model URL.
Only one voice conversion can be set per SSML document. The conversion applies to all voices in the document.
URL of the voice conversion model. Must be:
- A valid HTTPS URL to a voice conversion model
- Accessible by the Azure Speech Service
- In the correct format for voice conversion
This SSMLBuilder instance for method chaining
// Apply voice conversion to all voices
builder
.voiceConversion('https://example.com/custom-voice-model.json')
.voice('en-US-AvaNeural')
.text('This voice will be converted.')
.build();
// Voice conversion with multiple speakers
builder
.voiceConversion('https://example.com/voice-transform.json')
.voice('en-US-AvaNeural')
.text('Original voice A transformed.')
.voice('en-US-AndrewNeural')
.text('Original voice B also transformed.')
.build();
This feature is in preview and may change in future versions
https://docs.microsoft.com/azure/cognitive-services/speech-service/voice-conversion Voice Conversion Documentation
Main builder class for creating SSML (Speech Synthesis Markup Language) documents. Provides a fluent, type-safe API for constructing speech synthesis instructions compatible with Azure Speech Service and other SSML-compliant text-to-speech engines.
The SSMLBuilder is the entry point for creating SSML documents. It manages the root <speak> element and allows you to add voices, background audio, and voice conversion settings.
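A minimal end-to-end sketch of the typical flow (the voice name and text are just examples):
// Create a builder, add a voice with some text, and produce the SSML string
const ssml = new SSMLBuilder({ lang: 'en-US' })
.voice('en-US-AvaNeural')
.text('Hello from SSMLBuilder!')
.build();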