SSML Builder Documentation - v1.0.1

    Class SSMLBuilder

    Main builder class for creating SSML (Speech Synthesis Markup Language) documents. Provides a fluent, type-safe API for constructing speech synthesis instructions compatible with Azure Speech Service and other SSML-compliant text-to-speech engines.

    The SSMLBuilder is the entry point for creating SSML documents. It manages the root <speak> element and allows you to add voices, background audio, and voice conversion settings.

    // Basic single voice example
    const ssml = new SSMLBuilder({ lang: 'en-US' })
      .voice('en-US-AvaNeural')
      .text('Hello, world!')
      .build();

    // Multiple voices with background audio
    const conversation = new SSMLBuilder({ lang: 'en-US' })
      .backgroundAudio({
        src: 'https://example.com/music.mp3',
        volume: '0.5',
        fadein: '2000ms'
      })
      .voice('en-US-AvaNeural')
      .text('Hello from Ava!')
      .voice('en-US-AndrewNeural')
      .text('Hello from Andrew!')
      .build();

    Constructors

    • Creates a new SSML document builder with the specified configuration.

      The constructor sets up the SSML document with proper XML namespaces and version information required for speech synthesis services.

      Parameters

      • options: SSMLOptions

        Configuration options for the SSML document root.

        Defines the basic settings for an SSML document including version, language, and XML namespace declarations required for proper parsing by speech synthesis services.

        • lang: string

          Primary language/locale for the SSML document. (Required)

          Sets the default language for all content. Can be overridden using the lang element for multilingual content.

          Format: language-REGION (e.g., en-US, es-ES, fr-FR, zh-CN)

          "en-US" - English (United States)
          
          "es-ES" - Spanish (Spain)
          
          "fr-FR" - French (France)
          
          "zh-CN" - Chinese (Simplified, China)
          
        • Optional version?: string

          SSML specification version.

          Currently only version "1.0" is supported by the W3C specification and Azure Speech Service. This is typically set automatically.

          Default: "1.0"
          
        • Optional xmlns?: string

          XML namespace for SSML elements.

          The standard W3C namespace for SSML. This should not be changed unless you have specific requirements.

          Default: "http://www.w3.org/2001/10/synthesis"
          
        • Optional xmlnsMstts?: string

          Microsoft Text-to-Speech namespace for Azure-specific features.

          Required for using Azure Speech Service extensions like express-as, backgroundaudio, and other mstts elements.

          Default: "https://www.w3.org/2001/mstts"
          

      Returns SSMLBuilder

      // US English
      const builder = new SSMLBuilder({ lang: 'en-US' });

      // Spanish (Spain)
      const builderES = new SSMLBuilder({ lang: 'es-ES' });

      // French (France)
      const builderFR = new SSMLBuilder({ lang: 'fr-FR' });

      // With custom namespace (rare)
      const customBuilder = new SSMLBuilder({
        lang: 'en-US',
        version: '1.0',
        xmlns: 'http://www.w3.org/2001/10/synthesis',
        xmlnsMstts: 'https://www.w3.org/2001/mstts'
      });
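The `language-REGION` format expected by `lang` can be sanity-checked before constructing the builder. A minimal sketch (the `isValidLangTag` helper is illustrative, not part of this library; real BCP-47 tags also allow scripts and extensions such as `zh-Hans-CN`, so a production check should use a proper BCP-47 library):

```typescript
// Illustrative helper: checks only the simple language-REGION shape
// documented above (e.g. 'en-US'), not the full BCP-47 grammar.
function isValidLangTag(tag: string): boolean {
  return /^[a-z]{2,3}-[A-Z]{2}$/.test(tag);
}

console.log(isValidLangTag('en-US'));   // true
console.log(isValidLangTag('english')); // false
```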

    Methods

    • Adds background audio that plays throughout the entire SSML document.

      Background audio is useful for adding ambient music, sound effects, or any audio that should play behind the spoken content. The audio automatically adjusts its duration to match the speech duration.

      Only one background audio can be set per SSML document. Calling this method multiple times will replace the previous background audio setting.

      Parameters

      • options: BackgroundAudioOptions

        Background audio configuration

        Configuration options for background audio.

        Defines audio that plays throughout the entire SSML document, behind the spoken content. Useful for adding music or ambient sounds.

        • Optional fadein?: string

          Duration of the fade-in effect at the start.

          Gradually increases volume from 0 to the specified volume level. Specified in milliseconds (ms) or seconds (s).

          "2000ms" - 2-second fade in
          
          "3s" - 3-second fade in
          
        • Optional fadeout?: string

          Duration of the fade-out effect at the end.

          Gradually decreases volume from the specified level to 0. Specified in milliseconds (ms) or seconds (s).

          "1500ms" - 1.5-second fade out
          
          "2s" - 2-second fade out
          
        • src: string

          URL of the background audio file. (Required)

          Requirements:

          • Must be a publicly accessible HTTPS URL
          • Supported formats: MP3, WAV, and other common audio formats
          • File size limits depend on the service tier
          • Audio automatically loops if shorter than speech duration
          "https://example.com/background-music.mp3"
          
          "https://cdn.example.com/sounds/ambient.wav"
          
        • Optional volume?: string

          Volume level of the background audio.

          Can be specified as:

          • Decimal: "0.0" (silent) to "1.0" (full volume)
          • Percentage: "0%" to "100%"
          • Decibels: "+0dB", "-6dB" (negative reduces volume)
          Default: "1.0"
          
          "0.5" - 50% volume
          
          "30%" - 30% volume
          
          "-6dB" - Reduce by 6 decibels
          

      Returns this

      This SSMLBuilder instance for method chaining

      // Basic background music
      builder
        .backgroundAudio({
          src: 'https://example.com/background-music.mp3'
        })
        .voice('en-US-AvaNeural')
        .text('This speech has background music.')
        .build();

      // With volume and fade effects
      builder
        .backgroundAudio({
          src: 'https://example.com/ambient.mp3',
          volume: '0.3',      // 30% volume
          fadein: '3000ms',   // 3-second fade in
          fadeout: '2000ms'   // 2-second fade out
        })
        .voice('en-US-AvaNeural')
        .text('Welcome to our podcast!')
        .build();

      // Ambient sound for meditation app
      builder
        .backgroundAudio({
          src: 'https://example.com/ocean-waves.mp3',
          volume: '0.5',
          fadein: '5000ms'
        })
        .voice('en-US-EmmaNeural')
        .prosody('Take a deep breath', { rate: 'slow', volume: 'soft' })
        .build();
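The fadein/fadeout duration strings ("2000ms" or "3s") and the three volume formats listed above can be validated client-side before calling backgroundAudio. A minimal sketch (`durationToMs` and `isValidVolume` are illustrative helpers, not part of this library):

```typescript
// Illustrative helper: normalizes the fade duration strings documented
// above ("2000ms" or "3s") to a number of milliseconds.
function durationToMs(value: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(value);
  if (!match) throw new Error(`Invalid duration: ${value}`);
  return match[2] === 's' ? Number(match[1]) * 1000 : Number(match[1]);
}

// Illustrative helper: checks a volume string against the three
// documented formats (decibels, percentage, decimal 0.0-1.0).
function isValidVolume(value: string): boolean {
  if (/^[+-]?\d+(?:\.\d+)?dB$/.test(value)) return true;            // "-6dB"
  if (/^\d+(?:\.\d+)?%$/.test(value)) return parseFloat(value) <= 100; // "30%"
  if (/^\d+(?:\.\d+)?$/.test(value)) return parseFloat(value) <= 1;    // "0.5"
  return false;
}

console.log(durationToMs('2000ms')); // 2000
console.log(durationToMs('3s'));     // 3000
console.log(isValidVolume('-6dB'));  // true
console.log(isValidVolume('150%')); // false
```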
    • Builds and returns the complete SSML document as an XML string.

      This method assembles all the elements added to the builder into a valid SSML document that can be sent to a text-to-speech service. The output follows the SSML 1.0 specification with Azure Speech Service extensions.

      The generated XML includes:

      • The root <speak> element with proper attributes
      • Optional <mstts:backgroundaudio> element if configured
      • Optional <mstts:voiceconversion> element if configured
      • All voice elements with their content in the order they were added

      Returns string

      The complete SSML document as a formatted XML string

      // Simple build
      const ssml = new SSMLBuilder({ lang: 'en-US' })
        .voice('en-US-AvaNeural')
        .text('Hello, world!')
        .build();

      console.log(ssml);
      // Output:
      // <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
      //        xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
      //   <voice name="en-US-AvaNeural">Hello, world!</voice>
      // </speak>

      // Complex build with all features
      const complexSSML = new SSMLBuilder({ lang: 'en-US' })
        .backgroundAudio({
          src: 'https://example.com/music.mp3',
          volume: '0.3'
        })
        .voiceConversion('https://example.com/model.json')
        .voice('en-US-AvaNeural')
        .text('First voice')
        .voice('en-US-AndrewNeural')
        .text('Second voice')
        .build();

      // Use the generated SSML with the Azure Speech SDK
      const ssmlForAzure = builder.build();
      synthesizer.speakSsmlAsync(ssmlForAzure);
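Before sending the generated document to a synthesis service, a quick structural check can catch obvious assembly mistakes. A rough sketch that assumes nothing beyond the string output of build() (`countTag` is an illustrative helper, not part of this library, and is no substitute for a real XML parser):

```typescript
// Illustrative helper: counts opening and closing occurrences of a tag
// in an SSML string so mismatches can be flagged before synthesis.
function countTag(ssml: string, tag: string): { open: number; close: number } {
  const open = (ssml.match(new RegExp(`<${tag}[\\s>]`, 'g')) ?? []).length;
  const close = (ssml.match(new RegExp(`</${tag}>`, 'g')) ?? []).length;
  return { open, close };
}

const sample = '<speak><voice name="en-US-AvaNeural">Hi</voice></speak>';
const counts = countTag(sample, 'voice');
console.log(counts.open === counts.close); // true
```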
    • Adds a voice element to the SSML document and returns a VoiceBuilder for adding content.

      Each voice element can contain text, audio, and various speech modification elements. You can add multiple voice elements to create conversations or switch between different voices within the same document.

      The VoiceBuilder returned from this method provides a fluent API for adding all types of content that should be spoken by this voice.

      Parameters

      • name: string

        The voice name identifier (e.g., 'en-US-AvaNeural', 'en-US-AndrewMultilingualNeural'). Must be a valid voice name supported by your speech service.

      • Optional effect: string

        Optional voice effect to apply. Available effects include:

        • 'eq_car' - Optimized for car speakers
        • 'eq_telecomhp8k' - Telephone audio (8 kHz sampling)
        • 'eq_telecomhp3k' - Telephone audio (3 kHz sampling)

      Returns VoiceBuilder

      A VoiceBuilder instance for adding content to be spoken by this voice

      // Single voice
      builder
        .voice('en-US-AvaNeural')
        .text('Hello, I am Ava.')
        .build();

      // Multiple voices for conversation
      builder
        .voice('en-US-AvaNeural')
        .text('Hello, Andrew!')
        .voice('en-US-AndrewNeural')
        .text('Hi Ava, how are you?')
        .voice('en-US-AvaNeural')
        .text('I am doing great, thanks!')
        .build();

      // Voice with effect
      builder
        .voice('en-US-AvaNeural', 'eq_telecomhp8k')
        .text('This sounds like a phone call.')
        .build();

      // Multilingual voice
      builder
        .voice('en-US-AvaMultilingualNeural')
        .text('Hello! ')
        .lang('es-ES', lang => lang.text('¡Hola! '))
        .lang('fr-FR', lang => lang.text('Bonjour!'))
        .build();
    • Experimental

      Adds voice conversion to transform the voice characteristics using a custom model.

      Voice conversion allows you to modify the voice output using a pre-trained voice conversion model. This is an Azure Speech Service specific feature that requires a custom voice model URL.

      Only one voice conversion can be set per SSML document. The conversion applies to all voices in the document.

      Parameters

      • url: string

        URL of the voice conversion model. Must be:

        • A valid HTTPS URL to a voice conversion model
        • Accessible by the Azure Speech Service
        • In the correct format for voice conversion

      Returns this

      This SSMLBuilder instance for method chaining

      // Apply voice conversion to all voices
      builder
        .voiceConversion('https://example.com/custom-voice-model.json')
        .voice('en-US-AvaNeural')
        .text('This voice will be converted.')
        .build();

      // Voice conversion with multiple speakers
      builder
        .voiceConversion('https://example.com/voice-transform.json')
        .voice('en-US-AvaNeural')
        .text('Original voice A transformed.')
        .voice('en-US-AndrewNeural')
        .text('Original voice B also transformed.')
        .build();

      This feature is in preview and may change in future versions.