SSML Builder Documentation - v1.0.1

    Class SSMLBuilder

    Main builder class for creating SSML (Speech Synthesis Markup Language) documents. Provides a fluent, type-safe API for constructing speech synthesis instructions compatible with Azure Speech Service and other SSML-compliant text-to-speech engines.

    The SSMLBuilder is the entry point for creating SSML documents. It manages the root <speak> element and allows you to add voices, background audio, and voice conversion settings.

    // Basic single voice example
    const ssml = new SSMLBuilder({ lang: 'en-US' })
      .voice('en-US-AvaNeural')
      .text('Hello, world!')
      .build();

    // Multiple voices with background audio
    const conversation = new SSMLBuilder({ lang: 'en-US' })
      .backgroundAudio({
        src: 'https://example.com/music.mp3',
        volume: '0.5',
        fadein: '2000ms'
      })
      .voice('en-US-AvaNeural')
      .text('Hello from Ava!')
      .voice('en-US-AndrewNeural')
      .text('Hello from Andrew!')
      .build();

    Constructors

    • Creates a new SSML document builder with the specified configuration.

      The constructor sets up the SSML document with proper XML namespaces and version information required for speech synthesis services.

      Parameters

      • options: SSMLOptions

        Configuration options for the SSML document root.

        Defines the basic settings for an SSML document including version, language, and XML namespace declarations required for proper parsing by speech synthesis services.

        • lang: string

          Primary language/locale for the SSML document. (Required)

          Sets the default language for all content. Can be overridden using the lang element for multilingual content.

          Format: language-REGION (e.g., en-US, es-ES, fr-FR, zh-CN)

          "en-US" - English (United States)
          
          "es-ES" - Spanish (Spain)
          
          "fr-FR" - French (France)
          
          "zh-CN" - Chinese (Simplified, China)
          
        • Optional version?: string

          SSML specification version.

          Currently only version "1.0" is supported by the W3C specification and Azure Speech Service. This is typically set automatically.

          Default: "1.0"
          
        • Optional xmlns?: string

          XML namespace for SSML elements.

          The standard W3C namespace for SSML. This should not be changed unless you have specific requirements.

          Default: "http://www.w3.org/2001/10/synthesis"
          
        • Optional xmlnsMstts?: string

          Microsoft Text-to-Speech namespace for Azure-specific features.

          Required for using Azure Speech Service extensions like express-as, backgroundaudio, and other mstts elements.

          Default: "https://www.w3.org/2001/mstts"
          

      Returns SSMLBuilder

      // US English
      const builder = new SSMLBuilder({ lang: 'en-US' });

      // Spanish (Spain)
      const builderES = new SSMLBuilder({ lang: 'es-ES' });

      // French (France)
      const builderFR = new SSMLBuilder({ lang: 'fr-FR' });

      // With custom namespace (rare)
      const customBuilder = new SSMLBuilder({
        lang: 'en-US',
        version: '1.0',
        xmlns: 'http://www.w3.org/2001/10/synthesis',
        xmlnsMstts: 'https://www.w3.org/2001/mstts'
      });
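The `language-REGION` format expected by `lang` can be sanity-checked before constructing the builder. A minimal sketch (the `isValidLangTag` helper is illustrative, not part of this library; real BCP-47 tags also allow scripts and extensions such as `zh-Hans-CN`, so a production check should use a proper BCP-47 library):

```typescript
// Illustrative helper: checks only the simple language-REGION shape
// documented above (e.g. 'en-US'), not the full BCP-47 grammar.
function isValidLangTag(tag: string): boolean {
  return /^[a-z]{2,3}-[A-Z]{2}$/.test(tag);
}

console.log(isValidLangTag('en-US'));   // true
console.log(isValidLangTag('english')); // false
```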

    Methods

    • Adds background audio that plays throughout the entire SSML document.

      Background audio is useful for adding ambient music, sound effects, or any audio that should play behind the spoken content. The audio automatically adjusts its duration to match the speech duration.

      Only one background audio can be set per SSML document. Calling this method multiple times will replace the previous background audio setting.

      Parameters

      • options: BackgroundAudioOptions

        Background audio configuration

        Configuration options for background audio.

        Defines audio that plays throughout the entire SSML document, behind the spoken content. Useful for adding music or ambient sounds.

        • Optional fadein?: string

          Duration of the fade-in effect at the start.

          Gradually increases volume from 0 to the specified volume level. Specified in milliseconds (ms) or seconds (s).

          "2000ms" - 2-second fade in
          
          "3s" - 3-second fade in
          
        • Optional fadeout?: string

          Duration of the fade-out effect at the end.

          Gradually decreases volume from the specified level to 0. Specified in milliseconds (ms) or seconds (s).

          "1500ms" - 1.5-second fade out
          
          "2s" - 2-second fade out
          
        • src: string

          URL of the background audio file. (Required)

          Requirements:

          • Must be a publicly accessible HTTPS URL
          • Supported formats: MP3, WAV, and other common audio formats
          • File size limits depend on the service tier
          • Audio automatically loops if shorter than speech duration
          "https://example.com/background-music.mp3"
          
          "https://cdn.example.com/sounds/ambient.wav"
          
        • Optional volume?: string

          Volume level of the background audio.

          Can be specified as:

          • Decimal: "0.0" (silent) to "1.0" (full volume)
          • Percentage: "0%" to "100%"
          • Decibels: "+0dB", "-6dB" (negative reduces volume)
          Default: "1.0"
          
          "0.5" - 50% volume
          
          "30%" - 30% volume
          
          "-6dB" - Reduce by 6 decibels
          

      Returns this

      This SSMLBuilder instance for method chaining

      // Basic background music
      builder
        .backgroundAudio({
          src: 'https://example.com/background-music.mp3'
        })
        .voice('en-US-AvaNeural')
        .text('This speech has background music.')
        .build();

      // With volume and fade effects
      builder
        .backgroundAudio({
          src: 'https://example.com/ambient.mp3',
          volume: '0.3',      // 30% volume
          fadein: '3000ms',   // 3-second fade in
          fadeout: '2000ms'   // 2-second fade out
        })
        .voice('en-US-AvaNeural')
        .text('Welcome to our podcast!')
        .build();

      // Ambient sound for meditation app
      builder
        .backgroundAudio({
          src: 'https://example.com/ocean-waves.mp3',
          volume: '0.5',
          fadein: '5000ms'
        })
        .voice('en-US-EmmaNeural')
        .prosody('Take a deep breath', { rate: 'slow', volume: 'soft' })
        .build();
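The fadein/fadeout duration strings ("2000ms" or "3s") and the three volume formats listed above can be validated client-side before calling backgroundAudio. A minimal sketch (`durationToMs` and `isValidVolume` are illustrative helpers, not part of this library):

```typescript
// Illustrative helper: normalizes the fade duration strings documented
// above ("2000ms" or "3s") to a number of milliseconds.
function durationToMs(value: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(value);
  if (!match) throw new Error(`Invalid duration: ${value}`);
  return match[2] === 's' ? Number(match[1]) * 1000 : Number(match[1]);
}

// Illustrative helper: checks a volume string against the three
// documented formats (decibels, percentage, decimal 0.0-1.0).
function isValidVolume(value: string): boolean {
  if (/^[+-]?\d+(?:\.\d+)?dB$/.test(value)) return true;            // "-6dB"
  if (/^\d+(?:\.\d+)?%$/.test(value)) return parseFloat(value) <= 100; // "30%"
  if (/^\d+(?:\.\d+)?$/.test(value)) return parseFloat(value) <= 1;    // "0.5"
  return false;
}

console.log(durationToMs('2000ms')); // 2000
console.log(durationToMs('3s'));     // 3000
console.log(isValidVolume('-6dB'));  // true
console.log(isValidVolume('150%')); // false
```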
    • Builds and returns the complete SSML document as an XML string.

      This method assembles all the elements added to the builder into a valid SSML document that can be sent to a text-to-speech service. The output follows the SSML 1.0 specification with Azure Speech Service extensions.

      The generated XML includes:

      • The root <speak> element with proper attributes
      • Optional <mstts:backgroundaudio> element if configured
      • Optional <mstts:voiceconversion> element if configured
      • All voice elements with their content in the order they were added

      Returns string

      The complete SSML document as a formatted XML string

      // Simple build
      const ssml = new SSMLBuilder({ lang: 'en-US' })
        .voice('en-US-AvaNeural')
        .text('Hello, world!')
        .build();

      console.log(ssml);
      // Output:
      // <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
      //        xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
      //   <voice name="en-US-AvaNeural">Hello, world!</voice>
      // </speak>

      // Complex build with all features
      const complexSSML = new SSMLBuilder({ lang: 'en-US' })
        .backgroundAudio({
          src: 'https://example.com/music.mp3',
          volume: '0.3'
        })
        .voiceConversion('https://example.com/model.json')
        .voice('en-US-AvaNeural')
        .text('First voice')
        .voice('en-US-AndrewNeural')
        .text('Second voice')
        .build();

      // Use the generated SSML with the Azure Speech SDK
      const ssmlForAzure = builder.build();
      synthesizer.speakSsmlAsync(ssmlForAzure);
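Before sending the generated document to a synthesis service, a quick structural check can catch obvious assembly mistakes. A rough sketch that assumes nothing beyond the string output of build() (`countTag` is an illustrative helper, not part of this library, and is no substitute for a real XML parser):

```typescript
// Illustrative helper: counts opening and closing occurrences of a tag
// in an SSML string so mismatches can be flagged before synthesis.
function countTag(ssml: string, tag: string): { open: number; close: number } {
  const open = (ssml.match(new RegExp(`<${tag}[\\s>]`, 'g')) ?? []).length;
  const close = (ssml.match(new RegExp(`</${tag}>`, 'g')) ?? []).length;
  return { open, close };
}

const sample = '<speak><voice name="en-US-AvaNeural">Hi</voice></speak>';
const counts = countTag(sample, 'voice');
console.log(counts.open === counts.close); // true
```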
    • Adds a voice element to the SSML document and returns a VoiceBuilder for adding content.

      Each voice element can contain text, audio, and various speech modification elements. You can add multiple voice elements to create conversations or switch between different voices within the same document.

      The VoiceBuilder returned from this method provides a fluent API for adding all types of content that should be spoken by this voice.

      Parameters

      • name: string

        The voice name identifier (e.g., 'en-US-AvaNeural', 'en-US-AndrewMultilingualNeural'). Must be a valid voice name supported by your speech service.

      • Optional effect: string

        Optional voice effect to apply. Available effects include:

        • 'eq_car' - Optimized for car speakers
        • 'eq_telecomhp8k' - Telephone audio (8 kHz sampling)
        • 'eq_telecomhp3k' - Telephone audio (3 kHz sampling)

      Returns VoiceBuilder

      A VoiceBuilder instance for adding content to be spoken by this voice

      // Single voice
      builder
        .voice('en-US-AvaNeural')
        .text('Hello, I am Ava.')
        .build();

      // Multiple voices for conversation
      builder
        .voice('en-US-AvaNeural')
        .text('Hello, Andrew!')
        .voice('en-US-AndrewNeural')
        .text('Hi Ava, how are you?')
        .voice('en-US-AvaNeural')
        .text('I am doing great, thanks!')
        .build();

      // Voice with effect
      builder
        .voice('en-US-AvaNeural', 'eq_telecomhp8k')
        .text('This sounds like a phone call.')
        .build();

      // Multilingual voice
      builder
        .voice('en-US-AvaMultilingualNeural')
        .text('Hello! ')
        .lang('es-ES', lang => lang.text('¡Hola! '))
        .lang('fr-FR', lang => lang.text('Bonjour!'))
        .build();
    • Experimental

      Adds voice conversion to transform the voice characteristics using a custom model.

      Voice conversion allows you to modify the voice output using a pre-trained voice conversion model. This is an Azure Speech Service specific feature that requires a custom voice model URL.

      Only one voice conversion can be set per SSML document. The conversion applies to all voices in the document.

      Parameters

      • url: string

        URL of the voice conversion model. Must be:

        • A valid HTTPS URL to a voice conversion model
        • Accessible by the Azure Speech Service
        • In the correct format for voice conversion

      Returns this

      This SSMLBuilder instance for method chaining

      // Apply voice conversion to all voices
      builder
        .voiceConversion('https://example.com/custom-voice-model.json')
        .voice('en-US-AvaNeural')
        .text('This voice will be converted.')
        .build();

      // Voice conversion with multiple speakers
      builder
        .voiceConversion('https://example.com/voice-transform.json')
        .voice('en-US-AvaNeural')
        .text('Original voice A transformed.')
        .voice('en-US-AndrewNeural')
        .text('Original voice B also transformed.')
        .build();

      This feature is in preview and may change in future versions.