SSML Builder Documentation - v1.0.1

    Class ProsodyElement

    SSML element for modifying prosodic properties of speech.

    The <prosody> element provides fine-grained control over various aspects of speech synthesis including pitch, speaking rate, volume, and intonation patterns. This element is essential for creating more natural and expressive synthesized speech by adjusting how text is spoken rather than what is spoken. Prosody modifications can convey emotion, emphasis, or create specific speaking styles like whispering or shouting.

    All prosody attributes are optional and can be combined to achieve complex speech modifications. The element affects all text and child elements within its scope, allowing for nested prosody elements with cumulative effects.

    // Slow, quiet speech
    const whisper = new ProsodyElement('This is a secret', {
      rate: 'slow',
      volume: 'soft',
      pitch: 'low'
    });
    whisper.render();
    // Output: <prosody pitch="low" rate="slow" volume="soft">This is a secret</prosody>

    // Excited speech
    const excited = new ProsodyElement('Amazing news!', {
      rate: '1.2',
      pitch: '+10%',
      volume: 'loud'
    });

    // Use with SSMLBuilder
    const ssml = new SSMLBuilder({ lang: 'en-US' })
      .voice('en-US-AvaNeural')
      .prosody('Speaking slowly', { rate: 'slow' })
      .text(' and then ')
      .prosody('speaking quickly!', { rate: 'fast' })
      .build();
    • All attributes are optional; if none are specified, default speech characteristics are used
    • Prosody can be nested, with inner elements inheriting and modifying outer settings
    • The prosody element can contain text and other elements including audio, break, p, phoneme, prosody, say-as, sub, and s
    • Extreme values may produce unnatural-sounding speech
    • Special XML characters in text are automatically escaped
    • Different voices may interpret prosody values slightly differently

    Constructors

    • Creates a new ProsodyElement instance.

      Parameters

      • text: string

        The text content to be spoken with modified prosody. Special XML characters (&, <, >, ", ') are automatically escaped. Can be any length of text from a single word to multiple paragraphs.

      • options: ProsodyOptions

        Configuration object for prosody modifications (speech characteristics). All properties are optional.

        Controls speech delivery: pitch, speaking rate, volume, and intonation contour. Multiple properties can be combined for complex speech modifications.

        • Optional contour?: string

          Pitch contour changes over time.

          Defines how pitch changes during speech as a sequence of (time, pitch) pairs. Format: "(time1,pitch1) (time2,pitch2) ...", where each time is a percentage of the utterance duration and each pitch is an offset in Hz or a percentage change.

          "(0%,+5Hz) (50%,+10Hz) (100%,+5Hz)" - Rise-fall intonation (peaks at the midpoint)
          
          "(0%,+20Hz) (100%,-10Hz)" - Falling intonation
          
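The contour format above is easy to mistype by hand, so it can help to assemble the string from structured points. The following is a minimal sketch, not part of the documented API: `ContourPoint` and `buildContour` are hypothetical names, and the sketch assumes pitch offsets are given in Hz only.

```typescript
// Hypothetical helper, not provided by the library.
interface ContourPoint {
  /** Position in the utterance, as a percentage from 0 to 100. */
  timePercent: number;
  /** Pitch offset in Hz relative to the baseline; may be negative. */
  pitchHz: number;
}

function buildContour(points: ContourPoint[]): string {
  return points
    .map(({ timePercent, pitchHz }) => {
      if (timePercent < 0 || timePercent > 100) {
        throw new RangeError(`time must be within 0-100%, got ${timePercent}`);
      }
      // Relative offsets carry an explicit sign, so prefix non-negative values with "+".
      const sign = pitchHz >= 0 ? '+' : '';
      return `(${timePercent}%,${sign}${pitchHz}Hz)`;
    })
    .join(' ');
}

const falling = buildContour([
  { timePercent: 0, pitchHz: 20 },
  { timePercent: 100, pitchHz: -10 },
]);
// falling === "(0%,+20Hz) (100%,-10Hz)"
```

The resulting string can then be passed as the `contour` option.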
        • Optional pitch?: string

          Pitch adjustment for the speech.

          Can be specified as:

          • Absolute frequency: "200Hz", "150Hz"
          • Relative change: "+2st" (semitones), "+10%", "-5%"
          • Named values: "x-low", "low", "medium", "high", "x-high"
          "high" - High pitch
          
          "+10%" - 10% higher
          
          "200Hz" - Specific frequency
          
          "-2st" - 2 semitones lower
          
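To get a feel for how large a semitone shift such as "+2st" is, the arithmetic below converts semitones to a frequency ratio using standard equal temperament (one semitone is a factor of 2^(1/12)). This is only an illustration of the unit: how a given voice actually applies a semitone value is engine-specific, and `semitonesToRatio` and `shiftPitch` are hypothetical names.

```typescript
// Illustrative arithmetic only; not part of the library.
function semitonesToRatio(semitones: number): number {
  // Twelve semitones make one octave, which doubles the frequency.
  return Math.pow(2, semitones / 12);
}

function shiftPitch(baseHz: number, semitones: number): number {
  return baseHz * semitonesToRatio(semitones);
}

const octaveUp = shiftPitch(200, 12);   // 400
const octaveDown = shiftPitch(200, -12); // 100
const twoUp = semitonesToRatio(2);       // roughly 1.12, i.e. about 12% higher
```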
        • Optional range?: string

          Pitch range variation.

          Controls the variability of pitch (monotone vs expressive). Can be relative change or named value.

          "x-low" - Very monotone
          
          "high" - Very expressive
          
          "+10%" - 10% more variation
          
        • Optional rate?: string

          Speaking rate/speed.

          Can be specified as:

          • Multiplier: "0.5" (half speed), "2.0" (double speed)
          • Percentage: "+10%", "-20%"
          • Named values: "x-slow", "slow", "medium", "fast", "x-fast"
          "slow" - Slow speech
          
          "1.5" - 50% faster
          
          "+25%" - 25% faster
          
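The examples above treat a multiplier of "1.5" and a relative change of "+50%" as the same speed-up. Assuming that equivalence holds for the target voice (engines may differ), a small conversion helper could look like this; `multiplierToPercent` is a hypothetical name, not a library function.

```typescript
// Hypothetical helper: converts a rate multiplier to a signed percentage string.
function multiplierToPercent(multiplier: number): string {
  if (!(multiplier > 0)) {
    throw new RangeError(`multiplier must be positive, got ${multiplier}`);
  }
  // Round to avoid floating-point noise such as -19.999999999999996.
  const percent = Math.round((multiplier - 1) * 100);
  const sign = percent >= 0 ? '+' : '';
  return `${sign}${percent}%`;
}

const faster = multiplierToPercent(1.5); // "+50%"
const slower = multiplierToPercent(0.8); // "-20%"
```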
        • Optional volume?: string

          Volume level of the speech.

          Can be specified as:

          • Numeric: "0" to "100" (0=silent, 100=loudest)
          • Percentage: "50%", "80%"
          • Decibels: "+10dB", "-5dB"
          • Named values: "silent", "x-soft", "soft", "medium", "loud", "x-loud"
          "soft" - Quiet speech
          
          "loud" - Loud speech
          
          "50" - 50% volume
          
          "+5dB" - 5 decibels louder
          
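Because `volume` accepts four different notations in a single string, a pre-check before constructing the element can catch typos early. The sketch below classifies a value against the formats listed above. It is an assumption-laden illustration: the documentation does not say whether the library validates values itself, and `classifyVolume` and `VolumeKind` are hypothetical names.

```typescript
// Hypothetical pre-check, not part of the library.
type VolumeKind = 'named' | 'decibels' | 'percentage' | 'numeric' | 'invalid';

const NAMED_VOLUMES = ['silent', 'x-soft', 'soft', 'medium', 'loud', 'x-loud'];

function classifyVolume(value: string): VolumeKind {
  if (NAMED_VOLUMES.indexOf(value) !== -1) return 'named';
  // Decibel changes are signed, e.g. "+10dB" or "-5dB".
  if (/^[+-]\d+(\.\d+)?dB$/.test(value)) return 'decibels';
  // Percentages such as "50%" or "+80%".
  if (/^[+-]?\d+(\.\d+)?%$/.test(value)) return 'percentage';
  // Plain numbers must fall within 0 to 100.
  if (/^\d+(\.\d+)?$/.test(value) && Number(value) <= 100) return 'numeric';
  return 'invalid';
}

const kinds = ['soft', '+5dB', '50%', '50', '150'].map(classifyVolume);
// kinds: ['named', 'decibels', 'percentage', 'numeric', 'invalid']
```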

      Returns ProsodyElement

      // Whisper effect
      const whisper = new ProsodyElement('This is confidential', {
        volume: 'x-soft',
        rate: 'slow',
        pitch: 'low'
      });

      // Shouting effect
      const shout = new ProsodyElement('Look out!', {
        volume: 'x-loud',
        pitch: 'high',
        rate: 'fast'
      });

      // Robot/monotone voice
      const robot = new ProsodyElement('I am a robot', {
        pitch: 'medium',
        range: 'x-low', // Minimal pitch variation
        rate: '0.9'
      });

      // Question intonation with contour
      const question = new ProsodyElement('Are you sure', {
        contour: '(0%,+5Hz) (60%,+10Hz) (100%,+20Hz)' // Rising intonation
      });

      // Emphasis with multiple attributes
      const emphasis = new ProsodyElement('This is important', {
        pitch: '+10%',
        rate: '0.8', // Slower
        volume: '+5dB' // Louder
      });

      // Precise numeric control
      const precise = new ProsodyElement('Precise control', {
        pitch: '150Hz',
        rate: '1.1', // 10% faster
        volume: '75' // 75% volume
      });

    Methods

    • Protected escapeXml(text: string): string

      Escapes special XML characters in text content to ensure valid XML output.

      This method replaces XML special characters with their corresponding entity references to prevent XML parsing errors and potential security issues (XML injection). It should be used whenever inserting user-provided or dynamic text content into XML elements.

      The following characters are escaped:

      • & becomes &amp; (must be escaped first to avoid double-escaping)
      • < becomes &lt; (prevents opening of unintended tags)
      • > becomes &gt; (prevents closing of unintended tags)
      • " becomes &quot; (prevents breaking out of attribute values)
      • ' becomes &apos; (prevents breaking out of attribute values)

      This method is marked as protected so it's only accessible to classes that extend SSMLElement, ensuring proper encapsulation while allowing all element implementations to use this essential functionality.
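      The note that & must be escaped first is worth seeing concretely. The standalone functions below are a minimal sketch of the replacement order described above, not the library's actual source; `escapeXmlSketch` and `escapeWrongOrder` are hypothetical names used only for this comparison.

```typescript
// Sketch of the documented replacement order; the real method may be implemented differently.
function escapeXmlSketch(text: string): string {
  return text
    .replace(/&/g, '&amp;') // first, so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// For contrast: escaping "<" before "&" corrupts the entity just produced.
function escapeWrongOrder(text: string): string {
  return text.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

const correct = escapeXmlSketch('a < b'); // "a &lt; b"
const broken = escapeWrongOrder('a < b'); // "a &amp;lt; b"
```

      In the wrong order, the ampersand introduced by `&lt;` is itself escaped, so the reader of the XML would see the literal text `&lt;` instead of `<`.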

      Parameters

      • text: string

        The text content to escape

      Returns string

      The text with all special XML characters properly escaped

      // In a render method implementation
      class TextElement extends SSMLElement {
        private text: string = 'Hello & "world" <script>';

        render(): string {
          // Escapes to: Hello &amp; &quot;world&quot; &lt;script&gt;
          return `<text>${this.escapeXml(this.text)}</text>`;
        }
      }

      // Edge cases handled correctly (called from within a subclass method)
      this.escapeXml('5 < 10 & 10 > 5');
      // Returns: '5 &lt; 10 &amp; 10 &gt; 5'

      this.escapeXml('She said "Hello"');
      // Returns: 'She said &quot;Hello&quot;'

      this.escapeXml("It's a test");
      // Returns: 'It&apos;s a test'

      // Prevents XML injection
      this.escapeXml('</voice><voice name="evil">');
      // Returns: '&lt;/voice&gt;&lt;voice name=&quot;evil&quot;&gt;'
    • Renders the prosody element as an SSML XML string.

      Generates the <prosody> element with only the specified attributes. Undefined attributes are omitted from the output, keeping the XML clean. The text content is automatically escaped to prevent XML injection and ensure valid output. Attributes are rendered in a consistent order: pitch, contour, range, rate, volume.

      Returns string

      The XML string representation of the prosody element, in the format: <prosody [pitch="value"] [contour="value"] [range="value"] [rate="value"] [volume="value"]>text</prosody>. If no attributes are specified, returns: <prosody>text</prosody>
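      The rendering rules described above (fixed attribute order, undefined attributes omitted, text escaped) can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's implementation: `renderProsodySketch`, `ProsodyOptionsSketch`, and `escapeText` are hypothetical names, and the sketch inserts attribute values as-is because the documentation does not say whether they are escaped.

```typescript
// Hypothetical stand-in for ProsodyOptions, mirroring the documented properties.
interface ProsodyOptionsSketch {
  pitch?: string;
  contour?: string;
  range?: string;
  rate?: string;
  volume?: string;
}

// Documented output order, independent of the order in the options object.
const ATTRIBUTE_ORDER = ['pitch', 'contour', 'range', 'rate', 'volume'] as const;

function escapeText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function renderProsodySketch(text: string, options: ProsodyOptionsSketch): string {
  const attrs = ATTRIBUTE_ORDER
    .filter((name) => options[name] !== undefined) // omit unset attributes
    .map((name) => ` ${name}="${options[name]}"`)  // assumption: values inserted unescaped
    .join('');
  return `<prosody${attrs}>${escapeText(text)}</prosody>`;
}

const reordered = renderProsodySketch('This is a secret', {
  rate: 'slow',
  volume: 'soft',
  pitch: 'low',
});
// reordered === '<prosody pitch="low" rate="slow" volume="soft">This is a secret</prosody>'
```

      Iterating over a fixed key list, rather than over the options object, is what makes the output order independent of how the caller wrote the options.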

      // Single attribute
      const slow = new ProsodyElement('Slow speech', { rate: 'slow' });
      console.log(slow.render());
      // Output: <prosody rate="slow">Slow speech</prosody>

      // Multiple attributes
      const complex = new ProsodyElement('Complex prosody', {
        pitch: 'high',
        rate: 'fast',
        volume: 'loud'
      });
      console.log(complex.render());
      // Output: <prosody pitch="high" rate="fast" volume="loud">Complex prosody</prosody>

      // With contour
      const contour = new ProsodyElement('Rising tone', {
        contour: '(0%,+0Hz) (100%,+20Hz)'
      });
      console.log(contour.render());
      // Output: <prosody contour="(0%,+0Hz) (100%,+20Hz)">Rising tone</prosody>

      // No attributes (valid but no effect)
      const plain = new ProsodyElement('Plain text', {});
      console.log(plain.render());
      // Output: <prosody>Plain text</prosody>

      // Special characters escaped
      const special = new ProsodyElement('Terms & "conditions"', {
        volume: 'soft'
      });
      console.log(special.render());
      // Output: <prosody volume="soft">Terms &amp; &quot;conditions&quot;</prosody>