`normalize_text`

The primary entry point for text normalization.

`normalize_text(text, language='en', profile=None, disable=None, verbose=False, **kwargs)`

Normalize text for TTS in the given language.

Parameters

text : str
    Input text to normalize.
language : str
    "en" for English, "ms" for Malay.
profile : str or None
    One of "minimal", "basic", "standard", "aggressive". If None, the standard profile (all features on) is used.
disable : list[str] or None
    Feature names to turn off, e.g. ["acronyms", "measurements"].
verbose : bool
    If True, return a dict with mappings and triggered rules instead of just the normalized text string.
**kwargs
    Legacy boolean flags, accepted for backward compatibility, that emit a DeprecationWarning. Supported names: normalize_spacing, fix_dot_letters, sound_words_field, apply_pronunciation_overrides_flag, expand_abbreviations_flag, expand_acronyms_flag, normalize_elongated_flag, normalize_fractions_flag, normalize_x_kali_flag, normalize_temperature_flag, normalize_ic_flag, normalize_measurements_flag, normalize_hari_bulan_flag, normalize_hijri_flag, extract_entities_first, config.

Returns

str or dict
    Normalized text, or a dict with text, original, mappings, and rules keys when verbose is True.

Parameters

`text`

str — The input text to normalize. Leading and trailing whitespace is stripped before processing. If the string is empty after stripping, an empty string is returned immediately.

`language`

str, default "en" — Target language for normalization.

| Value | Description |
| --- | --- |
| `"en"` | English normalization (contractions, numbers via `inflect`, English date/time formats) |
| `"ms"` | Malay normalization (Malay grammar, numbers via `num2word_ms`, Malay-specific features) |

`profile`

str | None, default None — A preset configuration profile. When provided, determines which feature groups are enabled.

| Profile | Description |
| --- | --- |
| `"minimal"` | Spacing normalization only |
| `"basic"` | Spacing + acronyms + abbreviations + elongated + Malay-local + special chars |
| `"standard"` | All features enabled (same as default when `profile=None`) |
| `"aggressive"` | All features enabled + strips `[...]` content |

When None, the standard profile (all features on) is used.
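As a rough illustration of how a profile and a `disable` list might combine into a set of enabled features, consider the sketch below. It is not the library's implementation: `resolve_features`, `ALL_FEATURES`, and `PROFILE_FEATURES` are hypothetical names, the groupings are inferred from the profile table, the Malay-local group is omitted because its feature name is not documented here, and the `[...]` stripping of the aggressive profile is not modelled.

```python
from typing import Iterable, Optional

# Hypothetical feature names, taken from the documented `disable` values.
ALL_FEATURES = frozenset({
    "spacing", "acronyms", "abbreviations", "elongated", "special_chars",
    "measurements", "temperature", "fractions", "dates", "times",
    "pronunciation_overrides",
})

# Assumed mapping from profile name to enabled features (not the real Config).
PROFILE_FEATURES = {
    "minimal": frozenset({"spacing"}),
    "basic": frozenset(
        {"spacing", "acronyms", "abbreviations", "elongated", "special_chars"}
    ),
    "standard": ALL_FEATURES,
    "aggressive": ALL_FEATURES,
}


def resolve_features(
    profile: Optional[str] = None,
    disable: Optional[Iterable[str]] = None,
) -> frozenset:
    """Return the enabled feature names for a profile/disable pair."""
    # profile=None falls back to the standard profile.
    # An unknown profile name raises KeyError in this sketch.
    enabled = PROFILE_FEATURES["standard" if profile is None else profile]
    # Set difference silently skips names that are not enabled or not known.
    return enabled - frozenset(disable or ())
```

Under these assumptions, `resolve_features("basic")` leaves measurement handling off, and an unrecognized entry in `disable` has no effect.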

`disable`

list[str] | None, default None — A list of feature names to turn off. Feature names correspond to fields on Config. Unknown names are ignored.

Common feature names:

- "acronyms" — Disable acronym expansion (I.B.M., API, etc.)
- "measurements" — Disable measurement normalization (5km, 10kg, etc.)
- "temperature" — Disable temperature normalization (25C, -5F, etc.)
- "fractions" — Disable fraction normalization (3/4, 10/4, etc.)
- "dates" — Disable date-to-spoken conversion
- "times" — Disable time-to-spoken conversion
- "spacing" — Disable whitespace normalization
- "abbreviations" — Disable abbreviation expansion (currently a no-op)
- "elongated" — Disable elongated word normalization
- "special_chars" — Disable special character replacement (&, +, %, etc.)
- "pronunciation_overrides" — Disable pronunciation overrides

`**kwargs` (legacy flags)

Legacy boolean flags accepted for backward compatibility. Using any of these emits a DeprecationWarning.

Supported legacy names:

normalize_spacing, fix_dot_letters, sound_words_field, apply_pronunciation_overrides_flag, expand_abbreviations_flag, expand_acronyms_flag, normalize_elongated_flag, normalize_fractions_flag, normalize_x_kali_flag, normalize_temperature_flag, normalize_ic_flag, normalize_measurements_flag, normalize_hari_bulan_flag, normalize_hijri_flag, extract_entities_first, config
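When migrating call sites away from the legacy flags, a small translation helper can be useful. The sketch below shows one way to turn legacy booleans into a `disable` list while emitting a `DeprecationWarning`. The helper `legacy_flags_to_disable` and the `LEGACY_TO_FEATURE` mapping are hypothetical: the pairing of flag names to feature names is inferred from the names alone and may not match what the library does internally.

```python
import warnings
from typing import Any, List

# Assumed pairing of legacy flags to feature names (inferred, not confirmed).
LEGACY_TO_FEATURE = {
    "normalize_spacing": "spacing",
    "expand_abbreviations_flag": "abbreviations",
    "expand_acronyms_flag": "acronyms",
    "normalize_elongated_flag": "elongated",
    "normalize_fractions_flag": "fractions",
    "normalize_temperature_flag": "temperature",
    "normalize_measurements_flag": "measurements",
}


def legacy_flags_to_disable(**kwargs: Any) -> List[str]:
    """Translate legacy boolean flags into a `disable` list."""
    disable: List[str] = []
    for name, value in kwargs.items():
        if name not in LEGACY_TO_FEATURE:
            continue  # flags outside this sketch's mapping are skipped
        warnings.warn(
            f"{name} is deprecated; pass disable=[...] instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Only an explicit False turns a feature off.
        if value is False:
            disable.append(LEGACY_TO_FEATURE[name])
    return disable
```

With this helper, `normalize_temperature_flag=False` would become `disable=["temperature"]`, which can then be passed to `normalize_text` directly.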

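Return Value

str | dict — The normalized text, ready for TTS processing. When verbose is True, a dict with text, original, mappings, and rules keys is returned instead.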
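Because the return type depends on `verbose` (a plain string by default, a dict with text, original, mappings, and rules keys when `verbose` is True), calling code that must accept either shape may want a small adapter. The sketch below is illustrative only: `split_result` is a hypothetical helper, not part of the library, and it assumes the `rules` value is a list.

```python
from typing import Any, Dict, List, Tuple, Union


def split_result(result: Union[str, Dict[str, Any]]) -> Tuple[str, List[Any]]:
    """Return (text, rules) for either return shape of normalize_text."""
    if isinstance(result, dict):
        # Verbose shape: read the normalized text and the triggered rules.
        return result["text"], list(result.get("rules", []))
    # Non-verbose shape: a plain string with no rule information.
    return result, []
```

Downstream TTS code can then always work with the first element of the tuple and use the second only for logging or debugging.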

Examples

Basic usage

```python
from revo_norm import normalize_text

# English
result = normalize_text("The API is fast and costs $5.50", language="en")
# "The A P I is fast and costs five dollar fifty cents"

# Malay
result = normalize_text("RM30K untuk projek ML", language="ms")
# "tiga puluh ribu ringgit untuk projek M L"
```

With a profile

```python
from revo_norm import normalize_text

# Minimal: only whitespace cleanup
result = normalize_text("The  API  is  fast", language="en", profile="minimal")
# "The API is fast"

# Basic: adds acronym/abbreviation/special chars
result = normalize_text("5km & 10kg", language="en", profile="basic")
```

With disabled features

```python
from revo_norm import normalize_text

# Keep acronyms as-is (no letter splitting)
result = normalize_text("Build the API with ML", language="en", disable=["acronyms"])
# "Build the API with ML"

# Disable multiple features
result = normalize_text(
    "25C and 3/4 of 5km",
    language="en",
    disable=["temperature", "fractions", "measurements"],
)
```

Legacy flags (deprecated)

```python
import warnings

from revo_norm import normalize_text

# Legacy flags still work but emit DeprecationWarning
with warnings.catch_warnings():
    # Silence the deprecation notice for this call only
    warnings.simplefilter("ignore", DeprecationWarning)
    result = normalize_text(
        "25C outside",
        language="en",
        normalize_temperature_flag=False,
    )
```

Pipeline Steps

When normalize_text() is called, the following steps execute in order:

1. Currency suffix expansion — RM30K becomes RM30000, RM1M becomes RM1000000
2. Entity extraction — Entities are detected and replaced with <<<TYPE_ID>>> placeholders
3. Pronunciation mappings — Explicit mappings (e.g., GUI to "gooey") applied first
4. Placeholder stashing — Entity placeholders are replaced with safe alphabetic tokens
5. Feature-gated processing:
    - Pronunciation overrides
    - Elongated word normalization
    - Measurement normalization
    - X-kali normalization
    - Language-specific normalization (English or Malay)
    - Spacing normalization
    - Sound word removal
    - Abbreviation expansion
    - Acronym expansion
    - Comma insertion for repeated words
    - Special character replacement
6. Entity restoration — Placeholders are restored in their spoken form
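The first, second, and last steps above can be sketched in isolation to show the idea of protecting entities while the rest of the text is rewritten. This is a simplified illustration, not the library's code: the function names, the regular expressions, the choice of e-mail addresses as the entity type, and the naive spoken form are all assumptions, and decimal amounts such as RM1.5K are not handled.

```python
import re
from typing import Dict, Tuple

_MULTIPLIER = {"K": 1_000, "M": 1_000_000}
_CURRENCY_SUFFIX = re.compile(r"\bRM(\d+)([KM])\b")
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_PLACEHOLDER = re.compile(r"<<<([A-Z]+_\d+)>>>")


def expand_currency_suffix(text: str) -> str:
    """Step 1: rewrite whole-number amounts such as RM30K to RM30000."""
    return _CURRENCY_SUFFIX.sub(
        lambda m: f"RM{int(m.group(1)) * _MULTIPLIER[m.group(2)]}", text
    )


def extract_entities(text: str) -> Tuple[str, Dict[str, str]]:
    """Step 2: swap each e-mail address for a <<<EMAIL_n>>> placeholder."""
    found: Dict[str, str] = {}

    def stash(match: re.Match) -> str:
        key = f"EMAIL_{len(found)}"
        found[key] = match.group(0)
        return f"<<<{key}>>>"

    return _EMAIL.sub(stash, text), found


def restore_entities(text: str, found: Dict[str, str]) -> str:
    """Final step: put each entity back, here in a naive spoken form."""

    def speak(match: re.Match) -> str:
        original = found[match.group(1)]
        return original.replace("@", " at ").replace(".", " dot ")

    return _PLACEHOLDER.sub(speak, text)


expanded = expand_currency_suffix("Pay RM30K to ali@example.com")
masked, entities = extract_entities(expanded)
restored = restore_entities(masked, entities)
```

Extracting entities before the feature-gated steps means that later rewrites, such as special character replacement, never see the `@` or `.` inside the address.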
