
Config

Feature-toggle configuration for text normalization.

Config dataclass

Simple feature-toggle configuration for text normalization.

All boolean features default to True (standard profile), except strip_bracketed, which defaults to False.

Create from a profile name

cfg = Config.from_profile("minimal")
cfg = Config.from_profile("basic")

Disable specific features

cfg = Config.with_disabled(["acronyms", "measurements"])

Check a feature

cfg.is_enabled("acronyms")  # False

from_profile(name) classmethod

Create a Config from a profile name.

Profiles

- minimal: spacing only
- basic: spacing, acronyms, abbreviations, elongated, malay_local, special_chars
- standard: everything enabled (default)
- aggressive: everything enabled

with_disabled(features) classmethod

Create a standard Config with specific features disabled.

is_enabled(feature)

Check if a feature is enabled.

Returns True for unknown feature names (safe default).

should_run_malay_features(language)

Return True if Malay-local features should run for the given language.

with_feature(group, level)

DEPRECATED. Set a feature toggle. Use cfg.<field> = True/False instead.

with_sound_words(words)

DEPRECATED. Set cfg.sound_words directly instead.

Feature Fields

All boolean fields default to True (standard profile) except strip_bracketed, which defaults to False. sound_words defaults to an empty list.

| Field | Type | Default | Description |
|---|---|---|---|
| acronyms | bool | True | Expand acronyms (I.B.M. to I B M, API to A P I) |
| abbreviations | bool | True | Expand abbreviations (currently a no-op placeholder) |
| spacing | bool | True | Normalize whitespace (collapse multiple spaces) |
| measurements | bool | True | Normalize measurements (5km to five kilometers, 10kg to ten kilograms) |
| dates | bool | True | Convert dates to spoken form (15/08/2025 to fifteenth of August ...) |
| times | bool | True | Convert times to spoken form (3:30 pm to three thirty p m) |
| temperature | bool | True | Convert temperatures to spoken form (25C to twenty five degrees Celsius) |
| fractions | bool | True | Convert fractions to spoken form (3/4 to three quarters) |
| x_kali | bool | True | Convert multipliers to spoken form (5x to lima kali) |
| ic | bool | True | Normalize Malaysian IC numbers (900101-10-1234 to spoken form) |
| hari_bulan | bool | True | Normalize hari bulan patterns (Malay day-of-month format) |
| hijri | bool | True | Normalize Hijri years to spoken form |
| elongated | bool | True | Normalize elongated words (soooo to so) |
| malay_local | bool | True | Enable Malay-specific local features |
| special_chars | bool | True | Replace special characters (& to and, % to percent/peratus) |
| pronunciation_overrides | bool | True | Apply pronunciation overrides (legacy word-level corrections) |
| sound_words | list[str] | [] | Sound words to remove or replace (e.g., [laughter], [applause]) |
| strip_bracketed | bool | False | Strip all [...] content from text (enabled in aggressive profile) |
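The defaults in this table map naturally onto a Python dataclass. The following is a minimal sketch under that assumption: ConfigSketch and fields_off_by_default are hypothetical names invented for illustration, and the real Config may be declared differently. It shows the defaults above and why a list-valued field such as sound_words needs default_factory.

```python
from dataclasses import dataclass, field, fields


# Hypothetical mirror of the Feature Fields table; the name ConfigSketch is
# invented here and this is not the revo_norm source.
@dataclass
class ConfigSketch:
    acronyms: bool = True
    abbreviations: bool = True
    spacing: bool = True
    measurements: bool = True
    dates: bool = True
    times: bool = True
    temperature: bool = True
    fractions: bool = True
    x_kali: bool = True
    ic: bool = True
    hari_bulan: bool = True
    hijri: bool = True
    elongated: bool = True
    malay_local: bool = True
    special_chars: bool = True
    pronunciation_overrides: bool = True
    # A list default must go through default_factory: a bare [] raises
    # ValueError when the dataclass is created, and a shared list would
    # leak between instances.
    sound_words: list[str] = field(default_factory=list)
    # The one boolean that starts out False.
    strip_bracketed: bool = False


def fields_off_by_default(cls: type = ConfigSketch) -> list[str]:
    """Names of boolean fields whose declared default is False."""
    return [f.name for f in fields(cls) if f.type is bool and f.default is False]
```

With default_factory, every instance gets its own empty sound_words list, so appending to one config's list leaves other configs untouched.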

Constructors

Config()

Creates a standard configuration: every boolean feature is enabled except strip_bracketed, and sound_words is empty.

from revo_norm.config import Config

cfg = Config()  # Standard profile

Config.from_profile(name)

Creates a Config from a named profile.

Profile comparison:

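| Feature | minimal | basic | standard | aggressive |
|---|---|---|---|---|
| spacing | ON | ON | ON | ON |
| acronyms | off | ON | ON | ON |
| abbreviations | off | ON | ON | ON |
| elongated | off | ON | ON | ON |
| malay_local | off | ON | ON | ON |
| special_chars | off | ON | ON | ON |
| measurements | off | off | ON | ON |
| dates | off | off | ON | ON |
| times | off | off | ON | ON |
| temperature | off | off | ON | ON |
| fractions | off | off | ON | ON |
| x_kali | off | off | ON | ON |
| ic | off | off | ON | ON |
| hari_bulan | off | off | ON | ON |
| hijri | off | off | ON | ON |
| pronunciation_overrides | off | off | ON | ON |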

Note

aggressive has the same feature toggles as standard in the comparison above. The differences are that aggressive may populate sound_words with a default list and, per the Feature Fields table, enables strip_bracketed.

from revo_norm.config import Config

cfg = Config.from_profile("minimal")  # Spacing only
cfg = Config.from_profile("basic")    # Core features
cfg = Config.from_profile("standard") # Everything
cfg = Config.from_profile("aggressive") # Everything

Raises ValueError for unknown profile names.
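The profile comparison can be read as nested sets of enabled toggles. A minimal sketch of that reading, assuming nothing beyond the table itself (PROFILES and enabled_features are hypothetical names, not revo_norm API), including the ValueError for unknown profile names:

```python
# Hypothetical model of the profile comparison table, not revo_norm's code.
# Only the 16 toggles in that table are modeled; strip_bracketed and
# sound_words are left out, as they are in the table.
BASIC = frozenset({
    "spacing", "acronyms", "abbreviations",
    "elongated", "malay_local", "special_chars",
})
STANDARD = BASIC | {
    "measurements", "dates", "times", "temperature", "fractions",
    "x_kali", "ic", "hari_bulan", "hijri", "pronunciation_overrides",
}

PROFILES: dict[str, frozenset[str]] = {
    "minimal": frozenset({"spacing"}),
    "basic": BASIC,
    "standard": STANDARD,
    "aggressive": STANDARD,  # same toggles as standard in the table
}


def enabled_features(profile: str) -> frozenset[str]:
    """Return the toggles a profile switches on; unknown names raise ValueError."""
    try:
        return PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```

Because each profile's set contains the previous one, a difference such as enabled_features("basic") - enabled_features("minimal") lists exactly what a profile adds.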

Config.with_disabled(features)

Creates a standard Config with specific features disabled. Unknown feature names emit a warning and are ignored.

from revo_norm.config import Config

# Disable acronym expansion and measurement normalization
cfg = Config.with_disabled(["acronyms", "measurements"])

# cfg.acronyms == False
# cfg.measurements == False
# cfg.dates == True  (everything else stays on)
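The documented with_disabled behavior (start from standard, switch off the named features, warn on and skip unknown names) can be sketched with a small stand-in class. MiniConfig is hypothetical and has only three fields; the warning category is an assumption, since the text only says a warning is emitted.

```python
import warnings
from dataclasses import dataclass, fields


@dataclass
class MiniConfig:
    # Hypothetical three-field stand-in for Config; not the revo_norm source.
    acronyms: bool = True
    measurements: bool = True
    dates: bool = True

    @classmethod
    def with_disabled(cls, features: list[str]) -> "MiniConfig":
        known = {f.name for f in fields(cls)}
        overrides: dict[str, bool] = {}
        for name in features:
            if name in known:
                overrides[name] = False
            else:
                # Documented behavior: warn and ignore. The category
                # (UserWarning, the warnings.warn default) is an assumption.
                warnings.warn(f"unknown feature {name!r} ignored", stacklevel=2)
        # Start from the defaults (all on) and switch off only known names.
        return cls(**overrides)
```

Since a misspelled name only warns, recording warnings in tests (for example with warnings.catch_warnings) is one way to catch typos in feature lists.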

Methods

is_enabled(feature)

Check whether a feature is enabled. Returns True for unknown feature names (safe default).

from revo_norm.config import Config

cfg = Config.with_disabled(["acronyms"])
cfg.is_enabled("acronyms")    # False
cfg.is_enabled("temperature") # True
cfg.is_enabled("unknown")     # True (safe default)
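The safe default means a misspelled name such as "acronym" also reads as enabled. A sketch of the documented lookup, plus a stricter variant for comparison; MiniConfig and is_enabled_strict are hypothetical and not part of revo_norm:

```python
from dataclasses import dataclass, fields


@dataclass
class MiniConfig:
    # Hypothetical stand-in for Config with two toggles.
    acronyms: bool = True
    temperature: bool = True

    def is_enabled(self, feature: str) -> bool:
        if feature in {f.name for f in fields(self)}:
            return bool(getattr(self, feature))
        # Unknown names read as enabled: the documented "safe default".
        return True

    def is_enabled_strict(self, feature: str) -> bool:
        # Hypothetical stricter variant, not part of the documented API:
        # rejects unknown names instead of treating them as enabled.
        if feature not in {f.name for f in fields(self)}:
            raise KeyError(feature)
        return bool(getattr(self, feature))
```

The lenient form keeps pipelines running when feature names drift; the strict form trades that for early detection of typos.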

should_run_malay_features(language)

Returns True if Malay-local features should run for the given language. This is True only when malay_local is enabled and language == "ms".
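The condition above is a plain conjunction. A minimal sketch with MiniConfig as a hypothetical stand-in; the exact-match reading of language == "ms" is taken literally from the description, and the real method may handle other language codes differently:

```python
from dataclasses import dataclass


@dataclass
class MiniConfig:
    # Hypothetical stand-in for Config holding only the relevant toggle.
    malay_local: bool = True

    def should_run_malay_features(self, language: str) -> bool:
        # Both conditions from the description must hold. Read literally,
        # this is an exact string match, so "MS" or "ms-MY" do not qualify.
        return self.malay_local and language == "ms"
```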

Deprecated Methods

These methods are retained for backward compatibility but emit DeprecationWarning.

with_feature(group, level)

Use direct attribute assignment instead:

# Deprecated
cfg.with_feature(FeatureGroup.ACRONYMS, FeatureLevel.OFF)

# Preferred
cfg.acronyms = False

with_sound_words(words)

Set cfg.sound_words directly instead:

# Deprecated
cfg.with_sound_words(["[laughter]", "[applause]"])

# Preferred
cfg.sound_words = ["[laughter]", "[applause]"]
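Shims like these usually follow one pattern: emit a DeprecationWarning, then perform the preferred operation. A sketch of that pattern for with_sound_words, using a hypothetical MiniConfig stand-in; whether the real method mutates in place or returns a new Config is not stated here, and this sketch mutates in place:

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class MiniConfig:
    # Hypothetical stand-in for Config; not the revo_norm implementation.
    sound_words: list[str] = field(default_factory=list)

    def with_sound_words(self, words: list[str]) -> None:
        # Shim pattern: warn first, then do what the preferred form does.
        # Mutating in place (rather than returning a copy) is an assumption.
        warnings.warn(
            "with_sound_words() is deprecated; set cfg.sound_words directly",
            DeprecationWarning,
            stacklevel=2,
        )
        self.sound_words = list(words)
```

Python's default warning filters hide DeprecationWarning outside __main__, so turning it into an error with warnings.simplefilter("error", DeprecationWarning) during a test run is a simple way to find leftover calls.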

Deprecated Factory Functions

These module-level functions are deprecated in favor of Config.from_profile():

| Deprecated | Replacement |
|---|---|
| minimal_config() | Config.from_profile("minimal") |
| basic_config() | Config.from_profile("basic") |
| standard_config() | Config() or Config.from_profile("standard") |
| aggressive_config() | Config.from_profile("aggressive") |

Backward-Compatible Aliases

| Alias | Actual class |
|---|---|
| NormalizationConfig | Config |

The deprecated enums FeatureGroup, FeatureLevel, and Profile are still importable for backward compatibility but should not be used in new code.