Intl.Segmenter
The Intl.Segmenter object enables locale-sensitive text segmentation, letting you extract meaningful items (graphemes, words, or sentences) from a string.
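As a rough illustration of why grapheme segmentation matters, the sketch below compares a string's length in UTF-16 code units with the number of user-perceived characters. The helper name `countGraphemes` and the default locale "en" are assumptions made for this example, not part of the API.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// "e" followed by a combining acute accent: two code units, one grapheme.
const accented = "e\u0301";
console.log(accented.length); // 2
console.log(countGraphemes(accented)); // 1

// A family emoji built from four emoji joined by zero-width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(family.length); // 11
console.log(countGraphemes(family)); // 1
```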
Constructor
Intl.Segmenter()
Creates a new Intl.Segmenter object.
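A minimal sketch of calling the constructor, assuming the "en" locale purely for illustration. It shows the granularity option, the default used when the option is omitted, and the error thrown for an unsupported value.

```javascript
// Request a specific granularity through the options object.
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

// When the option is omitted, the granularity defaults to "grapheme".
const defaultSegmenter = new Intl.Segmenter("en");
console.log(defaultSegmenter.resolvedOptions().granularity); // "grapheme"

// An unsupported granularity value is rejected with a RangeError.
let errorName = null;
try {
  new Intl.Segmenter("en", { granularity: "paragraph" });
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // "RangeError"
```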
Static methods
Intl.Segmenter.supportedLocalesOf()
Returns an array containing those of the provided locales that are supported without having to fall back to the runtime's default locale.
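The following sketch checks a list of candidate locales. The locale tags are arbitrary choices for this example, and the result depends on the locale data shipped with the runtime, so no particular output is shown.

```javascript
// Candidate locales chosen only for illustration.
const requested = ["ja-JP", "en-US", "de"];

// Keeps only the locales the runtime can segment without falling back
// to its default locale.
const supported = Intl.Segmenter.supportedLocalesOf(requested);
console.log(supported);
```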
Instance methods
Intl.Segmenter.prototype.resolvedOptions()
Returns a new object with properties reflecting the locale and granularity options computed during initialization of this Intl.Segmenter object.
Intl.Segmenter.prototype.segment()
Returns a new iterable Segments instance representing the segments of a string according to the locale and granularity of this Intl.Segmenter instance.
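Both instance methods can be sketched together as follows. The "fr" locale and the sample sentence are assumptions for this example. The resolved locale string can vary between runtimes, so only the granularity is annotated.

```javascript
const segmenter = new Intl.Segmenter("fr", { granularity: "word" });

// Inspect the options that were resolved during construction.
const options = segmenter.resolvedOptions();
console.log(options.granularity); // "word"
console.log(options.locale); // depends on the runtime's locale data

// segment() returns an iterable; each item describes one segment.
const segments = Array.from(segmenter.segment("Que ma joie demeure"));
for (const { segment, index, isWordLike } of segments) {
  console.log(index, JSON.stringify(segment), isWordLike);
}

// Keep only the word-like segments, dropping the spaces between them.
const words = segments.filter((s) => s.isWordLike).map((s) => s.segment);
console.log(words); // ["Que", "ma", "joie", "demeure"]
```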
Examples
Basic usage and difference from String.prototype.split()
If we used String.prototype.split(" ") to split a text into words, we would not get the correct result when the language of the text does not put whitespace between words (which is the case for Japanese, Chinese, Thai, Lao, Khmer, Myanmar, etc.).
const str = "吾輩は猫である。名前はたぬき。";
console.table(str.split(" "));
// ['吾輩は猫である。名前はたぬき。']
// The two sentences are not correctly segmented.
const str = "吾輩は猫である。名前はたぬき。";
const segmenterJa = new Intl.Segmenter('ja-JP', { granularity: 'word' });
const segments = segmenterJa.segment(str);
console.table(Array.from(segments));
// [{segment: '吾輩', index: 0, input: '吾輩は猫である。名前はたぬき。', isWordLike: true},
// etc.
// ]
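Building on the idea above, this sketch extracts only the word-like segments from a Japanese sample string and then splits the same string into sentences. The variable names are illustrative. The exact word boundaries depend on the runtime's dictionary data, so the printed arrays are not annotated.

```javascript
const text = "吾輩は猫である。名前はたぬき。";

// Word granularity: keep word-like segments and skip punctuation.
const wordSegmenter = new Intl.Segmenter("ja-JP", { granularity: "word" });
const allSegments = Array.from(wordSegmenter.segment(text));
const words = allSegments.filter((s) => s.isWordLike).map((s) => s.segment);
console.log(words);

// Sentence granularity: one segment per sentence, with no reliance on spaces.
const sentenceSegmenter = new Intl.Segmenter("ja-JP", { granularity: "sentence" });
const sentences = Array.from(sentenceSegmenter.segment(text), (s) => s.segment);
console.log(sentences);
```

Because the segments always partition the input, joining them back together reproduces the original string, which is a convenient sanity check when processing text this way.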
Specifications
| Specification |
|---|
| ECMAScript Internationalization API Specification # segmenter-objects |