Character classes
Character classes distinguish kinds of characters, for example letters from digits.
Types
Characters | Meaning |
---|---|
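. | Has one of the following meanings: it matches any single character except line terminators (\n, \r, \u2028, or \u2029); inside a character set, it loses its special meaning and matches a literal dot. Note that the m (multiline) flag does not change the dot's behavior. ES2018 added the s (dotAll) flag, which allows the dot to also match line terminators. |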
\d | Matches any digit (Arabic numeral). Equivalent to [0-9]. |
\D | Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. |
\w | Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. |
\s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\S | Matches a single character other than white space. Equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\t | Matches a horizontal tab. |
\r | Matches a carriage return. |
\n | Matches a linefeed. |
\v | Matches a vertical tab. |
\f | Matches a form-feed. |
[\b] | Matches a backspace. If you're looking for the word-boundary assertion (\b), see Assertions. |
\0 | Matches a NUL character. Do not follow this with another digit. |
\cX | Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to code points U+0001–U+001A). For example, /\cM/ matches "\r" in "\r\n". |
\xhh | Matches the character with the code hh (two hexadecimal digits). |
\uhhhh | Matches a UTF-16 code unit with the value hhhh (four hexadecimal digits). |
\u{hhhh} or \u{hhhhh} | (Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits). |
\p{UnicodeProperty}, \P{UnicodeProperty} | Matches a character based on its Unicode character properties, for example to match only emoji, Japanese katakana, or Chinese/Japanese Han/Kanji characters. \P{UnicodeProperty} matches characters that do not have the property. |
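\ | Indicates that the following character should be treated specially, or "escaped". It behaves in one of two ways. For characters that are usually treated literally, it indicates that the next character is special and not to be interpreted literally; for example, /b/ matches the character "b", but /\b/ matches a word boundary. For characters that are usually treated specially, it indicates that the next character is not special and should be interpreted literally; for example, /a*/ matches zero or more "a"s, but /a\*/ matches "a*". Note: To match this character literally, escape it with itself. In other words, to search for \ use /\\/. |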
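As a quick illustration of several rows in the table above, the following sketch exercises the shorthand classes, the dot with and without the s flag, and the escapes that name a character by its code. The sample string and all variable names are made up for illustration, and the s flag and \u{...} escapes assume an engine with ES2018 support.

```javascript
// Hypothetical sample input mixing letters, digits, an underscore, and white space.
const sample = "id_42:\tok\n";

// \d matches digits; \D matches everything else.
const digits = sample.match(/\d/g);               // ['4', '2']
const nonDigitCount = sample.match(/\D/g).length; // 8

// \w matches basic Latin letters, digits, and the underscore.
const wordRuns = sample.match(/\w+/g);            // ['id_42', 'ok']

// \s matches white space such as the tab and the line feed.
const spaces = sample.match(/\s/g);               // ['\t', '\n']

// The dot skips line terminators unless the s (dotAll) flag is set.
const dotPlain = /a.b/.test("a\nb");              // false
const dotAll = /a.b/s.test("a\nb");               // true

// Inside a character set the dot is just a literal dot.
const literalDot = /[.]/.test("abc");             // false

// \xhh, \uhhhh, and \u{...} (u flag required) name a character by its code.
const hexA = /\x41/.test("A");                    // true
const unitA = /\u0041/.test("A");                 // true
const astral = /\u{1F600}/u.test("\u{1F600}");    // true

// \cM is caret notation for carriage return (U+000D); [\b] is a backspace.
const ctrlM = /\cM/.test("\r");                   // true
const backspace = /[\b]/.test("\b");              // true

console.log({ digits, nonDigitCount, wordRuns, spaces, dotPlain, dotAll,
  literalDot, hexA, unitA, astral, ctrlM, backspace });
```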
Examples
Looking for a series of digits
const randomData = "015 354 8787 687351 3512 8735";
const regexpFourDigits = /\b\d{4}\b/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// \d{4} indicates a digit, four times
// \b indicates another boundary (i.e. do not end matching in the middle of a word)
console.table(randomData.match(regexpFourDigits));
// ['8787', '3512', '8735']
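To see why the two word boundaries matter, the sketch below runs the same data through the pattern with and without \b. The variable names are illustrative only.

```javascript
const data = "015 354 8787 687351 3512 8735";

// With boundaries: only runs of exactly four digits match.
const exact = data.match(/\b\d{4}\b/g);
// Without boundaries: the first four digits of the longer run "687351" match too.
const loose = data.match(/\d{4}/g);

console.log(exact); // ['8787', '3512', '8735']
console.log(loose); // ['8787', '6873', '3512', '8735']
```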
Looking for a word (from the Latin alphabet) starting with A
const aliceExcerpt = "I'm sure I'm not Ada,' she said, 'for her hair goes in such long ringlets, and mine doesn't go in ringlets at all.";
const regexpWordStartingWithA = /\b[aA]\w+/g;
// \b indicates a boundary (i.e. do not start matching in the middle of a word)
// [aA] indicates the letter a or A
// \w+ indicates any word character (basic Latin letter, digit, or underscore), one or more times
console.table(aliceExcerpt.match(regexpWordStartingWithA));
// ['Ada', 'and', 'at', 'all']
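Because \w only covers the basic Latin alphabet, accented letters are not word characters. A minimal sketch of that limitation, using a made-up sample string written with escapes so the accented letters are unambiguous:

```javascript
// "naïve café" with precomposed ï (U+00EF) and é (U+00E9).
const accented = "na\u00EFve caf\u00E9";

// \w is [A-Za-z0-9_], so the accented letters split or truncate the words.
const latinOnly = accented.match(/\w+/g);
console.log(latinOnly); // ['na', 've', 'caf']

// \b is defined in terms of \w, so a boundary also falls between "ï" and "v".
const startsWithV = accented.match(/\bv\w+/g);
console.log(startsWithV); // ['ve']
```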
Looking for a word (from Unicode characters)
Instead of the Latin alphabet, we can use a range of Unicode characters to identify a word (thus being able to deal with text in other languages like Russian or Arabic). The "Basic Multilingual Plane" of Unicode contains most of the characters used around the world and we can use character classes and ranges to match words written with those characters.
const nonEnglishText = "Приключения Алисы в Стране чудес";
const regexpBMPWord = /[\u0000-\u001F\u0021-\uFFFF]+/gu;
// The BMP runs from U+0000 to U+FFFF; the space (U+0020) is excluded so that it separates words
console.table(nonEnglishText.match(regexpBMPWord));
// ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']
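The Unicode property escapes listed in the table offer a more targeted alternative to excluding the space from a code point range. A minimal sketch, assuming an engine that supports \p{...} with the u flag (ES2018 or later); the second sample string and the variable names are made up for illustration:

```javascript
const text = "Приключения Алисы в Стране чудес";

// \p{L} matches a letter in any script; the u flag is required.
const regexpUnicodeWord = /\p{L}+/gu;
const words = text.match(regexpUnicodeWord);
console.log(words); // ['Приключения', 'Алисы', 'в', 'Стране', 'чудес']

// Unlike the range-based pattern, punctuation is not treated as part of a word.
const punctuated = "Алиса, привет!".match(regexpUnicodeWord);
console.log(punctuated); // ['Алиса', 'привет']
```

The range-based pattern treats everything except the space as part of a word, so it would keep the comma and the exclamation mark attached to the neighboring letters.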