Understanding the what3words RegEx: A human-friendly guide

If you’ve ever needed to verify that a piece of text is in the format of a what3words address, you may have encountered the official what3words Regular Expression (RegEx).

RegEx patterns can look intimidating at first glance, so in this guide we’ll break down the what3words RegEx in plain English. We’ll also discuss how to adapt it for scanning free-form text (like AI chat & LLM messages), consider special cases like Vietnamese addresses that include spaces, and list the different punctuation marks that can separate the words.

Part 1a: The what3words RegEx (exact match) – Rules

Whole-string match

The pattern is anchored to the start and end of the input so the entire string must match, with no extra characters before the first component or after the third.
Some regex flavours treat the end anchor as matching just before a single terminal line break; where that is the case, the intent remains a whole-string match.

Optional prefix

Zero or more leading forward slashes (/) are permitted as a prefix; this accepts the conventional three slashes or none, and is tolerant of extra slashes introduced by copy-and-paste.

Core shape

After any optional slashes, the address comprises exactly three components joined by exactly two separators.
The two separators may be different; each occurrence is chosen independently from the allowed set.

Components (“Words”)

Each component is formed only from Unicode letters and combining marks (general categories L and M). Digits and symbols are excluded.
A component may begin with a combining mark.
Casing is unrestricted (upper/lower/mixed all allowed).

Mutually exclusive styles (cannot be mixed)

1) Unspaced style
- Each component is a single contiguous run of letters/combining marks with no internal spaces.
2) Spaced style
- Each component is 2–4 tokens, where a token is a run of letters/combining marks.
- Tokens inside a component are separated by exactly one regular space U+0020 or exactly one no-break space U+00A0.
- Equivalently, each component contains 1–3 internal U+0020/U+00A0 spaces. No tabs, other whitespace, or consecutive spaces.
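
The two styles can be checked one component at a time. The sketch below is illustrative only: the helper name componentStyle and the two sub-patterns are our own, assembled from the rules above rather than taken from the official pattern.

```javascript
// One token: a run of Unicode letters and combining marks (categories L and M).
const TOKEN = String.raw`[\p{L}\p{M}]+`;

// Unspaced style: the whole component is a single token.
const unspacedComponent = new RegExp(`^${TOKEN}$`, "u");

// Spaced style: 2-4 tokens, each pair joined by exactly one U+0020 or U+00A0.
const spacedComponent = new RegExp(
  `^${TOKEN}(?:[\\u0020\\u00A0]${TOKEN}){1,3}$`,
  "u"
);

// Hypothetical helper: reports which style a single component follows.
function componentStyle(component) {
  if (unspacedComponent.test(component)) return "unspaced";
  if (spacedComponent.test(component)) return "spaced";
  return "invalid";
}

console.log(componentStyle("filled"));    // "unspaced"
console.log(componentStyle("a b c d"));   // "spaced" (four tokens, three spaces)
console.log(componentStyle("a b c d e")); // "invalid" (five tokens)
console.log(componentStyle("ab  cd"));    // "invalid" (consecutive spaces)
```

An address is only well formed when all three of its components report the same style.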

Separators (“Dots”)

Exactly two separators occur – one between component 1 and 2, and one between 2 and 3 – each chosen from:

– U+002E . (Full stop)

– U+FF61 ｡ (Halfwidth Ideographic Full Stop)

– U+3002 。 (Ideographic Full Stop)

– U+FF65 ･ (Halfwidth Katakana Middle Dot)

– U+30FB ・ (Katakana Middle Dot)

– U+FE12 ︒ (Presentation Form for Vertical Ideographic Full Stop)

– U+17D4 ។ (Khmer sign khan)

– U+0589 ։ (Armenian full stop)

– U+104B ။ (Myanmar sign section)

– U+06D4 ۔ (Arabic full stop)

– U+1362 ። (Ethiopic full stop)

– U+0964 । (Devanagari danda)

End conditions

No trailing punctuation is permitted: the match ends immediately after the third component (or immediately before a single terminal newline in flavours that treat it as outside the end anchor).

Part 1b: The what3words RegEx (exact match version) – Deep dive explanation

This pattern is anchored to match the entire string (meaning it assumes the input is just the three word address with nothing extra).

Here is the official RegEx (in JavaScript) for a full exact-match validation of a what3words address (including support for various languages and the optional `///` prefix):

var regex = /^\/{0,}(?:[\p{L}\p{M}]+[.｡。･・︒។։။۔።।][\p{L}\p{M}]+[.｡。･・︒។։။۔።।][\p{L}\p{M}]+|[\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3}[.｡。･・︒។։။۔።।][\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3}[.｡。･・︒។։။۔።।][\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3})$/gu;

Let’s break down what this pattern is doing, step by step:

- Anchors (^ and $): the ^ at the start and $ at the end of the RegEx ensure that the pattern matches the entire string from start to finish. In other words, the string should contain only a what3words address and nothing else. This is ideal for validating a standalone address field because it won’t allow any extra characters before or after the three words.
- Optional prefix (^\/{0,}): right after the ^, the pattern allows \/{0,}, which means “zero or more forward slashes”. For display, the “///” prefix should always be shown; the RegEx simply accepts inputs with or without it. In practice, that means both “///index.home.raft” and “index.home.raft” match. Technically \/{0,} allows any number of slashes, but the intent is to cover exactly “///” or none. This leniency also helps tolerate copy/paste errors in pasted addresses. Note: the slashes are outside the non-capturing group (which wraps the two alternatives).
- Alternation for spacing styles: the pattern is split into two major alternatives separated by a |. This is to handle two scenarios:
  - No spaces within words: the first alternative matches the typical case where each of the three words contains no internal spaces (e.g. filled.count.soap).
  - Spaces within word groups: the second alternative handles languages like Vietnamese where what3words “words” can consist of multiple words separated by spaces (e.g. món hầm.kem sữa.thơ ca). In this scenario, each of the three components contains 1–3 internal spaces. Important: the RegEx is constructed such that either all three components have internal spaces or none do. It won’t allow mixing (you can’t have one part with a space and others without).
- Word pattern (letters only): in JavaScript we define a “word” in the inclusionary style, as Unicode letters plus combining marks. That’s why the examples use the letter and mark classes (\p{L} and \p{M}): they cover Latin and non-Latin scripts as well as letters with accents/diacritics. In practical terms, the pattern matches “filled”, “écoute”, or “東京”, and naturally excludes digits and symbols.
- Delimiters between words: the character class [.｡。･・︒។։။۔።।] in the pattern represents the separator that must appear between the three word segments. This is the set of all supported delimiter characters that what3words recognises. It includes the standard Latin script full stop . as well as various Unicode punctuation used as equivalents to a dot in other languages (for example, the ideographic full stop 。 used in Japanese, the Arabic full stop ۔, etc.). Exactly two of these delimiters must appear: one between the first and second word, and one between the second and third word. This guarantees the format is “word1<delim>word2<delim>word3”. We will detail all the supported delimiters in a section below.
- Note on trailing punctuation: the exact-match pattern requires the string to end immediately after the third word (there are exactly two delimiters in total). That’s why “filled.count.soap” matches, but “filled.count.soap.” (with a trailing full stop) does not.
- Handling Vietnamese (or spaced) word groups: in the second alternative of the RegEx (after the |), you’ll notice a construction like [\p{L}\p{M}]+(?:[\u0020\u00A0][\p{L}\p{M}]+){1,3} for each word group. (It is written here with a non-capturing group, (?:…); the pattern above uses a plain capturing group, (…), which matches exactly the same text.) This extends the “letters only” pattern to allow internal spaces.
  - a) [\p{L}\p{M}]+ matches a sequence of letters (including letters with diacritics).
  - b) (?:[\u0020\u00A0][\p{L}\p{M}]+){1,3} then allows one to three repetitions of “a space followed by another sequence of letters”. Here \u0020 is the Unicode escape for a normal space “ ” and \u00A0 is a non-breaking space. This means each word in the spaced style is made up of two to four separate syllables separated by single spaces. For example, it could match “món hầm” as one word (letters + space + letters), or “dụng cụ pha chế” as one word.
  - c) By structuring the RegEx with a separate alternative, it ensures all three components use the same style. If the RegEx is matching the spaced version of a what3words address, then each of the first, second, and third words must contain between one and three spaces. Conversely, if the “no internal spaces” version is used, none of the three can contain a space. In short: either all three components use the spaced style, or none do.
- Case sensitivity: the pattern matches letters regardless of case (so “Filled.Count.Soap” matches). For UI/brand consistency, display what3words addresses in lowercase; validation itself does not need to enforce case.

In summary, this anchored RegEx ensures we have exactly three groups of characters (allowing for accents and letters from any language, and even spaces within those groups for certain languages) separated by two valid delimiters (like dots) and optional leading slashes. If a string matches this RegEx, it looks like a well-formed what3words address (but to know if it’s an actual valid address, you’d still need to check against the what3words API).
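
To see the exact-match rules in action, here is a minimal sketch that rebuilds the same pattern from named parts and tries it on a few inputs. The names SEP, RUN, SPACED and looksLike3wa are ours, chosen for illustration. The sketch also leaves out the g flag: in JavaScript a global RegExp remembers lastIndex between .test() calls, which is easy to trip over when validating one field at a time.

```javascript
// Separator class, written with the code points listed in Part 1a.
const SEP = String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
// A run of Unicode letters and combining marks.
const RUN = String.raw`[\p{L}\p{M}]+`;
// Spaced component: 2-4 runs joined by single U+0020 / U+00A0 spaces.
const SPACED = `${RUN}(?:[\\u0020\\u00A0]${RUN}){1,3}`;

// Anchored pattern: optional slashes, then either three unspaced components
// or three spaced components (never a mixture).
const exact3wa = new RegExp(
  `^\\/{0,}(?:${RUN}${SEP}${RUN}${SEP}${RUN}|${SPACED}${SEP}${SPACED}${SEP}${SPACED})$`,
  "u" // Unicode mode only; no "g", so .test() is stateless
);

// Hypothetical helper: true when the whole input is shaped like a 3 word address.
function looksLike3wa(input) {
  return exact3wa.test(input);
}

console.log(looksLike3wa("///index.home.raft"));     // true
console.log(looksLike3wa("index.home.raft"));        // true
console.log(looksLike3wa("Filled.Count.Soap"));      // true
console.log(looksLike3wa("món hầm.kem sữa.thơ ca")); // true
console.log(looksLike3wa("filled.count.soap."));     // false (trailing full stop)
console.log(looksLike3wa("món hầm.count.soap"));     // false (mixed styles)
```

A true result only means the input is well formed; it still needs checking against the what3words API to confirm that it is a real address.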

If you are implementing this “exact match” RegEx, you may wish to apply it in conjunction with the instructions on how to enable what3words in an existing search bar (https://developer.what3words.com/tutorial/ride-hailing). The tutorial is named “ride-hailing”, but it applies to any existing search box, so it is recommended. For interactive search fields, prefer our wrapper’s detection (e.g. isPossible3wa) and AutoSuggest; use the RegEx mainly for pre-filtering or offline scanning.

Note on slashes: in the exact-match pattern we accept any number of leading “/”, but in the free-text pattern we accept 0–3 to avoid over-matching.
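
Because the pattern is lenient about slashes and case, it can help to tidy a matched value before showing it. The following is a small sketch of the display guidance above (always show “///”, display in lowercase). The function name toDisplayForm is hypothetical, and it assumes the input has already passed the exact-match check. It uses JavaScript’s plain toLowerCase(); whether that agrees with what3words’ own normalisation in every language is an assumption.

```javascript
// Hypothetical helper: turn an already format-checked address into display form.
function toDisplayForm(address) {
  // Drop however many leading slashes were typed or pasted.
  const withoutSlashes = address.replace(/^\/+/, "");
  // Show exactly three slashes and lowercase words.
  return "///" + withoutSlashes.toLowerCase();
}

console.log(toDisplayForm("Filled.Count.Soap"));    // "///filled.count.soap"
console.log(toDisplayForm("/////index.home.raft")); // "///index.home.raft"
```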

Part 2a: The what3words RegEx (free-text scenarios version e.g. AI & LLMs (no real-time autosuggest)) – Rules

Find-in-text mode

Run the expression in a Unicode-aware, find-all mode with no start-or-end anchors so it can locate matches inside longer text. The pattern is intended for scanning arbitrary prose rather than validating a whole string.
The expression is structured as two branches that match either full links or bare addresses within text.

Branch 1: Full links

The match begins at a word boundary.
Optionally match the scheme “http” or “https”, followed by a colon and two forward slashes.
Optionally match the literal “www.” immediately after the slashes.
Require the domain to be exactly “what3words.com” or “w3w.co”, followed by exactly one forward slash “/”.
After that single slash, match the three-component address as described in CORE SHAPE, COMPONENTS, STYLES, and SEPARATORS below.
After the third component: explicitly forbid another “/”; then require a clean boundary so the next character is either the end of the string, a question mark “?”, a hash “#”, or any character that is not a Unicode letter or combining mark.

Branch 2: Bare addresses

The match must either start at the beginning of the string or be immediately preceded by a character that is not a Unicode letter.
Permit zero, one, two, or three leading forward slashes “/” as a prefix.
After any such slashes, match the same three-component address as described in CORE SHAPE, COMPONENTS, STYLES, and SEPARATORS below.
After the third component: explicitly forbid another “/”; then require the same clean boundary used by the full-link branch (end of string, “?”, “#”, or any character that is not a Unicode letter or combining mark).

Core shape

The address comprises exactly three components joined by exactly two separators.
The two separators may be different; each occurrence is chosen independently from the allowed set.

Components (“Words”)

Each component is formed only from Unicode letters and combining marks (general categories L and M). Digits and symbols are excluded.
Each token must begin with a letter and may be followed by combining marks; digits and symbols are excluded.
Casing is unrestricted (upper/lower/mixed all allowed).

Mutually exclusive styles (cannot be mixed)

1) Unspaced style
- Each component is a single contiguous run of letters/combining marks with no internal spaces.
2) Spaced style
- Each component is 2–4 tokens, where a token is a run of letters/combining marks.
- Tokens inside a component are separated by exactly one regular space U+0020 or exactly one no-break space U+00A0.
- Equivalently, each component contains 1–3 internal U+0020/U+00A0 spaces. No tabs, other whitespace, or consecutive spaces.
- All three components must use the same style; mixing the unspaced and spaced styles within a single address is not permitted.

Separators (“Dots”)

Exactly two separators occur – one between component 1 and 2, and one between 2 and 3 – each chosen from:

– U+002E . (Full stop)

– U+FF61 ｡ (Halfwidth Ideographic Full Stop)

– U+3002 。 (Ideographic Full Stop)

– U+FF65 ･ (Halfwidth Katakana Middle Dot)

– U+30FB ・ (Katakana Middle Dot)

– U+FE12 ︒ (Presentation Form for Vertical Ideographic Full Stop)

– U+17D4 ។ (Khmer sign khan)

– U+0589 ։ (Armenian full stop)

– U+104B ။ (Myanmar sign section)

– U+06D4 ۔ (Arabic full stop)

– U+1362 ። (Ethiopic full stop)

– U+0964 । (Devanagari danda)

End conditions

In both branches, the match ends immediately after the third component and must not be followed by a forward slash.
A “clean boundary” is required after the third component: the next character must be end-of-string, “?”, “#”, or any character that is not a Unicode letter or combining mark. This allows sentence punctuation, query strings, and fragments to follow without being consumed.
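
The rules above can be assembled piece by piece, which makes the two branches and the end conditions easier to see. This is a sketch that follows the rules as stated rather than reproducing the published pattern character for character; all constant names and the helper find3was are our own. For production use, copy the official pattern for your language.

```javascript
// Separator class, using the code points listed above.
const SEP = String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
// A token: one or more letters, each optionally followed by combining marks.
const RUN = String.raw`(?:\p{L}\p{M}*)+`;
// Spaced component: 2-4 tokens joined by single U+0020 / U+00A0 spaces.
const SPACED = `${RUN}(?:[\\u0020\\u00A0]${RUN}){1,3}`;
// Captured address: all-unspaced alternative first, then all-spaced.
const ADDRESS = `(${RUN}${SEP}${RUN}${SEP}${RUN}|${SPACED}${SEP}${SPACED}${SEP}${SPACED})`;
// End conditions: no following "/", then end of string, "?", "#", or a non-letter/mark.
const TAIL = String.raw`(?!\/)(?=$|[?#]|[^\p{L}\p{M}])`;
// Branch 1: full links (optional scheme, optional "www.", fixed domains, one slash).
const LINK = String.raw`\b(?:https?:\/\/)?(?:www\.)?(?:what3words\.com|w3w\.co)\/`;
// Branch 2: bare addresses (start of string or no letter before, then 0-3 slashes).
const BARE = String.raw`(?:^|(?<!\p{L}))\/{0,3}`;

const freeText3wa = new RegExp(
  `(?:${LINK}${ADDRESS}${TAIL})|(?:${BARE}${ADDRESS}${TAIL})`,
  "gu" // find-all, Unicode-aware
);

// Hypothetical helper: group 1 is filled by link matches, group 2 by bare matches.
function find3was(text) {
  return [...text.matchAll(freeText3wa)].map((m) => m[1] ?? m[2]);
}

console.log(find3was("Your location is filled.count.soap. Please stay there."));
// ["filled.count.soap"]
console.log(find3was("See https://what3words.com/index.home.raft?alias=x for details"));
// ["index.home.raft"]
console.log(find3was("index.home.raft/extra"));
// [] (another "/" follows the third component)
```

Splitting the pattern into named parts is a readability choice; once assembled, the expression runs exactly like a single long literal.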

Part 2b: The what3words RegEx (free-text scenarios version e.g. AI & LLMs (no real-time autosuggest)) – Deep dive explanation

We often need to find a what3words address buried in a larger string of text – for example: “Please deliver to filled.count.soap by tomorrow.”

In such cases (common in AI chats, LLMs, message parsing, or scanning free‑form text), we adapt the RegEx pattern so it can match the address within a longer string, rather than the whole string.

Here is a JavaScript example of this free-text version (e.g. AI & LLMs):

var regex = /(?:\b(?:https?:\/\/)?(?:www\.)?(?:what3words\.com|w3w\.co)\/((?:(?:\p{L}\p{M}*)+[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)){2}(?:\p{L}\p{M}*)+|(?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3})(?!\/)(?=$|[?#]|[^\p{L}\p{M}]))|(?:(?:^|(?<!\p{L}))\/{0,3}((?:(?:\p{L}\p{M}*)+[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)){2}(?:\p{L}\p{M}*)+|(?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3})(?!\/)(?=$|[?#]|[^\p{L}\p{M}]))/gu;

Here are the key adaptations we apply for free‑text use:

We remove the anchors: the ^ and $ anchors make the RegEx match only when the entire string is the address. To locate an address inside a sentence or paragraph, we omit these anchors. This lets the RegEx find a match starting and ending anywhere in the input text; in RegEx terms, we allow “find a substring” mode instead of “match the whole string”.
We use lookarounds for word boundaries (optional but recommended): even without ^ and $, the raw pattern can locate a what3words address in text, but it might also match things we don’t intend if they happen to fit the pattern by coincidence. To improve accuracy:
- a) We enforce token boundaries using lookarounds. Before the match we require either the start of the string or that the previous character is not a letter, and we allow up to three leading slashes: (?:^|(?<!\p{L}))\/{0,3}. After the match we forbid a trailing slash and then require that the next character is either the end of the string, a “?” or “#”, or any character that is not a letter/mark. This lets punctuation sit immediately after the address without being captured.
- b) In simpler terms, the lookarounds make sure our address is a separate token in the text. For example, without lookarounds the pattern might inadvertently match the tail end of a longer string of letters or a URL. With lookarounds, “PleaseDeliverToindex.home.raftNow” does not yield “index.home.raft”, because there is no boundary before “index”; the unbroken run is only ever considered as a whole, which a later validity check would reject. By contrast, “deliver to index.home.raft now” matches correctly, and we capture “index.home.raft”.
Engine support note: some regex engines (e.g. Go and Rust) don’t support lookarounds. If your engine lacks lookarounds, copy the language-specific free-text pattern from the developer tutorial for that engine.
We handle punctuation around the address: in normal writing, a what3words address might be followed by a full stop or comma (for example, at the end of a sentence: “Your location is filled.count.soap. Please stay there.”). Without the $ end anchor, the RegEx will match “filled.count.soap” from that string, ignoring the final full stop. In our free-text pattern we ensure a clean boundary so the match ends before that punctuation: we disallow a trailing “/” and then require that the next character is the end of the string, a “?” or “#”, or any non-letter/mark.

Global search: when scanning free text, run the RegEx in a global/find‑all mode (depending on the RegEx engine). This scans the whole input and returns any and all matches – there can be more than one what3words address in a given text.

How to get the what3words address from a match

In JavaScript, the address is in m[1] for URL matches and m[2] for non-URL matches. The easiest way to collect them is:

const threeWAs = [...text.matchAll(regex)].map(m => m[1] ?? m[2]);

This returns an array like ['filled.count.soap', 'index.home.raft'].

Part 3: When to use which RegEx style

Exact-match pattern: e.g. use when validating inputs that may or may not be a what3words address, but if they are a what3words address, the field will contain only the what3words address (e.g. a field that could have more than one type of location identifier as an input).
Free-text pattern: e.g. use when scanning longer text (AI chat, LLM queries or responses).
Wrapper detection: in interactive search boxes use the wrapper helpers (isPossible3wa, findPossible3wa) for a quick “looks-like-a-what3words-address” check, then show AutoSuggest.
Validation / resolution: when you need to confirm whether a what3words address is actually valid (valid in that it resolves to a position on the map, not that it’s just validly formed), use convert‑to‑coordinates after a format match.

Part 4: Worked Examples of which tools to use for which use case

If you have an interactive search UI, then use the AutoSuggest (component). (No RegEx required.)
- (a) A dedicated what3words field → AutoSuggest (component) → the final value will definitely be valid. (No RegEx required.)
- (b) A mixed-purpose field (might be a what3words address or not) → isPossible3wa → then:
  - (i) If true → treat as a what3words address → AutoSuggest (component) for UX and/or isValid3wa on submit.
  - (ii) If false → treat as a normal geocoder/POI input. (No RegEx required.)
Free-text (AI chats, LLMs etc.) → use the free-text RegEx to find candidates → isValid3wa to confirm (or AutoSuggest (raw API, not component) to offer corrections and choices).
Offline or minimising API calls → isPossible3wa (or RegEx) as a local pre-filter → queue → isValid3wa later. (No network until ready.)
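
The last flow (local pre-filter → queue → validate later) could be sketched as follows. Everything here is illustrative: CandidateQueue and simpleFinder are hypothetical names, the finder is a deliberately simplified stand-in for the free-text RegEx, and lowercasing to merge duplicates is our assumption. No what3words API or wrapper call is made; the drained batch is what you would later pass to a validity check such as isValid3wa.

```javascript
// Hypothetical helper: collects candidates locally and hands them over later.
class CandidateQueue {
  constructor(findCandidates) {
    this.findCandidates = findCandidates; // e.g. a wrapper around the free-text RegEx
    this.pending = new Set();             // a Set drops duplicates
  }

  // Offline step: run the local pre-filter and queue whatever it finds.
  scan(text) {
    for (const candidate of this.findCandidates(text)) {
      this.pending.add(candidate.toLowerCase());
    }
    return this.pending.size;
  }

  // Online step: return the queued candidates and empty the queue.
  drain() {
    const batch = [...this.pending];
    this.pending.clear();
    return batch;
  }
}

// Simplified stand-in finder (unspaced style, "." separator only), used here
// only to keep the example short; swap in the free-text RegEx in practice.
const simpleFinder = (text) =>
  [...text.matchAll(/(?<!\p{L})\p{L}+\.\p{L}+\.\p{L}+(?![\p{L}\/])/gu)].map((m) => m[0]);

const queue = new CandidateQueue(simpleFinder);
queue.scan("Meet at filled.count.soap. Later: Filled.Count.Soap and index.home.raft");
console.log(queue.drain()); // ["filled.count.soap", "index.home.raft"]
```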

Notes:

The AutoSuggest component gives you a ready-made field that provides suggestions as a user types and validation to ensure the input value is a valid what3words address. You get best-practice UX and don’t need to maintain patterns.
AutoSuggest (raw API, not component) corrects typing, spelling, misremembered words, but with more flexibility and parameters than the component.
isPossible3wa is a fast format check (no API call) you can use to route mixed-purpose fields; isValid3wa gives definitive yes/no for validity.
The RegEx is most useful when you must extract addresses from free-form text (AI chats, LLMs etc) or when offline. After extraction, validate.

Other notes

(a) Vietnamese addresses with spaces

Vietnamese presents a unique challenge for what3words because multi-syllabic words contain spaces. what3words has designed the Vietnamese word list so that it works with the way Vietnamese is normally written and typed:

Compound “words” with spaces:

Many Vietnamese words are compounds of two or more syllables, written with spaces between them. For example, the Vietnamese word for “city” is thành phố – two syllables, with a space in between, even though it is parsed by Vietnamese speakers (and what3words!) as one word. In a what3words address, thành phố might appear as one of the three address words. To a Vietnamese speaker, it looks natural and readable with that space.

Importantly, users have flexibility in how they enter it: to accommodate different typing habits, what3words accepts Vietnamese address words either with spaces (exactly as displayed) or with all those internal spaces removed. So whether you write thành phố or thànhphố, it’s understood as the same word, as long as the other two words are formatted consistently with it. This ensures that if you are using a keyboard or input method that makes it tricky to add the space, you have an easier way to enter the address you need. (In practice, the addresses are consistently displayed with the proper spaces for clarity, but no special effort is needed on the user’s part to match that format when inputting.)

In this case thành phố is the technically correct way of writing the word; it is therefore the “primary” word, and this is always what we display on our app and online map. Typing thànhphố.thànhphố.thànhphố into our search bar will take you to the location ///thành phố.thành phố.thành phố.

Vietnamese input/display/share URL summary:

Input: Vietnamese components can be typed with spaces (primary form) or without (spaces removed), but use of spaces or no spaces must be consistent across all three components; mixed styles are not matched by the RegEx.
Display: We display Vietnamese what3words addresses with spaces on the app and what3words.com map site (e.g. ///thành phố.kem sữa.thơ ca).
URLs & share links: In URLs, the Vietnamese address appears without spaces, and our share links include “?alias=” for Vietnamese amongst other languages; the alias value is the unspaced Vietnamese address (see more on the use of “alias” here).
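
Deriving the unspaced form from the spaced (primary) form is a one-line transformation. The sketch below uses a hypothetical helper name, toUnspacedForm, and assumes the input is a single, already-extracted address; it removes only the two space characters the RegEx allows inside components.

```javascript
// Hypothetical helper: remove the internal U+0020 / U+00A0 spaces from every
// component, leaving letters, marks and the two delimiters untouched.
function toUnspacedForm(address) {
  return address.replace(/[\u0020\u00A0]/g, "");
}

const spaced = "thành phố.kem sữa.thơ ca";
const unspaced = toUnspacedForm(spaced);
console.log(unspaced.includes(" "));          // false
console.log(spaced.length - unspaced.length); // 3 (one space removed per component)
```

Going the other way (from the unspaced form back to the displayed form) is not a string operation, because the position of the spaces depends on the word list.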

(b) Supported delimiters between word groups

We mentioned that what3words addresses aren’t always separated by the standard Latin script full stop . (U+002E) – when displayed on our online map, Japanese what3words addresses are separated by the ideographic full stop 。 (U+3002). The full-width full stop here is used to prevent any visual confusion of where the word boundaries are. Japanese is the only language that is not displayed using Latin script full stops. If you require a single canonical delimiter for storage or logs, normalise any accepted delimiter to ‘.’ after extraction – this does not change the address.

Of course, what3words is available in many different writing scripts, many of which have a totally different set of punctuation to the Latin script. Due to its prevalence in URLs and email addresses, the Latin script full stop is often easily accessible on non-Latin script keyboards – but we want to make things easy for our users, and therefore we have allowed a range of different delimiters to be inputted by the user. These are not displayed within what3words addresses on our app or online map, but increase accessibility for global users. The RegEx character class [.｡。･・︒។։။۔።।] lists all the supported delimiters that can be inputted between the three words. These are essentially various forms of “period” or similar separators in different writing systems.

Here’s a table of all supported delimiter characters, along with their Unicode names and the languages/scripts that commonly use them:

Delimiter	Unicode Name	Commonly used in language/script
.	FULL STOP (Period)	Default display delimiter for all languages except Japanese
。	IDEOGRAPHIC FULL STOP	Default display delimiter for Japanese only
・	KATAKANA MIDDLE DOT	Japanese (written in katakana or generally horizontal Japanese text)
｡	HALFWIDTH IDEOGRAPHIC FULL STOP	Japanese (half-width punctuation, sometimes used in Japanese digital text)
･	HALFWIDTH KATAKANA MIDDLE DOT	Japanese (half-width katakana contexts)
︒	PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP	Chinese/Japanese (vertical text layout)
។	KHMER SIGN KHAN	Khmer (Cambodian)
։	ARMENIAN FULL STOP	Armenian
။	MYANMAR SIGN SECTION	Burmese (Myanmar)
۔	ARABIC FULL STOP	Arabic, Urdu and other Arabic-script languages
።	ETHIOPIC FULL STOP	Amharic (Ethiopic script)
।	DEVANAGARI DANDA	Hindi, Marathi, and other Devanagari-script languages (used as a period)
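
If you do normalise to a single delimiter for storage or logs, as suggested above, a minimal sketch might look like this. The code points are the ones listed in this guide; the names DELIMITER_CODE_POINTS, DELIMITER_CLASS and normaliseDelimiters are our own.

```javascript
// The twelve supported delimiters, by Unicode code point.
const DELIMITER_CODE_POINTS = [
  0x002e, 0xff61, 0x3002, 0xff65, 0x30fb, 0xfe12,
  0x17d4, 0x0589, 0x104b, 0x06d4, 0x1362, 0x0964,
];

// Build the character class from the list, e.g. [\u{2e}\u{ff61}...].
const DELIMITER_CLASS = new RegExp(
  "[" + DELIMITER_CODE_POINTS.map((cp) => "\\u{" + cp.toString(16) + "}").join("") + "]",
  "gu"
);

// Hypothetical helper: replace any accepted delimiter with the Latin full stop.
function normaliseDelimiters(address) {
  return address.replace(DELIMITER_CLASS, ".");
}

console.log(normaliseDelimiters("filled\u3002count\u30FBsoap")); // "filled.count.soap"
console.log(normaliseDelimiters("filled.count.soap"));           // "filled.count.soap"
```

Apply this only to text that has already been extracted as an address; running it over whole sentences would also rewrite ordinary punctuation.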

(c) Repeated words

Whilst what3words addresses containing repeated words (e.g. ///table.table.chair, ///table.table.table, or ///table.chair.table) would pass the RegEx in all languages, it is worth explicitly clarifying that repeated words are indeed allowed, either twice or three times in a single address, in all languages. Format validators should not reject them.

Final thoughts

The what3words RegEx might seem daunting, but it’s designed to be comprehensive. It accounts for various languages, character sets, and even the unique challenges presented by languages like Vietnamese. For developers, understanding this pattern means you can confidently validate or find what3words addresses in text without immediately calling the API for every check.

Each component of the RegEx ensures addresses are formatted correctly, which in turn means users get a smooth experience. By adapting the RegEx for your needs (exact match vs free-text search) and being mindful of internationalisation details (like the different delimiters and spacing rules), you can effectively integrate what3words address handling into your application. Hopefully, this breakdown makes the rule set clearer and takes away the mystery of RegEx.

For the most up‑to‑date patterns and language notes, see the Official what3words RegEx tutorial – this page is our source of truth and is kept current. We illustrate the inclusionary style here in JavaScript; some examples on that tutorial use the exclusionary style – copy the exact code for your target programming language from that page.

Language Name	ISO code	what3words API language code	what3words Locale code	Script	Writing Direction	Default word delimiter	Does the language have secondary words	Secondary words notes	Does the language allow internal spaces	Internal Spaces notes	/// marker logical position	/// marker visual edge
Afrikaans				Latin	ltr	.	FALSE		FALSE		prefix	left
Amharic				Ethiopic	ltr	.	FALSE		FALSE		prefix	left
Arabic				Arabic	rtl	.	FALSE		FALSE		prefix	right
Bahasa Indonesia				Latin	ltr	.	FALSE		FALSE		prefix	left
Bahasa Malaysia				Latin	ltr	.	FALSE		FALSE		prefix	left
Bengali				Bengali	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Bosnian			oo_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Bosnian			oo_la	Latin	ltr	.	TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Bulgarian				Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Catalan				Latin	ltr	.	FALSE		FALSE		prefix	left
Chinese			zh_si	Han (Simplified)	ltr	.	FALSE		FALSE		prefix	left
Chinese			zh_tr	Han (Traditional)	ltr	.	FALSE		FALSE		prefix	left
Croatian			oo_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Croatian			oo_la	Latin	ltr	.	TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Czech				Latin	ltr	.	FALSE		FALSE		prefix	left
Danish				Latin	ltr	.	TRUE	Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’	FALSE		prefix	left
Dutch				Latin	ltr	.	FALSE		FALSE		prefix	left
English				Latin	ltr	.	FALSE		FALSE		prefix	left
Estonian				Latin	ltr	.	FALSE		FALSE		prefix	left
Finnish				Latin	ltr	.	FALSE		FALSE		prefix	left
French				Latin	ltr	.	TRUE	Note: ‘œ’ may be typed as ‘oe’.	FALSE		prefix	left
German				Latin	ltr	.	TRUE	Note: ‘ä’ can also be inputted as ‘ae’; ‘ö’ can also be inputted as ‘oe’; ‘ü’ can also be inputted as ‘ue’	FALSE		prefix	left
Greek				Greek	ltr	.	FALSE		FALSE		prefix	left
Gujarati				Gujarati	ltr	.	FALSE		FALSE		prefix	left
Hebrew				Hebrew	rtl	.	FALSE		FALSE		prefix	right
Hindi				Devanagari	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Hungarian				Latin	ltr	.	FALSE		FALSE		prefix	left
isiXhosa				Latin	ltr	.	FALSE		FALSE		prefix	left
isiZulu				Latin	ltr	.	FALSE		FALSE		prefix	left
Italian				Latin	ltr	.	FALSE		FALSE		prefix	left
Japanese				Hiragana	ltr	。	FALSE		FALSE		prefix	left
Kannada				Kannada	ltr	.	FALSE		FALSE		prefix	left
Kazakh			kk_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Kazakh			kk_la	Latin	ltr	.	FALSE		FALSE		prefix	left
Khmer				Khmer	ltr	.	FALSE		FALSE		prefix	left
Korean				Hangul	ltr	.	FALSE		FALSE		prefix	left
Lao				Lao	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Malayalam				Malayalam	ltr	.	TRUE	Words that were changed in spelling reform have previous spellings as secondary words	FALSE		prefix	left
Marathi				Devanagari	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Mongolian			mn_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Mongolian			mn_la	Latin	ltr	.	TRUE	Secondary words are created when a Cyrillic character has more than one Latin script equivalent: ‘х’ can also be inputted as ‘h’ OR ‘kh’; ‘ө’ can also be inputted as ‘o’ OR ‘u’	FALSE		prefix	left
Montenegrin			oo_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Montenegrin			oo_la	Latin	ltr	.	TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Nepali				Devanagari	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Norwegian				Latin	ltr	.	TRUE	Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’	FALSE		prefix	left
Odia				Oriya (Odia)	ltr	.	FALSE		FALSE		prefix	left
Persian				Arabic	rtl	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	right
Polish				Latin	ltr	.	FALSE		FALSE		prefix	left
Portuguese				Latin	ltr	.	FALSE		FALSE		prefix	left
Punjabi				Gurmukhi	ltr	.	TRUE	Some characters that look identical can be typed in more than one way	FALSE		prefix	left
Romanian				Latin	ltr	.	FALSE		FALSE		prefix	left
Russian				Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Serbian			oo_cy	Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Serbian			oo_la	Latin	ltr	.	TRUE	Note: ‘đ’ can also be inputted as ‘dj’	FALSE		prefix	left
Sinhala				Sinhala	ltr	.	FALSE		FALSE		prefix	left
Slovak				Latin	ltr	.	FALSE		FALSE		prefix	left
Slovene				Latin	ltr	.	FALSE		FALSE		prefix	left
Spanish				Latin	ltr	.	FALSE		FALSE		prefix	left
Swahili				Latin	ltr	.	FALSE		FALSE		prefix	left
Swedish				Latin	ltr	.	FALSE		FALSE		prefix	left
Tamil				Tamil	ltr	.	FALSE		FALSE		prefix	left
Telugu				Telugu	ltr	.	FALSE		FALSE		prefix	left
Thai				Thai	ltr	.	FALSE		FALSE		prefix	left
Turkish				Latin	ltr	.	FALSE		FALSE		prefix	left
Ukrainian				Cyrillic	ltr	.	FALSE		FALSE		prefix	left
Urdu				Arabic	rtl	.	TRUE	Some pairs of characters share the same sound. Secondary words allow for this	FALSE		prefix	right
Vietnamese				Latin	ltr	.	TRUE	Primary words have spaces; secondary words are written with no spaces	TRUE	Vietnamese orthography allows up to three internal spaces inside a single dictionary word (e.g. ‘thành phố’). In a valid Vietnamese 3-word address this rule is all-or-nothing: if one word contains internal spaces, then all three words do.	prefix	left
Welsh				Latin	ltr	.	FALSE		FALSE		prefix	left

Note: For an explanation of secondary words, see this blog post.
