11/11/2025

If you’ve ever needed to verify that a piece of text is in the format of a what3words address, you may have encountered the official what3words Regular Expression (RegEx).

RegEx patterns can look intimidating at first glance, so in this guide we’ll break down the what3words RegEx in plain English. We’ll also discuss how to adapt it for scanning free-form text (like AI chat & LLM messages), consider special cases like Vietnamese addresses that include spaces, and list the different punctuation marks that can separate the words.

Part 1a: The what3words RegEx (exact match) – Rules

Whole-string match
  • The pattern is anchored to the start and end of the input so the entire string must match, with no extra characters before the first component or after the third.
  • Some regex flavours treat the end anchor as matching just before a single terminal line break; where that is the case, the intent remains a whole string match.
Optional prefix
  • Zero or more leading forward slashes (/) are permitted as a prefix; this accepts the conventional three slashes or none, and is tolerant of extra slashes introduced by copy and paste.
Core shape
  • After any optional slashes, the address comprises exactly three components joined by exactly two separators.
  • The two separators may be different; each occurrence is chosen independently from the allowed set.
Components (“Words”)
  • Each component is formed only from Unicode letters and combining marks (general categories L and M). Digits and symbols are excluded.
  • A component may begin with a combining mark.
  • Casing is unrestricted (upper/lower/mixed all allowed).
Mutually exclusive styles (cannot be mixed)
  • 1) Unspaced style
    • Each component is a single contiguous run of letters/combining marks with no internal spaces.
  • 2) Spaced style
    • Each component is 2–4 tokens, where a token is a run of letters/combining marks.
    • Tokens inside a component are separated by exactly one regular space U+0020 or exactly one no-break space U+00A0.
    • Equivalently, each component contains 1–3 internal U+0020/U+00A0 spaces. No tabs, other whitespace, or consecutive spaces.
Separators (“Dots”)

Exactly two separators occur – one between component 1 and 2, and one between 2 and 3 – each chosen from:

– U+002E . (Full stop)
– U+FF61 ｡ (Halfwidth Ideographic Full Stop)
– U+3002 。 (Ideographic Full Stop)
– U+FF65 ･ (Halfwidth Katakana Middle Dot)
– U+30FB ・ (Katakana Middle Dot)
– U+FE12 ︒ (Presentation Form for Vertical Ideographic Full Stop)
– U+17D4 ។ (Khmer sign khan)
– U+0589 ։ (Armenian full stop)
– U+104B ။ (Myanmar sign section)
– U+06D4 ۔ (Arabic full stop)
– U+1362 ። (Ethiopic full stop)
– U+0964 । (Devanagari danda)

End conditions
  • No trailing punctuation is permitted: the match ends immediately after the third component (or immediately before a single terminal newline in flavours that treat it as outside the end anchor).
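
The rules above can be assembled piece by piece. The following is a minimal sketch, not the official pattern: all identifier names (TOKEN, SPACE, SEPARATOR, threeOf, exactMatch) are illustrative, and the separators are written as \u escapes of the twelve code points listed above so that the source stays unambiguous.

```javascript
// Building blocks named after the rules above (all identifiers are illustrative).
const TOKEN = String.raw`[\p{L}\p{M}]+`;   // run of letters / combining marks
const SPACE = String.raw`[\u0020\u00A0]`;  // regular space or no-break space
const SEPARATOR =
  String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;

const UNSPACED = TOKEN;                              // exactly one token
const SPACED = `${TOKEN}(?:${SPACE}${TOKEN}){1,3}`;  // 2-4 tokens, 1-3 spaces

// Three components of a single style joined by two separators.
const threeOf = (c) => `${c}${SEPARATOR}${c}${SEPARATOR}${c}`;

// Anchored whole-string pattern with optional leading slashes.
// The "g" flag is left off because the regex is only used with test().
const exactMatch = new RegExp(
  String.raw`^\/*` + `(?:${threeOf(UNSPACED)}|${threeOf(SPACED)})$`,
  'u'
);

const samples = [
  '///index.home.raft',            // optional prefix
  'Filled.Count.Soap',             // casing is unrestricted
  'món hầm.kem sữa.thơ ca',        // spaced style in all three components
  'filled\u3002count\u0964soap',   // two different separators
  'filled.count.soap.',            // trailing punctuation
  'món hầm.count.soap',            // mixed styles
  'fil1ed.count.soap',             // digit inside a component
];
for (const s of samples) {
  console.log(JSON.stringify(s), exactMatch.test(s));
}
```

Under these assumptions the first four samples print true and the last three print false. Keeping the two styles as separate alternatives is what enforces the "cannot be mixed" rule.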

Part 1b: The what3words RegEx (exact match version) – Deep dive explanation

This pattern is anchored to match the entire string (meaning it assumes the input is just the three word address with nothing extra).

Here is the official RegEx (in JavaScript) for a full exact-match validation of a what3words address (including support for various languages and the optional /// prefix):

var regex = /^\/{0,}(?:[\p{L}\p{M}]+[.｡。･・︒។։။۔።।][\p{L}\p{M}]+[.｡。･・︒។։။۔።।][\p{L}\p{M}]+|[\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3}[.｡。･・︒។։။۔።।][\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3}[.｡。･・︒។։။۔።।][\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3})$/gu;

Let’s break down what this pattern is doing, step by step:

  • Anchors ( ^ and $ ): the ^ at the start and $ at the end of the RegEx ensure that the pattern matches the entire string from start to finish. In other words, the string should contain only a what3words address and nothing else. This is ideal for validating a standalone address field because it won’t allow any extra characters before or after the three words.
  • Optional prefix (\/{0,}): right after the ^, the pattern allows \/{0,}, which means “zero or more forward slashes”. For display, the “///” prefix should always be shown; the RegEx simply accepts inputs with or without it. In practice, that means both “///index.home.raft” and “index.home.raft” match. Technically \/{0,} allows any number of slashes, but the intent is to cover exactly “///” or none. This leniency also helps tolerate copy/paste errors in pasted addresses. Note: the slashes are outside the non-capturing group (which wraps the two alternatives).
  • Alternation for spacing styles: the pattern is split into two major alternatives separated by a | . This is to handle two scenarios:
    • No spaces within words: the first alternative matches the typical case where each of the three words contains no internal spaces (e.g. filled.count.soap).
    • Spaces within word groups: the second alternative handles languages like Vietnamese where what3words “words” can consist of multiple words separated by spaces (e.g. món hầm.kem sữa.thơ ca). In this scenario, each of the three components must contain 1–3 internal spaces. Important: the RegEx is constructed so that either all three components have internal spaces or none do. It won’t allow mixing (you can’t have one part with a space and others without).
  • Word pattern (letters only): in JavaScript we define a “word” in the inclusionary style, as a run of Unicode letters and combining marks ([\p{L}\p{M}]+). That’s why the examples use letter and mark classes to cover Latin and non-Latin scripts and letters with accents/diacritics. In practical terms, it matches “filled”, “écoute”, or “東京”, and naturally excludes digits and symbols.
  • Delimiters between words: the character class [.｡。･・︒។։။۔።।] in the pattern represents the separator that must appear between the three word segments. This is the set of all supported delimiter characters that what3words recognises. It includes the standard Latin script full stop . as well as various Unicode punctuation used as equivalents to a dot in other languages (for example, the ideographic full stop 。 used in Japanese, the Arabic full stop ۔, etc.). Exactly two of these delimiters must appear: one between the first and second word, and one between the second and third word. This guarantees the format is “word1<delim>word2<delim>word3”. We will detail all the supported delimiters in a section below.
  • Note on trailing punctuation: the exact-match pattern requires the string to end immediately after the third word (there are exactly two delimiters in total). That’s why “filled.count.soap” matches, but “filled.count.soap.” (with a trailing full stop) does not.
  • Handling Vietnamese (or spaced) word groups: in the second alternative of the RegEx (after the | ), you’ll notice a construction like [\p{L}\p{M}]+([\u0020\u00A0][\p{L}\p{M}]+){1,3} for each word group. This extends the “letters only” pattern to allow internal spaces.
    • a) [\p{L}\p{M}]+ matches a sequence of letters (including letters with diacritics).
    • b) ([\u0020\u00A0][\p{L}\p{M}]+){1,3} then allows one to three repetitions of “a space followed by another sequence of letters”. Here \u0020 is the Unicode escape for a normal space “ ” and \u00A0 is a no-break space. This means each word is made up of two to four separate syllables separated by single spaces. For example, it could match “món hầm” as one word (letters + space + letters), or “dụng cụ pha chế” as one word.
    • c) By structuring the RegEx with a separate alternative, it ensures all three components use the same style. If the RegEx is matching the spaced what3words address version, then each of the first, second, and third words must contain between one and three spaces. Conversely, if the “no internal spaces” version is used, none of the three can contain a space. In short: either all three components use the spaced style, or none do.
  • Case sensitivity: the pattern matches letters regardless of case (so “Filled.Count.Soap” matches). For UI/brand consistency, display 3 word addresses in lowercase; validation itself does not need to enforce case.

In summary, this anchored RegEx ensures we have exactly three groups of characters (allowing for accents and letters from any language, and even spaces within those groups for certain languages) separated by two valid delimiters (like dots) and optional leading slashes. If a string matches this RegEx, it looks like a well-formed what3words address (but to know if it’s an actual valid address, you’d still need to check against the what3words API).
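
One practical detail when using the pattern for validation in JavaScript: as printed, it carries the g flag, and a global regex remembers lastIndex between test() calls. The sketch below illustrates that behaviour with a deliberately simplified stand-in pattern (unspaced style, "." only); the names withGlobal, withoutGlobal and looksLike3wa are hypothetical.

```javascript
// Simplified stand-in for the exact-match pattern (unspaced style, "." only),
// used here purely to show how the "g" flag interacts with test().
const withGlobal = /^\/*[\p{L}\p{M}]+\.[\p{L}\p{M}]+\.[\p{L}\p{M}]+$/gu;
const withoutGlobal = /^\/*[\p{L}\p{M}]+\.[\p{L}\p{M}]+\.[\p{L}\p{M}]+$/u;

const input = 'filled.count.soap';

// With "g", test() resumes from lastIndex, so a second call on the same
// input starts at the end of the string and fails, then resets lastIndex.
const first = withGlobal.test(input);   // true
const second = withGlobal.test(input);  // false

// Without "g", every call starts at index 0.
const stable = [withoutGlobal.test(input), withoutGlobal.test(input)];

// If a shared regex must keep "g", reset lastIndex before each test().
function looksLike3wa(regex, value) {
  regex.lastIndex = 0;
  return regex.test(value);
}

console.log(first, second, stable, looksLike3wa(withGlobal, input));
```

For a pure yes/no format check, dropping g (or resetting lastIndex as in looksLike3wa) avoids this alternating result; the g flag matters only when iterating over several matches.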

If you are implementing this “exact match” RegEx, consider applying it alongside the instructions for enabling what3words in an existing search bar. The tutorial is titled “ride-hailing”, but it applies to any existing search box, so it is recommended (https://developer.what3words.com/tutorial/ride-hailing). For interactive search fields, prefer our wrapper’s detection (e.g. isPossible3wa) and AutoSuggest; use the RegEx mainly for pre-filtering or offline scanning.

Note on slashes: in the exact match pattern we accept any number of leading “/”, but in the free text pattern we accept 0-3 to avoid over matching.
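
Because input may arrive with any number of leading slashes (or none) while display should always show “///” and lowercase, a small normalising step can sit between matching and display. This is a minimal sketch; toDisplayForm is a hypothetical helper, not part of any what3words wrapper, and it uses JavaScript's default toLowerCase() without any locale-specific casing rules.

```javascript
// Hypothetical helper: strip any run of leading slashes, lowercase the rest,
// and re-add the conventional "///" prefix for display.
function toDisplayForm(address) {
  const bare = address.replace(/^\/+/, '');
  return '///' + bare.toLowerCase();
}

console.log(toDisplayForm('/////Filled.Count.Soap')); // "///filled.count.soap"
console.log(toDisplayForm('index.home.raft'));        // "///index.home.raft"
```

The helper assumes its argument has already passed a format check; it does not validate anything itself.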

Part 2a: The what3words RegEx (free-text scenarios version e.g. AI & LLMs (no real-time autosuggest)) – Rules

Find-in-text mode
  • Run the expression in a Unicode aware, find all mode with no start or end anchors so it can locate matches inside longer text. The pattern is intended for scanning arbitrary prose rather than validating a whole string.
  • The expression is structured as two branches that match either full links or bare addresses within text.
Branch 1: Full links
  • The match begins at a word boundary.
  • Optionally match the scheme “http” or “https”, followed by a colon and two forward slashes.
  • Optionally match the literal “www.” immediately after the slashes.
  • Require the domain to be exactly “what3words.com” or “w3w.co”, followed by exactly one forward slash “/”.
  • After that single slash, match the three component address as described in CORE SHAPE, COMPONENTS, STYLES, and SEPARATORS below.
  • After the third component: explicitly forbid another “/”; then require a clean boundary so the next character is either the end of the string, a question mark “?”, a hash “#”, or any character that is not a Unicode letter or combining mark.
Branch 2: Bare addresses
  • The match must either start at the beginning of the string or be immediately preceded by a character that is not a Unicode letter.
  • Permit zero, one, two, or three leading forward slashes “/” as a prefix.
  • After any such slashes, match the same three component address as described in CORE SHAPE, COMPONENTS, STYLES, and SEPARATORS below.
  • After the third component: explicitly forbid another “/”; then require the same clean boundary used by the full link branch (end of string, “?”, “#”, or any character that is not a Unicode letter or combining mark).
Core shape
  • The address comprises exactly three components joined by exactly two separators.
  • The two separators may be different; each occurrence is chosen independently from the allowed set.
Components (“Words”)
  • Each component is formed only from Unicode letters and combining marks (general categories L and M). Digits and symbols are excluded.
  • Each token must begin with a letter and may be followed by combining marks; digits and symbols are excluded.
  • Casing is unrestricted (upper/lower/mixed all allowed).
Mutually exclusive styles (cannot be mixed)
  • 1) Unspaced style
    • Each component is a single contiguous run of letters/combining marks with no internal spaces.
  • 2) Spaced style
    • Each component is 2–4 tokens, where a token is a run of letters/combining marks.
    • Tokens inside a component are separated by exactly one regular space U+0020 or exactly one no-break space U+00A0.
    • Equivalently, each component contains 1–3 internal U+0020/U+00A0 spaces. No tabs, other whitespace, or consecutive spaces.
    • All three components must use the same style; mixing the unspaced and spaced styles within a single address is not permitted.

Separators (“Dots”)

Exactly two separators occur – one between component 1 and 2, and one between 2 and 3 – each chosen from:

– U+002E . (Full stop)
– U+FF61 ｡ (Halfwidth Ideographic Full Stop)
– U+3002 。 (Ideographic Full Stop)
– U+FF65 ･ (Halfwidth Katakana Middle Dot)
– U+30FB ・ (Katakana Middle Dot)
– U+FE12 ︒ (Presentation Form for Vertical Ideographic Full Stop)
– U+17D4 ។ (Khmer sign khan)
– U+0589 ։ (Armenian full stop)
– U+104B ။ (Myanmar sign section)
– U+06D4 ۔ (Arabic full stop)
– U+1362 ። (Ethiopic full stop)
– U+0964 । (Devanagari danda)

End conditions

  • In both branches, the match ends immediately after the third component and must not be followed by a forward slash.
  • A “clean boundary” is required after the third component: the next character must be end of string, “?”, “#”, or any character that is not a Unicode letter or combining mark. This allows sentence punctuation, query strings, and fragments to follow without being consumed.
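
To make the full-link branch and its end conditions concrete, here is a minimal sketch that handles Branch 1 only and, for brevity, only the unspaced style. It is an illustration assembled from the rules above rather than the official pattern; LINK and findLinkedAddresses are hypothetical names.

```javascript
const WORD = String.raw`(?:\p{L}\p{M}*)+`;  // letters, each optionally followed by marks
const DELIM =
  String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
const CORE = `(?:${WORD}${DELIM}){2}${WORD}`;  // unspaced style only

const LINK = new RegExp(
  // word boundary, optional scheme, optional "www.", fixed domain, one slash
  String.raw`\b(?:https?:\/\/)?(?:www\.)?(?:what3words\.com|w3w\.co)\/` +
  `(${CORE})` +
  // no further "/" and a clean boundary after the third component
  String.raw`(?!\/)(?=$|[?#]|[^\p{L}\p{M}])`,
  'gu'
);

function findLinkedAddresses(text) {
  return [...text.matchAll(LINK)].map((m) => m[1]);
}

console.log(findLinkedAddresses(
  'See https://what3words.com/filled.count.soap?x=1 and w3w.co/index.home.raft.'
)); // ["filled.count.soap", "index.home.raft"]

console.log(findLinkedAddresses(
  'https://what3words.com/filled.count.soap/extra'
)); // []
```

The query string and the sentence-final full stop are left outside the captured address, while a further path segment after the third component prevents the match altogether.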

Part 2b: The what3words RegEx (free-text scenarios version e.g. AI & LLMs (no real-time autosuggest)) – Deep dive explanation

We often need to find a what3words address buried in a larger string of text – for example: Please deliver to filled.count.soap by tomorrow.

In such cases (common in AI chats, LLMs, message parsing, or scanning free‑form text), we adapt the RegEx pattern so it can match the address within a longer string, rather than the whole string.

Here is a JavaScript example of this free-text version (e.g. AI & LLMs):

var regex = /(?:\b(?:https?:\/\/)?(?:www\.)?(?:what3words\.com|w3w\.co)\/((?:(?:\p{L}\p{M}*)+[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)){2}(?:\p{L}\p{M}*)+|(?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።།](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3})(?!\/)(?=$|[?#]|[^\p{L}\p{M}]))|(?:(?:^|(?<!\p{L}))\/{0,3}((?:(?:\p{L}\p{M}*)+[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)){2}(?:\p{L}\p{M}*)+|(?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3}[.｡。･・︒។։။۔።।](?:\p{L}\p{M}*)+(?:[\u0020\u00A0](?:\p{L}\p{M}*)+){1,3})(?!\/)(?=$|[?#]|[^\p{L}\p{M}]))/gu;


Here are the key adaptations we apply for free‑text use:
  • We remove the anchors: the ^ and $ anchors make the RegEx match only when the entire string is the address. To locate an address inside a sentence or paragraph, we omit these anchors. This lets the RegEx find a match starting and ending anywhere in the input text; in RegEx terms, we allow “find a substring” mode instead of “match the whole string”.
  • We use lookarounds for word boundaries (optional but recommended): even without ^ and $, the raw pattern can locate a what3words address in text, but it might also match things we don’t intend if they happen to fit the pattern by coincidence. To improve accuracy:
    • a) We enforce token boundaries using lookarounds. Before the match we require either the start of the string or that the previous character is not a letter, and we allow up to three leading slashes: (?:^|(?<!\p{L}))\/{0,3}. After the match we forbid a trailing slash and then require that the next character is either the end of the string, a “?” or “#”, or any character that is not a letter/mark. This lets punctuation sit immediately after the address without being captured.
    • b) In simpler terms, using lookarounds makes sure our address is a separate token in the text. Without them, the pattern might inadvertently match the tail end of a longer string of letters or part of a URL. With them, a match can never begin part-way through a run of letters, so “index.home.raft” is not extracted from “PleaseDeliverToindex.home.raftNow”: the engine can only treat that whole run as a single candidate, which a later validity check can reject. In “deliver to index.home.raft now” the address stands on its own, so the match succeeds and we capture “index.home.raft”.
  • Engine support note: some regex engines (e.g. Go and Rust) don’t support lookarounds. If your engine lacks lookarounds, copy the language-specific free-text pattern from the developer tutorial for that engine.
  • We handle punctuation around the address: in normal writing, a what3words address might be followed by a full stop or comma (for example, at the end of a sentence: “Your location is filled.count.soap. Please stay there.”). Without the $ end anchor, the RegEx will match “filled.count.soap” from that string, ignoring the final full stop. In our free-text pattern we ensure a clean boundary so the match ends before that punctuation: we disallow a trailing “/” and then require that the next character is the end of the string, a “?” or “#”, or any non-letter/mark.
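
The effect of these boundaries can be checked with a small experiment. The sketch below covers the bare-address branch only and the unspaced style only, and compares it with the same core pattern stripped of its lookarounds; it is an illustration under those assumptions, and bounded, coreOnly and find are hypothetical names.

```javascript
const WORD = String.raw`(?:\p{L}\p{M}*)+`;
const DELIM =
  String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
const CORE = `(?:${WORD}${DELIM}){2}${WORD}`;  // unspaced style only

// Bare-address branch: start boundary, up to three slashes, core, end boundary.
const bounded = new RegExp(
  String.raw`(?:^|(?<!\p{L}))\/{0,3}` +
  `(${CORE})` +
  String.raw`(?!\/)(?=$|[?#]|[^\p{L}\p{M}])`,
  'gu'
);
// The same core with no lookarounds at all, for comparison.
const coreOnly = new RegExp(`(${CORE})`, 'gu');

const find = (re, text) => [...text.matchAll(re)].map((m) => m[1]);

// Sentence punctuation after the address is not captured.
console.log(find(bounded, 'Your location is filled.count.soap. Please stay there.'));
// ["filled.count.soap"]

// A path-like continuation blocks the bounded match but not the bare core.
console.log(find(bounded, 'filled.count.soap/extra'));   // []
console.log(find(coreOnly, 'filled.count.soap/extra'));  // ["filled.count.soap"]

// A glued run of letters is only ever reported as one whole candidate.
console.log(find(bounded, 'PleaseDeliverToindex.home.raftNow'));
// ["PleaseDeliverToindex.home.raftNow"]
```

The last case shows why extraction should be followed by validation: the whole run is well formed as far as the format goes, even though it is unlikely to be a real address.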

Global search: when scanning free text, run the RegEx in a global/find‑all mode (depending on the RegEx engine). This scans the whole input and returns any and all matches – there can be more than one what3words address in a given text.

How to get the what3words address from a match

In JavaScript, the address is in m[1] for URL matches and m[2] for non URL matches. The easiest way to collect them is:

const threeWAs = [...text.matchAll(regex)].map(m => m[1] ?? m[2]);

This returns an array like ['filled.count.soap', 'index.home.raft'].
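
For readers who prefer to see the two capture groups in a complete run, the following sketch rebuilds a free-text pattern from named parts and applies the same m[1] ?? m[2] extraction. It is an approximation assembled for illustration, not a copy of the official pattern, and every identifier in it (WORD, DELIM, ADDRESS, freeText, extract3was) is hypothetical.

```javascript
const WORD = String.raw`(?:\p{L}\p{M}*)+`;
const DELIM =
  String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
const SPACED = WORD + String.raw`(?:[\u0020\u00A0]` + WORD + `){1,3}`;

// Either three unspaced components or three spaced components.
const ADDRESS =
  `(?:${WORD}${DELIM}){2}${WORD}|${SPACED}${DELIM}${SPACED}${DELIM}${SPACED}`;

const LINK_HEAD =
  String.raw`\b(?:https?:\/\/)?(?:www\.)?(?:what3words\.com|w3w\.co)\/`;
const BARE_HEAD = String.raw`(?:^|(?<!\p{L}))\/{0,3}`;
const TAIL = String.raw`(?!\/)(?=$|[?#]|[^\p{L}\p{M}])`;

// Group 1 captures addresses found in links, group 2 captures bare addresses.
const freeText = new RegExp(
  `(?:${LINK_HEAD}(${ADDRESS})${TAIL})|(?:${BARE_HEAD}(${ADDRESS})${TAIL})`,
  'gu'
);

function extract3was(text) {
  return [...text.matchAll(freeText)].map((m) => m[1] ?? m[2]);
}

console.log(extract3was(
  'Meet at ///filled.count.soap, or see https://w3w.co/index.home.raft for the entrance.'
)); // ["filled.count.soap", "index.home.raft"]
```

Only one of the two groups is defined for any given match, which is why the nullish coalescing operator is enough to pick the address regardless of which branch matched.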

Part 3: When to use which RegEx style

  • Exact match pattern: e.g. use when validating inputs that may or may not be a what3words address, but if they are a what3words address, the field will contain only the what3words address (e.g. a field that could have more than one type of location identifier as an input).
  • Free text pattern: e.g. use when scanning longer text (AI chat, LLM queries or responses).
  • Wrapper detection: in interactive search boxes use the wrapper helpers (isPossible3wa, findPossible3wa) for a quick “looks like a what3words address” check, then show AutoSuggest.
  • Validation / resolution: when you need to confirm whether a what3words address is actually valid (valid in that it resolves to a position on the map, not merely that it is validly formed), use convert-to-coordinates after a format match.

Part 4: Worked Examples of which tools to use for which use case

  • If you have an interactive search UI, then use the AutoSuggest (component). (No RegEx required.)
    • (a) A dedicated what3words field → AutoSuggest (component) → the final value will definitely be valid. (No RegEx required.)
    • (b) A mixed-purpose field (might be a what3words address or not) → isPossible3wa → then:
      • (i) If true → treat as a what3words address → AutoSuggest (component) for UX and/or isValid3wa on submit.
      • (ii) If false → treat as a normal geocoder/POI input. (No RegEx)
  • Free-text (AI chats, LLMs etc) → Use the free-text RegEx to find candidates → isValid3wa to confirm (or AutoSuggest (raw API, not component) to offer corrections and choices).
  • Offline or minimising API calls → isPossible3wa (or RegEx) as a local pre-filter → queue → isValid3wa later. (No network until ready.)

Notes:

  • The AutoSuggest component gives you a ready-made field that provides suggestions as a user types and validation to ensure the input value is a valid what3words address. You get best-practice UX and don’t need to maintain patterns.
  • AutoSuggest (raw API, not component) corrects typing, spelling, misremembered words, but with more flexibility and parameters than the component.
  • isPossible3wa is a fast format check (no API call) you can use to route mixed-purpose fields; isValid3wa gives definitive yes/no for validity.
  • The RegEx is most useful when you must extract addresses from free-form text (AI chats, LLMs etc) or when offline. After extraction, validate.
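
The “extract, then validate” flow can be sketched as two small steps. This is an outline under stated assumptions: CANDIDATE is a simplified pattern (bare addresses, unspaced style, "." only), findCandidates and filterValid are hypothetical helpers, and the validity check is passed in by the caller as an async function returning a boolean, since the exact signature of the wrapper's isValid3wa is not assumed here.

```javascript
// Simplified candidate finder: bare addresses, unspaced style, "." only.
const CANDIDATE =
  /(?:^|(?<!\p{L}))\/{0,3}((?:(?:\p{L}\p{M}*)+\.){2}(?:\p{L}\p{M}*)+)(?!\/)(?=$|[?#]|[^\p{L}\p{M}])/gu;

// Step 1 (local, no network): collect unique, lowercased candidates.
function findCandidates(text) {
  const found = [...text.matchAll(CANDIDATE)].map((m) => m[1].toLowerCase());
  return [...new Set(found)];
}

// Step 2 (later, when a network call is acceptable): keep only candidates
// that the caller-supplied async check confirms.
async function filterValid(candidates, isValid) {
  const results = await Promise.all(candidates.map((c) => isValid(c)));
  return candidates.filter((_, i) => results[i]);
}

const candidates = findCandidates('Go to www.example.com or Filled.Count.Soap.');
console.log(candidates); // ["www.example.com", "filled.count.soap"]

// Stub standing in for a real validity check.
const stubIsValid = async (address) => address === 'filled.count.soap';
filterValid(candidates, stubIsValid).then((valid) => console.log(valid));
```

Note that “www.example.com” passes the format check, because it is three runs of letters joined by two full stops. This is exactly the kind of candidate that only the validation step can rule out.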

Other notes

(a) Vietnamese addresses with spaces

Vietnamese presents a unique challenge for what3words because multi-syllabic words contain spaces. what3words has designed the Vietnamese word list so that it works with the way Vietnamese is normally written and typed:

Compound “words” with spaces:

Many Vietnamese words are compounds of two or more syllables, written with spaces between them. For example, the Vietnamese word for “city” is thành phố – two syllables, with a space in between, even though it is parsed by Vietnamese speakers (and what3words!) as one word. In a what3words address, thành phố might appear as one of the three address words. To a Vietnamese speaker, it looks natural and readable with that space.

Importantly, users have flexibility in how they enter it: to accommodate different typing habits, what3words accepts Vietnamese address words either with spaces (exactly as displayed) or with all those internal spaces removed. So whether you write thành phố or thànhphố, it’s understood as the same word, as long as the other two words are formatted consistently with it. This ensures that if you are using a keyboard or input method that makes it tricky to add the space, you have an easier way to enter the address you need. (In practice, the addresses are consistently displayed with the proper spaces for clarity, but no special effort is needed on the user’s part to match that format when inputting.)

In this case thành phố is the technically correct way of writing the word; it is therefore the “primary” word, and this is always what we display on our app and online map. Typing thànhphố.thànhphố.thànhphố into our search bar will take you to the location ///thành phố.thành phố.thành phố.

Vietnamese input/display/share URL summary:

  • Input: Vietnamese components can be typed with spaces (primary form) or without (spaces removed), but use of spaces or no spaces must be consistent across all three components; mixed styles are not matched by the RegEx.
  • Display: We display Vietnamese what3words addresses with spaces on the app and what3words.com map site (e.g. ///thành phố.kem sữa.thơ ca).
  • URLs & share links: In URLs, the Vietnamese address appears without spaces, and our share links include “?alias=” for Vietnamese amongst other languages; the alias value is the unspaced Vietnamese address (see more on the use of “alias” here).
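
The all-or-nothing spacing rule is easy to check in code. The sketch below rebuilds an exact-match pattern from parts (an illustration, not the official pattern) and adds a hypothetical toUnspaced helper that derives the unspaced form simply by removing U+0020 and U+00A0; whether that is sufficient for every share-link alias is an assumption.

```javascript
const TOKEN = String.raw`[\p{L}\p{M}]+`;
const SPACE = String.raw`[\u0020\u00A0]`;
const DELIM =
  String.raw`[.\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]`;
const SPACED = `${TOKEN}(?:${SPACE}${TOKEN}){1,3}`;

const exact = new RegExp(
  String.raw`^\/*` +
  `(?:${TOKEN}${DELIM}${TOKEN}${DELIM}${TOKEN}` +
  `|${SPACED}${DELIM}${SPACED}${DELIM}${SPACED})$`,
  'u'
);

// Hypothetical helper: drop regular and no-break spaces to get the unspaced form.
const toUnspaced = (address) => address.replace(/[\u0020\u00A0]/g, '');

const primary = 'thành phố.kem sữa.thơ ca';  // spaced (primary) form
const unspaced = toUnspaced(primary);        // all internal spaces removed
const mixed = 'thành phố.kemsữa.thơca';      // one spaced, two unspaced

console.log(exact.test(primary));   // true
console.log(exact.test(unspaced));  // true
console.log(exact.test(mixed));     // false
```

The mixed form fails because neither alternative of the pattern can account for one spaced component alongside two unspaced ones.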

(b) Supported delimiters between word groups

We mentioned that what3words addresses aren’t always separated by the standard Latin script full stop . (U+002E) – when displayed on our online map, Japanese what3words addresses are separated by the ideographic full stop (U+3002). The full-width full stop here is used to prevent any visual confusion of where the word boundaries are. Japanese is the only language that is not displayed using Latin script full stops. If you require a single canonical delimiter for storage or logs, normalise any accepted delimiter to ‘.’ after extraction – this does not change the address.

Of course, what3words is available in many different writing scripts, many of which have a totally different set of punctuation to the Latin script. Due to its prevalence in URLs and email addresses, the Latin script full stop is often easily accessible on non-Latin script keyboards – but we want to make things easy for our users, and therefore we have allowed a range of different delimiters to be inputted by the user. These are not displayed within what3words addresses on our app or online map, but increase accessibility for global users. The RegEx character class [.｡。･・︒។։။۔።।] lists all the supported delimiters that can be inputted between the three words. These are essentially various forms of “period” or similar separators in different writing systems.

Here’s a table of all supported delimiter characters, along with their Unicode names and the languages/scripts that commonly use them:

| Delimiter | Unicode Name | Commonly used in language/script |
| --- | --- | --- |
| . (U+002E) | FULL STOP (Period) | Default display delimiter for all languages except Japanese |
| 。 (U+3002) | IDEOGRAPHIC FULL STOP | Default display delimiter for Japanese only |
| ・ (U+30FB) | KATAKANA MIDDLE DOT | Japanese (written in katakana or generally horizontal Japanese text) |
| ｡ (U+FF61) | HALFWIDTH IDEOGRAPHIC FULL STOP | Japanese (half-width punctuation, sometimes used in Japanese digital text) |
| ･ (U+FF65) | HALFWIDTH KATAKANA MIDDLE DOT | Japanese (half-width katakana contexts) |
| ︒ (U+FE12) | PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP | Chinese/Japanese (vertical text layout) |
| ។ (U+17D4) | KHMER SIGN KHAN | Khmer (Cambodian) |
| ։ (U+0589) | ARMENIAN FULL STOP | Armenian |
| ။ (U+104B) | MYANMAR SIGN SECTION | Burmese (Myanmar) |
| ۔ (U+06D4) | ARABIC FULL STOP | Arabic, Urdu and other Arabic-script languages |
| ። (U+1362) | ETHIOPIC FULL STOP | Amharic (Ethiopic script) |
| । (U+0964) | DEVANAGARI DANDA | Hindi, Marathi, and other Devanagari-script languages (used as a period) |
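
If a single canonical delimiter is wanted for storage or logs, the normalisation mentioned above can be sketched as follows. normaliseDelimiters is a hypothetical helper; it assumes its input is an address that has already been extracted, since applying it to arbitrary prose would also rewrite ordinary sentence punctuation in those scripts.

```javascript
// The eleven non-Latin delimiters from the table above, as code point escapes.
const ALT_DELIMITERS =
  /[\uFF61\u3002\uFF65\u30FB\uFE12\u17D4\u0589\u104B\u06D4\u1362\u0964]/gu;

// Hypothetical helper: map every accepted delimiter to the Latin full stop.
function normaliseDelimiters(address) {
  return address.replace(ALT_DELIMITERS, '.');
}

console.log(normaliseDelimiters('filled\u3002count\u30FBsoap')); // "filled.count.soap"
console.log(normaliseDelimiters('filled.count.soap'));           // "filled.count.soap"
```

The function is idempotent, so it is safe to apply to values that may already have been normalised.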

(c) Repeated words

Whilst what3words addresses containing repeated words (e.g. ///table.table.chair, ///table.table.table, or ///table.chair.table) would pass the RegEx in all languages, it is worth explicitly clarifying that repeated words are indeed allowed – either twice or three times in a single address – in all languages. Format validators should not reject them.

Final thoughts

The what3words RegEx might seem daunting, but it’s designed to be comprehensive. It accounts for various languages, character sets, and even the unique challenges presented by languages like Vietnamese. For developers, understanding this pattern means you can confidently validate or find what3words addresses in text without immediately calling the API for every check.

Each component of the RegEx ensures addresses are formatted correctly, which in turn means users get a smooth experience. By adapting the RegEx for your needs (exact match vs free-text search) and being mindful of internationalisation details (like the different delimiters and spacing rules), you can effectively integrate what3words address handling into your application. Hopefully, this breakdown makes the rule set clearer and takes away the mystery of RegEx.

For the most up‑to‑date patterns and language notes, see the Official what3words RegEx tutorial – this page is our source of truth and is kept current. We illustrate the inclusionary style here in JavaScript; some examples on that tutorial use the exclusionary style – copy the exact code for your target programming language from that page.

| Language Name | ISO code | what3words API language code | what3words Locale code | Script | Writing Direction | Default word delimiter | Does the language have secondary words | Secondary words notes | Does the language allow internal spaces | Internal Spaces notes | /// marker logical position | /// marker visual edge |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Afrikaans | af | af | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Amharic | am | am | | Ethiopic | ltr | . | FALSE | | FALSE | | prefix | left |
| Arabic | ar | ar | | Arabic | rtl | . | FALSE | | FALSE | | prefix | right |
| Bahasa Indonesia | id | id | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Bahasa Malaysia | ms | ms | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Bengali | bn | bn | | Bengali | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Bosnian | bs | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Bosnian | bs | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left |
| Bulgarian | bg | bg | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Catalan | ca | ca | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Chinese | zh | zh | zh_si | Han (Simplified) | ltr | . | FALSE | | FALSE | | prefix | left |
| Chinese | zh | zh | zh_tr | Han (Traditional) | ltr | . | FALSE | | FALSE | | prefix | left |
| Croatian | hr | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Croatian | hr | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left |
| Czech | cs | cs | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Danish | da | da | | Latin | ltr | . | TRUE | Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’ | FALSE | | prefix | left |
| Dutch | nl | nl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| English | en | en | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Estonian | et | et | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Finnish | fi | fi | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| French | fr | fr | | Latin | ltr | . | TRUE | Note: ‘œ’ may be typed as ‘oe’. | FALSE | | prefix | left |
| German | de | de | | Latin | ltr | . | TRUE | Note: ‘ä’ can also be inputted as ‘ae’; ‘ö’ can also be inputted as ‘oe’; ‘ü’ can also be inputted as ‘ue’ | FALSE | | prefix | left |
| Greek | el | el | | Greek | ltr | . | FALSE | | FALSE | | prefix | left |
| Gujarati | gu | gu | | Gujarati | ltr | . | FALSE | | FALSE | | prefix | left |
| Hebrew | he | he | | Hebrew | rtl | . | FALSE | | FALSE | | prefix | right |
| Hindi | hi | hi | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Hungarian | hu | hu | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| isiXhosa | xh | xh | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| isiZulu | zu | zu | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Italian | it | it | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Japanese | ja | ja | | Hiragana | ltr | 。 | FALSE | | FALSE | | prefix | left |
| Kannada | kn | kn | | Kannada | ltr | . | FALSE | | FALSE | | prefix | left |
| Kazakh | kk | kk | kk_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Kazakh | kk | kk | kk_la | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Khmer | km | km | | Khmer | ltr | . | FALSE | | FALSE | | prefix | left |
| Korean | ko | ko | | Hangul | ltr | . | FALSE | | FALSE | | prefix | left |
| Lao | lo | lo | | Lao | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Malayalam | ml | ml | | Malayalam | ltr | . | TRUE | Words that were changed in spelling reform have previous spellings as secondary words | FALSE | | prefix | left |
| Marathi | mr | mr | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Mongolian | mn | mn | mn_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Mongolian | mn | mn | mn_la | Latin | ltr | . | TRUE | Secondary words are created when a Cyrillic character has more than one Latin script equivalent: ‘х’ can also be inputted as ‘h’ OR ‘kh’; ‘ө’ can also be inputted as ‘o’ OR ‘u’ | FALSE | | prefix | left |
| Montenegrin | me | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Montenegrin | me | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left |
| Nepali | ne | ne | | Devanagari | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Norwegian | no | no | | Latin | ltr | . | TRUE | Note: ‘æ’ can also be inputted as ‘ae’; ‘ø’ can also be inputted as ‘oe’; ‘å’ can also be inputted as ‘aa’ | FALSE | | prefix | left |
| Odia | or | or | | Oriya (Odia) | ltr | . | FALSE | | FALSE | | prefix | left |
| Persian | fa | fa | | Arabic | rtl | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | right |
| Polish | pl | pl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Portuguese | pt | pt | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Punjabi | pa | pa | | Gurmukhi | ltr | . | TRUE | Some characters that look identical can be typed in more than one way | FALSE | | prefix | left |
| Romanian | ro | ro | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Russian | ru | ru | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Serbian | sr | oo | oo_cy | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Serbian | sr | oo | oo_la | Latin | ltr | . | TRUE | Note: ‘đ’ can also be inputted as ‘dj’ | FALSE | | prefix | left |
| Sinhala | si | si | | Sinhala | ltr | . | FALSE | | FALSE | | prefix | left |
| Slovak | sk | sk | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Slovene | sl | sl | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Spanish | es | es | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Swahili | sw | sw | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Swedish | sv | sv | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Tamil | ta | ta | | Tamil | ltr | . | FALSE | | FALSE | | prefix | left |
| Telugu | te | te | | Telugu | ltr | . | FALSE | | FALSE | | prefix | left |
| Thai | th | th | | Thai | ltr | . | FALSE | | FALSE | | prefix | left |
| Turkish | tr | tr | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |
| Ukrainian | uk | uk | | Cyrillic | ltr | . | FALSE | | FALSE | | prefix | left |
| Urdu | ur | ur | | Arabic | rtl | . | TRUE | Some pairs of characters share the same sound. Secondary words allow for this | FALSE | | prefix | right |
| Vietnamese | vi | vi | | Latin | ltr | . | TRUE | Primary words have spaces; secondary words are written with no spaces | TRUE | Vietnamese orthography allows up to three internal spaces inside a single dictionary word (e.g. ‘thành phố’). In a valid Vietnamese 3-word address this rule is all-or-nothing: if one word contains internal spaces, then all three words do. | prefix | left |
| Welsh | cy | cy | | Latin | ltr | . | FALSE | | FALSE | | prefix | left |

Note: For an explanation of secondary words, see this blog post.