...

Package language

import "golang.org/x/text/language"

Overview

Package language implements BCP 47 language tags and related functionality.

The Tag type, which is used to represent languages, is agnostic to the meaning of its subtags. Tags are not fully canonicalized to preserve information that may be valuable in certain contexts. As a consequence, two different tags may represent identical languages.

Initializing language- or locale-specific components usually consists of two steps. The first step is to select a display language based on the preferred languages of the user and the languages supported by an application. The second step is to create the language-specific services based on this selection. Each step is discussed in more detail below.

Matching preferred against supported languages

An application may support various languages. This list is typically limited by the languages for which translations of the user interface exist. Similarly, a user may provide a list of preferred languages, which is limited by the languages this user understands. An application should use a Matcher to find the best supported language based on the user's preferred list. Matchers are aware of the intricacies of equivalence between languages. The default Matcher implementation takes into account things such as deprecated subtags, legacy tags, and mutual intelligibility between scripts and languages.

A Matcher for English, Australian English, Danish, and standard Mandarin can be defined as follows:

var matcher = language.NewMatcher([]language.Tag{
	language.English,   // The first language is used as fallback.
	language.MustParse("en-AU"),
	language.Danish,
	language.Chinese,
})

The following code selects the best match for someone speaking Spanish and Norwegian:

preferred := []language.Tag{language.Spanish, language.Norwegian}
tag, _, _ := matcher.Match(preferred...)

In this case, the best match is Danish, as Danish is a sufficiently close match to Norwegian that there is no need to fall back to the default. See ParseAcceptLanguage for how to handle the Accept-Language HTTP header.

Selecting language-specific services

One should always use the Tag returned by the Matcher to create an instance of any of the language-specific services provided by the text repository. This prevents the mixing of languages, such as having a different language for messages and display names, as well as improper casing or sorting order for the selected language. Using the returned Tag also allows user-defined settings, such as collation order or numbering system to be transparently passed as options.

If you have language-specific data in your application, however, it will in most cases suffice to use the index returned by the matcher to identify the user language. The following loop provides an alternative in case this is not sufficient:

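var supported = map[language.Tag]data{
	language.English:            enData,
	language.MustParse("en-AU"): enAUData,
	language.Danish:             daData,
	language.Chinese:            zhData,
}

// selectData returns the data for the best supported match of preferred.
func selectData(preferred ...language.Tag) data {
	tag, _, _ := matcher.Match(preferred...)
	// Walk up the parent chain until a supported tag is found.
	for ; tag != language.Und; tag = tag.Parent() {
		if v, ok := supported[tag]; ok {
			return v
		}
	}
	return enData // should not be reached
}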

Repeatedly taking the Parent of the tag returned by Match will eventually match one of the tags used to initialize the Matcher.

Canonicalization

By default, only legacy and deprecated tags are converted into their canonical equivalent. All other information is preserved. This approach makes the confidence scores more accurate and allows matchers to distinguish between variants that would otherwise be lost.

As a consequence, two tags that should be treated as identical according to BCP 47 or CLDR, like "en-Latn" and "en", will be represented differently. The Matchers will handle such distinctions, though, and are aware of the equivalence relations. The CanonType type can be used to alter the canonicalization form.

References

BCP 47 - Tags for Identifying Languages http://tools.ietf.org/html/bcp47

Index

Constants
Variables
func CompactIndex(t Tag) (index int, ok bool)
func ParseAcceptLanguage(s string) (tag []Tag, q []float32, err error)
type Base
    func MustParseBase(s string) Base
    func ParseBase(s string) (Base, error)
    func (b Base) ISO3() string
    func (b Base) IsPrivateUse() bool
    func (b Base) String() string
type CanonType
    func (c CanonType) Canonicalize(t Tag) (Tag, error)
    func (c CanonType) Compose(part ...interface{}) (t Tag, err error)
    func (c CanonType) Make(s string) Tag
    func (c CanonType) MustParse(s string) Tag
    func (c CanonType) Parse(s string) (t Tag, err error)
type Confidence
    func Comprehends(speaker, alternative Tag) Confidence
    func (c Confidence) String() string
type Coverage
    func NewCoverage(list ...interface{}) Coverage
type Extension
    func ParseExtension(s string) (e Extension, err error)
    func (e Extension) String() string
    func (e Extension) Tokens() []string
    func (e Extension) Type() byte
type Matcher
    func NewMatcher(t []Tag) Matcher
type Region
    func EncodeM49(r int) (Region, error)
    func MustParseRegion(s string) Region
    func ParseRegion(s string) (Region, error)
    func (r Region) Canonicalize() Region
    func (r Region) Contains(c Region) bool
    func (r Region) ISO3() string
    func (r Region) IsCountry() bool
    func (r Region) IsGroup() bool
    func (r Region) IsPrivateUse() bool
    func (r Region) M49() int
    func (r Region) String() string
    func (r Region) TLD() (Region, error)
type Script
    func MustParseScript(s string) Script
    func ParseScript(s string) (Script, error)
    func (s Script) IsPrivateUse() bool
    func (s Script) String() string
type Tag
    func Compose(part ...interface{}) (t Tag, err error)
    func Make(s string) Tag
    func MustParse(s string) Tag
    func Parse(s string) (t Tag, err error)
    func (t Tag) Base() (Base, Confidence)
    func (t Tag) Extension(x byte) (ext Extension, ok bool)
    func (t Tag) Extensions() []Extension
    func (t Tag) IsRoot() bool
    func (t Tag) Parent() Tag
    func (t Tag) Raw() (b Base, s Script, r Region)
    func (t Tag) Region() (Region, Confidence)
    func (t Tag) Script() (Script, Confidence)
    func (t Tag) SetTypeForKey(key, value string) (Tag, error)
    func (t Tag) String() string
    func (t Tag) TypeForKey(key string) string
    func (t Tag) Variants() []Variant
type ValueError
    func (e ValueError) Error() string
    func (e ValueError) Subtag() string
type Variant
    func ParseVariant(s string) (Variant, error)
    func (v Variant) String() string

Package files

common.go coverage.go go1_2.go index.go language.go lookup.go match.go parse.go tables.go tags.go

Constants

const CLDRVersion = "29"

CLDRVersion is the CLDR version from which the tables in this package are derived.

const NumCompactTags = 747

NumCompactTags is the number of common tags. The maximum compact index is NumCompactTags-1.

Variables

var ErrMissingLikelyTagsData = errors.New("missing likely tags data")

ErrMissingLikelyTagsData indicates no information was available to compute likely values of missing tags.

func CompactIndex

func CompactIndex(t Tag) (index int, ok bool)

CompactIndex returns an index, where 0 <= index < NumCompactTags, for tags for which data exists in the text repository. The index will change over time and should not be stored in persistent storage. Extensions, except for the 'va' type of the 'u' extension, are ignored. It will return 0, false if no compact tag exists, where 0 is the index for the root language (Und).

func ParseAcceptLanguage

func ParseAcceptLanguage(s string) (tag []Tag, q []float32, err error)

ParseAcceptLanguage parses the contents of an Accept-Language header as defined in http://www.ietf.org/rfc/rfc2616.txt and returns a list of Tags and a list of corresponding quality weights. It is more permissive than RFC 2616 and may return non-nil slices even if the input is not valid. The Tags will be sorted by highest weight first and then by first occurrence. Tags with a weight of zero will be dropped. An error will be returned if the input could not be parsed.

Example

Code:

package language_test

import (
    "fmt"
    "net/http"
    "strings"

    "golang.org/x/text/language"
)

// matcher is a language.Matcher configured for all supported languages.
var matcher = language.NewMatcher([]language.Tag{
    language.BritishEnglish,
    language.Norwegian,
    language.German,
})

// handler is an http.HandlerFunc.
func handler(w http.ResponseWriter, r *http.Request) {
    t, q, err := language.ParseAcceptLanguage(r.Header.Get("Accept-Language"))
    // We ignore the error: the default language will be selected for t == nil.
    tag, _, _ := matcher.Match(t...)
    fmt.Printf("%5v (t: %6v; q: %3v; err: %v)\n", tag, t, q, err)
}

func ExampleParseAcceptLanguage() {
    for _, al := range []string{
        "nn;q=0.3, en-us;q=0.8, en,",
        "gsw, en;q=0.7, en-US;q=0.8",
        "gsw, nl, da",
        "invalid",
    } {
        // Create dummy request with Accept-Language set and pass it to handler.
        r, _ := http.NewRequest("GET", "example.com", strings.NewReader("Hello"))
        r.Header.Set("Accept-Language", al)
        handler(nil, r)
    }

    // Output:
    // en-GB (t: [    en  en-US     nn]; q: [  1 0.8 0.3]; err: <nil>)
    // en-GB (t: [   gsw  en-US     en]; q: [  1 0.8 0.7]; err: <nil>)
    //    de (t: [   gsw     nl     da]; q: [  1   1   1]; err: <nil>)
    // en-GB (t: []; q: []; err: language: tag is not well-formed)
}

type Base

type Base struct {
    // contains filtered or unexported fields
}

Base is an ISO 639 language code, used for encoding the base language of a language tag.

func MustParseBase

func MustParseBase(s string) Base

MustParseBase is like ParseBase, but panics if the given base cannot be parsed. It simplifies safe initialization of Base values.

func ParseBase

func ParseBase(s string) (Base, error)

ParseBase parses a 2- or 3-letter ISO 639 code. It returns a ValueError if s is a well-formed but unknown language identifier, or another error if s cannot be parsed for some other reason.

func (Base) ISO3

func (b Base) ISO3() string

ISO3 returns the ISO 639-3 language code.

func (Base) IsPrivateUse

func (b Base) IsPrivateUse() bool

IsPrivateUse reports whether this language code is reserved for private use.

func (Base) String

func (b Base) String() string

String returns the BCP 47 representation of the base language.

type CanonType

type CanonType int

CanonType can be used to enable or disable various types of canonicalization.

const (
    // Replace deprecated base languages with their preferred replacements.
    DeprecatedBase CanonType = 1 << iota
    // Replace deprecated scripts with their preferred replacements.
    DeprecatedScript
    // Replace deprecated regions with their preferred replacements.
    DeprecatedRegion
    // Remove redundant scripts.
    SuppressScript
    // Normalize legacy encodings. This includes legacy languages defined in
    // CLDR as well as bibliographic codes defined in ISO-639.
    Legacy
    // Map the dominant language of a macro language group to the macro language
    // subtag. For example cmn -> zh.
    Macro
    // The CLDR flag should be used if full compatibility with CLDR is required.
    // There are a few cases where language.Tag may differ from CLDR. To follow all
    // of CLDR's suggestions, use All|CLDR.
    CLDR

    // Raw can be used to Compose or Parse without Canonicalization.
    Raw CanonType = 0

    // Replace all deprecated tags with their preferred replacements.
    Deprecated = DeprecatedBase | DeprecatedScript | DeprecatedRegion

    // All canonicalizations recommended by BCP 47.
    BCP47 = Deprecated | SuppressScript

    // All canonicalizations.
    All = BCP47 | Legacy | Macro

    // Default is the canonicalization used by Parse, Make and Compose. To
    // preserve as much information as possible, canonicalizations that remove
    // potentially valuable information are not included. The Matcher is
    // designed to recognize similar tags that would be the same if
    // they were canonicalized using All.
    Default = Deprecated | Legacy
)

Example

Code:

p := func(id string) {
    fmt.Printf("Default(%s) -> %s\n", id, language.Make(id))
    fmt.Printf("BCP47(%s) -> %s\n", id, language.BCP47.Make(id))
    fmt.Printf("Macro(%s) -> %s\n", id, language.Macro.Make(id))
    fmt.Printf("All(%s) -> %s\n", id, language.All.Make(id))
}
p("en-Latn")
p("sh")
p("zh-cmn")
p("bjd")
p("iw-Latn-fonipa-u-cu-usd")

Output:

Default(en-Latn) -> en-Latn
BCP47(en-Latn) -> en
Macro(en-Latn) -> en-Latn
All(en-Latn) -> en
Default(sh) -> sr-Latn
BCP47(sh) -> sh
Macro(sh) -> sh
All(sh) -> sr-Latn
Default(zh-cmn) -> cmn
BCP47(zh-cmn) -> cmn
Macro(zh-cmn) -> zh
All(zh-cmn) -> zh
Default(bjd) -> drl
BCP47(bjd) -> drl
Macro(bjd) -> bjd
All(bjd) -> drl
Default(iw-Latn-fonipa-u-cu-usd) -> he-Latn-fonipa-u-cu-usd
BCP47(iw-Latn-fonipa-u-cu-usd) -> he-Latn-fonipa-u-cu-usd
Macro(iw-Latn-fonipa-u-cu-usd) -> iw-Latn-fonipa-u-cu-usd
All(iw-Latn-fonipa-u-cu-usd) -> he-Latn-fonipa-u-cu-usd

func (CanonType) Canonicalize

func (c CanonType) Canonicalize(t Tag) (Tag, error)

Canonicalize returns the canonicalized equivalent of the tag.

func (CanonType) Compose

func (c CanonType) Compose(part ...interface{}) (t Tag, err error)

Compose creates a Tag from individual parts, which may be of type Tag, Base, Script, Region, Variant, []Variant, Extension, []Extension or error. If a Base, Script or Region or slice of type Variant or Extension is passed more than once, the latter will overwrite the former. Variants and Extensions are accumulated, but if two extensions of the same type are passed, the latter will replace the former. A Tag overwrites all former values and typically only makes sense as the first argument. The resulting tag is returned after canonicalizing using CanonType c. If one or more errors are encountered, one of the errors is returned.

func (CanonType) Make

func (c CanonType) Make(s string) Tag

Make is a convenience wrapper for c.Parse that omits the error. In case of an error, a sensible default is returned.

func (CanonType) MustParse

func (c CanonType) MustParse(s string) Tag

MustParse is like Parse, but panics if the given BCP 47 tag cannot be parsed. It simplifies safe initialization of Tag values.

func (CanonType) Parse

func (c CanonType) Parse(s string) (t Tag, err error)

Parse parses the given BCP 47 string and returns a valid Tag. If parsing failed it returns an error and any part of the tag that could be parsed. If parsing succeeded but an unknown value was found, it returns ValueError. The Tag returned in this case is just stripped of the unknown value. All other values are preserved. It accepts tags in the BCP 47 format and extensions to this standard defined in http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers. The resulting tag is canonicalized using the canonicalization type c.

type Confidence

type Confidence int

Confidence indicates the level of certainty for a given return value. For example, Serbian may be written in Cyrillic or Latin script. The confidence level indicates whether a value was explicitly specified, whether it is typically the only possible value, or whether there is an ambiguity.

const (
    No    Confidence = iota // full confidence that there was no match
    Low                     // most likely value picked out of a set of alternatives
    High                    // value is generally assumed to be the correct match
    Exact                   // exact match or explicitly specified value
)

func Comprehends

func Comprehends(speaker, alternative Tag) Confidence

Comprehends reports the confidence score that a speaker of a given language can comprehend the written form of an alternative language.

Example

Code:

// Various levels of comprehensibility.
fmt.Println(language.Comprehends(language.English, language.English))
fmt.Println(language.Comprehends(language.AmericanEnglish, language.BritishEnglish))

// An explicit Und results in no match.
fmt.Println(language.Comprehends(language.English, language.Und))

fmt.Println("----")

// There is usually no mutual comprehensibility between different scripts.
fmt.Println(language.Comprehends(language.Make("en-Dsrt"), language.English))

// One exception is for Traditional versus Simplified Chinese, albeit with
// a low confidence.
fmt.Println(language.Comprehends(language.TraditionalChinese, language.SimplifiedChinese))

fmt.Println("----")

// A Swiss German speaker will often understand High German.
fmt.Println(language.Comprehends(language.Make("gsw"), language.Make("de")))

// The converse is not generally the case.
fmt.Println(language.Comprehends(language.Make("de"), language.Make("gsw")))

Output:

Exact
High
No
----
No
Low
----
High
No

func (Confidence) String

func (c Confidence) String() string

type Coverage

type Coverage interface {
    // Tags returns the list of supported tags.
    Tags() []Tag

    // BaseLanguages returns the list of supported base languages.
    BaseLanguages() []Base

    // Scripts returns the list of supported scripts.
    Scripts() []Script

    // Regions returns the list of supported regions.
    Regions() []Region
}

The Coverage interface is used to define the level of coverage of an internationalization service. Note that not all types are supported by all services. As lists may be generated on the fly, it is recommended that users of a Coverage cache the results.

var (
    // Supported defines a Coverage that lists all supported subtags. Tags
    // always returns nil.
    Supported Coverage = allSubtags{}
)

func NewCoverage

func NewCoverage(list ...interface{}) Coverage

NewCoverage returns a Coverage for the given lists. It is typically used by packages providing internationalization services to define their level of coverage. A list may be of type []T or func() []T, where T is either Tag, Base, Script or Region. The returned Coverage derives the value for BaseLanguages from Tags if no func or slice for []Base is specified. For other unspecified types the returned Coverage will return nil for the respective methods.

type Extension

type Extension struct {
    // contains filtered or unexported fields
}

Extension is a single BCP 47 extension.

func ParseExtension

func ParseExtension(s string) (e Extension, err error)

ParseExtension parses s as an extension and returns it on success.

func (Extension) String

func (e Extension) String() string

String returns the string representation of the extension, including the type tag.

func (Extension) Tokens

func (e Extension) Tokens() []string

Tokens returns the list of tokens of e.

func (Extension) Type

func (e Extension) Type() byte

Type returns the one-byte extension type of e. It returns 0 for the zero extension.

type Matcher

type Matcher interface {
    Match(t ...Tag) (tag Tag, index int, c Confidence)
}

Matcher is the interface that wraps the Match method.

Match returns the best match for any of the given tags, along with a unique index associated with the returned tag and a confidence score.

Example

ExampleMatcher_bestMatch gives some examples of getting the best match of a set of tags to any of the tags of a given set.

Code:

// This is the set of tags from which we want to pick the best match. These
// can be, for example, the supported languages for some package.
tags := []language.Tag{
    language.English,
    language.BritishEnglish,
    language.French,
    language.Afrikaans,
    language.BrazilianPortuguese,
    language.EuropeanPortuguese,
    language.Croatian,
    language.SimplifiedChinese,
    language.Raw.Make("iw-IL"),
    language.Raw.Make("iw"),
    language.Raw.Make("he"),
}
m := language.NewMatcher(tags)

// A simple match.
fmt.Println(m.Match(language.Make("fr")))

// Australian English is closer to British than American English.
fmt.Println(m.Match(language.Make("en-AU")))

// Default to the first tag passed to the Matcher if there is no match.
fmt.Println(m.Match(language.Make("ar")))

// Get the default tag.
fmt.Println(m.Match())

fmt.Println("----")

// Croatian speakers will likely understand Serbian written in Latin script.
fmt.Println(m.Match(language.Make("sr-Latn")))

// We match SimplifiedChinese, but with Low confidence.
fmt.Println(m.Match(language.TraditionalChinese))

// Serbian in Latin script is a closer match to Croatian than Traditional
// Chinese to Simplified Chinese.
fmt.Println(m.Match(language.TraditionalChinese, language.Make("sr-Latn")))

fmt.Println("----")

// In case multiple variants of a language are available, the most spoken
// variant is typically returned.
fmt.Println(m.Match(language.Portuguese))

// Pick the first value passed to Match in case of a tie.
fmt.Println(m.Match(language.Dutch, language.Make("fr-BE"), language.Make("af-NA")))
fmt.Println(m.Match(language.Dutch, language.Make("af-NA"), language.Make("fr-BE")))

fmt.Println("----")

// If a Matcher is initialized with a language and its deprecated version,
// it will distinguish between them.
fmt.Println(m.Match(language.Raw.Make("iw")))

// However, for non-exact matches, it will treat deprecated versions as
// equivalent and consider other factors first.
fmt.Println(m.Match(language.Raw.Make("he-IL")))

fmt.Println("----")

// User settings passed to the Unicode extension are ignored for matching
// and preserved in the returned tag.
fmt.Println(m.Match(language.Make("de-u-co-phonebk"), language.Make("fr-u-cu-frf")))

// Even if the matching language is different.
fmt.Println(m.Match(language.Make("de-u-co-phonebk"), language.Make("br-u-cu-frf")))

// If there is no matching language, the options of the first preferred tag are used.
fmt.Println(m.Match(language.Make("de-u-co-phonebk")))

Output:

fr 2 Exact
en-GB 1 High
en 0 No
en 0 No
----
hr 6 High
zh-Hans 7 Low
hr 6 High
----
pt-BR 4 High
fr 2 High
af 3 High
----
iw 9 Exact
iw-IL 8 Exact
----
fr-u-cu-frf 2 Exact
fr-u-cu-frf 2 High
en-u-co-phonebk 0 No

func NewMatcher

func NewMatcher(t []Tag) Matcher

NewMatcher returns a Matcher that matches an ordered list of preferred tags against a list of supported tags based on written intelligibility, closeness of dialect, equivalence of subtags and various other rules. It is initialized with the list of supported tags. The first element is used as the default value in case no match is found.

Its Match method matches the first of the given Tags to reach a certain confidence threshold. The tags passed to Match should therefore be specified in order of preference. Extensions are ignored for matching.

The index returned by the Match method corresponds to the index of the matched tag in t. The returned tag itself is augmented with the Unicode extension ('u') of the corresponding preferred tag. This allows user locale options to be passed transparently.

type Region

type Region struct {
    // contains filtered or unexported fields
}

Region is an ISO 3166-1 or UN M.49 code for representing countries and regions.

func EncodeM49

func EncodeM49(r int) (Region, error)

EncodeM49 returns the Region for the given UN M.49 code. It returns an error if r is not a valid code.

func MustParseRegion

func MustParseRegion(s string) Region

MustParseRegion is like ParseRegion, but panics if the given region cannot be parsed. It simplifies safe initialization of Region values.

func ParseRegion

func ParseRegion(s string) (Region, error)

ParseRegion parses a 2- or 3-letter ISO 3166-1 or a UN M.49 code. It returns a ValueError if s is a well-formed but unknown region identifier, or another error if s cannot be parsed for some other reason.

func (Region) Canonicalize

func (r Region) Canonicalize() Region

Canonicalize returns the region or a possible replacement if the region is deprecated. It will not return a replacement for deprecated regions that are split into multiple regions.

func (Region) Contains

func (r Region) Contains(c Region) bool

Contains returns whether Region c is contained by Region r. It returns true if c == r.

func (Region) ISO3

func (r Region) ISO3() string

ISO3 returns the 3-letter ISO code of r. Note that not all regions have a 3-letter ISO code. In such cases this method returns "ZZZ".

func (Region) IsCountry

func (r Region) IsCountry() bool

IsCountry returns whether this region is a country or autonomous area. This includes non-standard definitions from CLDR.

func (Region) IsGroup

func (r Region) IsGroup() bool

IsGroup returns whether this region defines a collection of regions. This includes non-standard definitions from CLDR.

func (Region) IsPrivateUse

func (r Region) IsPrivateUse() bool

IsPrivateUse reports whether r has the ISO 3166 User-assigned status. This may include private-use tags that are assigned by CLDR and used in this implementation. So IsPrivateUse and IsCountry can be simultaneously true.

func (Region) M49

func (r Region) M49() int

M49 returns the UN M.49 encoding of r, or 0 if this encoding is not defined for r.

func (Region) String

func (r Region) String() string

String returns the BCP 47 representation for the region. It returns "ZZ" for an unspecified region.

func (Region) TLD

func (r Region) TLD() (Region, error)

TLD returns the country code top-level domain (ccTLD). UK is returned for GB. In all other cases it returns either the region itself or an error.

This method may return an error for a region for which there exists a canonical form with a ccTLD. To get that ccTLD, canonicalize r first. The region will already be canonicalized if it was obtained from a Tag that was obtained using any of the default methods.

Example

Code:

us := language.MustParseRegion("US")
gb := language.MustParseRegion("GB")
uk := language.MustParseRegion("UK")
bu := language.MustParseRegion("BU")

fmt.Println(us.TLD())
fmt.Println(gb.TLD())
fmt.Println(uk.TLD())
fmt.Println(bu.TLD())

fmt.Println(us.Canonicalize().TLD())
fmt.Println(gb.Canonicalize().TLD())
fmt.Println(uk.Canonicalize().TLD())
fmt.Println(bu.Canonicalize().TLD())

Output:

US <nil>
UK <nil>
UK <nil>
ZZ language: region is not a valid ccTLD
US <nil>
UK <nil>
UK <nil>
MM <nil>

type Script

type Script struct {
    // contains filtered or unexported fields
}

Script is a 4-letter ISO 15924 code for representing scripts. It is idiomatically represented in title case.

func MustParseScript

func MustParseScript(s string) Script

MustParseScript is like ParseScript, but panics if the given script cannot be parsed. It simplifies safe initialization of Script values.

func ParseScript

func ParseScript(s string) (Script, error)

ParseScript parses a 4-letter ISO 15924 code. It returns a ValueError if s is a well-formed but unknown script identifier, or another error if s cannot be parsed for some other reason.

func (Script) IsPrivateUse

func (s Script) IsPrivateUse() bool

IsPrivateUse reports whether this script code is reserved for private use.

func (Script) String

func (s Script) String() string

String returns the script code in title case. It returns "Zzzz" for an unspecified script.

type Tag

type Tag struct {
    // contains filtered or unexported fields
}

Tag represents a BCP 47 language tag. It is used to specify an instance of a specific language or locale. All language tag values are guaranteed to be well-formed.

var (
    Und Tag = Tag{}

    Afrikaans            Tag = Tag{lang: _af}                //  af
    Amharic              Tag = Tag{lang: _am}                //  am
    Arabic               Tag = Tag{lang: _ar}                //  ar
    ModernStandardArabic Tag = Tag{lang: _ar, region: _001}  //  ar-001
    Azerbaijani          Tag = Tag{lang: _az}                //  az
    Bulgarian            Tag = Tag{lang: _bg}                //  bg
    Bengali              Tag = Tag{lang: _bn}                //  bn
    Catalan              Tag = Tag{lang: _ca}                //  ca
    Czech                Tag = Tag{lang: _cs}                //  cs
    Danish               Tag = Tag{lang: _da}                //  da
    German               Tag = Tag{lang: _de}                //  de
    Greek                Tag = Tag{lang: _el}                //  el
    English              Tag = Tag{lang: _en}                //  en
    AmericanEnglish      Tag = Tag{lang: _en, region: _US}   //  en-US
    BritishEnglish       Tag = Tag{lang: _en, region: _GB}   //  en-GB
    Spanish              Tag = Tag{lang: _es}                //  es
    EuropeanSpanish      Tag = Tag{lang: _es, region: _ES}   //  es-ES
    LatinAmericanSpanish Tag = Tag{lang: _es, region: _419}  //  es-419
    Estonian             Tag = Tag{lang: _et}                //  et
    Persian              Tag = Tag{lang: _fa}                //  fa
    Finnish              Tag = Tag{lang: _fi}                //  fi
    Filipino             Tag = Tag{lang: _fil}               //  fil
    French               Tag = Tag{lang: _fr}                //  fr
    CanadianFrench       Tag = Tag{lang: _fr, region: _CA}   //  fr-CA
    Gujarati             Tag = Tag{lang: _gu}                //  gu
    Hebrew               Tag = Tag{lang: _he}                //  he
    Hindi                Tag = Tag{lang: _hi}                //  hi
    Croatian             Tag = Tag{lang: _hr}                //  hr
    Hungarian            Tag = Tag{lang: _hu}                //  hu
    Armenian             Tag = Tag{lang: _hy}                //  hy
    Indonesian           Tag = Tag{lang: _id}                //  id
    Icelandic            Tag = Tag{lang: _is}                //  is
    Italian              Tag = Tag{lang: _it}                //  it
    Japanese             Tag = Tag{lang: _ja}                //  ja
    Georgian             Tag = Tag{lang: _ka}                //  ka
    Kazakh               Tag = Tag{lang: _kk}                //  kk
    Khmer                Tag = Tag{lang: _km}                //  km
    Kannada              Tag = Tag{lang: _kn}                //  kn
    Korean               Tag = Tag{lang: _ko}                //  ko
    Kirghiz              Tag = Tag{lang: _ky}                //  ky
    Lao                  Tag = Tag{lang: _lo}                //  lo
    Lithuanian           Tag = Tag{lang: _lt}                //  lt
    Latvian              Tag = Tag{lang: _lv}                //  lv
    Macedonian           Tag = Tag{lang: _mk}                //  mk
    Malayalam            Tag = Tag{lang: _ml}                //  ml
    Mongolian            Tag = Tag{lang: _mn}                //  mn
    Marathi              Tag = Tag{lang: _mr}                //  mr
    Malay                Tag = Tag{lang: _ms}                //  ms
    Burmese              Tag = Tag{lang: _my}                //  my
    Nepali               Tag = Tag{lang: _ne}                //  ne
    Dutch                Tag = Tag{lang: _nl}                //  nl
    Norwegian            Tag = Tag{lang: _no}                //  no
    Punjabi              Tag = Tag{lang: _pa}                //  pa
    Polish               Tag = Tag{lang: _pl}                //  pl
    Portuguese           Tag = Tag{lang: _pt}                //  pt
    BrazilianPortuguese  Tag = Tag{lang: _pt, region: _BR}   //  pt-BR
    EuropeanPortuguese   Tag = Tag{lang: _pt, region: _PT}   //  pt-PT
    Romanian             Tag = Tag{lang: _ro}                //  ro
    Russian              Tag = Tag{lang: _ru}                //  ru
    Sinhala              Tag = Tag{lang: _si}                //  si
    Slovak               Tag = Tag{lang: _sk}                //  sk
    Slovenian            Tag = Tag{lang: _sl}                //  sl
    Albanian             Tag = Tag{lang: _sq}                //  sq
    Serbian              Tag = Tag{lang: _sr}                //  sr
    SerbianLatin         Tag = Tag{lang: _sr, script: _Latn} //  sr-Latn
    Swedish              Tag = Tag{lang: _sv}                //  sv
    Swahili              Tag = Tag{lang: _sw}                //  sw
    Tamil                Tag = Tag{lang: _ta}                //  ta
    Telugu               Tag = Tag{lang: _te}                //  te
    Thai                 Tag = Tag{lang: _th}                //  th
    Turkish              Tag = Tag{lang: _tr}                //  tr
    Ukrainian            Tag = Tag{lang: _uk}                //  uk
    Urdu                 Tag = Tag{lang: _ur}                //  ur
    Uzbek                Tag = Tag{lang: _uz}                //  uz
    Vietnamese           Tag = Tag{lang: _vi}                //  vi
    Chinese              Tag = Tag{lang: _zh}                //  zh
    SimplifiedChinese    Tag = Tag{lang: _zh, script: _Hans} //  zh-Hans
    TraditionalChinese   Tag = Tag{lang: _zh, script: _Hant} //  zh-Hant
    Zulu                 Tag = Tag{lang: _zu}                //  zu
)

Example (Values)

Code:

us := language.MustParseRegion("US")
en := language.MustParseBase("en")

lang, _, region := language.AmericanEnglish.Raw()
fmt.Println(lang == en, region == us)

lang, _, region = language.BritishEnglish.Raw()
fmt.Println(lang == en, region == us)

// Tags can be compared for exact equivalence using '=='.
enUS, _ := language.Compose(en, us)
fmt.Println(enUS == language.AmericanEnglish)

Output:

true true
true false
true

func Compose

func Compose(part ...interface{}) (t Tag, err error)

Compose creates a Tag from individual parts, which may be of type Tag, Base, Script, Region, Variant, []Variant, Extension, []Extension or error. If a Base, Script or Region or slice of type Variant or Extension is passed more than once, the latter will overwrite the former. Variants and Extensions are accumulated, but if two extensions of the same type are passed, the latter will replace the former. A Tag overwrites all former values and typically only makes sense as the first argument. The resulting tag is returned after canonicalizing using the Default CanonType. If one or more errors are encountered, one of the errors is returned.

Example

Code:

nl, _ := language.ParseBase("nl")
us, _ := language.ParseRegion("US")
de := language.Make("de-1901-u-co-phonebk")
jp := language.Make("ja-JP")
fi := language.Make("fi-x-ing")

u, _ := language.ParseExtension("u-nu-arabic")
x, _ := language.ParseExtension("x-piglatin")

// Combine a base language and region.
fmt.Println(language.Compose(nl, us))
// Combine a base language and extension.
fmt.Println(language.Compose(nl, x))
// Replace the region.
fmt.Println(language.Compose(jp, us))
// Combine several tags.
fmt.Println(language.Compose(us, nl, u))

// Replace the base language of a tag.
fmt.Println(language.Compose(de, nl))
fmt.Println(language.Compose(de, nl, u))
// Remove the base language.
fmt.Println(language.Compose(de, language.Base{}))
// Remove all variants.
fmt.Println(language.Compose(de, []language.Variant{}))
// Remove all extensions.
fmt.Println(language.Compose(de, []language.Extension{}))
fmt.Println(language.Compose(fi, []language.Extension{}))
// Remove all variants and extensions.
fmt.Println(language.Compose(de.Raw()))

// An error is gobbled or returned if non-nil.
fmt.Println(language.Compose(language.ParseRegion("ZA")))
fmt.Println(language.Compose(language.ParseRegion("HH")))

// Compose uses the same Default canonicalization as Make.
fmt.Println(language.Compose(language.Raw.Parse("en-Latn-UK")))

// Call Compose on a different CanonType for different results.
fmt.Println(language.All.Compose(language.Raw.Parse("en-Latn-UK")))

Output:

nl-US <nil>
nl-x-piglatin <nil>
ja-US <nil>
nl-US-u-nu-arabic <nil>
nl-1901-u-co-phonebk <nil>
nl-1901-u-nu-arabic <nil>
und-1901-u-co-phonebk <nil>
de-u-co-phonebk <nil>
de-1901 <nil>
fi <nil>
de <nil>
und-ZA <nil>
und language: subtag "HH" is well-formed but unknown
en-Latn-GB <nil>
en-GB <nil>

func Make

func Make(s string) Tag

Make is a convenience wrapper for Parse that omits the error. In case of an error, a sensible default is returned.

func MustParse

func MustParse(s string) Tag

MustParse is like Parse, but panics if the given BCP 47 tag cannot be parsed. It simplifies safe initialization of Tag values.

func Parse

func Parse(s string) (t Tag, err error)

Parse parses the given BCP 47 string and returns a valid Tag. If parsing fails, it returns an error along with any part of the tag that could be parsed. If parsing succeeds but an unknown value is found, it returns a ValueError; the Tag returned in this case is stripped of the unknown value, and all other values are preserved. Parse accepts tags in the BCP 47 format and the extensions to this standard defined in http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers. The resulting tag is canonicalized using the default canonicalization type.

Example (Errors)

Code:

for _, s := range []string{"Foo", "Bar", "Foobar"} {
    _, err := language.Parse(s)
    if err != nil {
        if inv, ok := err.(language.ValueError); ok {
            fmt.Println(inv.Subtag())
        } else {
            fmt.Println(s)
        }
    }
}
for _, s := range []string{"en", "aa-Uuuu", "AC", "ac-u"} {
    _, err := language.Parse(s)
    switch e := err.(type) {
    case language.ValueError:
        fmt.Printf("%s: culprit %q\n", s, e.Subtag())
    case nil:
        // No error.
    default:
        // A syntax error.
        fmt.Printf("%s: ill-formed\n", s)
    }
}

Output:

foo
Foobar
aa-Uuuu: culprit "Uuuu"
AC: culprit "ac"
ac-u: ill-formed

func (Tag) Base

func (t Tag) Base() (Base, Confidence)

Base returns the base language of the language tag. If the base language is unspecified, an attempt will be made to infer it from the context. It uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.

Example

Code:

fmt.Println(language.Make("und").Base())
fmt.Println(language.Make("und-US").Base())
fmt.Println(language.Make("und-NL").Base())
fmt.Println(language.Make("und-419").Base()) // Latin America
fmt.Println(language.Make("und-ZZ").Base())

Output:

en Low
en High
nl High
es Low
en Low

func (Tag) Extension

func (t Tag) Extension(x byte) (ext Extension, ok bool)

Extension returns the extension of type x for tag t. It will return false for ok if t does not have the requested extension. The returned extension will be invalid in this case.

func (Tag) Extensions

func (t Tag) Extensions() []Extension

Extensions returns all extensions of t.

func (Tag) IsRoot

func (t Tag) IsRoot() bool

IsRoot returns true if t is equal to language "und".

func (Tag) Parent

func (t Tag) Parent() Tag

Parent returns the CLDR parent of t. In CLDR, missing fields in data for a specific language are substituted with fields from the parent language. The parent for a language may change for newer versions of CLDR.

func (Tag) Raw

func (t Tag) Raw() (b Base, s Script, r Region)

Raw returns the raw base language, script and region, without making an attempt to infer their values.

func (Tag) Region

func (t Tag) Region() (Region, Confidence)

Region returns the region for the language tag. If it was not explicitly given, it will infer a most likely candidate from the context. It uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.

Example

Code:

ru := language.Make("ru")
en := language.Make("en")
fmt.Println(ru.Region())
fmt.Println(en.Region())

Output:

RU Low
US Low

func (Tag) Script

func (t Tag) Script() (Script, Confidence)

Script returns the script for the language tag. If it was not explicitly given, it infers a most likely candidate. If more than one script is commonly used for a language, the most likely one is returned with a low confidence indication. For example, it returns (Cyrl, Low) for Serbian. If a script cannot be inferred, (Zzzz, No) is returned.

We do not use Zyyy (undetermined) as one would suspect from the IANA registry for BCP 47. In a Unicode context, Zyyy marks common characters (like 1, 2, 3, '.', etc.) and is therefore more like multiple scripts. See http://www.unicode.org/reports/tr24/#Values for more details. Zzzz is also used for unknown values in CLDR. (Zzzz, Exact) is returned if Zzzz was explicitly specified.

Note that an inferred script is never guaranteed to be the correct one. Latin is almost exclusively used for Afrikaans, but Arabic has been used for some texts in the past. Also, the script that is commonly used may change over time. Script uses a variant of CLDR's Add Likely Subtags algorithm. This is subject to change.

Example

Code:

en := language.Make("en")
sr := language.Make("sr")
srLatn := language.Make("sr-Latn")
fmt.Println(en.Script())
fmt.Println(sr.Script())
// Was a script explicitly specified?
_, c := sr.Script()
fmt.Println(c == language.Exact)
_, c = srLatn.Script()
fmt.Println(c == language.Exact)

Output:

Latn High
Cyrl Low
false
true

func (Tag) SetTypeForKey

func (t Tag) SetTypeForKey(key, value string) (Tag, error)

SetTypeForKey returns a new Tag with key set to value (the "type", in Unicode terminology), where key and value must be among the allowed values defined for the Unicode locale extension ('u') in http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers. An empty value removes an existing pair with the same key.

func (Tag) String

func (t Tag) String() string

String returns the canonical string representation of the language tag.

func (Tag) TypeForKey

func (t Tag) TypeForKey(key string) string

TypeForKey returns the type associated with the given key, where key and type are of the allowed values defined for the Unicode locale extension ('u') in http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers. TypeForKey will traverse the inheritance chain to get the correct value.

func (Tag) Variants

func (t Tag) Variants() []Variant

Variants returns the variants specified explicitly for this language tag, or nil if no variant was specified.

type ValueError

type ValueError struct {
    // contains filtered or unexported fields
}

ValueError is returned by any of the parsing functions when the input is well-formed but the respective subtag is not recognized as a valid value.

func (ValueError) Error

func (e ValueError) Error() string

Error implements the error interface.

func (ValueError) Subtag

func (e ValueError) Subtag() string

Subtag returns the subtag for which the error occurred.

type Variant

type Variant struct {
    // contains filtered or unexported fields
}

Variant represents a registered variant of a language as defined by BCP 47.

func ParseVariant

func ParseVariant(s string) (Variant, error)

ParseVariant parses and returns a Variant. An error is returned if s is not a valid variant.

func (Variant) String

func (v Variant) String() string

String returns the string representation of the variant.

Subdirectories

Name Synopsis
display Package display provides display names for languages, scripts and regions in a requested language.