588 lines
28 KiB
Markdown
588 lines
28 KiB
Markdown
---
|
|
layout: default
|
|
title: Locales and Resources
|
|
nav_order: 5
|
|
has_children: true
|
|
---
|
|
<!--
|
|
© 2020 and later: Unicode, Inc. and others.
|
|
License & terms of use: http://www.unicode.org/copyright.html
|
|
-->
|
|
|
|
# Locale
|
|
{: .no_toc }
|
|
|
|
## Contents
|
|
{: .no_toc .text-delta }
|
|
|
|
1. TOC
|
|
{:toc}
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
This chapter explains **locales**, a fundamental concept in ICU. ICU services
|
|
are parameterized by locale, to allow client code to be written in a
|
|
locale-independent way, but to deliver culturally correct results.
|
|
|
|
## The Locale Concept
|
|
|
|
A locale identifies a specific user community - a group of users who have
|
|
similar culture and language expectations for human-computer interaction (and
|
|
the kinds of data they process).
|
|
|
|
A community is usually understood as the intersection of all users speaking the
|
|
same language and living in the same country. Furthermore, a community can use
|
|
more specific conventions. For example, an English/United States/Military locale
|
|
is separate from the regular English/United States locale since the US military
|
|
writes times and dates differently than most of the civilian community.
|
|
|
|
A program should be localized according to the rules specific for the target
|
|
locale. Many ICU services rely on the proper locale identification in their
|
|
function.
|
|
|
|
The locale object in ICU is an identifier that specifies a particular locale and
|
|
has fields for language, country, and an optional code to specify further
|
|
variants or subdivisions. These fields also can be represented as a string with
|
|
the fields separated by an underscore.
|
|
|
|
In the C++ API, the locale is represented by the `Locale` class, which provides
|
|
methods for finding language, country and variant components. In the C API the locale
|
|
is defined simply by a character string. In the Java API, the locale is represented by
|
|
`ULocale` which is analogous to the `Locale` class but provide additional support
|
|
for ICU protocol. All the locale-sensitive ICU services use the locale information
|
|
to determine language and other locale specific parameters of their function.
|
|
The list of locale-sensitive services can be found in the Introduction to ICU
|
|
section. Other parts of the library use the locale as an indicator to
|
|
customize their behavior.
|
|
|
|
For example, when the locale-sensitive date format service needs to format a
|
|
date, it uses the convention appropriate to the current locale. If the locale is
|
|
English, it uses the word "Monday" and if it is French, it uses the word
|
|
"Lundi".
|
|
|
|
The locale object also defines the concept of a default locale. The default
|
|
locale is the locale, used by many programs, that regulates the rest of the
|
|
computer's behavior by default and is usually controlled by the user in a
|
|
control panel window. The locale mechanism does not require a program to know
|
|
which locale the user is using and thus makes most programming simpler.
|
|
|
|
Since locale objects can be passed as parameters or stored in variables, the
|
|
program does not have to know specifically which locales they identify. Many
|
|
applications enable a user to select a locale. The resulting locale object is
|
|
passed as a parameter, which then produces the customized behavior for that
|
|
locale.
|
|
|
|
A locale provides a means of identifying a specific region for the purposes of
|
|
internationalization and localization.
|
|
|
|
> :point_right: **Note**: An ICU locale is frequently confused with a Portable
|
|
> Operating System Interface (POSIX) locale ID. An ICU locale ID is not a POSIX
|
|
> locale ID. ICU locales do not specify the encoding and specify variant locales
|
|
> differently.
|
|
|
|
A locale consists of one or more pieces of ordered information:
|
|
|
|
### Language code
|
|
|
|
The languages are specified using a two- or three-letter lowercase code for a
|
|
particular language. For example, Spanish is "es", English is "en" and French is
|
|
"fr". The two-letter language code uses the
|
|
[ISO-639](https://www.loc.gov/standards/iso639-2/) standard.
|
|
|
|
### Script code
|
|
|
|
The optional four-letter script code follows the language code. If specified, it
|
|
should be a valid script code as listed on the
|
|
[Unicode ISO 15924 Registry](https://www.unicode.org/iso15924/iso15924-codes.html).
|
|
|
|
### Country code
|
|
|
|
There are often different language conventions within the same language. For
|
|
example, Spanish is spoken in many countries in Central and South America but
|
|
the currencies are different in each country. To allow for these differences
|
|
among specific geographical, political, or cultural regions, locales are
|
|
specified by two-letter, uppercase codes. For example, "ES" represents Spain and
|
|
"MX" represents Mexico. The two letter country code uses the
|
|
[ISO-3166](https://www.iso.org/iso-3166-country-codes.html) standard.
|
|
|
|
Java supports two letter country codes that uses ISO-3166 and UN M.49 code.
|
|
|
|
### Variant code
|
|
|
|
Differences may also appear in language conventions used within the same
|
|
country. For example, the Euro currency is used in several European countries
|
|
while the individual country's currency is still in circulation. Variations
|
|
inside a language and country pair are handled by adding a third code, the
|
|
variant code. The variant code is arbitrary and completely application-specific.
|
|
ICU adds "_EURO" to its locale designations for locales that support the Euro
|
|
currency. Variants can have any number of underscored key words. For example,
|
|
"EURO_WIN" is a variant for the Euro currency on a Windows computer.
|
|
|
|
Another use of the variant code is to designate the Collation (sorting order) of
|
|
a locale. For instance, the "es__TRADITIONAL" locale uses the traditional
|
|
sorting order which is different from the default modern sorting of Spanish.
|
|
|
|
Collation order and currency can be more flexibly specified using keywords
|
|
instead of variants; see below.
|
|
|
|
### Keywords
|
|
|
|
The final element of a locale is an optional list of keywords together with
|
|
their values. Keywords must be unique. Their order is not significant. Unknown
|
|
keywords are ignored. The handling of keywords depends on the specific services
|
|
that utilize them. Currently, the following keywords are recognized:
|
|
|
|
Keyword | Possible Values | Description
|
|
--------|-----------------|------------
|
|
calendar | A calendar specifier such as "gregorian", "islamic", "chinese", "islamic-civil", "hebrew", "japanese", or "buddhist". See the Key/Type Definitions table in the [Locale Data Markup Language](http://www.unicode.org/reports/tr35/) for a list of recognized values. | If present, the calendar keyword specifies the calendar type that the `Calendar` factory methods create. See the calendar locale and keyword handling section (§) of the [Calendar Classes](../datetime/calendar/index.md) chapter for details.
|
|
collation | A collation specifier such as "phonebook", "pinyin", "traditional", "stroke", "direct", or "posix". See the Key/Type Definitions table in the [Locale Data Markup Language](http://www.unicode.org/reports/tr35/) for a list of recognized values. | If present, the collation keyword modifies how the collation service searches through the locale data when instantiating a collator. See the collation locale and keyword handling section (§) of the [Collation Services Architecture](../collation/architecture.md) chapter for details.
|
|
currency | Any standard three-letter currency code, such as "USD" or "JPY". See the LocaleExplorer [currency list](http://demo.icu-project.org/icu-bin/locexp?_=en&SHOWCurrencies=1#Currencies) for a list of currently recognized currency codes. | If present, the currency keyword is used by `NumberFormat` to determine the currency to use to format a currency value, and by `ucurr_forLocale()` to specify a currency.
|
|
numbers | A numbering system specifier such as "latn", "arab", "deva", "hansfin" or "thai". See the Key/Type Definitions table in the [Locale Data Markup Language](http://www.unicode.org/reports/tr35/) for a list of recognized values. | If present, the numbers keyword is used by `NumberFormat` to determine the numbering system to be used for formatting and parsing numbers. The numbering system defines the set of digits used for decimal formatting, such as "latn" for western (ASCII) digits, or "thai" for Thai digits. The numbering system may also define complex algorithms for number formatting, such as "hansfin" for simplified Chinese numerals using financial ideographs.
|
|
|
|
If any of these keywords is absent, the service requesting it will typically use
|
|
the rest of the locale specifier in order to determine the appropriate behavior
|
|
for the locale. The keywords allow a locale specifier to override or refine this
|
|
default behavior.
|
|
|
|
### Examples
|
|
|
|
Locale ID | Language | Script | Country | Variant | Keywords | Definition
|
|
----------|----------|--------|---------|---------|----------|-----------
|
|
en_US | en | | US | | | English, United States of America. <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?_=en_US)
|
|
en_IE_PREEURO | en | | IE | | | English, Ireland. <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?_=en_IE_PREEURO)
|
|
en_IE@currency=IEP | en | | IE | | currency=IEP | English, Ireland with Irish Pound. <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?_=en_IE@currency=IEP)
|
|
eo | eo | | | | | Esperanto. <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?_=eo)
|
|
fr@collation=phonebook;calendar=islamic-civil | fr | | | | collation=phonebook <br>calendar=islamic-civil | French (Calendar=Islamic-Civil Calendar, Collation=Phonebook Order). <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?_=fr@collation=phonebook;calendar=islamic-civil)
|
|
sr_Latn_RS_REVISED@currency=USD | sr | Latn | RS | REVISED | currency=USD | Serbian (Latin, Yugoslavia, Revised Orthography, Currency=US Dollar) <br>Browse in [LocaleExplorer](http://demo.icu-project.org/icu-bin/locexp?d_=en&_=sr_Latn_RS_REVISED@currency=USD)
|
|
|
|
|
|
### Default Locales
|
|
|
|
Default locales are available to all the objects in a program. If you set a new
|
|
default locale for one section of code, it can affect the entire program.
|
|
Application programs should not set the default locale as a way to request an
|
|
international object. The default locale is set to be the system locale on that
|
|
platform.
|
|
|
|
For example, when you set the default locale, the change affects the default
|
|
behavior of the `Collator` and `NumberFormat` instances. When the default locale is
|
|
not wanted, you can set the desired locale using a factory method supplied with
|
|
the classes such as `Collator::createInstance()`.
|
|
|
|
Using the ICU C functions, `NULL` can be passed for a locale parameter to specify
|
|
the default locale.
|
|
|
|
## Locales and Services
|
|
|
|
ICU is implemented as a set of services. One example of a service is the
|
|
formatting of a numeric value into a string. Another is the sorting of a list of
|
|
strings. When client code wants to use a service, the first thing it does is
|
|
request a service object for a given locale. The resulting object is then
|
|
expected to perform the its operations in a way that is culturally correct for
|
|
the requested locale.
|
|
|
|
### Requested Locale
|
|
|
|
The **requested** locale is the one specified by the client code when the
|
|
service object is requested.
|
|
|
|
### Valid Locale
|
|
|
|
A **populated** locale is one for which ICU has data, or one in which client
|
|
code has registered a service. If the requested locale is not populated, then
|
|
ICU will fallback until it reaches a populated locale. The first populated
|
|
locale it reaches is the **valid** locale. The
|
|
valid locale is reachable from the requested locale via zero or more fallback
|
|
steps.
|
|
|
|
### Fallback
|
|
|
|
Locale **fallback** proceeds as follows:
|
|
|
|
1. The variant is removed, if there is one.
|
|
|
|
2. The country is removed, if there is one.
|
|
|
|
3. The script is removed, if there is one.
|
|
|
|
4. The ICU default locale is examined. The same set of steps is performed for
|
|
the default locale.
|
|
|
|
At any point, if the desired data is found, then the fallback procedure stops.
|
|
Keywords are not altered during fallback until the default locale is reached, at
|
|
which point all keywords are replaced by those assigned to the default locale.
|
|
|
|
### Actual Locale
|
|
|
|
Services request specific resources within the valid locale. If the valid locale
|
|
directly contains the requested resource, then it is the **actual** locale. If
|
|
not, then ICU will fallback until it reaches a locale that does directly contain
|
|
the requested resource. The first such locale is the actual locale. The actual
|
|
locale is reachable from the valid locale via zero or more fallback steps.
|
|
|
|
### getLocale()
|
|
|
|
Client code may wish to know what the valid and actual locales are for a given
|
|
service object. To support this, ICU services provide the method `getLocale()`.
|
|
The `getLocale()` method takes an argument specifying whether the actual or
|
|
valid locale is to be returned.
|
|
|
|
Some service object will have an empty or null return from `getLocale()`. This
|
|
indicates that the given service object was not created from locale data, or
|
|
that it has since been modified so that it no longer reflects locale data,
|
|
typically through alteration of the pattern (but not localized symbol changes --
|
|
such changes do not reset the actual and valid locale settings).
|
|
|
|
Currently, the services that support the `getLocale()` API are the following
|
|
classes and their subclasses:
|
|
|
|
### Functional Equivalence
|
|
|
|
Various services provide the API `getFunctionalEquivalent` to allow callers
|
|
determine the **functionally equivalent locale** for a requested locale. For
|
|
example, when instantiating a collator for the locale `en_US_CALIFORNIA`, the
|
|
functionally equivalent locale may be `en`.
|
|
|
|
The purpose of this is to allow applications to do intelligent caching. If an
|
|
application opens a service object for locale A with a functional equivalent Q
|
|
and caches it, then later when it requires a service object for locale B, it can
|
|
first check if locale B has the **same functional equivalent** as locale A; if
|
|
so, it can reuse the cached A object for the B locale, and be guaranteed the
|
|
same results as if it has instantiated a service object for B. In other words,
|
|
|
|
```
|
|
Service.getFunctionalEquivalent(A) == Service.getFunctionalEquivalent(B)
|
|
```
|
|
|
|
implies that the object returned by `Service.getInstance(A)` will behave
|
|
equivalently to the object returned by `Service.getInstance(B)`.
|
|
|
|
Here is a pseudo-code example:
|
|
|
|
The functional equivalent locale returned by a service has no meaning beyond
|
|
what is stated above. For example, if the functional equivalent of Greek is
|
|
Hebrew for collation, that makes no statement about the linguistic relation of
|
|
the languages -- it only means that the two collators are functionally
|
|
equivalent.
|
|
|
|
While two locales with the same functional equivalent are guaranteed to be
|
|
equivalent, the converse is **not** true: If two locales are in fact equivalent,
|
|
they may **not** return the same result from `getFunctionalEquivalent`. That is,
|
|
if the object returned by `Service.getInstance(A)` behaves equivalently to the
|
|
object returned by `Service.getInstance(B)`, `Service.getFunctionalEquivalent(A)`
|
|
**may or may not** be equal to `Service.getFunctionalEquivalent(B)`. Take again
|
|
the example of Greek and Hebrew, with respect to collation. These locales may
|
|
happen to be functional equivalents (since they each just turn on full
|
|
normalization), but it may or may not be the case that they return the same
|
|
functionally equivalent locale. This depends on how the data is structured
|
|
internally.
|
|
|
|
The functional equivalent for a locale may change over time. Suppose that Greek
|
|
were enhanced to change sorting of additional ancient Greek characters. In that
|
|
case, it would diverge; the functional equivalent of Greek would no longer be
|
|
Hebrew.
|
|
|
|
## Canonicalization
|
|
|
|
ICU works with **ICU format locale IDs**. These are strings that obey the
|
|
following character set and syntax restrictions:
|
|
|
|
1. The only permitted characters are ASCII letters, hyphen ('-'), underscore
|
|
('_'), at-sign ('@'), equals sign ('='), and semicolon (';').
|
|
|
|
2. IDs consist of either a base name, keyword list, or both. If a keyword list
|
|
is present it must be preceded by an at-sign.
|
|
|
|
3. The base name must precede the keyword list, if both are present.
|
|
|
|
4. The base name defines the language, script, country, and variant, and can
|
|
contain only ASCII letters, hyphen, or underscore.
|
|
|
|
5. The keyword list consists of keyword/value pairs. Each keyword or value
|
|
consists of one or more ASCII letters, hyphen, or underscore. Keywords and
|
|
values are separated by a single equals sign. Multiple keyword/value pairs,
|
|
if present, are separated by a single semicolon. A keyword may not appear
|
|
without a value. The same keyword may not appear twice.
|
|
|
|
ICU performs two kinds of canonicalizing operations on 'ICU format' locale IDs.
|
|
Level 1 canonicalization is performed routinely and automatically by ICU APIs.
|
|
The recommended procedure for client code using locale IDs from outside sources
|
|
(e.g., POSIX, user input, etc.) is to pass such "foreign IDs" through level 2
|
|
canonicalization before use.
|
|
|
|
**Level 1 canonicalization**. This operation performs minor, isolated changes,
|
|
such as changing "en-us" to "en_US". Level 1 canonicalization is **not**
|
|
designed to handle "foreign" locale IDs (POSIX, .NET) but rather IDs that are in
|
|
ICU format, but which do not have normalized case and delimiters. Level 1
|
|
canonicalization is accomplished by the ICU functions `uloc_getName`,
|
|
`Locale::createFromName`, and `Locale::Locale`. The latter two APIs exist in both
|
|
C++ and Java.
|
|
|
|
1. Level 1 canonicalization is defined only on ICU format locale IDs as defined
|
|
above. Behavior with any other kind of input is unspecified.
|
|
|
|
2. Case is normalized. Elements interpreted as **language** strings will be
|
|
converted to lowercase. **Country** and **variant** elements will be
|
|
converted to uppercase. **Script** elements will be title-cased. **Keywords**
|
|
will be converted to lowercase. **Keyword values** will remain unchanged.
|
|
|
|
3. Hyphens are converted to underscores.
|
|
|
|
4. All 3-letter country codes are converted to 2-letter equivalents.
|
|
|
|
5. Any 3-letter language codes are converted to 2-letter equivalents if
|
|
possible. 3-letter language codes with no 2-letter equivalent are kept as
|
|
3-letter codes.
|
|
|
|
6. Keywords are sorted.
|
|
|
|
**Level 2 canonicalization**. This operation may make major changes to the ID,
|
|
possibly replacing entire elements of the ID. An example is changing
|
|
"fr-fr@EURO" to "fr_FR@currency=EUR". Level 2 canonicalization is designed to
|
|
translate POSIX and .NET IDs, as well as nonstandard ICU locale IDs. Level 2 is
|
|
a **superset** of level 1; every operation performed by level 1 is also
|
|
performed by level 2. Level 2 canonicalization is performed by `uloc_canonicalize`
|
|
and `Locale::createCanonical`. The latter API exists in both C++ and Java.
|
|
|
|
1. Level 2 canonicalization operates on ICU format locale IDs with the
|
|
following additions:
|
|
|
|
1. The period ('.') is also a valid input character.
|
|
|
|
2. An at-sign may be followed by text that is not a keyword/value pair. If
|
|
present, such text is added to the variant.
|
|
|
|
2. POSIX variants are normalized, e.g., "en_US@VARIANT" => "en_US_VARIANT".
|
|
|
|
3. POSIX charset specifiers are **deleted**, e.g. "en_US.utf8" => "en_US".
|
|
|
|
4. The variant "EURO" is converted to the keyword specifier "currency=EUR".
|
|
This conversion applies to both "fr_FR_EURO" and "fr_FR@EURO" style IDs.
|
|
|
|
5. The variant "PREEURO" is converted to the keyword specifier "currency=K",
|
|
where K is the 3-letter currency code for the country's national currency in
|
|
effect at the time of the euro transitiion. This conversion applies to both
|
|
"fr_FR_PREURO" and "fr_FR@PREURO" style IDs. This mapping is only performed
|
|
for the following locales: ca_ES (ESP), de_AT (ATS), de_DE (DEM), de_LU
|
|
(EUR), el_GR (GRD), en_BE (BEF), en_IE (IEP), es_ES (ESP), eu_ES (ESP),
|
|
fi_FI (FIM), fr_BE (BEF), fr_FR (FRF), fr_LU (LUF), ga_IE (IEP), gl_ES
|
|
(ESP), it_IT (ITL), nl_BE (BEF), nl_NL (NLG), pt_PT (PTE).
|
|
|
|
6. The following IANA registered ISO 3066 names are remapped: art_LOJBAN =>
|
|
jbo, cel_GAULISH => cel__GAULISH, de_1901 => de__1901, de_1906 => de__1906,
|
|
en_BOONT => en__BOONT, en_SCOUSE => en__SCOUSE, sl_ROZAJ => sl__ROZAJ,
|
|
zh_GAN => zh__GAN, zh_GUOYU => zh, zh_HAKKA => zh__HAKKA, zh_MIN => zh__MIN,
|
|
zh_MIN_NAN => zh__MINNAN, zh_WUU => zh__WUU, zh_XIANG => zh__XIANG, zh_YUE
|
|
=> zh__YUE.
|
|
|
|
7. The following .NET identifiers are remapped: "" (empty string) =>
|
|
en_US_POSIX, az_AZ_CYRL => az_Cyrl_AZ, az_AZ_LATN => az_Latn_AZ, sr_SP_CYRL
|
|
=> sr_Cyrl_SP, sr_SP_LATN => sr_Latn_SP, uz_UZ_CYRL => uz_Cyrl_UZ,
|
|
uz_UZ_LATN => uz_Latn_UZ, zh_CHS => zh_Hans, zh_CHT => zh_Hant. The empty
|
|
string is not remapped if a keyword list is present.
|
|
|
|
8. Variants specifying collation are remapped to collation keyword specifiers,
|
|
as follows: de__PHONEBOOK => de@collation=phonebook, es__TRADITIONAL =>
|
|
es@collation=traditional, hi__DIRECT => hi@collation=direct, zh_TW_STROKE =>
|
|
zh_TW@collation=stroke, zh__PINYIN => zh@collation=pinyin.
|
|
|
|
9. Variants specifying a calendar are remapped to calendar keyword specifiers,
|
|
as follows: ja_JP_TRADITIONAL => ja_JP@calendar=japanese, th_TH_TRADITIONAL
|
|
=> th_TH@calendar=buddhist.
|
|
|
|
10. Special case: C => en_US_POSIX.
|
|
|
|
Certain other operations are not performed by either level 1 or level 2
|
|
canonicalization. These are listed here for completeness.
|
|
|
|
1. Language identifiers that have been superseded will not be remapped. In
|
|
particular, the following transformations are not performed:
|
|
|
|
1. no => nb
|
|
|
|
2. iw => he
|
|
|
|
3. id => in
|
|
|
|
4. nb_no_NY => nn_NO
|
|
|
|
2. The behavior of level 2 canonicalization when presented with a remapped ID
|
|
combined together with keywords is not defined. For example,
|
|
fr_FR_EURO@currency=FRF has an undefined level 2 canonicalization.
|
|
|
|
All APIs (with a few exceptions) in ICU4C that take a `const char* locale`
|
|
parameter can be assumed to automatically peform level 1 canonicalization before
|
|
using the locale ID to do resource lookup, keyword interpretation, etc.
|
|
Specifically, the static API `getLanguage`, `getScript`, `getCountry`, and `getVariant`
|
|
behave exactly like their non-static counterparts in the class `Locale`. That is,
|
|
for any locale ID `loc`, `new Locale(loc).getFoo() == Locale::getFoo(loc)`, where
|
|
Foo is one of Language, Script, Country, or Variant.
|
|
|
|
The `Locale` constructor (in C++ and Java) taking multiple strings behaves exactly
|
|
as if those strings were concatenated, with the '_' separator inserted between
|
|
two adjacent non-empty strings, and the result passed to `uloc_getName`.
|
|
|
|
> :point_right: **Note**: Throughout this discussion `Locale` refers to both the
|
|
> C++ `Locale` class and the ICU4J `com.ibm.icu.util.ULocale` class. Although C++
|
|
> notation is used, all statements made regarding `Locale` apply equally to
|
|
> `com.ibm.icu.util.ULocale`.
|
|
|
|
## Usage: Creating Locales
|
|
|
|
If you are localizing an application to a locale that is not already supported,
|
|
you need to create your own `Locale` object. New `Locale` objects are created using
|
|
one of the three constructors in this class:
|
|
|
|
```c++
|
|
Locale( const char * language);
|
|
Locale( const char * language, const char * country);
|
|
Locale( const char * language, const char * country, const char * variant);
|
|
```
|
|
|
|
Because a locale object is just an identifier for a region, no validity check is
|
|
performed. If you want to verify that the particular resources are available for
|
|
the locale you construct, you must query those resources. For example, you can
|
|
query the `NumberFormat` object for the locales it supports using its
|
|
`getAvailableLocales()` method.
|
|
|
|
New `ULocale` objects in Java are created using one the following three
|
|
constructor in this class:
|
|
|
|
```java
|
|
ULocale( String localeID)
|
|
ULocale( String a, String b)
|
|
ULocale( String a, String b, String c)
|
|
```
|
|
|
|
The locale ID passed in the constructor consists of optional languages, scripts,
|
|
country and variant fields in that oder, separated by underscore, followed by an
|
|
optional keywords. For example, "en_US", "sy_Cyrl_YU", "zh__pinyin",
|
|
"es_ES@currency=EUR,collation=traditional". The fields a, b, c in the other two
|
|
constructors are the components of the locale ID. For example, the following two
|
|
locale object are same:
|
|
|
|
```java
|
|
ULocale ul = new Ulocale("sy_Cyrl_YU");
|
|
ULocale ul = new ULocale("sy", "Cyrl", "YU");
|
|
```
|
|
|
|
In C++, the `Locale` class provides a number of convenient constants that you can
|
|
use to create locales. For example, the following refers to a `NumberFormat` object
|
|
for the United States:
|
|
|
|
```c++
|
|
Locale::getUS()
|
|
```
|
|
|
|
In C, a string with the language country and variant concatenated together with
|
|
an underscore '_' describe a locale. For example, "en_US" is a locale that is
|
|
based on the English language in the United States. The following can be used as
|
|
equivalents to the locale constants:
|
|
|
|
```c
|
|
ULOC_US
|
|
```
|
|
|
|
In Java, the `ULocale` provides a number of convenient constants that can be used
|
|
to create locales.
|
|
|
|
```java
|
|
ULocale.US;
|
|
```
|
|
|
|
## Usage: Retrieving Locales
|
|
|
|
Locale-sensitive classes have a `getAvailableLocales()` method that returns all of
|
|
the locales supported by that class. This method also shows the other methods
|
|
that get locale information from the resource bundle. For example, the following
|
|
shows that the `NumberFormat` class provides three convenience methods for
|
|
creating a default `NumberFormat` object:
|
|
|
|
```c++
|
|
NumberFormat::createInstance();
|
|
NumberFormat::createCurrencyInstance();
|
|
NumberFormat::createPercentInstance();
|
|
```
|
|
|
|
Locale-sensitive classes in Java also have a `getAvailableULocales()` method that
|
|
returns all of the locales supported by that class.
|
|
|
|
### Displayable Names
|
|
|
|
Once you've created a `Locale` in C++ and a `ULocale` in java, you can perform a
|
|
query of the locale for information about itself. The following shows the
|
|
information you can receive from a locale:
|
|
|
|
Method | Description
|
|
-------|------------
|
|
`getCountry()` | Retrieves the ISO Country Code
|
|
`getLanguage()` | Retrieves the ISO Language
|
|
`getDisplayCountry()` | Shows the name of the country suitable for displaying information to the user
|
|
`getDisplayLanguage()` | Shows the name of the language suitable for displaying to the user
|
|
|
|
> :point_right: **Note**: The `getDisplayXXX` methods are themselves locale-sensitive
|
|
> and have two versions in C++: one that uses the default locale and one that takes a
|
|
> locale as an argument and displays the name or country in a language appropriate to
|
|
> that locale.
|
|
|
|
> :point_right: **Note**: In Java, the `getDisplayXXX` methods have three versions:
|
|
> one that uses the default locale, the other takes a locale as an argument and the
|
|
> third one which takes locale ID as an argument.
|
|
|
|
Each class that performs locale-sensitive operations allows you to get all the
|
|
available objects of that type. You can sift through these objects by language,
|
|
country, or variant, and use the display names to present a menu to the user.
|
|
For example, you can create a menu of all the collation objects suitable for a
|
|
given language.
|
|
|
|
### HTTP Accept-Language
|
|
|
|
ICU provides functions to negotiate the best locale to use for an operation,
|
|
given a user's list of acceptable locales, and the application's list of
|
|
available locales. For example, a browser sends the web server the HTTP
|
|
"`Accept-Language`" header indicating which locales, with a ranking, are
|
|
acceptable to the user. The server must determine which locale to use when
|
|
returning content to the user.
|
|
|
|
Here is an example of selecting an acceptable locale within a CGI application:
|
|
|
|
C:
|
|
|
|
```c
|
|
char resultLocale[200];
|
|
UAcceptResult outResult;
|
|
available = ures_openAvailableLocales("myBundle", &status);
|
|
int32_t len = uloc_acceptLanguageFromHTTP(resultLocale, 200, &outResult,
|
|
getenv("HTTP_ACCEPT_LANGUAGE"), available, &status);
|
|
if(U_SUCCESS(status)) {
|
|
printf("Using locale %s\n", outResult);
|
|
}
|
|
```
|
|
|
|
Here is an example of selecting an acceptable locale within a Java application:
|
|
|
|
Java:
|
|
|
|
```java
|
|
ULocale[] availableLocales = ULocale.getAvailableLocales();
|
|
boolean[] fallback = { false };
|
|
ULocale result = ULocale.acceptLanguage(availableLocales, fallback);
|
|
|
|
System.out.println("Using locale " + result);
|
|
```
|
|
|
|
> :point_right: **Note**: As of this writing, this functionality is available in
|
|
> both C and Java. Please read the following two linked documents for important
|
|
> considerations and recommendations when using this header in a web application.
|
|
|
|
> *For further information about the Accept-Language HTTP header:* <br>
|
|
> https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4 <br>
|
|
> *Notes and cautions about the use of this header:* <br>
|
|
> https://www.w3.org/International/questions/qa-accept-lang-locales
|
|
|
|
## Programming in C vs. C++ vs. Java
|
|
|
|
See Programming for Locale in [C, C++ and Java](examples.md) for more information.
|