Displaying non-English characters

You can send non-English characters to a client's browser by using special character sequences in the response rather than by changing the character set. This section describes how to use HTML entities and Unicode sequences.

Using HTML entities

HTML 2.0 introduced character sequences, called HTML entities, that let you send special characters to the client.

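HTML entities begin with an ampersand (&) and end with a semicolon (;). Between them is either a name, such as lt, or a number sign (#) followed by the character's decimal code, as in &#60;. The following HTML entities display angle brackets (< >):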

&lt;HTML&gt;

This example renders the following in the browser:

<HTML>
The browser does not interpret the special characters as HTML.
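A servlet that echoes text it does not control, such as form input, can apply the same entities programmatically so that the browser displays the text instead of interpreting it as markup. The following is a minimal sketch; escapeHtml is a hypothetical helper written for this example, not part of JRun or the servlet API.

```java
public class HtmlEscape {

    // Hypothetical helper: replaces the characters that HTML treats as
    // markup with the corresponding named entities.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;  // so existing entities display literally
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break; // safe inside attribute values
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints &lt;HTML&gt;, which a browser displays as <HTML>
        System.out.println(escapeHtml("<HTML>"));
    }
}
```

The ampersand is escaped along with the angle brackets; otherwise text that already contains an entity would be rendered as the character rather than shown as typed.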

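When you represent non-English characters as HTML entities, you must use the entity for the character in the alphabet you are presenting: letters that look the same in different alphabets have different codes. The following example uses numeric HTML entities to display the capital letter A in the English (Latin), Greek, and Cyrillic alphabets:

...
out.println("English: &#65;");    // Latin capital letter A (U+0041)
out.println("Greek: &#913;");     // Greek capital letter alpha (U+0391)
out.println("Cyrillic: &#1040;"); // Cyrillic capital letter A (U+0410)
...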
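The number in a numeric entity is the character's Unicode code point written in decimal, so a servlet can compute the entity from the character instead of looking it up. A minimal sketch, assuming Java 8 or later for String.codePoints(); toNumericEntities is a hypothetical helper that converts every character outside the ASCII range.

```java
public class NumericEntities {

    // Hypothetical helper: writes each non-ASCII character as &#N;,
    // where N is its decimal Unicode code point. ASCII passes through unchanged.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("Greek: \u0391"));    // Greek: &#913;
        System.out.println(toNumericEntities("Cyrillic: \u0410")); // Cyrillic: &#1040;
    }
}
```

The helper leaves ASCII characters, including & and <, unchanged, so text that may contain markup characters still needs to be escaped separately.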

For a list of HTML entities, see http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html.

Using Unicode sequences

HTML 4.0 extends HTML entities to the full Unicode character set, so a numeric entity can refer to any Unicode character. The Unicode Consortium defines the Unicode character set, a single table that covers the characters of almost all written languages. Unicode uses a two-byte (16-bit) code for each character, rather than the single byte used by ISO 8859-1 (Latin-1), the character set on which the original HTML entities are defined. The Unicode 2.1 standard defines encodings for approximately 40,000 characters.

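In Java source code, including servlets and the Java code in JSP scriptlets and expressions, a Unicode sequence consists of a backslash and a lowercase u, followed by four hexadecimal digits that give the character's two-byte code. The following Unicode sequences represent angle brackets (< >):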

\u003CHTML\u003E

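When this string is written from Java code, the compiler has already converted each escape into the character it represents, so the browser receives:

<HTML>

Unlike HTML entities, Unicode sequences do not prevent the browser from interpreting the characters as HTML; here the browser reads <HTML> as a tag. To display literal angle brackets, use the &lt; and &gt; entities.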
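Because the Java compiler translates Unicode sequences before compiling the code, a string written with escapes is identical to the same string typed directly. The following sketch checks that, and includes a hypothetical helper, toUnicodeEscapes, that produces escapes in the four-hex-digit form used in this section.

```java
public class UnicodeEscapes {

    // Hypothetical helper: writes every character of text as a backslash,
    // a lowercase u, and four uppercase hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            // Each Java char is one 16-bit code unit.
            out.append(String.format("\\u%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The compiler has already turned the escapes into < and >,
        // so both literals hold the same six characters.
        System.out.println("\u003CHTML\u003E".equals("<HTML>")); // true
        System.out.println(toUnicodeEscapes("<HTML>"));
    }
}
```

A helper like this is mainly useful when preparing source code or string literals that must stay within ASCII; at run time the escaped and unescaped forms are indistinguishable.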

The following servlet code prints Hello World in Japanese using Unicode:

out.println("<b>\u30cf\u30ed\u30fc\u30ef\u30fc\u30eb\u30c9</b>");
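A servlet's PrintWriter encodes characters using the response's character encoding, which defaults to ISO-8859-1 under the servlet specification. Katakana such as U+30CF cannot be represented in ISO-8859-1, so the writer may replace them (typically with ?) unless the servlet selects an encoding such as UTF-8, for example by calling response.setContentType("text/html; charset=UTF-8") before response.getWriter(). The following sketch uses only the standard library, not the servlet API, to show the difference for the string above; roundTrip is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // The same Japanese text as the servlet example above.
    static final String HELLO = "\u30cf\u30ed\u30fc\u30ef\u30fc\u30eb\u30c9";

    // Hypothetical helper: encodes text in the given charset, then decodes it,
    // approximating what a browser reading that charset would receive.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes ('?' for ISO-8859-1).
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(HELLO, StandardCharsets.ISO_8859_1));           // ???????
        System.out.println(roundTrip(HELLO, StandardCharsets.UTF_8).equals(HELLO));  // true
        // Each katakana character takes three bytes in UTF-8.
        System.out.println(HELLO.getBytes(StandardCharsets.UTF_8).length);           // 21
    }
}
```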

The following JSP code prints Hello World in English, Japanese, and Korean:

<h2><%= "\u0048\u0065\u006C\u006C\u006F \u0057\u006F\u0072\u006C\u0064" %></h2>
<h2><%= "\u30CF\u30ED\u30FC \u30EF\u30FC\u30EB\u30C9" %></h2>
<h2><%= "\uD5EC\uB85C \uC6D4\uB4DC" %></h2>

For the complete Unicode character charts, see http://www.unicode.org.

Note: To view some character sets, the client computer must have support for the corresponding languages, such as the required fonts, installed. For more information, see your operating system documentation.


Current page: http://livedocs.adobe.com/jrun/4/Programmers_Guide/i10n7.htm