A character set is a table that maps bytes to characters. Internally, Java represents all text as Unicode; characters outside ASCII can be written in source code as escaped ASCII sequences of the form \uXXXX. When the Java compiler compiles a Java class, it converts the class's contents from the system's character set to this Unicode representation.
Each character set has its own encoding and decoding algorithm that defines its byte sequences. Encoding is performed by the web browser on the client side when it formulates a request, and by Java on the server side when it formulates a response. You do not have to provide the encoding and decoding mechanisms yourself, but you can set and get the character set used by the client's web browser.
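As a rough illustration of how one character maps to different byte sequences under different character sets, the following sketch uses only the standard java.nio.charset classes. The class and method names (EncodingDemo, byteCount, transcode) are invented for this example and are not part of the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Number of bytes the text occupies when encoded with the given character set.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encodes the text with one character set, then decodes the bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // "e" with an acute accent, written as a Unicode escape

        System.out.println(byteCount(text, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteCount(text, StandardCharsets.UTF_8));      // 2

        // Decoding with a different character set than the one used to encode garbles the text.
        String garbled = transcode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(text)); // false
    }
}
```

The garbled result is what a browser displays when it decodes a response with a character set other than the one the server used to encode it.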
Browsers use the character set specified by the HTTP Content-Type header to determine how to render the bytes into characters on the screen.
Common character sets include the following:
In Java, character sets are case insensitive, so the values Shift_JIS and shift_jis are equivalent.
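One way to confirm the case-insensitive lookup is with java.nio.charset.Charset, as in the following minimal sketch. The class and method names (CharsetNameDemo, sameCharset) are hypothetical. Shift_JIS is included in most JDK distributions but is not guaranteed to be present, so the sketch checks for it before using it.

```java
import java.nio.charset.Charset;

class CharsetNameDemo {
    // True when both names resolve to the same character set.
    static boolean sameCharset(String first, String second) {
        return Charset.forName(first).equals(Charset.forName(second));
    }

    public static void main(String[] args) {
        System.out.println(sameCharset("UTF-8", "utf-8")); // true

        // Only look up Shift_JIS if this JDK provides it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println(sameCharset("Shift_JIS", "shift_jis"));
        }
    }
}
```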
To use multiple character sets on a single page, use the UTF-8 character set encoding.
For a list of common character sets, see http://www.eleves.ens.fr:8080/home/madore/computers/unicode/cstab.html.
For a list of character sets supported by Internet Explorer, version 5 and later, see http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/charsets/charset4.asp.
For more information on the supported encoding character sets in Java, see http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html.
The Content-Type HTTP header defines the character set in which the response is encoded. The format for this definition in the Content-Type header is charset=character_set. This header also defines the MIME type, but that is not relevant to this section.
The client's browser reads the charset of this header to determine the response's encoding so that it can render the page correctly on the client's screen. When a client receives a response with a charset, it must be able to decode this character set. The browser must support the character set and the system must have access to the proper fonts in order for the output to be rendered correctly.
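To make the charset=character_set format concrete, the following sketch extracts the charset parameter from a Content-Type value, roughly as a client might. It is a simplified illustration, not a complete header parser: it assumes parameters are separated by semicolons, that there is no whitespace around the equals sign, and that quoted values contain no semicolons. The names ContentTypeDemo and charsetOf are invented for this example.

```java
import java.util.Locale;

class ContentTypeDemo {
    // Returns the charset parameter of a Content-Type value, or null if none is present.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // Parameter names are matched without regard to case.
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=Shift_JIS")); // Shift_JIS
        System.out.println(charsetOf("text/html"));                    // null
    }
}
```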
The servlet API provides the following convenience methods that set the Content-Type HTTP header:
response.setContentType
response.setLocale
This section describes these methods.
You can also call the setHeader method to set these HTTP headers explicitly.
For the response to use the proper encoding, you must define the content type before you call getWriter. By default, the PrintWriter that getWriter returns uses the ISO-8859-1 character set.
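The ordering matters because a writer is bound to a character set when it is created. The following sketch imitates that behavior with standard java.io classes rather than a real servlet response, so it is only an analogy for what a servlet container does. The names WriterEncodingDemo and write are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {
    // Writes text through a PrintWriter bound to the given character set
    // and returns the bytes that were produced.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The character set is fixed here, when the writer is created.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(buffer, charset));
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C"; // two Japanese characters, written as Unicode escapes

        // ISO-8859-1 cannot represent these characters, so each becomes a '?' byte.
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " bytes, first byte: " + (char) latin1[0]);

        // UTF-8 encodes each of these characters as three bytes.
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes");
    }
}
```

A writer obtained before the character set is defined behaves like the ISO-8859-1 case: characters outside that set are replaced rather than encoded.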
The setContentType method sets the MIME type of the response as well as the character set in the HTTP Content-Type header. The charset tells the client browser which decoding algorithm to use to render the page.
The following example sets the charset of the Content-Type header to Shift_JIS, a Japanese character set:
response.setContentType("text/html; charset=Shift_JIS");
PrintWriter out = response.getWriter();
The setLocale method sets the character set, the HTTP Content-Language header, and the HTTP Content-Type header of the response. The Content-Language header is often ignored by browsers. Using setLocale gives you more control over the language settings than using setContentType, but it requires you to provide a Locale object.
The following example defines the character set of the response object as Shift_JIS:
response.setContentType("text/html");
response.setLocale(new Locale("ja","")); //default for ja sets charset to Shift_JIS
PrintWriter out = response.getWriter();
You use the contentType attribute of the page directive to define character sets in JSPs, as the following example shows:
<%@ page contentType="text/html; charset=Shift_JIS" pageEncoding="Shift_JIS" %>
To use multiple character sets on a single page, use the UTF-8 character set encoding, as the following example shows:
<%@ page contentType="text/html; charset=utf-8" %>
If you write a JSP or servlet in a character set other than the one that you are compiling in, then you must take special care to ensure that the compiler transforms the class files into the proper encoding. This section describes how to change the encoding type when compiling your servlets and JSPs.
When you compile Java classes, add the -encoding option followed by the encoding name to tell the compiler which encoding the source file uses. If you do not provide the -encoding option, Java compilers default to the system's current encoding. The following command uses the -encoding option to compile a servlet whose source is saved in a Japanese encoding:
%> javac -encoding Shift_JIS -classpath c:/jrun4/lib/jrun.jar MyJapaneseServlet.java
The default encoding type is defined by the file.encoding system property. You can check your system properties using the getProperties method as the following code shows:
...
Properties properties = System.getProperties();
Enumeration keys = properties.propertyNames();
while(keys.hasMoreElements()) {
    String key = (String) keys.nextElement();
    System.out.println(key + " = " + properties.getProperty(key));
}
...
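If you only need the default encoding rather than the full property list, a shorter check along the following lines may be enough. This is a minimal sketch; the class and method names (DefaultEncodingDemo, defaultEncoding) are invented for the example.

```java
import java.nio.charset.Charset;

class DefaultEncodingDemo {
    // Name of the character set the JVM uses when none is specified.
    static String defaultEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property, as discussed above.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        // The character set the JVM actually resolved as its default.
        System.out.println("default charset = " + defaultEncoding());
    }
}
```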
JSPs use the pageEncoding attribute of the page directive to determine what encoding type to use when compiling.
The following example tells the JSP compiler that the page source is saved in the Japanese EUC-JP encoding, while the response is sent as UTF-8:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="EUC-JP" %>
The encoding of the page should match the type of encoding used when the page was saved in an editor.
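One way to catch an obvious mismatch between the declared encoding and the saved file is to decode the file's bytes strictly and see whether any errors are reported. The following is a sketch under that assumption, using the standard CharsetDecoder class; the names PageEncodingCheck and decodesCleanly are hypothetical, and the byte arrays stand in for the contents of a saved page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class PageEncodingCheck {
    // True when the saved bytes decode without errors under the declared encoding.
    static boolean decodesCleanly(byte[] saved, Charset declared) {
        try {
            declared.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(saved));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page saved as ISO-8859-1 that contains an accented "e".
        byte[] savedAsLatin1 = "\u00E9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodesCleanly(savedAsLatin1, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(savedAsLatin1, StandardCharsets.UTF_8));      // false
    }
}
```

A clean decode does not prove that the declared encoding is correct, because some character sets such as ISO-8859-1 accept any byte sequence. A failed decode, however, is a reliable sign of a mismatch.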