HTML Tip:
Coding International Pages
by Larisa Thomason,
Senior Web Analyst,
NetMechanic, Inc.
The term World Wide Web isn't just a nice-sounding alliteration. It's actually descriptive because approximately 60% of Web users are non-English speakers - and that number is growing quickly. Browser manufacturers understand that and provide non-English versions of their products. So do many large online portals and search sites. But what about your site? Is it ready for an international audience?
Unicode And Special Characters
English has a 26-letter alphabet, but that number doesn't count capital and lowercase forms separately, and it leaves out punctuation marks. Other languages use the same 26 letters - and a few extra in some cases - but add accent marks, tildes, etc. Then there are languages like Arabic and Hebrew: they use completely different alphabets and read from right to left instead of left to right. And how about Russian, Chinese, Japanese, and more?
That's where Unicode comes in. The Unicode Consortium released the first version of the Unicode character set in 1991 in an effort to create a global character set that could support all the world's languages. That's no small effort! Unicode began as a 16-bit character set, capable of supporting 65,536 different characters, and has since been extended to more than a million possible code points (U+0000 through U+10FFFF). In principle, each character in each language has its own unique code.
HTML 4 specifically adopted Unicode as its document character set. That means browsers use that character set to interpret and display characters that have special meaning in HTML (like quotation marks and brackets). Browsers also use Unicode specifications to display character entities like &amp;amp; for the ampersand sign, and numeric character references always refer to Unicode code numbers no matter how the file itself is encoded.
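As a rough illustration of how character references work, the fragment below writes the same characters as a named entity, a decimal reference, and a hexadecimal reference. The sample words are only placeholders chosen for this sketch; the numbers are the characters' Unicode code numbers.

```html
<!-- Named entity for a character with special meaning in HTML -->
<p>Fish &amp; Chips</p>

<!-- Three equivalent ways to write e-acute (U+00E9, decimal 233) -->
<p>caf&eacute; / caf&#233; / caf&#xE9;</p>

<!-- Cyrillic capital letter Zhe (U+0416, decimal 1046), decimal then hex -->
<p>&#1046; / &#x416;</p>
```

Because the references point at Unicode code numbers, this fragment should read the same whether the file is saved as ISO-8859-1, ISO-8859-5, or UTF-8, provided the visitor has a font containing the characters.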
With Unicode, you can include a combination of characters from different languages and feel sure they will display properly.
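Here is a minimal sketch of such a mixed-language page. It assumes the file is saved as UTF-8 and declares that encoding in a META tag; the greetings, the lang values, and the dir attribute on the Hebrew paragraph are illustrative choices, not requirements.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <!-- Assumption: this file is saved as UTF-8 -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Greetings</title>
</head>
<body>
  <p>Hello!</p>
  <p lang="es">¡Hola!</p>
  <p lang="ru">Привет!</p>
  <p lang="ja">こんにちは</p>
  <!-- dir="rtl" marks a paragraph that reads right to left -->
  <p lang="he" dir="rtl">שלום</p>
</body>
</html>
```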
Selecting A Character Encoding
HTML and XHTML commonly use a form of Unicode called UTF-8; it is the default encoding for XHTML documents. It's backwards-compatible because it encodes ASCII characters with exactly the same bytes that ASCII does. Older browsers that don't recognize Unicode are still able to display English characters and any other characters in the ASCII range, numbered 0 through 127.
However, UTF-8 may contribute to bloated file size if you're creating documents in Asian languages like Japanese and Chinese. Most of those characters take three bytes in UTF-8 but only two in a language-specific encoding such as Shift_JIS or GB2312, so the text portion of those files can be up to 50% larger using UTF-8 encoding.
Sometimes it is best to select a particular character set to display your page. The W3C Web site maintains a list of commonly used character sets.
You'll have to save your page using the proper encoding before you can declare the encoding for the browser. Check the Help menu of your favorite editor or word processing program for specific instructions.
Including A Character Encoding
Next, you'll have to specifically declare the encoding using a META tag in the HEAD section of your document. Make sure the encoding you declare in the META tag exactly matches the encoding you used to save the document! Otherwise, characters may not display as you expect and your page could be peppered with out-of-place question marks, square boxes, or other unusual characters.
You can also declare encoding with HTTP headers, but many low-cost Web hosting companies don't allow customers to modify header information.
Suppose you wanted to declare a Cyrillic character set for your document. You'd use the ISO character set for the Cyrillic alphabet like this:
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-5">
In this case, you could include both English and Russian characters on your Web page and be reasonably sure they'd display properly. If you fail to declare the character set, the browser will try to display the page by guessing what you intended.
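To show where the declaration sits in a whole page, here is a minimal sketch of a complete document. It assumes the file is actually saved as ISO-8859-5 in your editor; the title and body text are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="ru">
<head>
  <!-- Declare the encoding before any non-ASCII text, including the title -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-5">
  <title>Добро пожаловать</title>
</head>
<body>
  <p lang="en">Welcome!</p>
  <p>Добро пожаловать!</p>
</body>
</html>
```

Placing the META tag first in the HEAD is a cautious habit: the browser reads the declaration before it reaches any Cyrillic text, so it does not have to guess how to decode the title.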
The correct charset information creates both a better display and a better browsing experience for visitors. For instance, Internet Explorer 6 for Windows displays a pop-up box asking visitors to download support for the appropriate character set if the page uses characters not covered by the declared encoding.
With computer viruses and spyware so widespread now, people are reluctant to download anything from a Web page. They're more likely to just leave your page entirely.
If you're trying to create a top-notch, successful online business, it just doesn't make sense to create a site that's not user-friendly. Ideally, you want everyone in the world to visit your site and buy your products. But is your site really friendly to an international audience? Check your character set definition in your HTTP header or META tag to be sure!