Displaying International Fonts and Characters on the Web

How do you display web content in languages other than English?

The quick answer is to use Unicode for the web. Unicode is a standard that assigns a unique number to the characters of virtually all living languages. However, there are additional factors to consider, as discussed below.

NOTE: When using Unicode, appropriate fonts have to be available on each viewer's computer. Otherwise, the characters won't be rendered. (Test which Unicode characters are available on your system: characters with no matching font show up as rectangles or squares rather than as the intended glyphs.)

However, modern operating systems include a large assortment of fonts, which cover a range of languages and character sets. For all of the examples below (except Hindi and some Chinese characters, see notes), the correct characters appeared without special effort.
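
To run that test yourself, one option is a throwaway page that lists whole ranges of characters, opened in each browser you care about; any boxes point to missing fonts. The Python sketch below builds such a page. The ranges in "TEST_RANGES", the helper name "build_test_page" and the file name "font-test.html" are my own illustrative choices, not part of any tool mentioned here.

```python
import html
from pathlib import Path

# Illustrative ranges only (first and last code points are inclusive);
# add the scripts you care about.
TEST_RANGES = {
    "Cyrillic": (0x0410, 0x044F),
    "Hebrew": (0x05D0, 0x05EA),
    "Arabic": (0x0621, 0x064A),
    "Devanagari": (0x0905, 0x0939),
}


def build_test_page(ranges: dict[str, tuple[int, int]]) -> str:
    """Return an HTML page that lists every character in each inclusive range."""
    sections = []
    for name, (first, last) in ranges.items():
        chars = " ".join(chr(cp) for cp in range(first, last + 1))
        sections.append(f"<h2>{html.escape(name)}</h2>\n<p>{html.escape(chars)}</p>")
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
        "<title>Font coverage test</title>\n</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Save as UTF-8 so the bytes match the declared charset,
    # then open the file in each browser you want to check.
    Path("font-test.html").write_text(build_test_page(TEST_RANGES), encoding="utf-8")
```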

Using Unicode

Dreamweaver CS3 automatically adds the correct Unicode (UTF-8) character-encoding declaration to the HTML document head:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Also, for any language used, you need to set a "lang" attribute to the appropriate language code (such as "en", "fr", or "ru") for the page or for individual sections of the page. For example, a page primarily in English would have a language declaration like this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
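
These declarations only help if the file itself really is saved as UTF-8; a page that claims UTF-8 but is saved in another encoding will display its non-ASCII characters incorrectly. As a minimal sketch (the helper "xhtml_page" and the file name are hypothetical), here is how a page with both declarations might be generated and saved from Python:

```python
import html
from pathlib import Path


def xhtml_page(title: str, body_text: str, lang: str = "en") -> str:
    """Build a minimal XHTML 1.0 page declaring UTF-8 and the page language."""
    code = html.escape(lang)
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
        f'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{code}" lang="{code}">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n<p>{html.escape(body_text)}</p>\n</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    page = xhtml_page("French test", "Nous vous invitons à vous tenir debout avec nous.", lang="fr")
    # The <meta> tag only declares the encoding; the bytes on disk must match it.
    Path("test-fr.html").write_text(page, encoding="utf-8")
```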

Examples

These examples are based on a website, StandingWomen.org, that was built last year by a colleague. The site invited people around the world, in their own languages, to participate in local events to support global human rights awareness. (Rather than attempt the complexities of encoding and testing a large number of languages she doesn't speak, my colleague opted to display the messages in PDF format to retain more control over the fonts and formatting.) The site's message, with parallel translations in a wide range of languages, makes a convenient test of Unicode encoding.

English: lang="en"

  • Original text
  • Sample HTML text:
    • Please stand with us for five minutes of silence at 1 p.m. your local time on May 13, 2007, in your local park, school yard, gathering place, or any place you deem appropriate, to signify your agreement with the statement below.

French: lang="fr"

  • Original PDF
  • HTML test
  • Sample HTML text:
    • Nous vous invitons à vous tenir debout avec nous pour cinq minutes de silence à 13h (à votre heure locale) le 13 mai 2007, dans votre jardin public, votre cour d’école, votre endroit de rassemblement, ou tout autre endroit vous semblant approprié, pour signifier votre accord avec la déclaration ci-dessous.

German: lang="de"

  • Original PDF
  • HTML test
  • Sample HTML text:
    • Bitte stehen Sie mit uns gemeinsam am 13. Mai 2007 um 13.00 Uhr Ortszeit für fünf Minuten des Schweigens in Ihrem öffentlichen Park, Schulhof, Gemeindeplatz oder jedem anderen Ort, den sie für angemessen halten, um Ihrer Zustimmung zur unten zitierten Erklärung Ausdruck zu verleihen.

Russian: lang="ru"

  • Original PDF
  • HTML test
  • Sample HTML text:
    • Мы обращаемся к вам с призывом присоединиться к нам и провести пять минут молчания. Эта акция пройдёт 13-ого мая 2007-ого года в час дня по местному времени в местном парке, школьном дворе или в любом другом месте, которое удобно для вас.

Arabic: lang="ar"

    • ا اذهب انعم اوآراشُت , سمخ ةدمل انعم فوقولا ءاجرلاف يف تمص قئاقدلا ةعاسلا ةدحاوقوتب ِرهظلا دعب خيراتب يلحملا مكتي13 رايأ 2007 ,ةيلحملا ةماعلا ةقيدحلاب ,ةسردملا ةعاس , ناكم يا وأ عيمجتلا نآامأعت رخآتلاا اذهل نكتقفاوم نع اوريبعتل ًابسانم هنوربنلاع.
    • Correct directionality: see the original PDF

Hebrew: lang="he"

    • Original PDF
    • HTML test
    • Sample HTML text:
      • בואו לעמוד אתנו לחמש דקות של דממה ב 1:00 בצהריים – זמן מקומי ב 13 למאי, 2007 , בפארק המקומי, בחצר בית הספר, במקום כינוס ובכול מקום אחר שנראה לכן מתאים, כדי להראות את הסכמתכן עם ההצהרה הכתובה בהמשך.

NOTE: Some languages are written from right to left; Hebrew and Arabic are two examples. For these, the "dir" attribute has to be set as follows: dir="rtl"

However, in this test, the Arabic PDF original seems to have been improperly encoded for right-to-left display. When I apply right-to-left directionality in the HTML, the result does not match the appearance of the original text. The Hebrew text does not produce this kind of error when directionality is applied. (My guess is that instead of encoding directionality in the original Arabic document, the text was typed in reverse order and aligned right to give the correct appearance.)
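
Setting "dir" by hand works for a handful of pages. For generated content, one rough approach is to look at the first strongly directional character, a simplified form of the paragraph-direction rule in the Unicode Bidirectional Algorithm. The sketch below uses Python's standard "unicodedata.bidirectional"; the helper names "guess_direction" and "wrap_block" are hypothetical.

```python
import html
import unicodedata


def guess_direction(text: str) -> str:
    """Return "rtl" or "ltr" from the first strongly directional character.

    A simplified version of the paragraph-direction rule in the Unicode
    Bidirectional Algorithm: digits, spaces and punctuation are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-letter characters
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic, CJK, Devanagari, ...
            return "ltr"
    return "ltr"  # no strong character at all: fall back to left-to-right


def wrap_block(text: str, lang: str) -> str:
    """Wrap text in a <div> that carries both its language and its direction."""
    return (
        f'<div lang="{html.escape(lang)}" dir="{guess_direction(text)}">'
        f"{html.escape(text)}</div>"
    )


# Leading digits are skipped, so this Hebrew date fragment gets dir="rtl".
print(wrap_block("13 \u05dc\u05de\u05d0\u05d9", "he"))
```

Note that this only chooses the "dir" value. If the characters themselves are stored in reverse order, as I suspect happened with the Arabic PDF, no "dir" setting will repair them; the text has to be re-entered or re-exported in logical order.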

Chinese: lang="zh"

    • Original PDF
    • HTML test
    • Sample HTML text:
      • 请在2007年5月13日当地时间下午1点钟,和我们站在一起,静默五分钟, 在您当地的公园,学校,集会地点,或者你认为适当的任何地点, 表示你同意如下声明.

NOTE: In Internet Explorer, a number of the Chinese characters required in this sample may not be available. Sites such as Yahoo! China prompt users to download required fonts or language packs before entering.

Japanese: lang="ja"

    • Original PDF
    • HTML test
    • Sample HTML text:
      • 2007年5月13日の午後一時に、我々と一緒に五分間立ちましょう。

Hindi: lang="hi"

    • Original PDF
    • HTML test (flawed)
    • Sample HTML text:
      • Flawed sample:
        kRxpayaa duinayaa kxao bacaanao maoM hmaara saaqa dIijae| yaid Aapa inamnailaiKata inavaodna sao sahmata hOM taao taarIKa 13 ma[- 2007 kxao Aapakox samaya kox Anausaar daopahr 1 bajao, isaf-x 5 imanaT kox ilae ikxsaI ekx jagah par KaDo rihe| ikxsaI paak-x, skxUla kox maOdana yaa ikxsaI KaulaI jagah par, jaha^M Aapa {icata samaJaoM |
      • Alternate sample Hindi text (may not work in Firefox):
        आज सिरहाने

NOTE: I was not able to get an accurate version of the Hindi original when copying and pasting the source text for the test. However, various Hindi-language websites work in the Mac Safari browser and Internet Explorer and, with setting changes, can apparently be made to work in Firefox. I suspect the original PDF source does not use a Unicode-friendly font, but it could probably be converted to Unicode (see note 1) if the source format were known.
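
A quick way to check whether pasted text is genuine Unicode Hindi is to count how many of its letters fall in the Unicode Devanagari block (U+0900 to U+097F). Text copied from a legacy-font PDF usually comes out as plain Latin letters, like the flawed sample above. The helper name "devanagari_share" is hypothetical:

```python
import unicodedata

DEVANAGARI_BLOCK = range(0x0900, 0x0980)  # U+0900 to U+097F


def devanagari_share(text: str) -> float:
    """Fraction of the letters and marks in text that are Devanagari code points."""
    letters = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if ord(ch) in DEVANAGARI_BLOCK)
    return hits / len(letters)


print(devanagari_share("आज सिरहाने"))         # 1.0: genuine Unicode Devanagari
print(devanagari_share("kRxpayaa duinayaa"))  # 0.0: Latin letters from a legacy font
```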

Additional information:

1 Before Unicode was widely supported, people encoded languages and special characters for the web in any way they could. When Unicode caught on, there was cleanup to do. Devoted researchers and hobbyists developed conversion tools that "translate" specific languages from non-standard formats into Unicode. For example:
Greek Font to Unicode Converter: http://www.jiffycomp.com/smr/unicode-converter/
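
Converters of this kind generally work from a lookup table that maps each character or character sequence of the old font to the matching Unicode character. The sketch below shows that general technique with greedy longest-match lookup. The table is entirely hypothetical (an imaginary Greek font), not the mapping used by the converter linked above or by any real font, and "convert_legacy" is a made-up name:

```python
# HYPOTHETICAL table for an imaginary legacy Greek font that drew Greek glyphs
# on Latin keys. Real converters ship complete, font-specific tables.
LEGACY_TO_UNICODE = {
    "a": "\u03b1",   # alpha
    "b": "\u03b2",   # beta
    "g": "\u03b3",   # gamma
    "d": "\u03b4",   # delta
    "s": "\u03c3",   # sigma
    "s#": "\u03c2",  # final sigma, a two-key sequence in this made-up font
}


def convert_legacy(text: str, table: dict[str, str]) -> str:
    """Convert text by greedy longest-match lookup; unknown characters pass through."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first so "s#" wins over "s".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no match: keep the original character
            i += 1
    return "".join(out)


print(convert_legacy("abgd", LEGACY_TO_UNICODE))  # αβγδ
```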

Variation: Unicode by HTML Entity

In some situations, Unicode characters are represented by their numeric character references (for example, &#26597; for 查). This method makes the correct character appear on screen, but it takes up extra file space: each character is spelled out as its decimal digits plus the required syntax characters (&, #, and ;). In UTF-8, 查 takes 3 bytes; written as &#26597; it takes 8.

Example

Chinese character entity codes:

&#26597;&#25214;&#22312;&#32447;&#25903;&#25345;&#20449;&#24687;

which, when rendered, look like:

查找在线支持信息
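
Python's standard library can produce and decode these references, which also makes the size difference easy to check. In this sketch, "to_ncr" is a hypothetical helper name; the "xmlcharrefreplace" error handler and "html.unescape" are standard:

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # The standard "xmlcharrefreplace" error handler emits &#NNNN; for
    # anything ASCII cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "查找在线支持信息"
encoded = to_ncr(sample)
print(encoded)
# &#26597;&#25214;&#22312;&#32447;&#25903;&#25345;&#20449;&#24687;
print(html.unescape(encoded) == sample)          # True
print(len(sample.encode("utf-8")), len(encoded))  # 24 64
```

The eight characters take 24 bytes in UTF-8 but 64 bytes as numeric references, about 2.7 times as much.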

Other Methods

Inserted Images

In cases where only a few characters or words need to be displayed and font support is unlikely, images can be inserted, as long as they include appropriate "alt" attributes for accessibility and searching.

NOTE: Wikipedia uses a sophisticated version of this technique for hieroglyphs. (Although hieroglyphs could be represented using Unicode, "browser support for this is likely to be near non-existent.")
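
When generating such image tags, it helps to escape the alt text and to mark its language, since the "lang" attribute on an image also applies to its alt text. A minimal sketch, with the helper name "text_image" and the image path as hypothetical examples:

```python
import html


def text_image(src: str, alt_text: str, lang: str) -> str:
    """Return an <img> tag whose alt text repeats the pictured words as Unicode text."""
    # lang on the <img> tells browsers and screen readers what language the alt text is in.
    return (
        f'<img src="{html.escape(src)}" alt="{html.escape(alt_text)}" '
        f'lang="{html.escape(lang)}" />'
    )


print(text_image("images/hindi-heading.png", "आज सिरहाने", "hi"))
```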

CSS Image Replacement

Web developers often use CSS to hide HTML-based headings and replace them with image versions. (See almost any design on CSS Zen Garden for an example.) This technique allows for accessibility and search engine indexing, while giving maximum flexibility in font choice.

This is not a good solution for large amounts of text, as image text doesn't resize and large images can slow page loading. It also makes within-page searching difficult. (Remember, the text that is replaced by an image must still be Unicode text marked with the appropriate language to be accessible and searchable.) However, for special presentations, such as transcriptions of historic documents and fancy headings, it is a viable option.

Embedding Fonts

It is possible to "embed" fonts in a web page or CSS file. (However, Firefox 2 doesn't support this natively and requires a plug-in.) Embedding, in which font files are downloaded along with the page content, has the obvious downsides of slowing downloads for viewers and taking a toll on server resources.

NOTE: Transferring fonts to viewers also raises a question of copyright. See "Typographica: Embedded Web Fonts Return. Uh-oh." for a discussion.

Given that up-to-date resources on font embedding are scarce, it seems that improvements in Unicode support and international font availability have made embedding fonts less common.
