technically speaking

Will Unicode Kill Japanese Kanji?

by Steven Myers

Reports in the Japanese media, and my own discussions with Japanese researchers and engineers, make it clear that the Unicode character encoding standard is not very popular right now among Japan's computer users. (See this month's "Help Desk," page 9, for a primer on the basics of Unicode.) Much of the dissatisfaction, however, centers on a misunderstanding of the "Han unification" method used by the Unicode Consortium to eliminate redundant characters from the large pool of Chinese, Japanese, and Korean ideographs. While much criticism has been leveled at members of the Unicode Consortium (Microsoft, in particular) for including only 20,902 kanji in the Unicode code space, the complaints are largely unjustified.

A perceived cultural invasion

A recent article in the Japanese weekly magazine Aera quotes Ken Sakamura, an information science professor at Tokyo University, as proclaiming that "Japanese kanji are in danger of disappearing from the world's computers." Sakamura complains that Unicode's 20,902 kanji don't even come close to the 50,000 found in a Japanese kanji dictionary. This limited set of kanji is insufficient to cover the names of Japanese people and places, he argues.

"It is a cultural invasion," complains Takeshi Tamura, a Tokyo University professor of French literature, in the same article. Tamura charges that the Unicode developers went through the list of Japanese, Chinese, and Korean kanji, indiscriminately reducing similar-looking characters to a single character. "Kanji can't simply be lumped together," wails Tamura. "They must be separated into Japanese kanji and Chinese kanji. The importance of doing this is beyond the understanding of people belonging to an alphabet culture."

The Aera article (which is representative of the view expressed by many other Japanese writers regarding Unicode) puts forth the opinion that a group of American vendors, led by Microsoft, has interfered in the affairs of ISO (the International Organization for Standardization) and pressured other countries to accept Unicode, even though the standard has insufficient support for Japanese. This argument generally attributes Unicode's poor Japanese support to the fact that it was developed by American vendors seemingly intent on dictating which Japanese characters can and cannot be used.

Bashing a strawman

That type of article, which presents a one-sided view of the situation, seems calculated to turn public opinion against Microsoft and other American computer companies. First, the charges make it sound as though decisions about which kanji to include were made entirely by American companies, neglecting to mention that the work was in fact carried out by a group of ISO experts from many countries, including China, Hong Kong, Korea, Taiwan, Vietnam, and -- yes -- Japan.

Second, in performing the "Han unification," the Unicode developers were not trying to eliminate differences between the appearance of Japanese and Chinese kanji, as the Aera article suggests; the information about these differences is maintained in the fonts. The intent, rather, was simply to conserve coding space by removing these differences from the character encoding level. Four separate-but-similar regional forms may share a single code point in Unicode, but that character can still be displayed correctly in Japanese, simplified Chinese, traditional Chinese, or Korean text simply by using an appropriate font.
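To see what this means at the level of bits, consider a minimal sketch (written in modern Python purely for illustration, not as part of the standard itself) using U+9AA8, the unified ideograph for "bone," whose preferred glyph shape differs between Chinese and Japanese typography:

    # One unified code point; the regional appearance is a font choice,
    # not an encoding difference.
    char = "\u9aa8"                    # the "bone" ideograph
    print(hex(ord(char)))              # 0x9aa8 -- one code point for all locales
    print(char.encode("utf-16-be"))    # b'\x9a\xa8' -- identical bytes everywhere
    # Rendering the same string with a Japanese font or with a Chinese font
    # produces the locally expected shape of the character.

The bytes that are stored and exchanged are identical; only the font used to draw them differs.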

Finally, the Aera article makes it sound as though Unicode is complete and set in stone; in fact, new characters are being added all the time. The article fails to mention that nearly 30,000 of the 65,536 code points are as yet unfilled; the 20,902 kanji currently included are just a start. Also, the Consortium has reserved code points (the so-called surrogates) that can be combined in pairs to represent characters beyond the 16-bit range, allowing for further extension.
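For readers curious about how that extension mechanism works, the following sketch (again in Python, offered only as an illustration of the arithmetic, not as a definitive implementation) shows how a pair of reserved 16-bit values -- the "surrogates" -- can together address a character that lies beyond the 65,536-code-point range:

    # Surrogate arithmetic: the ranges 0xD800-0xDBFF and 0xDC00-0xDFFF are
    # reserved so that two 16-bit values can jointly encode one character
    # outside the basic 16-bit space.
    def to_surrogate_pair(code_point):
        """Split a code point >= 0x10000 into two 16-bit surrogate values."""
        offset = code_point - 0x10000
        high = 0xD800 + (offset >> 10)     # upper 10 bits of the offset
        low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits of the offset
        return high, low

    # U+20B9F, a kanji that lies outside the 16-bit range, is used here
    # purely to show the math.
    print([hex(v) for v in to_surrogate_pair(0x20B9F)])   # ['0xd842', '0xdf9f']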

The advantages of Unicode

Significantly, most Japanese publications say little about the benefits of Unicode. Scant mention is made of the fact that Unicode allows easy exchange of data across a mixed environment, such as a network containing both English and Japanese Windows workstations. Unicode also does away with the code-page model of language support, along with all the inherent difficulties of converting characters between different code pages. Furthermore, it manages all of this using a uniform 16-bit code.
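As a small, concrete sketch of that benefit (in Python, used here only for illustration), consider a Japanese string held internally as Unicode. In the code-page world, the same text takes a different byte form on every platform; with Unicode, one uniform representation serves them all:

    text = "日本語"                        # one string, held as Unicode characters

    # Code-page world: different byte forms on a Shift-JIS PC and an EUC-JP
    # Unix host, with a conversion table needed for every pair of code pages.
    sjis_bytes = text.encode("shift_jis")
    euc_bytes = text.encode("euc_jp")

    # Unicode world: one uniform 16-bit form, regardless of platform.
    utf16_bytes = text.encode("utf-16-be")
    print(sjis_bytes, euc_bytes, utf16_bytes, sep="\n")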

Of course, as with almost any "international" standard, it is impossible to please everyone. There has been research into alternative encoding methods at several Japanese universities, but the fact remains that no viable alternatives to Unicode as yet exist -- and the need for an international character set is only increasing. Many corporate PC users (both Japanese and foreign) who work in international environments need a means of exchanging data that they can use today. Unicode may not be perfect, but it is certainly a step in the right direction.

Fears that the implementation of Unicode will result in the loss of Japanese kanji are unfounded. History has shown that each new encoding method tends to increase, rather than decrease, the number of usable kanji -- and Unicode is no exception.

While proponents of Unicode extol the virtues of a universal encoding method, some Japanese critics see it as yet another instance of foreign (American) interference in Japanese affairs. Legitimate grievances concerning the handling of Japanese in Unicode may exist, but it is difficult to rationally discuss such problems as long as the standard's critics insist on using alarmist rhetoric and an "us against them" stance in presenting their case.




(c) Copyright 1996 by Computing Japan magazine