The Japanese Language Meets the Net.

by William Hall

"Develop English as the second language" is one of the controversial recommendations a blue-ribbon advisory panel made in a recent report to Japan's prime minister. Unless all Japanese acquire a working knowledge of English, the international lingua franca, the panel argues, Japan's future as a leading industrial nation is at stake. Pretty radical stuff.

At the same time, the results of a nationwide poll published in the January 10, 2000, edition of Yomiuri Shimbun, Japan's largest-circulation newspaper, show that more than 80% of Japanese adults consider that the Japanese language is in disarray (midarete iru).

A major debate about language policy and direction is under way in Japan, one that affects the fundamental fabric of society. As well as its societal implications, the end result will have significant impact on the growth of the Internet and on virtually all business in and with Japan. But, despite its importance, for the non-reader of Japanese, the substance and tenor of this debate are about as arcane as the chanting of a shaman in the Gobi desert.

This month I'll attempt to shed some light on this critical topic for non-readers of Japanese. In the usual manner, a published Japanese-language study will be used to illustrate certain points. But, for this to make sense to the non-reader of Japanese, it is necessary first to provide a brief overview of the Japanese language and its history. This will entail squeezing several thousand years of history as well as thousands of learned tomes on linguistics and the Japanese language into a couple of pages. A difficult task, and one that is destined to draw flak from specialists in various disciplines. But, in the interests of international understanding, let me venture forth.

Modern Japanese has not one but three writing systems. One is based on Chinese ideographs and is known as kanji in Japanese. The other two scripts, known as hiragana and katakana, are syllabic, with each script containing 46 basic symbols. Since kanji was the first of the writing systems and is the most complex, let us turn our attention initially there.

The Japanese language has traditionally been classified as a member of a group of languages known as Ural-Altaic, although that grouping is disputed among linguists. These languages stretch in a swathe across the northern arc of Eurasia, and include Korean, the languages of the various "-stans" (Uzbekistan, Turkmenistan, et cetera) as well as, interestingly enough, Finnish, the one language in Scandinavia that doesn't fit with the other Scandinavian languages.

Japanese is a polysyllabic language, meaning that most words in the original language are made up of more than one syllable. It was a spoken language with no written form until somewhere around the 6th century AD, when Chinese ideographs (usually known as "characters" in English) were brought across to Japan and used to write Japanese.

So how do you superimpose an ideographic writing system on a polysyllabic language? There are basically two ways -- you can bring across an ideograph for its meaning and have it represent a number of syllables, or you can bring across a number of ideographs for their sound and then have the sound of that combination of ideographs represent a word.

The Japanese, being Japanese, of course did both, which is just the beginning of our problems. As an example of the first case, let us look at the word for mountain, a two-syllable word in Japanese. The ideograph in Chinese is 山, pronounced "shan," while the Japanese word is yama (pronounced "yah-mah"). The ideograph was brought over to Japan and used for its meaning as the written form of mountain in Japanese. At around the same time, the Chinese pronunciation "shan" also came across to Japan, and was pronounced "san" in Japanese. Thus, there were now two ways in Japanese to read the ideograph for mountain -- "san" and "yama."

An example of the second case is the word sushi. This is often written with the two Chinese ideographs 寿司, which were chosen primarily for their sound and not for their meaning. The first is pronounced "su" and is an ideograph meaning long life, while the second ideograph is pronounced "shi" and is an ideograph meaning to govern. Thus, there is no direct relationship between the meanings of the ideographs and the word sushi.

Unfortunately, the complications don't end there. Sometimes two or more Japanese-style readings were applied to one Chinese ideograph, and, not to be outdone, sometimes two or more Chinese-style readings were also applied to one Chinese ideograph. To further complicate matters, a very large percentage of modern Japanese vocabulary is made up of compounds (combinations of two or more ideographs), leading to a mind-boggling number of possible readings.

Let us take an example of the combination of two very simple ideographs: 上, which is a horizontal line with two smaller stick lines above the horizontal line, and 下, which is a horizontal line with the same two smaller stick lines below the horizontal line. As might be surmised, 上 is used in words which have meanings of above, upper, superior, and so on, while 下 has the opposite meanings. The combination of these two ideographs can be read in modern Japanese in at least six different ways (joge, shoka, kamishimo, ageoroshi, agesage, noborikudari), with each having slightly different meanings.

William Hall is president of the RBC Group, which provides market research and consulting services to foreign clients in Tokyo.

An educated adult in modern Japan is normally able to read, without the aid of a dictionary, both the Chinese and Japanese readings of some 2,500 to 3,000 individual characters and some 20,000 to 30,000 compounds. A Japanese character dictionary approximately equivalent to Webster's New World Dictionary or the Concise Oxford English Dictionary contains about 5,000 to 6,000 characters, each with Chinese and Japanese readings, plus some 70,000 compounds. It is no wonder therefore that Japanese students are still learning new kanji in the final year of high school, and that 22% of Japanese students attend a juku after school to improve their Japanese (see Japan Studies, March 2000).

The above examples have merely scratched the surface of the complexities involved in kanji. Other complications include the existence of two or more variations in the way to write many of the kanji, the almost limitless number of ways to read characters in personal and place names, and the existence of kanji created in Japan that don't exist in China.

But in the interests of time and space let us move on to the syllabic scripts, hiragana and katakana. All Japanese was originally written in kanji when writing was introduced to Japan, but, over time, hiragana and katakana developed as a form of shorthand style for writing kanji. Although hiragana and katakana cover the same 46 syllables, hiragana -- ひらがな -- is a rounded, flowing type of script, while katakana -- カタカナ -- is a more square, boxy-looking one.

With few exceptions, women were not educated in Chinese writing in the bad old days some 1,200 years ago, and the small group of women able to write usually did so in hiragana. Indeed, one of the world's great novels, The Tale of Genji, was written by a woman, primarily in hiragana, in the early 11th century, preceding, by some 700 years, Fielding's Tom Jones, which is often considered to be the first English-language novel.

Katakana was the syllabic script preferred for official documents, and this differentiation continued up until recently. In modern Japanese, katakana is now used primarily for foreign words, of which there is a rapidly increasing number. Hiragana is used for those elements not written in kanji or katakana.

The above discourse is but a cursory overview of the complexities of a language that the first Christian missionaries to Japan in the 16th century described as the "devil's language." If nothing else, however, it should provide, for those unfamiliar with the Japanese language, an understanding of the enormity of the achievement of the Japanese educational system, which claims literacy rates of around 98%. Contrast that with the current literacy problems in the US, where there are only 26 letters in the alphabet to contend with!

Let us now fast forward to the modern world of computers and the New Economy. For a language like Japanese, there are a number of obstacles to overcome in order for it to keep up in the warp speed of the Net economy. First and foremost, how does one get all those kanji onto a keyboard, and from there onto the computer screen, without creating a keyboard that is 100 yards long and 50 yards wide?

The initial approach was to use one of the syllabic scripts to type in sounds corresponding to the pronunciation of a kanji. More recently, the standard QWERTY keyboard is used, and sounds are typed in romaji, the method of writing Japanese in Roman characters. Thus, Japan now has a fourth writing system to go with kanji, hiragana, and katakana.

When the sounds are typed in, all kanji with a matching pronunciation appear on the screen. The typist then chooses, from among those candidates, the specific one he or she is looking for.

In contrast to Chinese, where tones are used to distinguish the monosyllabic characters that have a similar reading, kanji readings have very few sound variations. This has led to the birth of a large number of homophones (words that sound alike) in Japanese. Thus, for the sound koh-ka, there are at least 18 possible combinations of characters that can appear on the screen.

This approach of laboriously picking out the desired kanji from the offerings on the screen is still too slow for the modern world, so more recent software programs show characters in order of usage frequency, or select characters based on context. Work is under way also to develop programs that learn the kanji and writing style favored by the typist.
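As a rough illustration of the idea, the following Python sketch orders the candidates for one reading by usage frequency and then lets the typist's own past choices override that order. It is a toy under stated assumptions, not a description of any actual input software: the class name CandidatePicker, the romaji spelling "kouka" for koh-ka, and all the usage counts are invented for the example. Only the five words themselves (effect, expensive, coin, school song, descent) are real homophones.

```python
from collections import defaultdict

# Hypothetical mini-dictionary: reading in romaji -> {written form: usage count}.
# The words are real homophones; the counts are made up for illustration.
CANDIDATES = {
    "kouka": {"効果": 900, "高価": 400, "硬貨": 250, "校歌": 120, "降下": 100},
}


class CandidatePicker:
    def __init__(self, dictionary):
        self.dictionary = dictionary
        # How often this typist has picked each form for each reading.
        self.user_counts = defaultdict(int)

    def candidates(self, reading):
        """Return the written forms for a reading, most likely first."""
        forms = self.dictionary.get(reading, {})
        # The typist's own history ranks first, general frequency second.
        return sorted(
            forms,
            key=lambda form: (self.user_counts[(reading, form)], forms[form]),
            reverse=True,
        )

    def choose(self, reading, form):
        """Record that the typist selected this form for this reading."""
        self.user_counts[(reading, form)] += 1


picker = CandidatePicker(CANDIDATES)
print(picker.candidates("kouka"))   # frequency order: 効果 comes first
picker.choose("kouka", "硬貨")       # the typist picks "coin" once
print(picker.candidates("kouka"))   # 硬貨 now comes first
```

Selecting by context, the other approach mentioned above, would need information about the surrounding words and is beyond this sketch.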

Another issue for an ideographic script in the modern age is how to incorporate into the Japanese language new words and concepts that did not exist thousands of years ago when the Chinese writing system developed: automobiles, cameras, or computers, for example.

In the pre-World War II period, there was a heavy emphasis by the government on trying to Japanify things brought in from the West, perhaps best typified by the slogan wakon yosai (keeping the Japanese spirit while learning from the West). Thus, new compounds were invented for almost all new products and ideas coming from overseas. And so we have the word camera written with the combination of three characters: 写真機 (copy, reality, and machine), which is a pretty good description of a camera. (The more perspicacious of our readers might claim that there were no machines around thousands of years ago, and in fact the character for machine was originally used for a weaving loom. But in modern Japanese it has increasingly been used in a suffix-like form at the end of a compound with the meaning of machine. Thus, airplane is 飛行機, or fly, go, machine.)

Particularly since the end of the war, there has been a torrent of new words, primarily from English, coming into Japanese, and, increasingly, these have been brought into the language via katakana rather than by creating a new compound for the new word from existing characters. Which brings us to the very important topic of the ability of the Japanese populace to cope with this tidal wave of English and other foreign language imports.

Japan now has the fastest-aging population in the world and has a decreasing percentage of young people in its population (see Japan Studies, February 2000). Those persons who are 50 years and over now make up almost 40% of the total population, but they have had limited opportunities to learn English. In fact, those educated before and during World War II were for all intents and purposes forbidden to learn English.

How are they coping? A study published in April 1999 by the Japanese Government Agency for Cultural Affairs provides some fascinating insights on this topic. The study ("Opinion Poll Regarding the Japanese Language") was a nationwide random sample of a total of 2,200 Japanese males and females aged 16 and over.

Japanese government agencies are increasingly having to make choices about which words to use in official documents and announcements. This is particularly the case for concepts or products new to Japan: informed consent for clinical trials for new drugs, barrier free for ease of access for elderly/handicapped persons in public buildings, and so on.

In the current study, respondents were shown a card with eight such words on it, with each word being written in both a kanji version and a katakana equivalent. Respondents were then asked whether the meaning of the word was easier to understand from the kanji version or from the katakana version. Overall, for six of the eight words, the kanji version was considered to be easier to understand, with the other two words having a slight majority in favor of the katakana version. The results of the preferences are shown in Table 1, exclusive of that percentage of respondents who stated that the kanji and katakana versions were about the same in terms of ease of understanding.

Of particular note, however, are the significant differences in ease of understanding by age group for those words where katakana scored relatively high. The data in Table 2 for males, for example, show the big difference between the young and the old in the understanding of katakana. The differences are even more pronounced for females.

Another question covered whether respondents remembered ever having seen a particular katakana word and whether they understood the actual meaning of it. Examples of katakana words that respondents were shown included kajuaru (casual), intahnetto (Internet), and onbuzuman (ombudsman).

For the total sample, recall of having seen the word casual in katakana is at 93%, with understanding of the meaning at 74%. Recognition of Internet is very high at 96%, but understanding of the word drops to 73%. And ombudsman has a recognition level of only 66% and understanding of only 38%, perhaps not surprising considering that the word is Scandinavian in origin.

Once again, there are big differences by age, with a particularly large drop-off in understanding among those aged 60 years and over. For the word Internet, for example, recognition in the 60-and-over group is 92% for males and 84% for females, but understanding drops to only 57% for males and 43% for females. In contrast, for the under-30s, recognition is 98 to 100% for males and females, and understanding is 85 to 89%.
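The gap between having seen a word and knowing what it means can be put in percentage points with simple subtraction. The short Python sketch below does this using only the figures quoted above. The labels and the helper name gap are my own, and the under-30 figures are left out because the text gives them only as ranges.

```python
# (recognition %, understanding %) as quoted in the text.
results = {
    "casual, total sample": (93, 74),
    "Internet, total sample": (96, 73),
    "ombudsman, total sample": (66, 38),
    "Internet, males 60+": (92, 57),
    "Internet, females 60+": (84, 43),
}


def gap(recognition, understanding):
    """Percentage points of respondents who have seen a word but do not understand it."""
    return recognition - understanding


gaps = {label: gap(*pair) for label, pair in results.items()}
for label, points in gaps.items():
    print(f"{label}: {points} points")
```

On these figures the gap for Internet widens from 23 points in the total sample to 35 points for men and 41 points for women aged 60 and over.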

The above cases of katakana are examples of simple Japanization of the pronunciation of foreign words brought into the language. But when the imported word or concept is rather lengthy, it tends to get abbreviated, or a clipped compound is formed from the first parts of two imported words. For example, word processor becomes a compound pronounced "wah-pu-roh" (the "wah" being the first part of the Japanized pronunciation of word ("wah-do"), and "pu-roh" being the pro in processor).

A recent example of this clipped-compound type, sekuhara, is used for an issue that has become topical of late in Japan -- sexual harassment. Risutora is used as an abbreviation for restructuring, while rihabiri is used for rehabilitation, a critical word for an aging population. However, once again the study shows significantly lower scores among the older generation even for the word rihabiri.
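The shortened forms mentioned here follow a fairly regular pattern that can be sketched in a few lines of Python: split the katakana word into its rhythmic units (morae) and keep only the first few. This is an illustration under assumptions rather than a rule stated in the study. The function names are hypothetical, the katakana spellings are the usual ones as far as I know, and the choice of two units from each word (or four from a single word) is simply what fits these four examples.

```python
# Small kana that attach to the preceding kana to form a single unit.
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_morae(katakana):
    """Split a katakana word into morae (rhythmic units)."""
    morae = []
    for ch in katakana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch       # e.g. シ + ャ -> シャ
        else:
            morae.append(ch)      # ー, ッ and ン each count as one unit
    return morae


def clip(word, n):
    """Keep only the first n morae of a word."""
    return "".join(split_morae(word)[:n])


def clip_compound(first, second, n=2):
    """Join the first n morae of each of two words."""
    return clip(first, n) + clip(second, n)


print(clip_compound("ワード", "プロセッサ"))       # ワープロ (wah-pu-roh)
print(clip_compound("セクシャル", "ハラスメント"))  # セクハラ (sekuhara)
print(clip("リストラクチャリング", 4))             # リストラ (risutora)
print(clip("リハビリテーション", 4))               # リハビリ (rihabiri)
```

Not every abbreviation in Japanese follows this pattern, so the sketch should be read as a description of these examples only.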

So where does this leave us? The older generation, like that in most countries, has difficulty in adapting to computers, but has the additional handicap of having to learn to type in romaji and absorb a flood of new katakana words. This could lead to a significant gulf between the generations, and slow down the inroads made by the Net, at least a Net that primarily utilizes a computer and a keyboard. Look, therefore, for rapid advances in voice recognition technology.

There is also the possibility of a backlash from some sectors of society. Already there have been petitions from writers and scholars to the government about attempts to impose a single standard for kanji in computers. Windows 98 reportedly recognizes only some 20,000 kanji and kanji variants. This is considered insufficient to represent characters used in historical documents, place names, and even some modern literature.

Thus, there is a concern about an inability to search for and read literature on the computer, and the possible loss of cultural heritage. An emotional topic. This is a qualitatively different issue from the French government trying to keep English words out of the French language. At least in that case the form of writing is not lost. Fortunately, a new operating system called Cho Kanji was released in late 1999 that is capable of handling some 40,000 characters, plus an additional 90,000 variants thereof. This is based on the TRON operating system developed by Professor Ken Sakamura at the University of Tokyo. And thus, we may have peace on the kanji front. But the katakana influx is likely to increase.

