A Primer for Understanding Asian-language Computing

by John Drake

In 1993, O'Reilly & Associates published a volume in its international programming series that quickly became an essential reference for anyone developing or adapting software to handle Japanese text: Ken Lunde's Understanding Japanese Information Processing. That book is still useful, but half a decade spans several generations in the modern PC era. As Lunde himself observes, with considerable understatement, "a lot has changed since then." Has it ever, and not just in hardware!

Internationalization and localization have become industry buzzwords. And it's no longer just localization for the lucrative Japanese market that has engaged the attention of the world's software developers. The vast, largely untapped Chinese- and Korean-language consumer and business markets represent a revenue pasture just as green as the now-stagnating Japanese market.

Now, Lunde gives us a second edition of his seminal work, expanded and retitled Understanding CJKV Information Processing. The unwieldy "CJKV" acronym in the title stands for "Chinese, Japanese, Korean, and Vietnamese": essentially, the main Asian languages that use double-byte character sets rather than the single-byte character sets used by the major Western languages.

This new edition is far more than the first with new material tacked on. Besides covering three new languages (or four, if one counts the simplified hanzi of mainland China and the traditional hanzi of Taiwan separately), Lunde has completely revised the Japanese portions of the text as well. Understanding CJKV Information Processing covers the multiple, largely incompatible encoding systems currently in use, the differing character sets and strategies for converting between them, locale-dependent input methods, Unicode support, and much more.
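To see what "largely incompatible encoding systems" means in practice, the short sketch below encodes the same Japanese word in several of the encodings the book discusses. It uses nothing from Lunde's book; it relies only on Python's built-in codecs (`shift_jis`, `euc_jp`, `iso2022_jp`, `utf-8`). The sample string and the helper names `encode_all` and `convert` are hypothetical choices made for this illustration. The example shows two things. First, identical text yields different bytes under each encoding, with two bytes per kanji in Shift_JIS and EUC-JP and three in UTF-8. Second, the simplest conversion strategy, assumed here, is to decode to Unicode and then re-encode.

```python
# Illustrative sketch (not from the book): one Japanese string, several encodings.
# SAMPLE, encode_all and convert are hypothetical names chosen for this example;
# the codec names are standard Python codecs.
from typing import Dict

SAMPLE = "日本語"  # "Japanese language": three kanji from JIS X 0208

CODECS = ["shift_jis", "euc_jp", "iso2022_jp", "utf-8"]


def encode_all(text: str) -> Dict[str, bytes]:
    """Return the byte sequence for `text` under each codec in CODECS."""
    return {codec: text.encode(codec) for codec in CODECS}


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between encodings by decoding to Unicode and re-encoding.

    Raises UnicodeError if the input is not valid in `src` or if a
    character has no mapping in `dst`.
    """
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    for codec, raw in encode_all(SAMPLE).items():
        # Same three characters, different byte counts and values per codec.
        print(f"{codec:>10} ({len(raw):2d} bytes): {raw.hex()}")
    sjis = SAMPLE.encode("shift_jis")
    print(convert(sjis, "shift_jis", "euc_jp") == SAMPLE.encode("euc_jp"))
```

Going through Unicode keeps the sketch short. It does assume that every character exists in both source and target character sets. Real-world conversion between vendor-specific variants of these encodings raises exactly the mapping questions the book covers at length.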

The prepublication review copy that I received through Computing Japan in January runs to over 950 pages, with some parts yet to be written. Little wonder, then, that O'Reilly has decided to split this massive tome into two separate volumes. And given the general mindset of the book publishing industry, O'Reilly's choice of that route, instead of simply asking Lunde to trim the text to a more manageable size, is a clear indication of the content's great value.

Volume 1, which will likely bear the name Understanding CJKV Information Processing, will contain 13 chapters plus 8 appendices (some 600 pages). Volume 2, unnamed as of this writing (possibly CJKV Reference Supplement), will comprise some brief introductory material and 16 appendices (about 350 pages), including code and notation conversion tables, character lists, and character set tables.

Just to give an inkling of the scope of Lunde's effort, Volume 1 contains platform-independent discussions of the writing systems and character sets of each language, input and output methods, encoding methods, code conversion techniques, font formats, keyboard arrays, algorithms, useful data processing tools, e-mail, programming, and more. The range of topics is broad, and each is treated comprehensively and in depth.

If you deal in any way with the nitty-gritty of Asian-language text processing, you'll kick yourself later if you don't go out and buy these two volumes as soon as they hit local bookstore shelves. And even a typical end user with some curiosity about the challenges of creating multilingual software should find Volume 1 both entertaining and informative. (The last three chapters, on Dictionaries and Dictionary Software, The Internet, and The World Wide Web, are especially packed with valuable "I-can-use-that" information and contact lists.)

The first edition was called by one reviewer "the bible of those using Japanese on the Internet." To adopt and extend the metaphor, this revised and expanded second edition brings us "the new testament."

Ken Lunde. Understanding CJKV Information Processing, 2nd edition. O'Reilly and Associates, Inc.
Scheduled for April 1998 publication; ISBN 1-56592-224-7.

