Localization



Localization: it's not exciting, but it can be profitable. The art of converting software and manuals from English to Japanese enabled foreign companies to sell over Y70 billion ($640 million) in 1993, a 33% share of the total Japanese PC package software market (worth Y211 billion, or $2 billion). What is exciting for foreign software developers is the fact that the market is expanding by some 10% per year. IDC (International Data Corporation) estimates that in 1994/95, the Japanese software market will climb to more than Y235 billion ($2.24 billion).

The bad old days

Before the arrival of the 68030 Macintoshes and System 6.0.7 in the late 1980s, and later the X/Window System and Microsoft Windows 3.1J, the effort of rewriting an English-language program for the Japanese market was a task for the very few with deep pockets and strong commitment: companies such as Microsoft, Lotus, and IBM. As an example of the commitment and resources required, Lotus spent three full years and Y300 million ($1.25 million at that time) localizing and refining the first version of Lotus 1-2-3 (released in 1988).

For the PC market, the biggest problem in localizing for Japan was that DOS got off to a fractured start, with more than a half-dozen major hardware platforms. For a long time, DOS was the only small system platform with a large enough user base to support the volumes of software necessary to justify the costs of localization. The difficulty of porting software effectively discouraged all but the most enthusiastic, giving the Japanese market the reputation of being just "too hard."

In the bad old days of DOS, through the late eighties, a company had to know and understand intimately the hardware and operating system (OS) idiosyncrasies of the target machine, usually the NEC 9800 series, which then had over 70% of the Japanese market. NEC, however, was infamous for not releasing information about its machines' architecture and OS differences. Companies such as technical publisher Softbank made good profits by creating a library of "hacker" publications that revealed how the NEC PC really worked.

NEC's hoarding of technical information and a lack of competent PC engineers who understood the machine were major obstacles for a foreign company trying to port an IBM AT-type program to the NEC 9800. The problems ranged from unique floppy disk formats and different logical screen sizes to nonstandard use of video and DOS high memory and anomalous keyboard handling. It was not unusual for a company to invest 12 months or more of intensive technical effort to bring out a Japanese version.

A change in the air

The first hints of change came when Apple beefed up its Macintosh range to 68030 CPUs in July 1989. With this move came not only the ability to load large kanji dictionaries into RAM, but also the processing power to make real applications such as desktop publishing (DTP) software work in Japanese.

The X/Window System X11R5 for UNIX was introduced in the early 1990s. With it came the many utilities produced by dedicated academics worldwide to support kanji: kterm (virtual terminal), kinput2 (front-end processor), Wnn and Canna (kanji dictionaries), nkf (kanji code converter), and xjdic (online Japanese dictionary). The X/Window System -- and its derivatives Open Windows, Motif, etc. -- is now the major environment of choice for UNIX.

Probably the decisive event in the bridging of East and West in terms of computing technology, though, was the introduction of Microsoft's Windows 3.1J in May 1993. Windows sales have experienced dizzying growth and now account for 37% of new sales (versus 54% for DOS). This is up from only a 10% share just one year earlier.

Great expectations

Expectations in 1994 have changed dramatically. The simultaneous release of a product in Japan and the US is now the goal of most major foreign software publishers. Even in the smaller firms, any delay in a product's Japanese release is usually due to the preparation of documentation or the inclusion of special culture-specific features not available in the English language version.

What has brought about the change is the advent of common OS environments that hide the hardware differences between Japanese and non-Japanese machines from the programmer. As Huw Rogers of Fusion Systems Japan comments about UNIX X11R5-compatible applications, "If you write an application well so that it is portable, you can write that application in English, Japanese, whatever, then retarget it for another language on a Sun, HP, SGI, DEC, or [other system]." This can be done without significant changes to the actual code itself and, best of all, it doesn't cost the Earth.

Windows programmers will be happy to know that if a Windows application has been developed for native Windows 3.1, then the application will in principle run on both NEC and DOS/V Windows 3.1 platforms. (Applications that need to access the BIOS, such as Norton Utilities, are obviously still machine-dependent, however.) Windows 3.1J has solved a huge number of problems, and because of this the most pressing issue is now the quality of the local version.

Still room for mistakes

The Japanese package software market has the potential to be a lucrative one for foreign publishers. Despite an installed base of over 15 million computers in Japan, consumers bought only Y211 billion of packaged software in 1993. (The low figure is attributable to user preference for running only one or two packages on their PCs, and the tendency of Japanese corporations to buy customized software.) Analysts are bullish about packaged software sales for 1994, however, especially with the surge of Macintosh sales over the last two quarters. IDC predicts growth in the packaged software industry of about 11.3% for 1994.

Because of traditionally high setup costs coupled with distribution and marketing difficulties, foreign software firms would commonly team up with local software distributors and re-publishers to get their product into the market. In a lot of cases, localization was paid for by the Japanese partner, who in return received a royalty for the sale.

This is changing as the size of the market grows, however, with foreign software companies seeking to control their own profits. Many are now opting to be in Japan themselves rather than have products localized on their behalf. While removal of the main technical obstacles may have made localization easier, it doesn't save the rushed businessman from making some basic mistakes. In particular, planning is key to every localization project. The general consensus is that the sheer volume of translation work and the tight deadlines in a typical software localization project offer great scope for screw-ups. Therefore, planning should start as early as practical -- even before the software is produced.

In the past, without a well-defined internationalization and localization plan, software developers have had headaches with how to implement double-byte codes in their products. For UNIX in particular, double-byte enabling of programs took a tremendous amount of time. With the big growth here in Asia, however, global software companies are paying more attention to double-byte handling, and today internationalization and localization guides are available from Sun, Apple, IBM, DEC, Microsoft, and other leading software companies.

The issue of quality

Localization was once so closely tied to technical expertise that most companies had to do their own in-house translations. "It was really an issue not of whether the product was of good quality, but of whether you could even have the product in the local language and for the local platform at all," says Berlitz's Michael Shannon. "However, quality is now becoming an issue.... Since most of the technical issues have gone away, thanks to the unified interfaces of Windows, UNIX, and the Mac, the look and feel of a product has become the main focus."

Although the Japanese consumer has always insisted on quality, companies could often get away with quick and dirty fixes when the Japanese software market was still in its infancy. The dramatic expansion in Windows' market share, however, has seen customer expectations increase at a similar rate. This has brought in numerous third-party localization companies -- both software publishers who now find it easier to do someone else's titles and documentation translation companies that are discovering the technical know-how and confidence to recompile a program file after translating the text strings it contains.

Quality and attention to detail are vital ingredients in the success of a software company in Japan. One well-known story recounts how Lotus added an option to silence the beeps normally output by its 1-2-3 package when an error is made, because in a crowded Japanese office the beeping would cause embarrassment by broadcasting the mistakes to coworkers. Other examples include the localization of actual examples or tutorials (such as changing a help example featuring Mr. Mike Smith to one focusing on Tanaka Taro-san).

Overview of a typical project

Programs slated for "Japanization" range from multi-megabyte databases and full-motion adventure games to tightly coded utilities of only a few kilobytes. To identify industry expectations and help orient the newcomer, it may be instructive to discuss a "typical" localization project -- in this case, a make-believe Windows product that produces flowcharts. A product like this might contain program files having from 5K to 10K of text, including menus, dialog boxes, and error messages. Further, it would probably have Help files with about 30K of text and a manual of 100 pages.

Whether the localization project is for a UNIX, DOS/V, or Mac platform, the methodology consists of the same four basic steps:

First is the planning and consultation phase, which should establish expectations and responsibilities between the customer and the localization team. Especially when two cultures are involved, it is incumbent upon the localization team project manager to emphasize the need for quick interaction. Software is a creative undertaking, so the customer will want close involvement with how the product looks; this makes the timeline totally dependent on receiving feedback and approval from the customer at each stage. Blow-outs because the client holds the purse strings but didn't meet their own deadlines are a sure way to wreck a prospering localization business. During this step, an experienced localization team can give pointers to the customer concerning country-specific changes: things such as yen values, Heisei dates, Japanese text wrapping, and handling of vertical text.

The second step consists of building a glossary of terms to ensure that the vocabulary used remains consistent throughout both the software and the manuals. The importance of this step becomes apparent on a large project, with as many as 20 manuals being produced by five or more translators. (A proper level of care taken at this stage will bear even more importance when future software versions are taken into account.) Well-organized companies will already have a glossary in English, in which case the project manager can simply send the text out to a competent translation company. Developers working in the Microsoft Windows environment can also make use of the Microsoft Windows software developer's kit (SDK), and Apple specialists can use Apple's Macintosh developer's guide (both available in English and Japanese).

The third step is the actual translation of the text strings in the program and help files. This involves scanning the compiled application, converting the strings to the target language, then recompiling with a version of software native to the target machine. The company doing the translation is usually expected to receive the software in compiled form, do the translation, then return the material also in compiled form. Many companies provide their own tool kits for this. The three major requirements of the translators are: accuracy, tailoring screens so that the Japanese text fits in the space set aside for it, and making the translation appropriate for the target audience. "Language is subjective," Shannon notes. "You might get a senior person in the company who may be more comfortable with text [having] a lot of kanji. However, when we are doing Windows products, we like to target the actual users, typically younger people who want to see a style which is more refreshing."

That brings us to step number four, testing the compiled software to make sure that the text alterations did not damage the code, and that the replacement Japanese text string lengths do not exceed the display buffer sizes. This step is usually done by the software publisher.
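The length check in this testing step can be partly automated. The following is a minimal sketch, not a description of any particular publisher's test suite: it assumes the translated strings are stored in EUC-JP, and the names eucjp_columns() and fits_field() are hypothetical helpers introduced here for illustration. It counts display columns (two per kanji, one per ASCII or half-width kana character) and checks both the byte buffer and the on-screen field width.

```c
#include <stddef.h>
#include <string.h>

/* Count display columns in an EUC-JP string (assumed encoding).
 * ASCII and half-width kana occupy one column; JIS X 0208 and
 * JIS X 0212 characters occupy two. Returns -1 on malformed input. */
int eucjp_columns(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int cols = 0;

    while (*p != '\0') {
        if (*p < 0x80) {                       /* ASCII: 1 byte, 1 column */
            cols += 1;
            p += 1;
        } else if (*p == 0x8E) {               /* half-width kana: 2 bytes, 1 column */
            if (p[1] < 0xA1 || p[1] > 0xDF)
                return -1;
            cols += 1;
            p += 2;
        } else if (*p == 0x8F) {               /* JIS X 0212: 3 bytes, 2 columns */
            if (p[1] < 0xA1 || p[1] > 0xFE || p[2] < 0xA1 || p[2] > 0xFE)
                return -1;
            cols += 2;
            p += 3;
        } else if (*p >= 0xA1 && *p <= 0xFE) { /* JIS X 0208: 2 bytes, 2 columns */
            if (p[1] < 0xA1 || p[1] > 0xFE)
                return -1;
            cols += 2;
            p += 2;
        } else {
            return -1;
        }
    }
    return cols;
}

/* Hypothetical helper: nonzero if the string (plus its terminating NUL)
 * fits in buf_size bytes and in field_cols display columns. */
int fits_field(const char *s, size_t buf_size, int field_cols)
{
    int cols = eucjp_columns(s);
    return cols >= 0 && strlen(s) + 1 <= buf_size && cols <= field_cols;
}
```

A test script could run fits_field() over every translated string together with the buffer size and field width taken from the original English resource, and flag the failures for the translators before any manual testing starts.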

Scheduling

A newcomer to localization stands little chance of getting a workable breakdown on times and costs from general sources. Looking at our hypothetical flowcharting program, the scheduling for a project based on current industry expectations might be as follows:

1. Planning/Kick-Off Meeting: Before starting the localization process, the organizational aspects must all be prepared. First, a project manager must be chosen, who will then help set up the localization team of translators, engineers, and publication staff. Together with the client, this team must draw up a viable set of guidelines defining the project and the client's expectations. To smooth communication between the client and team, a means of communication and technical contacts should be established.

2. Glossary/Terminology Development: In many cases, English glossaries will be available for the actual product. If not, then one must be compiled and translated. This involves putting the glossary together using as much reference material as possible (running and understanding the software, consulting the documentation, and discussing problems with the technical contacts).

After translation, the glossary must be reviewed by the client. The finished product forms the basis for the project. Building a glossary can take anywhere from 2 or 3 days up to 2 weeks, but it is essential if future versions of the software are planned.

3. Software Translation and Building: A variety of tools exist to aid in this step. Borland's Resource Workshop is probably the most commonly used. This package allows the decompilation of dialog and resource files, and gives access to actual strings of text for translation. For our example, this step would probably take about 1 to 1.5 weeks for the program files and 3 to 4 weeks for the manual.

4. Software Rebuilding and Testing: Some companies provide their own test program suites. The testing can take up to one week.

5. Help File Translation and Testing: Online help files generally come as RTF (rich text format) documents, often created with Microsoft's Word for Windows, and contain the help text along with hyperlinks in the form of hidden text. Translation involves overwriting the file but maintaining the links. After translation is completed, these files are recompiled using Microsoft's Help Compiler (HCP), which creates *.HLP files ready to be opened under the standard Windows Help Engine. This stage for our example would take 3 to 4 weeks.

6. Documentation Translation and DTP: The documentation usually is produced with standard DTP software and may include screen shots, graphics localization (with call-outs), and redesign. The page size may also change to improve the kanji-oriented layout and legibility or just to conform to Japanese standards. Final files are usually delivered to a printer as PostScript files. Creating a Japanese version of our 100-page example manual should take about 3 to 4 weeks, up to but not including printing.

What does it cost?

Because there is a flourishing market for localization right now, some market price standards have begun to emerge. As market-entry expert Jack Plimpton of Massachusetts-based Japan Entry says, "As a rule of thumb, on products that retail in the US for around $200 to $300, we expect a software localization cost of around $100,000 to $200,000. For manuals, we allow about $100 per page."

These amounts seem typical to slightly expensive for US and Japanese localizers, but they are consistent across the industry. For our example project, Computing Japan was quoted Y5 million ($45,000) by one localizer for the software translation and Y800,000 for manual translation and DTP (excluding printing).

Special UNIX considerations: X/Windows makes it possible

In the UNIX world, the big breakthrough in internationalizing applications has been the advent of the X11R5 version of the X/Window System (by MIT in the US). The X/Window System is now the de facto industry standard as a UNIX operating environment, with over 90% of new software being written for that environment. Using X11R5, applications written for the environment can be made to run in the appropriate language (e.g., either English or Japanese) by setting a locale variable (such as LANG) from the command line. Although this has been theoretically possible with IBM's AIX version of UNIX for some time, X11R5 is the first portable and well-supported architecture to offer this simplified method of switching between languages for a single program.

Popular languages for the X/Window System are C and C++, and since X11R5 takes care of internationalization, you don't have to change the compiler when localizing an application. This lets programmers stay with their favored English-language compiler and increases their productivity.

The X/Window System has been a boon for UNIX programmers who have to implement bilingual software. Even older applications not specifically designed for it can run under a standard xterm virtual window. Where Japanese is required, the user can run the same program under a utility called kterm and use the X/Window System's ability to display and output kanji. The only real precaution a programmer must take when running software in Japanese is to ensure that the software is "8-bit clean" -- meaning that it is capable of handling 8-bit character codes, not just 7-bit ASCII.
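To make "8-bit clean" concrete, here is a small sketch contrasting a copy routine that strips the high bit (a habit inherited from 7-bit ASCII terminal code) with one that passes every byte through. The function names are hypothetical and chosen only for this illustration; the sample bytes assume EUC-JP text.

```c
#include <stddef.h>

/* NOT 8-bit clean: masking with 0x7F destroys the high bit that marks
 * each byte of a kanji in EUC-JP, so Japanese text turns into garbage. */
void copy_strip_high_bit(char *dst, const char *src, size_t n)
{
    size_t i;

    if (n == 0)
        return;
    for (i = 0; i + 1 < n && src[i] != '\0'; i++)
        dst[i] = (char)(src[i] & 0x7F);
    dst[i] = '\0';
}

/* 8-bit clean: every byte is copied unchanged. */
void copy_clean(char *dst, const char *src, size_t n)
{
    size_t i;

    if (n == 0)
        return;
    for (i = 0; i + 1 < n && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
}

/* A related pitfall: plain char may be signed, so convert to unsigned
 * char before testing or comparing byte values of 0x80 and above. */
int has_high_bit(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Passing the two EUC-JP bytes 0xC6 0xFC (one kanji) through copy_strip_high_bit() yields the ASCII characters "F|", which is exactly the kind of damage an 8-bit-clean audit is meant to catch.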

The important point for localizing is that what matters to the Japanese application is the X11R5 layer of the computer's environment, not the vendor-defined layer above (such as Sun's Open Windows). A traditional application is likely to have English-language strings hard-coded into it, but a developer can easily make this same program display Japanese by moving all the English strings to a message file for English messages, and creating another message file with the same structure and content for Japanese.

Modifying source code to use tokens

The major task of a localization engineer having to Japanize a program with hard-coded text strings is to set up tokens in the code that reference either English or Japanese text strings from a database rather than using them locally. In C, for example, instead of

printf("Hello")

the programmer would use

printf(gettext(HELLO))

where gettext( ) is a function in UNIX SVR4 that will return a string for the macro value HELLO. Prior to the line containing the HELLO macro, the program would contain a line that defines it. The "Hello" message could be defined in one of two message files: MESSAGES.C for English messages or MESSAGES.JPN for Japanese ones. Depending on whether the program was running in the English environment or the Japanese one, it would use the appropriate MESSAGES file.
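The token idea can be sketched without any message-catalog tooling. The example below is an illustration under stated assumptions, not the SVR4 implementation: get_message() is a hypothetical stand-in for gettext() (the real function takes a string key and reads compiled catalogs), the tokens are enum values, the two tables stand in for the English and Japanese message files, and the Japanese strings are written as EUC-JP byte escapes.

```c
#include <stdlib.h>
#include <string.h>

/* Tokens used in the source code in place of literal strings. */
enum msg_id { HELLO, GOODBYE, MSG_COUNT };

/* Stand-in for the English message file. */
static const char *const messages_en[MSG_COUNT] = {
    "Hello",
    "Goodbye"
};

/* Stand-in for the Japanese message file (EUC-JP bytes assumed):
 * "konnichiwa" and "sayounara" in hiragana. */
static const char *const messages_ja[MSG_COUNT] = {
    "\xA4\xB3\xA4\xF3\xA4\xCB\xA4\xC1\xA4\xCF",
    "\xA4\xB5\xA4\xE8\xA4\xA6\xA4\xCA\xA4\xE9"
};

/* Hypothetical lookup: return the text for a token in the language
 * named by lang (a LANG-style value such as "C" or "ja_JP.eucJP"). */
const char *get_message(enum msg_id id, const char *lang)
{
    const char *const *table = messages_en;

    if (lang != NULL && strncmp(lang, "ja", 2) == 0)
        table = messages_ja;
    if ((int)id < 0 || id >= MSG_COUNT)
        return "";
    return table[id];
}

/* Convenience wrapper that picks the language from the environment. */
const char *get_message_env(enum msg_id id)
{
    return get_message(id, getenv("LANG"));
}
```

With this in place, a call such as printf("%s\n", get_message_env(HELLO)) prints the English or the Japanese greeting depending on how LANG is set when the program starts, and the program logic itself never mentions either string.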

Testing the source code

After referencing all the text strings in a program by using tokens that are keys to messages in a message database, the engineer needs to verify that the program still works as it did before. This is a relatively simple task. For example, if gettext( ) is defined as a null macro -- just temporarily -- and its argument is defined as a string, then the program should continue to work as it did before. The programmer can do this for the entire application and, after the initial tests, move the strings into the database and then translate them into the target language.
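The null-macro trick described above might look like the following sketch. The macro definitions are temporary scaffolding for the regression test, and greeting() and print_greeting() are hypothetical functions standing in for the application code being checked.

```c
#include <stdio.h>

/* Temporary scaffolding: the token expands to the original English
 * string and gettext() passes its argument straight through, so the
 * program must behave exactly as it did before tokens were introduced. */
#define HELLO "Hello"
#define gettext(msg) (msg)

/* Application code, already converted to use tokens. */
const char *greeting(void)
{
    return gettext(HELLO);
}

/* Prints the greeting; returns the character count reported by printf. */
int print_greeting(void)
{
    return printf("%s\n", gettext(HELLO));
}
```

Once the converted program passes its existing tests in this form, the two #define lines are removed, the strings move into the message files, and gettext() resolves to the real lookup function.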

Writing a customized message library to access text strings is relatively easy. If the total number of messages is small, the programmer can scan through the file, load the whole thing into memory, index it, and then implement some sort of fast lookup. If the total number of messages is large, then more complex methods including caching of the messages can be used.
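For the small-catalog case, a minimal sketch of such a library is shown below. It makes several assumptions that are not in the text: the message file uses one KEY=text line per message, the file has already been read into a writable memory buffer (the file I/O is omitted), and the names msg_catalog, catalog_load(), and catalog_lookup() are hypothetical. Indexing is a sort by key, and the fast lookup is a binary search.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_MESSAGES 256

struct msg_entry {
    const char *key;   /* token name, e.g. "HELLO" */
    const char *text;  /* message text in the target language */
};

struct msg_catalog {
    struct msg_entry entries[MAX_MESSAGES];
    size_t count;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct msg_entry *x = a;
    const struct msg_entry *y = b;
    return strcmp(x->key, y->key);
}

/* Index a buffer of "KEY=text" lines in place (the buffer is modified
 * and must outlive the catalog). Returns the number of messages found. */
size_t catalog_load(struct msg_catalog *cat, char *buf)
{
    char *line = buf;

    cat->count = 0;
    while (line != NULL && *line != '\0' && cat->count < MAX_MESSAGES) {
        char *next = strchr(line, '\n');
        char *eq;

        if (next != NULL)
            *next++ = '\0';            /* terminate this line */
        eq = strchr(line, '=');
        if (eq != NULL && eq != line) {
            *eq = '\0';                /* split key from text */
            cat->entries[cat->count].key = line;
            cat->entries[cat->count].text = eq + 1;
            cat->count++;
        }
        line = next;
    }
    /* Sort by key so lookups can use binary search. */
    qsort(cat->entries, cat->count, sizeof cat->entries[0], cmp_entry);
    return cat->count;
}

/* Return the text for key, or the key itself if no message is defined. */
const char *catalog_lookup(const struct msg_catalog *cat, const char *key)
{
    struct msg_entry probe;
    const struct msg_entry *hit;

    probe.key = key;
    probe.text = NULL;
    hit = bsearch(&probe, cat->entries, cat->count,
                  sizeof cat->entries[0], cmp_entry);
    return hit != NULL ? hit->text : key;
}
```

Falling back to the key when a message is missing is a design choice made for this sketch: an untranslated token shows up on screen as its own name, which makes gaps in the Japanese file easy to spot during testing.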

Since messages pulled in by the gettext() function are Japanese codes, the user must be running kterm when they are printed to the terminal, or the text will be displayed as garbage. Further, when localizing software that has fixed-length fields, the programmer must make sure that translated messages don't exceed the space available.

The UNIX X11R5 architecture was built to support international languages. However, a particular vendor (such as Sun, if the UNIX box was bought in the States) may only support a subset of the needed functions. One solution is to go to a public domain server on the Internet and ftp Japanese text processing utilities that will work with X11R5. For example, there is a standard input method called kinput2 (normally supplied with X11R5) that interacts with kterm. kterm is not restricted to a particular input system, however, so kinput2 could be replaced by a different kanji front-end processor (FEP). While not vendor-specific, kinput2 is very versatile -- it will do both romaji and kana input and supports four different dictionary protocols. These include Wnn (written by a Japanese university team) and Canna (written by NEC for the public domain) to allow conversion of strings into kanji. Another needed utility is nkf, for Japanese printing services and conversion of data between EUC, JIS, and Shift-JIS.
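The conversion between these encodings is simple arithmetic on byte pairs, which is why a small filter can handle it. The sketch below shows one direction only, EUC-JP to Shift-JIS, using the standard JIS X 0208 mapping; it is not nkf's own code, and the function names are hypothetical.

```c
#include <stddef.h>

/* Convert one JIS X 0208 character from its EUC-JP byte pair to the
 * corresponding Shift-JIS byte pair. */
static void euc_pair_to_sjis(unsigned char e1, unsigned char e2,
                             unsigned char *s1, unsigned char *s2)
{
    unsigned char j1 = e1 & 0x7F;   /* JIS row byte, 0x21..0x7E */
    unsigned char j2 = e2 & 0x7F;   /* JIS column byte, 0x21..0x7E */

    if (j1 & 1) {                   /* odd row: first half of the SJIS row */
        *s2 = (unsigned char)(j2 + 0x1F);
        if (*s2 >= 0x7F)
            (*s2)++;                /* 0x7F is never a Shift-JIS trail byte */
    } else {                        /* even row: second half */
        *s2 = (unsigned char)(j2 + 0x7E);
    }
    *s1 = (unsigned char)(((j1 - 0x21) >> 1) + 0x81);
    if (*s1 > 0x9F)
        *s1 = (unsigned char)(*s1 + 0x40);  /* skip the single-byte kana range */
}

/* Convert a NUL-terminated EUC-JP string to Shift-JIS. Returns the
 * number of bytes written (excluding the NUL), or -1 if the input is
 * malformed, cannot be represented, or does not fit in dst. */
long eucjp_to_sjis(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    while (*p != '\0') {
        if (*p < 0x80) {                                   /* ASCII */
            if (n + 1 >= dst_size)
                return -1;
            dst[n++] = (char)*p++;
        } else if (*p == 0x8E && p[1] >= 0xA1 && p[1] <= 0xDF) {
            if (n + 1 >= dst_size)                         /* half-width kana */
                return -1;
            dst[n++] = (char)p[1];
            p += 2;
        } else if (*p >= 0xA1 && *p <= 0xFE &&
                   p[1] >= 0xA1 && p[1] <= 0xFE) {         /* JIS X 0208 */
            unsigned char s1, s2;

            if (n + 2 >= dst_size)
                return -1;
            euc_pair_to_sjis(p[0], p[1], &s1, &s2);
            dst[n++] = (char)s1;
            dst[n++] = (char)s2;
            p += 2;
        } else {
            return -1;   /* includes 0x8F (JIS X 0212), which has no Shift-JIS form */
        }
    }
    if (n >= dst_size)
        return -1;
    dst[n] = '\0';
    return (long)n;
}
```

As a spot check, the EUC-JP pair 0xB0 0xA1 (JIS 0x3021, the first kanji in the table) converts to the Shift-JIS pair 0x88 0x9F.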

Applications designed specifically for the X/Window System don't need kterm, but they do need kinput2 or a similar Japanese FEP. With Motif 1.2 applications, the text widgets check with X11R5 for the input method appropriate to the language environment. Therefore, a programmer can recompile an application, set the LANG environment variable appropriately, and -- assuming that a Japanese input system is loaded -- enter text into the text widget and process Japanese. (Motif 1.1 didn't support Japanese portability, so migrating to Motif 1.2 is an easy solution, since it is much better about supporting localization.)
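On the program side, honoring LANG starts with a call to the standard C function setlocale(). The sketch below shows only that first step, under the assumption that the system has a Japanese locale installed; the wrapper names are hypothetical, and the Xlib and Motif calls that a real X client would make afterward (such as XSupportsLocale() and XSetLocaleModifiers()) are deliberately left out so the example stays self-contained.

```c
#include <locale.h>
#include <stdlib.h>

/* Adopt whatever locale the environment names (LANG, LC_ALL, and so on).
 * If that locale is not installed, fall back to the default "C" locale.
 * Returns the name of the locale now in effect. */
const char *init_locale_from_env(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}

/* Maximum number of bytes per character in the current locale:
 * 1 in the "C" locale, larger in multibyte locales such as Japanese. */
size_t max_bytes_per_char(void)
{
    return MB_CUR_MAX;
}
```

Calling init_locale_from_env() once at startup, before any toolkit initialization, is what lets the same binary come up in English or Japanese depending on the value of LANG.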

X11R6 is due out shortly, and for those working on Japanese hardware, X11R5 is available on NEC EWS4800 machines as well as most others.

There is a large repository of information about the localization of UNIX programs, much of it free and available on the Internet. The best servers for these utilities include:

  • ftp.uwtc.washington.edu
  • ftp.nec.co.jp
  • monu6.cc.monash.edu.au
  • ftp.ora.com
  • ftp.tohoku.ac.jp
  • nic.ad.jp
  • ftp.join.ad.jp
  • ftp.nic.ad.jp
  • ftp.sar.co.jp
  • ftp.foretune.co.jp
  • ftp.ascii.co.jp
  • ftp.waseda.ac.jp
  • moe.ipl.t.u-tokyo.ac.jp
  • ftp.apple.com
  • ftp.dit.co.jp
  • etlport.etl.go.jp
  • sh.wide.ad.jp
  • unicode.org
  • ftp.omron.co.jp

Summary: A Common Ground

The business of localization is no longer the technical challenge it once was, largely due to the influx of foreign software and software houses into Japan and their continuing pressure to change the status quo (especially the incompatibility between platforms). Apple, Microsoft, and those companies supporting the X/Window System deserve credit for fulfilling a vision that unifies the software environments of many different hardware platforms.

The technical issues have been reduced to a point where a small team can localize a program with only minimal access to the hardware and BIOS information for the target machine. This means a much quicker turnaround on localizing foreign software packages, and much lower product development costs. Localization delays on software have dropped from nine months to less than two. The challenge of releasing a new product has gone back to marketing -- which is where a company's post-development effort really belongs.