An Introduction to Japanese Search Engines

If you're searching the Japanese Web for information, don't limit yourself to the popular English-language search engines. There are now well over a dozen Japanese search engines waiting to assist in your quest for information about Japan.

by Shaun Lawson

The World Wide Web promises (or threatens, depending on your point of view) to change the ways that companies and their customers interact. The Japanese Web is just starting to take off, and all expectations are that it is poised for a boom similar to that seen in the English-language Web in 1995.

If you've been regretting that you missed the golden window of opportunity to start a Web business or to advertise your services and products in English back in late 1994, you have a rare second chance. The Japanese-language Web is at a similar stage of growth, and it now provides many of the same opportunities that the English-language Web offered 18 months ago. But if you expect to be successful, the key question is: In the midst of the data flood, how do you get your information out there where netsurfers can easily find it?

Those seeking information face the same question from a different angle: Where can you find information about your current topic of interest? The answer to both questions is the same: a Web-based search engine.

A World Wide Web search engine is a tool that returns a hypertext list of Web pages based on a set of keywords entered by the user. By clicking on a link in the resulting list, the user can go directly to a page of potential interest.

Search engines are vital tools in navigating the World Wide Web. Well-known examples of English-language search engines are Yahoo, Lycos, and WebCrawler. For those interested in finding information in Japanese or promoting products and services in Japan, however, a Japanese-language search engine is called for. Just six months ago, there were few Japanese search engines, and those that did exist could hardly be called useful. On the Internet, though, six months is a generation (or two), and Japanese search engines have evolved and proliferated.

This article reviews the status of Japanese search engines (as of mid-March), discusses some of the unique technical problems they face, and provides basic information on how to search the Web and submit registration requests.

Status of search engines in Japan

English-language search engines have already stabilized into mature services. Many are entrepreneurial, and most are supported in full or part through advertising. They provide wide coverage and high-quality search results. Their Japanese counterparts are still in the formative stages, though, run on a largely volunteer basis by large companies, universities, and individuals.

Of the Japanese search engines I have identified, six are academically based while seven are commercial sites based in large firms. So far, only two of these sites carry third-party advertising to support their service.

For a listing of well-known Japanese search engines, including their URLs (Web addresses) and the availability of search, submission, and index pages, see the accompanying "A guide to Japanese search engines" sidebar. (For a Web-based version of this table prepared by the author, point your browser to http://www.atrium.com/ad/search.htm, or click on Computing Japan's search engine links at http://www.gol.com/cj/. In the Web version, the letters "E" (English) and "J" (Japanese) link to the appropriate search engine in the indicated language. The author says he will update this online table regularly and add links for new search engines as they come online.--Ed.)

Types of search engines

There are three basic types of search engine sites: robot (full-text) searches, index (keyword) searches, and "meta-search sites" (a "one-stop-shopping" link to several robot and/or index search engines).

Robot search engines: Odin, RCAAU Mondou, Senrigan, and Titan, for example, use "robots" to generate their search databases. Robots (also known as spiders, or webcrawlers) are specialized programs that traverse the Web, finding and adding pages to their databases as they go along. The actual implementation can be quite complex in order to make the searches efficient and to keep the search engine from overloading small hosts, but the underlying idea is simple: the robot visits a page, adds the data from that page to the database, also adds any links from that page not already in its database to the queue of pages yet to be visited, and then repeats the procedure.
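The visit-index-queue loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any of the robots named here is actually implemented: the TOY_WEB dictionary is a hypothetical stand-in for fetching real pages, and the sketch ignores the politeness and efficiency concerns that make real robots complex.

```python
from collections import deque

# Hypothetical miniature "Web": each page maps to the links found on it.
TOY_WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": ["a.html"],
    "d.html": [],
}

def crawl(start_url, fetch_links):
    """Visit pages breadth-first and return them in the order indexed."""
    database = []                # pages already added to the search database
    seen = {start_url}           # pages that are indexed or already queued
    queue = deque([start_url])   # pages yet to be visited
    while queue:
        url = queue.popleft()
        database.append(url)     # add this page's data to the database
        for link in fetch_links(url):
            if link not in seen:  # only queue links not encountered before
                seen.add(link)
                queue.append(link)
    return database

indexed = crawl("a.html", lambda url: TOY_WEB.get(url, []))
print(indexed)
```

The seen set is what keeps the robot from revisiting pages that link back to one another, such as c.html pointing back to a.html here.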

There are a variety of strategies by which a robot decides which pages to index. Most start from a historical URL list, such as server lists or popular What's New pages. Many allow users to submit suggested URLs manually. And some also scan through USENET newsgroup postings and published mailing list archives. For more about search robots, check out the WWW Robot FAQ (Frequently Asked Questions) list at http://info.webcrawler.com/mak/projects/robots/faq.html.

Index search engines are catalog-based engines in which users and providers submit suggested sites with keywords and page descriptions for inclusion in an index. The pages included in the search engine database are usually screened for quality and, since the search is based on key terms and succinct descriptions, the search results generally have better focus than those derived from robot search engines, though they are also less inclusive. Yahho, whose name is a play on the well-known Yahoo search engine, is a good example of an index search engine.
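To make the idea of a catalog search concrete, here is a rough Python sketch. The catalog entries, field names, and example.com addresses are all invented for illustration; a real index would hold screened submissions and would likely use a more sophisticated matching scheme.

```python
# Hypothetical catalog entries of the kind a site owner might submit.
CATALOG = [
    {"url": "http://example.com/onsen/",
     "keywords": {"hot spring", "travel"},
     "description": "Guide to hot spring resorts"},
    {"url": "http://example.com/boeki/",
     "keywords": {"trade", "export"},
     "description": "Import and export statistics"},
    {"url": "http://example.com/jishin/",
     "keywords": {"earthquake"},
     "description": "Earthquake preparedness tips"},
]

def search_index(catalog, term):
    """Return URLs whose keywords or description match the search term."""
    term = term.lower()
    return [entry["url"] for entry in catalog
            if term in entry["keywords"]
            or term in entry["description"].lower()]

print(search_index(CATALOG, "trade"))
print(search_index(CATALOG, "hot spring"))
```

Because only the submitted keywords and short descriptions are searched, a page that merely mentions a term in passing never appears in the results.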

Meta-search engines are forms that allow searches with multiple search engines based on a single input of search keywords.
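In outline, a meta-search site takes one set of keywords and hands it to several engines in turn. The Python sketch below assumes each engine can be represented as a function that returns a list of URLs; the two stub engines and their results are hypothetical placeholders, not real services.

```python
# Hypothetical stand-ins for real engines: each takes a keyword string
# and returns a list of matching URLs.
def robot_engine(keywords):
    return ["http://example.com/full-text-hit"]

def index_engine(keywords):
    return ["http://example.org/catalog-hit"]

def meta_search(keywords, engines):
    """Run the same keywords through every engine, keeping results per engine."""
    return {name: search(keywords) for name, search in engines.items()}

results = meta_search("温泉", {"robot": robot_engine, "index": index_engine})
for name, urls in results.items():
    print(name, urls)
```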

Web content and coverage

How does the volume of Japanese Web pages, and the coverage of these Japanese search engines, compare to that of the English-language Web world? As of mid-January, the well-known Lycos search engine covered approximately 19 million URLs (see the "A look at URLs" sidebar). Some 4.8 million of these were Web documents, and the Lycos database contained 32 million links. (For details, see http://www.lycos.com/sow/TrueCounting.html.)

In comparison, Japan's Titan and Odin are estimated to cover approximately 300,000 URLs, or just over 1.5% of the volume of the Lycos data. These figures clearly illustrate the difference in the stage of growth between English and Japanese Web content. Up-to-date Japanese search engine statistics, in Japanese, are made available by Isao Asai at http://www.bekkoame.or.jp/~asaisan/.

Additional search difficulties

A major difficulty for Japanese robot search engines is the need to parse Japanese text. Since spaces are not used to delimit words in Japanese, and the language has many homonyms, this creates a big problem for search engines that cover full text. The word "sushi" is an example of a Japanese term that could be mistakenly detected in completely unrelated text -- in the same way that a search for "art" might turn up Web pages with articles on "flowcharts" or "Descartes" if spaces were absent in English.
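The "art" analogy is easy to reproduce. The short Python sketch below contrasts a naive substring match, which is all a search engine can do when it has no word boundaries to work with, against a match on whole space-delimited words. The function names are illustrative only.

```python
def naive_match(term, text):
    """Substring match: all that is possible without word boundaries."""
    return term in text.lower()

def word_match(term, text):
    """Whole-word match: relies on spaces to delimit the words."""
    return term in text.lower().split()

spaced = "flowcharts and Descartes"
unspaced = spaced.replace(" ", "")   # simulate text written without spaces

print(naive_match("art", unspaced))      # True: a false hit
print(word_match("art", spaced))         # False: spaces rule it out
print(word_match("art", "modern art"))   # True: a genuine hit
```

Handling Japanese text properly means first splitting it into words by some other means, which is a much harder problem than splitting on spaces.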

In addition, there are two commonly used character encoding schemes on the Japanese Web: EUC and SJIS. A search engine must be able to recognize and search both types of pages. Further, the robots must be able to distinguish between Japanese and non-Japanese pages, and use heuristics in Web searches to keep from having to search all the world's pages just to find the Japanese ones.
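One simple way to tell the two encodings apart is trial decoding: try each candidate and keep the first that decodes without error. The Python sketch below is a rough heuristic under that assumption, not the method used by any engine mentioned here. EUC is tried before SJIS because many EUC byte sequences can also be read as valid (but meaningless) SJIS half-width katakana.

```python
def detect_encoding(raw):
    """Guess the encoding of raw page bytes by trial decoding."""
    # Order matters: plain ASCII first, then EUC, then SJIS.
    for name in ("ascii", "euc_jp", "shift_jis"):
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None   # none of the candidate encodings fit

euc_page = "地震".encode("euc_jp")
sjis_page = "地震".encode("shift_jis")

print(detect_encoding(euc_page))
print(detect_encoding(sjis_page))
print(detect_encoding(b"earthquake"))
```

A guess like this can be wrong on short or unusual pages, so a production robot would presumably combine it with other evidence.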

Searching only Japanese domains (.jp), at first glance, would appear to do this. Indeed, this is the technique used by Senrigan. But foreign-based Japanese pages (those on a server with a .com address, for example) will be missed by the Senrigan search engine and others using the same technique.
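The limitation of a domain-based filter is easy to see in a short Python sketch. The two URLs are taken from the accompanying guide to Japanese search engines; the is_jp_domain function is a hypothetical illustration, not Senrigan's actual code.

```python
from urllib.parse import urlsplit

def is_jp_domain(url):
    """Return True if the URL's host is under the .jp top-level domain."""
    host = urlsplit(url).hostname or ""
    return host == "jp" or host.endswith(".jp")

urls = [
    "http://www.info.waseda.ac.jp/search.html",  # Japanese site in .jp
    "http://hole-in-one.com/",                   # Japanese site in .com
]

kept = [u for u in urls if is_jp_domain(u)]
print(kept)
```

The Hole-in-One page is in Japanese, yet the filter discards it because its server has a .com address.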

The search process

Even if you are working in an English-language computing environment, there are solutions for reading Japanese Web pages. Searching for kanji text, however, also requires Japanese input, which some of these solutions do not support.

If your system isn't capable of Japanese input, the Nippon Search Engine provides two creative solutions. If you input an English term and check the Translation box, then click the Search/Translate button, the system returns a list of translations. Pick the one you want, then run the search. Or, you can input romaji (romanized Japanese) and press the Japanization button. The search engine will then return a page with a list of possible kana/kanji conversions from which you can make your selection, then start your search. The home page for the Nippon Search Engine states that they are currently attempting to patent these search and translation systems.

The RCAAU Mondou search engine provides a similar function. If you use an English word for the search, a translation will often appear as a related keyword in the response. Clicking on this translation initiates a search focused on the Japanese term.

I did a quick comparison of Japanese search engines by doing a search for three arbitrary benchmark words: earthquake (地震), trade (貿易), and hot spring (温泉). The "Search engine comparison" sidebar ranks 15 search engines based on the number of hits for these three terms.

Submitting a site

Search engine databases are populated by URLs obtained by robot search or site registration.

In the case of a robot search engine, there is an unknown lag between the time a page is put on the Web and when it may be discovered by the robot -- or, it might never be discovered. Accordingly, robot search engines often allow for user submission of a URL, which puts that site on the robot's queue of places to visit. Even so, it can take as much as two weeks for a site to be added to the robot's database. (Of the Japanese search robots, only Titan currently allows user URL submissions.)

Index-based search engines, on the other hand, require submission of not only a URL, but other information as well. Depending on the provider, that data may include an e-mail address, the submitter's name, key words, category, and site description. And for the search engines covered here, this information may need to be in Japanese, or English, or both. Unfortunately, there is nothing equivalent to a "Submit It!" service that enables a single site information form to be submitted to multiple search engines and indices. You'll have to submit your site to each search engine manually. Surprisingly, some sites still require e-mail submissions.

Promoting your Web site through postings in Japanese USENET news groups is also possible. Take care in the presentation, though; the Japanese-language news groups are even more anti-commercial than the English ones, due to formal commercial-use restrictions in the acceptable use policies of Japan's academic networks.

Time's a wastin'

What more do you need to know? Fire up your browser, point it to some of the sites listed in the "A guide to Japanese search engines" sidebar, and try some trial searches. With a bit of experience, you'll soon settle on one or two favorites and, as the Japanese Web grows, you'll know right where to look to find the data you're searching for.

Shaun Lawson is a 12-year resident of Japan. He can be reached by e-mail as shaun@atrium.com.

A guide to Japanese search engines

Listed here are 17 Japanese search engine sites. Each listing gives the site name, the provider, the services offered (searching, site submission, and/or cataloging), and the URL address for the search engine. All services/pages are in Japanese unless otherwise specified as bilingual (E/J).

Index-based search engines

CSJ Index, from CyberSpace Japan; search, submit, catalog (E/J)

http://www.iijnet.or.jp/csj/

Hole-in-One, from Hitachi International Business; search, submit, catalog

http://hole-in-one.com/

InfoBee Search, from NTT; search (E/J), catalog (E/J)

http://navi.sl.cae.ntt.jp/index.html (English)

http://navi.sl.cae.ntt.jp/home.html (Japanese)

InfoNavigator, from Fujitsu; search

http://infonavi.infoweb.or.jp/

Japan Search Engine, from Kyoto University; search (E/J), submit (E/J), catalog

http://www1.nisiq.net/~jsengine/index-eng (English)

http://www1.nisiq.net/~jsengine/ (Japanese)

NetPlaza, from NEC; search, catalog

http://www.meshnet.or.jp/NETPLAZA/index.html

Nippon Search Engine, from Keio University; search, submit

http://www.juno.sfc.keio.ac.jp/NSE-NS/dive/

URL Square, from Osaka University Network; search (E/J), submit

http://www.orions.ad.jp/urls/index-jp.html

Wave Search, from Sony; search, submit

http://www1.sony.co.jp/InfoPlaza/WAVESearch/

WWW Navigator, from Impress; search, submit

http://home.impress.co.jp/magazine/inetmag/wwwnavi/index.htm

Yahho, from Yasuhiro Chikata; search, catalog

http://yahho.ita.tutkie.tut.ac.jp/yahho/search.html

Robot-based search engines

Odin, from University of Tokyo; search

http://kichijiro.c.u-tokyo.ac.jp/odin/

RCAAU Mondou, from Kyoto University; search

http://www.kuamp.kyoto-u.ac.jp/labs/infocom/mondou/search.html

Senrigan, from Waseda University; search (E/J)

http://www.info.waseda.ac.jp/search-e.html (English)

http://www.info.waseda.ac.jp/search.html (Japanese)

Titan, from NTT; search (E/J), submit

http://isserv.tas.ntt.jp/chisho/titan-e.html (English)

http://isserv.tas.ntt.jp/chisho/titan.html (Japanese)

Meta-search sites

Adults All-in-One, from Mizukawa; search (E/J)

http://www.bekkoame.or.jp/~mizukawa/etc/o-search.html (English)

http://www.bekkoame.or.jp/~mizukawa/etc/os-adult.html (Japanese)

EasySEARCH, from Shinya Honda, NIBA; search

http://www.aist.go.jp/NIBH/~honda/search.html

A look at URLs

URL, which stands for Uniform Resource Locator, is the address of a source of information -- what you type in your browser's "location" box. Each URL contains four distinct parts: the protocol type (such as http or ftp), the machine name (for example, www.gol.com), the directory path (such as /cj/), and a file name (such as main.html).
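For readers who want to see the four parts pulled apart, here is a small Python sketch using the standard library. The sample address simply strings together the example parts given above and is meant as an illustration, not as a page known to exist; the url_parts function name is likewise made up.

```python
from urllib.parse import urlsplit
import posixpath

def url_parts(url):
    """Split a URL into protocol, machine name, directory path, and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if not directory.endswith("/"):
        directory += "/"   # show the directory with its trailing slash
    return parts.scheme, parts.hostname, directory, filename

print(url_parts("http://www.gol.com/cj/main.html"))
print(url_parts("http://www.gol.com/cj/"))
```

When the address ends in a slash, as in the second call, the file name comes back empty and the server decides which file to send.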