By Hugh Ashton
Say the words ‘disaster’ and ‘Tokyo,’ and the vast majority of us immediately think of earthquakes. But for one man, in charge of business continuity planning for the Tokyo branch of a leading European financial institution, disasters come in many shapes and sizes. All result in the inability to continue business as normal, and in Japan, more than anywhere else in the world, planning for disaster recovery and business continuity are necessities.
Professionals in this field distinguish between ‘disaster recovery’ and ‘business continuity.’ The former typically refers to the loss of IT infrastructure and the subsequent recovery of lost capabilities; the latter encompasses a wider range of activities, with IT supporting the business and forming an integral part of the whole contingency plan.
Our professional mentioned above points out that a disaster could be microscopic—an influenza virus, for example. The Tokyo Metropolitan Government has the theoretical power to close the whole of a building in the event of the outbreak of infectious diseases. For foreign companies, typically sharing buildings with other enterprises, the threat of another company’s worker bringing back the SARS virus or avian influenza as an unwelcome souvenir of a trip abroad, and thereby locking out the employees of all companies using that building, is very real.
A counterpart in another European bank in Tokyo emphasizes the threat of power failures, mentioning a recent power failure lasting less than a second, which was monitored by his building’s managers. This failed to have any impact on the IT operations of the institution (protected by backup power systems), but what gave him cause for concern is that the glitch apparently spanned the Tokyo power grid system, as reported by building managers across the city. In another recent example, a barge-borne crane jib brushed power lines spanning a river, and large sections of central Tokyo were without electricity for an extended period (it was reported that at least one investment bank was forced to halt trading operations for several hours). It should also be remembered that the recent Niigata earthquakes underlined Japan’s reliance on nuclear power stations, which are not as reliable and fail-safe as was previously believed.
Furthermore, increased trading levels caused by Japan’s emergence from recession, coupled with the pressure imposed by the US subprime debt crisis, may cause a meltdown in data processing capacity within the trading houses, or within the bourses themselves. Either would constitute a disaster of a kind.
Moving down the list, terrorism and political action, including strikes, are always a possibility, though these events are rare in Japan. The Japanese government’s public support of the US has made Japan a potential target of extremist terrorist groups opposing the invasion and occupation of Iraq. But even without this, a transportation strike in a city as heavily dependent on public transport as Tokyo could cripple a company if its workers were unable to commute. Of course this also applies to employees housebound as the result of extreme weather conditions, for example a typhoon, and those stranded as the result of technical difficulties affecting the mass transit system—a relatively minor delay on Tokyo’s busy rail system can affect tens or even hundreds of thousands of commuters.
Japan’s rules and regulations, such as the measures mentioned above, can prevent the application of global standards to the Tokyo office, even though the usual risks that could occur anywhere should be taken into account: fire or flood in the data center, failure of air conditioning (maybe not so vital for humans, but essential for the computers underpinning businesses) and so on.
Given this long list of potential calamities, what is required of companies doing business in Japan? Though no legal compulsion exists on financial services firms to produce countermeasures, the Japanese Center for Financial Industry Information Systems (abbreviated as FISC) produces a manual, used by the Financial Services Agency (FSA) in its regular inspections of financial houses as a yardstick to measure preparedness. If the inspected entity fails to meet the standards as prescribed in the manual, the FSA has the option (as a last resort) to suspend or revoke the license of the offending entity.
For Japanese firms, with many regional centers, it is a relatively easy task to relocate almost all operations away from Tokyo and satisfy the requirements. Foreign firms operating out of a single base require an alternative site from which to work (as opposed to a data center), and must conduct risk analysis, determining which business operations are necessary for continued survival, and at what level, and which of these must be restored within four hours of a disaster, within 24 hours, three days, etc. For some businesses, such as fund management firms, instant resumption of operations may not be necessary, and an alternative continuity site may be located some distance from Tokyo in a cheaper location. Some operations may even be carried out equally well from an overseas office, but some (e.g. the trading of Japanese Government Bonds) must be carried out from within Japan. Very often, the Service Level Agreements between the IT department and the businesses served by IT play a key role in determining these issues.
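As a rough illustration of that risk analysis, the short Python sketch below sorts business operations into the recovery windows mentioned above (four hours, 24 hours, three days). The operation names and the recovery targets assigned to them are hypothetical, invented purely for the example; real targets would come from each firm's own analysis and its Service Level Agreements.

```python
from datetime import timedelta

# Recovery windows named in the article, shortest first.
TIERS = [
    ("within 4 hours", timedelta(hours=4)),
    ("within 24 hours", timedelta(hours=24)),
    ("within 3 days", timedelta(days=3)),
]


def recovery_tier(target):
    """Return the first recovery window that the target fits inside."""
    for label, limit in TIERS:
        if target <= limit:
            return label
    return "later than 3 days"


# Hypothetical operations and recovery targets, for illustration only.
operations = {
    "bond trading desk": timedelta(hours=2),
    "settlement": timedelta(hours=12),
    "client reporting": timedelta(days=2),
    "staff training": timedelta(days=14),
}

plan = {name: recovery_tier(target) for name, target in operations.items()}

for name, tier in plan.items():
    print(f"{name}: restore {tier}")
```

A table like this is only a starting point: it says when each function must be back, not where or by whom it will be run.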
Likewise, it should be determined which operations may be performed by staff members at home—after all, there is little point in duplicating every support function in the alternative site. However, Japanese regulations demand that certain infrastructure standards be met; for example, telephone conversations from trading desks must be recorded, and this secure recording facility is not available as part of a home telephone system. Neither, to take another example, is it feasible to keep expensive market data feeds at traders’ residences—these should be on standby at the continuity site.
Naturally, there is more to business continuity than putting a few desks, PCs and telephones in a warehouse on the outskirts of Tokyo and hoping that staff will magically make their way there. The “human infrastructure” of emergency call trees, education of staff in their roles in the event of a disaster, and regular testing (annual tests of the technology and plans are part of the FISC manual) also plays a vital role.
On the IT disaster recovery side, IT departments of firms should ask themselves “If our data center becomes inoperable, or our data becomes inaccessible, how would we go about restoring the services required by the businesses in the required times?” Recent advances in technology and Japan’s advanced communications network help to answer this question and steer solutions away from traditional backup tapes stored off-site. Even after retrieval to the recovery site, these tapes are agonizingly slow by comparison with disks when it comes to restoring the data. Days could elapse before the data is actually usable again.
Today’s trend is towards large ultrareliable disk arrays incorporating a lot of redundancy. Tadaaki Sumiya, the Japanese Product Marketing Manager for EMC, one of the global leaders in this field, explains that Japan’s high-speed telecommunications infrastructure makes real-time replication of critical data between such disk arrays realistic over relatively short distances (less than about 100 km). In other words, it is possible to keep accurate copies of all critical transactions as they occur (be they equity sales, mobile phone usage records, or whatever) at two separate sites. Should disaster strike the primary site, the secondary automatically “fails over” to take its place. For less critical data, where a few minutes’ discrepancy is less business-critical, timed backups to the remote site are also possible.
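The difference between the two approaches can be seen in a toy Python model. This is a minimal in-memory sketch, not a description of how any vendor's disk arrays actually work: the class and method names are invented for the example, and the "sites" are simply lists.

```python
class ReplicatedStore:
    """Toy model of a primary site and a secondary (remote) site."""

    def __init__(self):
        self.primary = []
        self.secondary = []
        self.pending = []  # less critical records awaiting the next timed backup

    def write_critical(self, record):
        # Real-time replication: the write counts as complete only
        # once both sites hold a copy.
        self.primary.append(record)
        self.secondary.append(record)

    def write_less_critical(self, record):
        # Written locally now, copied to the remote site later.
        self.primary.append(record)
        self.pending.append(record)

    def timed_backup(self):
        # Periodic job that ships everything queued since the last run.
        self.secondary.extend(self.pending)
        self.pending.clear()

    def fail_over(self):
        # The primary is lost; the secondary's contents are all that remain.
        self.primary = []
        return list(self.secondary)


store = ReplicatedStore()
store.write_critical("equity sale 1")
store.write_less_critical("usage report 1")
store.timed_backup()
store.write_critical("equity sale 2")
store.write_less_critical("usage report 2")  # not yet backed up

surviving = store.fail_over()
print(surviving)
```

In this run both equity sales survive the loss of the primary site, while the second usage report, written after the last timed backup, does not. That gap is the "few minutes' discrepancy" a business accepts for less critical data.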
The location of the remote data site to replace the lost primary site is important. Over a certain distance, physical constraints start to kick in, and there is a lag imposed by the length of the wide-area network. However, it must be in an area which will remain safe should a disaster strike central Tokyo, and it should be relatively accessible for maintenance, etc. The areas to the west and north close to Tokyo (Tama, the western Chuo line, Tochigi and Saitama) lie in an area geologically distinct from that of Tokyo, and therefore several firms requiring up-to-the-second data replication have located their alternative data centers and business continuity sites there. The common practice of placing data centers and business continuity sites in separate locations mirrors the recent trends towards “thin buildings” where data centers are housed separately from the main office space.
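The physical constraint is easy to estimate. Light travels through optical fiber at roughly 200,000 km per second, about two-thirds of its speed in a vacuum, and a write replicated in real time must wait for an acknowledgment to come back from the remote site. The Python sketch below computes that minimum round-trip delay; it assumes a cable as long as the straight-line distance and ignores switching equipment, so real figures will be higher.

```python
# Approximate speed of light in optical fiber (about two-thirds of c).
FIBER_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km, speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Lower bound on the round-trip delay, in milliseconds, to a remote site."""
    return 2 * distance_km / speed_km_per_s * 1000


for km in (10, 100, 500):
    print(f"{km:>4} km: at least {round_trip_ms(km):.2f} ms per acknowledged write")
```

At 100 km the penalty is about one millisecond per write, which a busy transaction system can tolerate. At several hundred kilometers the delay grows in proportion, which is one reason up-to-the-second replication tends to stay within the distances quoted above.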
As additional protection, some firms implement a third data facility—a backup site, where data is archived, not on tape, but over lower latency communication lines on a regular (at least daily) basis. A number of different hardware strategies may be involved, and specialists are ready to offer advice on what data should be maintained where, and on what systems—a practice often referred to as Data Lifecycle Management, which forms an important part of an enterprise’s disaster planning.
For those setting up shop in Tokyo whose business livelihood depends on constant access to up-to-date business data, be the business financial, manufacturing, retail, communications, or whatever, disaster recovery and business continuity should be more than simply items on a “to do” list and should form a key part of the business strategy.