Disaster-Recovery Planning for Telecommunications Security

Disaster-recovery planning is a corporate necessity in any location. The recent Kobe earthquake has brought the matter to the forefront of attention, but other kinds of disaster, such as fire, flooding, or wind damage from a typhoon, could have an equal if not greater impact on your business. Massive earthquakes are attention grabbers, however, so with dire predictions that "the Big One" will hit the Kanto area sometime during the next decade, Tokyo-based IT managers are probably in a better position than ever before to make the case for disaster-recovery planning and expenditures to their management.

by Thomas Giuffre

Advancing technologies and new services have changed the disaster-recovery landscape. New building designs can better withstand the vibration and shock of earthquakes or the spread of fire, and improved telecommunications infrastructures provide the cautious information technology (IT) manager with options for circuit diversity and alternate route backups. On the other hand, today's businesses more than ever depend on information systems and telecommunications. The risk to businesses that do not plan effectively for a disaster can be enormous. A once-thriving business can utterly fail during the days or weeks required to recover normal operations.

Business managers often fail to recognize the extent to which their business depends on reliably functioning technology. As the complexity and transaction rate of a business process increase, so too do the risk and potential impact of an extended outage. In the stocks and securities business, for example, where systems are designed to give traders an edge in terms of minutes or seconds, an outage of an hour or more is serious. An outage of a day could be catastrophic.

It is only after conducting a thorough business impact analysis that the full picture of the interrelationships and complex dependencies between the business and the technology becomes clear. But the planning process is complex. For medium- and large-scale enterprises, it requires consideration of myriad elements of vital business processes and the technology systems that support or execute them. Because disaster-recovery planning is by necessity a complex and detailed process, it should involve all of an organization's management.

An important part of the process is education. With information technology becoming increasingly widespread, business managers are often unpleasantly surprised by the degree and extent to which their operations could be impacted by a technology system failure. And because modern businesses must stay in close communication with other organizations, customers, and associates, the dependency on telecommunications is critical.

The difficult task of planning

For a foreign enterprise, the difficult process of disaster-recovery planning becomes even more complex. Foreign enterprises must necessarily view the problem in a global context, because national and local standards, regulatory policies, and matters of national defense differ from country to country. In this sense, local IT managers need to understand the particulars of Japan as well as their corporate headquarters' policies. Being a foreign company brings an additional disadvantage: during a national disaster, a general procedure will be followed to bring services back on-line, and your enterprise is probably not at the top of the list. The local MIS (management information systems) manager must be prepared to deal with the situation.

Disaster-recovery planning for telecommunications is a component of the entire business recovery planning process. It requires special attention, though, because of dependencies on technology systems that are beyond the immediate control of the enterprise. Leased-line circuits, satellite links, store-and-forward data services, data banks, and real-time data feeds: all of these are technology services for which your company depends on outside vendors.

Probably you have never visited these vendors' operation facilities to see how they are set up, yet your business success is contingent on the reliable delivery of their services. To prepare your own telecommunications disaster-recovery plan, it is a valuable exercise to understand how your carriers and data service vendors are set up to operate, what type of contingency plans they have in place, and to what extent your service agreements cover events at their facilities that can impact your business.

This is the area where most contingency plans fall short. Inexperienced planners will do the obvious by backing up a leased line and requesting circuit diversity, but they may not bother to find out that the backup line is part of the same cable, or that it terminates into the same central office. If that cable is cut by a careless construction crew, for example, both the primary and the backup link are out of service. And if both circuits of your mission-critical link go through the same central office, a fire there could knock out your service. While carriers are generally eager to help their clients draw up viable contingency plans, it is your responsibility to lay out your needs and ask the right questions.
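The check described above, whether a primary and a backup circuit quietly share a cable or a central office, can be sketched in a few lines of Python. This is only an illustration: the circuit descriptions, the element names, and the function `shared_risks` are hypothetical, and real route data would have to come from your carrier's own cable schematics.

```python
# Hypothetical sketch: find single points of failure shared by two circuits.
# Every element name below is invented for illustration only.

def shared_risks(primary, backup):
    """Return the elements (cables, central offices, ...) used by both circuits."""
    shared = {}
    for kind in primary.keys() & backup.keys():
        common = set(primary[kind]) & set(backup[kind])
        if common:
            shared[kind] = sorted(common)
    return shared


# Assumed example: the backup runs in a different cable,
# but both circuits terminate in the same central office.
primary = {"cables": ["cable-A"], "central_offices": ["CO-1"]}
backup = {"cables": ["cable-B"], "central_offices": ["CO-1"]}

for kind, names in sorted(shared_risks(primary, backup).items()):
    print(f"Shared {kind}: {', '.join(names)}")
```

An empty result means only that no shared element appears in the data you were given; it is no substitute for asking the carrier the questions directly.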

Secure telecommunications

Telecommunications carriers are generally eager to brag about how well organized and prepared for disaster they are. Reliability, after all, is the lifeblood of all carriers. Carriers in Japan are relatively open to questions about their facilities; managers dealing with this issue should ask some tough questions about a provider's service-level capability and backup readiness.

Most carriers include in their literature a description of facilities locations, backup systems, marine cables and satellite earth stations, as well as the interconnections among them. There are some things to watch out for, though. If you use a Type II carrier in Japan, for example, you may encounter an NTT subcontract for the use of NTT cables to link your facility to your carrier. Since NTT owns most of the infrastructure in Japan, all of the Type II carriers lease NTT circuits. This detail is generally hidden from the end user. MIS and network managers should ask to see a schematic diagram of the cable route from the carrier to their building. (Note especially the location of sub-stations and central office facilities, and who owns them. If the primary and backup sites are in the same city, or worse, within a few kilometers of each other, think seriously about earthquake integrity.)

For organizations with multiple international circuits running mission-critical applications, circuit diversity should probably take the form of alternative carriers. Using KDD and one of the Type II carriers in combination can be an effective method for reducing the risk of impact to your business, both from widespread disasters and those isolated to a single carrier. While I do not endorse the use of KDD per se, it does have the most extensive and sophisticated network facility in place in Japan. KDD once had a national mandate to be the international service provider for Japan. After deregulation in 1986, KDD became semi-privatized, and Type II carriers entered the market in competition with KDD. While KDD is still the most expensive service, it maintains numerous points of entry into Japan via both marine cable and satellite. Most of the Type II players lease circuits from KDD directly, but they do not enjoy the full range of circuit diversity and alternate-path options that form the KDD infrastructure.

One value of working with KDD is its Plan-H and Plan-M services, which allow businesses to locate all or part of their telecommunications equipment within a KDD facility as primary, backup, or both. Customers also have the option of placing their own staff at the KDD site or contracting KDD personnel for maintenance and network management tasks. Other carriers offer these types of services to varying degrees, but generally not to the extent that KDD does. KDD is also exploring the expansion of this service to include EDP (electronic data processing) functions.

Aside from the obvious benefit of outsourcing a highly technical task to a skilled operator, these types of services offer the additional value of being able to locate your corporate communications equipment in purpose-built facilities. The building codes of such structures must meet substantially higher levels of structural integrity and resistance to disruption. Your office building almost certainly cannot match these standards. (And can you be certain that your facilities' backup equipment is properly maintained and tested?) Ask your carrier what services of this type it can offer, and then determine what level of support is appropriate for your business.

Coping with disaster

By looking at recent events in Kobe, we can learn some lessons about effective planning, readiness levels, and what to expect if a similar event happens to us. The Kobe earthquake presents a rare case study for evaluating the integrity of various technology implementations, including commercial building codes, power and telecommunications cabling infrastructure, and cellular telephony infrastructure. Experts will be collecting and analyzing data from the disaster for months to come, but some preliminary lessons can be discerned from the experiences of Reuters and its customers.

Reuters, which is one of the world's largest providers of data and information services, is one provider that takes disaster-recovery planning seriously. Businesses that use Reuters data depend on its timely and reliable delivery. According to Geoffrey Flynn, managing director of Reuters Japan, the company implements numerous steps to ensure that its customers receive reliable and secure service. Reuters takes a proactive approach and integrates recovery planning into its basic business model to substantially reduce the likelihood of a severe outage.

Nine of ten Reuters customers in Kobe experienced brief disruption of service connections. The subscriber that did not was using Reuters' Small Dish Service to receive its feed via satellite. This subscriber was operating off a small island in the bay area near Kobe, so its telecommunications were supported by microwave line-of-sight (LOS) links and power was backed up locally with generators. These precautions minimized the impact on the enterprise's business operations.

The Reuters data operations center (known as the MTC, or Main Technical Center) in Tokyo was constructed to precise specifications: it was purpose-built to meet stringent standards similar to those of a carrier's facility. While Reuters does not own the building, it fully occupies it and has established a close working relationship with the building management to ensure that contingency systems are available. Another facility, across town, serves as the backup site and subscriber data depository. These two sites are a reasonable distance apart, though one can argue the case for having a site outside Tokyo.

For Reuters, this means out of the country, in Singapore (site of another regional MTC). This may seem extreme and impractical for small- to mid-size organizations, but if your business has at least one leased-line link to another country, it is possible and practical to have your critical data backed up in this way. A well-engineered link can utilize a combination of powerful technologies, such as frame relay and ISDN, to supply an aggregate emergency bandwidth many times the normal value. And since these types of technologies are connection-oriented, the cost of maintaining contingency capability is near zero.

Aside from Reuters' internal use of technology to mitigate the risk of system failure, the company offers several services to its customers to enhance the reliability of service delivery. One of these, the Small Dish Service, relies on a dedicated satellite channel to broadcast Reuters data to subscribers with small roof-mounted parabolic dish antennas. Reuters has leased a channel on JSAT-1. Currently, this is a low-bandwidth service and does not provide for full subscriber support, but an enhanced version of the service is scheduled for this summer that will provide for full subscriber support in addition to Reuters Financial Television. One limitation of the technology is that subscribers cannot interactively subscribe to additional data, as they can over the traditional leased-line subscription service. Subscribers can continue to trade in most respects, however, which limits the impact of an outage isolated to the domestic carrier or to the vicinity of the business. There are, however, some exciting technologies that will alleviate such problems altogether very soon, and you can expect a proactive organization like Reuters to deploy these technologies aggressively to maintain its competitive advantages in quality and value of service.

NTT takes it well

Among telecommunications carriers, NTT suffered the greatest damage to its facilities from the Great Hanshin Earthquake. This was only to be expected given the proportionally extensive amount of infrastructure that it operates.

KDD, ITJ, and Sprint all reported no damage to their facilities in the Kobe and Osaka areas. Although these international carriers were able to provide service, many of their customers suffered disruptions because of the presence of an NTT cable in the local loop.

The degree of damage to cellular telephony differed by service provider. Again, NTT maintains the most extensive coverage in the area, and it suffered the most notable damage. The NTT network seemed robust, though, with many users relying on cellular phones for basic communications during the initial hours after the quake hit.

While NTT reported losing only six satellite communication dishes, facilities damage to the main switching office effectively downed the remaining 163 uplink stations covering the area. (The reports revealed that backup power generators were not able to come online due to cooling systems damage.) However, the speedy recovery of basic communications services is a testament to the readiness level of NTT to deal with the event. While the company has taken a media bath for the initial downed communications in the Kobe area, most other carriers have praised NTT and the company's response to the disaster.

Looking to the future

Personal communications services (PCS) terminals and the INMARSAT mobile systems satellite service will enable businesses to enjoy full-duplex transmission of voice and data traffic, just as they have over leased circuits in the past. These technologies have been around for some time, but regulatory issues in Japan (and other countries) have restricted their use. The continuing deregulation of telecommunications services will usher in new and broader applications of the technology.

Contingency systems will be one of the early applications. In Japan, the regulatory policy of the Ministry of Posts and Telecommunications (MPT) currently restricts the use of INMARSAT technology to businesses that are clearly mobile in nature (such as the shipping and aviation industries). Subscribers to the INMARSAT service must hold a license in order to operate the mobile terminal systems. KDD and several of the other international carriers are expected to offer PCS service sometime during 1997 or 1998.

There are two factors to bear in mind regarding telecommunications services. First, public policy issues and the regulatory policy of the MPT ensure that telecommunications facilities are built to substantially higher standards than ordinary commercial buildings. National law requires all carriers to obtain licensing that includes, among other things, compliance with stringent building codes. This helps to maintain a quality assurance level not always observed in other countries or industries.

The second point is that carriers have to maintain the image, especially for basic telephony services, that their network is available 100% of the time. Naturally, equipment breaks, circuits fail, and human errors occur, but the carriers go to great lengths to ensure that these localized events are transparent to their users. (It is likely that some of the damage suffered by carriers during the Kobe quake, such as a damaged sub-station or two, localized power loss, or microwave relay towers brought down, will never be reported fully.) The extent to which a carrier can isolate its customers from outages and local disasters, and minimize service disruptions, is a testament to the carrier's disaster planning and ability to respond to events as they occur.

Disaster-recovery planning is a process needed by all businesses, regardless of their technology level. Technology merely complicates an already difficult task. If you manage technology or business functions within your enterprise, you should be aware of all existing contingency plans. When was the last time you dusted off those documents and had a good look? (If you can't remember, then it has been too long.)

If your employees were prevented from entering your office tomorrow, could you adequately recover your business functions? What steps should you take? If you expect to stay in business, you can't afford not to know all the right answers.

The phases of business impact analysis

Analyze the business environment

* Clarify vital business processes and their supporting applications.
* Identify interim disaster impact-reduction measures.
* Raise corporate level of disaster planning awareness.

Assess the processes and applications

* Determine current recovery status.
* Specify IS environment in which each application functions.
* Identify application recovery challenge.

Determine anticipated business impact if the process cannot function

Prioritize application recovery

* Determine business recovery requirements and individual application recovery priorities.
* Specify each application's data requirements (e.g., data currency, data loss, and catch-up workload).
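
One possible way to turn the prioritization step above into a working list is sketched below in Python. It is only an illustration under stated assumptions: the `Application` fields, the ranking rule (shortest tolerable outage first, ties broken by larger hourly impact), and all of the sample figures are hypothetical. The real values must come out of your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    max_tolerable_outage_h: float  # assumed: longest outage the business accepts
    hourly_impact: float           # assumed: estimated loss per hour of outage


def recovery_order(apps):
    """Rank applications: shortest tolerable outage first, then larger impact."""
    return sorted(apps, key=lambda a: (a.max_tolerable_outage_h, -a.hourly_impact))


# Invented sample figures, for illustration only.
apps = [
    Application("e-mail", max_tolerable_outage_h=24.0, hourly_impact=500.0),
    Application("trading data feed", max_tolerable_outage_h=1.0, hourly_impact=50000.0),
    Application("settlement", max_tolerable_outage_h=4.0, hourly_impact=20000.0),
]

for rank, app in enumerate(recovery_order(apps), start=1):
    print(f"{rank}. {app.name} (tolerates {app.max_tolerable_outage_h:g} h)")
```

The ranking rule is a design choice, not a standard; some organizations may prefer to weight data-loss tolerance or catch-up workload more heavily.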

Analyze the probable impact

* Develop aggregate definition of enterprise impact.
* Identify feasible recovery options.
* Form a consensus among management and business process leaders on assigned criticality level, acceptable level of residual risk, recommended recovery model, and needed level of readiness.

Develop a workable business recovery plan

* Define a disaster-recovery strategy and its implementation steps.
* Develop a step-by-step business recovery plan.