Delta Security Solutions

"The Difference in IT Security"

 

 

 


Gotham Housing Assistance Group

Business Continuity/

Disaster Recovery Analysis

 

 

 

 

 

 

Prepared By

 

John McDonald

Delta Security Solutions

July 28, 2001


 

1 Introduction

2 Overview

2.1 Executive Summary

3 Disaster Recovery Requirements

3.1 Work Processes

3.1.1 Payroll Processing

3.1.2 Employee Compensation

3.1.3 Bank Reconciliation

3.1.4 Leased Housing Payments

3.1.5 Client Request for Public Housing

3.1.6 Client Request for Leased Housing

3.1.7 Unit Availability Update

3.1.8 Yearly Client Re-certification

3.1.9 Quarterly HUD Reports

3.1.10 Ad-Hoc Reports

3.1.11 Work Order Processing

3.1.12 Employee Information Update

3.1.13 Workflow Support Functions

4 Issues

4.1 Documentation

4.2 Backup Sun Server

4.3 Payroll Processing

4.4 Paper Files

4.5 Tape Transfers

4.6 Monitoring

4.7 Boot Drives Not RAID

4.8 Alternate Floor DR Site

4.9 Employee Pay Policy During a Disaster

4.10 Licensing for Applications

4.11 Forms

4.12 Backup Copies of Applications

4.13 Backup Processing

5 Initial DR Plan Overview

5.1 Alternate Sites

5.2 Systems

5.3 Network

5.4 Physical Plant

5.5 Phones

5.6 Elimination of a single server system

5.7 Elimination of multiple server systems

5.8 Elimination of Network Components

5.9 Elimination of the 6th floor data center

5.10 Elimination of the entire 6th floor

5.11 Elimination of the entire building

 

 


1        Introduction

 

This document presents the results of a Business Continuity/Information Technology (IT) Disaster Recovery analysis performed by Delta Security Solutions (Delta) for the Gotham Housing Assistance Group (GHAG). The primary goals of this effort were to analyze GHAG's current environment in the context of business continuity and disaster recovery planning, develop recommendations for implementing an adequate disaster recovery plan, and develop an initial plan for use by GHAG.

 

This document contains information that is proprietary and confidential to Gotham Housing Assistance Group and may not be disclosed, either in part or in whole, to any person or agency outside of GHAG without the express written consent of GHAG.

2        Overview

 

Gotham Housing Assistance Group currently maintains a moderately complex IT infrastructure, with a single Sun/Solaris server running the critical core GRC application, various Windows NT servers providing email, file and print services, and numerous Windows-based client systems. The network consists of a high-speed connection from Gotham City Hall (which acts as an ISP) to the Main St. facility, switched network connections within the data center and to the other floors within Main St., and medium-speed connections to the various other GHAG facilities, including the remote offices at Broad St. and other GHAG properties.

 

Phase I of this effort involved analyzing and understanding the type of workflow currently implemented within GHAG. This consisted of identifying the work processes utilized by the organization, identifying what IT resources were involved for each process, and assigning a level of time criticality to each process.

 

Phase II consisted of analyzing the current IT infrastructure, including systems, networks, applications and facilities. This analysis was done with an eye towards resources that could be brought online to support GHAG operations in the event of an emergency. Five levels of disasters were considered during this analysis:

 

        Elimination of a single server system

        Elimination of multiple server systems

        Elimination of the entire 6th floor data center

        Elimination of the entire 6th floor

        Elimination of the entire building

 

In the context of this analysis, 'elimination' indicates that the component is not capable of performing its required function. This could be due to its destruction, having access to the resource rendered unsafe, etc.

 

Phase III consisted of developing a series of recommendations regarding options for implementing disaster recovery. This document defines the various issues associated with developing a comprehensive disaster recovery plan for GHAG, and provides recommendations for addressing those issues. In some cases, multiple recommendations are provided; GHAG will be required to choose one recommendation as the best possible approach.

 

Once GHAG has analyzed all of the recommendations and the issues have been addressed, the final document for the disaster recovery plan outline will be completed and delivered to GHAG.

 

It should be noted that the analysis effort focused exclusively on Disaster Recovery issues related to IT support for business processes. This document and the associated plan outline should be integrated into an overall Disaster Recovery planning effort for GHAG that includes areas that are not supported by IT.

 

2.1       Executive Summary

 

Delta recently undertook an effort to analyze GHAG's IT infrastructure in the context of Business Continuity/Disaster Recovery. The goals of this effort were:

 

        Identify issues that would hinder implementing an effective and comprehensive Disaster Recovery plan for GHAG's IT infrastructure

        Develop recommendations for addressing those issues

        Develop a basic plan to be used as the basis for a more comprehensive DR plan as the issues identified are addressed

 

Based on the analysis of GHAG's current overall disaster recovery readiness, Delta estimates that GHAG would be unable to recover effectively from any disaster more severe than a single-floor disaster in less than 6-12 months. The primary factor that would prevent GHAG from recovering from a disaster is the extensive use of paper files in an unprotected environment. For example, if the current fire sprinklers were set off in the building, most of the paper files would be rendered unusable.

 

Even with the implementation of a disaster recovery plan based on the recommendations in this document, another critical factor would significantly impact the cost and complexity of the plan: the processing of payroll in-house. This processing has the tightest recovery time requirement, and even a major failure of a single system (GRC) at a critical time may prevent GHAG from meeting its payroll requirements in a timely manner. Addressing this issue could potentially relax GHAG's tightest recovery time requirement from 4 hours to 1-2 days, as well as dramatically reduce the complexity of the DR effort by reducing the need to accommodate special forms and printers in the plan.

 

3        Disaster Recovery Requirements

 

This section defines a series of requirements for GHAG's disaster recovery processing.

 

3.1       Work Processes

 

During the Phase I analysis effort, Delta identified a series of work processes that are critical to the correct functioning of GHAG as an organization. Each of these was qualified in terms of level of criticality and the amount of acceptable downtime.

 

Note that the maximum acceptable downtime is based on the worst-case scenario, not on a sliding time window. For example, if a process can usually be down for 1 day, but has a critical 4-hour window at one point during the week, its maximum acceptable downtime is defined as 4 hours. This approach reduces the complexity of the Disaster Recovery plan and ensures that the final plan can accommodate any type of disaster occurring in any timeframe.
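
The worst-case rule can be illustrated with a short Python sketch. The window labels and the function name are hypothetical and chosen only to mirror the example in the preceding paragraph; they are not data collected during the analysis.

```python
# Illustrative sketch: a process's maximum acceptable downtime is the
# tightest tolerance across all of its time windows (the worst case).
# The window labels below are hypothetical and mirror the example above.

def max_acceptable_downtime(tolerance_hours):
    """Return the smallest tolerated outage (in hours) over all windows."""
    if not tolerance_hours:
        raise ValueError("at least one time window is required")
    return min(tolerance_hours.values())


# A process that tolerates a 1-day outage most of the week, but only
# 4 hours during one critical window.
example_windows = {
    "normal operation": 24,
    "critical weekly window": 4,
}

print(max_acceptable_downtime(example_windows))  # prints 4
```

Taking the minimum means a single figure per process is enough for planning, regardless of when in the week a disaster occurs.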

3.1.1       Payroll Processing

Payroll processing involves the following steps:

 

        Employee timesheets are turned in to manager Monday AM

        Timesheet information is entered into the GRC system Monday PM and Tuesday

        A validation is performed on the data and any corrections are made on Wednesday

        The data for those employees with direct deposit is sent to the bank by 5PM on Wednesday

        Checks are cut on Wednesday

 

Payroll processing utilizes the GRC application running on node 'ghag'. Paychecks and timesheets are printed on custom print stock utilizing a Printronix P5215 132-column printer.

 

This process is currently the most time-critical one in GHAG's environment. In a worst-case scenario, downtime of more than 4 hours at a critical time (e.g. Wednesday afternoon) can completely disrupt the entire process and prevent employees from being paid on time. Based on this criticality, Delta estimates that the maximum acceptable downtime for this process is 4 hours.

3.1.2       Employee Compensation

Employee compensation processing involves creating a file containing various employee payroll-related information and sending it to Megacorp via the Internet. This processing is accomplished on a weekly basis in conjunction with Payroll Processing.

 

Employee Compensation utilizes the GRC application running on node 'ghag'. In addition, it requires the creation of a data file for transfer to Megacorp via the Internet.

 

This process is closely tied to Payroll Processing and must be accomplished in parallel with that process. However, a delay of up to 1 week in the processing will have a minimal effect. Based on this criticality, Delta estimates that maximum acceptable downtime for this process is 1 week.

 

Delta recommends that GHAG's Human Resources and Legal departments develop a comprehensive policy regarding possible outages of Employee Compensation functionality during disaster recovery periods.

 

3.1.3       Bank Reconciliation

Bank reconciliation involves receipt of a 9-track tape from the bank with various financial data and transferring the data into the GRC system. This process utilizes the GRC application running on node 'ghag', along with the attached 9-track tape drive.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.4       Leased Housing Payments

The Leased Housing Payment process involves creating checks to pay landlords providing leased housing for GHAG clients. This processing occurs at mid-month and at the end of the month. It should be noted that the mid-month payment issue is optional and is not critical to the proper execution of this process.

 

Leased Housing Payment processing utilizes the GRC application running on node 'ghag'. Checks are printed on custom print stock utilizing a Printronix P5215 132-column printer.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.5       Client Request for Public Housing

Client request for public housing is one of the core processes for GHAG. It involves the following steps:

 

        Client attends a briefing at GHAG

        Client fills out request paperwork and submits it to GHAG

        GHAG enters the request into the GRC system and a client ID is generated for the client

        The GRC system ranks the client request based on a number of criteria

        Twice a week a list of client requests with matching available units is run and reviewed by GHAG personnel

        A letter to the client is generated (utilizing Microsoft Word), informing them of the availability of a housing unit

        The client's folder gets sent to the development manager for the unit

        The development manager contacts the client to arrange a showing of the unit

        If the client accepts the unit, the client's folder stays at the development

        If the client rejects the unit, the folder is sent back to GHAG

        The GRC system is updated to reflect the client's new status

 

The Client Request for Public Housing process utilizes the GRC application running on node 'ghag', along with Microsoft Word for creating the client letters. All client letters are stored in a shared storage area on node 'ghagfiles'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.6       Client Request for Leased Housing

For purposes of IT Disaster Recovery, the Client Request for Leased Housing process is similar to the Client Request for Public Housing process, with the following exception:

 

        The request/availability listing is run weekly, as opposed to twice a week

 

The Client Request for Leased Housing process utilizes the GRC application running on node 'ghag', along with Microsoft Word for creating the client letters. All client letters are stored in a shared storage area on node 'ghagfiles'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.7       Unit Availability Update

The Unit Availability Update process involves updating the status of unit availability within the GRC system. This process involves the following steps:

 

        A Vacancy Action Form is filled out by the development manager

        The form is sent to Tenant Accounting

        The unit's status is updated in the GRC system

 

The Unit Availability Update process utilizes the GRC system running on node 'ghag'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.8       Yearly Client Re-certification

The Yearly Client Re-certification process involves re-certifying the client's eligibility for public housing assistance, and utilizes a combination of a paper-based process and the GRC system. The client obtains the form from the management office at the development they inhabit. The form is returned to GHAG for review and entry into GRC, and is then placed in the client's folder. The data from this process is utilized for Tenant Rent calculations and HUD subsidy calculations.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 month.

3.1.9       Quarterly HUD Reports

The Quarterly HUD Reports process involves creation of a series of reports required by the Federal Government's Housing and Urban Development (HUD) organization. These reports are created every quarter and are used to determine GHAG's eligibility for federal housing funds.

 

The Quarterly HUD Reports process utilizes the GRC system running on node 'ghag'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 2 weeks.

3.1.10    Ad-Hoc Reports

The Ad-Hoc Reports process involves the creation of custom reports at the request of various GHAG departments. The exact purpose of the reports varies, and most are generally non-time critical.

 

The Ad-Hoc Reports process utilizes the GRC system running on node 'ghag'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 2 weeks.

3.1.11    Work Order Processing

Work Order Processing involves the creation and management of repair work orders for the various housing developments. The following steps are involved:

 

        A request is made by a client at the development

        A work order is generated at the development

        On a daily basis, work orders are batched and printed at the development

        The repair is performed

        The information regarding the request is entered by the development manager into the GRC system

 

The efficiency of GHAG in performing repairs, especially emergency repairs, is one of the factors utilized to determine GHAG's eligibility for funding from HUD.

 

Work Order Processing utilizes the GRC system running on node 'ghag', and local printers at each development.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 2 weeks. Note that the critical information (e.g. mean time to repair) is retained by the paperwork process and could be entered into the GRC system at any time. This information is included in the Quarterly HUD Reports process discussed in Section 3.1.9.

3.1.12    Employee Information Update

 

The Employee Information Update process involves updating employee information in the GRC system. This information is required as part of the Payroll Processing process discussed in Section 3.1.1 and the Employee Compensation process discussed in Section 3.1.2. Having current employee information is critical in that it affects both of these processes.

 

The Employee Information Update process utilizes the GRC system running on node 'ghag'.

 

Based on this process's level of criticality, Delta estimates that the maximum acceptable downtime is 1 week.

3.1.13    Workflow Support Functions

 

Workflow support functions involve IT capabilities that support the various workflow processes, but are not entire workflows themselves. This includes:

 

        Email

        Document management (e.g. Word, Excel)

        File sharing

        Internet access

        General printing

        Backup

 

GHAG's email system is based on Microsoft Exchange, and runs on node 'ghagmail'.

 

Document management is generally accomplished on each user's individual client system, with critical shared files residing on node 'ghagfiles'.

 

File sharing is accomplished utilizing standard Windows-based file sharing, with the majority of shared files residing on node 'ghagfiles'. It should be noted that the policy of GHAG's IT group requires that all work-related data files must be stored on shared drives; the IT group provides no backup capability for local drives on client systems.

 

Internet access is provided via a connection to Gotham City Hall.

 

Printing is accomplished via several networked printers located at various places within the infrastructure.

 

Backup is accomplished via several different methods, depending on the specific node being backed-up.

 

Based on the criticality of these various support functions, Delta estimates that the maximum acceptable downtime for the functions as a whole is 1 day.
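
The estimates in Sections 3.1.1 through 3.1.13 can be collected into a single table to derive a recovery order and to see which processes depend on each node. The following Python sketch is one way to do this. The downtime figures and node names are taken from the sections above; the conversion of 1 month to 30 days, the assumption that every process using GRC depends on node 'ghag', and the function names are illustrative choices made here.

```python
# Summary of the Section 3.1 estimates. Downtime is stored in hours.
# Assumption: 1 month is approximated as 30 days so values can be sorted.
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY

# (process, nodes it depends on, maximum acceptable downtime in hours)
PROCESSES = [
    ("Payroll Processing", {"ghag"}, 4),
    ("Employee Compensation", {"ghag"}, HOURS_PER_WEEK),
    ("Bank Reconciliation", {"ghag"}, HOURS_PER_WEEK),
    ("Leased Housing Payments", {"ghag"}, HOURS_PER_WEEK),
    ("Client Request for Public Housing", {"ghag", "ghagfiles"}, HOURS_PER_WEEK),
    ("Client Request for Leased Housing", {"ghag", "ghagfiles"}, HOURS_PER_WEEK),
    ("Unit Availability Update", {"ghag"}, HOURS_PER_WEEK),
    ("Yearly Client Re-certification", {"ghag"}, 30 * HOURS_PER_DAY),
    ("Quarterly HUD Reports", {"ghag"}, 2 * HOURS_PER_WEEK),
    ("Ad-Hoc Reports", {"ghag"}, 2 * HOURS_PER_WEEK),
    ("Work Order Processing", {"ghag"}, 2 * HOURS_PER_WEEK),
    ("Employee Information Update", {"ghag"}, HOURS_PER_WEEK),
    ("Workflow Support Functions", {"ghagmail", "ghagfiles"}, HOURS_PER_DAY),
]


def recovery_order(processes=PROCESSES):
    """Processes sorted from least to most tolerant of downtime."""
    return sorted(processes, key=lambda entry: entry[2])


def affected_by(node, processes=PROCESSES):
    """Names of the processes that depend on the given node."""
    return [name for name, nodes, _hours in processes if node in nodes]


def node_deadline(node, processes=PROCESSES):
    """Tightest downtime limit (hours) among processes using the node."""
    return min(hours for _name, nodes, hours in processes if node in nodes)


for name, _nodes, hours in recovery_order():
    print(f"{hours:>5} h  {name}")
```

Under these figures, node 'ghag' inherits the 4-hour limit from Payroll Processing, which is why that single process dominates the recovery requirements for the GRC server.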

4        Issues

 

This section identifies issues that currently preclude developing a comprehensive and effective disaster recovery plan for GHAG, along with recommendations for addressing those issues. In addition, several general infrastructure issues are addressed.

 

4.1       Documentation

 

One of the most critical factors affecting the ability of an IT infrastructure to be reconstituted after a disaster is adequate documentation on how the infrastructure is implemented. This is necessary so that, in the event critical IT staff aren't available, individuals less familiar with the environment can quickly and effectively duplicate the required functionality.

 

Adequate documentation is currently a critical issue with GHAG's infrastructure. There are some high level documents showing the overall network and system layout, but many of the most critical components are not documented in sufficient detail. This includes:

 

        The exact configuration of the email server, including address books, configuration parameters, etc. This information would be necessary to reconstitute a functional email server for GHAG in the event of a disaster.

        The exact configuration of the GRC application on the ghag server. Based on Delta's analysis, it would be very difficult, if not impossible, to re-create the current configuration of GRC on another system without adequate documentation. Documentation should include a list of all files for the application and their installed locations, with particular attention paid to the data files. This would allow GHAG to extract just the data files from a backup tape and install them on the backup server to resume processing.

        A list of all applications currently in use, including name, version, configuration, etc., both on the server systems as well as the client desktops.

        A list of changes, patches and updates made to each of the servers.

 

Delta recommends that GHAG undertake an effort to fully document its current IT environment, and develop policies for ensuring that the documentation is kept up-to-date. Also, provisions should be made for ensuring that a complete copy of the current documentation is available at both GHAG's off-site disaster recovery location and the alternate on-site location. This documentation should be complete enough that a competent IT professional, without any prior knowledge of GHAG's infrastructure, could reconstitute the critical components of the infrastructure in the event of an emergency.

 

Given the size and complexity of GHAG's infrastructure, Delta recommends that GHAG utilize manual methods to collect the data and create a simple Microsoft Access database to contain and manage the data, along with a notebook for data that does not lend itself to a database format. While there are automated tools available that can perform some of this work, most of them are more complex and costly than required to accomplish the effort.
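
Where a file list is too long to collect by hand, such as the list of GRC application files and their installed locations, a short script can produce a file that is then imported into the database. The following Python sketch shows one possible approach and does not describe any existing GHAG tool. The function names and CSV column names are hypothetical, and the choice of a SHA-256 checksum is an assumption made here so that restored files can later be compared against the documented ones.

```python
import csv
import hashlib
import os


def file_sha256(path, chunk_size=65536):
    """Checksum a file in chunks so large data files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Walk `root` and return one record per regular file, sorted by path."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.isfile(path):
                continue  # skip broken links and special files
            records.append({
                "path": os.path.relpath(path, root),
                "size_bytes": os.path.getsize(path),
                "sha256": file_sha256(path),
            })
    return sorted(records, key=lambda record: record["path"])


def write_manifest(records, csv_path):
    """Save the manifest as CSV so it can be imported into a database."""
    fieldnames = ["path", "size_bytes", "sha256"]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

A call such as `write_manifest(build_manifest(install_dir), "grc_manifest.csv")`, where `install_dir` is the (hypothetical) GRC installation directory, would produce one row per installed file.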

 

Note: GHAG has committed to undertaking an IT infrastructure documentation project.

 

4.2       Backup Sun Server

 

GHAG does not currently have a system available to act as a backup server for the GRC server that is critical to virtually all operations within GHAG. While another Sun box (an Ultra 5 workstation) is present on the site, this box, as it is currently configured, would not be an adequate replacement for the existing GRC server (a Sparc 1000E server).

 

A second associated issue is that it would be very difficult and expensive for GHAG to obtain another system configured exactly the same as the current GRC system. Much of the hardware has been discontinued by Sun, and obtaining an exact copy of the configuration on the used hardware market would be problematic at best. The lack of an exact copy of the current hardware would prevent GHAG from utilizing a full disk restore to create a backup copy of the GRC system.

 

Should GHAG require a new Sun system, even one not configured identically, the lead time could be anywhere from 48 hours to 2 weeks, depending on the current channel load.

 

To address these issues, Delta recommends the following:

 

        As an interim backup system, GHAG should upgrade the existing Ultra 5 system by adding more memory, disk storage, and any necessary tape drives. This system should be configured to run the GRC application.

        When possible, GHAG should acquire a second Sun server and configure it to be as similar to the existing GRC server as possible. GHAG may want to consider acquisition of a newer Sun system, use it as a replacement for the existing system, and utilize the current system as the backup.

        GHAG should document the existing GRC configuration (see Section 4.1), including the critical database and configuration files required by the application. This could potentially allow a selective restore of the necessary data and configuration files onto a non-identically configured Sun system. However, it should be noted that the complexity and distributed nature of the GRC installation may preclude the ability to effectively utilize a selective restore strategy.

        The backup server should be installed at GHAG's alternate site location (Delta recommends Broad St.). While this would require relocating the system from Broad St. back to Main St. in the event of anything less than a building level disaster, this approach would also provide the highest level of survivability.
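
The selective restore mentioned in the recommendations above can be sketched as follows. This is an illustration only: it assumes the backup is available as a tar archive, which this analysis did not confirm, and the function name and any file paths passed to it are hypothetical. The list of wanted paths would come from the GRC documentation described in Section 4.1.

```python
import tarfile


def selective_restore(archive_path, wanted_paths, destination):
    """Extract only the listed members from a tar archive.

    Returns the wanted paths that were NOT found in the archive, so the
    operator can see what is still missing after the restore.
    """
    wanted = set(wanted_paths)
    # Use the safer "data" extraction filter where this Python version
    # supports it; older versions extract without a filter.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r") as archive:
        members = [m for m in archive.getmembers() if m.name in wanted]
        archive.extractall(path=destination, members=members, **extra)
    found = {member.name for member in members}
    return sorted(wanted - found)
```

Returning the missing paths makes gaps in the backup or in the documentation visible immediately, which matters given the caution above about the distributed nature of the GRC installation.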

 

4.3       Payroll Processing

 

Payroll processing is currently the most time-critical process within GHAG's infrastructure. A failure or disaster during critical times (e.g. Wednesday PM) could prevent GHAG from meeting its payroll obligations and potentially result in legal action against GHAG (note that in this context, Payroll Processing is assumed to include Employee Compensation processing). Delta estimates that the maximum acceptable downtime for Payroll Processing would be 4 hours. However, even with GHAG's current Sun maintenance agreement (Silver level support), there is no guarantee that the system can be recovered from even a single component failure in less than 4 hours (the 4 hours specified by the contract is response time, not guaranteed repair time).

 

Additional considerations for Payroll Processing that impact disaster recovery are:

 

        The requirement for a specialized printer

        The requirement for specialized forms

 

The result is that the only viable way for GHAG to meet its disaster recovery requirements for Payroll Processing would be to have an exact duplicate of all of the current hardware, including the printer and forms, located at a remote site, with all payroll-related data being replicated in realtime via a high-speed network connection.

 

From a disaster recovery perspective, Delta recommends that GHAG consider outsourcing its entire payroll processing to a third-party company. This will greatly simplify GHAG's disaster recovery requirements and allow a larger window of recovery time (1 full day versus 4 hours). An additional benefit of outsourcing payroll processing is that doing so would greatly reduce the security requirements for the infrastructure. Payroll information is consistently one of the primary targets for security attacks - removing this processing from the infrastructure would eliminate the need for GHAG to provide protection for it.

 

An alternative approach to implementing a Disaster Recovery strategy that meets the requirements for Payroll Processing is the use of a realtime data replication strategy. Such a strategy would require that the backup system located off-site be continuously synchronized with data from the active GRC system as it changes. While this solution is possible, it would be very expensive. Some issues that would impact the expense and effort of implementing this type of solution include:

 

        The need for a very high-speed data connection between the active system and the backup system

        Software for data replication

        The overhead for full-time management of the backup system in addition to the regular infrastructure

        A second full-time license for both the GRC application and the database may be required

 

It should also be noted that GHAG is exploring the possibility of implementing a short-term temporary manual solution for processing payroll in the event of a disaster.

 

4.4       Paper Files

 

GHAG's reliance on paper files for client information has a dramatic impact on its ability to recover after a disaster. Even a floor-level disaster that impacts a floor the files are stored on could result in 6-12 months of recovery time for the client housing request process. In the event of a building-level disaster, Delta estimates that GHAG may not be able to fully recover all of the information stored in the files.

 

Delta recommends that GHAG begin to evaluate the feasibility of a paperless solution for all client information. This would allow the data to be backed-up and available in the event of a disaster.

 

As an alternative to a paperless office solution, Delta recommends that GHAG consider replacing the existing storage cabinets with fireproof/waterproof storage cabinets. This would allow the files to survive most types of disasters and allow recovery of the files once the situation has stabilized. Note that, with ruggedized cabinets, the paper files could potentially survive a building-level disaster.

 

A third alternative that would reduce the risk of a disaster would be to replace the portions of the water sprinkler system that currently cover the file cabinets with a gas- or dry foam-based fire suppression system. This would reduce the risk of damage or destruction to the paper files in the event of a fire or accidental discharge, but would not contribute to the recoverability of the files in the event of a floor- or building-level disaster.

 

4.5       Tape Transfers

 

GHAG currently relies on tape transfers between itself and several external organizations. The result is a more complicated disaster recovery configuration (e.g. additional tape drives) and a more complex disaster recovery plan. In addition, tape drives, in particular 9-track drives, are typically among the least reliable components of an IT system and could result in increased downtime for the system.

 

Delta recommends that GHAG undertake an effort to migrate all data transfers between itself and external organizations to a file transfer via the Internet.

 

Note: GHAG has several initiatives underway to replace tape processing with file transfers.

 

4.6       Monitoring

 

The GHAG IT staff currently operates primarily in a reactive mode, responding to issues called in by the users. If a system were to experience a critical failure during the evening, the IT staff would not be made aware of this fact until the next morning, at which time recovery operations would be started. This results in increased downtime for the IT infrastructure overall.

 

Delta recommends that GHAG implement some simple low-level monitoring of all IT servers. This monitoring could be implemented utilizing simple Perl scripts, or by a commercial monitoring package (e.g. BMC Patrol). The monitoring should include, at a minimum, the following:

 

        Up/down status of all servers and network devices

        Parsing and monitoring of the error log files to track critical errors, allowing the IT staff to anticipate and correct failures before they occur

        CPU, memory, disk storage and network utilization for all servers to allow trend analysis of utilization rates
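
The first item in the list above, up/down status, can be sketched with a simple TCP connection test. The example below uses Python as a stand-in for the scripting language GHAG might choose. The node names come from this document, but the port numbers in the comment and the function names are assumptions made for illustration.

```python
import socket


def is_reachable(host, port, timeout_seconds=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_services(services):
    """Map each (name, host, port) entry to 'up' or 'down'."""
    return {
        name: ("up" if is_reachable(host, port) else "down")
        for name, host, port in services
    }


# Hypothetical service list. The ports are assumptions, not values taken
# from GHAG's configuration:
#   check_services([
#       ("GRC server", "ghag", 23),
#       ("Email server", "ghagmail", 25),
#       ("File server", "ghagfiles", 139),
#   ])
```

A check of this kind only shows that a service is accepting connections. It does not show that the application behind it is working correctly, so it complements the log monitoring item rather than replacing it.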

 

The monitoring system should provide for automatic notification messages to the IT staff in the event of a failure. This can be accomplished via a modem attached to the monitoring system.

 

At GHAG's request, Delta performed an analysis of various commercial monitoring packages that would provide the level of functionality required. Based on the cost and complexity of most of these packages, Delta concluded that GHAG would be better served by a simple solution developed in-house in Perl or some other scripting language. This can be implemented incrementally, adding functionality as necessary, for a relatively low cost.
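
One possible increment of such an in-house solution is sketched below in Python: scanning log lines for critical errors and sampling disk utilization. The error patterns and function names are hypothetical. The actual patterns would have to be drawn from the messages that matter on each GHAG server.

```python
import re
import shutil

# Hypothetical patterns for critical errors. The real list would be built
# from the error messages observed on each server.
CRITICAL_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"panic", r"i/o error", r"file system full")
]


def scan_log_lines(lines, patterns=CRITICAL_PATTERNS):
    """Return (line_number, text) for each line matching a critical pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits


def disk_percent_used(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Recording the output of `disk_percent_used` at a fixed interval would provide the data needed for the trend analysis of utilization rates, and each further check can be added as a separate function without changing the existing ones.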

 

4.7       Boot Drives Not RAID

 

Currently, none of the boot drives on GHAG's Windows NT servers are configured as RAID devices. The result is that the failure of a single boot disk could bring an entire server down.

 

Delta recommends that GHAG implement a RAID solution for boot disks on all servers.

 

4.8       Alternate Floor DR Site

 

GHAG has not yet designated an alternate location within Main St. to act as a backup data center in the event of a data center or floor-level disaster.

 

Delta recommends that GHAG consider designating the conference room located on the 11th floor as an in-building disaster recovery site for the data center. This location is the farthest away from the current data center and would allow for handling even multiple-floor disasters.

 

To support this location as an alternate data center, Delta recommends that GHAG:

 

        Run fiber to allow re-direction of the current external data links to this location from the 4th floor wiring closet.

        Prepare a detailed list of all hardware (e.g. systems, networking) that would be required to activate this location as an alternate data center.

        Prepare a floor plan identifying the location of all hardware in the alternate data center

        Ensure that adequate power outlets exist in the room to support the data center hardware as well as a room air conditioner.

 

Note: GHAG has committed to evaluating the various options for an alternate data center location.

 

4.9       Employee Pay Policy During a Disaster

 

GHAG's Human Resources department does not currently have a specific policy regarding employee payroll in the event of a disaster. Delta recommends that, as part of the overall disaster recovery plan for GHAG, HR develop and disseminate a written policy that covers each of the possible types of disasters. This policy would minimize any legal ramifications for missing payroll in the event of a disaster.

 

4.10   Licensing for Applications

 

In order to effectively recover from a disaster utilizing an alternate server for the GRC application, the application and all of its associated support software must be pre-installed and configured on the backup server. As part of this effort, Delta was able to determine that ECS corporation allows a single copy of the application to be installed and configured on a backup system for disaster recovery. However, Delta was unable to determine if the supporting software, in particular the Unidata database, provides the same level of licensing.

 

Delta recommends that GHAG work with ECS to determine the exact level of backup licensing provided for Unidata and any other required support software.

 

4.11   Forms

 

The current payroll processing relies on custom forms, in particular for timesheets, paychecks, tenant accounting and rental agreements. Should GHAG decide to retain payroll processing in-house, the payroll-related forms would need to be available at the alternate location in the event of a floor or building-level disaster.

 

Delta recommends that GHAG stockpile at least 1 month's worth of all required forms at the Broad St. disaster recovery facility. Given the critical nature of these forms, Delta also recommends that they be stored in a secure cabinet.

 

GHAG may also want to consider the creation of a separate 'disaster' pay account. This account would not initially contain any funds; however, check forms would be printed for this account and stored at the Broad St. facility. In the event of a disaster, funds would be transferred into this account, which would then be used for payroll processing.

 

GHAG should also consider the replacement of standard hardcopy forms with laser printer generated versions. This could potentially eliminate the requirement for any custom forms and greatly simplify Disaster Recovery processing.

 

4.12   Backup Copies of Applications

 

All of the installation media for all applications currently utilized by GHAG reside in or near the data center on the 6th floor at Main St. In the event of a data center, floor or building-level disaster, GHAG would be required to obtain alternate copies of all installation media in order to reconstitute the backup systems.

 

Delta recommends that GHAG make backup copies of all installation media utilized in the IT infrastructure and store these copies in a secure location at the disaster recovery facility (Broad St.). Note that creation of copies of installation media for backup purposes is allowed by the license agreement of the majority of software vendors.

 

Also, Delta recommends that copies be made of any patches or updates installed and stored at the disaster recovery location.

 

4.13   Backup Processing

 

GHAG currently utilizes several different methods for backing up servers. This includes different software as well as different media. In addition to the complexity created by this approach, the current backup solutions are reaching the limit of their storage capacity.

 

An additional consideration for the current backup strategy is that backup tapes are kept onsite for 2 days to allow for quick recovery of deleted files. However, the result of this approach is that, in the event of a disaster, 2 days' worth of data could be lost. Should a disaster occur on a Tuesday afternoon, virtually all of GHAG's payroll data for the previous week would be lost. Note that this is a consideration only if GHAG maintains payroll processing in-house. The impact is less critical for the other work processes.
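
The exposure created by the on-site holding period can be worked through with a short calculation. The sketch below is illustrative only: it assumes one backup per night and assumes that a tape leaves the building a fixed number of days after it is written. The dates and function names are invented for the example, and the payroll cycle itself is not modeled.

```python
from datetime import date, timedelta


def newest_offsite_backup(backup_dates, disaster_date, onsite_days=2):
    """Return the most recent backup date already off site at the disaster.

    Assumption: a tape leaves the building `onsite_days` days after it is
    written. Returns None if no tape has reached off-site storage yet.
    """
    offsite = [d for d in backup_dates
               if d + timedelta(days=onsite_days) <= disaster_date]
    return max(offsite) if offsite else None


def days_of_data_at_risk(backup_dates, disaster_date, onsite_days=2):
    """Return the number of days of changes not covered by any off-site tape."""
    newest = newest_offsite_backup(backup_dates, disaster_date, onsite_days)
    if newest is None:
        return None
    return (disaster_date - newest).days


# Illustrative dates only: nightly backups from July 16 through July 23, 2001.
nightly = [date(2001, 7, 16) + timedelta(days=i) for i in range(8)]
print(days_of_data_at_risk(nightly, date(2001, 7, 24)))  # 2
```

With a 2-day holding period, the newest tape off site is always at least 2 days old, so shortening the holding period directly shortens the window of data at risk.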

 

Delta recommends that GHAG consider implementing a centralized backup solution. A package such as Veritas would reduce the complexity of the backup processing and provide for faster restores in the event of a disaster, as well as reducing the number of required tape drives.

 

Delta also recommends that GHAG consider a backup option that allows remote backups over the Internet. This would eliminate the requirement for removable media altogether and further reduce the complexity of the backup process. However, this approach can be very costly and may not be viable for GHAG based on this cost.

5        Initial DR Plan Overview

 

This section provides an initial overview of the DR plan proposed by Delta. Various components of the plan are discussed, and, in some cases, Delta has made assumptions regarding the resolution of the issues discussed in Section 4. Should Delta's assumptions prove incorrect, some parts of this overview may need to be modified.

 

5.1       Alternate Sites

 

GHAG has defined 2 alternate sites for Disaster Recovery purposes. The first is an alternate site within the Main St. facility, located in [TBD]. This site shall be used in the event of a localized disaster that disallows the use of the existing data center without affecting the entire building. The room will be pre-wired with a fiber connection leading to the 4th floor data closet to provide an Internet connection. Wiring for other connections (i.e. server and client systems) will be run manually utilizing CAT5 wire as necessary from this location. It is assumed that most of the users will be able to remain at their current work locations in the event of such a disaster.

 

For disasters that disallow the use of the entire Main St. facility, the alternate DR site is located at the Broad St. facility. The site consists of three adjacent rooms, 2 conference rooms and a training room, located in the southwest corner of the building on the first floor. The current training room will be used as a data center. Wiring will be run from the existing frame connection to the training room to provide for an Internet connection. The remaining two rooms (the conference rooms) will be utilized by employees equipped with notebook computers.

 

The Broad St. facility will also be used as a storage location for any DR components requiring off-site storage. This consists of:

 

        The backup Sun server for the GRC application

        A backup printer (132 column) required for printing checks

        Copies of all required forms

        Copies of installation media for all required applications

        A copy of the IT infrastructure documentation

 

5.2       Systems

 

GHAG utilizes three types of systems:

 

        A Sun server

        Various Windows NT servers

        Windows-based clients

 

The backup system for the Sun server will be the existing Ultra 5 server, configured with the GRC application pre-installed. The system will be stored at the Broad St. facility. In the event of a disaster affecting only the Main St. data center, the system will be returned to the Main St. facility and installed in the alternate data center there.

 

For the various Windows NT servers, GHAG shall rent the required systems until provisions can be made to purchase replacement systems. Delta recommends utilizing a local rental company such as 2000rents.com. For more information, please refer to the web site at:

 

http://www.2000rents.com/computer-rental-gotham.asp

 

For DR purposes, several of the existing Windows NT systems will be combined into two servers for the duration of the crisis as follows:

 

        Nodes 'ghagfiles' and 'ghagmail' will be combined

        Nodes 'ghagproxy', 'ghagintra1' and 'ghagldsql' will be combined

 

The number and configurations of the Windows NT server systems to be rented are:

 

        2 x 500MHz, 128MB, 10GB disk

 

For Windows-based client systems, GHAG will rent [TBD - how many clients?] Windows notebook systems. [nn] of these notebooks will include external keypads to simplify the entry of financial data.

 

5.3       Network

 

For Internet connectivity, the alternate DR site within the Main St. facility will be pre-wired with a fiber connection to the 4th floor data closet. A backup Cisco router will be stored at the Broad St. facility.

 

For Internet connectivity at the Broad St. facility, GHAG will contact Verizon and have the Broad St./Main St. frame relay connection switched to City Hall.

 

Connectivity within the alternate DR site (i.e. between servers and between servers and clients) will be accomplished by running CAT5 wiring where necessary and through utilization of simple hubs/switches. These hubs/switches can be rented from the same company recommended for system rental (2000rents.com). Please refer to their web site at:

 

http://www.2000rents.com/computer-rental-gotham.asp

 

[Need to prepare a network diagram for the DR sites]

 

5.4       Physical Plant

 

In order to effectively run an alternate site as a DR facility, the site requires an adequate physical plant (i.e. power and environment) to support the servers.

 

Power will be provided utilizing the existing power infrastructure within both the alternate Main St. facility as well as the Broad St. facility. Given the reduced number of systems required to run in a DR mode, this should be adequate. However, spare surge-protected power strips should be provided for both sites. [If possible, GHAG should consider purchasing a spare UPS or two for the alternate sites]

 

To address environmental concerns, GHAG will rent a portable air conditioning unit for the duration of the crisis. [Delta recommends GHAG utilize a company such as US Distributing to rent a portable AC unit. A 14,000 BTU unit, utilizing 110VAC power, should be adequate and can be rented for approximately $600 per month]

 

5.5       Phones

 

[GHAG needs to determine phone requirements]

 

5.6       Elimination of a single server system

 

A single server system event is defined as the complete failure of a server system that cannot be resolved within the scope of existing maintenance contracts (i.e. complete destruction of a system). This type of event assumes that the existing data center remains viable for continuing operations.

 

In the event of a complete failure of the Sun server running the GRC application, the backup Sun system will be retrieved from the Broad St. facility and installed in the current data center. The latest backup tapes will be used to restore the GRC database to its last saved state, and operations will resume. The users should be notified that a reduced level of performance will exist until such time as GHAG can replace the Sun server with an appropriate system.

 

In the event of a failure of one of the Windows NT servers, GHAG will first contact the preferred vendor to determine the delivery time for a replacement system. Should delivery of a new system be possible within 24 hours, a new system will be ordered. The backups will be used to configure the new system upon arrival.

 

Should a replacement system require more than 24 hours, GHAG will contact a rental company [see Section 5.2] to obtain a temporary Windows NT server. The rental server will be configured utilizing the backup tapes and used until a permanent replacement system can be delivered.
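
The 24-hour rule for Windows NT servers described above can be written down as a simple decision function, which may be useful if the procedure is later turned into a checklist or script. This is only a sketch: the function name, parameter names and the wording of the returned actions are chosen for illustration.

```python
def nt_server_recovery_action(delivery_hours, threshold_hours=24):
    """Choose the recovery path for a failed Windows NT server.

    delivery_hours: the preferred vendor's quoted delivery time for a
    replacement system, in hours.
    """
    if delivery_hours <= threshold_hours:
        # Replacement can arrive within the threshold: order it and restore.
        return "order replacement; restore from backups on arrival"
    # Otherwise bridge the gap with a rental until the replacement arrives.
    return "rent temporary server; restore from backup tapes; await replacement"


print(nt_server_recovery_action(12))
print(nt_server_recovery_action(72))
```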

 

5.7       Elimination of multiple server systems

 

Recovery from the elimination of multiple server systems will follow the same process as defined for elimination of a single system (see Section 5.6) for each system eliminated.

 

5.8       Elimination of Network Components

 

Recovery from the elimination of network components will depend on the exact component being eliminated.

 

For elimination of a single switch located on the 1st, 4th, 6th or 9th floors, a replacement will be obtained from the Broad St. facility and installed.

 

For elimination of the central router, [the only viable solution here is to order a replacement, as GHAG does not have anything in-house that can perform the appropriate functions]

 

5.9       Elimination of the 6th floor data center

 

Elimination of the 6th floor data center assumes that a significant percentage of the hardware in the data center has been destroyed or rendered inoperable, and that the data center itself is no longer a viable location to support ongoing IT operations. It further assumes that the event has not significantly affected any location outside of the data center itself.

 

In the event of this type of failure, the following process shall be implemented:

 

        Backup Windows NT servers will be ordered from the preferred rental vendor for immediate delivery. Upon delivery the systems will be configured utilizing the infrastructure documentation and backup tapes.

        Backup network hubs/switches will be ordered from the preferred rental vendor for immediate delivery.

        A room air conditioner will be ordered from the preferred vendor for immediate delivery. Upon delivery, the air conditioner will be installed in the on-site backup data center facility.

        The most recent backup tapes will be requested from Iron Mountain.

        The backup materials located at Broad St. (e.g. the Sun server, printer, forms, etc.) will be relocated to the alternate in-house data center site at Main St. [Need to decide which location]. This does not include any materials that were not originally stored in the data center and are still available on-site.

        The backup cable from the 4th floor data closet to the alternate in-house location will be connected in place of the original, with the other end connected to the backup router in the on-site backup data center. A system will be plugged into the router to verify connectivity.

        The backup Sun server will be configured with the last saved data from the most recent backup tapes.

        An internal network between the various servers and the router will be configured.

        CAT5 cables will be run as necessary to the various client systems.

        The IT staff will continue to utilize their existing work area on the 6th floor.

 

 

5.10   Elimination of the entire 6th floor

 

Elimination of the entire 6th floor assumes that a significant percentage of the hardware in the data center has been destroyed or rendered inoperable, and that the data center itself is no longer a viable location to support ongoing IT operations. It further assumes that the event has rendered the rest of the 6th floor unfit for occupation.

 

In the event of this type of failure, the following process shall be implemented:

 

        The alternate in-house data center [need a location] at the Main St. facility will be readied for operations.

        Backup Windows NT servers will be ordered from the preferred rental vendor for immediate delivery. Upon delivery the systems will be configured utilizing the infrastructure documentation and backup tapes.

        Backup network hubs/switches will be ordered from the preferred rental vendor for immediate delivery.

        A room air conditioner will be ordered from the preferred vendor for immediate delivery. Upon delivery, the air conditioner will be installed in the on-site backup data center facility.

        The most recent backup tapes will be requested from Iron Mountain.

        The backup materials located at Broad St. (e.g. the Sun server, printer, forms, etc.) will be relocated to the alternate in-house data center site at Main St. [Need to decide which location].

        The backup cable from the 4th floor data closet to the alternate in-house location will be connected in place of the original, with the other end connected to the backup router in the on-site backup data center. A system will be plugged into the router to verify connectivity.

        The backup Sun server will be configured with the last saved data from the most recent backup tapes.

        An internal network between the various servers and the router will be configured.

        CAT5 cables will be run as necessary to the various client systems.

        The IT staff will utilize available space close to the alternate data center as a work area.

        When and if possible, the file cabinets containing GHAG's paper files will be re-located to Broad St.

        Work order operations at the remote sites (i.e. the developments) will be performed manually until such time as a fully functioning data center, with full frame-relay connectivity, can be re-constituted. Any data entry required for the GRC system from these locations will be performed manually either via telephone or by having a representative from the remote location come to Broad St.

 

5.11   Elimination of the entire building

 

Elimination of the entire building assumes that access to all IT resources within the building has been lost and that personnel cannot enter the building to work.

 

In the event of this type of failure, the following process shall be implemented:

 

        IT personnel will re-locate to the Broad St. facility in preparation for activating the alternate data center located in the existing IT training room at that facility.

        Backup Windows NT servers will be ordered from the preferred rental vendor for immediate delivery. Upon delivery the systems will be configured utilizing the infrastructure documentation and backup tapes.

        Backup client notebook systems will be ordered from the preferred rental vendor for immediate delivery. Upon delivery the systems will be configured utilizing the infrastructure documentation and stored copies of installation media.

        Backup network hubs/switches will be ordered from the preferred rental vendor for immediate delivery.

        A room air conditioner will be ordered from the preferred vendor for immediate delivery. Upon delivery, the air conditioner will be installed in the alternate data center at the Broad St. facility.

        Verizon will be contacted to re-direct the frame link from Broad St. to City Hall.

        The most recent backup tapes will be requested from Iron Mountain.

        The Sun server will be moved to the alternate data center location.

        Other backup materials located at the Broad St. facility will be obtained as required.

        A cable will be run from the incoming frame relay location to the alternate data center and connected to the backup router. A system will be plugged into the router to verify Internet connectivity.

        The backup Sun server will be configured with the last saved data from the most recent backup tapes.

        An internal network between the various servers and the router will be configured.

        CAT5 cables will be run as necessary to the various client systems. The client systems will be installed in the two conference rooms adjoining the IT training room.

        [Need to figure out telephone requirements]

        The IT staff will utilize space in the alternate data center as interim office space.

        [Need to define a method to disseminate the alternate location to current and future clients]