Mission Critical: Basic Concepts

Have you ever wondered what would happen if your bank's systems went down for hours? What would the consequences be if your Internet service provider lost its customers' data? Have you thought about the chaos the city of São Paulo would face if the computers of the metro system simply stopped? For many companies and sectors, computer systems are essential to keeping the business running. If such a system suffers a failure that interrupts its operation or causes the loss of important data, the company can simply go out of business. To avoid this kind of problem, such companies build their systems as mission critical, a concept explained below.

What is mission critical

In a nutshell, a mission-critical environment is a technological environment built to avoid the interruption of computing services and the loss of data that is important to a business. To this end, a range of equipment and technologies is applied to the environment.

What determines the type of equipment and technology used in a mission-critical environment is how important the business and its operations are. If these aspects are not well assessed, a company may invest more than it needs in this area or, in the worst case, invest too little, which can mean that the small investment made is of little use.

To understand this better, imagine the following situation: a chain of stores has locations in the country's major shopping malls. The system of one of the stores may stop working for some reason. The problem is that this downtime affects the company immediately, because customers are at the checkout waiting to be served and, soon, many others will join them. By the time an IT team investigates the problem and makes the necessary repairs, a long time will have passed; customers will go to a competitor's store and will probably not come back, since they will be left with the impression of poor service. To avoid this type of situation, the store can take a series of measures. One is to allow the system to keep operating even if it loses its connection to a central database. Another is to have the system of the nearest branch take over operations while the failed system is checked. A third is to use redundant equipment.

When we refer to the operation and stoppage of a system, it is important to consider two terms: uptime and downtime. The first is the time during which a system is available. The second is the time during which a system is out of service.

Fault tolerance and high availability

As stated earlier, a company needs to assess how critical its operations are to determine how much to invest in a mission-critical environment. For operations with a very high level of criticality, it can use equipment and systems known as “fault tolerant”. With equipment of this type there is always another unit on standby: if the primary stops working, a second one immediately takes over the operation.
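The takeover behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `primary` and `standby` functions are hypothetical stand-ins for real machines, and a `ConnectionError` is assumed to be the signal that a node is down. Real fault-tolerant equipment does this in hardware or at the cluster level, not in application code.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(nodes: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each node in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return node(request)
        except ConnectionError as exc:
            # This node is down; fall through to the next one on standby.
            last_error = exc
    raise RuntimeError("all nodes failed") from last_error


def primary(request: str) -> str:
    # Hypothetical primary machine that is currently down.
    raise ConnectionError("primary is unreachable")


def standby(request: str) -> str:
    # Hypothetical standby machine that answers in place of the primary.
    return f"standby handled {request}"


print(call_with_failover([primary, standby], "sale 1"))
```

The caller never sees the failure of the primary; it only sees an error if every node in the list is down.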

Another important concept is “high availability”. Equipment of this type usually has no standby machines; at most, there is disk mirroring (as in RAID). However, such equipment is designed to have the lowest possible risk of failure.

In high availability systems, an uptime of 99.9% per year is often used as the benchmark. Since a year has 365 days, or 8760 hours, the system needs to operate for at least 8751 hours to reach 99.9%. In other words, for a high availability system to live up to its name, its downtime must not exceed about 9 hours per year. However, these values may vary according to the system used.

If a system is so critical that it can almost never stop working, the ideal is to use fault-tolerant systems, since their uptime corresponds to 99.999%; that is, such a system works for at least 8759.91 of the 8760 hours in a year. This means that systems of this type practically never stop.
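The arithmetic behind these percentages is easy to reproduce. The short Python sketch below computes the yearly downtime budget for a given uptime target; the function name is an illustrative choice, and the 8760-hour year is the same assumption used in the text (no leap years, no scheduled maintenance).

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, as in the text


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system may be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.999):
    down = allowed_downtime_hours(target)
    print(f"{target}% uptime: up to {down:.2f} h ({down * 60:.1f} min) of downtime per year")
```

For 99.9% this gives about 8.76 hours per year, and for 99.999% a little over 5 minutes per year.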

It is important to make clear that, when dealing with high availability and fault tolerance, the figures above do not take scheduled outages into account, such as when servers are taken down for maintenance.

Scalability

In mission-critical environments, it is important to ensure not only that systems do not stop because of faults and errors, but also that they are not paralyzed by overload. If, for example, the website of the Receita Federal (the Brazilian federal revenue service) can receive a thousand income tax returns per hour, it is necessary to watch whether this limit is being reached. If it is, the capacity of the system should be increased; otherwise, the servers will be so overloaded that almost no one will be able to file a return.
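A check of this kind can be sketched as a simple utilization calculation. In the Python example below, the limit of 1000 returns per hour comes from the text, while the observed load of 850 and the 80% warning threshold are assumptions chosen only for illustration.

```python
def utilization(observed_per_hour: float, capacity_per_hour: float) -> float:
    """Fraction of the system's hourly capacity that is currently in use."""
    if capacity_per_hour <= 0:
        raise ValueError("capacity must be positive")
    return observed_per_hour / capacity_per_hour


def needs_more_capacity(
    observed_per_hour: float,
    capacity_per_hour: float,
    threshold: float = 0.8,  # assumed warning level, not from the text
) -> bool:
    """True when the load has reached the warning threshold."""
    return utilization(observed_per_hour, capacity_per_hour) >= threshold


capacity = 1000  # returns per hour the site can receive (from the text)
observed = 850   # hypothetical measured load

if needs_more_capacity(observed, capacity):
    print(f"Load at {utilization(observed, capacity):.0%} of capacity: plan an expansion")
else:
    print("Capacity is sufficient for now")
```

Warning below 100% leaves time to expand the system before users start being turned away.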

On the other hand, it is wasteful to spend on systems with far more capacity than will be used. For example, if the AbbreviationFinder site uses about 25 GB of traffic per month, why use servers that support 1 TB of monthly traffic?

These questions are answered by the concept of “scalability”: the ability of a system to expand its capacity as needed.

First of all, the company needs to evaluate how likely it is that the use of its systems will grow. From there, it should create the conditions for capacity to be increased as needed. For example, the company can acquire equipment that supports 4 processors but, instead of installing all 4 chips, use only 2 and add the others if necessary. Another quite interesting solution is to use clusters and increase the number of machines when needed.
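For the cluster approach, a rough way to decide how many machines to run is to divide the expected peak load by what one machine can handle and round up. The Python sketch below does this; the load figures and the single spare machine kept for failures are assumptions for illustration, not values from the text.

```python
import math


def nodes_needed(peak_load: float, capacity_per_node: float, spare_nodes: int = 1) -> int:
    """Machines required to serve the peak load, plus spares kept for failures."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    return math.ceil(peak_load / capacity_per_node) + spare_nodes


# Hypothetical numbers: each machine handles 1000 requests per hour.
print(nodes_needed(peak_load=2500, capacity_per_node=1000))  # today's peak
print(nodes_needed(peak_load=6000, capacity_per_node=1000))  # projected peak
```

Running the calculation again as the projected peak changes shows when machines need to be added, so the company buys capacity only when it is needed.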

To build a mission-critical environment, it is not enough to think about the computers that will be part of the system; the location of the environment and access to it must also be considered.

To begin with, the computers should ideally be in a room with fire protection and appropriate air conditioning. If this room is in the basement, it is also important that it be protected against flooding.

Access should also be controlled. If an employee works in customer support, there is no reason for them to have access to the server room. In addition, authorized people can be subject to a policy that requires them to account for what was done in the room. If an authorized employee leaves the company, their access credentials should be revoked so that they cannot access the system remotely.

The layout of the equipment and cables must also be well planned. For example, cables should not be exposed; otherwise, someone may trip over them. It is necessary to make sure that the racks support the weight of the equipment and that they are firmly anchored, so that if, for example, a person falls against a rack, it will not be knocked over. The layout should also allow equipment to be removed or added without others having to be turned off.

Another key issue is electrical power. In addition to uninterruptible power supplies (UPSs, known in Brazil as “no-breaks”), which are equipment with a battery that keeps the computers running when the main power source is cut off, it is necessary to evaluate the need to install power generators. This is essential to keep the equipment in a hospital ICU working, for example.

Obviously, security is not limited to the physical aspect. The systems need firewalls, an IDS (Intrusion Detection System), encryption, access control by user level, and so on.

Security is so important that large companies no longer centralize their operations. For example, a multinational can replicate its systems in branches in other countries. Thus, if any unit stops working (for example, because of a terrorist attack or a natural disaster such as a hurricane), the company's business will not come to a halt.

If a company finds that handling security itself would incur extremely high costs, an alternative is to use the services of IDCs (Internet Data Centers), such as the companies Optiglobe, Embratel and Intelig. These companies have environments that meet all security requirements and provide services such as colocation (the client “rents” the space and the communication links to install its own equipment), dedicated hosting (the IDC takes over the entire operation of the equipment), etc.

The volume of data in companies grows every day and, because data is at the core of a business, handling it is also a concern in mission-critical environments. As mentioned at the beginning of this text, what would happen if a bank lost customer data? What would the consequences be if an online shop lost all the data on the day's sales? Moreover, it is not enough just to have the data: it must also be accessible when needed and within a satisfactory time. You have probably already grasped the size of the problem…

To deal with these aspects, companies are seeking storage solutions, that is, data management solutions. Two of them are SAN (Storage Area Network) and NAS (Network Attached Storage). The first consists of a network of data storage devices managed by servers over a high-speed network, such as Fibre Channel (optical fibre) or iSCSI. The second is a set of storage devices integrated into an existing LAN (Local Area Network).

A SAN is suitable for situations where data needs to be stored securely and be accessible in a timely manner. A SAN allows storage devices to be shared among several servers, whether they are in one location or at remote sites. Because they are built on high-speed networks, SANs can even avoid network bottlenecks, since they are able to handle large volumes of data. Companies that offer SAN solutions include IBM, HP and Itec.

NAS solutions, in turn, are simpler than SANs, since they are deployed on existing networks. In cases of greater criticality, a NAS solution can have a dedicated (exclusive) channel for access to the network. The great advantage of solutions of this type is the easy sharing of data between servers and client machines, even when they run different operating systems. Companies such as IBM, HP, EMC and Sun offer NAS solutions.

Conclusion

The technologies and resources related to mission-critical concepts are not limited to those listed here. The subject is so complex that there are practically no experts in mission critical as a whole, only in some of the related technologies. Since computing needs vary from company to company, each one must clearly identify which segments of its operations can be considered critical and then apply the corresponding solutions. In the information age we have entered, what you cannot do is relax about this. No system is foolproof and nothing is 100% safe. This is why it is a mistake to limit yourself to a single solution or to disregard a risk just because it is small. This, perhaps, makes it clear that the biggest problem lies in the human factor, which is quite capable of underestimating a risk or waiting for something bad to happen before taking any action.
