In-depth analysis of server fault tolerance technology

Time: 2023-05-08

Servers offer higher availability and reliability than PCs. As informatization deepens and key business platforms move onto IT systems, servers face heavier pressure than ever, especially in ISP, NCP, finance, telecommunications, securities, energy, scientific research and other industries and departments, which constantly pose new challenges to servers. In essence, the challenge is stable 24/7 operation. Keeping the server running normally in an emergency, and ensuring that a failure does not interrupt the business, has become the central goal of server fault tolerance technology.

"Fault tolerance", as the name implies, is the server's ability to tolerate and correct the errors and faults that arise during system operation; it is the stability goal pursued in enterprise applications. The often-quoted 99.999% availability figure is a direct expression of this high stability. Fault-tolerant servers usually contain redundant functional modules that support automatic repair. When an error or failure occurs, the faulty part can be repaired or switched out in time so that the server keeps running without interruption. At present, server fault tolerance mainly takes three forms: server clusters, dual-machine redundant backup, and single-machine fault tolerance.

Server fault tolerance technology is not a recent invention. The first generation of fault-tolerant technology entered the commercial field as early as the 1980s, when it was mainly used in finance, telecommunications, securities, aviation and other industries.
Server fault-tolerant technology then developed further, passing through a second generation based on the Intel i860, a third generation based on HP PA-RISC, and a fourth generation based on the IA architecture. The fault tolerance technology discussed today mostly applies to a single server. This approach costs less, offers higher fault tolerance than the other approaches, and meets the needs of most users. The rest of this article focuses on single-machine and dual-machine (redundant) fault-tolerant technologies.

As mentioned earlier, server fault tolerance consists mainly of server clusters, dual-machine hot standby, and single-machine fault tolerance. Ranked by fault tolerance from lowest to highest, the order is cluster technology, dual-machine hot standby, and single-machine fault tolerance.

Dual-machine hot standby is a system-level fault-tolerant technology: it combines software and hardware to achieve fault tolerance. Typically, the two servers are attached to an additional shared disk array, or each server has its own RAID array, and dedicated dual-machine hot standby software coordinates them. It is essentially a "double insurance" mechanism: if either server fails, the other takes over in time so that the business keeps running. However, because one server usually has to sit in a standby state at all times, this approach wastes part of the hardware investment and of the available computing resources.

In contrast, single-machine fault tolerance is achieved mainly through component redundancy, and its fault tolerance is higher than that of server clusters and dual-machine hot standby.
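The switchover behaviour of dual-machine hot standby described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not how any particular hot standby product works: the class name `HotStandbyPair`, the heartbeat mechanism, and the 3-second timeout are all hypothetical choices made for illustration, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class HotStandbyPair:
    """Toy model of a primary/standby server pair (hypothetical, for illustration)."""
    timeout: float = 3.0          # assumed: seconds of silence before failover
    active: str = "primary"       # node currently serving requests
    last_heartbeat: float = 0.0   # time the last primary heartbeat arrived

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary at time `now`."""
        if self.active == "primary":
            self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote the standby if the primary has been silent for too long."""
        if self.active == "primary" and now - self.last_heartbeat > self.timeout:
            self.active = "standby"
        return self.active


pair = HotStandbyPair(timeout=3.0)
pair.heartbeat(now=1.0)
print(pair.check(now=2.0))   # primary is still within the timeout
print(pair.check(now=5.5))   # 4.5 s without a heartbeat, so the standby takes over
```

The sketch also makes the cost of this approach visible: the standby node does no useful work until `check` promotes it, which is the resource waste noted above.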
Fault-tolerant servers typically provide redundant CPUs, memory, disks, network cards, and even power supplies, so that a failure of any single component causes neither system downtime nor data loss. Many industry-standard x86 servers can now implement this kind of redundant fault-tolerant mechanism in a more cost-effective way. The redundant hardware components are designed to run in synchronization so that a failure has minimal impact.

At present, fault-tolerant servers are built mainly around their processors, and many server vendors offer their own fault-tolerant product lines. For example, HP offers the NonStop series of servers with mission-critical fault-tolerant technology. The series falls into two categories according to processor: NonStop S servers with MIPS processors and Integrity NonStop servers with Intel Itanium chips. Integrity NonStop incorporates many new designs, and its product family is divided into entry-level, mid-to-high-end, and top-end servers. Last year, HP also expanded the Itanium server family, introducing the NS2100 and NS2200 for heterogeneous environments.
Two other well-known fault-tolerant server vendors are NEC, with its Express5800/ft servers, and Stratus, with its ftServer line. Stratus has long experience in fault-tolerant server technology and has developed server products based on different processors, such as the Motorola M68000, the Intel i860, and HP PA-RISC, running its proprietary VOS operating system. Later, the company gradually adopted Linux, Windows and other general-purpose platforms in place of the dedicated VOS operating system to reduce the cost of deploying fault-tolerant servers.

Through its investment in Stratus, NEC has adopted a similar strategy for developing and promoting fault-tolerant servers. NEC introduced the first fault-tolerant server based on the IA architecture as early as 2001. Its Express5800/ft series achieves 99.999% reliability on Windows and Linux platforms, and this real-time protection technology comes from the Stratus Continuous Processing design (Fundamentals of Continuous Processing Design).

At present, fault-tolerant technology is gradually spreading from traditional key application industries such as telecommunications, securities, and finance to basic industries such as manufacturing, energy, logistics, and transportation. In addition, buyers of fault-tolerant servers will pay more attention to total cost of ownership (TCO), and more users will abandon traditional dual-machine hot standby and complex-to-maintain cluster servers in favor of server platforms with built-in fault tolerance.
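To put the 99.999% figure cited in this article into concrete terms, an availability percentage can be converted into the downtime it allows per year. The following is a minimal sketch; the function name `downtime_minutes_per_year` is hypothetical, and a 365-day year is assumed.

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime, in minutes, allowed per 365-day year."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes (assumes a non-leap year)
    return minutes_per_year * (1 - availability_percent / 100)


# Compare a few common availability levels, ending with 99.999%.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Under these assumptions, 99.999% availability leaves roughly 5.26 minutes of downtime per year, compared with about 5,256 minutes (some 87.6 hours) at 99%.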