Fujitsu Develops Industry's First System-Failure Management Technology for Cloud Computing Era

Tuesday, 23 February 2010, 13:51 HKT/SGT

KAWASAKI, Japan, Feb 23, 2010 - (ACN Newswire) - Fujitsu Laboratories Ltd. today announced the development of technology that will enable Fujitsu to implement the Trusted-Service Platform it has been advocating for cloud computing services as the industry shifts toward the cloud computing era. In an industry first, Fujitsu has developed technologies that detect signs of system failures before the failures occur. They do this by improving the analysis of cloud system data and the gathering of information, narrowing down the causes of failures, and automatically resolving them. Cloud systems play an important role in supporting various societal infrastructure systems and must deliver services continuously. Even when a failure does occur, services must not be interrupted. Fujitsu's new technology makes it possible to address cloud system failures before they occur. Furthermore, because failures can be resolved automatically, the technology reduces administrators' workload and delivers cloud services that users can rely on with confidence.

Cloud computing is a delivery model whereby remotely located IT resources, such as servers, storage, networks, middleware and business applications, are provided as services over the Internet. Users have the ability to use the functions they need, in whatever amounts they need, and only when they need them.

In addition to serving as a platform for further enhancing work efficiency and productivity, cloud computing is also used to support various societal infrastructure systems, such as those behind entertainment and lifestyle-related applications. To support the creation of a human-centric networked society, in which information and communication technologies (ICTs) are employed with an emphasis on people, cloud systems must continue delivering secure and stable services non-stop.

Traditionally, many companies have addressed system failures only after they occur. However, because cloud systems play an important role in supporting infrastructure systems and companies cannot afford downtime for them, a different approach is required. In addition, large-scale systems have so far ensured the continuous operation of services through expensive redundant configurations. To deliver high reliability and stability in cloud systems, which are meant to operate economically, what is needed is technology that can predict and resolve failures before they emerge.

Technological Challenges

Cloud computing systems have the following characteristics:

1. Large-scale:

When companies take existing systems that operate independently and consolidate them into data centers and enterprise IT systems, the scale of the systems increases.

2. Complexity:

When companies employ virtualization technologies and operate numerous services on the same physical server, system configurations and system dependency relationships become complex.

Given these characteristics, a failure in a cloud system can affect multiple services, and locating where it occurred can require a great deal of manpower and time.

Newly-Developed Technology

To provide highly reliable and stable services via cloud computing, Fujitsu Laboratories developed technology that detects signs of failure and averts failures before they occur. Specifically, the technology monitors the system, predicts failures, narrows down their causes, and quickly resolves them.

1. Detection of signs of failure (Prediction)

Fujitsu Laboratories has developed two technologies to detect signs of failures depending on the type of failure.

(1) Detection of failures through the analysis of system messages:

This technology focuses on specific patterns in the messages generated just before failures occur and treats them as warning signs. By comparing the pattern of newly generated messages with the messages from previous system failures, the technology can pick up on signs of failure. By employing Bayesian learning* to assign weights to example data drawn from previously generated message patterns, the system can detect signs of failure with high accuracy.
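The release does not describe Fujitsu's model in detail. As a minimal sketch of the general idea, assuming log messages have already been normalized into pattern IDs and grouped into time windows labeled either "pre-failure" or "normal", a multinomial naive Bayes classifier (a swapped-in, textbook form of Bayesian learning) can weight each pattern by how often it appeared before past failures. The class name, labels and pattern IDs below are hypothetical.

```python
import math
from collections import Counter


class MessagePatternClassifier:
    """Multinomial naive Bayes over message-pattern IDs in a time window.

    Hypothetical sketch: labels, pattern IDs and smoothing are assumptions,
    not Fujitsu's disclosed method.
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha              # Laplace smoothing constant
        self.window_counts = Counter()  # number of training windows per label
        self.pattern_counts = {}        # label -> Counter of pattern IDs
        self.vocab = set()              # every pattern ID seen in training

    def train(self, window: list[str], label: str) -> None:
        self.window_counts[label] += 1
        self.pattern_counts.setdefault(label, Counter()).update(window)
        self.vocab.update(window)

    def log_posterior(self, window: list[str], label: str) -> float:
        # log P(label) + sum of log P(pattern | label), up to a shared constant
        total_windows = sum(self.window_counts.values())
        score = math.log(self.window_counts[label] / total_windows)
        counts = self.pattern_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for pattern in window:
            score += math.log((counts[pattern] + self.alpha) / denom)
        return score

    def predict(self, window: list[str]) -> str:
        return max(self.window_counts, key=lambda lbl: self.log_posterior(window, lbl))


clf = MessagePatternClassifier()
clf.train(["DISK_RETRY", "DISK_RETRY", "FS_READONLY"], "pre-failure")
clf.train(["DISK_RETRY", "IO_TIMEOUT", "FS_READONLY"], "pre-failure")
clf.train(["LOGIN_OK", "CRON_START", "LOGIN_OK"], "normal")
clf.train(["CRON_START", "BACKUP_OK", "LOGIN_OK"], "normal")
print(clf.predict(["DISK_RETRY", "FS_READONLY"]))  # pre-failure
```

The per-pattern weights come from the smoothed counts learned in `train`, so patterns that mostly appeared shortly before past failures pull a new window toward the "pre-failure" label.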

(2) Detection of potential failures that do not generate messages:

When equipment such as servers is configured, human error can lead to incorrect settings being entered. In this situation, the server operates according to those settings and may not generate any error messages. An effective way to detect such failures is to capture the data packets traveling across the networks that link servers and systems, and to analyze minor packet-level changes such as data loss, retransmitted packets and transmission delays. To monitor the large-scale systems involved in cloud computing, Fujitsu Laboratories has developed a technology that supports 10Gbps high-speed communications and detects network and server system failures in real time.
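The packet-level analysis can be illustrated with a simplified sketch. It assumes captured packets have already been reduced to hypothetical `PacketRecord` entries (flow label, sequence number, send time, ACK time), computes per-flow retransmission rate, loss rate and mean send-to-ACK delay, and flags metrics that rise well above a healthy baseline. The thresholds are illustrative assumptions, and real-time capture at 10Gbps line rate is outside the scope of this sketch.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class PacketRecord:
    flow: str                  # hypothetical flow label, e.g. "web01->db01"
    seq: int                   # sequence number seen on the wire
    sent_at: float             # capture timestamp in seconds
    acked_at: Optional[float]  # when the matching ACK was captured; None if never


# Hypothetical floors so that tiny absolute changes are not flagged.
FLOORS = {"retransmit_rate": 0.01, "loss_rate": 0.01, "mean_delay": 0.005}


def flow_metrics(packets):
    """Per-flow retransmission rate, loss rate and mean send-to-ACK delay."""
    by_flow = {}
    for p in packets:
        by_flow.setdefault(p.flow, []).append(p)
    metrics = {}
    for flow, pkts in by_flow.items():
        seen, retransmits = set(), 0
        for p in pkts:
            if p.seq in seen:
                retransmits += 1  # the same sequence number was sent again
            seen.add(p.seq)
        acked = {p.seq for p in pkts if p.acked_at is not None}
        delays = [p.acked_at - p.sent_at for p in pkts if p.acked_at is not None]
        metrics[flow] = {
            "retransmit_rate": retransmits / len(pkts),
            "loss_rate": 1 - len(acked) / len(seen),  # sequences never ACKed
            "mean_delay": mean(delays) if delays else float("inf"),
        }
    return metrics


def flag_anomalies(current, baseline, ratio=2.0):
    """Report metrics that exceed both `ratio` x baseline and a fixed floor."""
    alerts = []
    for flow, m in current.items():
        base = baseline.get(flow)
        if base is None:
            continue  # no healthy reference for this flow yet
        for key, floor in FLOORS.items():
            if m[key] > max(base[key] * ratio, floor):
                alerts.append((flow, key, base[key], m[key]))
    return alerts


healthy = [PacketRecord("web01->db01", s, t, t + 0.002)
           for s, t in [(1, 0.00), (2, 0.01), (3, 0.02), (4, 0.03)]]
degraded = [
    PacketRecord("web01->db01", 1, 0.00, None),  # original send, never ACKed
    PacketRecord("web01->db01", 1, 0.20, 0.23),  # retransmission
    PacketRecord("web01->db01", 2, 0.24, 0.27),
    PacketRecord("web01->db01", 3, 0.28, 0.31),
]
alerts = flag_anomalies(flow_metrics(degraded), flow_metrics(healthy))
for flow, key, before, after in alerts:
    print(f"{flow}: {key} {before:.3f} -> {after:.3f}")
```

No error message is involved at any point: the misconfiguration shows up only as a rise in retransmissions and delay relative to the flow's own baseline.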

2. Narrowing down the causes of failures

The technology scans the detected signs of system failure and infers the areas most likely to have generated them. Taking each observed symptom as a point of origin, it uses network and system configuration information to trace the symptom back to its causes. It then overlays the results obtained from multiple points of origin and infers the most likely causes from the areas with the most overlap or those showing no normal activity.
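The overlay step can be sketched as a graph search, assuming configuration information is available as a dependency graph (each component mapped to the components it depends on). Each symptom is traced to everything it depends on, the traces are overlaid by counting how many symptoms reach each component, and components reporting no normal activity break ties. The component names, the `silent` set and the tie-breaking rule are hypothetical; the release does not say how Fujitsu weights overlap against missing activity.

```python
from collections import Counter, deque


def upstream(graph, start):
    """All components reachable from `start` via dependency edges (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def rank_causes(graph, symptoms, silent=frozenset()):
    """Overlay the trace from every symptom and rank components by overlap."""
    overlap = Counter()
    for s in symptoms:
        overlap.update(upstream(graph, s))
    # Most overlap first; among ties, components with no normal activity
    # (members of `silent`) come first; then alphabetical for stable output.
    return sorted(overlap, key=lambda c: (-overlap[c], c not in silent, c))


# Hypothetical configuration: component -> components it depends on.
graph = {
    "web-app-A": ["vm-1", "db-service"],
    "web-app-B": ["vm-2", "db-service"],
    "batch-job": ["vm-2", "storage-array"],
    "db-service": ["vm-3", "storage-array"],
    "vm-1": ["host-1"],
    "vm-2": ["host-1"],
    "vm-3": ["host-2"],
    "storage-array": [],
}
symptoms = ["web-app-A", "web-app-B", "batch-job"]
print(rank_causes(graph, symptoms, silent={"storage-array"})[:3])
# ['storage-array', 'host-1', 'db-service']
```

Here `storage-array` and `host-1` each lie on all three symptom traces; the missing activity on `storage-array` is what moves it to the top.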

3. Resolving the causes of failures

The system draws on past knowledge of how system failures were handled, including system log information, and presents administrators with the most suitable methods for dealing with the identified causes. Because failures often recur, the system stores previous cases of system failures and the procedures used to resolve them in its knowledge base, so that it can quickly determine how to resolve the cause of a failure.
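A knowledge base of this kind might be sketched as follows, assuming each past case is keyed by a diagnosed cause string and stores the procedure steps that were tried along with whether they resolved the failure. The class names, cause labels and smoothed success-rate ranking are assumptions for illustration, not Fujitsu's design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Procedure:
    steps: tuple[str, ...]
    successes: int = 0
    attempts: int = 0

    def success_rate(self) -> float:
        # Laplace-smoothed so a procedure tried once is not ranked at 0 or 1 outright.
        return (self.successes + 1) / (self.attempts + 2)


class FailureKnowledgeBase:
    """Hypothetical store of past causes and the procedures used to resolve them."""

    def __init__(self):
        self._cases = defaultdict(dict)  # cause -> {steps tuple: Procedure}

    def record(self, cause, steps, resolved):
        key = tuple(steps)
        proc = self._cases[cause].setdefault(key, Procedure(key))
        proc.attempts += 1
        proc.successes += int(resolved)

    def suggest(self, cause):
        procs = self._cases.get(cause)
        if not procs:
            return None  # unseen cause: leave it to an administrator
        best = max(procs.values(), key=Procedure.success_rate)
        return list(best.steps)


kb = FailureKnowledgeBase()
kb.record("config:mtu-mismatch", ["set MTU to 1500", "restart NIC"], resolved=True)
kb.record("config:mtu-mismatch", ["reboot VM"], resolved=False)
kb.record("config:mtu-mismatch", ["set MTU to 1500", "restart NIC"], resolved=True)
print(kb.suggest("config:mtu-mismatch"))  # ['set MTU to 1500', 'restart NIC']
```

Recording every outcome, not just successes, lets the ranking shift away from procedures that stop working.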

Results

With this new technology, Fujitsu is able to quickly address cloud system failures, allowing the delivery of high-reliability, continuous-operation cloud system services to its customers.

In its own internal systems that employ the technology, Fujitsu has been able to detect mistaken system settings before errors actually occurred. Fujitsu has also reduced the average time required to resolve failures from 15 minutes to approximately one minute.

Future Developments

Fujitsu plans to gradually deploy this technology in its On-Demand Virtual System Services and LCM services, on its Trusted-Service Platform.

* Bayesian learning: A probabilistic method for estimating the cause of an event based on evidence. Fujitsu Laboratories' application of Bayesian learning in its technology achieved a failure detection rate of 96.2% after being trained on an example of a failure 10 times.
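For reference, Bayesian methods rest on Bayes' rule. Reading $F$ as "a failure is imminent" and $M$ as "the observed message pattern" is an illustrative assumption; the release does not give Fujitsu's exact formulation.

```latex
P(F \mid M) = \frac{P(M \mid F)\,P(F)}{P(M \mid F)\,P(F) + P(M \mid \neg F)\,P(\neg F)}
```

Training on past failures sharpens the estimate of $P(M \mid F)$, which is why detection improves as more examples of a failure are seen.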

Contact:

Fujitsu Laboratories Ltd.
Cloud Computing Research Center
Tel: +81-44-754-2575
E-mail: cloud-mate@ml.labs.fujitsu.com

Topic: Press release summary
Source: Fujitsu Ltd
Sectors: Electronics
https://www.acnnewswire.com
From the Asia Corporate News Network