Network outages create problems. They stifle productivity and can lead to lost revenue when infrastructure goes down. They annoy management, employees, customers, and of course, the IT staff responsible for keeping the network up and running in the first place. Outages can occur for a variety of reasons, but there are 10 issues you need to keep top of mind.
Top 10 Reasons for Network Downtime
1. Human Error
Computers don’t make “mistakes,” but people do. Where does human error stack up on the list, and what can you do about it?
A 2016 study set out to learn how network professionals cope with increasingly complex networks while keeping them secure, performing as they should, and compliant with established IT department rules. It surveyed 315 IT personnel, including individual IT techs and admins, supervisors, and executives. Key findings include:
- Nearly all (97%) acknowledge that outages arise from human error.
- About half (52%) point to human error as causing “only a few” outages, while 25% say it’s the cause of “frequent” downtime. Another 18% attribute “most network outages” to human error.
Another study finds that 75% of downtime can result from human error. Those errors may come from a lack of training, or simply from techs who are rushed, tired, distracted, or stressed, or who tried a shortcut that didn’t work out.
A mistake can be as simple as pulling the wrong plug or not knowing a proper procedure. As networks grow and become more complex, more planning, written procedures, and checklists are necessary to avoid simple but costly mistakes. Contracting with a NOC to develop a runbook and monitor your network can help reduce the frequency of human error.
2. Understaffed IT Departments
Keeping a company’s network, servers and applications all running smoothly takes a concerted team effort. If a company doesn’t have sufficient head count to monitor the network 24×7 or to manage updates, network downtime is likely. The simplest solution, which offers many far-reaching benefits, is to hire a professional NOC to monitor your network and provide remediation services if problems do occur.
3. Old Equipment or Applications
The older your network equipment or applications, the more likely they are to trigger an outage. For instance, frequent updates to software in the WISA, LAMP, and Java application stacks can gradually demand more from older equipment that was designed long before some of the latest O/S updates. Servers that ran reliably just a few years ago may no longer have the capacity to run today’s more elaborate O/S software. In such cases, performance suffers and crashes occur.
Every outdated, obsolete, or unsupported device or application is a potential threat to your network’s functioning. If your business uses old hardware or software, it’s time to take inventory and be proactive in planning upgrades where needed.
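As one way to start that inventory, the sketch below flags assets whose vendor support has already ended. It is a minimal illustration in Python, not a prescribed tool: the Asset record, the past_end_of_support helper, and every device name and date are hypothetical placeholders that you would replace with data from your own asset records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Asset:
    """One inventory entry (hypothetical structure, for illustration only)."""
    name: str
    kind: str              # e.g. "server", "router", "application"
    end_of_support: date   # date the vendor stops providing fixes


def past_end_of_support(assets, today):
    """Return assets whose support ended before `today`, oldest first."""
    expired = [a for a in assets if a.end_of_support < today]
    return sorted(expired, key=lambda a: a.end_of_support)


# Made-up inventory; real data would come from your own asset records.
INVENTORY = [
    Asset("core-router-1", "router", date(2027, 6, 30)),
    Asset("file-server-2", "server", date(2019, 1, 14)),
    Asset("billing-app", "application", date(2021, 10, 1)),
]

if __name__ == "__main__":
    for asset in past_end_of_support(INVENTORY, today=date(2024, 1, 1)):
        print(f"{asset.name} ({asset.kind}): support ended {asset.end_of_support}")
```

Sorting by end-of-support date puts the longest-unsupported items first, which is one reasonable way to prioritize upgrades; you might instead weight by how critical each asset is.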
4. Server O/S Bugs
As noted above, bugs and vulnerabilities in the server O/S can lead to performance as well as security issues. Everyone in IT knows it’s important to keep the O/S up to date, but too often patches are not applied on a timely basis.
Even worse, when a patch is applied that hasn’t been fully tested, applications can be corrupted or rendered inoperable, bringing the network or segments of it to a halt. Keeping track of bugs and patches to the server O/S is another task a NOC can help you manage.
5. Incorrect Configurations
Device configuration changes can create outages if done incorrectly. The University of Michigan conducted a year-long reliability study of IP core routers and found that router problems caused almost 16 hours of downtime. More than one-third (36%) of those problems were the result of configuration errors and software and hardware upgrades. The study also cites other findings that router software problems are the biggest single cause of outages, contributing up to 25% of all outages.
6. Incompatible Changes
Unlike configuration changes made in error, these problems arise when a change you intend doesn’t work properly alongside your other equipment. Your NOC can help you verify that your network is functioning properly after a planned change.
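One simple way to verify a planned change is to record which health checks pass before the change and compare them with the results afterward. The sketch below shows only the comparison step, in Python. The find_regressions helper, the check names, and the results are hypothetical; how you collect the pass/fail results (pings, service probes, monitoring exports) is left open.

```python
def find_regressions(before, after):
    """Return names of checks that passed before the change but not after.

    `before` and `after` map a check name to True (passed) or False (failed).
    A check that is missing from `after` is treated as failing.
    """
    return sorted(
        name
        for name, passed in before.items()
        if passed and not after.get(name, False)
    )


# Hypothetical snapshots taken before and after a planned change.
BEFORE = {"dns": True, "vpn": True, "mail": False, "web": True}
AFTER = {"dns": True, "vpn": False, "mail": False}

if __name__ == "__main__":
    for name in find_regressions(BEFORE, AFTER):
        print(f"Regression after change: {name}")
```

Checks that were already failing before the change, such as "mail" in this made-up data, are deliberately not reported, so the output points only at problems the change may have introduced.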
7. Hardware Failures
Even as engineering and design best practices have extended the mean time between failures (MTBF) for every kind of network device in recent years, equipment is bound to break down at some point. Failures are usually unpredictable, even when you’re running advanced analytics and AI apps that try to predict them. And, as we’ve noted, outdated hardware is particularly vulnerable to failure.
Networks generally have a degree of redundancy built in, but it’s not uncommon to find hardware and device configurations that introduce a single point of failure. For example, a server with a single power supply, rather than redundant power supplies, can lead to an outage. A UPS battery that’s past its serviceable life expectancy can do the same when a power blip occurs.
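To see why redundancy matters, the standard reliability formulas can be worked through in a few lines of Python. This is a rough sketch under stated assumptions: the MTBF and repair-time figures are invented for illustration, mean time to repair (MTTR) is a term introduced here rather than taken from any study above, and the redundancy formula assumes that the units fail independently and that any one of them can carry the full load.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: uptime as a fraction of total time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single, copies):
    """Availability of `copies` independent units when any one is enough."""
    return 1.0 - (1.0 - single) ** copies


def annual_downtime_hours(avail):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical power supply: fails every 50,000 hours on average and
    # takes 24 hours to replace. These numbers are assumptions.
    one = availability(mtbf_hours=50_000, mttr_hours=24)
    two = redundant_availability(one, copies=2)
    print(f"Single supply: {annual_downtime_hours(one):.2f} hours down per year")
    print(f"Redundant pair: {annual_downtime_hours(two):.4f} hours down per year")
```

Real-world failures are rarely fully independent (two supplies on the same circuit can fail together), so treat the redundant figure as a best case rather than a guarantee.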
8. Software Failures
The outages caused by software failures have become legendary. Millions of customers of TSB Bank were locked out of their accounts for weeks. Six times in one year, British Airways suffered a global outage; one caused the cancellation of more than 1,000 flights. The Federal Communications Commission found an entirely preventable software error was the root cause of failed emergency 911 services across six states, which left 6,000 callers without help.
9. Power Failures
Power failures happen every day and often affect millions of people beyond the data center. The largest to date occurred in India in July 2012, leaving 620 million people (about 9% of the world’s population) in the dark.
You need to safeguard against both short-term and long-term disruptions with UPS and generator systems that are regularly tested and maintained.
10. Natural Disasters
Hurricanes, severe storms, and other natural disasters disrupt power services and communications, and they make transportation difficult or impossible. Networks, along with many other services, suffer in the aftermath of a natural disaster.
IT managers live in dread of downtime. When a network does go down, it carries many of the same unwanted effects that losing water, heat, or electricity would bring. In some cases, such as outages at a medical facility, people’s lives can be at risk.
An outage also hurts the morale of the IT team whose job is to diagnose the problem and restore service as quickly and fully as possible. While there is no universal solution for preventing all outages and downtime, partnering with a professional NOC can drastically reduce the duration of an outage or prevent it entirely. A NOC can also save your team the stress that responding to a major outage always brings. Contact iGLASS to learn how a U.S.-based NOC can become one of your most valued partners.