Where We’ve Made A Difference (War stories from the trenches)
iGLASS helps companies around the world improve their network uptime with our network and server monitoring services. But you don’t have to take our word for it. Here are some real-world examples, or war stories, of where we've made a difference:
Logging in the New Year
In the wee morning hours of Jan. 1, one of our small business customers' key servers developed a problem with logging. Instead of deleting older log files every couple of hours (as it was supposed to), the log files stuck around and the hard disk became critically full. iGLASS was monitoring the disk, detected the problem, and our NOC staff called both the customer's IT staff and the application vendor, giving them the time they needed to take corrective action, free up some space, and avoid a costly outage. If the hard drive had filled up, the system would have crashed, directly affecting revenue. The post-mortem analysis showed that the process designed to clean up the logs had been hard-coded to work only in the previous year. It was now a New Year!
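The vendor never published the offending code, but a minimal Python sketch shows how this kind of bug can arise (the filename scheme and the year 2023 are illustrative stand-ins, not details from the actual system):

```python
# Hypothetical reconstruction of a hard-coded-year cleanup bug.
# The filename pattern and the year are illustrative stand-ins.

def is_cleanup_candidate(filename: str) -> bool:
    """Decide whether a rotated log file may be deleted."""
    # BUG: the year is baked into the match, so once the calendar
    # rolls over, nothing ever matches and the disk quietly fills.
    return filename.startswith("app-2023-") and filename.endswith(".log")

# A safer check keys off file age instead of a date in the name:
import os
import time

def is_stale(path: str, max_age_hours: float = 2.0) -> bool:
    """True when the file was last modified over max_age_hours ago."""
    return time.time() - os.path.getmtime(path) > max_age_hours * 3600
```

On Dec. 31 the first check happily finds files to delete; on Jan. 1 it silently stops, which is exactly the failure mode the disk monitoring caught here.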
Always test your Generators
One of our cable customers had planned ahead and installed UPSs and a backup generator at each of its key hub sites. During a recent power outage, a backup generator failed to start, putting their systems at risk. But because iGLASS proactively monitors UPSs and generators, we immediately alerted the customer and they had a local technician start the generator manually before the UPS ran out of juice. iGLASS saved the company from lost data, lost revenue and the massive headaches that accompany a total outage. The customer followed up by implementing a weekly generator test.
Paper Jam (of a Different Kind)
Late one evening in June, one of the Ethernet interfaces on a customer's core router shut down. Our NOC notified the customer, who dispatched a technician to investigate. It turned out a piece of paper had been sucked against an air intake vent on the chassis, causing a card to shut down due to high temperature. The shutdown saved the card, and iGLASS enabled the customer to minimize the downtime. Once the card cooled down, service was restored (and the offending paper was introduced to the shredder!).
Copper Caper Causes Customer to Call Cops
A cable operator in Indianapolis had a remote hub site housed inside a strip mall. iGLASS detected a spike in ambient temperature within the hub and immediately notified the customer's technicians. At first they thought it was a false alarm, since they hadn't heard from their security monitoring firm (which was supposed to monitor environmental alarms too), but when the temp hit 109°F, they dispatched someone to check it out. Upon arrival at the site, the tech discovered that some brazen thieves had harvested all of the copper off of the primary and backup A/C units, causing a cooling failure. He opened the doors and turned on some fans, avoiding a potentially catastrophic equipment failure and customer outage. The cops were called, but the suspects are still at large.
Don’t Forget about Set-tops!
A cable customer in Alabama had a digital video outage but none of their other tools raised the alarm. Well, none of their other tools monitored set-top boxes! They used to think monitoring their DOCSIS plant was sufficient. It wasn’t until their DMOD locked up and stopped issuing IP addresses that they truly realized the value of monitoring their triple-play environment. Don’t forget the set-tops!
A Shocking Discovery
A customer in Mississippi had a complete power outage at a remote hub site. iGLASS NOC engineers let them know within minutes, and they sent a technician to investigate. Upon arriving at the site, the tech was surprised to find the crispy remains of a 7-ft python inside one of their circuit breaker cabinets. No doubt the python was also surprised, albeit briefly, when it made contact with two terminals that would have been better left alone. The critter was removed, the breakers were reset, and power was restored with minimal disruption, as (thankfully) the outage occurred in the middle of the night. A missing conduit knock-out on the breaker cabinet was also closed up to avoid a repeat performance.
The Power of Syslog
A customer in Rochester, NY is one of the nation's largest metro-area fiber-optic broadband providers, serving ISPs throughout the northeast. iGLASS was monitoring their core router for availability, bandwidth by VLAN, CPU, and other key metrics, but the status of internal power supplies usually isn't polled proactively. Because iGLASS also receives and processes SNMP traps and Syslog, however, we had them covered. On a Thursday in November, Power Supply B died on the router and we alerted the customer within minutes. They didn't have a spare, so they ordered one and had it shipped overnight. They swapped it in on Friday, and then on Friday evening Power Supply A died! It turned out they had a power issue coming into the cabinet (it was running hot!), and a quick visit from an electrician squared them away. If iGLASS hadn't alerted them to the power issue, the outage could have affected literally thousands of customers. A great example of why monitoring Syslog (and SNMP traps) is so important!
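iGLASS's actual trap and Syslog processing isn't shown here, but the core idea can be sketched in a few lines of Python. The pattern below is an assumption for illustration; real power-supply messages vary by vendor and platform:

```python
import re

# Illustrative pattern only. A Cisco-style environment alarm might
# read "%ENVMON-3-POWER_SUPPLY_FAILED: ...", but exact message text
# differs across vendors, so real matchers carry many variants.
PSU_FAILURE = re.compile(r"POWER[ _-]?SUPPLY.*(FAIL|DOWN|SHUT)",
                         re.IGNORECASE)

def needs_alert(syslog_line: str) -> bool:
    """True when a received Syslog line looks like a PSU failure."""
    return bool(PSU_FAILURE.search(syslog_line))
```

The point is that an event-driven feed catches conditions (like a dead Power Supply B) that polled metrics such as CPU and bandwidth never surface.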
VOD-addicted middle-schooler Busted!
A cable MSO in Austin, TX came to us with a problem. A single working lady received her cable bill, and it reflected over $100 worth of VOD purchases. The problem was, she never watched VOD. Because iGLASS monitors VOD utilization down to specific converters, we were able to pinpoint the dates and times each feature was viewed: weekdays between 2:45 and 4:30 PM. The operator "staked out" the lady's condo accordingly and made a startling discovery: a middle-schooler lived next door and got out of school at 2:30. He walked home and (because he was forbidden from ordering VOD features at home) grabbed his remote control, walked to the lady's patio, pulled up a chair, and ordered movies on her TV through her glass patio door! His remote was compatible with her box, and so no one was the wiser; at least, not until iGLASS ratted him out! The kid's parents paid the cable bill, and the lady implemented Parental Control on her cable box, just in case. Revenue assured - kid busted!
50-count... uh, make that 50-"shot" Fiber Outage
A customer in Memphis, TN had a 50-count fiber running through a remote part of town. Shortly after midnight on the morning of January 1st, several fiber links went down simultaneously. That was unusual: normally a fiber cut affects all of the fibers, not just some of them. iGLASS NOC personnel notified the customer immediately, of course, and technicians were dispatched. It appears that as part of the New Year's celebrations, some local revelers decided it would be fun to shoot off their guns - lots of them. One of the rounds pierced the armored jacket of the fiber and severed just those fibers in the middle of the bundle! Click the thumbnail below to see what the techs found at the site of the outage.
Always look for the Backhoe
A financial services customer leased some dark fiber from a regional ISP to connect two of their offices. When connectivity was lost, their regular nightly backup failed. Rather than wake the customer, our NOC engineers called the dark fiber provider. They verified the outage and sent a crew to find the break. A big yellow backhoe was in the area and, yep, that's exactly where they found the fiber cut. There had been a water main break, and a utilities crew had accidentally nicked the fiber while digging. Once service was restored, iGLASS restarted the backup, and the customer enjoyed a good night's rest. Always look for the backhoe!
What happened to our Bandwidth?
One of our corporate IT customers started experiencing intermittent, sluggish Internet performance and couldn't understand why. iGLASS implemented NetFlow monitoring on their core router interfaces and began generating reports. By identifying which source and destination IP addresses were consuming the most bandwidth, the customer was able to determine that their call center employees had recently discovered Pandora and were streaming music to their desktops during periods of low call volume. The customer was able to "throttle" the traffic and eliminate the bandwidth bottlenecks.
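The "top talker" analysis at the heart of that report can be sketched in a few lines of Python. The record fields below are simplified assumptions (real NetFlow v5/v9 exports carry many more fields, such as ports, protocol, and interface indexes):

```python
from collections import Counter

def top_talkers(flows, n=3):
    """Sum bytes per (source IP, destination IP) pair across exported
    flow records and return the n heaviest-traffic pairs."""
    totals = Counter()
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return totals.most_common(n)

# Example: two flows from one desktop to a streaming host dwarf
# everything else, flagging that pair as the bandwidth hog.
sample = [
    {"src": "10.1.1.25", "dst": "stream.example.net", "bytes": 9_000_000},
    {"src": "10.1.1.25", "dst": "stream.example.net", "bytes": 4_000_000},
    {"src": "10.1.1.40", "dst": "mail.example.net", "bytes": 1_000_000},
]
heaviest = top_talkers(sample, n=1)
```

Ranked this way, a handful of streaming destinations stood out immediately, which is what let the customer target the throttling precisely.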