One of the most important things you can do to protect your network is to ensure that every desktop, laptop and server is running anti-virus software. To be clear, running just any anti-virus software is not enough; the solution should protect against viruses, spyware and malware. For the rest of this article I will refer to the solution as anti-virus, although I mean one that protects against all three.
Many of the anti-virus solutions sold today operate in a similar fashion. The console, or main program, of the anti-virus software is installed on a server on the network. From there, each desktop, laptop and server has the anti-virus client installed on it. The anti-virus console is responsible for checking the software vendor for new virus definitions and then pushing those definitions out to each of the clients (desktops, laptops and servers). Normally this process occurs several times a day without incident, and it ensures that every computer on the network has the latest virus definitions and is prepared to protect against all known threats. Unfortunately, the process has recently failed and caused major problems.
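To make the centralized console model concrete, here is a minimal Python sketch. The class and method names (`Definitions`, `Client`, `Console`, `sync_once`) are hypothetical and do not correspond to any vendor's product; the sketch assumes the console pushes every new definition set to every client immediately, with no staged or partial rollout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Definitions:
    """Hypothetical definition set: a version number plus a flag marking it as buggy."""
    version: int
    buggy: bool = False


@dataclass
class Client:
    """A desktop, laptop or server running the anti-virus client."""
    name: str
    definitions: Optional[Definitions] = None

    def apply(self, defs: Definitions) -> None:
        self.definitions = defs

    @property
    def healthy(self) -> bool:
        # In this model a client becomes unusable as soon as it loads a buggy definition set.
        return self.definitions is None or not self.definitions.buggy


@dataclass
class Console:
    """The central console installed on one server on the network."""
    clients: List[Client]
    current: Optional[Definitions] = None

    def sync_once(self, vendor_latest: Definitions) -> int:
        """Check the vendor for newer definitions and push them to every client.

        Returns the number of clients that received the update.
        """
        if self.current is not None and vendor_latest.version <= self.current.version:
            return 0  # nothing newer from the vendor
        self.current = vendor_latest
        for client in self.clients:  # push to every client in one pass
            client.apply(vendor_latest)
        return len(self.clients)


if __name__ == "__main__":
    console = Console(clients=[Client(f"pc{i}") for i in range(3)])
    console.sync_once(Definitions(6271))
    print(all(c.healthy for c in console.clients))  # True
    console.sync_once(Definitions(6272, buggy=True))
    print(sum(c.healthy for c in console.clients))  # 0
```

Because the console forwards whatever the vendor publishes to every client in a single pass, one bad definition reaches every machine at once. The model also assumes that a broken client can still accept the next, fixed definition, which, as the rest of this article shows, was not always true in practice.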
On April 22, 2010 the McAfee anti-virus program experienced a bug that caused networks around the world to fail. According to a story by InfoWorld:
The update distributed at 3 a.m. Eastern time Wednesday misclassified a critical Windows XP system file, called svchost.exe, as a malicious program. As a result, McAfee’s AV software was instructed to detect and remove the threat, sending affected PCs into fits of rebooting that made the machines useless.
Steve Shillingford, chief executive of tech forensics firm Solera Networks, told USA Today that one large U.S. multinational company saw 50,000 PCs go into a reboot frenzy as a result of the destructive update. Solera was in the process of helping the client clean up the mess, which could only be corrected manually by a technician at each PC.
The problem McAfee experienced was that the latest virus definitions contained a bug that misclassified a Windows XP system file (svchost.exe) as a virus. The software then tried to remove the "virus," which sent Windows XP machines into an endless loop of reboots and made them unusable. The update process I described above made the problem worse: the console program downloaded the buggy virus definition and then pushed it to every desktop, laptop and server on the network. You can see how quickly the problem escalates, since one bad definition on the console becomes a bad definition on every machine it manages.
To be honest, I was watching from the sidelines on April 22, 2010 as the McAfee fiasco unfolded. We had previously migrated all of our clients from McAfee and Symantec anti-virus to Sunbelt’s VIPRE anti-virus, so when McAfee had the problem, I was smug, knowing that my clients were not affected.
Unfortunately, my smugness wore off yesterday. Starting around 8 a.m., our helpdesk began receiving a flood of support requests. They came from almost all of our clients, and each had a similar theme: the client's network was slow and unresponsive. From there the morning only got worse. The phones rang off the hook, the requests kept coming in, and we were in full-scale firefighting mode.
Finally, we received a series of emails from Sunbelt Software Support that explained why all of our clients were having the same problem.
From: Sunbelt Software Support [support@sunbeltsoftware.com]
Subject: High CPU utilization with definitions 6272 – 6274
Product Notification: VIPRE Enterprise, VIPRE Enterprise Premium, CounterSpy Enterprise
Date: May 7th, 2010
Notification Type: Support Issue
Product: VIPRE Enterprise, VIPRE Enterprise Premium, CounterSpy Enterprise
Version: All
Operating System: All product-supported Operating Systems
Dear VIPRE/CounterSpy Enterprise customer,
Customers running a scan with definition versions 6272, 6273, or 6274 may experience extremely high CPU usage when running a scan.
The issue started with definition 6272, released 5/6/2010 at 5:53:19 PM EDT. The issue is caused by a virus detection (Virus.VBS.Redlof.f) that causes a loop condition on the system, resulting in high CPU usage.
This problem has been fixed in definition version 6275.
If you are unable to abort a currently running scan on your agent machines, the solution to the 100% CPU usage is to do as follows:
- Ensure the Enterprise Server has received definition version 6275.
- Stop the following processes on any unresponsive agent machines:
  - SBAMsvc.exe
  - SBAMTray.exe (if tray icon is set to be visible)
  - sbamui.exe (if agent interface is open)
  - SBPIMSvc.exe (4.0 Agents only)
- Restart your enterprise agents.
- Update any outdated agents within your console to the latest definitions.
We are aggressively researching why this detection was able to release to the public and are putting in place additional quality assurance processes, so that we can ensure that this type of detection doesn’t occur again.
Thanks for choosing Sunbelt Software,
Sunbelt Software Support
Sunbelt Software
email: support@sunbeltsoftware.com
Voice: 1-877-673-1153 Ext 510
Fax: 1-727-562-5199
Web: http://www.sunbeltsoftware.com
Physical Address:
33 N Garden Ave
Suite 1200
Clearwater, FL 33755
United States
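When dozens of agents are pegged at 100% CPU, stopping the listed processes by hand on each machine is slow. Below is a minimal Python sketch of scripting that step with Windows' built-in `taskkill` command (`/S` for a remote computer, `/F` to force termination, `/IM` to select by image name). It assumes you run it from a Windows machine under an account with administrative rights on each agent; the helper names and hostnames are hypothetical, and this is not a tool provided by Sunbelt. A machine that is truly unresponsive may not answer remote requests either, so some hosts may still need a technician.

```python
import subprocess
import sys
from typing import List, Sequence

# Process names taken from the Sunbelt notice above. SBAMTray.exe, sbamui.exe
# and SBPIMSvc.exe only exist in some configurations, so a failure to find
# them is expected and is reported rather than treated as fatal.
VIPRE_PROCESSES = ("SBAMsvc.exe", "SBAMTray.exe", "sbamui.exe", "SBPIMSvc.exe")


def build_kill_commands(hosts: Sequence[str],
                        processes: Sequence[str] = VIPRE_PROCESSES) -> List[List[str]]:
    """Build one taskkill command per (host, process) pair."""
    return [
        ["taskkill", "/S", host, "/F", "/IM", image]
        for host in hosts
        for image in processes
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command (Windows only) and report its exit code instead of raising."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        status = "ok" if result.returncode == 0 else f"exit {result.returncode}"
        print(f"{' '.join(cmd)} -> {status}")


if __name__ == "__main__":
    hosts = sys.argv[1:]  # hypothetical usage: python stop_agents.py PC01 PC02
    commands = build_kill_commands(hosts)
    if sys.platform == "win32":
        run_commands(commands)
    else:
        for cmd in commands:  # dry run: just print what would be executed
            print(" ".join(cmd))
```

Stopping the processes is only one step; the agents still need to be restarted and updated to definition 6275 as the notice describes.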
It turns out that Sunbelt pushed out a virus definition file that caused each desktop, laptop and server to spike to 100% CPU utilization while scanning, which made the computers essentially unusable. The fix was to install the newest definition (version 6275), which corrects the issue. The catch was that the affected computers were unresponsive, so deploying the new definition to them was extremely difficult.
I would like to give kudos to my staff, who wrestled with VIPRE for most of the morning and eventually pushed out the newest definition files, resolving the issue for all of our clients.
On the other hand, I urge Sunbelt and McAfee to recognize the impact of releasing buggy updates that cause major network problems. These vendors have a responsibility to their customers to ensure that what they release is fully tested and bug free. These problems are not just a pain for IT staff; they have a major impact on medical practices, hospitals, financial institutions, universities and corporations around the world. It is easy to say in an email or press release that you will take steps to ensure these incidents don’t occur again. What your customers really want are concrete plans that show you are serious about your efforts and are doing everything you can to prevent widespread outages like the ones that happened over the past few weeks.