The information you can monitor with SNMP is wide-ranging--from standard items, like the amount of traffic flowing into an interface, to far more esoteric items, like the air temperature inside a router. In spite of its name, though, SNMP is not especially simple to learn. Written for network and system administrators, the book introduces the basics of SNMP and then offers a technical background on how to use it effectively. The book contains five new chapters and various updates throughout. Administrators will come away with ideas for writing scripts to help them manage their networks, create managed objects, and extend the operation of SNMP agents.
|Published (Last):||13 December 2007|
This section of the chapter provides insights into some of the issues surrounding network management.
Business Case Requirements
The endeavor of network management involves solving a business problem through an implementation of some sort.
A business case is developed to understand the impact of implementing some sort of task or function. It looks at how, for example, network administrators do their day-to-day jobs.
The basic idea is to reduce costs and increase effectiveness.
Levels of Activity
Before applying management to a specific service or device, you must understand the four possible levels of activity and decide what is appropriate for that service or device:
- Inactive: No monitoring is being done, and, if you did receive an alarm in this area, you would ignore it.
- Reactive: No monitoring is being done; you react to a problem if it occurs.
- Interactive: You monitor components but must interactively troubleshoot them to eliminate side-effect alarms and isolate a root cause.
- Proactive: You monitor components, and the system provides a root-cause alarm for the problem at hand and initiates predefined automatic restoral processes where possible to minimize downtime.
Reporting of Trend Analysis
The ability to monitor a service or system proactively begins with trend analysis and reporting.
Chapters 12 and 13 describe two tools that are capable of aiding in trend reporting. In general, the goal of trend analysis is to identify when systems, services, or networks are beginning to reach their maximum capacity, with enough lead time to do something about it before it becomes a real problem for end users. For example, you may discover a need to add more memory to your database server or upgrade to a newer version of some application server software that adds a performance boost.
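As a rough illustration of the lead-time idea behind trend analysis, the sketch below fits a straight line to daily utilization samples and estimates how many days remain before a capacity threshold is crossed. The function name `days_until_threshold`, the sample readings, and the 90% threshold are assumptions made for this example; real trend-reporting tools generally use more robust models than a simple linear fit.

```python
from typing import Optional, Sequence


def days_until_threshold(
    samples: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate days until utilization reaches `threshold`.

    `samples` holds one utilization reading per day, oldest first.
    Returns None when there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    # Ordinary least-squares fit of utilization against the day index.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches the threshold
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line crosses the threshold, expressed
    # relative to the most recent sample.
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


# Hypothetical daily memory utilization (percent) for a database server.
memory_pct = [60.0, 62.0, 64.0, 66.0, 68.0]
remaining = days_until_threshold(memory_pct, threshold=90.0)
print(f"Estimated days until 90% utilization: {remaining:.1f}")
```

With these made-up samples, utilization grows two points per day, so the estimate is 11 days of lead time before the threshold is reached.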
Acting before capacity becomes a real problem can help your users avoid frustration and possibly keep you employed.
Response time reporting measures how various aspects of your network, including systems, are performing with respect to responsiveness.
Chapter 11 shows how to monitor services with SNMP.
Alarm Correlation
Alarm correlation deals with narrowing down many alerts and events into a single alert or several events that depict the real problem. Another name for this is root-cause analysis. The idea is simple, but it tends to be difficult in practice. For example, when a web server on your network goes down, and you are managing all devices between you and the server (including the switch the server is on and the router), you may get any number of alerts, including ones for the server being down, the switch being down, or the router being down, depending on where the real failure is.
Suppose the real failure is the router. In that case, you really only need to know that the router is down. Network management systems can often detect when some device or network is unreachable for various reasons. The key in this situation is to correlate the server, switch, and router down events into a single high-level event detailing that the router is down. This high-level event can be made up of all the entities and their alarms that are affected by the router being down, but you want to shield an operator from all of these until he is interested in looking at them.
Keeping this storm of alerts and alarms away from the operator helps with overall efficiency and improves the trouble resolution capabilities of the staff.
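The correlation described above can be sketched with a simple topology map. This is a minimal illustration, not how any particular network management system works: the `UPSTREAM` table, the device names, and the `correlate` function are all hypothetical, and the topology is assumed to be a loop-free tree rooted at the management station.

```python
from typing import Dict, List, Optional, Set

# Hypothetical topology: each device maps to the upstream device the
# management station must traverse to reach it (None = directly attached).
UPSTREAM: Dict[str, Optional[str]] = {
    "router1": None,
    "switch1": "router1",
    "webserver1": "switch1",
}


def correlate(
    down: Set[str], upstream: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Group "down" alarms under the topmost failed device on each path."""
    events: Dict[str, List[str]] = {}
    for device in sorted(down):
        # Walk toward the management station, remembering the highest
        # device on the path that is also reported down.
        root = device
        parent = upstream.get(device)
        while parent is not None:
            if parent in down:
                root = parent
            parent = upstream.get(parent)
        events.setdefault(root, [])
        if root != device:
            events[root].append(device)
    return events


alarms = {"webserver1", "switch1", "router1"}
for root, suppressed in correlate(alarms, UPSTREAM).items():
    print(f"ROOT CAUSE: {root} down (suppressed alarms: {suppressed})")
```

Here the three alarms collapse into one high-level event for `router1`, with the switch and server alarms kept as detail the operator can open on demand.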
Clearing alarms once the underlying problem is resolved is also important. This notion of state transition, from bad to good, is common. It helps operators know that something is indeed up and operational. It also helps with trending. If you see that a certain device is constantly unreliable, you may want to investigate why.
Trouble Resolution
The key to trouble resolution is knowing that what you are looking at is valuable and can help you resolve the problem. As such, alarms and alerts should aid an operator in resolving the problem. If possible, alerts and alarms should provide the operator with enough detail so that she can effectively troubleshoot and resolve the problem.
Change Management
Change management deals with, well, managing change. In other words, you need to plan for both scheduled and emergency changes to your network. Not doing so can cause networks and systems to be unreliable at best and can upset the very people you work for at worst.
The following sections provide a high-level overview of change management techniques. These techniques are recommended by Cisco. See the end of this section for the URL to this paper and others on the topic of network management.
Planning for Change
Change planning is a process that identifies the risk level of a change and builds change planning requirements to ensure that the change is successful. The key steps for change planning are as follows:
- Assign all potential changes a risk level prior to scheduling the change.
- Document at least three risk levels with corresponding change planning requirements. Identify risk levels for software and hardware upgrades, topology changes, routing changes, configuration changes, and new deployments. Assign higher risk levels to nonstandard add, move, or change types of activity.
- The high-risk change process you document needs to include lab validation, vendor review, peer review, and detailed configuration and design documentation.
- Create solution templates for deployments affecting multiple sites. Include information about physical layout, logical design, configuration, software versions, acceptable hardware chassis and modules, and deployment guidelines.
- Document your network standards for configuration, software version, supported hardware, and DNS. Additionally, you may need to document things like device naming conventions, network design details, and services supported throughout the network.
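One way to make the documented risk levels usable by scripts is to record them as data. The sketch below is illustrative only: the three level names, the default level assigned to each change type, and the `assess` function are assumptions for this example. Only the requirements listed for the high-risk level come from the steps above; the low and medium requirements are placeholders.

```python
from enum import IntEnum
from typing import Dict, List


class RiskLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical default risk level for each type of change.
DEFAULT_RISK: Dict[str, RiskLevel] = {
    "configuration_change": RiskLevel.LOW,
    "software_upgrade": RiskLevel.MEDIUM,
    "hardware_upgrade": RiskLevel.MEDIUM,
    "routing_change": RiskLevel.HIGH,
    "topology_change": RiskLevel.HIGH,
    "new_deployment": RiskLevel.HIGH,
}

# Planning requirements per level. The HIGH entries follow the text;
# the LOW and MEDIUM entries are illustrative assumptions.
REQUIREMENTS: Dict[RiskLevel, List[str]] = {
    RiskLevel.LOW: ["peer review"],
    RiskLevel.MEDIUM: ["peer review", "configuration documentation"],
    RiskLevel.HIGH: [
        "lab validation",
        "vendor review",
        "peer review",
        "detailed configuration and design documentation",
    ],
}


def assess(change_type: str, standard: bool = True) -> RiskLevel:
    """Return the risk level for a change; unknown types default to HIGH."""
    level = DEFAULT_RISK.get(change_type, RiskLevel.HIGH)
    if not standard and level < RiskLevel.HIGH:
        # Nonstandard add, move, or change activity is raised one level.
        level = RiskLevel(level + 1)
    return level


level = assess("software_upgrade", standard=False)
print(level.name, REQUIREMENTS[level])
```

Treating unknown change types as high risk is a deliberately conservative choice for the sketch; an organization might prefer to reject such requests until a level has been documented.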
Managing Change
Change management is a process that approves and schedules the change to ensure the correct level of notification with minimal user impact. The key steps for change management are as follows:
- Assign a change controller who can run change management review meetings, receive and review change requests, manage change process improvements, and act as a liaison for user groups.
- Hold periodic change review meetings with system administration, application development, network operations, and facilities groups as well as general users.
- Document change input requirements, including change owner, business impact, risk level, reason for change, success factors, backout plan, and testing requirements.
- Document change output requirements, including updates to DNS, network map, template, IP addressing, circuit management, and network management.
- Define a change approval process that verifies validation steps for higher-risk change.
- Hold postmortem meetings for unsuccessful changes to determine the root cause of change failure.
- Develop an emergency change procedure that ensures that an optimal solution is maintained or quickly restored.
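The change input requirements listed above lend themselves to a simple completeness check before a request reaches a review meeting. The following is a minimal sketch under stated assumptions: the `ChangeRequest` class and `missing_inputs` function are hypothetical names, the field names are taken from the list of inputs, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ChangeRequest:
    """Change input requirements named in the list above."""
    change_owner: str = ""
    business_impact: str = ""
    risk_level: str = ""
    reason_for_change: str = ""
    success_factors: str = ""
    backout_plan: str = ""
    testing_requirements: str = ""


def missing_inputs(request: ChangeRequest) -> List[str]:
    """Return the names of required inputs that are still blank."""
    return [
        f.name
        for f in fields(request)
        if not getattr(request, f.name).strip()
    ]


# Hypothetical request that is not yet ready for the review meeting.
draft = ChangeRequest(
    change_owner="network-ops",
    business_impact="Brief outage for one branch office",
    risk_level="medium",
    reason_for_change="Router software upgrade",
)
print("Missing inputs:", missing_inputs(draft))
```

A check like this only confirms that each input was supplied; judging whether a backout plan or test plan is adequate remains a task for the reviewers.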
Scope
Scope is the who, what, where, and how for the change. In other words, you need to detail every possible impact point for the change, especially its impact on people.
Risk assessment
Everything you do to or on a network, when it comes to change, has an associated risk. The person requesting the change needs to establish the risk level for the change. It is best to experiment in a lab setting if you can before you go live with a change. This can help identify problems and aid in risk evaluation.
[Figure: Process flow for planned change management]
Test and validation
With any proposed change, you want to make sure you have all of your bases covered. Rigorous testing and validation can help with this. Depending upon the associated risk, various levels of validation may need to be performed. For example, if the change has the potential to impact a great many systems, you may wish to test the change in a lab setting.
Change planning
For a change to be successful, you must plan for it. This includes requirements gathering, ordering software or hardware, creating documentation, and coordinating human resources.
Change controller
Basically, a change controller is a person who is responsible for coordinating all details of the change process.
Change management team
You should create a change management team that includes representation from network operations, server operations, application support, and user groups within your organization. The team should review all change requests and approve or deny each request based on completeness, readiness, business impact, business need, and any other conflicts.
Tip: The change management team does not investigate the technical accuracy of the change; technical experts who better understand the scope and technical details should complete this phase of the change process.
Communication
Many organizations, even small ones, fail to communicate their intentions. Make sure you keep people who may be affected up-to-date on the status of the changes.
Implementation team
You should create an implementation team consisting of individuals with the technical expertise to expedite a change. The implementation team should also be involved in the planning phase to contribute to the development of the project checkpoints, testing, backout criteria, and backout time constraints. This team should guarantee adherence to organizational standards, update DNS and network management tools, and maintain and enhance the tool set used to test and validate the change.
Test evaluation of change
Once the change has been made, you should begin testing it. Hopefully you already have a set of tests documented that can be used to validate the change. Make sure you allow yourself enough time to perform the tests. If you must back out the change, make sure you test this scenario, too.
Network management update
Be sure to update any systems, such as network management tools, device configurations, network configurations, and DNS entries, to reflect the change. This may include removing devices that no longer exist from the management systems, changing the SNMP trap destination your routers use, and so forth.
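A small script can help spot what needs updating after a change. The sketch below compares the set of devices the management system knows about with the set actually deployed, and flags devices still sending traps to an old destination. All names and data here are hypothetical: the functions `inventory_diff` and `wrong_trap_destination` are invented for illustration, and how you obtain the inventory and configured trap destinations depends on your own tools.

```python
from typing import Dict, List, Set, Tuple


def inventory_diff(
    managed: Set[str], deployed: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (devices to remove from the NMS, devices to add to it)."""
    stale = sorted(managed - deployed)
    unmanaged = sorted(deployed - managed)
    return stale, unmanaged


def wrong_trap_destination(
    configured: Dict[str, str], expected: str
) -> List[str]:
    """Return devices whose configured trap destination is not `expected`."""
    return sorted(dev for dev, dest in configured.items() if dest != expected)


# Hypothetical data; addresses come from the 192.0.2.0/24 documentation range.
managed = {"router1", "router2", "switch1"}
deployed = {"router1", "switch1", "switch2"}
trap_dest = {
    "router1": "192.0.2.10",
    "switch1": "192.0.2.20",
    "switch2": "192.0.2.20",
}

stale, unmanaged = inventory_diff(managed, deployed)
print("Remove from NMS:", stale)
print("Add to NMS:", unmanaged)
print("Fix trap destination on:", wrong_trap_destination(trap_dest, "192.0.2.20"))
```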
Documentation
Always update documentation that becomes obsolete or incorrect when a change occurs. Documentation may end up being used by a network administrator to solve a problem.
Documentation means a lot more during emergency changes than it does in planned changes. In the heat of the moment, things can get lost or forgotten. Accurately recording the steps and procedures taken will ensure that troubles can be resolved in the future. If you have to, take short notes while the process is unfolding. Later, write it up formally; the important thing is to remember to do it. The following figure shows the process flow for emergency changes.
[Figure: Emergency change process]
Issue determination
Knowing what needs to change is generally not difficult to determine in an emergency. The key is to take one step at a time and not rush things. Rushing can, in some cases, unnecessarily prolong the outage.
Limited risk assessment
Risk assessment is performed by the network administrator on duty, with advice from other support personnel. Her experience will guide her in how the change is classified from a risk perspective.
Communication and documentation
If at all possible, users should be notified of the change. Also, be sure to communicate any changes to the change manager.
Essential SNMP, 2nd Edition by Douglas Mauro, Kevin Schmidt