Module 1: Network Support

Diagnostics and Troubleshooting Methodologies

Troubleshooting is the process of identifying, locating and correcting problems. This process involves gathering information and using one or more structured troubleshooting methods. The seven-step troubleshooting process:

1. Define the problem

The goal of this stage is to verify that there is a problem and then properly define what the problem is. Problems are usually identified by a symptom (e.g., the network is slow or has stopped working). Network symptoms may appear in many different forms, including alerts from the NMS, console messages, and user complaints.

While gathering symptoms, it is important to ask questions and investigate the issue in order to localize the problem to a smaller range of possibilities. For example, is the problem restricted to a single device, a group of devices, or an entire subnet or network of devices?

In an organization, problems are typically assigned to network technicians as trouble tickets. These tickets are created using trouble ticketing software that tracks the progress of each ticket. Trouble ticketing software may also include a self-service user portal to submit tickets, access to a searchable trouble tickets knowledge base, remote control capabilities to solve end-user issues, and more.

2. Gather information

In this step, the targets to be investigated (hosts, devices, and so on) must be identified, access to the target devices must be obtained (this is not always possible), and information must be gathered. During this step, the technician may gather and document more symptoms, depending on the characteristics that are identified.

3. Analyze information

Possible causes must be identified. The gathered information is interpreted and analyzed by consulting network documentation and network baselines, searching organizational KBs and the internet, and talking with other technicians.

4. Eliminate possible causes

If multiple causes are identified, then the list must be reduced by progressively eliminating possible causes to eventually identify the most probable cause. Troubleshooting experience is extremely valuable to quickly eliminate causes and identify the most probable cause.

5. Propose hypothesis

When the most probable cause has been identified, a solution must be formulated. At this stage, troubleshooting experience is very valuable when proposing a plan.

6. Test hypothesis

Before testing the solution, it is important to assess the impact and urgency of the problem. For instance, could the solution have an adverse effect on other systems or processes? The severity of the problem should be weighed against the impact of the solution. For example, if a critical server or router must be offline for a significant amount of time, it may be better to wait until the end of the workday to implement the fix. Sometimes, a workaround can be created until the actual problem is resolved.

7. Solve the problem

When the problem is solved, inform the users and anyone involved in the troubleshooting process that the problem has been resolved. Other IT team members should be informed of the solution. It is important to properly document the cause and solution as this can assist other support technicians to prevent and solve similar problems in the future.

Structured troubleshooting methods

A technician may choose one or more of the following methods to solve a problem:

  • Bottom-Up: Start with the physical layer and the physical components of the network and move up through the layers of the OSI model until the cause of the problem is identified.
  • Top-Down: Start with the end-user applications and move down through the layers of the OSI model until the cause of the problem has been identified.
  • Divide-and-Conquer: Start by collecting user experiences of the problem, document the symptoms and then, using that information, make an informed guess as to which OSI layer to start your investigation.
  • Follow-the-Path: Discover the traffic path all the way from source to destination. This approach usually complements one of the other approaches.
  • Substitution: Physically swap the problematic device or component with a known, working one. If the problem is fixed, then the problem is with the removed item. If the problem remains, then the cause is elsewhere.
  • Comparison: Compare specifics such as configurations, software versions, hardware, or other device properties, links, or processes between working and nonworking situations and spot significant differences between them.
  • Educated guess: A less-structured troubleshooting method that uses an educated guess based on the experience of the technician and their ability to solve problems.
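The layered methods above differ mainly in where they start and which direction they move through the OSI model. The following is a minimal sketch of that idea, not a real diagnostic tool. It assumes each layer can be tested by a simple pass/fail function and that a fault at one layer also breaks every layer above it. The names find_faulty_layer and simulated_checks are hypothetical.

```python
# OSI layers ordered from the bottom (layer 1) to the top (layer 7).
OSI_LAYERS = ["Physical", "Data Link", "Network", "Transport",
              "Session", "Presentation", "Application"]


def find_faulty_layer(layer_ok, method="bottom-up", start="Network"):
    """Return the lowest failing layer, or None if no checked layer fails.

    layer_ok maps a layer name to a zero-argument function that returns
    True when that layer tests healthy. Assumption: a fault at one layer
    also breaks every layer above it.
    """
    if method == "bottom-up":
        for layer in OSI_LAYERS:
            if not layer_ok[layer]():
                return layer
        return None
    if method == "top-down":
        faulty = None
        for layer in reversed(OSI_LAYERS):
            if layer_ok[layer]():
                break              # this layer works, so the fault is above it
            faulty = layer
        return faulty
    if method == "divide-and-conquer":
        index = OSI_LAYERS.index(start)
        if layer_ok[start]():
            # The starting layer works: continue upward.
            for layer in OSI_LAYERS[index + 1:]:
                if not layer_ok[layer]():
                    return layer
            return None
        # The starting layer fails: continue downward.
        faulty = start
        for layer in reversed(OSI_LAYERS[:index]):
            if layer_ok[layer]():
                break
            faulty = layer
        return faulty
    raise ValueError(f"unknown method: {method}")


def simulated_checks(fault_layer):
    """Build fake checks in which fault_layer and everything above it fail."""
    if fault_layer is None:
        fault_index = len(OSI_LAYERS)
    else:
        fault_index = OSI_LAYERS.index(fault_layer)
    return {name: (lambda i=i: i < fault_index)
            for i, name in enumerate(OSI_LAYERS)}
```

With a simulated Data Link fault, all three methods arrive at the same layer; they differ only in how many layers are tested along the way.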

A technician must document the:

  • Problem: Includes the initial report of the problem, a description of the symptoms, information gathered and any other information that would help resolve similar problems.
  • Solution: Includes the steps taken to resolve the problem.
  • Commands and tools used: Include the commands and tools used in diagnosing the problem and solving the problem.

Verify the solution with the customer. If the customer is available, demonstrate how the solution has corrected their problem. Have the customer test the solution and try to reproduce the problem. When the customer can verify that the problem has been resolved, you can update the documentation with any new information provided by the customer.


Network Documentation

Common network documentation includes the following:

  • Physical and logical network topology diagrams.
  • Network device documentation that records all pertinent device information.
  • Network performance baseline documentation.

The distinguishing characteristic for LANs today is that they are typically owned by an individual, such as in a home or small business, or wholly managed by an IT department, such as in a school or corporation.

A wireless mesh network (WMN) uses multiple access points to extend the WLAN.

A CAN is a group of interconnected LANs, belonging to the same organization and operating in a limited geographical area. Campus area networks typically consist of several buildings interconnected by high-speed Ethernet links using fiber optic cabling.

A MAN is a network that spans across a large campus or a city. The network consists of various buildings connected through wireless or fiber optic media.

All network documentation should be kept in a single location, either as hard copy or on the network on a protected server. Backup documentation should be maintained and kept in a separate location.

High-level view of how the different parts of an enterprise network connect, along with its connection to its cloud provider:

image.png

For an enterprise network, the network documentation will typically include several network topology diagrams showing different levels of detail and different types of information:

  • Physical layout and connections
  • IP address and VLAN management
  • Security and VPN policies
  • Cloud services and management
  • Routing policies
  • Remote access policies for remote and hybrid workers

---

XaaS is not a specific cloud service but is defined as the delivery of anything and everything as a service. XaaS includes SaaS, PaaS, and IaaS, as well as:

  • DRaaS (Disaster recovery)
  • CaaS (Communications)
  • MaaS (Monitoring)
  • DaaS (Desktop)

----

The IEEE 802.11 WLAN standards define how radio frequencies are used for wireless links. Most of the standards specify that wireless devices have one antenna to transmit and receive wireless signals on the specified radio frequency (2.4, 5, or 6 GHz). Some of the newer standards that transmit and receive at higher speeds require APs and wireless clients to have multiple antennas using MIMO technology. MIMO uses multiple antennas at both the transmitter and the receiver to improve communication performance. Up to 8 transmit and receive antennas can be used to increase throughput.

The unlicensed spectrum is open for anyone to use. The unlicensed spectrum is where we find IEEE 802.11 Wi-Fi technologies and is available free to the public. Anyone can transmit over the unlicensed spectrum.

----

The purpose of network monitoring is to watch network performance in comparison to a predetermined baseline. A baseline is used to establish normal network or system performance to determine the "personality" of a network under normal conditions. Establishing a network performance baseline requires collecting performance data from the ports and devices that are essential to network operation. A network baseline should answer the following questions:

  • How does the network perform during a normal or average day?
  • Where are the most errors occurring?
  • What part of the network is most heavily used?
  • What part of the network is least used?
  • Which devices should be monitored and what alert thresholds should be set?
  • Can the network meet the identified policies?

Without a baseline, no standard exists to measure the optimum nature of network traffic and congestion levels. A baseline may also reveal areas in the network that are underutilized, and it can often lead to network redesign efforts based on quality and capacity observations.
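As a rough illustration of comparing current performance against a baseline, the sketch below summarizes a set of "normal day" samples and flags a new measurement that deviates strongly from them. The sample values, the function names, and the three-standard-deviation threshold are all assumptions made for the example; a real NMS would use its own metrics and alert thresholds.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize samples collected under normal conditions."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_anomalous(value, baseline, k=3.0):
    """Flag a value more than k standard deviations from the baseline mean.

    k = 3.0 is an arbitrary illustrative threshold, not a recommended setting.
    """
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


# Hypothetical link utilization samples (percent) from a normal day.
normal_day = [30, 32, 28, 31, 29]
baseline = build_baseline(normal_day)

print(is_anomalous(31, baseline))   # within the normal range
print(is_anomalous(75, baseline))   # far above the normal range
```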

---

CDP is a Cisco proprietary L2 protocol that is used to gather information about Cisco devices which share the same data link. CDP is media and protocol independent and runs on all Cisco devices.

With CDP enabled on the network, the show cdp neighbors [detail] command can be used to determine the network layout.


Help Desks

Organizations operate with well-defined corporate, employee, and security policies.

The "Security Policy" document contains policies that inform users, IT staff, and managers of the requirements for protecting technology and information assets. There are policies for:

  • Specifying how users are identified and authenticated.
  • Setting password length, complexity, and refresh interval.
  • Defining what behavior is acceptable on the corporate network.
  • Specifying remote access requirements, etc.

The Security Policy document is a constantly evolving document that reacts to changes in the threat landscape, new vulnerabilities, and business and employee requirements. The Security Policy helps the IT team understand what they must do to keep the network operational and secure by using:

  • Standard operating procedures (SOP): These define step-by-step actions that must be completed for any given task to comply with a policy. There are SOPs to follow when replacing network devices, installing (or uninstalling) applications, onboarding new employees, terminating existing employees, and more.

  • Guidelines: These cover the areas where there are no SOPs defined.

When users encounter a problem or need network support, they must contact a "help desk". The help desk assists users by following the defined SOPs and guidelines. The help desk uses a ticketing system to manage the steps within the troubleshooting process (topic 1.1). This topic (1.3) focuses on using a ticketing system to complete the first three steps: define the problem, gather information, and analyze information.

A help desk is a specialized team in an IT department that is the central point of contact for employees or customers. When users require support, they must contact the IT help desk. This may be done in many ways. Help desks often use a "shared" email account.

The online reporting tool could be integrated into the ticketing system.

Often, the help desk technician may be able to quickly answer or solve user issues. For example, if an organization had an internet network failure, users may contact the help desk asking why they cannot reach external sites. The technician would inform them that the network is down, and that it should be operational within a specific time. 

However, if the request for support is valid, then the technician will create a "trouble ticket". This is done using special ticketing system software to manage requests, incidents, and reported problems. These "tickets" can be created by the user using a ticketing system dashboard or by a helpdesk technician. Typically, a user initiates the ticket, and the help desk technician validates it.

The help desk technician may have to gather additional information about the request. When questioning users, use effective questioning techniques and listen carefully to the user answers. You may also have to physically investigate the device or connect remotely to replicate the problem, execute commands, and check configurations.

The technician would then analyze the collected data and either:

  • Solve the problem: Once the user problem has been addressed, the technician would update and close the trouble ticket. Updating the ticket solution is important because it can populate the ticketing system DB. Therefore, if the same problem is reported by another user, the responding technician can search the DB to quickly resolve the problem. In addition, administrators can analyze the tickets to identify common issues and their causes in order to globally eliminate the problem, if possible.
  • Escalate the trouble ticket: Some problems are more complex or require access to devices which the technician has no credentials for. In these cases, the technician must escalate (forward) the trouble ticket to a more experienced technician. It is important that all documentation captured from the user is clear, concise, and accurate. 
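The solve-or-escalate decision and the searchable record of past solutions can be sketched with a small in-memory model. This is only an illustration of the workflow described above; the Ticket and TicketSystem classes, their fields, and the status names are hypothetical and do not correspond to any particular ticketing product.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: int
    reporter: str
    summary: str
    status: str = "open"            # becomes "closed" or "escalated"
    notes: list = field(default_factory=list)
    solution: str = ""


class TicketSystem:
    """In-memory stand-in for ticketing software (illustration only)."""

    def __init__(self):
        self.tickets = {}
        self._next_id = 1

    def create(self, reporter, summary):
        ticket = Ticket(self._next_id, reporter, summary)
        self.tickets[ticket.ticket_id] = ticket
        self._next_id += 1
        return ticket

    def resolve(self, ticket_id, solution):
        """Record the solution and close the ticket."""
        ticket = self.tickets[ticket_id]
        ticket.solution = solution
        ticket.status = "closed"

    def escalate(self, ticket_id, reason):
        """Forward the ticket, keeping the reason in its notes."""
        ticket = self.tickets[ticket_id]
        ticket.notes.append(f"Escalated: {reason}")
        ticket.status = "escalated"

    def search_solutions(self, keyword):
        """Return closed tickets whose summary or solution mentions keyword."""
        keyword = keyword.lower()
        return [t for t in self.tickets.values()
                if t.status == "closed"
                and (keyword in t.summary.lower()
                     or keyword in t.solution.lower())]
```

Only closed tickets are returned by the search, which mirrors the point that a documented solution is what makes the ticket useful to the next technician.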

Workflow example:

image.png

When completing the details on the ticket, it is important to use clear and concise written communication. Use plain language and short sentences. Always pay attention to your spelling, grammar, and style.

When entering the trouble ticket, the help desk technician must discover the "who", "what", and "when" of the problem. The following recommendations should be employed when communicating with a user:

  • Always be considerate and empathize with users while letting them know you will help them solve their problem. Users reporting a problem may be under stress and anxious to resolve the problem as quickly as possible. Never talk down to, belittle, insult, or accuse the user of causing the problem.

  • Speak at a technical level they can understand. Avoid using complex terminology or industry jargon. Always listen or read carefully what the user is saying. Taking notes can be helpful when documenting a complex problem.

  • Good interpersonal skills are an asset to the helpdesk technician. It is important to develop this skill set to better serve and communicate with users and peers. For example, a technician should address a user by their preferred name, attempt to relate to the user, and work to clarify exactly what it is that they are requesting.

When interviewing the user, guide the conversation and use effective questioning techniques to quickly ascertain the problem. Two common types of questions are:

  • Open-ended questions: These types of questions allow users to explain the details of the problem in their own words and are useful to obtain general information.

  • Closed-ended questions: These call for a simple yes, no, or single-word answer and can be used to discover important facts about the network problem.

Guidelines and examples of open-ended end-user questions:

Ask pertinent questions

  • What does not work?
  • What exactly is the problem?
  • What are you trying to accomplish?

Determine the scope of the problem

  • Who does this issue affect? Is it just you or others?
  • What device is this happening on?

Determine when the problem occurred or occurs

  • When exactly does the problem occur?
  • When was the problem first noticed?
  • Were there any error message(s) displayed?

Determine if the problem is constant or intermittent

  • Can you reproduce the problem?
  • Can you send me a screenshot or video of the problem?

Determine if anything has changed

  • What has changed since the last time that it worked?

Use questions to eliminate or discover possible problems

  • What works?

When done interviewing the user, repeat your understanding of the problem to the user to ensure that you both agree on what is being reported.

To better understand the problem reported by the user, practice active listening skills. Allow the customer to tell the whole story. During the time that the customer is explaining the problem, occasionally interject some small word or phrase, such as "I understand", "Yes", "I see", or "Okay". 

However, a technician should not interrupt the customer to ask a question or make a statement. This is rude, disrespectful, and creates tension. Often in a conversation, you might find yourself thinking of what to say before the other person finishes talking. When you do this, you are not actively listening. Instead, listen carefully when your customers speak, and let them finish their thoughts.

You asked the customer to explain the problem to you. This is an open-ended question. An open-ended question rarely has a simple answer. Usually, it involves information about what the customer was doing, what they were trying to do, and why they are frustrated.

After you have listened to the customer explain the whole problem, summarize what the customer has said. This helps convince the customer that you have heard and understand the situation. A good practice for clarification is to paraphrase the customer's explanation by beginning with the words, "Let me see if I understand what you have told me". This is a very effective tool that demonstrates to the customer that you have listened and that you understand.

After you have assured the customer that you understand the problem, you will probably have to ask some follow-up questions. Make sure that these questions are pertinent. Do not ask questions that the customer has already answered while describing the problem. Doing this only irritates the customer and shows that you were not listening.

Follow-up questions should be targeted closed-ended questions based on the information that you have already gathered. Closed-ended questions should focus on obtaining specific information. The customer should be able to answer a closed-ended question with a simple "yes" or "no" or with a factual response, such as "Windows 10".

Use all the information that you have gathered from the customer to complete the trouble ticket. 

Document the user-provided information in the trouble ticket. Include anything that you think might be important for you or another technician. The small details often lead to the solution of a difficult or complicated problem.

When the ticket has been completed, repeat your understanding of the problem to the user to ensure that you both agree on the problem being reported.

--

The ticketing system often includes sections for entering host-related information. Some of the information that can be captured from a host includes:

  • Beep codes
  • Event Viewer logs
  • Device Manager settings
  • Task Manager data
  • Diagnostic tool results

Troubleshoot Endpoint Connectivity

Some firewalls, such as Windows Firewall, block pings by default. It is important to record these settings in your network documentation and to be aware of them when testing and verifying network connectivity.
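When ping may be blocked, one complementary check is to attempt a TCP connection to a port that is expected to be open on the target. The sketch below uses Python's standard socket module. The function name tcp_port_open and the host and port in the usage comment are assumptions for illustration. A failed connection does not by itself show whether a firewall, a stopped service, or a routing problem is responsible.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage: check a web server that does not answer pings.
# print(tcp_port_open("192.0.2.10", 443))
```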

The Linux ip address command is used to display addresses and their properties. It can also be used to add or delete IP addresses. The output displayed may vary depending on the Linux distribution. There are several other Linux command line tools that are available for most Linux distributions including speedtest and ncat.
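Once the host address and prefix length have been read from the ip address output, a common next step is to confirm that the default gateway is on the same subnet as the host. The following is a small sketch using Python's standard ipaddress module. The function name same_subnet and the addresses are made up for the example.

```python
import ipaddress


def same_subnet(host_cidr, gateway):
    """Return True if gateway lies in the network of host_cidr.

    host_cidr is an address with prefix length, for example
    "192.168.1.10/24"; gateway is a plain address.
    """
    interface = ipaddress.ip_interface(host_cidr)
    return ipaddress.ip_address(gateway) in interface.network


print(same_subnet("192.168.1.10/24", "192.168.1.1"))   # gateway reachable on-link
print(same_subnet("192.168.1.10/24", "192.168.2.1"))   # gateway outside the host subnet
```

A gateway outside the host subnet usually points to a wrong address, mask, or gateway setting on the host.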

In the GUI of a Mac host, open Network Preferences > Advanced to get the IP addressing information. The ifconfig command can also be used. Other useful macOS commands to verify the host IP settings include networksetup -listallnetworkservices and networksetup -getinfo <net_service>. macOS shares many commands with Linux. The System Information tool also displays Wi-Fi information, and the macOS Wireless Diagnostics application can be used to troubleshoot and monitor Wi-Fi connectivity.

Third-party network analysis apps that provide functions such as ping and trace are available for Android.


Troubleshoot a Network

show version

Displays uptime, version information, and licensing information for the device software and hardware.

show ip interface [brief]

show ipv6 interface [brief]

Displays all the configuration options that are set on an interface. Use the brief keyword to only display status and IP address of the IP interfaces.

show interfaces

Displays detailed output for each interface. To display detailed output for only a single interface, include the interface type and number in the command (for example, G0/0/0).

show ip route

show ipv6 route

Displays the routing table content listing directly connected networks and learned remote networks.

show cdp neighbors detail

Displays detailed information about directly connected Cisco neighbor devices. Useful to validate that L1 and L2 are operational.

show arp

show ipv6 neighbors

Displays the contents of the ARP table (IPv4) and the neighbor table (IPv6).

show running-config

Displays the current device configuration.

show vlan

Displays the status of VLANs on a switch.

show port

Displays the status of ports on a switch.

show mac address-table

Displays the contents of the switch's MAC address table.

show interface status

Displays statistics and status information for network interfaces.

show inventory

Displays inventory information about the specific components in a Cisco device.

show switch

Displays the switch stack status when switches are grouped together using Cisco Stackwise.

show tech-support

This command is useful for collecting a large amount of information about the device for troubleshooting purposes. It executes multiple show commands which can be provided to technical support representatives when reporting a problem.

Some of these show commands require privileged EXEC mode access. As a security feature, the Cisco IOS software separates management access into two privilege levels:

  • User EXEC mode: This is privilege level 1. It provides access to limited commands useful to a technician when verifying the basic operation of a device.

  • Privileged EXEC mode: This is privilege level 15. It is the highest level available and should only be accessible by a network administrator. In this mode, all device commands are available including the ability to configure or change the configuration settings on the device.

The Cisco IOS also provides command syntax check and context-sensitive help. If you enter a command incorrectly, the IOS will identify where you made an entry error. Context-sensitive help enables the user to quickly find answers to these questions:

  • Which commands are available in each command mode?
  • Which commands start with specific characters or group of characters?
  • Which arguments and keywords are available to particular commands?

To access context-sensitive help, simply enter a question mark (?) while typing in a command.

Cisco IOS also does not require the entire command, argument, or keyword to be entered. The partial command entry must just be long enough to uniquely identify the full command. To be sure the proper command is being entered, the tab key can also be used to complete the partial entry of a command, argument, or keyword.
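The "long enough to uniquely identify" rule can be illustrated with a short prefix-matching sketch. This is not how IOS is implemented. It only models the rule, and the keyword list below is a small illustrative subset chosen for the example.

```python
def expand_abbreviation(partial, keywords):
    """Return the single keyword that partial identifies.

    Raises ValueError when partial matches no keyword or more than one.
    """
    if partial in keywords:
        return partial                      # exact entry always wins
    matches = [k for k in keywords if k.startswith(partial)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown command: {partial!r}")
    raise ValueError(f"ambiguous command: {partial!r} matches {sorted(matches)}")


# Illustrative subset of keywords, not a complete command list.
KEYWORDS = ["configure", "copy", "clear", "ping", "show"]

print(expand_abbreviation("sh", KEYWORDS))     # unique prefix
print(expand_abbreviation("conf", KEYWORDS))   # unique prefix
# expand_abbreviation("co", KEYWORDS) raises ValueError: "co" is ambiguous
```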

--

Bandwidth and throughput are two terms that are commonly used when describing the amount of traffic flowing between two devices. 

Bandwidth is the theoretical amount of data that can be transmitted from one device to another in an amount of time. Bandwidth is typically measured in the number of bits per second.

Throughput is the measurement of the actual number of bits per second that are being transmitted across the media. Throughput is always lower than the specified bandwidth because traffic can encounter latency or delay during transmission.
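The relationship between the two terms can be made concrete with a little arithmetic. The sketch below computes throughput from the amount of data actually transferred and the time it took, then compares it with the link bandwidth. The function names and the figures (100 MB in 10 seconds on a 100 Mbps link) are made up for the example.

```python
def throughput_bps(bytes_transferred, seconds):
    """Measured throughput in bits per second."""
    return bytes_transferred * 8 / seconds


def utilization(throughput, bandwidth_bps):
    """Fraction of the nominal bandwidth that was actually achieved."""
    return throughput / bandwidth_bps


# Hypothetical transfer: 100,000,000 bytes in 10 seconds on a 100 Mbps link.
measured = throughput_bps(100_000_000, 10)
print(measured)                              # bits per second
print(utilization(measured, 100_000_000))    # fraction of bandwidth used
```

Here the measured throughput works out to 80 Mbps, or 80 percent of the nominal bandwidth.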

Latency may be caused by any number of factors, including the physical distance between the source and destination. Another factor is the number of network devices encountered between source and destination. As data crosses multiple networks, it must be processed and forwarded by switches and routers.
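A back-of-the-envelope estimate shows how distance and hop count both contribute to latency. The sketch below assumes a signal speed of roughly 200,000 km/s (about two-thirds of the speed of light, a common approximation for fiber) and a fixed per-hop processing delay. Both values, and the function names, are assumptions for illustration rather than measured figures.

```python
def propagation_delay_ms(distance_km, speed_km_per_s=200_000):
    """One-way propagation delay in milliseconds.

    speed_km_per_s defaults to an approximate signal speed in fiber.
    """
    return distance_km * 1000 / speed_km_per_s


def path_latency_ms(distance_km, hops, per_hop_ms=0.1):
    """Propagation delay plus an assumed fixed delay at each device."""
    return propagation_delay_ms(distance_km) + hops * per_hop_ms


print(propagation_delay_ms(1000))        # 1000 km of fiber, one way
print(path_latency_ms(1000, hops=10))    # same distance through 10 devices
```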

A technician may need to verify the throughput of a link to verify its operation. There are many sites on the internet that we can use to do so. These sites typically use preselected servers and report both your downloading and uploading "speeds".

iPerf is a downloadable tool, available for Windows and other operating systems, that measures throughput between a client and a server. iPerf must be running on both end devices.


Troubleshoot Connectivity Remotely

Remote desktop applications introduce potential security vulnerabilities because they offer complete control of a computer to someone other than the authorized user. For example, threat actors could exploit open remote desktop application ports or use social engineering techniques to trick a user into providing them with remote desktop access. It is important that users understand that only authorized support technicians should be granted remote access to systems.

RDP servers and clients are included with Windows, and are available for OS X, Linux, and Unix via xrdp, which is a free and open-source implementation of the Microsoft RDP server. Other operating systems can also perform these functions. For example, in macOS, remote access functionality is provided by the Screen Sharing feature, which is based on Virtual Network Computing (VNC). Any VNC client can connect to a Screen Sharing server. VNC is a freeware product that is similar in functionality to RDP and works over port 5900.

--

A site-to-site VPN is created when VPN terminating devices, also called VPN gateways, are preconfigured with information to establish a secure tunnel. VPN traffic is only encrypted between these devices. Internal hosts have no knowledge that a VPN is being used.

A remote-access VPN is dynamically created to establish a secure connection between a client and a VPN terminating device. For example, a remote access SSL VPN is used when you check your banking information online.

Remote-access users must install a VPN client on their computers to form a secure connection with the corporate private network. Special routers can also be used to connect computers to the corporate private network. The VPN software encrypts data before sending it over the internet to the VPN gateway at the corporate private network. VPN gateways establish, manage, and control VPN connections, also known as VPN tunnels. Windows supports several VPN types; however, for some VPNs, third-party software may be required.

Using VPNs to access remote and cloud-based virtual computer workstations ensures greater security when this solution is in use. Microsoft Azure and Amazon Web Services provide remote workspace solutions. IT support personnel will be required to help workers access and operate these virtual resources.

--

Network management refers to two related concepts. First is the process of configuring, monitoring, and managing the performance of a network. Second is the platform that IT and network operations teams use to complete these tasks. Modern NMS platforms provide advanced analytics, machine learning, and intelligent automation to continually optimize network performance. As organizations adapt to a more distributed workforce, these network management systems are increasingly deployed in cloud and hosted environments.

NMS collect data from connected network devices such as switches, routers, APs, and client devices. They also give network administrators control over how those devices operate and interact with one another. The data captured from these devices is used to proactively identify performance issues, monitor security and segmentation, and accelerate troubleshooting. NMS typically use SNMP and Remote Network Monitoring (RMON) to gather information from network devices. Host OSs have management platforms that allow monitoring and configuration of many host computers.
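One typical use of the data an NMS gathers is turning interface counters into link utilization. The sketch below works on two readings of an octet counter (such as the IF-MIB ifInOctets object) taken a known interval apart, and allows for a 32-bit counter wrapping around between polls. It does not perform any SNMP queries itself. The function names and sample values are hypothetical.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two counter readings, allowing one wraparound."""
    if current >= previous:
        return current - previous
    return current + 2 ** bits - previous


def link_utilization(prev_octets, curr_octets, interval_s, if_speed_bps, bits=32):
    """Fraction of link capacity used between two polls.

    prev_octets / curr_octets: octet counter readings from consecutive polls.
    interval_s: seconds between the polls.
    if_speed_bps: interface speed in bits per second.
    """
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return delta_bits / (interval_s * if_speed_bps)


# Hypothetical readings taken 300 seconds apart on a 100 Mbps interface.
print(link_utilization(1_000_000, 38_500_000, 300, 100_000_000))
```

With these sample readings the interface carried 37,500,000 octets in 300 seconds, which is 1 percent of a 100 Mbps link.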

NMS are deployed using two operational models:

Cloud-based

  • Designed to provide flexibility and wide-ranging access to networks that are geographically dispersed.
  • Easy access and monitoring across highly distributed networks and simple provisioning of remote sites.
  • High level of configurability and customization through open APIs and robust application ecosystems.
  • These platforms also support advanced analytics, automation, and optimization use cases, through large data lakes and the power of cloud computing to support sophisticated ML applications.

On-prem

  • Can be used for large CANs that require greater performance and scalability.
  • Also provide advanced features such as assurance and AI/ML.
  • Some organizations must keep close control over their data assets and are prohibited from housing data in dispersed locations.
  • On-prem NMS avoid such compliance issues because all data can be stored onsite.
  • Because large networks can generate a lot of management data, on-prem systems are usually larger servers that have enough power to process the data so that it can be used to provide the insights IT needs to manage the network.
  • This is one reason an on-prem server is usually located in the core of the network.
  • Although it can be accessed from the internet, remote access requires a VPN connection.

Cisco Meraki is a leading cloud-based network management platform that provides powerful network management capabilities without consuming user bandwidth. It is secure, flexible, and easy to deploy. With it, networks can be managed from anywhere. It can manage a diverse range of both Meraki and non-Meraki network devices securely. It provides detailed views of large, dispersed, and complex networks down to the individual desktop computer or phone.

--

Large and complex networks are extremely difficult and time-consuming to manage. Managing them is labor-intensive and requires many highly trained personnel. A single organization could have thousands of network devices in hundreds of locations. It is clearly not practicable to manually monitor and configure this large number of devices.

Automation involves creating systems that operate themselves. Network automation is the process of automating the configuring, managing, testing, deploying, and operating of physical and virtual devices within a network. By automating everyday network tasks, functions, and repetitive processes, network service availability and operational efficiency improve.

Any type of network can use network automation. Hardware- and software-based solutions enable data centers, service providers, and enterprises to implement network automation to improve efficiency, reduce human error, and lower operating expenses. 

The operation of network devices can be monitored and controlled via software. Network controllers or other devices provide access to external software processes that can automatically change how a network device functions. Frequently this is done with automation scripts that enable programming the network to behave in specified ways according to network and external conditions. Network automation also enables the configuration and management of large numbers of devices simultaneously.

A scripting language such as Python can be used to create programs that automate network management processes, thus creating management and operational efficiencies while saving on the costs associated with manual network management.
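As a small taste of this kind of scripting, the sketch below generates a basic configuration for several devices from one template, using only the Python standard library. The device inventory, addresses, and the function name render_configs are invented for the example, and the script only produces text; pushing configurations to real devices would require additional tooling that is not shown here.

```python
from string import Template

# Template for a minimal switch management configuration.
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface Vlan1\n"
    " ip address $ip $mask\n"
    " no shutdown\n"
)

# Hypothetical inventory; in practice this might come from a file or database.
DEVICES = [
    {"hostname": "SW1", "ip": "192.168.10.2", "mask": "255.255.255.0"},
    {"hostname": "SW2", "ip": "192.168.10.3", "mask": "255.255.255.0"},
]


def render_configs(devices):
    """Return a mapping of hostname to rendered configuration text."""
    return {d["hostname"]: CONFIG_TEMPLATE.substitute(d) for d in devices}


for name, config in render_configs(DEVICES).items():
    print(f"--- {name} ---")
    print(config)
```

Generating every configuration from one template is one way automation reduces the human error that comes with typing the same commands on each device by hand.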