The Internet is made up of many separate routing domains called Autonomous Systems (ASs), each of which runs an Interior Gateway Protocol (IGP) such as IS-IS or OSPF. The IGP handles routes to destinations within the AS, but does not calculate routes beyond the AS boundary. IGP engineering (also called traffic engineering or IGP optimization) is the tuning of local IS-IS or OSPF metrics to improve performance within the AS. Today, IGP engineering is an ad-hoc process in which metric tuning is performed by each AS in isolation. That is, each AS optimizes paths within its local network for traffic traversing it, without coordinating these changes with neighboring ASs. The primary assumption behind this practice is that there is sufficient separation between intra-domain and inter-domain routing.
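To make the role of metrics concrete, the following sketch computes shortest paths over a small hypothetical four-router topology; the topology, metric values and shortest_path helper are illustrative, not drawn from any operational network. Re-weighting a single link is enough to move traffic onto a different path, which is precisely the lever IGP engineering pulls.

# A minimal sketch of how IGP metric tuning changes intra-domain paths.
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over per-link IGP metrics; returns (cost, path)."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, metric in graph.get(node, {}).items():
            if nbr not in visited:
                heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))
    return float("inf"), []

# Hypothetical four-router AS; the values are the tunable IGP link metrics.
graph = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "D": 10},
    "C": {"A": 5, "D": 25},
    "D": {"B": 10, "C": 25},
}
print(shortest_path(graph, "A", "D"))   # (20, ['A', 'B', 'D'])
# Raising the B-D metric to 40 would shift A-to-D traffic onto A-C-D.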
Beyond the AS boundary, the choice of AS hops is determined by the Border Gateway Protocol (BGP); BGP engineering is a less developed and less understood process than IGP engineering. In addition to whether there is a physical link between two ASs over which routes and traffic can flow, several BGP policies determine which inter-domain paths are exposed to a neighboring AS. Business peering policies can directly translate into which routes are exported to each AS. After all these policies are applied, the remaining feasible paths are subjected to the “hot-potato” routing policy. Hot-potato routing applies when there are multiple egress points through which a destination can be reached: the router selects the egress point with the smallest IGP distance, handing traffic off to the neighboring AS as early as possible.
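The hot-potato tie-break itself is simple to state. The sketch below, using hypothetical egress names and IGP distances, picks the egress closest to the ingress router in IGP terms; note how a small change to an IGP metric could silently shift which egress wins.

# A minimal sketch of the hot-potato tie-break, assuming hypothetical
# IGP distances from one ingress router to each candidate egress point.
igp_distance = {"egress_nyc": 120, "egress_chi": 45, "egress_sf": 300}

def hot_potato_egress(candidates, igp_distance):
    """Among equally good BGP routes, pick the IGP-closest egress."""
    return min(candidates, key=lambda egress: igp_distance[egress])

print(hot_potato_egress(["egress_nyc", "egress_chi", "egress_sf"],
                        igp_distance))
# -> 'egress_chi': traffic leaves the AS as early (cheaply) as possible.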
In order to compile flow statistics, each router maintains a table of records indexed by flow key, e.g. the 5-tuple of the flow. A flow is said to be active at a given time if there exists a record for its key. When a packet arrives at the router, the router determines whether a flow is active for that key. If not, it instantiates a new record for that key. The statistics for the flow are then updated for the packet, typically including counters for packets and bytes and the arrival times of the first and most recent packet of the flow. Because the router has no knowledge of application-level flow structure, it must terminate the flow according to some criterion. The most commonly used criteria are the following: (i) inter-packet timeout, e.g. the time since the last packet observed for the flow exceeds some threshold; (ii) protocol syntax, e.g. observation of a FIN or RST packet in a TCP flow; (iii) aging, e.g. flows are terminated after a given elapsed time since the arrival of the first packet of the flow; (iv) memory management, e.g. flows may be terminated at any point in time to release memory. When a flow is terminated, its statistics are flushed for export and the associated memory is released for use by new flows.
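The following sketch illustrates this bookkeeping for criterion (i), the inter-packet timeout; the 60-second threshold, record layout and export function are illustrative assumptions, not a description of any particular flow monitor.

import time

TIMEOUT = 60.0        # illustrative inactivity threshold (criterion (i))
flows = {}            # flow key -> statistics record

def export(key, rec):
    """Flush a terminated flow's statistics toward the collector."""
    print("export", key, rec)

def update(key, pkt_bytes, now=None):
    """Create or update the record for the flow this packet belongs to."""
    now = time.time() if now is None else now
    rec = flows.get(key)
    if rec is not None and now - rec["last"] > TIMEOUT:
        export(key, rec)          # flow timed out: flush, then start afresh
        rec = None
    if rec is None:
        rec = {"packets": 0, "bytes": 0, "first": now, "last": now}
        flows[key] = rec
    rec["packets"] += 1
    rec["bytes"] += pkt_bytes
    rec["last"] = now

# Two packets of one TCP flow, keyed by its 5-tuple.
key = ("10.0.0.1", "10.0.0.2", 1234, 80, "TCP")
update(key, 1500)
update(key, 40)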
Before embarking on the exploration of techniques to assist operators in the management and design of IP networks, this chapter lays the groundwork by defining the terms and concepts that will be used in the rest of the book. We describe the Internet architecture and the elements and protocols guiding its behavior. We then outline issues associated with the design, management, optimization and security of such a complex infrastructure, topics that will be the focal points in the following chapters.
What is the Internet?
The Internet is a diverse collection of independent networks, interlinked to provide its users with the appearance of a single, uniform network. Two factors shield the user from the complex realities that lie behind the illusion of seamlessness: (i) the use of a standard set of protocols to communicate across networks and (ii) the efforts of the companies and organizations that operate the Internet's different networks to keep its elements interconnected.
The networks that comprise the Internet share a common architecture (how the components of the networks inter-relate) and software protocols (standards governing the exchange of data), which enable communication within and among the constituent networks. The nature of these two abstract elements – architecture and protocols – is driven by the set of fundamental design principles adopted by the early builders of the Internet. It is important to distinguish between the public Internet and the Internet's core technology (standard protocols and routers), which are frequently called “IP technology.”
Classifying traffic flows according to the application that generates them is an important task for (a) effective network planning and design and (b) monitoring the trends of the applications in operational networks. However, an accurate method that can reliably identify the generating application of a flow is still to be developed. In this chapter and the next, we look into the problem of traffic classification; the ultimate goal is to provide network operators with algorithms that will provide a meaningful classification per application, and, if this is infeasible, with useful insight into the traffic behavior. The latter may facilitate the detection of abnormalities in the traffic, malicious behavior or the identification of novel applications.
State of the art and context
Currently, application classification practices rely to a large extent on the use of transport-layer port numbers. While this practice may have been effective in the early days of the Internet, port numbers currently provide limited information. Often, applications and users are not cooperative and, intentionally or not, use inconsistent ports. Thus, “reliable” traffic classification requires packet-payload examination, which is rarely an option due to (a) hardware and complexity limitations, (b) privacy and legal issues and (c) payload encryption by the applications.
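In its simplest form, port-based classification is nothing more than a table lookup, as the sketch below shows; the port map is a small illustrative subset of the IANA well-known ports, and the final example shows how a flow on an unregistered (or deliberately misleading) port defeats it.

# A minimal sketch of port-based classification; the map is illustrative.
PORT_MAP = {80: "HTTP", 443: "HTTPS", 25: "SMTP", 53: "DNS", 22: "SSH"}

def classify_by_port(src_port, dst_port):
    """Label a flow by the first well-known port seen, else 'unknown'."""
    return PORT_MAP.get(dst_port) or PORT_MAP.get(src_port) or "unknown"

print(classify_by_port(51324, 443))   # 'HTTPS'
print(classify_by_port(51324, 6881))  # 'unknown' -- or worse, a peer-to-peer
                                      # application deliberately using port 80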
Taking into account empirical application trends and the increasing use of encryption, we conjecture that traffic classifiers of the future will need to classify traffic “in the dark.”
As networks continue to grow rapidly in size and complexity, it has become increasingly clear that their evolution is closely tied to a detailed understanding of network traffic. Large IP networks are designed with the goal of providing high availability and low delay/loss while keeping operational complexity and cost low. Meeting these goals is a highly challenging task and can only be achieved through a detailed knowledge of the network and its dynamics.
No matter how surprising this may seem, IP network management today is primarily reactive in nature and relies on trial and error when problems arise. Network operators have limited visibility into the traffic that flows on top of their network, the operational state of the network elements and the behavior of the protocols responsible for the routing of traffic and the reliable transmission of packets from end to end. Furthermore, design and planning decisions only partially rely on actual usage patterns. There are several reasons for this state of affairs.
First, the designers of IP networks have traditionally attached less importance to network monitoring and resource accounting than to issues such as distributed management, robustness to failures and support for diverse services and protocols. Thus, IP network elements (routers and end hosts) have not been designed to retain detailed information about the traffic flowing through them, and IP protocols typically do not provide detailed information about the state of the underlying network.
The traffic matrix (TM) of a telecommunications network measures the total amount of traffic entering the network from any ingress point and destined to any egress point. The knowledge captured in the TM constitutes an essential input for optimal network design, traffic engineering and capacity planning. Despite its importance, however, the TM for an IP network is a quantity that has remained elusive to capture via direct measurement. The reasons for this are multiple. First, the computation of the TM requires the collection of flow statistics across the entire edge of the network, which may not be supported by all the network elements. Second, these statistics need to be shipped to a central location for appropriate processing. The shipping costs, coupled with the frequency with which such data would be shipped, translate to communications overhead, while the processing cost at the central location translates to computational overhead. Lastly, given the granularity at which flow statistics are collected with today's technology on a router, the construction of the TM requires explicit information on the state of the routing protocols, as well as the configuration of the network elements. The storage overhead at the central location thus includes routing state and configuration information. It has been widely believed that these overheads would be so significant as to render the computation of backbone TMs through measurement alone infeasible with today's flow monitors.
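Assembling a TM from flow statistics is conceptually a join between flow records and routing state, as the sketch below illustrates; the PoP names, flow records and prefix-to-egress map are all hypothetical, and in practice the egress mapping must itself be derived from BGP/IGP state and router configurations.

# A minimal sketch of building a TM from edge flow records, assuming the
# routing state has been resolved into a prefix -> egress-PoP map.
from collections import defaultdict

# Hypothetical flow records collected at each ingress:
# (ingress PoP, destination prefix, bytes)
flow_records = [
    ("PoP_A", "192.0.2.0/24", 4_000_000),
    ("PoP_A", "198.51.100.0/24", 1_500_000),
    ("PoP_B", "192.0.2.0/24", 2_200_000),
]

# Hypothetical routing state: egress PoP for each destination prefix.
prefix_to_egress = {"192.0.2.0/24": "PoP_C", "198.51.100.0/24": "PoP_B"}

tm = defaultdict(int)                      # (ingress, egress) -> bytes
for ingress, prefix, nbytes in flow_records:
    tm[(ingress, prefix_to_egress[prefix])] += nbytes

for (src, dst), volume in sorted(tm.items()):
    print(f"{src} -> {dst}: {volume} bytes")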
The convergence of traditional network services to a common IP infrastructure has resulted in a major paradigm shift for many service providers. Service providers are looking for profitable ways to deliver value-added, bundled, or personalized IP services to a greater number of broadband users. As cable operators and Digital Subscriber Line (DSL) providers capitalize on IP networks, they need to create higher-margin, higher-value premium services, such as interactive gaming, Video-on-Demand (VoD), Voice-over-IP (VoIP) and broadband TV (IPTV). The missing element of the current strategy is service differentiation, i.e. the ability to understand at a granular level how subscribers are using the network, identify what applications or services are being consumed, and then intelligently apply network resources to the applications and subscribers that promise the highest return on investment. Operators need to manage and control subscriber traffic. This can be accomplished by introducing more intelligence into the network infrastructure, which enhances the transport network with application and subscriber awareness. Such unique visibility into the types of bits carried allows the network to identify, classify, guarantee performance and charge for services based on unique application and subscriber criteria. Instead of underwriting the expenses associated with random and unconstrained data capacity, deployment and consumption, this new wave of network intelligence allows operators to consider Quality-of-Service (QoS) constraints while enabling new possibilities for broadband service creation and new revenue-sharing opportunities. The same is true of third-party service providers, who may, in fact, be riding an operator's network undetected.
End-to-end packet delay is an important metric to measure in networks, from both the network operation and application performance points of view. An important component of this delay is the time for packets to traverse the different switching elements along the path. This is particularly important for network providers, who may have Service Level Agreements (SLAs) specifying allowable values of delay across the domains they control. A fundamental building block of the path delay experienced by packets in IP networks is the delay incurred when passing through a single IP router. In this chapter we give a detailed description of the operations performed on an IP packet when transiting an IP router, together with measurements of their respective times to completion, as collected on an operational high-end router. Our discussion focuses on the most commonly found router architecture, which is based on a cross-bar switch.
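As a first-order decomposition (our framing under common queueing assumptions, not necessarily the exact model used in the measurements below), the delay a packet of size $L$ experiences through a single router can be written as

\[
  d_{\text{router}} = d_{\text{proc}} + d_{\text{queue}} + \frac{L}{C},
\]

where $d_{\text{proc}}$ covers address lookup and switching through the cross-bar, $d_{\text{queue}}$ is the time spent waiting in the output buffer, and $L/C$ is the transmission (serialization) time onto an output link of capacity $C$. Congestion manifests almost entirely in the $d_{\text{queue}}$ term.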
To quantify the individual components of through-router delay, we present results obtained through a unique set of measurements that captures all packets transmitted on all links of an operational access router for a duration of 13 hours. Using this data set, this chapter studies the behavior of those router links that experienced congestion and reports on the magnitude and temporal structure of the resulting packet delays. Such an analysis reveals that cases of overload in operational IP links in the core of an IP network do exist, but tend to be of small magnitude and low frequency.
This appendix lists the payload bit-strings used by a payload classifier.
Numbers in parentheses denote the beginning byte in the payload where each string is found; if there is no number, the string is found at the beginning of the payload. Note that “plen” denotes the size of the payload; \x denotes hex; && denotes AND; ∥ denotes OR; and “plen - 2 = (1)” denotes that the payload length minus 2 is given by the first byte of the payload.
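As an illustration of how such signatures are applied, the sketch below matches strings at given byte offsets; the two rules shown are illustrative stand-ins rather than entries from the table, and we read “(1)” as the first payload byte.

# Illustrative payload-signature matcher; the rules are stand-ins, not
# entries from this appendix's table.
def matches(payload: bytes, signature: bytes, offset: int = 0) -> bool:
    """True if the bit-string occurs at the given byte offset ('(n)')."""
    return payload[offset:offset + len(signature)] == signature

def classify(payload: bytes) -> str:
    plen = len(payload)                             # 'plen' in the notation
    if matches(payload, b"GET "):                   # string at payload start
        return "HTTP"
    if plen >= 1 and payload[0] == plen - 2:        # a 'plen - 2 = (1)' rule
        return "example-protocol"                   # hypothetical label
    return "unknown"

print(classify(b"GET /index.html HTTP/1.1\r\n"))    # -> 'HTTP'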
Open, any-to-any connectivity is clearly one of the fundamentally great properties of the Internet. Unfortunately, the openness of the Internet also enables an expanding and ever-evolving array of malicious activity. During the early 1990s, when malicious attacks first emerged on the Internet, only a few systems at a time were typically compromised, and those systems were rarely used to continue or broaden the attack activity. At first, the attackers were seemingly motivated simply by the sport of it all. But then, as would seem to be the natural order of things, the miscreants were seized by the profit motive. Today, network infrastructure and end systems are constantly attacked with an increased level of sophistication and virulence.
In this chapter, we discuss two of the most dangerous threats known to the Internet community: Denial of Service (DoS) attacks and computer worms; in what follows we refer to them simply as DoS and worms. These two families of threats differ in goals, forms and effects from most of the attacks launched at networks and computers. Most attackers involved in cyber-crime seek to break into a system, extract its secrets, or fool it into providing a service without the appropriate authorization. Attackers commonly try to steal credit card numbers or proprietary information, gain control of machines to install their software or save their data, deface Web pages, or alter important content on victim machines.