Design and Implementation of Network Monitoring and Scheduling Architecture Based on P4

Junjie Geng, Jinyao Yan*, Yangbiao Ren, Yuan Zhang
Communication University of China, Beijing, China

ABSTRACT
Network monitoring is an important part of network operation. Administrators use monitoring data to learn about the network's operating status, user behavior, security status, and traffic trends. Although many traffic monitoring technologies have been developed, software-defined networking (SDN) makes traffic monitoring more convenient and makes it easier to introduce new functionality. However, most existing methods monitor network status through extra probe packets, polling, and similar mechanisms, which makes network monitoring costly. In this work, we propose a network monitoring and scheduling architecture based on P4 that monitors and visualizes network state information. We evaluate the proposed scheme based on INT. Preliminary results show that our scheduling method avoids congestion in the experimental settings.

CCS CONCEPTS
• Networks → Network architectures

KEYWORDS
Network status information, INT, Traffic scheduling, P4

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CSAE '18, October 22–24, 2018, Hohhot, China © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6512-3/18/10…$15.00 https://doi.org/10.1145/3207677.3278059

1 INTRODUCTION
A variety of technologies have been developed to monitor network traffic, including SNMP/RMON [1], NetFlow/sFlow [2], protocol analyzers, and network traffic probes. Each has its own shortcomings. SNMP/RMON offer only simple features and collect little information, so detailed and historical analysis of flows and packets is not possible. NetFlow/sFlow consume a large share of network device resources and significantly degrade device forwarding performance. Traffic probes require deploying many new devices in the network. Protocol analyzers capture only a small amount of data and cannot support long-term monitoring or historical analysis. Moreover, almost every monitoring technology in traditional networks requires a separate hardware installation or software configuration, which makes deployment tedious and expensive.

Because of the tight coupling of the traditional network architecture, no substantially better traffic monitoring technology was proposed for a long time, until the emergence of software-defined networking (SDN). In the SDN architecture, the control and data planes are decoupled. Network administrators can use the controller to manage the network control plane and program it to deploy new network functions. The authors of [3] presented OpenTM, a traffic matrix estimation system for OpenFlow networks that uses built-in features of OpenFlow switches to measure the traffic matrix directly and accurately with low overhead.
The authors of [4] showed that OpenFlow allows the construction of a network monitoring solution adapted to specific network requirements, and demonstrated how to use OpenFlow functions to obtain traffic statistics from network devices. The authors of [5] designed a load-balancing scheme based on OpenFlow traffic statistics. These works show that research on network traffic monitoring has become very active since the emergence of SDN. However, new problems have arisen. Paper [6] pointed out the limitations of OpenFlow wildcard rules, and paper [7] pointed out that the controller must frequently collect flow statistics measured on data-plane switches for applications such as traffic engineering, flow rerouting, and attack detection. Existing statistics-collection solutions increase the bandwidth cost of the control channel and the processing delay of the switches, which seriously interferes with basic functions such as packet forwarding and route updates. Paper [8] developed a system architecture that judiciously combines software packet inspection with hardware flow-table counters to identify and monitor heavy flows.

In short, SDN offers a new approach to network monitoring and management, but it still falls short of what is needed: most existing SDN-based methods monitor network status through extra probe packets, polling, and similar mechanisms, which makes monitoring costly. P4 [9,10] offers a way to address this overhead. P4 is a language for programming the data plane: a P4 program tells forwarding devices (such as switches, network cards, firewalls, and filters) how to process packets. Inband Network Telemetry (INT) [11] is a framework designed to collect and report data-plane state without requiring intervention or work by the control plane.
In this paper, we design and implement a network monitoring and traffic scheduling architecture based on P4. Our P4 monitoring and scheduling system provides functions such as network status detection and localization, and visualization of network status information. Furthermore, we propose and implement a traffic scheduling scheme based on network status information and the P4 routing architecture. Experiments verify the basic functions of the proposed architecture.

2 P4 AND INT

2.1 P4 Language
The P4 language was proposed by a group of universities and companies in 2014 and is described in detail in [10]. It is a high-level language for programming protocol-independent packet processors. With P4, network developers can directly define the format of the packets that network devices process. Because this does not require the participation of vendors, the network configuration cycle is greatly shortened. The P4 abstract forwarding model is shown in Fig. 1. Programmers use P4 to declare how packets are processed, and the compiler generates a JSON file that configures a programmable switch chip. By programming in P4, programmers can define a network device as a top-of-rack switch, a firewall, or a load balancer. In this paper, we use P4 to implement layer-3 forwarding and INT functions.

Figure 1: Abstract forwarding model.

2.2 Inband Network Telemetry
Inband Network Telemetry (INT) is a framework designed to collect and report data-plane state without requiring intervention or work by the control plane. It is a powerful new network-diagnostics mechanism implemented in P4. In this framework, transmitted packets carry an INT header containing telemetry instructions, and each network device inserts its own state information into the packet according to those instructions as it processes the packet. The INT framework includes three kinds of devices: INT Source, INT Transit, and INT Sink [11]. The INT Source constructs the INT header, including the telemetry instructions. An INT Transit is an INT-capable device along the path that inserts its own state information according to the instructions. The INT Sink retrieves the collected information and sends it to the administrator's monitor. The INT framework is shown schematically in Fig. 2, in which the Monitor/Controller receives the network status information provided by the INT Sink.

Figure 2: The schematic diagram of the INT framework (elements: INT Source, P4 switches (INT Transit), INT Sink, runtimeCLI, Monitor/Controller).

3 NETWORK MONITORING AND SCHEDULING ARCHITECTURE
We designed a network monitoring architecture based on P4 that integrates network status monitoring and network management. As shown in Fig. 3, functions such as network fault localization, network status visualization, routing protocol configuration and optimization, and traffic scheduling can be implemented easily with this architecture. The architecture mainly consists of a custom status monitoring module and a network management module. The implementation and function of each module are as follows.

Software defined monitoring module: This module obtains the status information of each switch in the data plane through the P4 INT architecture, including switch id, ingress port id, hop latency, and queue length. More network status information can be obtained by extending the INT function with the metadata provided by the P4 switch. With this module, we implement network fault monitoring and localization, and visualization of network status information.
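To make the Source/Transit/Sink roles of Section 2.2 concrete, the following is a minimal Python model of how an INT-tagged packet accumulates per-hop state. It is a sketch under stated assumptions, not the authors' P4 code and not the INT wire format: the 4-bit instruction bitmap, the field names in INST_FIELDS, the bos flag, and the switch ids are all hypothetical, chosen only to mirror the tables described in Section 3.1 (instruction matching, the first-hop check in int_bos, and the max_hop_cnt check).

```python
from dataclasses import dataclass, field

# Assumed meaning of the four upper instruction bits (bit 3 = most significant).
# Illustrative only; the INT specification defines the real bitmap and format.
INST_FIELDS = ("switch_id", "ingress_port", "hop_latency", "queue_occupancy")


@dataclass
class IntPacket:
    instructions: int                 # 4-bit telemetry instruction bitmap
    max_hop_cnt: int                  # hop budget set by the INT Source
    total_hop_cnt: int = 0            # hops that have inserted metadata so far
    metadata: list = field(default_factory=list)  # stack, newest hop first


def int_source(instructions: int, max_hop_cnt: int) -> IntPacket:
    """INT Source: attach an INT header carrying the telemetry instructions."""
    return IntPacket(instructions=instructions, max_hop_cnt=max_hop_cnt)


def int_transit(pkt: IntPacket, switch_state: dict) -> IntPacket:
    """INT Transit: push the requested fields unless the hop budget is used up."""
    if pkt.total_hop_cnt >= pkt.max_hop_cnt:
        return pkt  # forward without inserting more metadata
    # Mark the first hop's record as the bottom of the stack (cf. table int_bos).
    record = {"bos": pkt.total_hop_cnt == 0}
    for i, name in enumerate(INST_FIELDS):
        if pkt.instructions & (1 << (len(INST_FIELDS) - 1 - i)):
            record[name] = switch_state[name]
    pkt.metadata.insert(0, record)  # newest hop goes on top of the stack
    pkt.total_hop_cnt += 1
    return pkt


def int_sink(pkt: IntPacket) -> list:
    """INT Sink: strip the stack and return per-hop records in path order."""
    report = list(reversed(pkt.metadata))
    pkt.metadata = []
    return report


if __name__ == "__main__":
    # Hypothetical switch states along host1 -> leaf1 -> spine1 -> leaf2 -> host3.
    path = [
        {"switch_id": 1, "ingress_port": 1, "hop_latency": 120, "queue_occupancy": 3},
        {"switch_id": 3, "ingress_port": 1, "hop_latency": 950, "queue_occupancy": 40},
        {"switch_id": 2, "ingress_port": 2, "hop_latency": 80, "queue_occupancy": 1},
    ]
    pkt = int_source(instructions=0b1011, max_hop_cnt=8)  # ingress_port not requested
    for state in path:
        int_transit(pkt, state)
    for hop in int_sink(pkt):
        print(hop)
```

Keeping the metadata as a stack (newest hop first) and reversing it at the sink reflects the idea that each transit only prepends its own record; the monitor then sees hops in path order, which is what a path visualization needs.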
Figure 3: Network monitoring and scheduling architecture.

The network management module includes three submodules: forwarding logic design, traffic scheduling, and customized network management tools. Their functions are as follows. The forwarding logic design submodule delivers forwarding tables that implement the forwarding logic we designed. The traffic scheduling submodule uses the network status information provided by the monitoring module to drive a traffic scheduling mechanism; the P4 program is compiled by the P4 compiler into a JSON configuration file that directly configures the P4 switch to implement traffic scheduling. For the customized network management tools submodule, network developers can build tools such as visualization and troubleshooting through open interfaces.

3.1 Software Defined Monitoring Module
We implemented the software defined monitoring module on top of the INT framework; we first describe how INT is implemented in this scenario. The main INT tables are int_insert, int_inst_0003, int_inst_0407, int_bos, and int_meta_header_update, matched in the order shown in Fig. 4. First, the switch checks for an INT header in table int_insert. If the header exists, the action int_transit is executed. After int_transit completes, the switch determines whether it should insert new INT metadata; if so, it continues to match the following tables. The upper four bits of the telemetry instruction (bits 0-3) are matched in table int_inst_0003, and one of 16 corresponding actions is executed according to the result. Bits 4-7 of the telemetry instruction are then matched in table int_inst_0407, which likewise selects one of 16 actions. After matching the telemetry instructions, the switch determines in table int_bos whether it is the first hop; if it is, the "bos" field of the INT metadata is inserted into the packet first. Finally, the switch checks the maximum hop count in table int_meta_header_update.

Figure 4: Matching order of INT tables (Previous table → int_insert → check "max_hop_cnt = total_hop_cnt?" → int_inst_0003 → int_inst_0407 → int_bos → int_meta_header_update → Next table).

Through the INT framework, the INT Sink devices obtain the state information of the switches along the link, and the monitor collects and processes this information at the same time. We obtain the information defined in the INT framework, such as switch id, ingress port id, hop latency, and queue length. In addition, we modified the INT framework source code to obtain further status information, such as queue length change (DeqEnq), from the metadata provided by the P4 switch. Network developers can choose what to monitor by configuring it at the INT Source.

3.2 The Network Management Module
3.2.1 Forwarding logic design: In this scenario, we implemented a minimal L3 router by defining three match-action tables in the control flow: ipv4_lpm, forward, and send_frame. Their matching sequence is shown in Fig. 5.

Figure 5: Matching order of L3 routing tables (Previous table → ipv4_lpm → forward → send_frame → Next table).

Tables ipv4_lpm and forward are the two match-action tables in the ingress pipeline. First, table ipv4_lpm performs a longest-prefix match on the destination IP address and sets the next-hop address and the egress port via the set_nhop action. Table forward then performs an exact match on the next-hop address and modifies
the destination MAC address of the Ethernet frame via the set_dmac action. After tables ipv4_lpm and forward have been applied, the control flow enters the egress pipeline, where send_frame is the only match-action table: it performs an exact match on the egress port and rewrites the source MAC address via the rewrite_smac action. With these three match-action tables defined, we compile the P4 program into a JSON configuration file and load it into the P4 switch. Minimal L3 routing is then achieved by installing flow entries through the command-line interface.

3.2.2 The traffic scheduling submodule: This submodule determines the network operating status from the collected network status information. When the network is congested, it modifies the routing entries to schedule traffic. In this scenario, we determine the status of each network link from the monitoring information and send new match entries to the network devices through runtime_CLI to reschedule the network data flows.

3.2.3 Customized network management tools: Through open interfaces, network developers can build network management tools for tasks such as visualizing network status and troubleshooting.

4 EXPERIMENT DESIGN AND RESULTS

4.1 Network Status Monitoring and Scheduling Architecture Verification
We use Mininet to create the experiment topology, with bmv2 as the software switch. Mininet is a network emulation tool that uses lightweight, Linux-kernel-based virtualization to emulate a complete network of hosts, links, and switches on a single machine. Bmv2 is a P4 software switch that can be integrated into and built with Mininet. Note that the performance of the emulated devices may be limited by the performance of the local machine. We use a fat-tree-like data center topology as the experimental topology, shown in Fig. 6.

Figure 6: Network test topology (host1 and host2 attach to leaf1, host3 and host4 to leaf2; leaf1 and leaf2 each connect to spine1 and spine2; link labels: 4Mbps on the spine1 links, 5Mbps on the spine2 links, 9Mbps on the remaining links).

The four hosts (host1 to host4) act as INT Sources/Sinks; a VTEP running on each host encapsulates and decapsulates VXLAN-GPE headers. The switches act as INT Transit devices that insert INT metadata. We send a 4 Mbps UDP flow from host1 to host3 with iperf. There are two paths from host1 to host3: host1→leaf1→spine1→leaf2→host3 with 4 Mbps of available bandwidth, and host1→leaf1→spine2→leaf2→host3 with 5 Mbps of available bandwidth. Fig. 7 shows that network congestion occurs and is detected by the software defined monitoring module: both hop latency (left) and queue occupancy (right) are very high when packets traverse host1→leaf1→spine1→leaf2→host3, whose available bandwidth is 4 Mbps.

Figure 7: Network congestion.

The scheduling module in our architecture then moves the flow to the other path, which has 5 Mbps of available bandwidth. Fig. 8 shows that congestion is effectively avoided: both hop latency and queue occupancy are close to zero.

Figure 8: Congestion eliminated.

In addition, we visualize the packets' transmission path from the captured switch ids. As Fig. 9 shows, the path changes from host1→leaf1→spine1→leaf2→host3 to host1→leaf1→spine2→leaf2→host3 (from the red to the blue links in the spine layer). These experiments show that we obtain network state information (switch id, hop latency, and queue occupancy) without introducing additional probe packets, and control traffic scheduling in real time based on the network state.
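The scheduling step of Section 3.2.2, applied to the experiment above, can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' scheduler: the paper does not state its congestion criterion, so the threshold rule below (any hop exceeding a hop-latency or queue-occupancy limit) is our assumption, as are the threshold values, the path names, leaf1's port numbers, the entry handle, and the destination address. Only the two path bandwidths (4 and 5 Mbps) and the table and action names (ipv4_lpm, set_nhop) come from the text. The output string is formatted after bmv2 runtime_CLI's table_modify command (table name, action name, entry handle, action parameters); the set_nhop parameter order (next-hop address, then port) is assumed.

```python
# Hypothetical model of the Fig. 6 experiment: two leaf1 -> leaf2 paths.
# Bandwidths come from the text; ports, thresholds and handles are placeholders.
PATHS = {
    "via_spine1": {"available_mbps": 4, "leaf1_port": 2},
    "via_spine2": {"available_mbps": 5, "leaf1_port": 3},
}

HOP_LATENCY_LIMIT = 1000  # assumed threshold, same unit as the INT hop latency
QUEUE_LIMIT = 10          # assumed threshold on reported queue occupancy


def is_congested(int_report):
    """Return True if any hop in an INT Sink report exceeds either threshold."""
    return any(
        hop["hop_latency"] > HOP_LATENCY_LIMIT or hop["queue_occupancy"] > QUEUE_LIMIT
        for hop in int_report
    )


def choose_path(current):
    """Return the alternative path with the most available bandwidth,
    or None if no alternative offers more than the current path."""
    best = max(
        (name for name in PATHS if name != current),
        key=lambda name: PATHS[name]["available_mbps"],
        default=None,
    )
    if best is None or PATHS[best]["available_mbps"] <= PATHS[current]["available_mbps"]:
        return None
    return best


def reroute_command(entry_handle, nhop_ip, path):
    """Build a runtime_CLI-style command that rewrites leaf1's ipv4_lpm entry
    so that set_nhop sends the flow out of the port toward the chosen spine."""
    port = PATHS[path]["leaf1_port"]
    return f"table_modify ipv4_lpm set_nhop {entry_handle} {nhop_ip} {port}"


if __name__ == "__main__":
    report = [  # illustrative INT Sink output for host1 -> host3, not measured data
        {"switch_id": 1, "hop_latency": 2500, "queue_occupancy": 40},
        {"switch_id": 3, "hop_latency": 1800, "queue_occupancy": 25},
        {"switch_id": 2, "hop_latency": 30, "queue_occupancy": 0},
    ]
    current = "via_spine1"
    if is_congested(report):
        new_path = choose_path(current)
        if new_path is not None:
            print(reroute_command(0, "10.0.3.3", new_path))
```

Choosing the alternative with the largest available bandwidth matches the outcome reported above (the flow moves from the 4 Mbps spine1 path to the 5 Mbps spine2 path); a real controller would also need hysteresis so that flows do not oscillate between paths.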
Figure 9: The path before/after scheduling.

4.2 Performance Testing on bmv2 Switch
Our experiments were conducted in a Mininet emulation environment using the bmv2 software switch. After verifying the functions of the proposed network status monitoring and scheduling architecture, we tested the performance of the bmv2 switch as follows.

(1) We set the link bandwidth to 10M and used iperf to send UDP streams from h1 to h3 at 5M, 6M, and 7M, respectively. The path from h1 to h3 is h1→leaf1→spine1→leaf2→h3. As Fig. 10 shows, the packet loss rates at 5M, 6M, and 7M are 0%, 0.98%, and 18%, respectively: packet loss begins at 6M and becomes severe at 7M. Therefore, in this experimental environment, once the sending rate exceeds about 6M, the processing capability of bmv2 itself becomes the bottleneck, even though the link bandwidth is 10M.

Figure 10: UDP testing via iperf.

(2) We set the link bandwidth to 5M. The path from h1 to h3 is h1→leaf1→spine1→leaf2→h3, and the path from h2 to h4 is h2→leaf1→spine2→leaf2→h4. First, we run h1 ping h3 and h2 ping h4; as Fig. 11 (without iperf) shows, the delay is at a normal level. We then continue h1 ping h3 and h2 ping h4 while sending a 5M UDP stream from h1 to h3 with iperf; the observed ping delay is shown in Fig. 11 (with iperf). The results show congestion once the iperf flow from h1 to h3 starts, so the h1 ping h3 delay increases significantly. At the same time, however, the h2 ping h4 delay also increases significantly, even though the two paths share switches (leaf1 and leaf2) but no links. This effect on the h2-to-h4 path indicates a certain correlation between the processing performance of the leaf1 switch and the link bandwidth. We conclude that the bmv2 software switch has a performance bottleneck in the emulation environment.

Figure 11: Performance testing.

5 CONCLUSIONS AND FUTURE WORK
In this work, we proposed a network monitoring and scheduling architecture based on P4 that monitors network state information (switch id, hop latency, and queue occupancy) without introducing additional probe packets, visualizes this state information, and controls and schedules traffic in real time according to the network state. In the future, we will conduct experiments with realistic applications on hardware P4 switches.

ACKNOWLEDGMENTS
This work is partially supported by CUC GuangZhou Institute (Project No. 2014-10-05) and CERNET Innovation Project (Project No. NGII20170202).

REFERENCES
[1] Gerald A. Winters, Daniel A. Muntz, and Toby J. Teorey. 1998. Using RMON Matrix Group Extensions to Analyze Internetworking Problem. Journal of Network and Systems Management, 6(2), 179-196.
[2] B. Li, J. Springer, G. Bebis, and M. Hadi Gunes. 2013. A survey of network flow applications. Journal of Network and Computer Applications, 36(2), 567-581.
[3] A. Tootoonchian, M. Ghobadi, and Y. Ganjali. 2010. OpenTM: Traffic Matrix Estimator for OpenFlow Networks. International Conference on Passive and Active Measurement, 6032, 201-210.
[4] D. J. Hamad, K. G. Yalda, and I. T. Okumus. 2015. Getting traffic statistics from network devices in an SDN environment using OpenFlow. Information Technology and Systems.
[5] K. Kaur, S. Kaur, and V. Gupta. 2016. Flow Statistics Based Load Balancing in OpenFlow. International Conference on Advances in Computing, Communications and Informatics (ICACCI), Sept. 21-24, Jaipur, India.
[6] S. Shirali-Shahreza and Y. Ganjali. 2014. Traffic Statistics Collection with FleXam. ACM SIGCOMM.
44(4), 117-118.
[7] H. Xu, Z. Yu, C. Qian, X.-Y. Li, and Z. Liu. 2017. Minimizing Flow Statistics Collection Cost of SDN Using Wildcard Requests. IEEE INFOCOM - IEEE Conference on Computer Communications, 1-9.
[8] Sharat Chandra Madanapalli, Minzhao Lyu, Himal Kumar, Hassan Habibi Gharakheili, and Vijay Sivaraman. 2018. Real-time Detection, Isolation and Monitoring of Elephant Flows using Commodity SDN System. IEEE/IFIP Network Operations and Management Symposium, 2018.
[9] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. 2014. P4: Programming Protocol-Independent Packet Processors. ACM SIGCOMM Computer Communication Review, 44(3), 87-95.
[10] The P4 Language Consortium. 2016. The P4 Language Specification. https://www.p4.org.
[11] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit Parag, and L. J. Wobker. 2016. Inband Network Telemetry. https://www.p4.org.