Advanced Computer Networks EN.601.714

Administrivia

  • Instructor: Soudeh Ghorbani
  • Lecture time: Wednesday, 4:30-7pm
  • Location: Homewood Campus, Hodson 305
  • Office hours: by appointment
  • Paper discussions on HotCRP (email the instructor to be added to the site)

Course Description

This graduate-level course on computer networks provides an in-depth overview of advanced topics in network protocols and networked systems. The curriculum covers both seminal papers on Internet protocols and recent research findings. Students will explore a wide range of subjects, including large-scale networks for AI and ML, routing, congestion control, network architectures, datacenter networks, cloud infrastructures, and Internet access and equity. Emphasis will be placed on core networking concepts and principles essential for building large-scale AI networks and cloud computing infrastructures. The course includes paper discussions and a research project.

Required Course Background: One undergraduate course in computer networks (e.g., EN.601.414/614 Computer Network Fundamentals or the equivalent), or permission of the instructor. The course projects assume that students are comfortable with programming.

Acknowledgment: This course is based on UIUC CS538 and influenced by Princeton COS561.

Topics and Schedule


Grading Policy

The class is graded as follows:
  • Research project (50%)
    • Proposal (5%)
    • Project overview (5%)
    • Midterm presentation (10%)
    • Final presentation and report (30%)

  • Topic presentations (25%)
  • Paper reviews (15%)
  • Participation (10%)
The overall score will be converted to a letter grade. The minimum score needed for each grade will be at most the following values:
  • 93: A
  • 90: A-
  • 87: B+
  • 83: B
  • 80: B-
  • 77: C+
  • 73: C
  • 70: C-
  • 67: D+
  • 63: D
  • 60: D-
  • 0: F
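As a worked example of the grading arithmetic above, the following Python sketch computes the weighted overall score and maps it to a letter using the listed cutoffs. It is a hypothetical helper, not an official tool: it assumes each component is scored on a 0-100 scale, and the names WEIGHTS, CUTOFFS, overall_score, and letter_grade are illustrative. Because the actual cutoffs may end up lower than those listed, the letter it returns is a lower bound. It ignores the separate letter-grade deductions for missed reviews, missed lectures, or academic dishonesty.

```python
# Hypothetical grade calculator (illustration only).
# Weights (in percent) are taken from the grading policy; scores assumed 0-100.
WEIGHTS = {
    "proposal": 5,
    "project_overview": 5,
    "midterm_presentation": 10,
    "final_presentation_and_report": 30,
    "topic_presentations": 25,
    "paper_reviews": 15,
    "participation": 10,
}

# (minimum score, letter) pairs in descending order. The syllabus says the
# real cutoffs will be AT MOST these values, so this mapping is conservative.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
    (60, "D-"), (0, "F"),
]


def overall_score(scores: dict) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100  # weights must cover 100% of the grade
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score: float) -> str:
    """Return the highest letter whose listed cutoff the score meets."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "proposal": 100,
        "project_overview": 100,
        "midterm_presentation": 90,
        "final_presentation_and_report": 90,
        "topic_presentations": 88,
        "paper_reviews": 95,
        "participation": 100,
    }
    score = overall_score(example)
    print(score, letter_grade(score))  # prints "92.25 A-"
```

In this example, a 92.25 overall falls just short of the listed 93 cutoff for an A, so the conservative mapping reports an A-.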

Research Project

The research project is the main component of this course. The goal is to conduct high-quality, novel research related to networking that, by the end of the semester, would be publishable as a paper in a top-quality workshop such as HotNets and, when expanded to a full paper, in a top-quality conference. You may work alone or in groups of 2. During the first few weeks of the course, you should think about projects you might like to do. The instructor can suggest some topics (schedule an appointment), but it is even better if you have ideas of your own. The steps in the research project are as follows:
  • Project proposal: During the first few weeks of the class, think about the topic you want to work on and, optionally, find a partner. You are welcome (and encouraged) to explore your own ideas, but you can also talk with the instructor, who can suggest some topics. Submit a project proposal to the instructor via email (plain text, no attachments). The proposal should be at most one page and include each of the following:
    • the problem you plan to address
    • what will be your first steps to attack the problem
    • what is the most closely related work, with at least 3 full academic paper citations (title, authors, publication venue, year) plus paper URLs, and why your proposed problem is different from those or why your proposed solution is better. You should actively search for related work, not just cite papers that the instructor mentions.
    • if there are multiple people on your project team, who they are and how you plan to partition the work among the team
  • The proposal can be short. It should simply demonstrate that you have a plausible project and know how to attack it. The instructor will give a grade for the proposal, and either approve the project or ask for a revision.
  • Project overview and midterm presentations: About one month after the proposal, give an overview presentation in class describing the problem you are solving, why existing approaches do not solve it, your solution approach, and your progress so far. You must demonstrate progress on your solution. Midterm presentations (roughly 1 month after the project overview) should show your progress and your plan for completing the project.
  • Final report: This is a short paper suitable for submission to a workshop. It should clearly state the problem being solved, the importance of the problem, related work, your approach, evaluation, and results, a summary of conclusions, a discussion of limitations, and future work. The paper should be at most 8 pages for one-person projects and at most 12 pages for two-person projects. But you will be judged on results, not page count!
  • Final presentation: At the end of the course (on the last day of class), we will have final project presentations. This is an opportunity for other students and the instructor to ask questions about your project.
Dates for the above steps will be announced on the class schedule. In general, you are encouraged to meet with the instructor and seek advice on the project as often as you like.
Can a project be shared with another course's project or independent research? It is OK, and often a good idea, to work on a class project that complements your other ongoing projects and has a related topic. However, you should identify the piece of the larger project that you are working on for this course, with separate pieces for other courses. Check with your other instructors as well.

Paper Reviews

For each class, there is one assigned paper that you should read before class and be ready to discuss during the class. Submit one paper review for each class on HotCRP by 4:00pm ET the day before the session for which the paper was assigned. The review should be relatively short. It should summarize the paper in your own words and include at least three comments on the paper that supply information not in the paper itself. For example, a comment might be:
  • a suggestion to build on or extend the paper's ideas in future work
  • a criticism of the paper
  • an advantage of the paper (not discussed in the paper)
  • an alternative to the solutions discussed in the paper
  • a response to another student's comment
You are encouraged to read and comment on the other students' reviews. However, please write down notes on your own thoughts independently before reading other students' reviews. Collaborating with other students to write reviews is not permitted. Your reviews should ideally include original ideas that do not appear in the other students' reviews. However, if you independently make similar points, that is acceptable. You may skip any 2 paper reviews without affecting your grade. You will receive a deduction of one letter grade for missing more than 2 reviews. The overall review grade for the course will be calculated based on a random sample of 5-7 reviews over the semester.

Topic Presentations

Each student will give a few presentations on different topics during the semester. The goals are for you to learn more about particular areas of interest related to our assigned readings, and give an overview of what you learned to the rest of the class. Here is what you should do:
  • By the deadline mentioned on the class schedule, specify your topic preferences (up to 5 topics) from the list of topics posted on Piazza. For each topic, you can find the related readings and the presentation date on the class schedule. The instructor will take your preferences into account while assigning the topics. The topic assignment will be announced on the class schedule.
  • To prepare for your presentation, pick one "primary paper" to cover in depth, and a related paper. You can choose these papers from the optional readings for your topic on the course web site (not the required reading!) or others that you find. You can also contact the instructor for paper suggestions.
  • At least one week before your presentation date, tell the instructor what papers you plan to cover, and arrange a meeting time with the instructor to go over your draft presentation. The instructor will then approve the papers or suggest other papers you should cover instead or in addition to what you pick.
  • Prepare a presentation on your topic. The presentation should do two things. First, it should describe the primary paper and how it relates to the required reading for that day (this should take roughly 20-30 minutes of your presentation). Second, it should summarize the related paper (this should take roughly 10-15 minutes of your presentation) and compare it to your primary paper, the required reading, and/or other research in this area.
  • Prepare 5 (or more) discussion questions. You should lead a 10-15 minute discussion during or after your presentation.
  • Send your draft presentation to the instructor at least 2 days before your presentation.

Participation

You are expected to attend all sessions of the class. The general policy is that a student will automatically receive a deduction of one letter grade for missing more than 2 lectures. Class sessions combine lectures, discussions of readings, and presentations by students. In all cases, the class is centered on discussion. Please comment, question, and interact! I ask that you do not use laptops during class. This way, we will all be maximally engaged.

Academic Honesty and Cheating

The author of all writing, ideas, and other work must be clearly credited. For example, if your presentation of a past paper uses some slides from the author, you must credit the author. The standard penalty for a first instance of cheating is a grade of zero on the task in question, plus a reduction of one full letter grade in your final course grade. For details, please see the departmental honor code.

Background

If you have not taken an undergraduate networking course recently, or if you need a refresher, you might take a look at Peterson and Davie, Computer Networks: A Systems Approach. Additionally, some conferences such as SIGCOMM have offered "topic review" sessions in recent years:

Project Ideas

How can you pick a good research project topic? Your taste for projects will evolve over years, but to get started, here are a few places to look.

Workshops and Conferences

Browse programs at top conferences to see current research topics. Workshops often contain early work on "hot" new directions, raising more questions than answers. These are good conferences and workshops to check out when looking for papers to present on a certain topic, or to see current areas of research when looking for project inspiration:

Survey Papers

Readings

The required papers are listed on the schedule. There is no required textbook.

Tentative Schedule

Topics and papers

  • AI Networks and Remote Direct Memory Access (RDMA)
    • Alibaba HPN: A Data Center Network for Large Language Model Training, SIGCOMM 2024
    • Resiliency at Scale: Managing Google's TPUv4 Machine Learning Supercomputer, NSDI 2024
    • Characterization of Large Language Model Development in the Datacenter, NSDI 2024
    • Revisiting Congestion Control for Lossless Ethernet, NSDI 2024
    • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, NSDI 2024
    • CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters, NSDI 2024
    • Swing: Short-cutting Rings for Higher Bandwidth Allreduce, NSDI 2024
    • Towards Domain-Specific Network Transport for Distributed DNN Training, NSDI 2024
    • Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem, SIGCOMM 2024
    • MCCS: A Service-based Approach to Collective Communication for Multi-Tenant Cloud, SIGCOMM 2024
    • Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs, SIGCOMM 2024
    • Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems, SIGCOMM 2023
    • Network Load Balancing with In-network Reordering Support for RDMA, SIGCOMM 2023
    • Empowering Azure Storage with RDMA, NSDI 2023
    • Revisiting Network Support for RDMA, NSDI 2018
    • When Cloud Storage Meets RDMA, NSDI 2021
    • Re-architecting Congestion Management in Lossless Ethernet, NSDI 2020
    • FileMR: Rethinking RDMA Networking for Scalable Persistent Memory, NSDI 2020
    • TCP ~ RDMA: CPU-efficient Remote Storage Access with i10, NSDI 2020
    • Congestion detection in lossless networks, SIGCOMM 2021
    • DCQCN: Congestion Control for Large-Scale RDMA Deployments, SIGCOMM 2015
    • RDMA over Commodity Ethernet at Scale, SIGCOMM 2016
    • [background] HPCC: High precision congestion control, SIGCOMM 2019

  • Internet Architecture, Access, Equity
    • The Eternal Tussle: Exploring the Role of Centralization in IPFS, NSDI 2024
    • The Efficacy of the Connect America Fund in Addressing US Internet Access Inequities, SIGCOMM 2024
    • Ten years of the Venezuelan crisis - An Internet perspective, SIGCOMM 2024
    • Decoding the Divide: Analyzing Disparities in Broadband Plans Offered by Major US ISPs, SIGCOMM 2023
    • A Framework for Improving Web Affordability and Inclusiveness, SIGCOMM 2023
    • Destination Unreachable: Characterizing Internet Outages and Shutdowns, SIGCOMM 2023

  • ML for managing networks
    • NetLLM: Adapting Large Language Models for Networking, SIGCOMM 2024
    • NetAssistant: Dialogue Based Network Diagnosis in Data Center Networks, NSDI 2024
    • TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs, NSDI 2023
    • SelfTune: Tuning Cluster Managers, NSDI 2023

  • Operating System, Host Networking, Hardware
    • Making Kernel Bypass Practical for the Cloud with Junction, NSDI 2024
    • Understanding Routable PCIe Performance for Composable Infrastructures, NSDI 2024
    • Understanding the Host Network, SIGCOMM 2024
    • SRNIC: A Scalable Architecture for RDMA NICs, NSDI 2023
    • Hostping: Diagnosing Intra-host Network Bottlenecks in RDMA Servers, NSDI 2023
    • Understanding Host Network Stack Overhead, SIGCOMM 2021
    • AccelTCP: Accelerating Network Applications with Stateful TCP Offloading, NSDI 2020
    • Enabling Programmable Transport Protocols in High-Speed NICs, NSDI 2020
    • Programmable Calendar Queues for High-speed Packet Scheduling, NSDI 2020
    • Packet Transactions: High-Level Programming for Line-Rate Switches, SIGCOMM 2016
    • Gimbal: enabling multi-tenant storage disaggregation on SmartNIC JBOFs, SIGCOMM 2021

  • Verification and Reliability
    • Toward formally verifying congestion control behavior, SIGCOMM 2021
    • Test coverage metrics for the network, SIGCOMM 2021
    • Experiences with Modeling Network Topologies at Multiple Levels of Abstraction, NSDI 2020
    • Rex: Preventing Bugs and Misconfiguration in Large Services Using Correlated Change Analysis, NSDI 2020
    • Meaningful Availability, NSDI 2020
    • Plankton: Scalable network configuration verification through model checking, NSDI 2020
    • Config2Spec: Mining Network Specifications from Network Configurations, NSDI 2020
    • Evolve or Die: High-Availability Design Principles Drawn from Google's Network Infrastructure, SIGCOMM 2016
    • Metha: Network Verifiers Need To Be Correct Too!, NSDI 2021
    • APKeep: Realtime Verification for Real Networks, NSDI 2020
    • Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Large-Scale Cloud Infrastructure, NSDI 2020
    • Probabilistic Verification of Network Configurations, SIGCOMM 2020
    • A General Approach to Network Configuration Verification, SIGCOMM 2017
    • [background] Header Space Analysis: Static Checking For Networks, NSDI 2012
    • [background] VeriFlow: Verifying Network-Wide Invariants in Real Time, NSDI 2013

  • Congestion Control
    • A large-scale deployment of DCTCP, NSDI 2024
    • Pudica: Toward Near-Zero Queuing Delay in Congestion Control for Cloud Gaming, NSDI 2024
    • Towards provably performant congestion control, NSDI 2024
    • SUSS: Improving TCP Performance by Speeding Up Slow-Start, SIGCOMM 2024
    • Keeping an Eye on Congestion Control in the Wild with Nebby, SIGCOMM 2024
    • Principles for Internet Congestion Management, SIGCOMM 2024
    • CCAnalyzer: An Efficient and Nearly-Passive Congestion Control Classifier, SIGCOMM 2024

  • Scheduling, Buffer Sizing, Buffer Sharing
    • Flow Scheduling with Imprecise Knowledge, NSDI 2024
    • BBQ: A Fast and Scalable Integer Priority Queue for Hardware Packet Scheduling, NSDI 2024
    • Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions, NSDI 2024
    • Reverie: Low Pass Filter-Based Switch Buffer Sharing for Datacenters with RDMA and TCP Traffic, NSDI 2024

  • Reliability: emulation, simulation, verification
    • Crescent: Emulating Heterogeneous Production Network at Scale, NSDI 2024
    • Reasoning about Network Traffic Load Property at Production Scale, NSDI 2024
    • ExChain: Exception Dependency Analysis for Root Cause Diagnosis, NSDI 2024
    • Scalable Tail Latency Estimation for Data Center Networks, NSDI 2023
    • CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation, NSDI 2023

Important Dates

  • February 12, 2025, 5pm ET: Project proposal due
  • March 31, 2025, 5pm ET: Midterm slides due
  • April 21, 2025, 5pm ET: Final presentation slides due
  • May 9, 2025, 5pm ET: Final report due