Computer System Design

Computer System Engineering


Publish Date: 2026-01-13

Word Count: 3.5k

Lesson 12: Layered Network Design

In the previous lesson, we established the axiom of the Internet: the underlying network is unreliable (best-effort).
This lesson uses layering to build a reliable application experience step-by-step from the physical wire up. Each layer has its specific duties, primitives, and challenges.

To be honest, this is a condensed version of a Computer Networks course. Because it is a condensed version, there is some “hand-waving” for the sake of understanding, but it’s not a big deal.

1. Link Layer: The Cornerstone of Physical Transmission

The task of the Link Layer seems simple (move bits from one end of a wire to the other), but to make this work in the physical world we have to solve a large number of low-level problems.

1.1 Physical Transmission Mechanisms

How do we represent digital 0s and 1s in analog electrical signals?

Synchronous Transmission:
- Mechanism: The sender and receiver share an extremely precise clock and agree to read the voltage at specific time slots.
- Use case: Usually used inside chips or for very short-distance transmission.
- Limitation: Over long distances, keeping the two ends synchronized to within a fraction of a bit time is extremely difficult and expensive.
This is intuitive: synchronous clocking relies on the clock and data paths having matched lengths (and therefore matched delays), and over long distances that skew is hard to keep within tolerance.
Asynchronous Transmission:
- Mechanism: There is no global clock. Data can arrive at any time, so the two sides synchronize using handshake signals (such as a 3-wire protocol) or explicit start/stop bits.
- Use case: Between modules or for long-distance communication.
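The start/stop-bit idea can be sketched with the common UART “8N1” framing (8 data bits, no parity, 1 stop bit). This is our choice of example rather than something the lesson specifies: the line idles high, a 0 start bit marks the beginning of a byte, the data follows least-significant bit first, and a 1 stop bit returns the line to idle. Line levels are modeled as Python lists of 0/1, and the function names are hypothetical.

```python
# Sketch of asynchronous start/stop-bit framing (UART-style "8N1").
# Line levels are modeled as lists of 0/1; the idle level is 1.
IDLE, START, STOP = 1, 0, 1

def uart_frame(byte):
    """Wrap one byte as: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [START] + data_bits + [STOP]

def uart_deframe(bits):
    """Check framing on exactly 10 bits and return the data byte."""
    if len(bits) != 10 or bits[0] != START or bits[9] != STOP:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

def uart_receive(line):
    """Scan a bit stream: skip idle 1s, resynchronize on each start bit."""
    received, i = [], 0
    while i < len(line):
        if line[i] == IDLE:
            i += 1          # line is idle; keep waiting for a start bit
            continue
        received.append(uart_deframe(line[i:i + 10]))
        i += 10             # jump past this frame, including its stop bit
    return received

if __name__ == "__main__":
    line = [IDLE, IDLE] + uart_frame(0x48) + [IDLE] * 3 + uart_frame(0x69)
    print(bytes(uart_receive(line)))  # prints b'Hi'
```

Because the receiver re-aligns on the falling edge of every start bit, the two clocks only need to agree for about ten bit times, which is why no shared clock is required.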
Encoding Scheme: Manchester Encoding:
- Problem: If a long run of 1s (a sustained high voltage) is sent, the receiver has a hard time telling whether 10 or 11 ones were sent, because its clock drifts relative to the sender’s.
- Solution: Instead of representing values directly with high/low voltage levels, use a voltage transition in the middle of each bit period.
  - For example: Low $\rightarrow$ High represents 0, High $\rightarrow$ Low represents 1.
- Benefit: Self-clocking. The receiver can synchronize its rhythm based on the transitions.
- Cost: Bandwidth utilization is halved (each bit needs two signal elements, so the line must switch at twice the bit rate).
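A minimal Manchester encoder/decoder, using the mapping from the example above (low-to-high for 0, high-to-low for 1; real standards do not all agree on this mapping). Each bit becomes two half-bit levels, which makes both the guaranteed mid-bit transition and the 2× signaling cost visible. The function names are hypothetical.

```python
LOW, HIGH = 0, 1

def manchester_encode(bits):
    """Each bit -> two half-bit levels with a transition in the middle."""
    signal = []
    for bit in bits:
        # 0: low then high (rising edge); 1: high then low (falling edge)
        signal += [HIGH, LOW] if bit else [LOW, HIGH]
    return signal

def manchester_decode(signal):
    """Recover bits from the direction of each mid-bit transition."""
    if len(signal) % 2:
        raise ValueError("signal must contain whole bit periods")
    bits = []
    for i in range(0, len(signal), 2):
        first, second = signal[i], signal[i + 1]
        if first == second:
            # A missing transition means a line error or lost synchronization.
            raise ValueError(f"no mid-bit transition at position {i}")
        bits.append(1 if first == HIGH else 0)
    return bits

if __name__ == "__main__":
    ones = [1] * 10
    line = manchester_encode(ones)
    # Ten 1s become 20 signal elements with a transition in every bit period,
    # so the level never stays constant for longer than one bit period.
    print(len(line), manchester_decode(line) == ones)  # prints 20 True
```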

1.2 Key Trade-off: MTU vs. BER

This is the core mathematical trade-off in Link Layer design.

MTU (Maximum Transmission Unit): The maximum size of a data frame in a single transmission.
BER (Bit Error Rate): The error rate of the physical line.

Trade-off Logic:

High BER Environment (e.g., Wireless) $\rightarrow$ Reduce MTU.
- Reason: The larger the packet, the higher the probability that at least one of its bits is corrupted. Once an error occurs, the entire packet must be retransmitted. With large packets and a high BER, most of the bandwidth is wasted on retransmissions.
Low BER Environment (e.g., Fiber) $\rightarrow$ Increase MTU.
- Reason: The line is very stable and rarely introduces errors. A larger MTU reduces the share of bandwidth spent on headers, reduces per-packet processing, and improves effective throughput.
We mentioned before that modern networks are packet-switched (there is no dedicated end-to-end circuit), so every packet must carry a header (containing the source and destination addresses) during IP transmission. Here, we can simply think of the data frame as the “useful payload” (haha, let’s just go with that for now), so an IP transmission looks like: Header + Data Frame. Since the header size is fixed, the larger the MTU, the smaller the header’s share and the more efficiently useful data is delivered.

Strictly speaking, a data frame is not all useful payload either, because the Link Layer adds its own control header.
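The trade-off can be made concrete with a simple model (our own assumption, not from the lesson): bit errors are independent with rate $b$, each frame carries $H$ bytes of header and $M$ bytes of payload, and any corrupted bit forces the whole frame to be resent. The fraction of raw capacity that ends up as delivered payload is then $\eta(M) = \frac{M}{M+H}(1-b)^{8(M+H)}$: the first factor rewards large frames, the second punishes them. The sketch below brute-forces the payload size that maximizes $\eta$; the 40-byte header, the search range, and the BER values are illustrative.

```python
def link_efficiency(payload_bytes, header_bytes, ber):
    """Fraction of raw link capacity delivered as useful payload.

    Simplified model: independent bit errors, and any corrupted bit
    means the whole frame is retransmitted.
    """
    frame_bits = 8 * (payload_bytes + header_bytes)
    p_frame_ok = (1.0 - ber) ** frame_bits            # all bits survive
    payload_share = payload_bytes / (payload_bytes + header_bytes)
    # On average 1 / p_frame_ok transmissions are needed per frame.
    return payload_share * p_frame_ok

def best_payload_size(header_bytes, ber, smallest=64, largest=9000):
    """Brute-force search for the payload size that maximizes efficiency."""
    return max(range(smallest, largest + 1),
               key=lambda m: link_efficiency(m, header_bytes, ber))

if __name__ == "__main__":
    for ber in (1e-4, 1e-7):
        print(f"BER={ber:g}: best payload is about {best_payload_size(40, ber)} bytes")
```

With a 40-byte header, the optimum lands at roughly 200 bytes for $b = 10^{-4}$ but around 7,000 bytes for $b = 10^{-7}$, which is the wireless-versus-fiber contrast described above.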

2. Network Layer: Global Interconnection

The Link Layer only solves communication between “neighbors,” while the Network Layer solves addressing and routing across millions of nodes.

2.1 Core Design Philosophy

KIS (Keep It Simple): The network core (routers) should be as simple as possible, responsible only for “dumb” forwarding.
The E2E principle in action: do not implement complex functions (such as reliability) in the lower layers, because they can neither do it well nor do it completely.

This refers to the End-to-End paper from the last lesson.

2.2 Routing Algorithms: How to Build Forwarding Tables?

How does a router know which neighbor to forward a packet to?

Static Routing: Manual configuration. The downside is that it cannot react to link failures or topology changes, so it is not used as the main mechanism in large, changing networks.
Adaptive Routing:
- Distance Vector Algorithm:
  1. Broadcast: Each node tells its neighbors how far (e.g., how many hops) it is from every destination it knows about.
  2. Merge & Calculate: When a node receives a neighbor’s vector, it adds its own distance to that neighbor to each advertised entry; if the result is shorter than its current route, it updates its routing table to go through that neighbor.
  3. Convergence: After multiple iterations, the network-wide routing tables eventually stabilize.
- Handling Faults: When a link breaks, relevant nodes mark it as “infinity” and broadcast to neighbors, triggering a network-wide recalculation (convergence).
In reality, there are many routing algorithms, which you will learn in Computer Networks.
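The update rule above is essentially distributed Bellman-Ford. The sketch below runs it in synchronized rounds on a toy three-node line; the topology, link costs, and the cap `INF = 16` (borrowed from RIP’s convention that 16 hops means unreachable) are our assumptions. After the B-C link is removed, the stale routes bounce between A and B and count upward until they hit the cap, the well-known “count to infinity” behavior that makes plain distance vector slow to converge after failures.

```python
INF = 16  # RIP-style cap: a distance of 16 means "unreachable" (assumption)

def neighbors(links, node):
    """Yield (neighbor, cost) for every link touching `node`."""
    for (a, b), cost in links.items():
        if a == node:
            yield b, cost
        elif b == node:
            yield a, cost

def dv_round(nodes, links, tables):
    """One synchronous round: each node rebuilds its vector from its neighbors' vectors."""
    new_tables = {}
    for n in nodes:
        row = {}
        for dest in nodes:
            if dest == n:
                row[dest] = 0
                continue
            best = INF
            for nb, cost in neighbors(links, n):
                # my distance via nb = my cost to nb + nb's advertised distance
                best = min(best, cost + tables[nb][dest])
            row[dest] = best
        new_tables[n] = row
    return new_tables

def run_until_converged(nodes, links, tables=None):
    """Iterate rounds until no table changes; returns (tables, rounds)."""
    if tables is None:
        tables = {n: {d: (0 if d == n else INF) for d in nodes} for n in nodes}
    rounds = 0
    while True:
        new_tables = dv_round(nodes, links, tables)
        rounds += 1
        if new_tables == tables:
            return tables, rounds
        tables = new_tables

if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    links = {("A", "B"): 1, ("B", "C"): 1}
    tables, rounds = run_until_converged(nodes, links)
    print("A->C:", tables["A"]["C"], "after", rounds, "rounds")
    del links[("B", "C")]                      # the B-C link fails
    tables, rounds = run_until_converged(nodes, links, tables)
    print("A->C:", tables["A"]["C"], "after", rounds, "rounds")
```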

2.3 Hierarchy

If every computer in the world occupied a row in the routing table, the table would be too large to query.

Solution: Hierarchical Addressing (Region + Station).
- Similar to the postal system: First look at “State/Province,” then look at “City” within the state.
- IP addresses are designed the same way (Network ID + Host ID).
Trade-off:
- Pros: Scalability. Routers only need to store the path “to a certain region,” greatly reducing the size of routing tables.
- Cons: Sub-optimal paths. There might be a shortcut from Village A to Village B, but due to hierarchical routing, packets might have to detour to the “Provincial Capital” router before being distributed down.
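In IP forwarding this hierarchy is implemented with longest-prefix matching: a router stores a handful of prefixes instead of one entry per host, and for each packet picks the most specific prefix that contains the destination. A minimal sketch using Python’s standard `ipaddress` module; the table entries and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> next hop.
# Three entries cover the entire IPv4 address space.
TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "to-region-10",     # a whole "province"
    ipaddress.ip_network("10.1.0.0/16"): "to-site-10.1",    # one "city" inside it
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",  # everything else
}

def lookup(dst):
    """Longest-prefix match: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return TABLE[best]

if __name__ == "__main__":
    for dst in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
        print(dst, "->", lookup(dst))
```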

2.4 Error Reporting and Probing

Although the Network Layer does not guarantee reliability, it does try to report errors back to the sender. It does so with ICMP, the same protocol the Ping tool is built on.

ICMP (Internet Control Message Protocol):
- TTL Exceeded: Prevents packets from circling forever in a routing loop. Each router decrements the TTL by 1; when it reaches 0, the packet is dropped and an ICMP error is sent back to the source.
- Application: Traceroute.
  - Principle: Deliberately send a packet with TTL=1; the first router drops it and returns an ICMP error. Then send TTL=2, and the second router does the same… and so on, revealing the address of every router along the path.
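Traceroute’s TTL trick can be simulated without touching a real network. The sketch models a path as a list of router addresses (hypothetical, taken from the 192.0.2.0/24 documentation range), decrements the TTL at each router as described above, and records who sends back the “TTL exceeded” error. A real traceroute sends actual probes (UDP or ICMP, depending on the implementation) and reads the ICMP replies.

```python
def send_probe(path, ttl):
    """Simulate one probe along `path` (a list of router addresses).

    Each router decrements the TTL; the router that brings it to 0 drops
    the probe and answers with ("time-exceeded", its_address).
    """
    for router in path:
        ttl -= 1
        if ttl == 0:
            return ("time-exceeded", router)
    return ("reached-destination", None)

def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, 3, ... and record who answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        kind, router = send_probe(path, ttl)
        if kind == "reached-destination":
            break
        hops.append(router)
    return hops

if __name__ == "__main__":
    path = ["192.0.2.1", "192.0.2.33", "192.0.2.65"]  # hypothetical routers
    for n, addr in enumerate(traceroute(path), start=1):
        print(n, addr)
```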
Path MTU Discovery:
- Set the DF (Don’t Fragment) bit in the IP header. If the packet is too large for some link, the router drops it and returns an ICMP error that reports that link’s MTU. The source shrinks its packets accordingly, avoiding the performance cost of fragmentation.
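Path MTU discovery can be sketched the same way over a simulated path. Each hop is represented only by the MTU of the link it forwards onto (hypothetical values); with DF set, the first hop whose outgoing link is too small drops the packet and reports that MTU, and the sender retries with the reported size until the packet gets through. Real implementations also have to cope with routers or firewalls that never send this ICMP message, which this sketch ignores.

```python
def send_with_df(link_mtus, size):
    """Simulate a DF-marked packet of `size` bytes crossing links with the given MTUs.

    Returns None if it is delivered, or the MTU reported by the first
    router whose outgoing link is too small (carried in the ICMP error).
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu        # DF set: drop and report this link's MTU
    return None

def discover_path_mtu(link_mtus, initial_size):
    """Shrink the packet size until it crosses every link unfragmented."""
    size = initial_size
    while True:
        reported = send_with_df(link_mtus, size)
        if reported is None:
            return size       # this size fits every link on the path
        size = reported       # shrink to the reported MTU and retry

if __name__ == "__main__":
    # Two ICMP errors are needed here: first 1400 is reported, then 1280.
    print(discover_path_mtu([1500, 1400, 1500, 1280], 1500))
```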

3. End-to-End Layer

The Transport Layer is responsible for converting the unreliable service of the lower layers into the reliable or real-time service required by upper-layer applications.

3.1 Ports and Multiplexing

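An IP address only identifies the host.
A port number identifies a specific process/service on that host (e.g., 80 for Web, 22 for SSH). This lets many applications share one host’s network connection: the transport layer multiplexes their traffic on the way out and demultiplexes it by port on the way in.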
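Port-based demultiplexing is easy to observe with real sockets on the loopback interface. In this sketch two UDP sockets on the same IP address stand in for a “web” service and an “ssh” service; because binding to ports 80 and 22 normally requires administrator rights, we let the OS assign free ports instead (port 0). The helper names are hypothetical.

```python
import socket

def bound_udp_socket():
    """A UDP socket on 127.0.0.1 with an OS-chosen free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.settimeout(2.0)
    return s

def demux_demo():
    web, ssh = bound_udp_socket(), bound_udp_socket()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        web_port = web.getsockname()[1]
        ssh_port = ssh.getsockname()[1]
        # Same destination IP address, different destination ports.
        client.sendto(b"request for web", ("127.0.0.1", web_port))
        client.sendto(b"request for ssh", ("127.0.0.1", ssh_port))
        # The OS demultiplexes by port: each socket receives only its own datagram.
        web_msg, _ = web.recvfrom(1024)
        ssh_msg, _ = ssh.recvfrom(1024)
        return web_msg, ssh_msg
    finally:
        for s in (web, ssh, client):
            s.close()

if __name__ == "__main__":
    print(demux_demo())
```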

3.2 Constructing Reliable Transmission

Reorganizing this section made me laugh; forcibly compressing a pile of Computer Network knowledge isn’t easy, haha.

How can we achieve 100% reliable transmission on an unreliable network?

Loss Recovery & Retransmission:
- ACK Mechanism: The receiver sends a confirmation (ACK) for each packet it receives.
- Timer: How long should the sender wait for an ACK before treating the packet as lost and retransmitting it?
- Challenge: RTT (Round-Trip Time) changes over time.
- Algorithm: EWMA (Exponentially Weighted Moving Average). Instead of relying only on the most recent RTT sample, maintain a weighted average of past samples to smooth out jitter, and derive a reasonable retransmission timeout (RTO) from it.
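TCP’s standard version of this (RFC 6298) keeps two EWMAs, a smoothed RTT and a smoothed RTT deviation, and sets the timeout to the smoothed RTT plus four deviations, so a jittery path gets a more generous timeout. The sketch follows that RFC’s update order and gains ($\alpha = 1/8$, $\beta = 1/4$, $K = 4$); the class name is hypothetical, clock granularity is ignored, and `min_rto` defaults to the RFC’s recommended 1-second floor.

```python
class RtoEstimator:
    """EWMA-based retransmission timeout, following the RFC 6298 update rules."""

    ALPHA = 1 / 8   # weight of a new RTT sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new deviation sample in RTTVAR
    K = 4           # how many deviations of headroom to add

    def __init__(self, min_rto=1.0):
        self.min_rto = min_rto
        self.srtt = None      # smoothed RTT (seconds)
        self.rttvar = None    # smoothed mean deviation of the RTT
        self.rto = 1.0        # initial RTO before any sample, per RFC 6298

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # The first measurement initializes both averages.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update RTTVAR first, using the *old* SRTT, then update SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto

if __name__ == "__main__":
    est = RtoEstimator(min_rto=0.0)   # floor disabled so the smoothing is visible
    for sample in (0.100, 0.100, 0.500, 0.100):
        rto = est.on_rtt_sample(sample)
        print(f"sample={sample:.3f}s  srtt={est.srtt:.4f}s  rto={rto:.4f}s")
```

A single 500 ms spike moves the smoothed RTT only from 100 ms to 150 ms, while the deviation term widens the timeout so that the next few samples are unlikely to trigger a spurious retransmission.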
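Duplicate Suppression:
- Nonce / Sequence Number: Number every packet. The receiver keeps track of the highest sequence number it has received so far and drops any packet whose number is at or below it as a duplicate. (This simple rule assumes packets arrive in order; with reordering, a late but new packet would be mistaken for a duplicate.)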
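The duplicate-suppression rule above, as a minimal sketch (class and method names are hypothetical). Like the rule itself, it assumes in-order arrival. A duplicate should still be acknowledged, because the duplicate usually exists precisely because the receiver’s earlier ACK was lost.

```python
class DedupReceiver:
    """Drops duplicates by remembering the highest sequence number seen."""

    def __init__(self):
        self.highest_seq = -1
        self.delivered = []

    def on_packet(self, seq, data):
        """Return the ACK to send back; deliver `data` only if it is new."""
        if seq > self.highest_seq:
            self.highest_seq = seq
            self.delivered.append(data)     # hand to the application exactly once
        # Duplicates are dropped but still acknowledged, so a sender whose
        # earlier ACK was lost stops retransmitting.
        return ("ACK", seq)

if __name__ == "__main__":
    rx = DedupReceiver()
    # Packet 1 is retransmitted after its first ACK was lost.
    for seq, data in [(0, "a"), (1, "b"), (1, "b"), (2, "c")]:
        rx.on_packet(seq, data)
    print(rx.delivered)   # prints ['a', 'b', 'c']
```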
Flow Control - Sliding Window:
- Stop-and-Wait: Send one packet, wait for one ACK. Efficiency is extremely low; the line is empty most of the time.
- Sliding Window:
  - Allows the sender to send multiple packets back-to-back without waiting for each ACK, keeping the pipeline full.
  - Window Size: Determined jointly by the receiver’s processing capability (reflected by the Receive Window) and network congestion.
  - Dynamic Adjustment: When the receiver can’t keep up, it notifies the sender to shrink the window, avoiding overwhelming the receiving end.
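The gap between stop-and-wait and a sliding window shows up in a small discrete-time model (our own simplification): the sender can put one packet on the wire per tick, every ACK comes back a fixed RTT after its packet was sent, nothing is lost, and the window size is fixed. Stop-and-wait is simply a window of 1. The function name and parameters are hypothetical.

```python
from collections import deque

def ticks_to_deliver(n_packets, window, rtt_ticks):
    """Tick at which the last ACK arrives, for a fixed-size sending window.

    Model: at most one packet is sent per tick, each ACK returns exactly
    rtt_ticks after its packet was sent, and no packets are lost.
    """
    next_seq = 0
    acked = 0
    in_flight = deque()   # send times of unacknowledged packets, oldest first
    tick = 0
    while True:
        # Collect every ACK that has arrived by this tick.
        while in_flight and in_flight[0] + rtt_ticks <= tick:
            in_flight.popleft()
            acked += 1
        if acked == n_packets:
            return tick
        # Send a new packet if data remains and the window has room.
        if next_seq < n_packets and len(in_flight) < window:
            in_flight.append(tick)
            next_seq += 1
        tick += 1

if __name__ == "__main__":
    for w in (1, 5, 10, 20):
        print(f"window={w:2d}: {ticks_to_deliver(100, w, 10)} ticks")
```

With an RTT of 10 ticks, 100 packets take 1000 ticks with stop-and-wait but only 109 with a window of 10. A window of 20 is no faster, because 10 packets already fill the pipe (one packet per tick times a 10-tick RTT, i.e., the bandwidth-delay product). In a real transport protocol the window is further limited by the receiver’s advertised window and by congestion control, which this sketch leaves out.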

3.3 The Necessity of End-to-End Checksums

The slides specifically mentioned a classic question: Since the Link Layer already has CRC checks, why does TCP still need a Checksum?

Answer: The Link Layer only guarantees that data is correct while on the “wire.” However, during queuing, memory copying, and bus transmission inside routers, bit flips may occur. If the End-to-End layer does not perform a final check, these software or hardware errors will slip through, leading to file corruption.
Conclusion: Only checks performed at the communication endpoints (Application/Transport Layer) are truly reliable.
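The argument can be replayed in a few lines. In this sketch each link protects the frame with a CRC-32 (`zlib.crc32`), while the endpoints use the 16-bit one’s-complement Internet checksum that TCP and UDP use (RFC 1071). A router that flips a bit while copying the packet between its buffers (a hypothetical fault injected by `router_copy`) then re-sends it with a fresh, perfectly valid CRC, so every per-link check passes and only the end-to-end check notices. All function names are hypothetical.

```python
import zlib

def internet_checksum(data):
    """16-bit one's-complement sum of 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def send_over_link(payload):
    """Sender attaches a CRC; receiver verifies it. The wire itself is clean here."""
    crc = zlib.crc32(payload)
    return payload, zlib.crc32(payload) == crc

def router_copy(payload, flip_bit=None):
    """Hypothetical faulty memory copy inside a router: optionally flips one bit."""
    buf = bytearray(payload)
    if flip_bit is not None:
        buf[flip_bit // 8] ^= 1 << (flip_bit % 8)
    return bytes(buf)

def transfer(data, flip_bit=None):
    """Returns (all link CRCs passed, end-to-end checksum passed)."""
    e2e = internet_checksum(data)            # computed once, at the source
    pkt, link1_ok = send_over_link(data)
    pkt = router_copy(pkt, flip_bit)         # corruption happens *between* links
    pkt, link2_ok = send_over_link(pkt)      # new CRC is computed over the bad bytes
    return link1_ok and link2_ok, internet_checksum(pkt) == e2e

if __name__ == "__main__":
    print(transfer(b"important file"))               # prints (True, True)
    print(transfer(b"important file", flip_bit=13))  # prints (True, False)
```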

linda1729

https://linda1729-blog.netlify.app/posts/2026-01-13-system-engineering-network-layering/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you repost, please credit linda1729 as the source!
