Lesson 08:虚拟链路(并发与锁)


Lesson 08 虚拟链路(并发与锁)

在并发的世界里,唯一可以确定的就是不确定性。我们将链路抽象为虚拟链路,但在实现多任务通信(如本地 RPC)时存在一个棘手的问题:如果多个线程同时读写同一块内存区域(缓冲区),数据就会乱套。

这一讲的核心在于协调,我们将看到系统如何从理想化的“正确性假设”一步步走向现实,并引入这一关键机制来对抗竞态条件(Race Condition)

1. 6大正确性假设

在初级版本的虚拟链路设计中,我们默认系统是完美的,并基于以下 6 个假设编写代码:

  1. 单线程收发:每个缓冲区只有一个写者和一个读者(单一写原则)。
  2. 独占处理器:每个线程都有自己独立的物理处理器,不涉及复杂的抢占。
  3. 不溢出inout 指针的增加不会溢出,算法逻辑永远正确。
  4. 读写连贯性 (Coherence):一旦数据写入内存,后续的读操作能立刻看到最新值。
  5. 时间先后原子性 (Atomicity/Isolation):即使是跨越多个机器字长的数据读写,也是不可分割的(不会读到一半的新数据和一半的旧数据)。
  6. 指令不重排:编译器和 CPU 不会为了优化性能而交换指令的执行顺序。

2. 竞态条件:当假设破灭

一旦去掉这些假设(尤其是假设 1 和 5),系统就会陷入竞态条件

  • Heisenbug:这类并发错误难以检测,因为它们只在特定的时序下出现,一旦你加入调试代码,时序改变,错误可能就消失了。
  • “1 vs 5”的噩梦:如果数据的长度超过了机器字长(例如 64 位系统写 128 位数据),在没有原子性保证的情况下,读者可能在写者更新了一半时就读取了数据,导致数据损坏。

3. 锁 (Lock):强制建立原子性

为了对抗竞态条件,我们引入了。它是一种人为制造的时间先后原子性。

  • 策略与颗粒度
    • 粗粒度锁:一个锁保护所有资源(如 Python 的 GIL)。简单安全,但并发性差,本质上是将并行变成了串行。
    • 细粒度锁:多个锁保护不同的变量。并发性高,但复杂度飙升,极易引发死锁
  • 死锁与消除
    • 成因:循环等待(A 等 B,B 等 A)。
    • 方案:静态方案(全局统一加锁顺序)或动态方案(维护等待图 Wait-for Graph,检测到回路则回滚)。

4. 锁的底层实现:自举

锁不能靠简单的 if 语句实现,它需要硬件和软件的共同配合。

  • 硬件锁 (RSM 指令)
    • 核心是 Read-Set-Modify (RSM)。硬件通过总线仲裁确保一条指令能同时完成读和写,且在执行期间抢占总线,不让其他处理器插足。
    • 常见形式:test-and-set, compare_and_swap (CAS)
  • 软件锁 (Lamport 方案)
    • 在没有 RSM 指令时,依靠标志位 (flag) 和轮转位 (turn) 实现。前提是必须保证读写连贯且指令不重排。
  • 高级技巧:RCU
    • Linux 内核常用。读操作完全不加锁,写操作通过创建副本并原子替换指针。

5. 对照

In the world of concurrency, the only certainty is uncertainty. We abstract links into virtual links, but a thorny problem arises when implementing multi-task communication (such as local RPC): if multiple threads read from and write to the same memory area (buffer) simultaneously, the data becomes a mess.

The core of this lesson is coordination. We will see how systems transition step-by-step from idealized “correctness assumptions” to reality, and how the Lock is introduced as a key mechanism to combat Race Conditions.

1. The 6 Correctness Assumptions

In the primary design of virtual links, we assume the system is perfect and write code based on the following six assumptions:

  1. Single-threaded Transmission/Reception: Each buffer has only one writer and one reader (the Single Writer Principle).
  2. Dedicated Processors: Each thread has its own independent physical processor, involving no complex preemption.
  3. No Overflow: The incrementing of in and out pointers never overflows; the algorithmic logic is always correct.
  4. Read/Write Coherence: Once data is written to memory, subsequent read operations can see the latest value immediately.
  5. Temporal Atomicity/Isolation: Even for data reads and writes spanning multiple machine word lengths, the operation is indivisible (you won’t read half-new and half-old data).
  6. No Instruction Reordering: The compiler and CPU will not swap the execution order of instructions to optimize performance.

2. Race Conditions: When Assumptions Break

Once these assumptions are removed (especially assumptions 1 and 5), the system falls into a Race Condition.

  • Heisenbug: These types of concurrency errors are difficult to detect because they only appear under specific timings. Once you add debugging code, the timing changes, and the error may disappear.
  • The “1 vs. 5” Nightmare: If the data length exceeds the machine word length (e.g., writing 128-bit data on a 64-bit system), without an atomicity guarantee, a reader might access the data while the writer has only updated half of it, leading to data corruption.

3. Locks: Enforcing Atomicity

To combat race conditions, we introduce Locks. A lock is an artificially manufactured form of temporal atomicity.

  • Strategy and Granularity:
    • Coarse-grained Lock: One lock protects all resources (e.g., Python’s GIL). It is simple and safe but offers poor concurrency, essentially turning parallel execution into serial execution.
    • Fine-grained Lock: Multiple locks protect different variables. This offers high concurrency but causes complexity to skyrocket, easily leading to Deadlocks.
  • Deadlock and Elimination:
    • Cause: Cyclic waiting (A waits for B, B waits for A).
    • Solutions: Static schemes (enforcing a global locking order) or dynamic schemes (maintaining a Wait-for Graph and rolling back when a cycle is detected).

4. Bottom-level Implementation of Locks: Bootstrapping

Locks cannot be implemented with simple if statements; they require the cooperation of both hardware and software.

  • Hardware Locks (RSM Instructions):
    • The core is Read-Set-Modify (RSM). The hardware uses Bus Arbitration to ensure that a single instruction can complete both a read and a write simultaneously, preempting the bus during execution so no other processor can intervene.
    • Common forms: test-and-set, compare_and_swap (CAS).
  • Software Lock (Lamport’s Scheme):
    • In the absence of RSM instructions, locks are implemented using flags and “turn” variables. This requires guaranteed read/write coherence and no instruction reordering.
  • Advanced Technique: RCU (Read-Copy Update):
    • Commonly used in the Linux kernel. Read operations are completely lock-free; write operations create a copy and then atomically replace the pointer.

Author: linda1729
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source linda1729 !
评论
  TOC