A process migrated from CPU 0 to CPU 1 would find its L1 cache cold. It would run 3x slower for the first 10 ms while the cache refilled.
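One way to picture the trade-off is a scheduler that keeps a process on the CPU where it last ran unless the load imbalance grows too large. The sketch below is illustrative only: `pick_cpu`, its parameters, and the `max_imbalance` threshold are hypothetical names, not taken from any Unix kernel or from the 1994 text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical affinity-aware CPU choice.
 *   last_cpu      - CPU the process last ran on (its cache may still be warm)
 *   queue_len[i]  - number of runnable processes waiting on CPU i
 *   ncpus         - number of CPUs
 *   max_imbalance - how much longer the home queue may be than the
 *                   shortest queue before migration is considered worth it
 */
static size_t pick_cpu(size_t last_cpu, const unsigned *queue_len,
                       size_t ncpus, unsigned max_imbalance)
{
    assert(ncpus > 0 && last_cpu < ncpus);

    /* Find the CPU with the shortest run queue. */
    size_t best = 0;
    for (size_t i = 1; i < ncpus; i++) {
        if (queue_len[i] < queue_len[best])
            best = i;
    }

    /* queue_len[best] is the minimum, so this subtraction cannot wrap. */
    if (queue_len[last_cpu] - queue_len[best] <= max_imbalance)
        return last_cpu;   /* stay home and keep the warm cache */

    return best;           /* imbalance is too large: pay the cold-cache cost */
}
```

With two CPUs and a threshold of 2, a process that last ran on CPU 0 stays there when the queues are {2, 0} and migrates when they are {5, 0}.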
The PDF introduced mb() (memory barrier) macros to Unix kernel headers for the first time.

Chapter 3: Scalable Synchronization – Spinlocks vs. Sleep Locks

On modern architectures, a contended lock could generate hundreds of cycles of cache-coherency traffic. A spinlock on a bus-based system was fine. On a NUMA (Non-Uniform Memory Access) machine? Suicide.
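To make the spinlock and memory-barrier discussion concrete, here is a minimal test-and-test-and-set spinlock. It is a sketch under stated assumptions, not the code from the 1994 text: it uses C11 `<stdatomic.h>` (which did not exist in 1994) as a stand-in for hand-written mb() macros, and the names `spinlock_t`, `spin_init`, `spin_trylock`, `spin_lock`, and `spin_unlock` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: true means the lock is held. */
typedef struct {
    atomic_bool locked;
} spinlock_t;

static void spin_init(spinlock_t *l)
{
    atomic_init(&l->locked, false);
}

/* Try once. The acquire ordering keeps the critical section's loads and
 * stores from being reordered before the lock is taken. */
static bool spin_trylock(spinlock_t *l)
{
    /* atomic_exchange returns the previous value: false means we won. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        if (spin_trylock(l))
            return;
        /* Wait with plain loads so the waiter reads its own cached copy
         * instead of issuing a write (and coherency traffic) every pass. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            /* busy-wait */
        }
    }
}

/* The release ordering makes the critical section's writes visible
 * before the lock is seen as free. */
static void spin_unlock(spinlock_t *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The inner read-only loop reduces bus traffic while waiting, but every waiter still rushes for the same cache line when the lock is released, which is the kind of contention that scales poorly on a NUMA machine.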
Today, as we run workloads on 192-core ARM servers and GPUs with 18,000 threads, we are still fighting the same war. The architectures are more "modern," but the PDF from 1994 remains the Rosetta Stone.
This article is written for systems engineers, retrocomputing enthusiasts, and students of operating system design. It treats the search query as a gateway to a specific, pivotal moment in computing history.

Introduction: A Phrase Frozen in Time

In the vast, ephemeral archive of the internet, certain keyword strings act as time capsules. The search query "unix systems for modern architectures -1994- pdf" is one of them.
In 1994, a systems engineer had to understand the difference between a store buffer and a write-combining buffer. They had to know that a branch mispredict on an R4000 cost the same as 30 NOPs on a 386. They learned that a global lock was a moral failure.