<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Blogs on</title><link>https://topology2333.github.io/blog/categories/blogs/</link><description>Recent content in Blogs on</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Tue, 14 Jan 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://topology2333.github.io/blog/categories/blogs/index.xml" rel="self" type="application/rss+xml"/><item><title>L3 cache mapping on Sandy Bridge CPUs, Mark Seaborn</title><link>https://topology2333.github.io/blog/posts/l3-cache/</link><pubDate>Tue, 14 Jan 2025 00:00:00 +0000</pubDate><guid>https://topology2333.github.io/blog/posts/l3-cache/</guid><description>&lt;h2 id="前言">Preface&lt;/h2>
&lt;p>Based on this &lt;a href="https://lackingrhoticity.blogspot.com/2015/04/l3-cache-mapping-on-sandy-bridge-cpus.html">blog post&lt;/a> by Mark Seaborn, published on Monday, 27 April 2015.&lt;/p>
&lt;h2 id="简介">Introduction&lt;/h2>
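&lt;p>The L3 cache (last-level cache) on Sandy Bridge processors is shared by all cores on the die. Its size is typically 3MB, 6MB, or 8MB, depending on the processor model.&lt;br>
The L3 cache is distributed and connected by a ring bus, a form of NUCA (Non-Uniform Cache Architecture), and every core can use all of it. As a result, access latency depends on which slice a physical address maps to and on which core issues the access.&lt;/p>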
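&lt;p>L3 cache mapping: the Sandy Bridge L3 cache is divided into slices, one per core, and the processor hashes the physical address to decide which slice an address maps to. This hashing is generally understood to spread memory traffic evenly across the slices.&lt;br>
Rowhammer: the cache architecture and memory controller behaviour of Sandy Bridge may be exploitable for Rowhammer attacks. In such an attack, the attacker repeatedly accesses certain DRAM rows to induce bit flips, which can corrupt data or open security holes.&lt;/p>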
&lt;blockquote>
&lt;p>My old computer's Intel Core i5-7200U, however, is not a Sandy Bridge part; it is &lt;a href="https://www.intel.com/content/www/us/en/products/sku/95443/intel-core-i57200u-processor-3m-cache-up-to-3-10-ghz/specifications.html">Kaby Lake&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;h2 id="原文概述">Summary of the original post&lt;/h2>
&lt;p>In 2013, some researchers reverse-engineered how Intel Sandy Bridge CPUs map physical addresses to cache sets in the L3 cache (the last-level cache)&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>. They were interested in the cache mapping because it can be used to defeat kernel ASLR&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>. The blog author is interested because the cache mapping can be used to test whether cached memory accesses can do row hammering.&lt;/p>
&lt;h3 id="some-background">Some background&lt;/h3>
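&lt;p>On Sandy Bridge CPUs, the L3 cache is divided into slices. A hash function over the physical address decides which L3 slice an address is stored in.&lt;/p>
&lt;p>The L3 cache is distributed and ring-based. There is one slice per core, but every core in the CPU can access every slice through the ring bus, which connects all the cores and their caches.&lt;/p>
&lt;p>When a core accesses a memory location that maps to another core's cache slice, the access is slightly slower, because it takes one or two hops around the ring bus to reach that slice. The protocol used on the ring bus is based on QPI&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup>.&lt;/p>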
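&lt;p>Each cache slice contains 2048 cache sets. On lower-end CPUs the cache sets are 12-way associative, so a cache slice is 1.5MB (2048 sets × 12 ways × 64 bytes per cache line = 1.5MB); on higher-end CPUs the cache sets are 16-way associative, so a cache slice is 2MB.&lt;/p>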
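&lt;p>The slice sizes quoted above follow from sets × ways × line size. A minimal sketch of that arithmetic; the function name &lt;code>slice_size_bytes&lt;/code> is made up for illustration and does not appear in the original post or its source code:&lt;/p>

```cpp
#include <cstdint>

// Hypothetical helper (not from the original source): size of one L3 slice in bytes.
constexpr std::uint64_t slice_size_bytes(std::uint64_t sets, std::uint64_t ways,
                                         std::uint64_t line_bytes) {
  return sets * ways * line_bytes;
}

// 2048 sets x 12 ways x 64-byte lines = 1.5 MB (lower-end parts).
static_assert(slice_size_bytes(2048, 12, 64) == 1536 * 1024, "12-way slice is 1.5 MB");
// 2048 sets x 16 ways x 64-byte lines = 2 MB (higher-end parts).
static_assert(slice_size_bytes(2048, 16, 64) == 2048 * 1024, "16-way slice is 2 MB");
```

&lt;p>Multiplying the slice size by the number of slices gives the total L3 size; for example, two 1.5MB slices would give a 3MB L3.&lt;/p>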
&lt;h3 id="cache-mapping">Cache mapping&lt;/h3>
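&lt;p>The researchers (Hund et al.) found that the L3 cache uses the bits of the physical address as follows:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Bits 0-5&lt;/strong>: these 6 bits give the byte offset within the 64-byte cache line.&lt;/li>
&lt;li>&lt;strong>Bits 6-16&lt;/strong>: these 11 bits give the cache set number within the slice.&lt;/li>
&lt;li>&lt;strong>Bits 17-31&lt;/strong>: these bits are hashed to decide which cache slice to use.&lt;/li>
&lt;li>&lt;strong>Bits 32 and above&lt;/strong>: not used.&lt;/li>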
&lt;/ul>
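&lt;p>The bit layout in the list above can be sketched as three small extraction helpers. The function names are assumptions chosen for illustration; they are not taken from the original post or from the rowhammer-test source:&lt;/p>

```cpp
#include <cstdint>

// Hypothetical helpers (names are not from the original source code).

// Bits 0-5: byte offset within the 64-byte cache line.
inline unsigned line_offset(std::uint64_t phys_addr) {
  return static_cast<unsigned>(phys_addr & 0x3f);
}

// Bits 6-16: cache set number within a slice (11 bits, so 0..2047).
inline unsigned set_index(std::uint64_t phys_addr) {
  return static_cast<unsigned>((phys_addr >> 6) & 0x7ff);
}

// Bits 17-31: the 15 bits that feed the slice hash.
inline unsigned slice_hash_input(std::uint64_t phys_addr) {
  return static_cast<unsigned>((phys_addr >> 17) & 0x7fff);
}
```

&lt;p>Two addresses can only share a cache set if &lt;code>set_index&lt;/code> agrees for both and the slice hash of their upper bits selects the same slice.&lt;/p>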
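&lt;p>The hash function that selects the cache slice is as follows:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>On a 4-core CPU&lt;/strong> there are 4 cache slices, so the slice number is 2 bits. The two bits of the slice number are h1 and h2, where:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>h1&lt;/strong> is the XOR of physical address bits 18, 19, 21, 23, 25, 27, 29, 30, 31.&lt;/li>
&lt;li>&lt;strong>h2&lt;/strong> is the XOR of physical address bits 17, 19, 20, 21, 22, 23, 24, 26, 28, 29, 31.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>On a 2-core CPU&lt;/strong> there are 2 cache slices, so the slice number is 1 bit: the XOR of physical address bits 17, 18, 20, 22, 24, 25, 26, 27, 28, 30. This is equivalent to the XOR of &lt;strong>h1&lt;/strong> and &lt;strong>h2&lt;/strong>, because bits 19, 21, 23, 29 and 31 appear in both and cancel out. (This part is the blog author's own finding.)&lt;/p>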
&lt;/li>
&lt;/ul>
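&lt;p>The hash functions listed above can be written as a parity over selected address bits. The following is a sketch under assumptions: the helper names are invented for illustration, and the order in which h1 and h2 are combined into a 2-bit slice number is a guess, since the text only says that they are the two bits:&lt;/p>

```cpp
#include <cstdint>
#include <initializer_list>

// XOR together the given bit positions of a physical address (their parity).
inline int xor_bits(std::uint64_t phys_addr, std::initializer_list<int> bits) {
  int parity = 0;
  for (int bit : bits) {
    parity ^= static_cast<int>((phys_addr >> bit) & 1);
  }
  return parity;
}

// h1 and h2 as listed for 4-core CPUs.
inline int hash_h1(std::uint64_t a) {
  return xor_bits(a, {18, 19, 21, 23, 25, 27, 29, 30, 31});
}
inline int hash_h2(std::uint64_t a) {
  return xor_bits(a, {17, 19, 20, 21, 22, 23, 24, 26, 28, 29, 31});
}

// Slice bit on a 2-core CPU.
inline int slice_2core(std::uint64_t a) {
  return xor_bits(a, {17, 18, 20, 22, 24, 25, 26, 27, 28, 30});
}

// Slice number on a 4-core CPU. ASSUMPTION: h2 is taken as the high bit and
// h1 as the low bit; the text does not specify the order.
inline int slice_4core(std::uint64_t a) {
  return (hash_h2(a) << 1) | hash_h1(a);
}
```

&lt;p>Because XOR is linear, checking that &lt;code>slice_2core(a)&lt;/code> equals &lt;code>hash_h1(a) ^ hash_h2(a)&lt;/code> for every single-bit address is enough to confirm the equivalence for all addresses.&lt;/p>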
&lt;h3 id="verifying-the-cache-mapping">Verifying the cache mapping&lt;/h3>
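&lt;p>The steps are as follows:&lt;/p>
&lt;ul>
&lt;li>Pick N physical memory addresses that, according to our guessed cache mapping, should map to the same cache set.&lt;/li>
&lt;li>Use the Linux /proc/PID/pagemap interface to determine which physical addresses we have access to.&lt;/li>
&lt;li>Measure the time it takes to access these N addresses. Specifically, the program first accesses the first N-1 addresses and then times the access to the Nth address.&lt;/li>
&lt;li>The program repeats this for a range of values of N.&lt;/li>
&lt;/ul>
&lt;p>If we have guessed the cache mapping correctly then, on a CPU with a 12-way cache, we should see the memory access time rise sharply at N=13. At N=13 the addresses we access no longer fit in a 12-way cache set, so we get L3 cache misses, and the access time goes from L3 cache latency up to DRAM latency.&lt;/p>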
&lt;blockquote>
&lt;p>Note: this also assumes that the cache uses an LRU or pseudo-LRU eviction policy (which is what Sandy Bridge uses). The cache eviction policy changed in Ivy Bridge, however.&lt;/p>
&lt;/blockquote>
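&lt;p>If we guess the cache mapping wrongly, the memory access time instead rises gradually, at higher values of N. On a CPU with 2 cache slices, if we get the address-to-slice hash function wrong, the access time reaches DRAM latency at N = 13 * 2 on average: the N physical addresses are spread across the 2 slices, so it takes 13 * 2 addresses on average before the cache sets in the 2 slices overflow and produce cache misses.&lt;/p>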
&lt;h3 id="ivy-bridge">Ivy Bridge&lt;/h3>
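&lt;p>This L3 cache mapping appears to apply to Ivy Bridge CPUs as well. The author ran the same test on a machine with an Ivy Bridge CPU (2-core, 4-hyperthread) and initially got a graph of the same shape. However, the results did not reproduce consistently on that machine: later runs showed higher memory access times for N&amp;lt;=12.&lt;/p>
&lt;p>This is consistent with reports that the Ivy Bridge L3 cache uses DIP (Dynamic Insertion Policy)&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup> as its cache eviction policy in order to avoid cache thrashing. DIP switches dynamically between LRU and BIP: LRU works better for smaller working sets (which fit in the cache), while BIP works better for larger working sets (which do not fit in the cache). For N&amp;gt;12, the author's test probably generates enough cache misses to make the cache switch to BIP mode. This means that the order in which the values of N are tested can affect the results.&lt;/p>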
&lt;h3 id="thanks">Thanks&lt;/h3>
&lt;p>Thanks to Yossef Oren for pointing me to the paper by Hund et al, which is referenced by the paper he coauthored, &amp;ldquo;The Spy in the Sandbox &amp;ndash; Practical Cache Attacks in Javascript&amp;rdquo; (Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, Angelos D. Keromytis).&lt;/p>
&lt;blockquote>
&lt;p>The original author's acknowledgements are reproduced above.&lt;/p>
&lt;/blockquote>
&lt;h2 id="源码httpsgithubcomgooglerowhammer-testblob9a426c30ac2cc1bad0a6714e3e75e763bfdee4eacache_analysiscache_test_physaddrcc阅读">Reading the &lt;a href="https://github.com/google/rowhammer-test/blob/9a426c30ac2cc1bad0a6714e3e75e763bfdee4ea/cache_analysis/cache_test_physaddr.cc">source code&lt;/a>&lt;/h2>
&lt;h3 id="frame_number_from_pagemap-init_pagemap-get_physical_addr">frame_number_from_pagemap, init_pagemap, get_physical_addr&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Extract the physical page number from a Linux /proc/PID/pagemap entry.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kt">uint64_t&lt;/span> &lt;span class="nf">frame_number_from_pagemap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">value&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="p">((&lt;/span>&lt;span class="mi">1ULL&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">54&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="c1">// keep the low 54 bits
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kt">void&lt;/span> &lt;span class="nf">init_pagemap&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">g_pagemap_fd&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">open&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;/proc/self/pagemap&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">O_RDONLY&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">g_pagemap_fd&lt;/span> &lt;span class="o">&amp;gt;=&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kt">uint64_t&lt;/span> &lt;span class="nf">get_physical_addr&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">virtual_addr&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">value&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">off_t&lt;/span> &lt;span class="n">offset&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">virtual_addr&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="n">page_size&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="k">sizeof&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="c1">// byte offset of this page's entry in the pagemap file
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">got&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pread&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">g_pagemap_fd&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="k">sizeof&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">offset&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="c1">// read the 8-byte entry
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">got&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">8&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1">// Check the &amp;#34;page present&amp;#34; flag.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="mi">1ULL&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">63&lt;/span>&lt;span class="p">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">frame_num&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">frame_number_from_pagemap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">value&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">frame_num&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">page_size&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">virtual_addr&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">page_size&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">));&lt;/span> &lt;span class="c1">// physical frame base | offset within the page
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="get_cache_slice-in_same_cache_set">get_cache_slice, in_same_cache_set&lt;/h3>
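&lt;p>XOR the relevant physical address bits to compute which cache slice a physical address maps to. If bad_bit is not -1, that bit is XORed in as well, which changes the hash.&lt;/p>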
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="kt">int&lt;/span> &lt;span class="nf">get_cache_slice&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">phys_addr&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">static&lt;/span> &lt;span class="k">const&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">bits&lt;/span>&lt;span class="p">[]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="mi">17&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">18&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">22&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">24&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">25&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">26&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">27&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">28&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">30&lt;/span> &lt;span class="p">};&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">count&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">sizeof&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bits&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="k">sizeof&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">bits&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">hash&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">count&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="o">++&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">hash&lt;/span> &lt;span class="o">^=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">phys_addr&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">bits&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">])&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">bad_bit&lt;/span> &lt;span class="o">!=&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">hash&lt;/span> &lt;span class="o">^=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">phys_addr&lt;/span> &lt;span class="o">&amp;gt;&amp;gt;&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">hash&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Check whether two physical addresses belong to the same cache set: their low 17 bits must be equal and they must map to the same cache slice.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="kt">bool&lt;/span> &lt;span class="nf">in_same_cache_set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">phys1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">phys2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">mask&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">((&lt;/span>&lt;span class="kt">uint64_t&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="mi">1&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">17&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">((&lt;/span>&lt;span class="n">phys1&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">mask&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">phys2&lt;/span> &lt;span class="o">&amp;amp;&lt;/span> &lt;span class="n">mask&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">&amp;amp;&amp;amp;&lt;/span> &lt;span class="n">get_cache_slice&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">phys1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">get_cache_slice&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">phys2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">));&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="time_access-timing">time_access, timing&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Execute a CPU memory barrier. This is an attempt to prevent memory
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// accesses from being reordered, in case reordering affects what gets
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// evicted from the cache. It&amp;#39;s also an attempt to ensure we&amp;#39;re
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// measuring the time for a single memory access.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">//
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// However, this appears to be unnecessary on Sandy Bridge CPUs, since
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// we get the same shape graph without this. (Why is that?)
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kr">inline&lt;/span> &lt;span class="kt">void&lt;/span> &lt;span class="nf">mfence&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">asm&lt;/span> &lt;span class="k">volatile&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;mfence&amp;#34;&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">// Measure the time taken to access the given address, in nanoseconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="nf">time_access&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">ptr&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">struct&lt;/span> &lt;span class="nc">timespec&lt;/span> &lt;span class="n">ts0&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">rc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">clock_gettime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">CLOCK_MONOTONIC&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="n">ts0&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">rc&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">g_dummy&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">volatile&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="n">ptr&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">mfence&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">struct&lt;/span> &lt;span class="nc">timespec&lt;/span> &lt;span class="n">ts&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">rc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">clock_gettime&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">CLOCK_MONOTONIC&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">rc&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">tv_sec&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">ts0&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">tv_sec&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">1000000000&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">ts&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">tv_nsec&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">ts0&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">tv_nsec&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="c1">// combine the seconds and nanoseconds differences
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;blockquote>
&lt;p>About the monotonic clock CLOCK_MONOTONIC: a nonsettable system-wide clock that represents monotonic time since, as described by POSIX, &amp;ldquo;some unspecified point in the past&amp;rdquo;. On Linux, that point corresponds to the number of seconds that the system has been running since it was booted.&lt;/p>
&lt;/blockquote>
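&lt;p>For comparison, the same kind of measurement can be sketched with the C++ standard library's monotonic clock, &lt;code>std::chrono::steady_clock&lt;/code>. This is an alternative written for illustration, not what the original source does: the helper name &lt;code>time_read_ns&lt;/code> is invented, the memory fence is left out, and whether &lt;code>steady_clock&lt;/code> is backed by CLOCK_MONOTONIC depends on the standard library implementation:&lt;/p>

```cpp
#include <chrono>
#include <cstdint>

// steady_clock is required by the standard to never go backwards.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock must be monotonic");

// Hypothetical helper (not from the original source): time one read of *ptr,
// in nanoseconds, using std::chrono::steady_clock instead of clock_gettime().
inline std::int64_t time_read_ns(const volatile int* ptr) {
  auto start = std::chrono::steady_clock::now();
  int value = *ptr;  // the volatile read is the access being timed
  auto end = std::chrono::steady_clock::now();
  (void)value;       // the value itself is not needed
  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

&lt;p>As with the original &lt;code>time_access&lt;/code>, the result includes the overhead of the two clock reads, so only differences between measurements are meaningful.&lt;/p>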
&lt;hr>
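&lt;p>Measure access times over a set of memory addresses: by accessing the given address set repeatedly, the test shows whether an access hits the cache and how the access time changes.&lt;br>
After taking the first physical address, the code collects further addresses that fall in the same cache set as it. It then takes 10 measurements and uses the median time.&lt;/p>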
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-cpp" data-lang="cpp">&lt;span class="line">&lt;span class="cl">&lt;span class="kt">int&lt;/span> &lt;span class="nf">timing&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="n">addr_count&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">size_t&lt;/span> &lt;span class="n">size&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">16&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&lt;/span> &lt;span class="mi">20&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="c1">// allocate 16 MB of memory
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">buf&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">uintptr_t&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="n">mmap&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="nb">NULL&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">size&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">PROT_READ&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">PROT_WRITE&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">MAP_PRIVATE&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">MAP_ANONYMOUS&lt;/span> &lt;span class="o">|&lt;/span> &lt;span class="n">MAP_POPULATE&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">);&lt;/span>
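&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">buf&lt;/span>&lt;span class="p">);&lt;/span> &lt;span class="c1">// note: mmap reports failure as MAP_FAILED, not NULL, so this check cannot catch a failed mapping&lt;/span>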
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">addr_count&lt;/span>&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">buf&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">phys1&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">get_physical_addr&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">next_addr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">buf&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">page_size&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">end_addr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">buf&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">size&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">found&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">while&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">found&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">addr_count&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">uintptr_t&lt;/span> &lt;span class="n">addr&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">next_addr&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">next_addr&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="n">page_size&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">uint64_t&lt;/span> &lt;span class="n">phys2&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">get_physical_addr&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">addr&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="n">in_same_cache_set&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">phys1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">phys2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">bad_bit&lt;/span>&lt;span class="p">))&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">found&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">addr&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">found&lt;/span>&lt;span class="o">++&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">runs&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">10&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">times&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">runs&lt;/span>&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="n">run&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">run&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">runs&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">run&lt;/span>&lt;span class="o">++&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">g_dummy&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">volatile&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">mfence&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="p">(&lt;/span>&lt;span class="kt">int&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="o">&amp;lt;&lt;/span> &lt;span class="n">addr_count&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="o">++&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="p">{&lt;/span> &lt;span class="c1">// access each of the other addresses once
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="n">g_dummy&lt;/span> &lt;span class="o">+=&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="k">volatile&lt;/span> &lt;span class="kt">int&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">i&lt;/span>&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">mfence&lt;/span>&lt;span class="p">();&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">times&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">run&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">time_access&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">addrs&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]);&lt;/span> &lt;span class="c1">// re-access addrs[0] and time it
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">&lt;/span> &lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">std&lt;/span>&lt;span class="o">::&lt;/span>&lt;span class="n">sort&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">times&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="o">&amp;amp;&lt;/span>&lt;span class="n">times&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">runs&lt;/span>&lt;span class="p">]);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">median_time&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">times&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">runs&lt;/span> &lt;span class="o">/&lt;/span> &lt;span class="mi">2&lt;/span>&lt;span class="p">];&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="kt">int&lt;/span> &lt;span class="n">rc&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">munmap&lt;/span>&lt;span class="p">((&lt;/span>&lt;span class="kt">void&lt;/span> &lt;span class="o">*&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="n">buf&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">size&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">assert&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">rc&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">);&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">median_time&lt;/span>&lt;span class="p">;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="todo">TODO&lt;/h2>
&lt;ul>
&lt;li>Add test results from my own machine&lt;/li>
&lt;li>Next, a related &lt;a href="https://blog.stuffedcow.net/2013/01/ivb-cache-replacement/">blog post&lt;/a>&lt;/li>
&lt;/ul>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>&lt;a href="https://ieeexplore.ieee.org/document/6547110?reload=true&amp;amp;arnumber=6547110">https://ieeexplore.ieee.org/document/6547110?reload=true&amp;amp;arnumber=6547110&lt;/a>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2">
&lt;p>&lt;a href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">Address space layout randomization&lt;/a>&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3">
&lt;p>Intel's &lt;a href="https://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect">QuickPath Interconnect&lt;/a> (QPI) was designed to replace the earlier front-side bus technology with a fast-path interconnect. QPI is a protocol for connecting multiple CPUs in high-end multi-socket systems. In 2017, on the Skylake-SP Xeon platform, QPI was replaced by Intel Ultra Path Interconnect (UPI).&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4">
&lt;p>Unfortunately, I could not find the corresponding documentation for the Intel Core i5-7200U. I only found a document showing that certain i7 models use QPI: &lt;a href="https://www.intel.com/content/dam/develop/external/us/en/documents/performance-analysis-guide-181827.pdf">Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors&lt;/a>. The figure on page 5 shows how the different LLCs are connected through QPI. (I also found a related &lt;a href="https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Core-to-Core-Communication-Latency-in-Skylake-Kaby-Lake/m-p/1061658">post&lt;/a>, which I have yet to read.)&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5">
&lt;p>I found a &lt;a href="https://www.cs.cmu.edu/afs/cs/academic/class/15740-f18/www/papers/isca07-qureshi-dip.pdf">paper&lt;/a> on DIP (Dynamic Insertion Policy), along with a set of &lt;a href="https://www.eecg.utoronto.ca/~moshovos/000/lib/exe/fetch.php?media=wiki:aca2017:cache_insertion_policies.pptx">slides&lt;/a> and a &lt;a href="https://medium.com/@arpitguptarag/adaptive-insertion-policies-for-high-performance-caching-a741c52f515c">blog post&lt;/a>.&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item></channel></rss>