You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

250 lines
14 KiB

1 week ago
<!DOCTYPE HTML>
<html lang="zh-CN" class="light sidebar-visible" dir="ltr">
<head>
<!-- Book generated using mdBook -->
<meta charset="UTF-8">
<title>警惕 UTF-8 引发的性能隐患 - Rust语言圣经(Rust Course)</title>
<!-- Custom HTML head -->
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#ffffff">
<link rel="icon" href="../../favicon.svg">
<link rel="shortcut icon" href="../../favicon.png">
<link rel="stylesheet" href="../../css/variables.css">
<link rel="stylesheet" href="../../css/general.css">
<link rel="stylesheet" href="../../css/chrome.css">
<link rel="stylesheet" href="../../css/print.css" media="print">
<!-- Fonts -->
<link rel="stylesheet" href="../../FontAwesome/css/font-awesome.css">
<link rel="stylesheet" href="../../fonts/fonts.css">
<!-- Highlight.js Stylesheets -->
<link rel="stylesheet" id="highlight-css" href="../../highlight.css">
<link rel="stylesheet" id="tomorrow-night-css" href="../../tomorrow-night.css">
<link rel="stylesheet" id="ayu-highlight-css" href="../../ayu-highlight.css">
<!-- Custom theme stylesheets -->
<link rel="stylesheet" href="../../theme/style.css">
<!-- Provide site root and default themes to javascript -->
<script>
const path_to_root = "../../";
const default_light_theme = "light";
const default_dark_theme = "navy";
</script>
<!-- Start loading toc.js asap -->
<script src="../../toc.js"></script>
</head>
<body>
<div id="body-container">
<!-- Work around some values being stored in localStorage wrapped in quotes -->
<script>
try {
let theme = localStorage.getItem('mdbook-theme');
let sidebar = localStorage.getItem('mdbook-sidebar');
if (theme.startsWith('"') && theme.endsWith('"')) {
localStorage.setItem('mdbook-theme', theme.slice(1, theme.length - 1));
}
if (sidebar.startsWith('"') && sidebar.endsWith('"')) {
localStorage.setItem('mdbook-sidebar', sidebar.slice(1, sidebar.length - 1));
}
} catch (e) { }
</script>
<!-- Set the theme before any content is loaded, prevents flash -->
<script>
const default_theme = window.matchMedia("(prefers-color-scheme: dark)").matches ? default_dark_theme : default_light_theme;
let theme;
try { theme = localStorage.getItem('mdbook-theme'); } catch(e) { }
if (theme === null || theme === undefined) { theme = default_theme; }
const html = document.documentElement;
html.classList.remove('light')
html.classList.add(theme);
html.classList.add("js");
</script>
<input type="checkbox" id="sidebar-toggle-anchor" class="hidden">
<!-- Hide / unhide sidebar before it is displayed -->
<script>
let sidebar = null;
const sidebar_toggle = document.getElementById("sidebar-toggle-anchor");
if (document.body.clientWidth >= 1080) {
try { sidebar = localStorage.getItem('mdbook-sidebar'); } catch(e) { }
sidebar = sidebar || 'visible';
} else {
sidebar = 'hidden';
}
sidebar_toggle.checked = sidebar === 'visible';
html.classList.remove('sidebar-visible');
html.classList.add("sidebar-" + sidebar);
</script>
<nav id="sidebar" class="sidebar" aria-label="Table of contents">
<!-- populated by js -->
<mdbook-sidebar-scrollbox class="sidebar-scrollbox"></mdbook-sidebar-scrollbox>
<noscript>
<iframe class="sidebar-iframe-outer" src="../../toc.html"></iframe>
</noscript>
<div id="sidebar-resize-handle" class="sidebar-resize-handle">
<div class="sidebar-resize-indicator"></div>
</div>
</nav>
<div id="page-wrapper" class="page-wrapper">
<div class="page">
<div id="menu-bar-hover-placeholder"></div>
<div id="menu-bar" class="menu-bar sticky">
<div class="left-buttons">
<label id="sidebar-toggle" class="icon-button" for="sidebar-toggle-anchor" title="Toggle Table of Contents" aria-label="Toggle Table of Contents" aria-controls="sidebar">
<i class="fa fa-bars"></i>
</label>
<button id="theme-toggle" class="icon-button" type="button" title="Change theme" aria-label="Change theme" aria-haspopup="true" aria-expanded="false" aria-controls="theme-list">
<i class="fa fa-paint-brush"></i>
</button>
<ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu">
<li role="none"><button role="menuitem" class="theme" id="default_theme">Auto</button></li>
<li role="none"><button role="menuitem" class="theme" id="light">Light</button></li>
<li role="none"><button role="menuitem" class="theme" id="rust">Rust</button></li>
<li role="none"><button role="menuitem" class="theme" id="coal">Coal</button></li>
<li role="none"><button role="menuitem" class="theme" id="navy">Navy</button></li>
<li role="none"><button role="menuitem" class="theme" id="ayu">Ayu</button></li>
</ul>
<button id="search-toggle" class="icon-button" type="button" title="Search. (Shortkey: s)" aria-label="Toggle Searchbar" aria-expanded="false" aria-keyshortcuts="S" aria-controls="searchbar">
<i class="fa fa-search"></i>
</button>
</div>
<h1 class="menu-title">Rust语言圣经(Rust Course)</h1>
<div class="right-buttons">
<a href="../../print.html" title="Print this book" aria-label="Print this book">
<i id="print-button" class="fa fa-print"></i>
</a>
<a href="https://github.com/sunface/rust-course" title="Git repository" aria-label="Git repository">
<i id="git-repository-button" class="fa fa-github"></i>
</a>
<a href="https://github.com/sunface/rust-course/edit/main/src/compiler/pitfalls/utf8-performance.md" title="Suggest an edit" aria-label="Suggest an edit">
<i id="git-edit-button" class="fa fa-edit"></i>
</a>
</div>
</div>
<div id="search-wrapper" class="hidden">
<form id="searchbar-outer" class="searchbar-outer">
<input type="search" id="searchbar" name="searchbar" placeholder="Search this book ..." aria-controls="searchresults-outer" aria-describedby="searchresults-header">
</form>
<div id="searchresults-outer" class="searchresults-outer hidden">
<div id="searchresults-header" class="searchresults-header"></div>
<ul id="searchresults">
</ul>
</div>
</div>
<!-- Apply ARIA attributes after the sidebar and the sidebar toggle button are added to the DOM -->
<script>
document.getElementById('sidebar-toggle').setAttribute('aria-expanded', sidebar === 'visible');
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
});
</script>
<div id="content" class="content">
<main>
<h1 id="警惕-utf-8-引发的性能隐患"><a class="header" href="#警惕-utf-8-引发的性能隐患">警惕 UTF-8 引发的性能隐患</a></h1>
<p>大家应该都知道, 虽然 Rust 的字符串 <code>&amp;str</code><code>String</code> 在底层是通过 <code>Vec&lt;u8&gt;</code> 实现的:字符串数据以字节数组的形式存在堆上,但在使用时,它们都是 UTF-8 编码的,例如:</p>
<pre><pre class="playground"><code class="language-rust edition2021">fn main() {
let s: &amp;str = "中国人";
for c in s.chars() {
println!("{}", c) // 依次输出:中 、 国 、 人
}
let c = &amp;s[0..3]; // 1. "中" 在 UTF-8 中占用 3 个字节 2. Rust 不支持字符串索引,因此只能通过切片的方式获取 "中"
assert_eq!(c, "中");
}</code></pre></pre>
<p>从上述代码可以很清晰看出Rust 的字符串确实是 UTF-8 编码的,这就带来一个隐患:可能在某个转角,你就会遇到来自糟糕性能的示爱。</p>
<h2 id="问题描述--解决"><a class="header" href="#问题描述--解决">问题描述 &amp; 解决</a></h2>
<p>例如我们尝试写一个词法解析器,里面用到了以下代码 <code>self.source.chars().nth(self.index).unwrap();</code> 去获取下一个需要处理的字符,大家可能会以为 <code>.nth</code> 的访问应该非常快吧?事实上它确实很快,但是并不妨碍这段代码在循环处理 70000 长度的字符串时,需要消耗 5s 才能完成!</p>
<p>这么看来,唯一的问题就在于 <code>.chars()</code> 上了。</p>
<p>其实原因很简单,简单到我们不需要用代码来说明,只需要文字描述即可传达足够的力量:每一次循环时,<code>.chars().nth(index)</code> 都需要对字符串进行一次 UTF-8 解析,这个解析实际上是相当昂贵的,特别是当配合循环时,算法的复杂度就是平方级的。</p>
<p>既然找到原因,那解决方法也很简单:只要将 <code>self.source.chars()</code> 的迭代器存储起来就行,这样每次 <code>.nth</code> 调用都会复用已经解析好的迭代器,而不是重新去解析一次 UTF-8 字符串。</p>
<p>当然,我们还可以使用三方库来解决这个问题,例如 <a href="https://crates.io/crates/str_indices">str_indices</a></p>
<h2 id="总结"><a class="header" href="#总结">总结</a></h2>
<p>最终的优化结果如下:</p>
<ul>
<li>保存迭代器后: 耗时 <code>5s</code> -&gt; <code>4ms</code></li>
<li>进一步使用 <code>u8</code> 字节数组来替换 <code>char</code>,最后使用 <code>String::from_utf8</code> 来构建 UTF-8 字符串: 耗时 <code>4ms</code> -&gt; <code>400us</code></li>
</ul>
<p><strong>肉眼可见的巨大提升12500 倍!</strong></p>
<p>总之,我们在热点路径中使用字符串做 UTF-8 的相关操作时,就算不提前优化,也要做到心里有数,这样才能在问题发生时,进退自如。</p>
</main>
<nav class="nav-wrapper" aria-label="Page navigation">
<!-- Mobile navigation buttons -->
<a rel="prev" href="../../compiler/pitfalls/main-with-channel-blocked.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>
<a rel="next prefetch" href="../../profiling/intro.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
<div style="clear: both"></div>
</nav>
</div>
</div>
<nav class="nav-wide-wrapper" aria-label="Page navigation">
<a rel="prev" href="../../compiler/pitfalls/main-with-channel-blocked.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>
<a rel="next prefetch" href="../../profiling/intro.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
</nav>
</div>
<script>
window.playground_copyable = true;
</script>
<script src="../../ace.js"></script>
<script src="../../editor.js"></script>
<script src="../../mode-rust.js"></script>
<script src="../../theme-dawn.js"></script>
<script src="../../theme-tomorrow_night.js"></script>
<script src="../../elasticlunr.min.js"></script>
<script src="../../mark.min.js"></script>
<script src="../../searcher.js"></script>
<script src="../../clipboard.min.js"></script>
<script src="../../highlight.js"></script>
<script src="../../book.js"></script>
<!-- Custom JS scripts -->
<script src="../../assets/custom2.js"></script>
<script src="../../assets/bigPicture.js"></script>
</div>
</body>
</html>