You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

250 lines
14 KiB

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

<!DOCTYPE HTML>
<html lang="zh-CN" class="light sidebar-visible" dir="ltr">
<head>
<!-- Book generated using mdBook -->
<meta charset="UTF-8">
<title>警惕 UTF-8 引发的性能隐患 - Rust语言圣经(Rust Course)</title>
<!-- Custom HTML head -->
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#ffffff">
<link rel="icon" href="../../favicon.svg">
<link rel="shortcut icon" href="../../favicon.png">
<link rel="stylesheet" href="../../css/variables.css">
<link rel="stylesheet" href="../../css/general.css">
<link rel="stylesheet" href="../../css/chrome.css">
<link rel="stylesheet" href="../../css/print.css" media="print">
<!-- Fonts -->
<link rel="stylesheet" href="../../FontAwesome/css/font-awesome.css">
<link rel="stylesheet" href="../../fonts/fonts.css">
<!-- Highlight.js Stylesheets -->
<link rel="stylesheet" id="highlight-css" href="../../highlight.css">
<link rel="stylesheet" id="tomorrow-night-css" href="../../tomorrow-night.css">
<link rel="stylesheet" id="ayu-highlight-css" href="../../ayu-highlight.css">
<!-- Custom theme stylesheets -->
<link rel="stylesheet" href="../../theme/style.css">
<!-- Provide site root and default themes to javascript -->
<script>
const path_to_root = "../../";
const default_light_theme = "light";
const default_dark_theme = "navy";
</script>
<!-- Start loading toc.js asap -->
<script src="../../toc.js"></script>
</head>
<body>
<div id="body-container">
<!-- Work around some values being stored in localStorage wrapped in quotes -->
<script>
try {
let theme = localStorage.getItem('mdbook-theme');
let sidebar = localStorage.getItem('mdbook-sidebar');
if (theme.startsWith('"') && theme.endsWith('"')) {
localStorage.setItem('mdbook-theme', theme.slice(1, theme.length - 1));
}
if (sidebar.startsWith('"') && sidebar.endsWith('"')) {
localStorage.setItem('mdbook-sidebar', sidebar.slice(1, sidebar.length - 1));
}
} catch (e) { }
</script>
<!-- Set the theme before any content is loaded, prevents flash -->
<script>
const default_theme = window.matchMedia("(prefers-color-scheme: dark)").matches ? default_dark_theme : default_light_theme;
let theme;
try { theme = localStorage.getItem('mdbook-theme'); } catch(e) { }
if (theme === null || theme === undefined) { theme = default_theme; }
const html = document.documentElement;
html.classList.remove('light')
html.classList.add(theme);
html.classList.add("js");
</script>
<input type="checkbox" id="sidebar-toggle-anchor" class="hidden">
<!-- Hide / unhide sidebar before it is displayed -->
<script>
let sidebar = null;
const sidebar_toggle = document.getElementById("sidebar-toggle-anchor");
if (document.body.clientWidth >= 1080) {
try { sidebar = localStorage.getItem('mdbook-sidebar'); } catch(e) { }
sidebar = sidebar || 'visible';
} else {
sidebar = 'hidden';
}
sidebar_toggle.checked = sidebar === 'visible';
html.classList.remove('sidebar-visible');
html.classList.add("sidebar-" + sidebar);
</script>
<nav id="sidebar" class="sidebar" aria-label="Table of contents">
<!-- populated by js -->
<mdbook-sidebar-scrollbox class="sidebar-scrollbox"></mdbook-sidebar-scrollbox>
<noscript>
<iframe class="sidebar-iframe-outer" src="../../toc.html"></iframe>
</noscript>
<div id="sidebar-resize-handle" class="sidebar-resize-handle">
<div class="sidebar-resize-indicator"></div>
</div>
</nav>
<div id="page-wrapper" class="page-wrapper">
<div class="page">
<div id="menu-bar-hover-placeholder"></div>
<div id="menu-bar" class="menu-bar sticky">
<div class="left-buttons">
<label id="sidebar-toggle" class="icon-button" for="sidebar-toggle-anchor" title="Toggle Table of Contents" aria-label="Toggle Table of Contents" aria-controls="sidebar">
<i class="fa fa-bars"></i>
</label>
<button id="theme-toggle" class="icon-button" type="button" title="Change theme" aria-label="Change theme" aria-haspopup="true" aria-expanded="false" aria-controls="theme-list">
<i class="fa fa-paint-brush"></i>
</button>
<ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu">
<li role="none"><button role="menuitem" class="theme" id="default_theme">Auto</button></li>
<li role="none"><button role="menuitem" class="theme" id="light">Light</button></li>
<li role="none"><button role="menuitem" class="theme" id="rust">Rust</button></li>
<li role="none"><button role="menuitem" class="theme" id="coal">Coal</button></li>
<li role="none"><button role="menuitem" class="theme" id="navy">Navy</button></li>
<li role="none"><button role="menuitem" class="theme" id="ayu">Ayu</button></li>
</ul>
<button id="search-toggle" class="icon-button" type="button" title="Search. (Shortkey: s)" aria-label="Toggle Searchbar" aria-expanded="false" aria-keyshortcuts="S" aria-controls="searchbar">
<i class="fa fa-search"></i>
</button>
</div>
<h1 class="menu-title">Rust语言圣经(Rust Course)</h1>
<div class="right-buttons">
<a href="../../print.html" title="Print this book" aria-label="Print this book">
<i id="print-button" class="fa fa-print"></i>
</a>
<a href="https://github.com/sunface/rust-course" title="Git repository" aria-label="Git repository">
<i id="git-repository-button" class="fa fa-github"></i>
</a>
<a href="https://github.com/sunface/rust-course/edit/main/src/compiler/pitfalls/utf8-performance.md" title="Suggest an edit" aria-label="Suggest an edit">
<i id="git-edit-button" class="fa fa-edit"></i>
</a>
</div>
</div>
<div id="search-wrapper" class="hidden">
<form id="searchbar-outer" class="searchbar-outer">
<input type="search" id="searchbar" name="searchbar" placeholder="Search this book ..." aria-controls="searchresults-outer" aria-describedby="searchresults-header">
</form>
<div id="searchresults-outer" class="searchresults-outer hidden">
<div id="searchresults-header" class="searchresults-header"></div>
<ul id="searchresults">
</ul>
</div>
</div>
<!-- Apply ARIA attributes after the sidebar and the sidebar toggle button are added to the DOM -->
<script>
document.getElementById('sidebar-toggle').setAttribute('aria-expanded', sidebar === 'visible');
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
});
</script>
<div id="content" class="content">
<main>
<h1 id="警惕-utf-8-引发的性能隐患"><a class="header" href="#警惕-utf-8-引发的性能隐患">警惕 UTF-8 引发的性能隐患</a></h1>
<p>大家应该都知道, 虽然 Rust 的字符串 <code>&amp;str</code><code>String</code> 在底层是通过 <code>Vec&lt;u8&gt;</code> 实现的:字符串数据以字节数组的形式存在堆上,但在使用时,它们都是 UTF-8 编码的,例如:</p>
<pre><pre class="playground"><code class="language-rust edition2021">fn main() {
let s: &amp;str = "中国人";
for c in s.chars() {
println!("{}", c) // 依次输出:中 、 国 、 人
}
let c = &amp;s[0..3]; // 1. "中" 在 UTF-8 中占用 3 个字节 2. Rust 不支持字符串索引,因此只能通过切片的方式获取 "中"
assert_eq!(c, "中");
}</code></pre></pre>
<p>从上述代码可以很清晰看出Rust 的字符串确实是 UTF-8 编码的,这就带来一个隐患:可能在某个转角,你就会遇到来自糟糕性能的示爱。</p>
<h2 id="问题描述--解决"><a class="header" href="#问题描述--解决">问题描述 &amp; 解决</a></h2>
<p>例如我们尝试写一个词法解析器,里面用到了以下代码 <code>self.source.chars().nth(self.index).unwrap();</code> 去获取下一个需要处理的字符,大家可能会以为 <code>.nth</code> 的访问应该非常快吧?事实上它确实很快,但是并不妨碍这段代码在循环处理 70000 长度的字符串时,需要消耗 5s 才能完成!</p>
<p>这么看来,唯一的问题就在于 <code>.chars()</code> 上了。</p>
<p>其实原因很简单,简单到我们不需要用代码来说明,只需要文字描述即可传达足够的力量:每一次循环时,<code>.chars().nth(index)</code> 都需要对字符串进行一次 UTF-8 解析,这个解析实际上是相当昂贵的,特别是当配合循环时,算法的复杂度就是平方级的。</p>
<p>既然找到原因,那解决方法也很简单:只要将 <code>self.source.chars()</code> 的迭代器存储起来就行,这样每次 <code>.nth</code> 调用都会复用已经解析好的迭代器,而不是重新去解析一次 UTF-8 字符串。</p>
<p>当然,我们还可以使用三方库来解决这个问题,例如 <a href="https://crates.io/crates/str_indices">str_indices</a></p>
<h2 id="总结"><a class="header" href="#总结">总结</a></h2>
<p>最终的优化结果如下:</p>
<ul>
<li>保存迭代器后: 耗时 <code>5s</code> -&gt; <code>4ms</code></li>
<li>进一步使用 <code>u8</code> 字节数组来替换 <code>char</code>,最后使用 <code>String::from_utf8</code> 来构建 UTF-8 字符串: 耗时 <code>4ms</code> -&gt; <code>400us</code></li>
</ul>
<p><strong>肉眼可见的巨大提升12500 倍!</strong></p>
<p>总之,我们在热点路径中使用字符串做 UTF-8 的相关操作时,就算不提前优化,也要做到心里有数,这样才能在问题发生时,进退自如。</p>
</main>
<nav class="nav-wrapper" aria-label="Page navigation">
<!-- Mobile navigation buttons -->
<a rel="prev" href="../../compiler/pitfalls/main-with-channel-blocked.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>
<a rel="next prefetch" href="../../profiling/intro.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
<div style="clear: both"></div>
</nav>
</div>
</div>
<nav class="nav-wide-wrapper" aria-label="Page navigation">
<a rel="prev" href="../../compiler/pitfalls/main-with-channel-blocked.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>
<a rel="next prefetch" href="../../profiling/intro.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
</nav>
</div>
<script>
window.playground_copyable = true;
</script>
<script src="../../ace.js"></script>
<script src="../../editor.js"></script>
<script src="../../mode-rust.js"></script>
<script src="../../theme-dawn.js"></script>
<script src="../../theme-tomorrow_night.js"></script>
<script src="../../elasticlunr.min.js"></script>
<script src="../../mark.min.js"></script>
<script src="../../searcher.js"></script>
<script src="../../clipboard.min.js"></script>
<script src="../../highlight.js"></script>
<script src="../../book.js"></script>
<!-- Custom JS scripts -->
<script src="../../assets/custom2.js"></script>
<script src="../../assets/bigPicture.js"></script>
</div>
</body>
</html>