
<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    
    <title>4. The complete unum format definition &#8212; THE END of ERROR - Unum Computing 0.1 documentation</title>

    <link rel="stylesheet" href="_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css" type="text/css" />
    <link rel="stylesheet" href="_static/sphinx_materialdesign_theme.css" type="text/css" />
    <link rel="stylesheet" href="_static/fontawesome/all.css" type="text/css" />
    <link rel="stylesheet" href="_static/fonts.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="_static/basic.css" />
    <link rel="stylesheet" type="text/css" href="_static/d2l.css" />
    <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
    <script src="_static/jquery.js"></script>
    <script src="_static/underscore.js"></script>
    <script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="_static/doctools.js"></script>
    <script src="_static/sphinx_highlight.js"></script>
    <script src="_static/d2l.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="5. Hidden scratchpads and the three layers" href="05_hidden_scratchpads_3_layers.html" />
    <link rel="prev" title="3. The original sin of computer arithmetic" href="03_TheOriginalSin.html" /> 
  </head>
<body>
    <div class="mdl-layout mdl-js-layout mdl-layout--fixed-header mdl-layout--fixed-drawer"><header class="mdl-layout__header mdl-layout__header--waterfall ">
    <div class="mdl-layout__header-row">
        
        <nav class="mdl-navigation breadcrumb">
            <a class="mdl-navigation__link is-active">4. The complete unum format definition</a>
        </nav>
        <div class="mdl-layout-spacer"></div>
        <nav class="mdl-navigation">
        
<form class="form-inline pull-sm-right" action="search.html" method="get">
      <div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right">
        <label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon"  for="waterfall-exp">
          <i class="material-icons">search</i>
        </label>
        <div class="mdl-textfield__expandable-holder">
          <input class="mdl-textfield__input" type="text" name="q"  id="waterfall-exp" placeholder="Search" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </div>
      </div>
      <div class="mdl-tooltip" data-mdl-for="quick-search-icon">
      Quick search
      </div>
</form>
        
<a id="button-show-source"
    class="mdl-button mdl-js-button mdl-button--icon"
    href="_sources/04_unum_format.rst.txt" rel="nofollow">
  <i class="material-icons">code</i>
</a>
<div class="mdl-tooltip" data-mdl-for="button-show-source">
Show Source
</div>
        </nav>
    </div>
    <div class="mdl-layout__header-row header-links">
      <div class="mdl-layout-spacer"></div>
      <nav class="mdl-navigation">
          
              <a  class="mdl-navigation__link" href="https://github.com/jszheng/TheEndOfError">
                  <i class="fab fa-github"></i>
                  Github
              </a>
      </nav>
    </div>
</header><header class="mdl-layout__drawer">
    
          <!-- Title -->
      <span class="mdl-layout-title">
          <a class="title" href="index.html">
              <span class="title-text">
                  THE END of ERROR - Unum Computing
              </span>
          </a>
      </span>
    
    
      <div class="globaltoc">
        <span class="mdl-layout-title toc">Table Of Contents</span>
        
        
            <nav class="mdl-navigation">
                <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="Preface.html">Preface</a></li>
<li class="toctree-l1"><a class="reference internal" href="00_how_to_read.html">How to read this book</a></li>
<li class="toctree-l1"><a class="reference internal" href="Part1.html">Part 1: A new number format, the unum</a></li>
<li class="toctree-l1"><a class="reference internal" href="01_Overview.html">1 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="02_BuildUpUnumFormat.html">2. Building up to the unum format</a></li>
<li class="toctree-l1"><a class="reference internal" href="03_TheOriginalSin.html">3. The original sin of computer arithmetic</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">4. The complete unum format definition</a></li>
<li class="toctree-l1"><a class="reference internal" href="05_hidden_scratchpads_3_layers.html">5. Hidden scratchpads and the three layers</a></li>
<li class="toctree-l1"><a class="reference internal" href="06_info_per_bit.html">6 Information per bit</a></li>
<li class="toctree-l1"><a class="reference internal" href="07_fixed_size_unum_storage.html">7 Fixed-size unum storage</a></li>
<li class="toctree-l1"><a class="reference internal" href="08_comparison_operations.html">8 Comparison operations</a></li>
<li class="toctree-l1"><a class="reference internal" href="09_add_sub_unbias_round.html">9 Add/subtract, and the unbiased rounding myth</a></li>
<li class="toctree-l1"><a class="reference internal" href="10_mul_div.html">10 Multiplication and division</a></li>
<li class="toctree-l1"><a class="reference internal" href="11_power.html">11 Powers</a></li>
<li class="toctree-l1"><a class="reference internal" href="12_other_important_unary_ops.html">12 Other important unary operations</a></li>
<li class="toctree-l1"><a class="reference internal" href="13_fused_operations.html">13 Fused operations (single-use expressions)</a></li>
<li class="toctree-l1"><a class="reference internal" href="14_trial_runs.html">14 Trial runs: unums face computing challenges</a></li>
<li class="toctree-l1"><a class="reference internal" href="part1_summary.html">Summary</a></li>
<li class="toctree-l1"><a class="reference internal" href="Part2.html">Part 2 - A new way to solve: the ubox</a></li>
<li class="toctree-l1"><a class="reference internal" href="15_TheOtherKindOfError.html">15. The other kind of error</a></li>
<li class="toctree-l1"><a class="reference internal" href="16_avoid_interval_arith_pitfalls.html">16 Avoiding interval arithmetic pitfalls</a></li>
<li class="toctree-l1"><a class="reference internal" href="17_meaning_of_solve_equ.html">17 What does it mean to "solve" an equation?</a></li>
<li class="toctree-l1"><a class="reference internal" href="18_permission_to_guess.html">18 Permission to guess</a></li>
<li class="toctree-l1"><a class="reference internal" href="19_pendulums_done_correctly.html">19 Pendulums done correctly</a></li>
<li class="toctree-l1"><a class="reference internal" href="20_two_body_problem.html">20 The two-body problem (and beyond)</a></li>
<li class="toctree-l1"><a class="reference internal" href="21_calculus_evil.html">21 Calculus considered evil: discrete physics</a></li>
<li class="toctree-l1"><a class="reference internal" href="22_end_of_error.html">22 The end of error</a></li>
<li class="toctree-l1"><a class="reference internal" href="Glossary.html">Glossary</a></li>
</ul>

            </nav>
        
        </div>
    
</header>
        <main class="mdl-layout__content" tabIndex="0">

	<script type="text/javascript" src="_static/sphinx_materialdesign_theme.js "></script>

    <div class="document">
        <div class="page-content" role="main">
        
  <div class="section" id="unum">
<h1>4. The complete unum format definition<a class="headerlink" href="#unum" title="Permalink to this heading">¶</a></h1>
<div class="section" id="id1">
<h2>4.1 Overcoming the tyranny of fixed storage size<a class="headerlink" href="#id1" title="Permalink to this heading">¶</a></h2>
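<p>Ivan Sutherland talks about "the tyranny of the clock," referring to the way traditional computer design requires operations to fit into clock cycles instead of letting them run as fast as they can. In the same way, we are currently forced to use fixed storage sizes for numbers instead of letting them fit into as small a space as possible. The two ideas go hand in hand: smaller numbers should allow arithmetic to finish sooner, while larger numbers take more time to compute.</p>
<p>When doing arithmetic with pencil and paper, it is natural to add or drop digits as needed. Computer designers, however, find fixed sizes (usually a power of two) more convenient to implement, and they are very uncomfortable with the idea that a numeric value could have a variable number of bits. They accept that a program may declare different types, which means supporting different instructions for different types such as byte, integer, real, and double precision. Yet most of the data that processors handle is variable in size: strings, data files, programs, network messages, graphics commands, and so on.</p>
<p>Strings are some number of bytes (characters) long, and files in mass storage are some number of "blocks" long, but the techniques for managing variable-size data do not depend on the size of the units that make up the data. Many libraries for extended-precision arithmetic build their operations out of arrays of fixed-precision numbers. So why not simply make the bit the basic unit of storage?</p>
<p>Suppose English were written the way we now use fixed-size numbers, with every word required to fit a space allotted for sixteen letters.</p>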
<div class="figure align-default" id="id11">
<img alt="_images/image-20200714151036975.png" src="_images/image-20200714151036975.png" />
<p class="caption"><span class="caption-number">Fig. 35 </span><span class="caption-text">image-20200714151036975</span><a class="headerlink" href="#id11" title="Permalink to this image">¶</a></p>
</div>
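<p>Perhaps someone designing a word processor would argue that typesetting is easy and fast if every word is always sixteen letters long.
Because the average word is much shorter than sixteen letters, a great deal of space is wasted.
And because some English words are longer than sixteen letters, writers have to watch for "overflow" and use a thesaurus to substitute shorter words.</p>
<p>Morse code is a binary form of communication that is flexible down to the level of a single bit.
It was designed at a time when every bit counted; sending a message across the country in 1836 was no small feat.
Morse estimated the ranking of the most common letters and assigned them the shortest dot-dash sequences.
In contrast, the ASCII character set makes every character occupy eight bits, and it uses more bits than Morse code does to send English text.</p>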
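<table border="2"><tr><td bgcolor="lightblue"><p>Storing numbers at a fixed length, like synchronizing to a clock, is a convenience for hardware designers rather than for the people doing the computing.</p>
</td></tr></table><p>Later we will discuss in more detail how to actually build a computer that uses a variable-length format, and how to unpack variable-length numbers into fixed-length storage that is easier and faster to work with.
For now, consider one last point: hardware designers must already create instructions for half, single, and double precision floats, so in effect they are already coping with variable data sizes.</p>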
</div>
<div class="section" id="ieee">
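<h2>4.2 The IEEE Standard floats<a class="headerlink" href="#ieee" title="Permalink to this heading">¶</a></h2>
<p>We have been using such tiny bit strings to represent numbers that it is almost a shock to see the sizes used for modern floating-point math.
The IEEE Standard defines four standard sizes for binary floats: 16, 32, 64, and 128 bits long.
It also defines the number of exponent and fraction bits for each size:</p>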
<div class="figure align-default" id="id12">
<img alt="_images/image-20200714161936713.png" src="_images/image-20200714161936713.png" />
<p class="caption"><span class="caption-number">Fig. 36 </span><span class="caption-text">image-20200714161936713</span><a class="headerlink" href="#id12" title="Permalink to this image">¶</a></p>
</div>
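<p>The first thing you might wonder is: what is the "<strong>right</strong>" number of bits for the exponent and the fraction?
Is there a mathematical basis for the number of exponent bits? In fact, there is <strong>not</strong>. The sizes were chosen by committee.
One influence on the committee was that exponent hardware is much easier to build than fraction hardware, so the exponent allows a dynamic range far greater than real computations usually need.</p>
<p>One can imagine the long and contentious debates in the 1980s over how many bits of a float should go to the exponent. The field sizes were chosen as a matter of judgment, not determined mathematically. Another argument for a large exponent may have been that errors from underflow and overflow are usually more serious than rounding errors, so it is better to use a wide exponent even at the cost of fewer fraction bits.</p>
<p>Originally, the scale factor was kept separate from the significant digits (as in fixed-point numbers), and all the numbers shared one scale factor. Eventually designers realized that each number needs its own scale factor, so they attached an exponent field to the fraction field and called the result a floating-point number. Today we face the same situation at the next level up: blocks of numbers are declared to have one particular precision format, even though each number needs its own exponent size and fraction size. The logical next step parallels the scale factor: what if a number were self-descriptive about its exponent size and fraction size, carrying that information as part of its format?</p>
<p>Before exploring that idea further, here is a general description of the state of the four IEEE Standard sizes at the time of this writing:</p>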
<ul class="simple">
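<li><p><strong>Half precision</strong> (16 bits) is a relatively recent addition to the standard.
Graphics-card and motion-picture companies began promoting it around 2002; the format can store visual quantities such as light intensity with reasonable coverage of the accuracy and dynamic range of the human eye. It can represent every integer from –2048 to +2048 exactly.
The largest magnitudes it can represent are ±65,504, and the smallest are about <span class="math notranslate nohighlight">\(\pm6\times10^{-8}\)</span>.
It has about three significant decimal digits of accuracy.
Only recently have chip designers begun to support the format for storing numbers, and existing CPUs have no half-precision instructions.
They promote values internally to single precision for computation, then demote the final result to half precision.</p></li>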
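<li><p><strong>Single precision</strong> is 32 bits. With an 8-bit exponent, the dynamic range is about <span class="math notranslate nohighlight">\(10^{-45}\)</span> to
<span class="math notranslate nohighlight">\(10^{-38}\)</span> (subnormal range) and <span class="math notranslate nohighlight">\(10^{-38}\)</span> to <span class="math notranslate nohighlight">\(10^{38}\)</span> (normal range).
To put that dynamic range in perspective, the ratio of the size of the known universe to the size of a proton is about <span class="math notranslate nohighlight">\(10^{40}\)</span>, so single precision covers that range with 43 orders of magnitude to spare.
The 23-bit fraction provides about seven decimal digits of accuracy.
Single-precision floats are commonly used for imaging and audio applications (medical scans, seismic measurements, video games), where their accuracy and range are more than adequate.</p></li>
<li><p><strong>Double precision</strong> is 64 bits, and since the late 1970s it has been the de facto standard data type for serious computing involving physical simulation.
It has about fifteen decimal digits of accuracy and a vast dynamic range: about <span class="math notranslate nohighlight">\(10^{-324}\)</span> to <span class="math notranslate nohighlight">\(10^{-308}\)</span> (subnormal) and about <span class="math notranslate nohighlight">\(10^{-308}\)</span> to <span class="math notranslate nohighlight">\(10^{+308}\)</span> (normal). Double and single precision are the data types most commonly supported natively in processor chips, with fast hard-wired arithmetic units.</p></li>
<li><p><strong>Quad precision</strong>:
at the time of this writing, 128-bit quad precision is supported mainly through software libraries, and no mainstream commercial processor chip is designed to do arithmetic on bit strings that large.
The most common demand for quad precision comes from programmers who find that double-precision results differ substantially from single-precision results and want assurance that the double-precision result is adequate.
Quad precision has 34 decimal digits of accuracy and a dynamic range of almost <span class="math notranslate nohighlight">\(10^{-5000}\)</span> to <span class="math notranslate nohighlight">\(10^{+5000}\)</span>.
Because quad-precision arithmetic is usually done in software rather than native hardware, it is about twenty times slower than double precision.
Despite its impressive accuracy and dynamic range, quad precision is as capable of catastrophic mathematical errors as its lower-precision relatives.
And, of course, it wastes even more memory, bandwidth, energy, and power than double precision.</p></li>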
</ul>
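<p>The ranges quoted in the list above can be checked from the field sizes alone. The following is a minimal sketch, assuming the usual IEEE conventions (bias of <span class="math notranslate nohighlight">\(2^{es-1}-1\)</span>, a hidden bit, and the all-ones exponent reserved); the helper names <code>ieee_limits</code> and <code>log10_exact</code> are illustrative and are not part of the book's prototype.</p>

```python
import math
from fractions import Fraction

# (exponent bits, fraction bits) of the four IEEE binary formats listed above
IEEE_FORMATS = {"half": (5, 10), "single": (8, 23), "double": (11, 52), "quad": (15, 112)}


def ieee_limits(es, fs):
    """Return (smallest subnormal, smallest normal, largest finite) as exact Fractions."""
    bias = 2 ** (es - 1) - 1
    smallest_subnormal = Fraction(2) ** (1 - bias - fs)
    smallest_normal = Fraction(2) ** (1 - bias)
    largest_finite = (2 - Fraction(1, 2 ** fs)) * Fraction(2) ** bias
    return smallest_subnormal, smallest_normal, largest_finite


def log10_exact(x):
    """Base-10 logarithm of a positive Fraction, computed without overflowing a float."""
    return math.log10(x.numerator) - math.log10(x.denominator)


for name, (es, fs) in IEEE_FORMATS.items():
    tiny, small, big = ieee_limits(es, fs)
    digits = (fs + 1) * math.log10(2)  # the hidden bit adds one significant bit
    print(f"{name:6s} 10^{log10_exact(tiny):8.1f} .. 10^{log10_exact(small):8.1f} .. "
          f"10^{log10_exact(big):7.1f}  ~{digits:.1f} decimal digits")
```

<p>Exact rational arithmetic is used here because the quad-precision limits do not fit in a Python float; only the logarithms are converted to floating point for printing.</p>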
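<p>With four different binary float sizes, programmers must choose the best one, and computer systems must support instructions for all four types. Because few programmers have the training or the patience to tune the precision of each operation, most fall back on using more precision than they need. Even when the input values have only three or four significant decimal digits and the result needs only about three or four, the common practice is to make every float double precision, in the vague hope that fifteen decimal digits for all intermediate data will cover the intermediate rounding errors. Years ago, the only drawback of demanding double precision everywhere was that programs consumed more memory and ran slightly slower. Now such excess insurance aggravates energy and power problems, because computers run up against the limits of their power supply, whether that is the battery in a phone or the utility substation of a data center. And because performance is now usually limited by system bandwidth, programs run much slower at high precision.</p>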
</div>
<div class="section" id="id2">
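<h2>4.3 The unum format: flexible range and precision<a class="headerlink" href="#id2" title="Permalink to this heading">¶</a></h2>
<p>The final step to the "universal number" (unum) format is to attach two more self-descriptive fields: the exponent size (es) field and the fraction size (fs) field.
They sit to the right of the ubit, and for clarity we color them green and gray.</p>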
<div class="figure align-default" id="id13">
<img alt="_images/image-20200714214032378.png" src="_images/image-20200714214032378.png" />
<p class="caption"><span class="caption-number">Fig. 37 </span><span class="caption-text">image-20200714214032378</span><a class="headerlink" href="#id13" title="Permalink to this image">¶</a></p>
</div>
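<p>Such fields are sometimes called "metadata," that is, data describing data. How big should the new fields be?
That is, how many bits are needed to specify the exponent size and the fraction size?</p>
<p>Suppose we store, in binary, the number of exponent bits of each IEEE float type. The exponent sizes 5, 8, 11, and 15 become 101, 1000, 1011, and 1111 in binary, so four bits suffice to cover all the IEEE exponent sizes.
But we can be more clever by noticing that there is always at least one exponent bit, so we can store the size minus one; the four exponent sizes are then encoded as 100, 111, 1010, and 1110. That is why the exponent size field in the figure above is labeled es-1 rather than es.
Four bits suffice to specify any exponent size from 1 bit (a representation that behaves exactly like fixed point) to 16 bits (more than IEEE quad precision uses).
Similarly, the four IEEE fraction sizes 10, 23, 52, and 112 become 1010, 10111, 110100, and 1110000 in binary, so seven bits cover all the IEEE cases. Again there is always at least one fraction bit, so the number stored in the fs field is one less than the number of fraction bits. With a 7-bit fs field, we can have fraction sizes from 1 bit to <span class="math notranslate nohighlight">\(2^7=128\)</span>
bits. The es and fs values can thus be customized to the needs of the application and the user, rather than being set by committee.</p>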
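<p>The size-minus-one encoding described above is easy to reproduce. This is a small sketch; the function name <code>size_field_width</code> is hypothetical and chosen only for illustration.</p>

```python
def size_field_width(max_size):
    """Bits needed to store (size - 1) for every size from 1 up to max_size."""
    return (max_size - 1).bit_length()


IEEE_EXPONENT_SIZES = (5, 8, 11, 15)
IEEE_FRACTION_SIZES = (10, 23, 52, 112)

# Width of the size fields needed to cover every IEEE format
esizesize = size_field_width(max(IEEE_EXPONENT_SIZES))
fsizesize = size_field_width(max(IEEE_FRACTION_SIZES))

# The values actually stored in the es field (size minus one), in binary
stored_es = [format(es - 1, "b") for es in IEEE_EXPONENT_SIZES]

print(esizesize, fsizesize)
print(stored_es)
```

<p>Storing the size minus one is what lets a 4-bit field reach 16 exponent bits and a 7-bit field reach 128 fraction bits; storing the size itself would waste the all-zero code.</p>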
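<p><strong>Definition</strong>: <strong>esizesize</strong> is the width, in bits, of the <em>exponent size field</em> of a unum; the exponent can then be anywhere from 1 to <span class="math notranslate nohighlight">\(2^{esizesize}\)</span> bits long.</p>
<p><strong>Definition</strong>: <strong>fsizesize</strong> is the width, in bits, of the <em>fraction size field</em> of a unum; the fraction can then be anywhere from 1 to <span class="math notranslate nohighlight">\(2^{fsizesize}\)</span> bits long.</p>
<p>esizesize can be as small as zero, which means there is no <em>exponent size field</em> and the exponent is <span class="math notranslate nohighlight">\(2^0=1\)</span> bit wide. The same holds for fsizesize.
The reason for putting these fields to the right of the ubit is that all the new fields can easily be stripped away if you want only the float part of the number.
The ubit stays where it was: it extends the fraction by one more bit, and it creates a monotone sequence of values just as the positional bits of the fraction do.</p>
<p>It looks odd to see "size" twice in esizesize and fsizesize, but they are the size of a size, so that is the logical name for them.
It is like taking the logarithm of a logarithm, which is why esizesize and fsizesize are usually small, single-digit numbers.
For example, suppose we have</p>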
<blockquote>
<div><p>fraction value <span class="math notranslate nohighlight">\(=110101011\)</span>, so fraction_size = 9 bits,
written in binary as <span class="math notranslate nohighlight">\(1001\)</span>, so fraction_size_size = 4 bits
(the number of bits needed to express a fraction size of 9)</p>
</div></blockquote>
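<p><strong>Definition</strong>: The <strong>utag</strong> is the set of three fields that record the <strong>ubit</strong>,
the <strong>exponent size</strong>, and the <strong>fraction size</strong> of a unum.</p>
<p>The utag is the "tax" we pay for flexibility, compactness, and accuracy information, just as the embedded exponent is the "tax" floating point pays to give each number its own scale factor.
The extra self-descriptive information is what makes unums an improvement over floats, just as floats are an improvement over integers.</p>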
</div>
<div class="section" id="id3">
<h2>4.4 How can adding extra bits "save" storage?<a class="headerlink" href="#id3" title="Permalink to this heading">¶</a></h2>
<p>The size of the tax we pay deserves a name: <strong>utagsize</strong>.</p>
<p><strong>Definition</strong>: utagsize is the length (in bits) of the utag; its value is <span class="math notranslate nohighlight">\(1+esizesize+fsizesize\)</span>.</p>
<p>If we want to create a superset of all the IEEE floats, <span class="math notranslate nohighlight">\(utagsize\)</span> must be <span class="math notranslate nohighlight">\(1+4+7=12\)</span>. Besides the utag bits, there is a sign bit, at least one exponent bit, and at least one fraction bit, so the smallest possible unum is utagsize + 3 bits long.
The largest possible length is called <span class="math notranslate nohighlight">\(maxubits\)</span>.</p>
<p><strong>Definition</strong>: <span class="math notranslate nohighlight">\(maxubits\)</span> is the number of bits in the longest possible unum; its value is <span class="math notranslate nohighlight">\(2+esizesize+fsizesize+2^{esizesize}+2^{fsizesize}\)</span>.</p>
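<p>The two definitions translate directly into code. In this sketch, <code>utagsize</code> and <code>maxubits</code> follow the formulas above, while <code>minubits</code> is a hypothetical name for the "utagsize + 3" lower bound mentioned in the text.</p>

```python
def utagsize(esizesize, fsizesize):
    """Length of the utag: the ubit plus the two size fields."""
    return 1 + esizesize + fsizesize


def minubits(esizesize, fsizesize):
    """Shortest unum: sign bit, one exponent bit, one fraction bit, and the utag."""
    return utagsize(esizesize, fsizesize) + 3


def maxubits(esizesize, fsizesize):
    """Longest unum: sign bit, full-length exponent and fraction, and the utag."""
    return 2 + esizesize + fsizesize + 2 ** esizesize + 2 ** fsizesize


for env in [(0, 0), (3, 4), (4, 7)]:
    print(env, utagsize(*env), minubits(*env), maxubits(*env))
```

<p>For the IEEE superset environment with esizesize = 4 and fsizesize = 7, this gives a 12-bit utag and unums between 15 and 157 bits long.</p>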
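<p>You may wonder: how can unums, with all these added bits, use less storage and power than floats?</p>
<p>The main reason is that they often let us represent the exponent and fraction with far fewer bits than a "one size fits all" precision choice. Experiments in later chapters examine this in detail, but it already seems plausible, because most calculations use far fewer exponent bits than the 8 and 11 allocated to single- and double-precision floats (dynamic ranges of about 83 and 632 orders of magnitude, which typical engineering problems never come close to needing).</p>
<p>In the example where every English word had to fill a sixteen-character space, the paragraph took up far more room than it would if each word were only as long as it needed to be. Likewise, the fraction and exponent sizes of a unum grow and shrink as needed, and their average size is so much smaller than the worst-case size of an IEEE
float that the savings more than pay for the utag. Remember the case of Avogadro's number at the very beginning? Recording the number with a scale factor saved a great deal of space compared with storing it as a huge integer. The next question might be: does the programmer have to manage the fraction and exponent sizes of variables?</p>
<p><strong>No</strong>. The computer can do that automatically. Automatic adjustment of range and precision is inherent in the unum approach, in the same way that float arithmetic adjusts the exponent automatically. The key is the ubit, because it tells us the degree of certainty or uncertainty of a value. The exponent size and fraction size change the meaning of an ULP, so the three fields together finally give the computer what it needs to control precision loss automatically. Adding only one of the three utag fields would not produce a very satisfactory number system.</p>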
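<p>For example, there is something called "significance
arithmetic" that annotates each value with its number of significant figures. That approach is like having only the fraction size field of the utag, and assuming that every answer is inexact. After only a few operations, a significance arithmetic scheme typically gives assessments even more pessimistic than
interval arithmetic, and wrongly declares that all significance has been lost.</p>
<table border="2"><tr><td bgcolor="lightblue"><p>All three subfields must be present in the utag of the unum format to avoid the shortcomings of significance
arithmetic and interval arithmetic.</p>
</td></tr></table><p>Imagine never having to specify what type of float to use for a variable, but simply saying that a variable is a real number and letting the computer do the rest. Some symbolic programming environments can do this (Mathematica and Maple, for example), but to do so they use very large data structures, with far more bits than IEEE floats. We will demonstrate that, even with the overhead of a utag on every value, unums can give better answers than IEEE floats while using fewer bits.</p>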
</div>
<div class="section" id="id4">
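<h2>4.5 Precision beyond imagination? The vast range of unums<a class="headerlink" href="#id4" title="Permalink to this heading">¶</a></h2>
<p>If we use esizesize = 4 and fsizesize =
7, we call it "a <span class="math notranslate nohighlight">\(\lbrace 4, 7\rbrace\)</span>
environment" for short, or say that a unum uses "a <span class="math notranslate nohighlight">\(\lbrace 4, 7\rbrace\)</span>
utag." Although a computer can manage its own environment sizes by detecting unsatisfactory results and recomputing with a larger utag, programmers may want some control over this for efficiency. It is usually easy for a programmer to make a good estimate of the <strong>overall accuracy needs</strong> of an application and then leave the heavy lifting (dynamically working out the precision each operation requires) to the computer.</p>
<p>For example, the <span class="math notranslate nohighlight">\(\lbrace 3, 4\rbrace\)</span>
unum environment looks suitable for graphics and other low-precision demands. It provides exponents up to <span class="math notranslate nohighlight">\(2^3 = 8\)</span> bits long and fractions up to <span class="math notranslate nohighlight">\(2^4 = 16\)</span> bits long. Its maximum dynamic range therefore matches that of a 32-bit single-precision float, and its fraction has more than five decimal digits of accuracy, yet it never needs more than 33 bits of total storage (and usually far fewer). Using a <span class="math notranslate nohighlight">\(\lbrace 3, 4\rbrace\)</span>
environment instead of a <span class="math notranslate nohighlight">\(\lbrace 4, 7\rbrace\)</span> environment shortens the utag to 8 bits instead of 12, a small but welcome saving.</p>
<p>Because there is no rounding, overflow, or underflow error, a surprising variety of very useful computations can be done even in a <span class="math notranslate nohighlight">\(\lbrace 2, 2\rbrace\)</span>
environment (seismic signal processing, for example). There is also the <span class="math notranslate nohighlight">\(\lbrace 0, 0\rbrace\)</span> environment, which is so useful that Chapter 7 is devoted to it. The interesting thing about <span class="math notranslate nohighlight">\(\lbrace 0, 0\rbrace\)</span> unums is that they are all only four bits long.</p>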
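<p>At the other extreme, unums can easily go beyond human comprehension of dynamic range and precision. The number written in decimal as a "1" followed by <em>one hundred zeros</em> is called a "<strong>googol</strong>": <span class="math notranslate nohighlight">\(10^{100}\)</span>. A "1" followed by <em>a googol zeros</em> is called a
"<strong>googolplex</strong>", or <span class="math notranslate nohighlight">\(10^{10^{100}}\)</span>. How big a utag would we need to represent every integer up to a googol, over a dynamic range that reaches a googolplex?</p>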
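<p>Its fsizesize and esizesize are only 9 bits each, for a total utag size of 19 bits! With just a few bits, we can create unums whose precision is beyond imagination. While it seems very unlikely that anyone could justify such precision for a real-world engineering problem, the unum format is certainly up to the task should the need arise. Moreover, such numbers are within the computing abilities of even the most ordinary computer. The maximum size of a unum with such a utag is only 1044 bits, about the same storage as 16 double-precision floats.</p>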
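<p>The nine-bit figures can be checked with exact integer arithmetic. The sketch below reads the claim as follows (an interpretation, not a statement from the prototype): fsizesize must be large enough for the longest fraction, plus the hidden bit, to hold every integer up to a googol, and esizesize must be large enough for the largest magnitude to reach a googolplex. The function names are hypothetical.</p>

```python
import math

GOOGOL = 10 ** 100


def smallest_fsizesize(max_integer):
    """Smallest fsizesize whose longest fraction holds every integer up to max_integer.

    A fraction of fs bits plus the hidden bit gives fs + 1 significant bits.
    """
    n = 0
    while 2 ** (2 ** n + 1) < max_integer:
        n += 1
    return n


def smallest_esizesize(log10_target):
    """Smallest esizesize whose largest magnitude reaches 10**log10_target.

    With es exponent bits, the largest power of two is roughly 2**(2**(es - 1)),
    so its base-10 logarithm is about 2**(es - 1) * log10(2).
    """
    n = 0
    while 2 ** (2 ** n - 1) * math.log10(2) < log10_target:
        n += 1
    return n


f_bits = smallest_fsizesize(GOOGOL)          # integers up to a googol
e_bits = smallest_esizesize(float(GOOGOL))   # magnitudes up to 10**googol
print(f_bits, e_bits, 1 + e_bits + f_bits)
```

<p>As a sanity check on the second function, asking for a range of about <span class="math notranslate nohighlight">\(10^{38}\)</span> returns 3, which matches the 8-bit exponent of single precision.</p>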
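<p>Some practical problems do demand ultrahigh precision, such as e-commerce encryption security, which currently needs integers with hundreds of digits. Even a cell phone uses very large integers to encrypt and decrypt data. Incidentally, the need for large integer multiplication in public-key encryption means that hardware designers have already been working, unknowingly, on the kind of circuitry that fast unum arithmetic requires. Computational scientist <a class="reference external" href="https://www.davidhbailey.com/">David H. Bailey</a>
has collected many examples that need very high precision to solve a problem, or to establish that it has no solution.</p>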
</div>
<div class="section" id="id5">
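<h2>4.6 Changing environment settings within a computing task<a class="headerlink" href="#id5" title="Permalink to this heading">¶</a></h2>
<p>The environment can be changed in the middle of a computation, and this is actually very useful.
For example, the computer might decide that some task needs higher precision, temporarily increase fsizesize, and reset it to its original value when the task is done.
In that case, the programmer need not do anything to values that have already been computed.</p>
<p>Another possibility is that a previously computed unum is needed in an environment with different settings.
In that case, the program must know to promote or demote the previously computed unum to the new size, just as a single-precision float is promoted to double precision.
Better practice is to attach the metadata <span class="math notranslate nohighlight">\(\lbrace esizesize, fsizesize\rbrace\)</span>
to a block of data before it is packaged and passed to another program. Doing so should take only one 8-bit byte, since it is hard to imagine either esizesize or fsizesize needing more than four bits.</p>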
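<table border="2"><tr><td bgcolor="lightgray"><p>Exercise for the reader:
If esizesize and fsizesize are both limited to the range 0 to 15, so that four bits suffice to express each value, what is the maximum number of bits in the unum for maxreal, the largest positive real number in a {15, 15} environment?
If the byte is instead divided into three bits for esizesize and five bits for fsizesize, what is the maximum number of decimal digits of accuracy possible in a {7, 31} environment?</p>
</td></tr></table><p>With fixed-size floats, you tend to think of loading or storing all the bits at once. With unums, you proceed in two steps, as when reading a variable-length string. Based on the environment settings, the computer first loads the es and fs fields of the utag, which indicate where the sign bit, exponent, and fraction lie to the left of the utag. It then loads those bits and points to the next unum. This is why esizesize and fsizesize behave like processor control values: they tell the processor how to interpret the bit string. A set of unums is packed together the way words are packed into a paragraph. As long as the set is long enough, unum data can be moved in the usual power-of-two sized blocks, such as cache lines or memory pages, with almost no waste; usually only the last block is "ragged" (only partly filled). Organizing a set of unums into blocks also makes random access into the set easier.</p>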
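<p>These are classic data management problems, no different from organizing records on a disk drive. In the vocabulary of data structures, packed unums form a singly linked list. Each value contains a pointer to where the next value begins, because the utag specifies the total length of the bit string. Chapter 7 shows how unums can be unpacked into a fixed-size form that is actually faster to work with than floats. That restores the ability to use indexed arrays, the usual way floats are stored.</p>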
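<p>The linked-list view can be made concrete with a short sketch. It assumes a hypothetical packing in which unums are simply concatenated as strings of '0' and '1' characters, each with its utag at the right end, and the reader starts at the right end of the block; the book does not prescribe this layout, and the function names are illustrative.</p>

```python
def unum_length(utag_bits, esizesize, fsizesize):
    """Total bit length of a unum, decoded from its utag alone."""
    es_field = utag_bits[1:1 + esizesize]
    fs_field = utag_bits[1 + esizesize:]
    es = int(es_field, 2) + 1 if esizesize else 1  # sizes are stored minus one
    fs = int(fs_field, 2) + 1 if fsizesize else 1
    return 1 + es + fs + len(utag_bits)            # sign + exponent + fraction + utag


def split_packed(bits, esizesize, fsizesize):
    """Split a packed bit string into unums by walking from the right end."""
    utagsize = 1 + esizesize + fsizesize
    unums = []
    end = len(bits)
    while end > 0:
        if end < utagsize:
            raise ValueError("truncated utag")
        utag = bits[end - utagsize:end]
        length = unum_length(utag, esizesize, fsizesize)
        if length > end:
            raise ValueError("truncated unum")
        unums.append(bits[end - length:end])
        end -= length                              # the utag acts as the 'next' pointer
    return unums[::-1]


# Two unums in a {1, 2} environment (utag is 4 bits long)
a = "0" + "1" + "0" + "0000"      # es = 1, fs = 1: 7 bits in total
b = "1" + "01" + "101" + "0110"   # es = 2, fs = 3: 10 bits in total
print(split_packed(a + b, 1, 2))
```

<p>Each step reads only the utag to learn how far to jump, which is the sense in which every value "points" to the start of its neighbor.</p>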
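<p>In summary, there is no difficulty in handling multiple environment settings within one computation, or multiple unum lengths within a set of unums that is loaded or stored at any particular setting. These problems were solved decades ago, and the solutions are already part of every operating system. They only need to be applied to unums, as they are to many other kinds of variable-size data.</p>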
</div>
<div class="section" id="id6">
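<h2>4.7 The reference prototype<a class="headerlink" href="#id6" title="Permalink to this heading">¶</a></h2>
<p>The text you are reading was not created with a word processor or a conventional text editor.
<strong>Everything you are reading is the input, output, or comments of a computer program</strong>.</p>
<p>The graphics shown here are computed output, as are the numerical examples.
This book is a "notebook" written in Mathematica.
That has the advantage of eliminating many sources of error and making changes and additions easier.
The code contained here is the reference prototype of the unum computing environment.</p>
<p>Although much of the discussion so far has been conceptual, the unum concept has been reduced to practice, and this book is that reduction.
The prototype contains a collection of tools for working interactively with unums and many other ideas.
Explaining how those tools work poses two challenges. One is that the book risks reading like a user's manual rather than a book about a completely new approach to numerical computing.
The other is that it is hard to shield the reader from seeing some computer code.
The example in Section 3.3 contained a code snippet that repeatedly took the square root of a starting value until it "converged" to 1. The resulting sequence of values was actually computed, not typed in by hand.</p>
<p>If you (like most people) do not enjoy reading computer code written by someone else, the code snippets have a light green background so that they are easy to skip.
Most of them are short, like the following command to set the computing environment:</p>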
<div class="figure align-default" id="id14">
<img alt="_images/image-20200715125506280.png" src="_images/image-20200715125506280.png" />
<p class="caption"><span class="caption-number">Fig. 38 </span><span class="caption-text">image-20200715125506280</span><a class="headerlink" href="#id14" title="Permalink to this image">¶</a></p>
</div>
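<p>The setenv function sets up the environment for specified settings of esizesize and fsizesize. It takes the pair <span class="math notranslate nohighlight">\(\lbrace esizesize, fsizesize \rbrace\)</span> as its argument, and it computes all the useful pointers and bit patterns needed to pull apart any given unum and interpret its value.
A full description of what setenv does is in the code listing of Appendix C.1. The line of code shown above sets the unum environment to a maximum exponent size of <span class="math notranslate nohighlight">\(2^3\)</span> bits and a maximum fraction size of <span class="math notranslate nohighlight">\(2^4\)</span> bits, and also creates all the useful constants that go with the environment, such as <em>maxreal</em> and <em>utagsize</em>. A function name displayed like <strong>setenv</strong> indicates that the function exists in the prototype. When a value such as <em>maxreal</em> actually takes part in a computation, it is shown as <strong>maxreal</strong>. Defined concepts (such as maxreal) are listed in the Glossary, and the defined computer variables and functions are listed in the Appendices for those interested in that level of detail.</p>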
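<p>One of the things setenv computes is bit masks. When you need to extract a set of bits from a number, you "AND" it with a string that has 1 bits in just those positions. For example, if you AND a number with 000110, you get only the values of the bits in the fourth and fifth positions, and all other bits are zero. Here are some bit masks that are useful when writing software for unum arithmetic:</p>
<ul class="simple">
<li><p><strong>ubitmask</strong> picks out the ubit.</p></li>
<li><p><strong>fsizemask</strong> selects the bits that indicate the number of fraction bits.</p></li>
<li><p><strong>esizemask</strong> selects the bits that indicate the number of exponent bits.</p></li>
</ul>
<p>Two combinations are also handy:</p>
<ul class="simple">
<li><p><strong>efsizemask</strong> covers both the exponent size and fraction size fields.</p></li>
<li><p><strong>utagmask</strong> selects the entire utag.</p></li>
</ul>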
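<p>As a rough illustration of the masks just listed, the sketch below builds them as Python integers for a given environment. The mask names follow the text, but the function <code>unum_masks</code> and its exact construction are assumptions made for this example; the prototype's own definitions are in Appendix C.1.</p>

```python
def unum_masks(esizesize, fsizesize):
    """Bit masks for the utag fields, with the utag in the lowest-order bits."""
    utagsize = 1 + esizesize + fsizesize
    ubitmask = 1 << (utagsize - 1)           # leftmost bit of the utag
    fsizemask = (1 << fsizesize) - 1         # rightmost fsizesize bits
    esizemask = (ubitmask - 1) - fsizemask   # the bits between ubit and fs field
    efsizemask = esizemask | fsizemask       # both size fields
    utagmask = ubitmask | efsizemask         # the whole utag
    return {"ubitmask": ubitmask, "fsizemask": fsizemask, "esizemask": esizemask,
            "efsizemask": efsizemask, "utagmask": utagmask}


# The AND example from the text: only the fourth and fifth bits survive
print(format(0b101101 & 0b000110, "06b"))

# Decode es and fs from a {3, 4} utag that describes half precision: 0 100 1001
masks = unum_masks(3, 4)
utag = 0b01001001
es = ((utag & masks["esizemask"]) >> 4) + 1  # shift past the 4-bit fs field
fs = (utag & masks["fsizemask"]) + 1
print(es, fs)
```

<p>Both size fields hold the size minus one, so the decoded field values are incremented after masking.</p>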
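<p>For clarity, we display the three bit fields of the utag with an annotation under each field as a reminder of what its contents mean. The first field, the ubit, is annotated with "…" if there are more bits after the last fraction bit, or with "<span class="math notranslate nohighlight">\(\downarrow\)</span>" if the fraction is exact. The second and third fields hold the exponent size es and the fraction size fs, each stored as one less than its value.</p>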
<div class="figure align-default" id="id15">
<img alt="_images/image-20200715132517867.png" src="_images/image-20200715132517867.png" />
<p class="caption"><span class="caption-number">Fig. 39 </span><span class="caption-text">image-20200715132517867</span><a class="headerlink" href="#id15" title="Permalink to this image">¶</a></p>
</div>
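<p>For a float, the ULP bit is the rightmost bit of the number. For a unum, the ULP bit is the bit just to the left of the utag. A unum value can have a range of ULP sizes, because the number of bits in the fraction is allowed to vary. The bit mask for the ULP bit of a unum is easy to compute: shift a 1 left by utagsize bits.
The value that the ULP bit represents depends on the utag bits. The real value represented is one ULP; the corresponding unum mask we call <strong>ulpu</strong>. The naming convention here is that the names of unum variables, and of functions that produce unums, end in the letter <strong>u</strong>. Since we are currently in a <span class="math notranslate nohighlight">\(\lbrace 3, 5 \rbrace\)</span> environment, the utag is 9 bits long, and <strong>ulpu</strong> is a 1 followed by nine zero bits, that is, <span class="math notranslate nohighlight">\(1 \color{magenta}0 \color{green}000 \color{gray}00000\)</span> (binary). Besides color-coding the utag bits for clarity, our convention is that the sign, exponent, and fraction bits are shown in boldface while the utag bits are not.</p>
<p>Here is a visual way to compare unum and float formats: a grid of dots shows the possible exponent sizes (horizontal) and fraction sizes (vertical), color-coded for size comparison with IEEE
floats. The plot shows where a unum has more fraction precision than a standard float type (above a horizontal line) or more dynamic range (to the right of a vertical line).</p>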
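<p>The shift that produces the ULP mask can be tried directly. This sketch treats a unum as a Python integer whose lowest bits are the utag; the function name <code>ulpu</code> matches the text, while the sample bit pattern is made up for illustration.</p>

```python
def ulpu(esizesize, fsizesize):
    """Unum-sized mask with only the ULP bit set: a 1 shifted left by utagsize bits."""
    utagsize = 1 + esizesize + fsizesize
    return 1 << utagsize


# In a {3, 5} environment the utag is 9 bits long
step = ulpu(3, 5)
print(format(step, "b"))

# A sample unum: sign 0, exponent 0111, fraction 101, then utag 0 011 00010
u = int("0" + "0111" + "101" + "0" + "011" + "00010", 2)
next_u = u + step

# Adding ulpu bumps the fraction by one and leaves the utag untouched
print(format(u >> 9, "08b"), format(next_u >> 9, "08b"))
```

<p>Because the utag occupies the bits below the ULP position, integer addition of this mask steps through consecutive fraction values of the same size, provided the fraction does not overflow into the exponent.</p>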
<table class="docutils align-default">
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>diagram</p></th>
<th class="head"><p>comments</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><img alt="image-20200715135428975" src="_images/image-20200715135428975.png" /></p></td>
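<td><p>This is the diagram for the {3, 4} environment.
As mentioned above, it looks good enough for graphics
and other low-precision needs.
Its utag is only 8 bits long, and it includes IEEE half precision,
matches the dynamic range of a 32-bit float, and has
a few more significant bits than a 16-bit float. Blue dots mark
unums that take more space than a half-precision float but
no more than a single-precision float.
Dark red dots mark unums that are as small as or smaller than
a half-precision float.
The dark orange dot is the only unum in the {3, 4} environment
that takes more space than IEEE single precision;
the largest {3, 4} unum is 33 bits long.
The smallest is only 11 bits.</p></td>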
</tr>
<tr class="row-odd"><td><p><img alt="image-20200715140525618" src="_images/image-20200715140525618.png" /></p></td>
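<td><p>The diagram for the {4, 7} environment
is so tall that it barely fits on the page.
At the upper right, the largest unums have more dynamic range
and precision than IEEE quad precision. There are</p>
<div class="math notranslate nohighlight" id="equation-04-unum-format-0">
<span class="eqno">(17)<a class="headerlink" href="#equation-04-unum-format-0" title="Permalink to this equation">¶</a></span>\[2^4\cdot2^7= 2048\]</div>
<p>possible combinations of exponent size and fraction size,
four of which match the IEEE Standard binary floats
and one of which matches the format used in the original
Intel math coprocessor (sometimes called
"extended double").
In the 1980s, Intel introduced a coprocessor called the
i8087 that <em>internally</em> used 64 bits of fraction
and 15 bits of exponent, for a total size of 80 bits.
It had successors with product numbers ending in 87,
so the format is called "Intel x87" here.
The 80-bit float was originally one of the IEEE Standard sizes,
but it exposed an interesting technical-social issue:
most users want <em>consistent</em> results from all computers,
not more accurate results on one vendor's computers!
Using the extra size of the x87 for intermediate scratch results
reduces rounding error and the occurrence of underflow and overflow,
but the results then differ mysteriously from those computed with
double-precision floats. The next chapter offers a general
solution to this problem. Designers of floating-point hardware
have agonized over how many exponent bits and how many
fraction bits to allocate. Imagine always being able to
select the right size as needed. Note: be careful when requesting
the width of the exponent size field. Even a casual request for
esizesize = 6 takes you into outrageous territory, and the
prototype environment in particular was not designed to
handle numbers that large. A unum with this {4, 7}
utag takes as few as 15 bits,
which is more compact than a half-precision float.
At the lower left are the smallest unum formats,
with only three bits of float to the left of the utag.</p>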
</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="id7">
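<h2>4.8 Special values in a flexible precision environment<a class="headerlink" href="#id7" title="Permalink to this heading">¶</a></h2>
<p>For floats with only a ubit appended, we can use the largest exponent and fraction fields (all bits set to 1) to represent infinity (ubit
= 0) or NaN (ubit =
1). With flexible-precision unums, the largest magnitudes occur only when using the largest possible exponent field and the largest possible fraction field. There is no need to represent infinity and NaN for every possible width of es and fs; multiple representations would be a waste. Unless es and fs are at their maximum (the dot in the upper right corner of the two preceding diagrams), where the all-ones pattern represents infinity or NaN, a unum bit string represents a real number (or an open interval one ULP wide).</p>
<p>Our convention is that unum variable names end in
"u" so that we do not confuse them with real values. For example, <strong>maxreal</strong> is the largest real value, but the unum that represents it in a particular environment is <strong>maxrealu</strong>.</p>
<p>The function <strong>big[u]</strong> returns the largest real value representable by a unum with the es and fs widths of u. The function
<strong>bigu[u]</strong> returns the unum bit string that represents that value. If the es and fs fields are not at their maximum sizes, the value returned by <strong>bigu[u]</strong>
has all 1 bits in its exponent and fraction fields. If es and fs are at their maximum sizes (with all bits set to 1), we must back off by one ULP, because that bit pattern is reserved for <span class="math notranslate nohighlight">\(\pm\infty\)</span>. In that case, <strong>big[u]</strong>
is the same as <strong>maxreal</strong>, so <strong>maxreal</strong> can be thought of as "the biggest of the big."
(It was also tempting to call the big function mightybig,
because of the way <strong>mightybig[u]</strong> would be pronounced.)</p>
<p>A set of values is computed whenever setenv is called. The table below shows the special unums when the environment is set to <span class="math notranslate nohighlight">\(\lbrace3, 2\rbrace\)</span>,
along with the approximate real values of maxreal and smallsubnormal. Like maxreal, the unum for smallsubnormal occurs when es and fs are at their maximum values:</p>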
<div class="figure align-default" id="id16">
<img alt="_images/image-20200720140642044.png" src="_images/image-20200720140642044.png" />
<p class="caption"><span class="caption-number">Fig. 40 </span><span class="caption-text">image-20200720140642044</span><a class="headerlink" href="#id16" title="Permalink to this image">¶</a></p>
</div>
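<p>Larger exponents extend the representable magnitudes in both directions, so it is the largest exponent width that expresses the smallest magnitudes.
The representation of smallsubnormal has only the ULP bit set to 1, with all other bits to its left set to 0. You only need to care about the values in the table above, and the other constants computed by setenv, if you want to write low-level programs with unums.
If there is ever any doubt about the current environment, simply look at the values of esizesize and fsizesize.</p>
<p>In the Mathematica prototype, here is one way to display those values:</p>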
<div class="figure align-default" id="id17">
<img alt="_images/image-20200722153103529.png" src="_images/image-20200722153103529.png" />
<p class="caption"><span class="caption-number">Fig. 41 </span><span class="caption-text">image-20200722153103529</span><a class="headerlink" href="#id17" title="Permalink to this image">¶</a></p>
</div>
</div>
<div class="section" id="id8">
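<h2>4.9 Converting exact unums to real numbers<a class="headerlink" href="#id8" title="Permalink to this heading">¶</a></h2>
<p>We need a way to convert the floating-point part of a unum into its mathematical value.
First, we use the self-descriptive bits in the utag to determine the fraction and exponent sizes.
Then we can extract the sign, exponent, and fraction bits using bit masks.
From those values we build a function that converts the part of the unum to the left of the utag into a real number, following the rules of IEEE binary floats.</p>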
<div class="figure align-default" id="id18">
<img alt="_images/image-20200722153405165.png" src="_images/image-20200722153405165.png" />
<p class="caption"><span class="caption-number">Fig. 42 </span><span class="caption-text">image-20200722153405165</span><a class="headerlink" href="#id18" title="Permalink to this image">¶</a></p>
</div>
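<p>The formula looks very much like the formula for a float (not a strict IEEE float, but an improved one that does not waste huge numbers of bit patterns on NaN).</p>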
<div class="math notranslate nohighlight" id="equation-04-unum-format-1">
<span class="eqno">(18)<a class="headerlink" href="#equation-04-unum-format-1" title="Permalink to this equation">¶</a></span>\[\begin{split}x=(-1)^{s} \times\left\{\begin{array}{ll}
2^{2-2^{es-1}} \times\left(\frac{f}{2^{fs}}\right) &amp; \text { if } e=\text { all } 0 \text { bits, } \\
\infty &amp; \text { if } e, f, \text { es, and } fs \text { have all their bits set to } 1 \\
2^{e-(2^{es-1}-1)} \times\left(1+\frac{f}{2^{fs}}\right) &amp; \text { otherwise. }
\end{array}\right.\end{split}\]</div>
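<p>A direct transcription of these decoding rules might look like the sketch below. It is not the prototype's u2f (that code is in Appendix C.3); it assumes a unum is given as a string of '0' and '1' characters with the utag at the right, uses the IEEE-style bias <span class="math notranslate nohighlight">\(2^{es-1}-1\)</span>, gives subnormals the same scale as the smallest normal exponent (<span class="math notranslate nohighlight">\(2^{1-\text{bias}}\)</span>), and returns exact rationals.</p>

```python
from fractions import Fraction


def u2f(bits, esizesize, fsizesize):
    """Decode an exact unum bit string; returns a Fraction, or a float infinity."""
    utagsize = 1 + esizesize + fsizesize
    utag, body = bits[-utagsize:], bits[:-utagsize]
    if utag[0] == "1":
        raise ValueError("inexact unum (ubit = 1) is an interval, not a single value")
    es = int(utag[1:1 + esizesize], 2) + 1 if esizesize else 1
    fs = int(utag[1 + esizesize:], 2) + 1 if fsizesize else 1
    if len(body) != 1 + es + fs:
        raise ValueError("bit string length does not match its utag")
    sign = -1 if body[0] == "1" else 1
    e = int(body[1:1 + es], 2)
    f = int(body[1 + es:], 2)
    bias = 2 ** (es - 1) - 1
    at_max_sizes = es == 2 ** esizesize and fs == 2 ** fsizesize
    if at_max_sizes and e == 2 ** es - 1 and f == 2 ** fs - 1:
        return sign * float("inf")       # the single reserved pattern
    if e == 0:                           # subnormal: no hidden bit
        return sign * Fraction(2) ** (1 - bias) * Fraction(f, 2 ** fs)
    return sign * Fraction(2) ** (e - bias) * (1 + Fraction(f, 2 ** fs))


# The four positive-sign exact unums of the {0, 0} environment
for pattern in ("0000", "0010", "0100", "0110"):
    print(pattern, u2f(pattern, 0, 0))
```

<p>With a half-precision-sized unum in a {3, 4} environment, the all-ones exponent and fraction decode to the finite value 131008 rather than to infinity or NaN, because es and fs are not at their maximum sizes.</p>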
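<p>The corresponding code is u2f[u] ("unum to float"); it is listed in Appendix C.3.
To reinforce the connection between unums and floats, here is the utag for IEEE single precision (8-bit exponent, 23-bit fraction) in a <span class="math notranslate nohighlight">\(\lbrace3, 5\rbrace\)</span>
environment:</p>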
<div class="figure align-default" id="id19">
<img alt="_images/image-20200722155109933.png" src="_images/image-20200722155109933.png" />
<p class="caption"><span class="caption-number">Fig. 43 </span><span class="caption-text">image-20200722155109933</span><a class="headerlink" href="#id19" title="Permalink to this image">¶</a></p>
</div>
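<p>If you put 32 bits to the left of that utag, it expresses the same values as an IEEE single precision float, except that it avoids clipping the values with the maximum exponent to NaN.</p>
<p>Similarly, here is the utag for IEEE half precision in a <span class="math notranslate nohighlight">\(\lbrace 3,4\rbrace\)</span> environment:</p>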
<div class="figure align-default" id="id20">
<img alt="_images/image-20200722155439631.png" src="_images/image-20200722155439631.png" />
<p class="caption"><span class="caption-number">Fig. 44 </span><span class="caption-text">image-20200722155439631</span><a class="headerlink" href="#id20" title="Permalink to this image">¶</a></p>
</div>
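<p>The utag fields shown in these figures can be reproduced with a short Python sketch. The helper names <code>smallest_environment</code> and <code>utag_string</code> are hypothetical (they are not part of the prototype); the sketch only assumes, as described earlier in the chapter, that the utag stores the ubit followed by es-1 and fs-1 in esizesize and fsizesize bits.</p>

```python
def smallest_environment(es, fs):
    """Smallest {esizesize, fsizesize} whose utag fields can hold es-1 and fs-1."""
    return (es - 1).bit_length(), (fs - 1).bit_length()


def utag_string(es, fs, esizesize, fsizesize, ubit=0):
    """Render the utag as 'ubit es-1 fs-1' in binary, one group per field."""
    if es > (1 << esizesize) or fs > (1 << fsizesize):
        raise ValueError("es or fs does not fit in this environment")

    def field(value, width):
        # A zero-width field has no bits to show.
        return format(value, "b").zfill(width) if width else ""

    parts = [str(ubit), field(es - 1, esizesize), field(fs - 1, fsizesize)]
    return " ".join(p for p in parts if p)


print(smallest_environment(8, 23), utag_string(8, 23, 3, 5))  # (3, 5) 0 111 10110
print(smallest_environment(5, 10), utag_string(5, 10, 3, 4))  # (3, 4) 0 100 1001
```

<p>The same two helpers can be used to check an answer to the exercise that follows.</p>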
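<table border="2"><tr><td bgcolor="lightgray"><p>Exercise for the reader: What is the smallest environment capable of storing the Intel
x87 format, and what is the binary string for its utag?</p>
</td></tr></table><p>Answer: the 80-bit x87 format has 1 sign bit, 15 exponent bits, and a 64-bit significand (1.15.64); the environment is {4, 6}, and the utag string is
<span class="math notranslate nohighlight">\(\color{purple}{0} \color{green}{1110} \color{blue}{111111}\)</span></p>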
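<p>The <strong>utagview[ut]</strong> function displays a utag as shown above, where ut is the utag.
Even the environment <span class="math notranslate nohighlight">\(\lbrace 0,0 \rbrace\)</span> is allowed, in which es and fs have no bits at all.
Since bits that do not exist cannot be displayed, the last two squares are drawn but left empty.
The only thing left in the utag is the ubit:</p>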
<div class="figure align-default" id="id21">
<img alt="_images/image-20200726120514450.png" src="_images/image-20200726120514450.png" />
<p class="caption"><span class="caption-number">Fig. 45 </span><span class="caption-text">image-20200726120514450</span><a class="headerlink" href="#id21" title="Permalink to this image">¶</a></p>
</div>
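<p>Converting an exact unum to an IEEE Standard float is easy, provided the unum's exponent and fraction sizes are no larger than those of the largest IEEE type, quad precision. Find the smallest IEEE float whose fields are large enough for the unum's exponent and fraction, and pad the unum with extra bits to match the worst-case size allotment of that float. If the exponent is all 1 bits and the unum represents a finite value, discard all the fraction bits and replace them with 0 bits to form the IEEE infinity.</p>
<p>There are no NaN values to convert, because a unum NaN is not exact: its ubit is set to 1. Finally, of course, the utag bits can be discarded.</p>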
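<p>The "find the smallest IEEE float" step can be sketched as a table lookup. The field widths below are the standard IEEE 754 binary16/32/64/128 sizes; the function name <code>smallest_ieee_format</code> is hypothetical, and the sketch only selects the target format, leaving the padding and exponent re-biasing aside.</p>

```python
# (name, exponent bits, fraction bits) for the IEEE 754 binary formats,
# ordered from smallest to largest.
IEEE_FORMATS = [
    ("half", 5, 10),
    ("single", 8, 23),
    ("double", 11, 52),
    ("quad", 15, 112),
]


def smallest_ieee_format(es, fs):
    """Return the name of the smallest IEEE format that holds es exponent bits
    and fs fraction bits, or raise if even quad precision is too small."""
    for name, exponent_bits, fraction_bits in IEEE_FORMATS:
        if es <= exponent_bits and fs <= fraction_bits:
            return name
    raise ValueError("unum fields exceed IEEE quad precision")


print(smallest_ieee_format(5, 10))  # half
print(smallest_ieee_format(3, 11))  # single
```

<p>Note that both fields must fit at once: a unum with a short exponent but an 11-bit fraction already needs single precision.</p>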
<p>For example, consider a unum with the utag for IEEE half precision:</p>
<div class="figure align-default" id="id22">
<img alt="_images/image-20200727141115611.png" src="_images/image-20200727141115611.png" />
<p class="caption"><span class="caption-number">Fig. 46 </span><span class="caption-text">image-20200727141115611</span><a class="headerlink" href="#id22" title="Permalink to this image">¶</a></p>
</div>
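<p>The largest real number it can express is 131,008 (as a binary unum,
0_11111_1111111111_0_100_1001; the value is the integer 0x1FFC0, or
<span class="math notranslate nohighlight">\(1.1111111111_2\times2^{16}\)</span>),
but IEEE half precision only allows values up to 65,504 (the integer 0xFFE0, or
<span class="math notranslate nohighlight">\(1.1111111111_2\times2^{15}\)</span>).
In IEEE half precision, every bit pattern larger than that means either infinity or NaN.
Spending so many bit patterns on NaN is puzzling, especially for a format with such a small dynamic range.</p>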
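<p>You might object: "Wait, shouldn't the all-1s pattern mean infinity? We reserved the largest value for that." It does not, because this is not the largest number expressible in the current unum environment. With a utag of the same size, es and fs can be as large as 8 and 16, which gives a much larger dynamic range and precision:
<img alt="image-20200727143756193" src="_images/image-20200727143756193.png" /></p>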
<p>The largest number is then
680554349248159857271492153870877982720, about <span class="math notranslate nohighlight">\(6.8\times10^{38}\)</span>,
roughly twice the maxreal of IEEE single precision, because we do not waste a huge number of bit patterns on NaN.</p>
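<p>The figures quoted above can be checked with exact rational arithmetic. The sketch below is an illustration only; the function name <code>largest_finite</code> and its <code>at_max_size</code> flag are assumptions, used to express that the all-1s pattern is reserved for infinity only when es and fs are both at their maximum for the environment.</p>

```python
from fractions import Fraction


def largest_finite(es, fs, at_max_size):
    """Largest finite value with es exponent bits and fs fraction bits.

    When es and fs are the largest the environment allows, the all-1s
    pattern means infinity, so the last fraction bit must be 0.
    """
    bias = (1 << (es - 1)) - 1
    e = (1 << es) - 1
    f = (1 << fs) - 1
    if at_max_size:
        f -= 1
    return Fraction(2) ** (e - bias) * (1 + Fraction(f, 1 << fs))


half_like = largest_finite(5, 10, at_max_size=False)
env_maxreal = largest_finite(8, 16, at_max_size=True)
# IEEE single precision reserves the top exponent, so its maxreal uses e = 254.
ieee_single_maxreal = Fraction(2) ** 127 * (2 - Fraction(1, 1 << 23))

print(half_like)                                  # 131008
print(env_maxreal)                                # 680554349248159857271492153870877982720
print(float(env_maxreal / ieee_single_maxreal))   # just under 2
```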
</div>
<div class="section" id="utag">
<h2>4.10 A complete exact unum set for a small utag<a class="headerlink" href="#utag" title="Permalink to this heading">¶</a></h2>
<p>For a very small utag environment such as</p>
<div class="math notranslate nohighlight" id="equation-04-unum-format-2">
<span class="eqno">(19)<a class="headerlink" href="#equation-04-unum-format-2" title="Permalink to this equation">¶</a></span>\[\lbrace2,2\rbrace\]</div>
<p>the set of exact values that can be expressed is small enough to list in full. The positive values are shown below.</p>
<div class="figure align-default" id="id23">
<img alt="_images/image-20200727145252948.png" src="_images/image-20200727145252948.png" />
<p class="caption"><span class="caption-number">Fig. 47 </span><span class="caption-text">image-20200727145252948</span><a class="headerlink" href="#id23" title="Permalink to this image">¶</a></p>
</div>
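<p>The unum bit strings for these values take from 8 to 14 bits in total. <strong>A unum can express the same exact value in more than one way; we take the shortest one.</strong></p>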
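<table border="2"><tr><td bgcolor="lightgray"><p>Exercise for the reader: In a {2,2} environment, what number does the unum bit string 0_00_1_0_01_00 express? Can you find another bit string of the same length that expresses the same value?</p>
</td></tr></table><p>Answer: 1/2; another is 0_0_01_0_00_01. ToDo: <strong>when two encodings are equally short, which one should be used?</strong></p>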
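<p>Instead of varying the contents of the utag, we can also limit the total number of exponent and fraction bits, since that limits the total storage. This total cannot be less than 2, because we need at least one exponent bit and one fraction bit.</p>
<p>With just 2 to 5 bits of exponent and fraction (not counting the 5-bit utag and the sign bit), there are 66 non-negative exact numbers that can be expressed.</p>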
<div class="figure align-default" id="id24">
<img alt="_images/image-20200727153909467.png" src="_images/image-20200727153909467.png" />
<p class="caption"><span class="caption-number">Fig. 48 </span><span class="caption-text">image-20200727153909467</span><a class="headerlink" href="#id24" title="Permalink to this image">¶</a></p>
</div>
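<p>Some of these numbers have more than one unum representation. The duplicates are like fractions that have not been reduced to lowest terms: there is no need for <span class="math notranslate nohighlight">\(\frac{12}{8}\)</span> and <span class="math notranslate nohighlight">\(\frac{6}{4}\)</span> when <span class="math notranslate nohighlight">\(\frac{3}{2}\)</span>
means the same thing and takes fewer bits to store. This is one reason unums are more concise than floats.
Of the values above, 34 have more than one representation:</p>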
<div class="figure align-default" id="id25">
<img alt="_images/image-20200727154352463.png" src="_images/image-20200727154352463.png" />
<p class="caption"><span class="caption-number">Fig. 49 </span><span class="caption-text">image-20200727154352463</span><a class="headerlink" href="#id25" title="Permalink to this image">¶</a></p>
</div>
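<p>Both counts (66 distinct values, 34 of them with duplicate encodings) can be reproduced by brute force. The following Python sketch enumerates every exponent/fraction pattern in a {2,2} environment whose exponent and fraction together use at most 5 bits; the function names and the <code>budget</code> parameter are illustrative assumptions, not part of the prototype.</p>

```python
from collections import Counter
from fractions import Fraction


def exact_value(e, f, es, fs):
    """Non-negative value of exponent field e and fraction field f."""
    bias = (1 << (es - 1)) - 1
    if e == 0:  # subnormal
        return Fraction(2) ** (1 - bias) * Fraction(f, 1 << fs)
    return Fraction(2) ** (e - bias) * (1 + Fraction(f, 1 << fs))


def count_representations(max_size=4, budget=5):
    """Map each exact value to the number of (es, fs, e, f) patterns encoding it."""
    counts = Counter()
    for es in range(1, max_size + 1):
        for fs in range(1, max_size + 1):
            if es + fs > budget:
                continue
            for e in range(1 << es):
                for f in range(1 << fs):
                    is_infinity = (es == max_size and fs == max_size
                                   and e == (1 << es) - 1 and f == (1 << fs) - 1)
                    if not is_infinity:
                        counts[exact_value(e, f, es, fs)] += 1
    return counts


counts = count_representations()
print(len(counts))                                # 66
print(sum(1 for n in counts.values() if n > 1))   # 34
print(min(v for v in counts if v > 0), max(counts))  # 1/128 384
```

<p>The smallest positive value, 1/128, and the largest, 384, span a ratio of 49,152, a little under five orders of magnitude.</p>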
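<p>The distribution of the logarithms of the possible numbers shows that, for a given bit-length budget, the closer a number's magnitude is to 1, the higher its precision.
(Because the variable-length encoding picks the shortest representation of a value, unums cluster near 1 on the number line.) This seems like a practical choice that saves storage.
This is where unums differ from floats: unums dedicate more bits to precision for numbers near 1 and more bits to dynamic range for the largest and smallest numbers, whereas floats give every value the same precision.</p>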
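<p>If we sort the values generated above and plot the logarithm of each, the curve is less steep in the middle, which indicates more precision for numbers that are neither huge nor tiny in magnitude.
This lets us cover almost five orders of magnitude with only a handful of bits, while still getting about one decimal digit of accuracy for numbers near 1.</p>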
<div class="figure align-default" id="id26">
<img alt="_images/image-20200727155819668.png" src="_images/image-20200727155819668.png" />
<p class="caption"><span class="caption-number">Fig. 50 </span><span class="caption-text">image-20200727155819668</span><a class="headerlink" href="#id26" title="Permalink to this image">¶</a></p>
</div>
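<p>When people write application programs, they choose units of measurement
that make the numbers easy to grasp, which means not too many orders of magnitude away from small counting numbers.
Pharmacists use milligrams, cosmologists use light years, chip designers use microns, and chemists measure reaction times
in femtoseconds (<span class="math notranslate nohighlight">\(10^{-15}\)</span> second). Floats are designed to treat, say, the range
<span class="math notranslate nohighlight">\(10^{306}\)</span> to <span class="math notranslate nohighlight">\(10^{307}\)</span> as if it deserved as many digits of precision as the numbers from 1 to 10, which ignores the human factor.
Like Morse code, unums use fewer bits for more common data.</p>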
</div>
<div class="section" id="id9">
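<h2>4.11 Inexact unums<a class="headerlink" href="#id9" title="Permalink to this heading">¶</a></h2>
<p>A unum with its ubit set to 1 represents the range of values strictly between two exact values, that is, an open interval. An "inexact"
unum is not the same thing as a rounded float.
In fact it is the opposite, since IEEE floats return inexact calculations as exact (incorrect) numbers.</p>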
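<table border="2"><tr><td bgcolor="lightblue"><p>An inexact unum represents the set of all real numbers in the open interval between the floating point part of the unum and the floating point number one ULP farther from zero.</p>
</td></tr></table><p>The formula for the value should be simple, but we still have to take care with the patterns that express infinity and maxreal, and remember that going "beyond infinity" by setting the ubit gives NaN.
Also, if the number is an inexact zero, the direction "away from zero" is determined by checking the sign bit.</p>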
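<p>Recall that in the prototype, <strong>u2f</strong> (unum to
float) is the function that converts a unum into the real number expressed by its float part, and <strong>ulpu</strong> is the unum bit string with a 1 in the last place of the fraction and 0 everywhere else.
Sometimes we want to change a unum to the exact value that is closer to zero; <strong>exact[u]</strong>
does that. If u is already exact, the function leaves it unchanged. The <strong>inexQ[u]</strong>
function returns a Boolean that tests whether a unum is inexact:
True if it is inexact, False otherwise.
For completeness, the function <strong>exQ[u]</strong>
returns the opposite, to reduce the number of "<em>not</em>" symbols in programs.</p>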
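<p>These helpers are small bit operations. The sketch below shows one way they might look in Python, assuming the unum is an integer with the utag in its lowest bits and that the environment is passed as the hypothetical parameters <code>ess</code> (esizesize) and <code>fss</code> (fsizesize); the prototype itself keeps the environment in global settings, so the signatures here are assumptions.</p>

```python
def ubitmask(ess, fss):
    """Mask selecting the ubit, which sits just left of the es-1 and fs-1 fields."""
    return 1 << (ess + fss)


def ulpu(ess, fss):
    """Bit string with a 1 in the last fraction place: the bit just left of the utag."""
    return 1 << (1 + ess + fss)


def inexQ(u, ess, fss):
    return bool(u & ubitmask(ess, fss))


def exQ(u, ess, fss):
    return not inexQ(u, ess, fss)


def exact(u, ess, fss):
    """Clear the ubit, giving the exact unum closer to zero."""
    return u & ~ubitmask(ess, fss)


def sign(u, ess, fss):
    """0 for positive, 1 for negative; the sign bit sits left of the e and f fields."""
    fs = (u & ((1 << fss) - 1)) + 1
    es = ((u >> fss) & ((1 << ess) - 1)) + 1
    return (u >> (1 + ess + fss + es + fs)) & 1


u = 0b0_00_1_1_01_00                       # inexact unum in a {2,2} environment
print(inexQ(u, 2, 2))                      # True
print(bin(exact(u, 2, 2)))                 # 0b100100, i.e. 0_00_1_0_01_00
print(bin(exact(u, 2, 2) + ulpu(2, 2)))    # 0b1000100, i.e. 0_01_0_0_01_00
```

<p>In the last line the added ULP carries out of the fraction into the exponent field, which is exactly how the "next float away from zero" is reached.</p>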
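<p>The figure below is the picture to keep in mind when thinking about inexact unums; the rounded ends help remind us that the endpoints are not included in the bound.
For positive unums:</p>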
<div class="figure align-default" id="id27">
<img alt="_images/image-20200727173723395.png" src="_images/image-20200727173723395.png" />
<p class="caption"><span class="caption-number">Fig. 51 </span><span class="caption-text">image-20200727173723395</span><a class="headerlink" href="#id27" title="Permalink to this image">¶</a></p>
</div>
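<p>The prototype's test for whether a unum is positive or negative is <strong>sign[u]</strong>, which returns 0 if positive and 1 if negative.
Adding an ULP always moves the number <strong>away</strong> from zero, so for a negative unum, adding an ULP makes the represented value more negative:</p>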
<div class="figure align-default" id="id28">
<img alt="_images/image-20200727174152745.png" src="_images/image-20200727174152745.png" />
<p class="caption"><span class="caption-number">Fig. 52 </span><span class="caption-text">image-20200727174152745</span><a class="headerlink" href="#id28" title="Permalink to this image">¶</a></p>
</div>
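<p>So <strong>u2f[u]</strong>
handles the case of an exact unum, and the figures above show how to handle the inexact cases.
We also need to convert the unum strings for the signaling and quiet NaN values, <strong>sNaNu</strong> and <strong>qNaNu</strong>.
In the prototype, we use <strong>exact[u]</strong>, <strong>big[u]</strong>, <strong>bigu[u]</strong>, <strong>signmask[u]</strong>,
and <strong>sign[u]</strong> to express the general conversion function (where <strong>signmask[u]</strong>
is the bit string with a 1 in the sign bit and 0 everywhere else):</p>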
<div class="math notranslate nohighlight" id="equation-04-unum-format-3">
<span class="eqno">(20)<a class="headerlink" href="#equation-04-unum-format-3" title="Permalink to this equation">¶</a></span>\[\begin{split}x=\left
\{\begin{array}{ll}
u2f[u] &amp; \text{if } exQ[u] \text{ (that is, the ubit of } u \text { is } 0 \text { ), else } \\
NaN &amp; \text { if } u=sNaNu  \text { or } u=qNaNu\text { , else } \\
(big[u], \infty) &amp; \text { if }exact[u]=bigu[u], \text { else } \\
(-\infty, -big[u]) &amp; \text { if }exact[u]=bigu[u]+signmask[u], \text { else } \\
(u2f[exact[u]], u2f[exact[u]+ulpu]) &amp; \text { if } sign[u]=0 \text { , else } \\
(u2f[exact[u]+ulpu], u2f[exact[u]]) &amp; \text { if } sign[u]=1, \text { which covers all other cases. }
\end{array}
\right.\end{split}\]</div>
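<p>Computer logic might first test whether all the <strong>e</strong>, <strong>f</strong>, <strong>es</strong>, and <strong>fs</strong> bits are set to 1, and if so sort out the four possible special values: <strong>signaling NaN</strong>, <strong>quiet NaN</strong>, <span class="math notranslate nohighlight">\(-\infty\)</span>, and <span class="math notranslate nohighlight">\(+\infty\)</span>.
The next thing to test for is <strong>±bigu[u]</strong>,
because only in those cases does adding <strong>ulpu</strong> overflow the combined exponent and fraction fields.
With all those cases handled, we can find the open interval between floats, as shown in the two number-line figures above.
The function above uses the notation mathematicians commonly use for open intervals: (a, b), where a &lt; b.
The first two rows express floats and the last four are open intervals, which require a new internal data structure; that is the subject of Chapter 5.</p>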
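<p>The case analysis of equation (20) can be sketched as one Python function. This is an illustration under stated assumptions, not the prototype's code: the unum is an integer with the utag in its lowest bits, the environment is passed as <code>ess</code> and <code>fss</code>, both NaN kinds are returned as <code>math.nan</code>, and an open interval is returned as a plain tuple <code>(lo, hi)</code>. The names <code>fields</code>, <code>magnitude</code>, and <code>unum_to_bound</code> are hypothetical.</p>

```python
import math
from fractions import Fraction


def fields(u, ess, fss):
    """Split u into (s, e, f, ubit, es, fs) using bit masks."""
    fs = (u & ((1 << fss) - 1)) + 1
    es = ((u >> fss) & ((1 << ess) - 1)) + 1
    ubit = (u >> (ess + fss)) & 1
    body = u >> (1 + ess + fss)
    f = body & ((1 << fs) - 1)
    e = (body >> fs) & ((1 << es) - 1)
    s = (body >> (es + fs)) & 1
    return s, e, f, ubit, es, fs


def magnitude(e, f, es, fs):
    """Absolute value of the float part, per the exact-unum formula."""
    bias = (1 << (es - 1)) - 1
    if e == 0:
        return Fraction(2) ** (1 - bias) * Fraction(f, 1 << fs)
    return Fraction(2) ** (e - bias) * (1 + Fraction(f, 1 << fs))


def unum_to_bound(u, ess, fss):
    """Return an exact value, +-inf, nan, or an open interval (lo, hi)."""
    s, e, f, ubit, es, fs = fields(u, ess, fss)
    sign = -1 if s else 1
    e_max, f_max = (1 << es) - 1, (1 << fs) - 1
    at_max_size = es == (1 << ess) and fs == (1 << fss)

    # All of e, f, es, fs set to 1: infinity, or NaN when the ubit is set.
    if at_max_size and e == e_max and f == f_max:
        return math.nan if ubit else sign * math.inf

    value = magnitude(e, f, es, fs)
    if not ubit:
        return sign * value

    # The exact part is +-bigu[u]: adding an ULP would overflow, so the
    # interval extends to infinity.
    big_f = f_max - 1 if at_max_size else f_max
    if e == e_max and f == big_f:
        far = math.inf
    else:
        # Adding ulpu increments the combined exponent+fraction integer.
        n = ((e << fs) | f) + 1
        far = magnitude(n >> fs, n & f_max, es, fs)
    return (value, far) if s == 0 else (-far, -value)


print(unum_to_bound(0b0_00_1_1_01_00, 2, 2))       # (Fraction(1, 2), Fraction(1, 1))
print(unum_to_bound(0b1_00_1_1_01_00, 2, 2))       # (Fraction(-1, 1), Fraction(-1, 2))
print(unum_to_bound(0b0_1111_1110_1_11_11, 2, 2))  # (Fraction(480, 1), inf)
```

<p>Returning a bare tuple is only a stand-in for the interval data structure the text defers to Chapter 5.</p>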
</div>
<div class="section" id="id10">
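<h2>4.12 A visualizer for unum strings<a class="headerlink" href="#id10" title="Permalink to this heading">¶</a></h2>
<p>Readers should not be asked to learn to interpret unum bit strings; that is the computer's job.
Understanding the bit strings of floats is hard enough. Recall that we can view a utag with the <strong>utagview[u]</strong>
function, as shown here (this one describes an IEEE quad-precision float):</p>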
<div class="figure align-default" id="id29">
<img alt="_images/image-20200727185547096.png" src="_images/image-20200727185547096.png" />
<p class="caption"><span class="caption-number">Fig. 53 </span><span class="caption-text">quad precision</span><a class="headerlink" href="#id29" title="Permalink to this image">¶</a></p>
</div>
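<p>Similarly, when we need to see the machine representation an operation uses, the <strong>unumview[u]</strong>
function annotates the unum <em>u</em> with the exponent used for scaling (accounting for the bias) and the value of the hidden bit.
It also shows the value both as a fraction and as a decimal, since sometimes the decimal is easier to read and sometimes the fraction is.
The decimal shown is always exact, because a fraction whose denominator is a power of 2 can always be expressed as a finite decimal.</p>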
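<p>The claim about finite decimals follows from the identity n/2<sup>k</sup> = n·5<sup>k</sup>/10<sup>k</sup>. A minimal Python sketch of that conversion, with the hypothetical helper name <code>exact_decimal</code>, might look like this:</p>

```python
from fractions import Fraction


def exact_decimal(x):
    """Exact decimal string for a Fraction whose denominator is a power of two."""
    den = x.denominator
    if den & (den - 1):
        raise ValueError("denominator must be a power of two")
    k = den.bit_length() - 1            # den == 2**k
    sign = "-" if x.numerator < 0 else ""
    # n / 2**k == (n * 5**k) / 10**k, so the digits are those of n * 5**k.
    digits = str(abs(x.numerator) * 5 ** k)
    if k == 0:
        return sign + digits
    digits = digits.rjust(k + 1, "0")   # ensure at least one digit before the point
    return sign + digits[:-k] + "." + digits[-k:]


print(exact_decimal(Fraction(3, 2)))       # 1.5
print(exact_decimal(Fraction(1, 16384)))   # 0.00006103515625
```

<p>A denominator of 2<sup>k</sup> never needs more than k digits after the decimal point.</p>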
<p>For example, here is <span class="math notranslate nohighlight">\(\pi\)</span> expressed as a unum in a <span class="math notranslate nohighlight">\(\lbrace1,4\rbrace\)</span> environment:</p>
<div class="figure align-default" id="id30">
<img alt="_images/image-20200727222031562.png" src="_images/image-20200727222031562.png" />
<p class="caption"><span class="caption-number">Fig. 54 </span><span class="caption-text">image-20200727222031562</span><a class="headerlink" href="#id30" title="Permalink to this image">¶</a></p>
</div>
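<p>As you might expect, irrational numbers like <span class="math notranslate nohighlight">\(\pi\)</span> always use the maximum number of fraction bits, and they always set the ubit to show that the value lies within an ULP-wide range.
Expressing <span class="math notranslate nohighlight">\(\pi\)</span> takes eighteen bits to the left of the utag, whereas expressing -1 never takes more than three bits to the left of the utag:</p>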
<div class="figure align-default" id="id31">
<img alt="_images/image-20200727222454702.png" src="_images/image-20200727222454702.png" />
<p class="caption"><span class="caption-number">Fig. 55 </span><span class="caption-text">image-20200727222454702</span><a class="headerlink" href="#id31" title="Permalink to this image">¶</a></p>
</div>
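<p>Notice that in this case the hidden bit is 0 because the exponent is zero (the hidden bit is the digit shown in the gap between the two boxes).
Just think of how often floats like -1 and 2 are used in computer programs,
and it becomes clear how much storage and bandwidth such a concise unum representation can save.</p>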
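<table border="2"><tr><td bgcolor="lightgray"><p>Exercise for the reader: No matter what the utag is, there are five exact numbers that can always be expressed with just three bits to the left of the utag (and with the utag set to all 0 bits).
What are they?
If the ubit is set to 1, what are the four open intervals that can always be expressed with just three bits?</p>
</td></tr></table><p>Answer: TODO</p>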
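<p>We need an inexact "positive zero" and an inexact "negative zero" to fill the gap between <em>-smallsubnormal</em> and
<em>smallsubnormal</em>. The ubit gives us a reason to have a "negative zero" representation.
Here are two examples of open intervals next to zero in a {2,3} environment.
An inexact zero that uses the largest possible exponent and fraction sizes expresses the smallest possible ULP.</p>
<p>Here are the two bit strings that unum arithmetic uses instead of underflowing to zero:</p>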
<div class="figure align-default" id="id32">
<img alt="_images/image-20200727223936064.png" src="_images/image-20200727223936064.png" />
<p class="caption"><span class="caption-number">Fig. 56 </span><span class="caption-text">image-20200727223936064</span><a class="headerlink" href="#id32" title="Permalink to this image">¶</a></p>
</div>
<p>The largest ULPs next to zero use the smallest exponent and fraction sizes, one bit each:</p>
<div class="figure align-default" id="id33">
<img alt="_images/image-20200727224111461.png" src="_images/image-20200727224111461.png" />
<p class="caption"><span class="caption-number">Fig. 57 </span><span class="caption-text">image-20200727224111461</span><a class="headerlink" href="#id33" title="Permalink to this image">¶</a></p>
</div>
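<p>The widths of these intervals next to zero follow directly from the subnormal case of the value formula: with exponent field 0 and fraction field 1, the ULP is 2<sup>2-2<sup>es-1</sup></sup>/2<sup>fs</sup>. The Python sketch below, with hypothetical function names, scans every allowed (es, fs) pair of an environment and reports the narrowest and widest interval adjacent to zero.</p>

```python
from fractions import Fraction


def ulp_next_to_zero(es, fs):
    """Upper end of the open interval (0, ULP) for an inexact zero."""
    bias = (1 << (es - 1)) - 1
    return Fraction(2) ** (1 - bias) * Fraction(1, 1 << fs)


def zero_neighborhoods(esizesize, fsizesize):
    """Smallest and largest ULP next to zero over all field sizes in the environment."""
    widths = [
        ulp_next_to_zero(es, fs)
        for es in range(1, (1 << esizesize) + 1)
        for fs in range(1, (1 << fsizesize) + 1)
    ]
    return min(widths), max(widths)


print(zero_neighborhoods(2, 3))  # (Fraction(1, 16384), Fraction(1, 1))
```

<p>In the {2,3} environment the narrowest such interval is (0, 1/16384) and the widest is (0, 1), with the negative counterparts obtained by setting the sign bit.</p>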
<p>Unums provide a rich vocabulary for expressing the idea that a number is "small", even when it cannot be expressed as an exact number.</p>
</div>
</div>


        </div>
        <div class="side-doc-outline">
            <div class="side-doc-outline--content"> 
<div class="localtoc">
    <p class="caption">
      <span class="caption-text">Table Of Contents</span>
    </p>
    <ul>
<li><a class="reference internal" href="#">4. The complete unum format</a><ul>
<li><a class="reference internal" href="#id1">4.1 Overcoming the tyranny of fixed storage size</a></li>
<li><a class="reference internal" href="#ieee">4.2 The IEEE Standard float formats</a></li>
<li><a class="reference internal" href="#id2">4.3 The unum format: flexible range and precision</a></li>
<li><a class="reference internal" href="#id3">4.4 How can adding an extra bit "save" storage?</a></li>
<li><a class="reference internal" href="#id4">4.5 Ludicrous precision? The vast range of unums</a></li>
<li><a class="reference internal" href="#id5">4.6 Changing environment settings within a computing task</a></li>
<li><a class="reference internal" href="#id6">4.7 The reference prototype</a></li>
<li><a class="reference internal" href="#id7">4.8 Special values in a flexible precision environment</a></li>
<li><a class="reference internal" href="#id8">4.9 Converting exact unums to real numbers</a></li>
<li><a class="reference internal" href="#utag">4.10 A complete exact unum set for a small utag</a></li>
<li><a class="reference internal" href="#id9">4.11 Inexact unums</a></li>
<li><a class="reference internal" href="#id10">4.12 A visualizer for unum strings</a></li>
</ul>
</li>
</ul>

</div>
            </div>
        </div>

      <div class="clearer"></div>
    </div><div class="pagenation">
     <a id="button-prev" href="03_TheOriginalSin.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="botton" accesskey="P">
         <i class="pagenation-arrow-L fas fa-arrow-left fa-lg"></i>
         <div class="pagenation-text">
            <span class="pagenation-direction">Previous</span>
            <div>3. The original sin of computer arithmetic</div>
         </div>
     </a>
     <a id="button-next" href="05_hidden_scratchpads_3_layers.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="botton" accesskey="N">
         <i class="pagenation-arrow-R fas fa-arrow-right fa-lg"></i>
        <div class="pagenation-text">
            <span class="pagenation-direction">Next</span>
            <div>5. Hidden scratchpads and the three layers</div>
        </div>
     </a>
  </div>
        
        </main>
    </div>
  </body>
</html>