Double-precision floating-point format

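The double-precision floating-point format is a data type used by computers. Whereas a single-precision floating-point number occupies only 32 bits (4 bytes), a double-precision number uses 64 bits (8 bytes) to store one floating-point value^[1]. It provides 53 binary significant digits (52 stored explicitly plus one implicit leading bit). Normal values have magnitudes from $2^{-1022}$ up to $(2-2^{-52})\times 2^{1023}$, which is just below $2^{1024}$; subnormal values extend the range down to $2^{-1074}$.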
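As a quick check of these figures, the sketch below queries Python's `sys.float_info`. It assumes the interpreter's `float` is an IEEE 754 double, which holds on common platforms but is not guaranteed by the language.

```python
import sys

info = sys.float_info
print(info.mant_dig)  # number of significand bits (53 for an IEEE 754 double)
print(info.max)       # largest finite value, (2 - 2**-52) * 2**1023
print(info.min)       # smallest positive normal value, 2**-1022

# 2**53 + 1 needs 54 significant bits, so it cannot be stored exactly
# and rounds to the same double as 2**53.
print(float(2**53 + 1) == float(2**53))
```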

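Format

sign bit (1 bit): indicates whether the value is positive or negative
exponent (11 bits): encodes the power of two
mantissa (52 bits): encodes the significant digits, which determine the precision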
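The three fields can be pulled out of a value by reinterpreting its 8 bytes as an integer. The following is a minimal sketch using Python's standard `struct` module; the helper name `split_fields` is made up for illustration, and the field widths of 1, 11, and 52 bits are those of the IEEE 754 binary64 layout.

```python
import struct

def split_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, stored exponent, mantissa field) of a 64-bit double."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # top bit
    exponent = (bits >> 52) & 0x7FF    # next 11 bits, still biased
    mantissa = bits & ((1 << 52) - 1)  # low 52 bits
    return sign, exponent, mantissa

print(split_fields(1.0))   # (0, 1023, 0)
print(split_fields(-2.0))  # (1, 1024, 0)
```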

Sign

0 indicates a positive value; 1 indicates a negative value.

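Exponent

The exponent field is 11 bits wide and uses a biased representation (exponent bias) with a bias of 1023: the actual exponent is the stored value minus 1023, so a stored value of 1023 represents an actual exponent of 0. Two stored bit patterns are exceptions to this rule:

all 11 bits are 0
all 11 bits are 1

Excluding these two patterns, the stored values 1 to 2046 give an actual exponent range of −1022 to +1023.

The exponents 000₁₆ and 7ff₁₆ have special meanings:

00000000000₂ = 000₁₆: represents ±0 when the mantissa is 0, and a subnormal (denormalized) number when the mantissa is nonzero.

11111111111₂ = 7ff₁₆: represents ±∞ when the mantissa is 0, and NaN when the mantissa is nonzero.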
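The case analysis above can be sketched as a small classifier. The function name `classify` and its return strings are illustrative choices, not part of any standard API.

```python
import struct

def classify(x: float) -> str:
    """Classify a double by its stored exponent and mantissa fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x000:
        # All exponent bits zero: signed zero or a subnormal number.
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0x7FF:
        # All exponent bits one: infinity or NaN.
        return "infinity" if mantissa == 0 else "NaN"
    # Stored exponent 1..2046: actual exponent is exponent - 1023.
    return "normal"

for value in (0.0, 5e-324, 1.0, float("inf"), float("nan")):
    print(value, classify(value))
```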

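Mantissa

In binary scientific notation, a number is written as:

${\text{1.mantissa}}\times {\text{2}}^{\text{exponent}}$

In binary scientific notation (a×2ⁿ), a satisfies 1 ≤ a < 2. The leading digit is therefore always 1 and does not need to be stored; only the digits after the binary point are kept. For example:

The binary number ${\text{11.101}}\times {\text{2}}^{\text{1001}}$ can be normalized to ${\text{1.1101}}\times {\text{2}}^{\text{1010}}$ , so only 1101 needs to be stored as the mantissa.
The binary number ${\text{0.00110011}}\times {\text{2}}^{-1001}$ can be normalized to ${\text{1.10011}}\times {\text{2}}^{-1100}$ , so only 10011 needs to be stored as the mantissa.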
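The two normalizations above can be reproduced with `math.frexp`, which returns a fraction in [0.5, 1) and an exponent; doubling the fraction and decrementing the exponent gives the 1 ≤ a < 2 form used here. The helper `normalize` is a hypothetical name, and the sketch assumes a finite, nonzero input.

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Return (a, n) with x == a * 2**n and 1 <= |a| < 2."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1

# 11.101 (binary) is 29/8, and the exponent 1001 (binary) is 9.
print(normalize(0b11101 / 8 * 2.0**9))       # (1.8125, 10), i.e. 1.1101 x 2^1010
# 0.00110011 (binary) is 51/256, and the exponent is -9.
print(normalize(0b110011 / 256 * 2.0**-9))   # (1.59375, -12), i.e. 1.10011 x 2^-1100
```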

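Summary

Putting the above together, the value represented by a normal double-precision floating-point number is:

$(-1)^{\text{sign}}\times 2^{\text{exponent}}\times 1.{\text{mantissa}}$

Here exponent is the actual exponent, that is, the stored 11-bit field minus 1023.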
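To make the formula concrete, here is a minimal sketch that rebuilds a value from a 64-bit pattern. The function `decode` is illustrative; it handles only normal numbers (stored exponent 1 to 2046) and subtracts the bias of 1023 from the stored exponent field.

```python
import math

def decode(bits: int) -> float:
    """Evaluate (-1)^sign * 2^(e - 1023) * 1.mantissa for a normal double."""
    sign = bits >> 63
    e = (bits >> 52) & 0x7FF           # stored (biased) exponent
    m = bits & ((1 << 52) - 1)         # 52-bit mantissa field
    if not 1 <= e <= 0x7FE:
        raise ValueError("zero, subnormal, infinity and NaN are not handled")
    significand = 1 + m / 2**52        # 1.mantissa, exact in 53 bits
    return (-1) ** sign * math.ldexp(significand, e - 1023)

print(decode(0x3FF0000000000000))  # 1.0
print(decode(0xC000000000000000))  # -2.0
print(decode(0x4037000000000000))  # 23.0
```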

Examples

0 01111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 3FF0 0000 0000 0000₁₆ ≙ +2⁰ × 1 = 1

0 01111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 3FF0 0000 0000 0001₁₆ ≙ +2⁰ × (1 + 2⁻⁵²) ≈ 1.0000000000000002, the smallest number > 1

0 01111111111 0000000000000000000000000000000000000000000000000010₂ ≙ 3FF0 0000 0000 0002₁₆ ≙ +2⁰ × (1 + 2⁻⁵¹) ≈ 1.0000000000000004

0 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 4000 0000 0000 0000₁₆ ≙ +2¹ × 1 = 2

1 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ C000 0000 0000 0000₁₆ ≙ −2¹ × 1 = −2

0 10000000000 1000000000000000000000000000000000000000000000000000₂ ≙ 4008 0000 0000 0000₁₆ ≙ +2¹ × 1.1₂ = 11₂ = 3

0 10000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 4010 0000 0000 0000₁₆ ≙ +2² × 1 = 100₂ = 4

0 10000000001 0100000000000000000000000000000000000000000000000000₂ ≙ 4014 0000 0000 0000₁₆ ≙ +2² × 1.01₂ = 101₂ = 5

0 10000000001 1000000000000000000000000000000000000000000000000000₂ ≙ 4018 0000 0000 0000₁₆ ≙ +2² × 1.1₂ = 110₂ = 6

0 10000000011 0111000000000000000000000000000000000000000000000000₂ ≙ 4037 0000 0000 0000₁₆ ≙ +2⁴ × 1.0111₂ = 10111₂ = 23

0 01111111000 1000000000000000000000000000000000000000000000000000₂ ≙ 3F88 0000 0000 0000₁₆ ≙ +2⁻⁷ × 1.1₂ = 0.00000011₂ = 0.01171875 (3/256)

0 00000000000 0000000000000000000000000000000000000000000000000001₂ ≙ 0000 0000 0000 0001₁₆ ≙ +2⁻¹⁰²² × 2⁻⁵² = 2⁻¹⁰⁷⁴
≈ 4.9406564584124654 × 10⁻³²⁴ (Min. subnormal positive double)

0 00000000000 1111111111111111111111111111111111111111111111111111₂ ≙ 000F FFFF FFFF FFFF₁₆ ≙ +2⁻¹⁰²² × (1 − 2⁻⁵²)
≈ 2.2250738585072009 × 10⁻³⁰⁸ (Max. subnormal double)

0 00000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 0010 0000 0000 0000₁₆ ≙ +2⁻¹⁰²² × 1
≈ 2.2250738585072014 × 10⁻³⁰⁸ (Min. normal positive double)

0 11111111110 1111111111111111111111111111111111111111111111111111₂ ≙ 7FEF FFFF FFFF FFFF₁₆ ≙ +2¹⁰²³ × (1 + (1 − 2⁻⁵²))
≈ 1.7976931348623157 × 10³⁰⁸ (Max. Double)

0 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 0000 0000 0000 0000₁₆ ≙ +0

1 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 8000 0000 0000 0000₁₆ ≙ −0

0 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 7FF0 0000 0000 0000₁₆ ≙ +∞ (positive infinity)

1 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ FFF0 0000 0000 0000₁₆ ≙ −∞ (negative infinity)

0 11111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 7FF0 0000 0000 0001₁₆ ≙ NaN (sNaN on most processors, such as x86 and ARM)

0 11111111111 1000000000000000000000000000000000000000000000000001₂ ≙ 7FF8 0000 0000 0001₁₆ ≙ NaN (qNaN on most processors, such as x86 and ARM)

0 11111111111 1111111111111111111111111111111111111111111111111111₂ ≙ 7FFF FFFF FFFF FFFF₁₆ ≙ NaN (an alternative encoding of NaN)

0 01111111101 0101010101010101010101010101010101010101010101010101₂ ≙ 3FD5 5555 5555 5555₁₆ ≙ +2⁻² × (1 + 2⁻² + 2⁻⁴ + ... + 2⁻⁵²)
≈ ¹/₃

0 10000000000 1001001000011111101101010100010001000010110100011000₂ ≙ 4009 21FB 5444 2D18₁₆
≈ π
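The hexadecimal patterns in these examples can be checked by packing a Python `float` into its 8 big-endian bytes. This is a small sketch under the assumption that `float` is an IEEE 754 double; `to_hex` is a hypothetical helper that groups the digits the same way as the examples.

```python
import struct

def to_hex(x: float) -> str:
    """Return the 64-bit pattern of x as four groups of four hex digits."""
    raw = struct.pack(">d", x).hex().upper()
    return " ".join(raw[i:i + 4] for i in range(0, 16, 4))

print(to_hex(1.0))           # 3FF0 0000 0000 0000
print(to_hex(0.01171875))    # 3F88 0000 0000 0000
print(to_hex(1 / 3))         # 3FD5 5555 5555 5555
print(to_hex(float("inf")))  # 7FF0 0000 0000 0000
```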

References

[1] Stanley B. Lippman, Josée Lajoie, Barbara E. Moo. C++ Primer, Fifth Edition (Chinese edition). 碁峰资讯, 2020, p. 33. ISBN 978-986-502-172-6.
