FP64, FP32, FP16, BFLOAT16, TF32, and other members of the ZOO

There are many floating point formats you can hear about in the context of deep learning. Here is a summary of what they are about and where they are used.

FP80

An 80-bit IEEE 754 extended-precision binary floating-point format, typically known from its x86 implementation, which started with the Intel 8087 math co-processor (the good old times when CPUs did not support floating-point computations and the FPU was a separate co-processor). In this implementation it contains:

- 1 sign bit
- 15 exponent bits
- 64 significand bits (with an explicit integer bit, unlike the formats below, which use an implicit leading bit)

Image from Wikipedia

Range: ~3.65e−4951 to ~1.18e4932 with approximately 18 significant digits of precision.
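
If you want to poke at this format yourself, NumPy's np.longdouble usually maps to it on x86 Linux builds; treat the snippet below as a sketch, since what you get is platform-dependent:

```python
import numpy as np

# On most x86 Linux builds np.longdouble is the 80-bit x87 extended format
# (padded to 96 or 128 bits in memory); on Windows and ARM it is often just
# an alias for the 64-bit double, so the output varies by platform.
fi = np.finfo(np.longdouble)
print(fi.max)        # ~1.19e4932 on x86
print(fi.precision)  # 18 decimal digits on x86
```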

Usage:

Software support:

Hardware support:

FP64

A 64-bit floating point format, typically the IEEE 754 double-precision binary floating-point format with:

- 1 sign bit
- 11 exponent bits
- 52 fraction bits (plus an implicit leading bit, for 53 bits of significand precision)

Image from Wikipedia

Range: ~2.23e-308 … ~1.80e308 with full 15–17 decimal digits of precision.
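
A quick way to check these constants yourself (a minimal NumPy sketch):

```python
import numpy as np

fi = np.finfo(np.float64)
print(fi.tiny)       # ~2.23e-308, the smallest normal double
print(fi.max)        # ~1.80e308, the largest double
print(fi.precision)  # 15 decimal digits always survive a round trip
```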

Usage:

Software support:

Hardware support:

FP32

The format that was the workhorse of deep learning for a long time. Another IEEE 754 format, the single-precision floating-point format with:

- 1 sign bit
- 8 exponent bits
- 23 fraction bits

Image from Wikipedia

Range: ~1.18e-38 … ~3.40e38 with 6–9 significant decimal digits of precision.
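
The limited 24-bit significand is easy to observe: above 2^24, FP32 cannot even represent every integer. A small NumPy illustration:

```python
import numpy as np

fi = np.finfo(np.float32)
print(fi.tiny, fi.max)  # ~1.18e-38  ~3.40e38
# With a 24-bit significand, consecutive integers are exact only up to 2**24:
print(np.float32(16777216) == np.float32(16777217))  # True: 2**24 + 1 rounds down
```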

Usage:

Software support:

Hardware support:

FP16

Again, an IEEE 754 standard format, the half-precision floating-point format with:

- 1 sign bit
- 5 exponent bits
- 10 fraction bits

Image from Wikipedia

Range: ~5.96e−8 (subnormal; the smallest normal value is ~6.10e−5) … 65504, with about 4 significant decimal digits of precision.
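
The narrow range is the practical problem: anything above 65504 overflows to infinity, which is why FP16 training usually needs tricks like loss scaling. A small NumPy illustration (np.finfo(...).smallest_subnormal requires NumPy 1.22+):

```python
import numpy as np

fi = np.finfo(np.float16)
print(fi.smallest_subnormal)  # ~5.96e-08, the smallest subnormal value
print(fi.tiny)                # ~6.10e-05, the smallest normal value
print(fi.max)                 # 65504.0
print(np.float16(65504) * np.float16(2))  # overflows to inf (with a RuntimeWarning)
```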

Usage:

Software support:

Hardware support:

Useful links:

BFLOAT16

Another 16-bit format originally developed by Google is called “Brain Floating Point Format”, or “bfloat16” for short. The name comes from “Google Brain”, which is an artificial intelligence research group at Google where the idea for this format was conceived.

The original IEEE FP16 was not designed with deep learning applications in mind: its dynamic range is too narrow. BFLOAT16 solves this problem, providing a dynamic range identical to that of FP32.

So, BFLOAT16 has:

- 1 sign bit
- 8 exponent bits (the same as FP32)
- 7 fraction bits

Image from Wikipedia

The bfloat16 format, being a truncated IEEE 754 FP32, allows for fast conversion to and from an IEEE 754 FP32. In conversion to the bfloat16 format, the exponent bits are preserved while the significand field can be reduced by truncation.
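
Here is a minimal NumPy sketch of that truncating conversion (real hardware typically rounds to nearest even instead of truncating, so treat this as an illustration of the bit layout, not of production behavior):

```python
import numpy as np

def fp32_to_bf16(x):
    """Truncate an FP32 value to bfloat16: keep the sign bit, all 8 exponent
    bits and the top 7 fraction bits, i.e. simply drop the low 16 bits."""
    return np.uint16(np.float32(x).view(np.uint32) >> 16)

def bf16_to_fp32(b):
    """Widen bfloat16 back to FP32 by appending 16 zero fraction bits."""
    return (np.uint32(b) << np.uint32(16)).view(np.float32)

pi = np.float32(3.1415927)
print(bf16_to_fp32(fp32_to_bf16(pi)))  # 3.140625: only ~3 decimal digits survive
```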


Range: ~1.18e-38 … ~3.40e38 with 3 significant decimal digits.

Usage:

Software support:

Hardware support:

Useful links:

Flexpoint

Flexpoint is a compact number encoding format developed by Intel’s Nervana, used to represent standard floating point values. Later, Nervana switched to BFLOAT16, and even later Intel cancelled the Nervana processors.

Flexpoint combines the advantages of fixed point and floating point by splitting numbers into an integer mantissa and an exponent part that is shared across all arithmetic execution elements. By passing only the integer values around, both memory and bandwidth requirements are reduced. This also lowers hardware complexity, reducing both power and area requirements.

Image from the NIPS 2017 Flexpoint paper
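
To make the idea concrete, here is a toy sketch of the shared-exponent scheme: one power-of-two exponent per tensor, integer mantissas per element. This only illustrates the concept; it is not Nervana's actual encoding, and names like to_flex are made up here:

```python
import numpy as np

def to_flex(tensor, mantissa_bits=16):
    """Encode a tensor as int16 mantissas plus ONE shared power-of-two exponent."""
    limit = 2 ** (mantissa_bits - 1) - 1  # 32767 for int16
    max_abs = float(np.max(np.abs(tensor)))
    # Pick the exponent so the largest magnitude still fits in the mantissa range.
    exp = int(np.ceil(np.log2(max_abs / limit))) if max_abs > 0 else 0
    mantissas = np.round(tensor / 2.0 ** exp).astype(np.int16)
    return mantissas, exp

def from_flex(mantissas, exp):
    """Decode back to FP32 by scaling the integers with the shared exponent."""
    return mantissas.astype(np.float32) * np.float32(2.0 ** exp)

x = np.array([0.5, -1.25, 3.0], dtype=np.float32)
m, e = to_flex(x)
print(m, e)             # integer mantissas and the single shared exponent
print(from_flex(m, e))  # [ 0.5  -1.25  3.  ] -- exact here, lossy in general
```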

Usage:

Useful links:

TF32

TensorFloat-32, or TF32, is the new math mode in NVIDIA A100 GPUs.

TF32 uses the same 10-bit mantissa as half-precision (FP16) math, which has been shown to have more than sufficient margin for the precision requirements of AI workloads. And TF32 adopts the same 8-bit exponent as FP32, so it can support the same numeric range.

It is technically a 19-bit format. You can treat it as an extended-precision BFLOAT16, say “BFLOAT19” ☺ Or as a reduced-precision FP32.

So, TF32 has:

- 1 sign bit
- 8 exponent bits (the same as FP32 and BFLOAT16)
- 10 mantissa bits (the same as FP16)

Image from NVIDIA blog post

The advantage of TF32 is that the format is the same as FP32. When computing inner products with TF32, the input operands have their mantissas rounded from 23 bits to 10 bits. The rounded operands are multiplied exactly, and accumulated in normal FP32.

TF32 Tensor Cores operate on FP32 inputs and produce results in FP32 with no code change required. Non-matrix operations continue to use FP32. This provides an easy path to accelerate FP32 input/output data in DL frameworks and HPC.
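
In PyTorch, for example, this is just a pair of backend flags; the tensors and the model code stay in FP32. A minimal sketch, assuming a recent PyTorch build and an Ampere-or-newer GPU (note that the matmul flag has been off by default since PyTorch 1.12):

```python
import torch

# Allow TF32 inside Tensor Core kernels; tensors keep dtype torch.float32.
torch.backends.cuda.matmul.allow_tf32 = True  # matmuls may run as TF32
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions may run as TF32

a = torch.randn(4096, 4096, device="cuda")    # dtype is torch.float32
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # on Ampere+, executes on TF32 Tensor Cores; output dtype is float32
```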

Range: ~1.18e-38 … ~3.40e38 with about 4 significant decimal digits of precision.

Usage:

For comparison, A100’s peak performances are:

- FP64: 9.7 TFLOPS (19.5 TFLOPS via FP64 Tensor Cores)
- FP32: 19.5 TFLOPS
- TF32 Tensor Cores: 156 TFLOPS (312 TFLOPS with sparsity)
- FP16/BFLOAT16 Tensor Cores: 312 TFLOPS (624 TFLOPS with sparsity)

Software support:

Hardware support:

Useful links:

