
Floating Point Numbers

This is a very rough sheet containing calculations and explorations for Numerical Analysis in Sage.

 

For the following command (setting the RR print options) to work, you need to apply the patches at http://trac.sagemath.org/sage_trac/ticket/7682.  Alternatively, every time you print a real number m, you can print it as m.str(truncate=False).  This is important in this worksheet since we want to see the roundoff error, and not have Sage hide it by showing only correct digits.

RR.print_options['truncate']=False

First, let's convert 13.28125 to IEEE-754 format.

x=13.28125
x.str(base=2)
'1101.0100100000000000000000000000000000000000000000000'
xbin=2^3 +2^2 + 2^0+2^-2+2^-5; xbin
425/32
xbin.n()
13.281250000000000
x.exact_rational()
425/32
(x*2^(-3)).str(base=2)
'1.1010100100000000000000000000000000000000000000000000'
f=(x*2^(-3)).frac() # .frac() gets the fraction part, the part after the decimal point
mantissa=f.str(base=2)
mantissa
'0.10101001000000000000000000000000000000000000000000000'
c=(3+1023).str(base=2)
c
'10000000010'
len(c)
11
len(mantissa[2:54])
52
sign='0'

So here is the IEEE-754 binary representation.

sign+' '+c+' '+mantissa[2:54] # the [2:54] chops off the '0.' and keeps the 52 stored bits, since we just need the digits after the binary point
'0 10000000010 1010100100000000000000000000000000000000000000000000'
len(sign+c+mantissa[2:54]) # it's 64 bits!
64
((-1)^(int(sign)) * 2^(int(c,base=2)-1023)*(1+RR(mantissa[:54], base=2)))
13.281250000000000
x
13.281250000000000

So they agree!
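Outside Sage, plain Python can recover the same 64 bits with the standard struct module. A quick cross-check (pure Python, not part of the original worksheet):

```python
import struct

x = 13.28125
# Reinterpret the 8 bytes of the double as a 64-bit integer, then format as bits.
bits = format(struct.unpack('>Q', struct.pack('>d', x))[0], '064b')
sign_bit, exp_bits, mant_bits = bits[0], bits[1:12], bits[12:]
print(sign_bit, exp_bits, mant_bits)
```

This prints the same sign, exponent, and mantissa fields we assembled by hand above.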

x.nextabove()
13.281250000000002
x.nextbelow()
13.281249999999998
x.nextabove()-x.nextbelow()
3.5527136788005009e-15
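Python 3.9+ exposes the same neighboring doubles through math.nextafter; a sketch:

```python
import math

x = 13.28125
up = math.nextafter(x, math.inf)     # nearest double above x
down = math.nextafter(x, -math.inf)  # nearest double below x
print(up - down)                     # two ulps at x, i.e. 2**-48
```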

MPFR (the library behind Sage's RR) has a slightly different convention---it stores the mantissa as an integer.

x.sign_mantissa_exponent()
(1, 7476679068876800, -49)
7476679068876800*2^(-49)
425/32
7476679068876800.str(base=2)
'11010100100000000000000000000000000000000000000000000'
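Plain Python can produce the same integer-mantissa decomposition via math.frexp; a sketch:

```python
import math

x = 13.28125
m, e = math.frexp(x)     # x == m * 2**e with 0.5 <= abs(m) < 1
mant = int(m * 2**53)    # scale the 53-bit mantissa up to an integer
print(mant, e - 53)      # matches MPFR's (mantissa, exponent) pair
```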

We can't represent all numbers

(1.0).nextabove()
1.0000000000000002
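The classic example is 0.1, which has an infinite binary expansion. In plain Python, Fraction shows exactly which nearby rational actually gets stored:

```python
from fractions import Fraction

print(Fraction(0.1))     # the exact rational stored for the literal 0.1
print(0.1 + 0.2 == 0.3)  # False: each term carries its own roundoff
```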

A finite system

Let's imagine we have a 2-bit mantissa and a 2-bit exponent.  Then all of our machine numbers look like $1.b_1b_2 \times 2^{k-2}$, where $b_1$ and $b_2$ can each be $0$ or $1$, and $k$ can be $0$, $1$, $2$, or $3$.

Question: List all possible numbers in our set.

RR.print_options['skip_zeroes']=True
nums=[RR('1.%d%d'%(b1,b2),base=2)*2^(k-2) for b1,b2,k in CartesianProduct([0,1],[0,1],[0,1,2,3])]
sorted(nums)
[0.25, 0.3125, 0.375, 0.4375, 0.5, 0.625, 0.75, 0.875, 1., 1.25, 1.5, 1.75, 2., 2.5, 3., 3.5]

or as fractions

sorted([i.exact_rational() for i in nums])
[1/4, 5/16, 3/8, 7/16, 1/2, 5/8, 3/4, 7/8, 1, 5/4, 3/2, 7/4, 2, 5/2, 3, 7/2]
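The same 16 fractions can be generated in plain Python with exact rational arithmetic, which makes a handy cross-check on the Sage enumeration:

```python
from fractions import Fraction
from itertools import product

# 1.b1b2 in binary is (4 + 2*b1 + b2)/4; scale by 2**(k-2).
nums = sorted(Fraction(4 + 2*b1 + b2, 4) * Fraction(2)**(k - 2)
              for b1, b2, k in product([0, 1], [0, 1], range(4)))
print(nums)
```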

The distribution of these numbers is interesting.  Let's plot them on the real line.

points(zip(nums,[0]*len(nums)),pointsize=25).show(xmin=0,xmax=4,figsize=[10,1])

Note that the numbers are unevenly spaced.  In fact, count how many numbers are in $[2^{-2},2^{-1})$, how many are in $[2^{-1},2^0)$, and how many are in $[2^0,2^1)$.  What do you notice?  Is there a pattern, and why?
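A quick count per binade (a sketch in plain Python) confirms the pattern: every interval $[2^j, 2^{j+1})$ contains the same number of machine numbers, so the absolute spacing doubles each time the exponent goes up:

```python
from fractions import Fraction
from itertools import product

nums = [Fraction(4 + 2*b1 + b2, 4) * Fraction(2)**(k - 2)
        for b1, b2, k in product([0, 1], [0, 1], range(4))]
# Count how many machine numbers land in each binade [2^j, 2^(j+1)).
counts = [sum(Fraction(2)**j <= v < Fraction(2)**(j + 1) for v in nums)
          for j in (-2, -1, 0, 1)]
print(counts)
```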

Note that there is no machine number representing 1.1.  In order to deal with machine arithmetic, the machine needs to decide how to represent 1.1.  We notate the machine's version of 1.1 as $fl(1.1)$ ($fl$ stands for "float", as in "floating point approximation").

What are the reasonable possibilities for $fl(1.1)$?  What would you choose, and why?

The gap around zero is called the "hole at zero".  Where does this come from? Subnormal numbers fill this gap in IEEE 754-2008.
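In IEEE double precision you can see the normal/subnormal boundary directly from plain Python; a sketch (math.ulp needs Python 3.9+):

```python
import sys
import math

print(sys.float_info.min)  # smallest positive *normal* double, 2**-1022
print(math.ulp(0.0))       # smallest positive *subnormal* double, 2**-1074
print(math.ulp(0.0) / 2)   # anything below that underflows to 0.0
```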

What happens if we do $5/16-1/4=1/16$ on the machine? In other words, what is $fl(fl(5/16)-fl(1/4))$?

What is $fl(fl(1)+fl(5/4))$?

What is $1.1+1.1+1.1$, as the machine does it?  (First write down all the applicable values of $fl$, then calculate the result.)
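In double precision (rather than our toy 2-bit system) you can watch the same effect; a sketch:

```python
from fractions import Fraction

# fl(1.1) is not 11/10; the stored value is a nearby binary fraction.
print(Fraction(1.1) == Fraction(11, 10))  # False
s = 1.1 + 1.1 + 1.1
print(s == 3.3)                           # False: the roundoffs accumulate
```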

Rounding Modes

Look up in Wikipedia what the possible rounding modes are for IEEE 754-2008.  For each of these rounding modes, what is $fl(3.25)$? (To answer this for some of the rounding modes, you'll need to know what the mantissas for 3 and 3.5 are.)
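Python's decimal module implements the IEEE 754-2008 rounding modes (in decimal rather than binary), so you can experiment with a decimal analogue of this question. A sketch rounding 3.25 to two significant digits:

```python
from decimal import (Decimal, localcontext, ROUND_HALF_EVEN,
                     ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN)

results = {}
for mode in (ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN):
    with localcontext() as ctx:
        ctx.prec = 2          # two significant decimal digits
        ctx.rounding = mode
        results[mode] = +Decimal("3.25")  # unary + rounds to the context
print(results)
```

Note the tie-break: 3.25 is exactly halfway between 3.2 and 3.3, so round-half-even keeps the even last digit.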

Relative Error

What is the relative error of the approximation $fl(1.1)$ to 1.1? (See the definition from last class or at the bottom of p. 19 of the text.)

What is the maximum relative error for a number $x$ such that $fl(x)=1$?  This relative error is called the machine epsilon, and is often denoted by the Greek letter $\epsilon$ or $\epsilon_{\mathrm{mach}}$.

What is the maximum relative error for a number $y$ such that $fl(y)=2$?  What about the maximum relative error for a number $z$ such that $fl(z)=1/2$? Do you notice a pattern? What about the maximum relative error for a number $w$ such that $fl(w)=3$?

What is the machine epsilon for a 53-bit number? (Work it out by hand, from what you know about the exponent range, and verify the next computation.)

((RR(1).nextabove()-RR(1))/2*RR(1))
1.1102230246251565e-16

So intuitively, what does it mean if you do several computations using 53-bit precision floating point numbers (i.e., double precision) and the results differ by about $10^{-16}$, or if you get an answer that is about $10^{-16}$?

2.0^(-53) # Here's a hint for the by-hand part above...
1.1102230246251565e-16
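Plain Python agrees; note that sys.float_info.epsilon is defined as the gap from 1.0 to the next double ($2^{-52}$), so the maximum relative rounding error (the worksheet's $\epsilon_{\mathrm{mach}}$) is half of it:

```python
import sys

eps = sys.float_info.epsilon  # 2**-52, the gap between 1.0 and the next double
print(eps, eps / 2)           # eps/2 == 2**-53, the maximum relative error
```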

Some more topics

Thinking about $fl(x)=x+\delta$: then $fl(x)+fl(y)$ is really $(x+\delta_1)+(y+\delta_2)$.  So how should we perturb the inputs so that we get the exact answer (backward error analysis), or how much does our computed answer differ from the exact answer (direct error analysis)?

significant digits

subtraction

dividing by small number

quadratic formula

Horner's form
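Two of these topics, subtraction of nearly equal numbers and the quadratic formula, combine into the classic example of catastrophic cancellation. A sketch (the coefficients here are just an illustrative choice):

```python
import math

# Roots of x^2 - 1e8*x + 1 = 0: one near 1e8, one near 1e-8.
a, b, c = 1.0, -1e8, 1.0
disc = math.sqrt(b*b - 4*a*c)
x1 = (-b + disc) / (2*a)        # large root: no cancellation
x2_naive = (-b - disc) / (2*a)  # small root: -b and disc nearly cancel
x2_stable = c / (a * x1)        # uses x1*x2 == c/a instead; stays accurate
print(x1, x2_naive, x2_stable)
```

The naive small root loses most of its significant digits, while the rewritten form keeps full accuracy.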

You can read about two disasters in history coming from ignoring numerical issues here: http://www.ima.umn.edu/~arnold/455.f96/disasters.html

f(x)=x^3-6.1*x^2+3.2*x+1.5
R=RealField(10,truncate=False)
R(4.71)
4.7109
f(R(4.71))
-14.422
g=f(x).polynomial(RR)
g(x)=f(x)._maxima_().horner().sage()
g(x)
((x - 6.1)*x + 3.2)*x + 1.5
g(R(4.71))
-14.312
g(RR(4.71))
-14.263899
f(RR(4.71))
-14.263899000000011