4.2 Definitions
5.1.5 Integer square root (rounded to nearest integer) operation
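\[
\mathit{sqrt}_I(x) =
\begin{cases}
\mathrm{round}(\sqrt{x}) & \text{if } x \in I \text{ and } x \geq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } x < 0
\end{cases}
\]
Third Committee Draft ISO/IEC CD 10967-2.3:1998(E)
5.1.6 Divisibility and even/odd test operations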
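\[
\mathit{divides}_I : I \times I \to \mathrm{Boolean}
\]
\[
\mathit{divides}_I(x, y) =
\begin{cases}
\mathbf{true} & \text{if } x, y \in I \text{ and } x \mid y \\
\mathbf{false} & \text{if } x, y \in I \text{ and not } x \mid y
\end{cases}
\]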
NOTES
1 divides_I(0, 0) = false, since 0 does not divide anything, not even 0.
2 divides_I cannot be implemented as, e.g., eq_I(0, remf_I(y, x)), since the remainder functions are undefined for a zero second argument.
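The two notes above can be illustrated with a minimal Python sketch. It assumes Python's unbounded `int` stands in for I; the function names `divides`, `even` and `odd` are illustrative and not part of any binding.

```python
def divides(x: int, y: int) -> bool:
    """Sketch of divides_I(x, y): True iff x | y, where 0 divides nothing."""
    if x == 0:
        # NOTE 1: divides_I(0, 0) = false. Testing for zero first also
        # avoids the undefined remainder-by-zero case from NOTE 2.
        return False
    return y % x == 0


def even(x: int) -> bool:
    """Sketch of even_I(x): True iff 2 | x."""
    return divides(2, x)


def odd(x: int) -> bool:
    """Sketch of odd_I(x): True iff not 2 | x."""
    return not divides(2, x)
```

The explicit zero test is the point of the sketch: the remainder is never evaluated with a zero second argument.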
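\[
\mathit{even}_I : I \to \mathrm{Boolean}
\]
\[
\mathit{even}_I(x) =
\begin{cases}
\mathbf{true} & \text{if } x \in I \text{ and } 2 \mid x \\
\mathbf{false} & \text{if } x \in I \text{ and not } 2 \mid x
\end{cases}
\]
\[
\mathit{odd}_I : I \to \mathrm{Boolean}
\]
\[
\mathit{odd}_I(x) =
\begin{cases}
\mathbf{true} & \text{if } x \in I \text{ and not } 2 \mid x \\
\mathbf{false} & \text{if } x \in I \text{ and } 2 \mid x
\end{cases}
\]
5.1.7 Additional integer division and remainder operations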
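\[
\mathit{quot}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{quot}_I(x, y) =
\begin{cases}
\mathit{result}_I(\lceil x / y \rceil) & \text{if } x, y \in I \text{ and } y \neq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } y = 0
\end{cases}
\]
\[
\mathit{pad}_I : I \times I \to I \cup \{\mathbf{invalid}\}
\]
\[
\mathit{pad}_I(x, y) =
\begin{cases}
(\lceil x / y \rceil \cdot y) - x & \text{if } x, y \in I \text{ and } y \neq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } y = 0
\end{cases}
\]
\[
\mathit{remc}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{remc}_I(x, y) =
\begin{cases}
\mathit{result}_I(x - (\lceil x / y \rceil \cdot y)) & \text{if } x, y \in I \text{ and } y \neq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } y = 0
\end{cases}
\]
\[
\mathit{divr}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{divr}_I(x, y) =
\begin{cases}
\mathit{result}_I(\mathrm{round}(x / y)) & \text{if } x, y \in I \text{ and } y \neq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } y = 0
\end{cases}
\]
\[
\mathit{remr}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{remr}_I(x, y) =
\begin{cases}
\mathit{result}_I(x - (\mathrm{round}(x / y) \cdot y)) & \text{if } x, y \in I \text{ and } y \neq 0 \\
\mathbf{invalid} & \text{if } x \in I \text{ and } y = 0
\end{cases}
\]
NOTE - remc_I and remr_I can overflow only for unsigned integer datatypes (minint_I = 0).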
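The five operations above can be sketched with exact integer arithmetic. This is an illustration under stated assumptions, not a conforming implementation: Python's unbounded `int` stands in for I (so the integer_overflow case of result_I never arises), `ZeroDivisionError` stands in for the invalid notification, and round is assumed to break ties towards the even integer. The helper names `ceil_div` and `round_div` are hypothetical.

```python
def ceil_div(x: int, y: int) -> int:
    """Exact ceiling of x / y for integers, y != 0."""
    return -((-x) // y)


def round_div(x: int, y: int) -> int:
    """Exact round(x / y); ties go to the even integer (an assumption)."""
    q, r = divmod(x, y)  # floor quotient; r has the sign of y
    twice = 2 * abs(r)
    if twice > abs(y) or (twice == abs(y) and q % 2 == 1):
        q += 1  # x / y is nearer to q + 1 (or ties to it)
    return q


def _require_nonzero(y: int) -> None:
    if y == 0:
        # Stand-in for the "invalid" notification.
        raise ZeroDivisionError("invalid: zero second argument")


def quot(x: int, y: int) -> int:
    _require_nonzero(y)
    return ceil_div(x, y)


def pad(x: int, y: int) -> int:
    _require_nonzero(y)
    return ceil_div(x, y) * y - x


def remc(x: int, y: int) -> int:
    _require_nonzero(y)
    return x - ceil_div(x, y) * y


def divr(x: int, y: int) -> int:
    _require_nonzero(y)
    return round_div(x, y)


def remr(x: int, y: int) -> int:
    _require_nonzero(y)
    return x - round_div(x, y) * y
```

For positive arguments remc returns a value that is zero or negative, which is why it can leave the range of an unsigned datatype, as the note says.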
5.1.8 Greatest common divisor and least common multiple operations
\[
\mathit{gcd}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{gcd}_I(x, y) =
\begin{cases}
\mathit{result}_I(\max\{v \in \mathbb{Z} \mid v \mid x \text{ and } v \mid y\}) & \text{if } x, y \in I \text{ and } (x \neq 0 \text{ or } y \neq 0) \\
\mathbf{invalid} & \text{if } x = 0 \text{ and } y = 0 \text{ and } +\infty \text{ is not available}
\end{cases}
\]
NOTES
1 Returning 0 for gcd_I(0, 0), as is sometimes suggested, would be incorrect, since the greatest common divisor for 0 and 0 is infinity.
2 gcd_I will overflow only if bounded_I = true, minint_I = -maxint_I - 1, and both arguments to gcd_I are minint_I. The greatest common divisor is then -minint_I, which is then not in I.
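The two notes above can be checked with a small sketch. It assumes a hypothetical 32-bit two's-complement range passed as `minint`/`maxint` parameters, uses `ValueError` for the invalid notification and `OverflowError` for integer_overflow, and assumes +∞ is not available in the integer datatype.

```python
import math


def gcd_i(x: int, y: int, minint: int = -2**31, maxint: int = 2**31 - 1) -> int:
    """Sketch of gcd_I on a bounded range [minint, maxint] (assumed bounds)."""
    if x == 0 and y == 0:
        # NOTE 1: the mathematical result is infinity, so 0 is not returned.
        raise ValueError("invalid: gcd of 0 and 0")
    g = math.gcd(x, y)  # always non-negative
    if g > maxint:
        # NOTE 2: only reachable when both arguments are minint.
        raise OverflowError("integer_overflow")
    return g
```

With the default bounds, `gcd_i(-2**31, -2**31)` is the only call whose mathematical result, 2**31, falls outside the range.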
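\[
\mathit{lcm}_I : I \times I \to I \cup \{\mathbf{integer\_overflow}\}
\]
\[
\mathit{lcm}_I(x, y) =
\begin{cases}
\mathit{result}_I(\min\{v \in \mathbb{Z} \mid x \mid v \text{ and } y \mid v \text{ and } v > 0\}) & \text{if } x, y \in I \text{ and } x \neq 0 \text{ and } y \neq 0 \\
0 & \text{if } x, y \in I \text{ and } (x = 0 \text{ or } y = 0)
\end{cases}
\]
NOTE 3 - lcm_I(x, y) overflows for many arguments: e.g., if x and y are relatively prime, then the least common multiple is |x·y|, which may be greater than maxint_I.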
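A short sketch makes the overflow remark in NOTE 3 concrete. The bound `maxint` is a hypothetical parameter (32-bit by default), and `OverflowError` stands in for the integer_overflow notification.

```python
import math


def lcm_i(x: int, y: int, maxint: int = 2**31 - 1) -> int:
    """Sketch of lcm_I with an assumed upper bound maxint."""
    if x == 0 or y == 0:
        return 0
    # Divide before multiplying; exact because gcd(x, y) divides x.
    v = abs(x) // math.gcd(x, y) * abs(y)
    if v > maxint:
        raise OverflowError("integer_overflow")
    return v
```

For example, 65537 and 65539 are relatively prime and each fits easily in 32 bits, yet their least common multiple, 4295229443, does not.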
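\[
\mathit{gcd\_seq}_I : [I] \to I \cup \{\mathbf{integer\_overflow}, \mathbf{invalid}\}
\]
\[
\mathit{gcd\_seq}_I([x_1, \ldots, x_n]) =
\begin{cases}
\mathit{result}_I(\max\{v \in \mathbb{Z} \mid v \mid x_i \text{ for all } i \in \{1, \ldots, n\}\}) & \text{if } \{x_1, \ldots, x_n\} \subseteq I \text{ and } \{0\} \neq \{x_1, \ldots, x_n\} \\
\mathbf{invalid} & \text{if } \{0\} = \{x_1, \ldots, x_n\} \text{ and } +\infty \text{ is not available}
\end{cases}
\]
\[
\mathit{lcm\_seq}_I : [I] \to I \cup \{\mathbf{integer\_overflow}\}
\]
\[
\mathit{lcm\_seq}_I([x_1, \ldots, x_n]) =
\begin{cases}
\mathit{result}_I(\min\{v \in \mathbb{Z} \mid x_i \mid v \text{ for all } i \in \{1, \ldots, n\} \text{ and } v > 0\}) & \text{if } \{x_1, \ldots, x_n\} \subseteq I \text{ and } 0 \notin \{x_1, \ldots, x_n\} \\
0 & \text{if } \{x_1, \ldots, x_n\} \subseteq I \text{ and } 0 \in \{x_1, \ldots, x_n\}
\end{cases}
\]
5.1.9 Support operations for extended integer range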
These operations can be used to implement extended range integer datatypes, and unbounded integer datatypes.
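\[
\mathit{add\_wrap}_I : I \times I \to I
\]
\[
\mathit{add\_wrap}_I(x, y) = \mathit{wrap}_I(x + y) \quad \text{if } x, y \in I
\]
\[
\mathit{add\_ov}_I : I \times I \to \{-1, 0, 1\}
\]
\[
\mathit{add\_ov}_I(x, y) =
\begin{cases}
((x + y) - \mathit{add\_wrap}_I(x, y)) / (\mathit{maxint}_I - \mathit{minint}_I + 1) & \text{if } x, y \in I \text{ and } I \neq \mathbb{Z} \\
0 & \text{if } x, y \in I \text{ and } I = \mathbb{Z}
\end{cases}
\]
\[
\mathit{sub\_wrap}_I : I \times I \to I
\]
\[
\mathit{sub\_wrap}_I(x, y) = \mathit{wrap}_I(x - y) \quad \text{if } x, y \in I
\]
\[
\mathit{sub\_ov}_I : I \times I \to \{-1, 0, 1\}
\]
\[
\mathit{sub\_ov}_I(x, y) =
\begin{cases}
((x - y) - \mathit{sub\_wrap}_I(x, y)) / (\mathit{maxint}_I - \mathit{minint}_I + 1) & \text{if } x, y \in I \text{ and } I \neq \mathbb{Z} \\
0 & \text{if } x, y \in I \text{ and } I = \mathbb{Z}
\end{cases}
\]
\[
\mathit{mul\_wrap}_I : I \times I \to I
\]
\[
\mathit{mul\_wrap}_I(x, y) = \mathit{wrap}_I(x \cdot y) \quad \text{if } x, y \in I
\]
\[
\mathit{mul\_ov}_I : I \times I \to I
\]
\[
\mathit{mul\_ov}_I(x, y) =
\begin{cases}
((x \cdot y) - \mathit{mul\_wrap}_I(x, y)) / (\mathit{maxint}_I - \mathit{minint}_I + 1) & \text{if } x, y \in I \text{ and } I \neq \mathbb{Z} \\
0 & \text{if } x, y \in I \text{ and } I = \mathbb{Z}
\end{cases}
\]
NOTE - The add_ov_I and sub_ov_I operations will only return -1 (for negative overflow), 0 (no overflow), and 1 (for positive overflow).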
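The relationship between the wrapping operations and their overflow counterparts can be sketched as follows. The sketch assumes a 32-bit two's-complement I; the constants `MININT`, `MAXINT` and `MODULUS` and the helper `wrap` are illustrative names modelling wrap_I.

```python
MININT, MAXINT = -2**31, 2**31 - 1  # assumed bounds of I
MODULUS = MAXINT - MININT + 1       # maxint_I - minint_I + 1


def wrap(v: int) -> int:
    """Model of wrap_I: reduce v into [MININT, MAXINT] modulo MODULUS."""
    return (v - MININT) % MODULUS + MININT


def add_wrap(x: int, y: int) -> int:
    return wrap(x + y)


def add_ov(x: int, y: int) -> int:
    # The difference is an exact multiple of MODULUS, so // is exact here.
    return ((x + y) - add_wrap(x, y)) // MODULUS


def mul_wrap(x: int, y: int) -> int:
    return wrap(x * y)


def mul_ov(x: int, y: int) -> int:
    return ((x * y) - mul_wrap(x, y)) // MODULUS
```

In each case the pair (ov, wrap) recovers the exact result as `ov * MODULUS + wrap`, which is what makes these operations usable as building blocks for extended range integers.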
5.2 Additional basic floating point operations
Clause 5.2 of ISO/IEC 10967-1 specifies floating point datatypes and a number of operations on values of a floating point datatype. In this clause some additional operations on values of a floating point datatype are specified.
NOTE - Further operations on values of a floating point datatype, for elementary floating point numerical functions, are specified in clause 5.3.
F is a floating point type conforming to ISO/IEC 10967-1. Floating point datatypes conforming to ISO/IEC 10967-1 usually do contain -0, infinity, and NaN values. Therefore, in this clause there are specifications for such values as arguments.
5.2.1 The rounding and floating point result helper functions
Floating point rounding helper functions:
\[
\mathit{down}_F : \mathbb{R} \to F^{*}
\]
is a rounding function. It rounds towards negative infinity.
NOTE 1 - F* is defined in ISO/IEC 10967-1. It is the unbounded extension of F.
\[
\mathit{up}_F : \mathbb{R} \to F^{*}
\]
is a rounding function. It rounds towards positive infinity.
\[
\mathit{nearest}_F : \mathbb{R} \to F^{*}
\]
is a rounding function that is partially implementation defined. It rounds to nearest. The handling of ties is implementation defined, but must be sign symmetric. If iec_559_F = true, the semantics of nearest_F is completely determined: ties are rounded to even last digit by nearest_F.
result_F is a helper function that is partially implementation defined. The specification from ISO/IEC 10967-1 is repeated here, but here details regarding continuation values upon overflow and underflow are given.
NOTE 2 - These details are intended to be in accordance with IEC 559 when iec_559_F = true.
and underflow is only recorded in indicator
\[
= \mathbf{underflow}(x) \quad \text{if } \mathit{iec\_559}_F = \mathbf{true} \text{ and } x \neq 0
\]
5.2.2 Floating point maximum and minimum operations
What the maximum and minimum operations should return on one quiet NaN (qNaN) input depends on the context. Sometimes qNaN is the appropriate result, sometimes the non-NaN argument is the appropriate result. Therefore, two variants (each) of the floating point maximum and minimum operations are specified here, and the programmer can decide which one is appropriate to use at each particular place of usage, if both are included in the ISO/IEC 10967-2 binding.
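The difference between the two variants can be sketched in Python for the maximum operation. This is a sketch under assumptions: Python floats are taken as IEC 559 doubles, Python does not expose signalling NaNs so the invalid case is not modelled, and the names `max_f` and `mmax_f` are illustrative.

```python
import math


def max_f(x: float, y: float) -> float:
    """NaN-propagating maximum: any quiet NaN argument yields NaN."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        # Treat -0 as smaller than +0: return +0 unless both are -0.
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y


def mmax_f(x: float, y: float) -> float:
    """NaN-ignoring maximum: a quiet NaN argument is skipped."""
    if math.isnan(x):
        return y  # still NaN when both arguments are NaN
    if math.isnan(y):
        return x
    return max_f(x, y)
```

A reduction that must report missing data would use the first form; one that treats NaN as "no value" would use the second.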
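\[
\begin{array}{ll}
= +\infty & \text{if } y = +\infty \text{ and } x \in F \cup \{+\infty, -0\} \\
= x & \text{if } y = -0 \text{ and } x \in F \text{ and } x \geq 0 \\
= -0 & \text{if } y = -0 \text{ and } x \in F \text{ and } x < 0 \\
= x & \text{if } y = -\infty \text{ and } x \in F \cup \{-\infty, -0\} \\
= \mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
= \mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
= \mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{array}
\]
\[
\mathit{min}_F : F \times F \to F
\]
\[
\mathit{min}_F(x, y) =
\begin{cases}
\min\{x, y\} & \text{if } x, y \in F \\
y & \text{if } x = +\infty \text{ and } y \in F \cup \{-\infty, -0\} \\
-0 & \text{if } x = -0 \text{ and } y \in F \text{ and } y \geq 0 \\
y & \text{if } x = -0 \text{ and } ((y \in F \text{ and } y < 0) \text{ or } y = -0) \\
-\infty & \text{if } x = -\infty \text{ and } y \in F \cup \{+\infty, -0\} \\
x & \text{if } y = +\infty \text{ and } x \in F \cup \{+\infty, -0\} \\
-0 & \text{if } y = -0 \text{ and } x \in F \text{ and } x \geq 0 \\
x & \text{if } y = -0 \text{ and } x \in F \text{ and } x < 0 \\
-\infty & \text{if } y = -\infty \text{ and } x \in F \cup \{-\infty, -0\} \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{mmax}_F : F \times F \to F
\]
\[
\mathit{mmax}_F(x, y) =
\begin{cases}
\mathit{max}_F(x, y) & \text{if } x, y \in F \cup \{+\infty, -0, -\infty\} \\
x & \text{if } x \in F \cup \{+\infty, -0, -\infty\} \text{ and } y \text{ is a quiet NaN} \\
y & \text{if } y \in F \cup \{+\infty, -0, -\infty\} \text{ and } x \text{ is a quiet NaN} \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{mmin}_F : F \times F \to F
\]
\[
\mathit{mmin}_F(x, y) =
\begin{cases}
\mathit{min}_F(x, y) & \text{if } x, y \in F \cup \{+\infty, -0, -\infty\} \\
x & \text{if } x \in F \cup \{+\infty, -0, -\infty\} \text{ and } y \text{ is a quiet NaN} \\
y & \text{if } y \in F \cup \{+\infty, -0, -\infty\} \text{ and } x \text{ is a quiet NaN} \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
NOTE - If one of the arguments to mmax_F or mmin_F is a quiet NaN, that argument is ignored.
\[
\mathit{max\_seq}_F : [F] \to F \cup \{-\infty, \mathbf{invalid}\}
\]
\[
\mathit{max\_seq}_F([x_1, \ldots, x_n]) =
\begin{cases}
-\infty & \text{if } n = 0 \text{ and } -\infty \text{ is available} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 0 \text{ and } -\infty \text{ is not available} \\
x_1 & \text{if } n = 1 \text{ and } x_1 \text{ is not a NaN} \\
\mathbf{qNaN} & \text{if } n = 1 \text{ and } x_1 \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 1 \text{ and } x_1 \text{ is a signalling NaN} \\
\mathit{max}_F(\mathit{max\_seq}_F([x_1, \ldots, x_{n-1}]), x_n) & \text{if } n \geq 2
\end{cases}
\]
\[
\mathit{min\_seq}_F : [F] \to F \cup \{+\infty, \mathbf{invalid}\}
\]
\[
\mathit{min\_seq}_F([x_1, \ldots, x_n]) =
\begin{cases}
+\infty & \text{if } n = 0 \text{ and } +\infty \text{ is available} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 0 \text{ and } +\infty \text{ is not available} \\
x_1 & \text{if } n = 1 \text{ and } x_1 \text{ is not a NaN} \\
\mathbf{qNaN} & \text{if } n = 1 \text{ and } x_1 \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 1 \text{ and } x_1 \text{ is a signalling NaN} \\
\mathit{min}_F(\mathit{min\_seq}_F([x_1, \ldots, x_{n-1}]), x_n) & \text{if } n \geq 2
\end{cases}
\]
\[
\mathit{mmax\_seq}_F : [F] \to F \cup \{-\infty, \mathbf{invalid}\}
\]
\[
\mathit{mmax\_seq}_F([x_1, \ldots, x_n]) =
\begin{cases}
-\infty & \text{if } n = 0 \text{ and } -\infty \text{ is available} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 0 \text{ and } -\infty \text{ is not available} \\
x_1 & \text{if } n = 1 \text{ and } x_1 \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 1 \text{ and } x_1 \text{ is a signalling NaN} \\
\mathit{mmax}_F(\mathit{mmax\_seq}_F([x_1, \ldots, x_{n-1}]), x_n) & \text{if } n \geq 2
\end{cases}
\]
\[
\mathit{mmin\_seq}_F : [F] \to F \cup \{+\infty, \mathbf{invalid}\}
\]
\[
\mathit{mmin\_seq}_F([x_1, \ldots, x_n]) =
\begin{cases}
+\infty & \text{if } n = 0 \text{ and } +\infty \text{ is available} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 0 \text{ and } +\infty \text{ is not available} \\
x_1 & \text{if } n = 1 \text{ and } x_1 \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } n = 1 \text{ and } x_1 \text{ is a signalling NaN} \\
\mathit{mmin}_F(\mathit{mmin\_seq}_F([x_1, \ldots, x_{n-1}]), x_n) & \text{if } n \geq 2
\end{cases}
\]
5.2.3 Floating point positive difference (monus, diminish) operation
\[
\mathit{dim}_F : F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
\[
\mathit{dim}_F(x, y) =
\begin{cases}
\mathit{result}_F(\max\{0, x - y\}, \mathit{rnd}_F) & \text{if } x, y \in F \\
\mathit{dim}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathit{dim}_F(x, 0) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
+\infty & \text{if } x = +\infty \text{ and } y \in F \cup \{-\infty\} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = +\infty \text{ and } y = +\infty \\
0 & \text{if } x = -\infty \text{ and } y \in F \cup \{+\infty\} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = -\infty \text{ and } y = -\infty \\
0 & \text{if } y = +\infty \text{ and } x \in F \\
+\infty & \text{if } y = -\infty \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
NOTE - dim_F cannot be implemented by max_F(0, sub_F(x, y)), since this latter expression has other overflow properties.
5.2.4 Round, floor, and ceiling operations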
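\[
\mathit{rounding}_F : F \to F \cup \{-0\}
\]
\[
\mathit{rounding}_F(x) =
\begin{cases}
\mathrm{round}(x) & \text{if } x \in F \text{ and } (x \geq 0 \text{ or } \mathrm{round}(x) \neq 0) \\
\mathit{neg}_F(0) & \text{if } x \in F \text{ and } x < 0 \text{ and } \mathrm{round}(x) = 0 \\
-0 & \text{if } x = -0 \\
+\infty & \text{if } x = +\infty \\
-\infty & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{floor}_F : F \to F
\]
\[
\mathit{floor}_F(x) =
\begin{cases}
\lfloor x \rfloor & \text{if } x \in F \\
-0 & \text{if } x = -0 \\
+\infty & \text{if } x = +\infty \\
-\infty & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{ceiling}_F : F \to F \cup \{-0\}
\]
\[
\mathit{ceiling}_F(x) =
\begin{cases}
\lceil x \rceil & \text{if } x \in F \text{ and } (x \geq 0 \text{ or } \lceil x \rceil \neq 0) \\
\mathit{neg}_F(0) & \text{if } x \in F \text{ and } x < 0 \text{ and } \lceil x \rceil = 0 \\
-0 & \text{if } x = -0 \\
+\infty & \text{if } x = +\infty \\
-\infty & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
NOTES
1 The result in the second case for rounding_F and ceiling_F is 0, if -0 is not in the type corresponding to F, otherwise it is -0.
2 floor_F(x) = neg_F(ceiling_F(neg_F(x))), ceiling_F(x) = neg_F(floor_F(neg_F(x))), and rounding_F(x) = neg_F(rounding_F(neg_F(x))).
Negative zeroes, if available, are handled in such a way as to maintain these identities.
3 Truncate to integer is specified in ISO/IEC 10967-1:1994, by the name intpart_F.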
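The role of negative zero in the identities of NOTE 2 can be seen in a small sketch. It assumes Python floats as an IEC 559 type with -0 available, and assumes round breaks ties to even (which is what Python's built-in `round` does); signalling NaNs are not modelled. The function names are illustrative.

```python
import math


def _special(x: float) -> bool:
    """True for the arguments that are returned unchanged: NaN, infinities, zeros."""
    return math.isnan(x) or math.isinf(x) or x == 0.0


def floor_f(x: float) -> float:
    return x if _special(x) else float(math.floor(x))


def ceiling_f(x: float) -> float:
    if _special(x):
        return x
    c = float(math.ceil(x))
    # Second case of the definition: negative x whose ceiling is zero gives -0.
    return -0.0 if (x < 0 and c == 0.0) else c


def rounding_f(x: float) -> float:
    if _special(x):
        return x
    r = float(round(x))  # ties to even (assumed tie rule)
    return -0.0 if (x < 0 and r == 0.0) else r


def same(a: float, b: float) -> bool:
    """Equality that also distinguishes +0 from -0."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
```

Without the -0 results, `floor_f(0.5)` would be +0 while `-ceiling_f(-0.5)` would be -0, and the first identity in NOTE 2 would fail on the sign of zero.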
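\[
\mathit{rounding\_rest}_F : F \to F
\]
\[
\mathit{rounding\_rest}_F(x) =
\begin{cases}
x - \mathrm{round}(x) & \text{if } x \in F \\
0 & \text{if } x = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = +\infty \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{floor\_rest}_F : F \to F
\]
\[
\mathit{floor\_rest}_F(x) =
\begin{cases}
x - \lfloor x \rfloor & \text{if } x \in F \\
0 & \text{if } x = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = +\infty \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{ceiling\_rest}_F : F \to F
\]
\[
\mathit{ceiling\_rest}_F(x) =
\begin{cases}
x - \lceil x \rceil & \text{if } x \in F \\
0 & \text{if } x = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = +\infty \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
NOTE 4 - The rest after truncation is specified in ISO/IEC 10967-1:1994, by the name fractpart_F.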
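5.2.5 Operation for remainder after division and round to integer (IEEE remainder)
\[
\mathit{irem}_F : F \times F \to F \cup \{-0, \mathbf{underflow}, \mathbf{invalid}\}
\]
\[
\mathit{irem}_F(x, y) =
\begin{cases}
\mathit{result}_F(x - (\mathrm{round}(x / y) \cdot y), \mathit{nearest}_F) & \text{if } x, y \in F \text{ and } y \neq 0 \text{ and } (x \geq 0 \text{ or } x - (\mathrm{round}(x / y) \cdot y) \neq 0) \\
-0 & \text{if } x, y \in F \text{ and } y \neq 0 \text{ and } x < 0 \text{ and } x - (\mathrm{round}(x / y) \cdot y) = 0 \\
-0 & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, +\infty\} \text{ and } y \neq 0 \\
x & \text{if } x \in F \text{ and } y \in \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \in F \cup \{-\infty, -0, +\infty\} \text{ and } y = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \in F \cup \{-0\} \text{ and } y = 0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
5.2.6 Square root and reciprocal square root operations
\[
\mathit{sqrt}_F : F \to F \cup \{\mathbf{invalid}\}
\]
\[
\mathit{sqrt}_F(x) =
\begin{cases}
\mathit{nearest}_F(\sqrt{x}) & \text{if } x \in F \text{ and } x \geq 0 \\
-0 & \text{if } x = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } (x \in F \text{ and } x < 0) \text{ or } x = -\infty \\
+\infty & \text{if } x = +\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{rec\_sqrt}_F : F \to F \cup \{\mathbf{invalid}, \mathbf{pole}\}
\]
\[
\mathit{rec\_sqrt}_F(x) =
\begin{cases}
\mathit{rnd}_F(1 / \sqrt{x}) & \text{if } x \in F \text{ and } x > 0 \\
\mathbf{pole}(+\infty) & \text{if } x \in F \text{ and } x = 0 \\
\mathbf{pole}(+\infty) & \text{if } x = -0 \\
0 & \text{if } x = +\infty \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } (x \in F \text{ and } x < 0) \text{ or } x = -\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
5.2.7 Support operations for extended floating point precision
\[
\mathit{add\_lo}_F : F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
\[
\mathit{add\_lo}_F(x, y) =
\begin{cases}
\mathit{result}_F((x + y) - \mathit{rnd}_F(x + y), \mathit{rnd}_F) & \text{if } x, y, \mathit{add}_F(x, y) \in F \\
\mathbf{underflow}(0)\ ? & \text{if } \mathit{add}_F(x, y) = \mathbf{underflow}(u) \\
0\ ? & \text{if } \mathit{add}_F(x, y) = \mathbf{floating\_overflow}(+\infty) \\
0\ ? & \text{if } \mathit{add}_F(x, y) = \mathbf{floating\_overflow}(-\infty) \\
\mathit{add\_lo}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathit{add\_lo}_F(x, 0) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } y \in \{-\infty, +\infty\} \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{sub\_lo}_F : F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
\[
\mathit{sub\_lo}_F(x, y) =
\begin{cases}
\mathit{result}_F((x - y) - \mathit{rnd}_F(x - y), \mathit{rnd}_F) & \text{if } x, y, \mathit{sub}_F(x, y) \in F \\
\mathbf{underflow}(0)\ ? & \text{if } \mathit{sub}_F(x, y) = \mathbf{underflow}(u) \\
\mathbf{floating\_overflow}(-\infty)\ ?\ 0\ ? & \text{if } \mathit{sub}_F(x, y) = \mathbf{floating\_overflow}(+\infty) \\
\mathbf{floating\_overflow}(+\infty)\ ?\ 0\ ? & \text{if } \mathit{sub}_F(x, y) = \mathbf{floating\_overflow}(-\infty) \\
\mathit{sub\_lo}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathit{sub\_lo}_F(x, 0) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } y \in \{-\infty, +\infty\} \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
NOTES
1 If rnd_style_F = nearest, then, in the absence of notifications, add_lo_F and sub_lo_F return exact results.
2 sub_lo_F(x, y) = add_lo_F(x, neg_F(y)).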
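For round-to-nearest binary arithmetic, the low part described in NOTE 1 can be computed with the well-known error-free "two-sum" transformation. The sketch below is one possible technique, not something the draft prescribes; it assumes Python floats are IEC 559 doubles rounded to nearest and that no overflow or underflow notification occurs. The names `add_lo` and `sub_lo` mirror the operations for readability only.

```python
from fractions import Fraction


def add_lo(x: float, y: float) -> float:
    """Rounding error of x + y, i.e. (x + y) - fl(x + y), via two-sum."""
    s = x + y
    y_virtual = s - x          # the part of y that made it into s
    x_virtual = s - y_virtual  # the part of x that made it into s
    return (x - x_virtual) + (y - y_virtual)


def sub_lo(x: float, y: float) -> float:
    """NOTE 2: sub_lo_F(x, y) = add_lo_F(x, neg_F(y))."""
    return add_lo(x, -y)


def exact_split_holds(x: float, y: float) -> bool:
    """Check exactly, with rationals, that (x + y) == fl(x + y) + add_lo(x, y)."""
    return Fraction(x) + Fraction(y) == Fraction(x + y) + Fraction(add_lo(x, y))
```

The pair `(x + y, add_lo(x, y))` then represents the sum without loss, which is the building block for extended precision addition.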
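\[
\mathit{mul\_lo}_F : F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
\[
\mathit{mul\_lo}_F(x, y) =
\begin{cases}
\mathit{result}_F((x \cdot y) - \mathit{rnd}_F(x \cdot y), \mathit{rnd}_F) & \text{if } x, y, \mathit{mul}_F(x, y) \in F \\
\mathbf{underflow}(0)\ ? & \text{if } \mathit{mul}_F(x, y) = \mathbf{underflow}(u) \\
0 & \text{if } x, y \in F \text{ and } \mathit{mul}_F(x, y) = -0 \\
\mathbf{floating\_overflow}(-\infty)\ ?\ 0\ ? & \text{if } \mathit{mul}_F(x, y) = \mathbf{floating\_overflow}(+\infty) \\
\mathbf{floating\_overflow}(+\infty)\ ?\ 0\ ? & \text{if } \mathit{mul}_F(x, y) = \mathbf{floating\_overflow}(-\infty) \\
\mathit{mul\_lo}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathit{mul\_lo}_F(x, 0) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } y \in \{-\infty, +\infty\} \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
NOTE 3 - In the absence of notifications, mul_lo_F returns an exact result.
\[
\mathit{div\_rest}_F : F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}, \mathbf{invalid}\}
\]
\[
\mathit{div\_rest}_F(x, y) =
\begin{cases}
\mathit{result}_F(x - (y \cdot \mathit{rnd}_F(x / y)), \mathit{rnd}_F) & \text{if } x, y, \mathit{div}_F(x, y) \in F \\
\mathit{result}_F(x - (y \cdot u), \mathit{rnd}_F) & \text{if } \mathit{div}_F(x, y) = \mathbf{underflow}(u) \text{ and } z \in F \\
x & \text{if } x, y \in F \text{ and } (\mathit{div}_F(x, y) = -0 \text{ or } \mathit{div}_F(x, y) = \mathbf{underflow}(-0)) \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \in F \text{ and } y = 0 \\
\mathbf{floating\_overflow}(-\infty)\ ?\ 0\ ? & \text{if } \mathit{div}_F(x, y) = \mathbf{floating\_overflow}(+\infty) \\
\mathbf{floating\_overflow}(+\infty)\ ?\ 0\ ? & \text{if } \mathit{div}_F(x, y) = \mathbf{floating\_overflow}(-\infty) \\
\mathit{div\_rest}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
\mathbf{invalid}(\mathbf{qNaN})\ ? & \text{if } y \in \{-\infty, +\infty\} \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
\[
\mathit{sqrt\_rest}_F : F \to F \cup \{\mathbf{underflow}, \mathbf{invalid}\}
\]
\[
\mathit{sqrt\_rest}_F(x) =
\begin{cases}
\mathit{result}_F(x - (\mathit{sqrt}_F(x) \cdot \mathit{sqrt}_F(x)), \mathit{rnd}_F) & \text{if } x \in F \text{ and } x \geq 0 \\
-0 & \text{if } x = -0 \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } (x \in F \text{ and } x < 0) \text{ or } x = -\infty \\
\mathbf{invalid}(\mathbf{qNaN})\ ?\ 0\ ? & \text{if } x = +\infty \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN}
\end{cases}
\]
NOTE 4 - sqrt_rest_F(x) is exact when there is no underflow.
\[
\mathit{add3}_F : F \times F \times F \to F \cup \{\mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
\[
\mathit{add3}_F(x, y, z) = \mathit{result}_F((x + y) + z, \mathit{rnd}_F) \quad \text{if } x, y, z \in F
\]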
not
y
norz
is a signalling NaN=
qNaN
ify
is a quiet NaN andnot
x
norz
is a signalling NaN=
qNaN
ifz
is a quiet NaN andnot
x
nory
is a signalling NaN=
invalid
(qNaN
) ifx
is a signalling NaN ornot
y
norz
is a signalling NaN=
qNaN
ify
is a quiet NaN andnot
x
norz
is a signalling NaN=
qNaN
ifz
is a quiet NaN andnot
x
nory
is a signalling NaN=
invalid
(qNaN
) ifx
is a signalling NaN or
not
y
norz
is a signalling NaN=
qNaN
ify
is a quiet NaN andnot
x
norz
is a signalling NaN=
qNaN
ifz
is a quiet NaN andnot
x
nory
is a signalling NaN=
invalid
(qNaN
) ifx
is a signalling NaN ornot
y
norz
is a signalling NaN=
qNaN
ify
is a quiet NaN andnot
x
norz
is a signalling NaN=
qNaN
ifz
is a quiet NaN andnot
x
nory
is a signalling NaN=
invalid
(qNaN
) ifx
is a signalling NaN ory
is a signalling NaN orz
is a signalling NaN
For the following operation F' is a floating point type conforming to ISO/IEC 10967-1.
NOTE 7 - It is expected that p_F' > p_F, i.e. F' has higher precision than F, but that is not required.
\[
\mathit{mul}_{F \to F'} : F \times F \to F' \cup \{-0, \mathbf{floating\_overflow}, \mathbf{underflow}\}
\]
NOTE 8 - Converting a signalling NaN results in a notification of invalid. See clause 5.4.
5.2.8 Exact summation operation
An exact summation operation is useful for computing high accuracy sums, even if only the first element of the resulting list is ultimately kept.
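The idea of returning a sum as a sequence of floating point numbers can be sketched with a classic error-free accumulation technique (Shewchuk-style non-overlapping partial sums). This is an illustration of the concept only, not the draft's sum_F: it assumes IEC 559 doubles with round-to-nearest, finite inputs, and no overflow, and the name `exact_sum_parts` is hypothetical.

```python
from fractions import Fraction


def exact_sum_parts(values):
    """Return floats, largest magnitude first, whose exact sum equals sum(values)."""
    partials = []  # non-overlapping partial sums, smallest magnitude first
    for x in values:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)  # exact rounding error of x + y when |x| >= |y|
            if lo != 0.0:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    parts = [p for p in reversed(partials) if p != 0.0]
    return parts or [0.0]


def exact_value(seq):
    """Exact rational value of a sequence of floats."""
    return sum((Fraction(v) for v in seq), Fraction(0))
```

Keeping only the first element of the returned list gives a high accuracy approximation of the sum; keeping the whole list loses nothing.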
In order to be able to specify the exact sum operation, which sums a sequence of floating point numbers returning a sequence of floating point numbers of decreasing magnitude, by p_F, a number of helper functions are needed.
\[
= \mathbf{sNaN} \quad \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\]
The extended real summation helper function:
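\[
\begin{array}{ll}
= \mathit{rnd}(x) : \mathit{seq\_result}_F(x - \mathit{rnd}(x), \mathit{rnd}) & \text{if } \mathit{rnd}(x) \neq 0 \text{ and } \mathit{rnd}(x) \in F \text{ and } (\mathit{denorm}_F = \mathbf{true} \text{ or } |x| \geq \mathit{fminN}_F) \\
= [\mathit{rnd}(x - \mathit{fminN}_F), \mathit{fminN}_F] & \text{if } -\mathit{fminN}_F < x \text{ and } x < 0 \text{ and } \mathit{denorm}_F = \mathbf{false} \\
= [\mathit{rnd}(x + \mathit{fminN}_F), -\mathit{fminN}_F] & \text{if } 0 < x \text{ and } x < \mathit{fminN}_F \text{ and } \mathit{denorm}_F = \mathbf{false}
\end{array}
\]
The exact summation operation:
\[
\mathit{sum}_F : [F] \to [F] \cup \{\mathbf{floating\_overflow}\}
\]
\[
\mathit{sum}_F([x_1, \ldots, x_n]) =
\begin{cases}
\mathit{seq\_result}_F(\mathit{sum}([x_1, \ldots, x_n]), \mathit{nearest}_F) & \text{if } \mathit{sum}([x_1, \ldots, x_n]) \in \mathbb{R} \text{ and } n \geq 1 \\
[\mathit{sum}([x_1, \ldots, x_n])] & \text{if } \mathit{sum}([x_1, \ldots, x_n]) \in \{-\infty, -0, +\infty\} \text{ and } n \geq 1 \\
[-0] & \text{if } n = 0 \text{ and } -0 \text{ is available} \\
[0] & \text{if } n = 0 \text{ and } -0 \text{ is not available} \\
[\mathbf{qNaN}] & \text{if } \mathit{sum}([x_1, \ldots, x_n]) \text{ is a quiet NaN} \\
\mathbf{invalid}([\mathbf{qNaN}]) & \text{if } \mathit{sum}([x_1, \ldots, x_n]) \text{ is a signalling NaN}
\end{cases}
\]
NOTE - sum_F(sum_F(a)) = sum_F(a), and sum_F(sum_F(a) ++ sum_F(b)) = sum_F(a ++ b) if there is no notification (where ++ is sequence concatenation). Thus sum_F([]) = sum_F([-0]).
5.3 Elementary transcendental floating point operations
5.3.1 Specification format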
5.3.1.1 Maximum error requirements
The specifications for each of the transcendental operations use an approximation helper function.
The approximation helper functions are ideally identical to the true mathematical functions.
However, that would imply that the maximum error for the corresponding operation was merely 0.5 ulp. This part of ISO/IEC 10967 does not require that the maximum error is only 0.5 ulp; it may be somewhat larger. To express this, the approximation helper functions need not be identical to the mathematical elementary transcendental functions, but are allowed to be approximate.
The approximation helper functions for the individual operations in this subclause have maximum error parameters that describe the maximum relative error of the helper function composed with nearest_F, for normalised results. The maximum error parameter also describes the maximum absolute error for subnormal continuation values if denorm_F = true. The relevant maximum error parameters shall be available to programs.
That for a helper function h_F, approximating f, the maximum error is max_error_op_F means that for all arguments x, ... in F × ... the following inequality is true:
\[
|f(x, \ldots) - \mathit{nearest}_F(h_F(x, \ldots))| \leq \mathit{max\_error\_op}_F \cdot r_F^{\,e_F(f(x, \ldots)) - p_F}
\]
NOTES
1 Partially conforming implementations may have greater values for maximum error parameters than stipulated below. See annex B.
2 For most positive (and not too small) return values t, the true result is thus claimed to be in the interval [t - (max_error_op_F · ulp_F(t)), t + (max_error_op_F · ulp_F(t))]. But if the return value is exactly r_F^n for some n in Z, then the true result is claimed to be in the interval [t - (max_error_op_F · ulp_F(t) / r_F), t + (max_error_op_F · ulp_F(t))]. Similarly for negative return values.
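The interval in NOTE 2 can be tested exactly for a concrete operation. The sketch below does so for square root, using rational arithmetic to avoid any further rounding. It assumes Python 3.9 or later (for `math.ulp` and `math.nextafter`), assumes `math.ulp` agrees with ulp_F for normalised doubles, ignores the narrower lower bound that applies when t is an exact power of the radix, and uses the illustrative name `sqrt_within`.

```python
import math
from fractions import Fraction


def sqrt_within(x: float, t: float, max_error: Fraction) -> bool:
    """True if the exact sqrt(x) lies in [t - max_error*ulp(t), t + max_error*ulp(t)].

    Assumes x > 0 and t > 0. Squaring the (non-negative) interval ends lets
    the comparison be done exactly in rationals.
    """
    half_width = max_error * Fraction(math.ulp(t))
    lo = max(Fraction(t) - half_width, Fraction(0))
    hi = Fraction(t) + half_width
    return lo * lo <= Fraction(x) <= hi * hi
```

A correctly rounded result passes with a parameter of 0.5, while a result that is one ulp off passes only with a larger parameter such as 1.5.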
The results of the approximating helper functions in this clause must be exact for certain arguments as detailed below, and may be exact for all arguments. If the approximating helper function is exact for all arguments, then the corresponding maximum error parameter should be 0.5, the minimum value.
5.3.1.2 The trans result helper function
The trans_result_F helper function is similar to the result_F helper function extended with specifications for the continuation value on overflow, and it also returns -0 for negative underflows that round (or are flushed) to zero, if possible. (Those extensions are implied in ISO/IEC 10967-1 for IEC 559 conforming implementations.) But trans_result_F is simplified compared to result_F concerning underflow: trans_result_F always underflows for nonzero arguments that have an absolute value less than fminN_F, whereas result_F does not always underflow then.
In addition, the rounding is fixed to nearest_F, rather than being parameterised. This is user visible only in the cases where the operation's approximation helper function is (required to be) exact, but where that value is not representable in F, e.g. e or π.
\[
\mathit{trans\_result}_F : \mathbb{R} \to F \cup \{\mathbf{underflow}, \mathbf{floating\_overflow}\}
\]
The approximation helper functions are required to be zero exactly at the points where the approximated mathematical function is exactly zero. At points where the approximation helper functions are not zero, they are required to have the same sign as the approximated mathematical function at that point.
For the radian trigonometric helper functions, this sign requirement is imposed only for arguments, x, such that |x| ≤ big_angle_r_F (see clause 5.3.6).
NOTE - For the operations, the continuation value after an underflow may be zero (or negative zero) as given by trans_result_F, even though the approximation helper function is not zero at that point. Such zero results are required to be accompanied by an underflow notification. When appropriate, zero may also be returned for IEC 559 infinities arguments. See the individual specifications.
5.3.1.4 Monotonicity requirements
When the maximum error is tight, i.e. 0.5 ulp, that implies that the approximation helper functions must be monotonic on the same intervals as the corresponding exact function is strictly monotonic. When the maximum error is greater than 0.5 ulp, and the rounding is not directed, a numerical function is not automatically monotonic where the corresponding exact function is.
The approximation helper functions in this clause are required to be monotonic on the same intervals as the mathematical functions they are approximating are monotonic. There is, however, no general requirement that the approximation helper functions are strictly monotonic on the same intervals as the corresponding exact function is strictly monotonic; such a requirement cannot be made because all floating point types are discrete, not continuous.
For the radian trigonometric helper functions, this monotonicity requirement is imposed only for arguments, x, such that |x| ≤ big_angle_r_F (see clause 5.3.6).
The unit argument trigonometric and unit argument inverse trigonometric approximating helper functions are excepted from the monotonicity requirement for the angular unit argument.
5.3.2 Hypotenuse operation
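Maximum error parameter for the hypot_F operation:
\[
\mathit{max\_error\_hypot}_F \in F
\]
The max_error_hypot_F parameter is required to be in the interval [0.5, 1].
The hypot*_F approximation helper function:
\[
\mathit{hypot}^{*}_F : F \times F \to \mathbb{R}
\]
hypot*_F(x, y) returns a close approximation to \(\sqrt{x^2 + y^2}\) in R, with maximum error max_error_hypot_F.
Further requirements on the hypot*_F approximation helper function:
\[
\begin{array}{ll}
\mathit{hypot}^{*}_F(x, y) = \mathit{hypot}^{*}_F(y, x) & \\
\mathit{hypot}^{*}_F(-x, y) = \mathit{hypot}^{*}_F(x, y) & \\
\mathit{hypot}^{*}_F(x, y) \geq \max\{|x|, |y|\} & \\
\mathit{hypot}^{*}_F(x, y) \leq |x| + |y| & \\
\mathit{hypot}^{*}_F(x, y) \geq 1 & \text{if } \sqrt{x^2 + y^2} \geq 1 \\
\mathit{hypot}^{*}_F(x, y) \leq 1 & \text{if } \sqrt{x^2 + y^2} \leq 1
\end{array}
\]
The hypot_F operation:
\[
\mathit{hypot}_F : F \times F \to F \cup \{\mathbf{underflow}, \mathbf{floating\_overflow}\}
\]
\[
\mathit{hypot}_F(x, y) =
\begin{cases}
\mathit{trans\_result}_F(\mathit{hypot}^{*}_F(x, y)) & \text{if } x, y \in F \\
\mathit{hypot}_F(0, y) & \text{if } x = -0 \text{ and } y \in F \cup \{-\infty, -0, +\infty\} \\
\mathit{hypot}_F(x, 0) & \text{if } y = -0 \text{ and } x \in F \cup \{-\infty, +\infty\} \\
+\infty & \text{if } x \in \{-\infty, +\infty\} \text{ and } y \in F \cup \{-\infty, +\infty\} \\
+\infty & \text{if } y \in \{-\infty, +\infty\} \text{ and } x \in F \\
\mathbf{qNaN} & \text{if } x \text{ is a quiet NaN and } y \text{ is not a signalling NaN} \\
\mathbf{qNaN} & \text{if } y \text{ is a quiet NaN and } x \text{ is not a signalling NaN} \\
\mathbf{invalid}(\mathbf{qNaN}) & \text{if } x \text{ is a signalling NaN or } y \text{ is a signalling NaN}
\end{cases}
\]
5.3.3 Operations for exponentiations and logarithms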
There are two maximum error parameters for approximate exponentiations and logarithms:
\[
\mathit{max\_error\_exp}_F \in F
\]
\[
\mathit{max\_error\_power}_F \in F
\]
The max_error_exp_F parameter is required to be in the interval [0.5, 1.5 · rnd_error_F].
The max_error_power_F parameter is required to be in the interval [max_error_exp_F, 2
The