The mathematician Leopold Kronecker is believed to have said:
God made the integers, all else is the work of man.
And Kronecker didn’t even know the floating-point numbers “made” for computers. Comparing two numbers of type float or double in C++ and related languages is a regular source of errors. When are the if-conditions in the following code snippet true?
float actual = calculate(10.0f, 0.2f, 45); // 10 - 0.2 * 45 = 1
if (actual == 1.0f) {
// is this ever executed?
}
if (actual - 1.0f == 0.0f) {
// is this ever executed?
}
The function calculate produces the result 10 – 0.2 * 45 = 10 – 9 = 1 – at least for human beings. So, both conditions should be true. However, they are both false.
If we turn on the warning -Wfloat-equal, the compiler will warn us that comparing floating-point values with == or != is unsafe. The compiler tells us what is wrong, but not how to fix it. I’ll explain why the above “solution” is wrong, why another obvious “solution” is nearly as bad, and why even the Qt function qFuzzyCompare can still be improved.
The Setup
The function calculate produces the result start - decrement * count. The important thing is that each of the subtractions may introduce a rounding error.
float calculate(float start, float decrement, int count)
{
for (int i = 0; i < count; ++i)
start -= decrement;
return start;
}
The function testEquality calls calculate(x, 0.2f, 45) for every value x from 9.0 to 1008.9 in steps of 0.1 and expects actual = x - 9 as the result. It counts how many times the if-condition is true. I’ll try different implementations of the floatCompare function in the next sections.
void testEquality()
{
const auto total = 10000;
auto count = 0;
for (auto i = 0; i < total; ++i)
{
auto expected = (i / 10.0f);
auto actual = calculate(9.0f + expected, 0.2f, 45);
if (floatCompare(actual, expected))
++count;
}
QCOMPARE(count, total);
}
The Built-In Equality Operator
The first implementation of floatCompare is the built-in equality operator for floats.
bool floatCompare(float f1, float f2) const
{
return f1 == f2;
}
Running the test case yields a disastrous result. Only 0.09% of the comparisons yielded the correct result!
FAIL! : TestCompareFloats::testEquality()
Compared values are not the same
Actual (count): 9
Expected (10000): 10000
For the expected value 1.0, the situation looks as follows:
expected = 1, actual = 1.000005007, error = 5.006790161e-06
The actual value has its first non-zero digit in the 6th decimal place. The digits after the decimal point are the rounding error, which is in the order of 1e-06 (10 to the power of -6). The calculation of the actual value introduced some rounding error, whereas the expected value didn’t suffer from a rounding error.
Rounding errors occur when a number cannot be represented in a finite way. For example, we can represent the fraction 1/5 as the finite decimal number 0.2, but we cannot represent 1/3 as a finite decimal number. It’s 0.3333… with infinitely many 3s after the decimal point.
In the binary system, only fractions with a power of 2 as their denominator like 1/2, 1/4 and 1/8 have finite representations. Other fractions like 1/5 = 0.2 and 1/10 = 0.1 only have infinite binary representations. Computers must round these numbers to the next number with a finite representation. Hence, the binary representations of the actual value 1.000005007 and expected value 1 differ and the condition is false.
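To see the drift concretely, here is a minimal, Qt-free sketch (plain C++, no Qt required; the function name is mine) that repeats the subtraction from calculate and shows that the result wanders away from the exact value:

```cpp
#include <cassert>
#include <cmath>

// 0.2 has no finite binary representation, so 0.2f is only the float
// closest to 0.2. Each of the 45 subtractions may round, and the
// individual rounding errors accumulate.
float subtractRepeatedly(float start, float decrement, int count)
{
    for (int i = 0; i < count; ++i)
        start -= decrement;
    return start;
}
```

On typical IEEE 754 hardware, subtractRepeatedly(10.0f, 0.2f, 45) lands close to 1.0f but not exactly on it.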
Comparisons with Fixed Epsilons
At this point, we probably have some fond memories of our undergraduate course in numerical mathematics. When we check whether actual is equal to 1.0, we actually want to check whether actual is in a small range around 1.0. In general, we check whether actual is in the range from expected - epsilon to expected + epsilon. epsilon is the rounding error, which is a small number. What is a good value for epsilon?
The C++ header <limits> provides the constant std::numeric_limits<float>::epsilon() for this purpose. The “standard” epsilon has the value 1.192092896e-07. This is too small for the problem at hand, where the error is roughly 5.0e-06. Let’s do the simplest thing that could work and use 1.0e-05 for epsilon. The equality function looks like this.
bool floatCompare(float f1, float f2) const
{
return qAbs(f1 - f2) <= 1.0e-05;
}
Running the test case yields another surprising result.
FAIL! : TestCompareFloats::testEquality()
Compared values are not the same
Actual (count): 126
Expected (10000): 10000
A meagre 1.26% of all comparisons produced the correct result. That’s still bad, but a lot better than before. The obvious thing is to increase epsilon. Here are the results.
1.0e-06 1.0e-05 1.0e-04 1.0e-03
10 126 648 10000
For once, this worked as expected. But it doesn’t give us an idea how to improve the comparison function. Let’s look at the successful comparisons for epsilon 1.0e-05. All the comparisons from 0.0 to 11.3 were successful. All the other comparisons failed, except for the 14 comparisons from 56.3 to 57.3 and from 248.8 to 249.2.
For epsilon 1.0e-04, the good range is from 0.0 to 62.2. We see the same behaviour for epsilon 1.0e-03, if we set total to 100,000. Then, the condition is only true 10,219 times (roughly 10.2%). If the compared numbers are greater than 16,777,216, we have to use an epsilon greater than 1. It seems that we must choose a bigger epsilon the bigger the absolute values of the compared numbers are.
Let us look at small numbers closer and closer to 0. If the compared numbers are smaller than epsilon, say 1.0e-05, their difference must be smaller than epsilon as well. The comparison will always yield true. It seems that we must choose a smaller epsilon the smaller the absolute values of the compared numbers are.
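Both observations can be reproduced with a short, Qt-free sketch of the fixed-epsilon check (std::fabs in place of qAbs; the sample values are mine, not taken from the test above): for big numbers, even neighbouring floats fail the check, and for numbers smaller than epsilon, clearly different values pass it.

```cpp
#include <cassert>
#include <cmath>

// A fixed epsilon ignores the magnitude of the compared numbers.
bool fixedEpsilonCompare(float f1, float f2)
{
    return std::fabs(f1 - f2) <= 1.0e-5f;
}
```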
We can summarise our observations as: Epsilon must adapt to the absolute values of the compared numbers. Using adaptive epsilons for comparing floats is the topic of the next section.
Comparisons with Adaptive Epsilons
Using adaptive epsilons seems to be a good idea. Should epsilon change proportionally to the absolute value of the bigger number or of the smaller number? We choose the bigger number. Then, the implementation of floatCompare looks like this:
bool floatCompare(float f1, float f2) const
{
return qAbs(f1 - f2) <= 1.0e-5f * qMax(qAbs(f1), qAbs(f2));
}
If we used qMin instead of qMax, epsilon would be smaller. The function floatCompare would return true for a smaller range of numbers. The range for qMin would be included in the range for qMax.
The right-hand side of operator<= is the adaptive epsilon. The adaptive epsilon grows if the maximum of the compared numbers grows. It shrinks if the maximum shrinks. So, this version of floatCompare should do better than the previous versions. Running the test bears this out.
FAIL! : TestCompareFloats::testEquality()
Compared values are not the same
Actual (count): 9995
Expected (total): 10000
99.95% of the comparisons yield the correct result. The five wrong comparisons are:
expected actual error
0 4.082918167e-06 4.082918167e-06
0.1000000015 0.1000044793 4.477798939e-06
0.200000003 0.2000040859 4.082918167e-06
0.3000000119 0.3000044823 4.470348358e-06
0.400000006 0.4000040889 4.082918167e-06
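All five failures are comparisons with numbers at or near 0. There, the adaptive epsilon itself shrinks towards 0, so even a tiny rounding error exceeds it. A Qt-free sketch (std::fabs and std::max in place of qAbs and qMax) shows the effect:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Purely adaptive epsilon: it shrinks with the magnitudes of the inputs
// and collapses to 0 when one of them is 0.
bool adaptiveCompare(float f1, float f2)
{
    return std::fabs(f1 - f2)
        <= 1.0e-5f * std::max(std::fabs(f1), std::fabs(f2));
}
```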
The Qt function qFuzzyCompare uses an adaptive epsilon as well. It uses qMin instead of qMax and multiplies the inequality by the inverse of epsilon.
static inline bool qFuzzyCompare(float p1, float p2)
{
return (qAbs(p1 - p2) * 100000.f <=
qMin(qAbs(p1), qAbs(p2)));
}
Not surprisingly, qFuzzyCompare yields the same result as our latest version of floatCompare – with the same five wrong comparisons.
Improving qFuzzyCompare
I am sure that the five wrong comparisons left over from the last section bother you as much as they bother me. If we used the fixed epsilon 1.0e-05 for the bad five, they would turn good. And, that’s the idea for a pragmatic solution. If two numbers are very close and, hence, the absolute value of their difference is smaller than epsilon, then these two numbers should be considered equal.
bool floatCompare(float f1, float f2) const
{
if (qAbs(f1 - f2) <= 1.0e-5f)
return true;
return qAbs(f1 - f2) <= 1.0e-5f * qMax(qAbs(f1), qAbs(f2));
}
Running the test case yields a 100% score! floatCompare computed all comparisons correctly – including a comparison with 0.
The documentation of qFuzzyCompare advises us to add 1.0f to the compared numbers or to turn to qFuzzyIsNull, if one of the compared numbers is 0, NaN or infinity. This is a bit tedious, often hard to apply and easy to forget. The next test confirms the behaviour of floatCompare, qFuzzyIsNull and qFuzzyCompare, if the expected value is 0.
void testIsZero()
{
auto actual = calculate(9.0f, 0.2f, 45);
QVERIFY(floatCompare(actual, 0.0f)); // PASS
QVERIFY(qFuzzyIsNull(actual)); // PASS
QVERIFY(qFuzzyCompare(actual, 0.0f)); // FAIL
}
We can implement floatCompare using qFuzzyCompare and qFuzzyIsNull.
bool qFloatCompare(float f1, float f2) const
{
if (qFuzzyIsNull(qAbs(f1 - f2)))
return true;
return qFuzzyCompare(f1, f2);
}
Conclusion
Do not under any circumstances use == or != to compare two float or double numbers. The comparison result will almost never be what you expect.
Do not use a fixed-epsilon check like qAbs(f1 - f2) <= 1.0e-05f. The results are only marginally better than with == or !=.
Use the adaptive-epsilon check of qFuzzyCompare with a lot of caution. We saw a very simple example where qFuzzyCompare produced wrong comparison results – and not just comparisons with 0.
Use this more robust implementation for comparing floating-point numbers.
bool floatCompare(float f1, float f2) const
{
static constexpr auto epsilon = 1.0e-05f;
if (qAbs(f1 - f2) <= epsilon)
return true;
return qAbs(f1 - f2) <= epsilon * qMax(qAbs(f1), qAbs(f2));
}
It is not entirely fool-proof, but it is much more reliable than qFuzzyCompare. We can improve its reliability even more, if we adapt epsilon to the accuracy requirements of our specific application. If two decimal places are good enough, then we should change epsilon to 0.01.
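For applications with explicit accuracy requirements, it may help to make epsilon a parameter. This is a Qt-free sketch of that idea (the function name and the default value are my choice, not part of Qt):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Like floatCompare above, but with the tolerance as a parameter so it
// can match the application's accuracy requirements.
bool floatCompareWithin(float f1, float f2, float epsilon = 1.0e-5f)
{
    // Near 0, the absolute check keeps the comparison from collapsing.
    if (std::fabs(f1 - f2) <= epsilon)
        return true;
    // Away from 0, epsilon scales with the bigger magnitude.
    return std::fabs(f1 - f2)
        <= epsilon * std::max(std::fabs(f1), std::fabs(f2));
}
```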
The best solution is not to use floating-point numbers at all. Fixed-point numbers are an excellent alternative. A fixed-point number is roughly an integer with a scaling factor (e.g., 100 for 2 decimal places). Comparisons are integer comparisons. Interestingly, machines use fixed-point numbers to send their values over networks to other machines, display computers, the cloud or PCs. Unfortunately, the receiving code almost always converts them into float and double.
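As an illustration of the fixed-point idea (a sketch, not a production-ready type; the struct name is mine): store the value as an integer number of hundredths, round once at the boundary, and compare with exact integer equality.

```cpp
#include <cassert>
#include <cstdint>

// Minimal fixed-point sketch with 2 decimal places (scaling factor 100).
struct Fixed2
{
    std::int64_t hundredths; // 1.00 is stored as 100

    // Round to the nearest hundredth once, when the value enters the system.
    static Fixed2 fromDouble(double d)
    {
        const double scaled = d * 100.0;
        return Fixed2{static_cast<std::int64_t>(
            scaled + (scaled < 0.0 ? -0.5 : 0.5))};
    }

    // Equality is exact integer equality - no epsilons needed.
    bool operator==(const Fixed2 &other) const
    {
        return hundredths == other.hundredths;
    }
};
```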
References
I learned a lot from Bruce Dawson’s excellent post Comparing Floating Point Numbers, 2012 Edition. The post is part of a series about floating-point numbers. The links to the other parts of the series are at the beginning of the post. If you want to know more about floating-point numbers, Bruce’s series is the right starting point.
Comments
Most of your code snippets are indented more than necessary. If you remove this “initial indentation”, your code will be easier to read.
Thanks for the hint. Fixed.
A really nice article, comparing floating-point numbers is a problem indeed. I agree totally, and as you already referenced Bruce Dawson’s excellent article on this topic, I would like to add that it is also very good to know about the floating-point model in general.
You are welcome. I see that you have written quite extensively on floating-point numbers on your site as well – including a post about the floating-point model of IEEE 754 numbers. I’ll certainly check out your articles.
There is an omission: “if one of the compared numbers may be 0 or . ” The word after ‘or’ is missing.
Very well spotted and fixed. It should read: “… if one of the compared numbers is 0, NaN or infinity.”
Thanks,
Burkhard
“Let us look at small numbers closer and closer to 0. […] The comparison will always yield true. It seems that we must choose a smaller epsilon the smaller the absolute values of the compared numbers are.”
It seems to me that a check is missing in the final version for this scenario, because the if (qAbs(f1 - f2) <= epsilon) reintroduces exactly this issue. The comparison will always yield true then.
I believe epsilon in your discussion to be appropriate to the mantissa only. Both larger positive and negative exponents I believe will have problems. The epsilon you selected only seems appropriate for values around 1.0. Perhaps epsilon should be picked as “double epsilon = 1.0e-05 * qMax;” with adjustments to the other components of your function.