Color Conversions
This is a quick note on converting byte color values in 0-255 to and from floating-point colors in 0-1.
The Problem
A common (yet incorrect) method (shown in C++) looks like this:
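A minimal sketch of that style of conversion, assuming the usual divide-by-255 and multiply-by-255 pair with the +0.5f rounding trick. The function names are hypothetical, and the demo function only illustrates the half-width end bin discussed below.

```cpp
#include <cassert>
#include <cstdint>

// Byte -> float: divide by the largest byte value.
inline float naive_byte_to_float(std::uint8_t b) {
    return b / 255.0f;
}

// Float -> byte: scale by 255, add 0.5f, and truncate.
// No clamping: inputs outside [0,1] are not handled here.
inline std::uint8_t naive_float_to_byte(float f) {
    return static_cast<std::uint8_t>(f * 255.0f + 0.5f);
}

// Byte 0 only receives floats in [0, 0.5/255), about [0, 0.00196),
// while byte 1 receives [0.5/255, 1.5/255), about [0.00196, 0.00588),
// which is twice as wide.
inline void demo_naive_bias() {
    assert(naive_float_to_byte(0.0019f) == 0);
    assert(naive_float_to_byte(0.0021f) == 1);
    assert(naive_float_to_byte(0.0058f) == 1);
    assert(naive_float_to_byte(0.0060f) == 2);
}
```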
This suffers from significant bias away from the end colors 0 and 255, and the +0.5f “trick” to round values does not always work, which can bite you. Some problems with this style of conversion:
- The end values 0 and 255 each get only half as much of the [0,1] range as the other byte values, making this method highly non-uniform. This bias reduces image quality.
- The common method to round float to integer, (int)(floatVal + 0.5), can fail to round correctly. First, casting in C/C++ truncates towards 0, which misrounds negative values (not a problem here, since colors are non-negative). Second, floating-point precision makes some values round incorrectly: applying the above in single precision to the float value 0.49999997f yields not 0, but 1, because 0.49999997f + 0.5f lies exactly halfway between two floats and the tie rounds up to 1.0f. (This value is the predecessor of 0.5f, obtainable via the nextafter and nexttoward routines in <cmath> from C++ 11 on.)
- Values slightly outside the range [0,1], common in routine calculations, cause wrapping instead of clamping. Pre-clamping floats to [0,1] suffices, but it’s better to build the clamp into the conversion, since users often don’t clamp.
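The rounding failure in the list above can be reproduced with a few lines. This is a sketch under the assumption that float arithmetic is evaluated in single precision (as on typical x86-64 and ARM64 builds); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// The +0.5f trick, with the sum explicitly rounded to float before truncation.
inline int round_by_adding_half(float value) {
    const float sum = value + 0.5f;
    return static_cast<int>(sum);
}

// The predecessor of 0.5f, roughly 0.49999997f.
inline float largest_float_below_half() {
    return std::nextafter(0.5f, 0.0f);
}

inline void demo_rounding_failure() {
    const float x = largest_float_below_half();
    assert(x < 0.5f);
    // x + 0.5f is a tie between 1.0f and its predecessor; ties-to-even picks 1.0f.
    assert(round_by_adding_half(x) == 1);
    // std::lround rounds the exact value, so it returns the expected 0.
    assert(std::lround(x) == 0);
    // Truncation towards zero: -1.2f + 0.5f = -0.7f becomes 0, not -1.
    assert(round_by_adding_half(-1.2f) == 0);
}
```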
The solution
To continue, here are some important rules about how numerics work in C/C++ (and related languages). We’ll write the final code explicitly so that it can be ported to other languages correctly.
- Floating point values on most architectures are IEEE 754 format. All floats in this note are such.
- Real numbers are called representable if they can be stored exactly as floating point. Representable values must be expressible as a sum of powers of 2 (and satisfy some other technical relationships). For example, 0.3, 1.9, 9.1 are not representable. 0.5, 0.75, 23.125 are representable.
- Casting from a float to an integer truncates towards zero (this is true in most languages, such as C/C++, C#, Java, etc.)
- Addition, subtraction, multiplication, and division of representable numbers, as guaranteed by the IEEE 754 standard, are computed as if there were infinite precision and then rounded to the limited number of bits in the format. Rounding ties can be broken in one of several ways (the default mode is round-to-nearest, ties-to-even). I’ll be careful to avoid mistakes here.
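The representability and truncation rules above can be checked at compile time. A minimal sketch, assuming C++11 or later and IEEE 754 float and double; survives_float is a hypothetical helper that tests whether the double nearest to a literal can be narrowed to float and widened back without change.

```cpp
// True when narrowing x to float loses nothing, i.e. x is representable as a float.
constexpr bool survives_float(double x) {
    return static_cast<double>(static_cast<float>(x)) == x;
}

// Short sums of powers of 2 are representable.
static_assert(survives_float(0.5) && survives_float(0.75) && survives_float(23.125),
              "0.5, 0.75, and 23.125 are exact in float");

// 0.3, 1.9, and 9.1 have infinite binary expansions, so float cannot hold them.
static_assert(!survives_float(0.3) && !survives_float(1.9) && !survives_float(9.1),
              "0.3, 1.9, and 9.1 are not exact in float");

// Casting to an integer truncates towards zero, for both signs.
static_assert(static_cast<int>(2.9f) == 2 && static_cast<int>(-2.9f) == -2,
              "float-to-int casts truncate towards zero");
```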
Here is a better method of converting colors. First, some desires/requirements:
- Uniform representation: we want floats in [0,1] to be equally likely to end up as 0,1,2,…,255. Since the number of representable real numbers in [0,1] is not a multiple of 256, there must be some bias, but we will require it to be minimal. The method above has the mistake of 0 and 255 getting half as many values as 1,2,…,254. This is bad.
- Tolerance for some numerical error. Since floating-point values have limited precision, computations produce values that are close to, but not quite, the mathematical truth, leading to rounding errors or comparison errors.
To match up the interval [0,1] with the colors {0,1,…,255}, treat the latter as the half-open interval [0,256) and, for a moment, treat the floats as the half-open interval [0,1). This sets aside the single value 1.0f (to be handled later), and now there are nice intervals for each piece:
[0,a), [a,2a), [2a,3a), …, [255a,1)
where a = 1/256. This has the nice property that a is representable, and all the interval endpoints are representable. So there is no error in moving between these, and all are equal sized.
To map colors, let’s map midpoints to midpoints, and explicitly list each operation for careful analysis.
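A sketch of the midpoint mapping with each operation spelled out, assuming C++17 for std::clamp. The intermediate names f1, f2, and f3 follow the analysis below; the function names and the other local names are hypothetical.

```cpp
#include <algorithm>  // std::clamp (C++17)
#include <cmath>      // std::floor
#include <cstdint>

// Byte -> float: send byte b to the midpoint of [b/256, (b+1)/256).
inline float byte_to_float(std::uint8_t b) {
    const float f1 = static_cast<float>(b);  // exact: small integer
    const float f2 = f1 + 0.5f;              // exact: midpoint of [b, b+1)
    const float f3 = f2 / 256.0f;            // exact: division by a power of 2
    return f3;
}

// Float -> byte: scale [0,1] to [0,256] and send each [n, n+1) to n.
// NaN input is not handled in this sketch.
inline std::uint8_t float_to_byte(float f) {
    const float scaled  = f * 256.0f;                         // exact scaling
    const float floored = std::floor(scaled);                 // [n, n+1) -> n
    const float clamped = std::clamp(floored, 0.0f, 255.0f);  // clamp before narrowing
    return static_cast<std::uint8_t>(clamped);
}
```

With this pair, float_to_byte(byte_to_float(b)) returns b for every byte, and 1.0f lands on 255 because the clamp runs before the narrowing cast.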
Let’s analyze:
- f1, f2, and f3 are all exact. This is nice since we can reverse this and get an exact integer back, with zero error. You can even compare the results bitwise as floats, which is usually a bad numerical idea. But here the conversion is perfect (and reversible).
- The other direction needs some care. The idea is to scale the [0,1] interval to the [0,256] interval, and map each interval [n,n+1) to the integer n. Using clamp ensures that slightly out-of-bounds float values (common from computations) and the single value 1.0f do not end up out of bounds. It’s important to clamp while the range is wide, that is, before narrowing to uint8_t. Simply casting the result to uint8_t will incorrectly map 1.0f to the byte value 0 if care is not taken.
- Note that compilers will compile the verbose code above as tightly as a terse version.
Final Code
For a last gain, in C++, we can make the conversions constexpr. Since std::floor is not constexpr until C++ 23, I’ll use another method. The C++ standard guarantees that casting a floating point value to an integral type truncates towards zero, which differs from floor for negative values, but any negative values from numerical errors are fine to truncate towards 0. This leads to the best C/C++ version:
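A sketch of such a constexpr pair, assuming C++14 or later. The names byte_to_float and float_to_byte are hypothetical, and sending NaN to 0 is an added assumption of this sketch, not something stated above.

```cpp
#include <cstdint>

// Byte -> float: midpoint of the byte's interval; every step is exact.
constexpr float byte_to_float(std::uint8_t b) {
    return (static_cast<float>(b) + 0.5f) / 256.0f;
}

// Float -> byte: scale, clamp while still a float, then truncate via the cast.
constexpr std::uint8_t float_to_byte(float f) {
    const float scaled = f * 256.0f;
    // !(scaled > 0.0f) is true for negatives, zero, and NaN, so all of those give 0.
    const float clamped =
        !(scaled > 0.0f) ? 0.0f : (scaled > 255.0f ? 255.0f : scaled);
    return static_cast<std::uint8_t>(clamped);  // truncates towards zero
}

// Round trip of every byte value, usable inside static_assert.
constexpr bool round_trips_all_bytes() {
    for (int b = 0; b < 256; ++b) {
        if (float_to_byte(byte_to_float(static_cast<std::uint8_t>(b))) != b) {
            return false;
        }
    }
    return true;
}

static_assert(round_trips_all_bytes(), "byte -> float -> byte must be lossless");
static_assert(float_to_byte(0.0f) == 0 && float_to_byte(1.0f) == 255,
              "endpoints map to the end colors");
static_assert(float_to_byte(-0.001f) == 0 && float_to_byte(1.001f) == 255,
              "slightly out-of-range values clamp instead of wrapping");
```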
Making the functions constexpr allows compile-time testing via static_assert.
Also, compilers will likely apply the division-to-multiplication trick, and you can even hand-code it to remove all multiplies and divisions altogether using tricks on the exponent, such as the C++ 11 scalbn-style functions.
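A minimal sketch of the exponent trick, assuming std::scalbn from <cmath>; the function names are hypothetical. Scaling by a power of 2 only adjusts the exponent, so for these inputs the results match the divide and multiply versions exactly.

```cpp
#include <cmath>
#include <cstdint>

// Byte -> float without a division: (b + 0.5) * 2^-8.
inline float byte_to_float_scalbn(std::uint8_t b) {
    return std::scalbn(static_cast<float>(b) + 0.5f, -8);
}

// The scaling step of float -> byte without a multiply: f * 2^8.
inline float scale_to_256_scalbn(float f) {
    return std::scalbn(f, 8);
}
```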
As a final note, for debugging such things, the C++ functions to_chars and from_chars from the header <charconv> are especially useful. They convert floating-point values to and from char arrays, and are the only methods in the C++ library that are guaranteed to round-trip such values correctly, which means you can output a float to text, read it back, and get the same float. Surprisingly, other methods in the standard do not guarantee this, and often fail. And, since it’s C++, even these methods have a significant gotcha: you cannot use them across implementations, since that is not guaranteed to round-trip!
The other functions mentioned above, which manipulate float internals and find nearby floats, are useful for learning and testing.
As a final test, let’s run all possible float32s through the above routines and check that the ranges of floats mapping to each byte are the same size. Note we cannot count how many floats hit each bin and expect those counts to be equal: there are many more representable values near 0 than away from 0. But checking the ranges is a good test.
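A sketch of such a range test, assuming C++14, 32-bit IEEE 754 floats, and a float_to_byte like the one sketched earlier (repeated here so the block stands alone). It walks the bit patterns of the floats in [0,1] only, since everything outside that range clamps to 0 or 255. The names and the stride parameter are hypothetical: a stride of 1 is the exhaustive run, and larger strides give a quick spot check.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Assumed conversion: scale, clamp as a float, truncate.
constexpr std::uint8_t float_to_byte(float f) {
    const float scaled = f * 256.0f;
    const float clamped =
        !(scaled > 0.0f) ? 0.0f : (scaled > 255.0f ? 255.0f : scaled);
    return static_cast<std::uint8_t>(clamped);
}

struct BinRange {
    float lo = 2.0f;   // smallest float seen for this byte
    float hi = -1.0f;  // largest float seen for this byte
};

inline float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Visits the floats in [0,1] in increasing order, every stride-th bit pattern.
inline std::array<BinRange, 256> scan_bins(std::uint32_t stride = 1) {
    if (stride == 0) stride = 1;
    const std::uint32_t one_bits = 0x3F800000u;  // bit pattern of 1.0f
    std::array<BinRange, 256> bins{};
    for (std::uint64_t bits = 0; bits <= one_bits; bits += stride) {
        const float f = float_from_bits(static_cast<std::uint32_t>(bits));
        BinRange& bin = bins[float_to_byte(f)];
        if (f < bin.lo) bin.lo = f;
        if (f > bin.hi) bin.hi = f;
    }
    return bins;
}

// Every byte n must be hit, and only by floats in [n/256, (n+1)/256);
// byte 255 may additionally receive 1.0f.
inline bool bins_stay_in_their_intervals(const std::array<BinRange, 256>& bins) {
    for (int n = 0; n < 256; ++n) {
        const float lower = static_cast<float>(n) / 256.0f;      // exact
        const float upper = static_cast<float>(n + 1) / 256.0f;  // exact
        const BinRange& bin = bins[n];
        if (bin.lo > bin.hi) return false;  // bin never hit
        if (bin.lo < lower) return false;
        if (n < 255 ? !(bin.hi < upper) : !(bin.hi <= 1.0f)) return false;
    }
    return true;
}
```

Calling bins_stay_in_their_intervals(scan_bins()) runs the exhaustive version over roughly a billion floats; a stride such as 4099 still hits every bin and finishes almost instantly.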