The temporary table #Data contains the following:
SELECT * FROM #Data
GO
value
-------
123.456
123.456
123.456
(3 row(s) affected)
Three copies of the same number, right? However:
SELECT DISTINCT value FROM #Data
GO
value
-------
123.456
123.456
123.456
(3 row(s) affected)
We have the exact same result set. How can this be?
It’s because what’s being displayed isn’t necessarily what’s stored internally. This should make it clearer:
SELECT remainder = (value - 123.456) FROM #Data
GO
remainder
----------------------
9.9475983006414E-14
1.4210854715202E-14
0
(3 row(s) affected)
The numbers aren’t all 123.456 exactly; the data is in floating-point format, and two of the values were ever-so-slightly larger. The lesson is: be very careful when using aggregate functions on columns declared as type float.
Some other observations:
- The above will probably be reminiscent to anyone who’s done much text wrangling in SQL. Strings look identical to the eye, but different to SQL Server’s processing engine; you end up having to examine every character, finding and eliminating extraneous tabs (ASCII code 9), carriage returns (ASCII code 13), line-feeds (ASCII code 10), or even weirder.
- If your requirement warrants it, I can thoroughly recommend the GNU Multiple Precision Arithmetic Library, which stores numbers to arbitrary precision. It’s available as libraries for C/C++, and as the R package gmp:
# In R:
> choose(200,50); # This is 200! / (150! 50!)
[1] 4.538584e+47
> library(gmp);
Attaching package: ‘gmp’
> chooseZ(200,50);
Big Integer ('bigz') :
[1] 453858377923246061067441390280868162761998660528
# Dividing numbers:
> as.bigz(123456789012345678901234567890) / as.bigz(9876543210)
Big Rational ('bigq') :
[1] 61728394506172838938859798528 / 4938271605
# ^^ the result is stored as a rational, in canonical form.
Advertisements