# Number Comparison Without Overflow

When dealing with numbers that come from untrusted sources, you should always expect that these numbers may not fit into the integer data type that you are using. Such number overflows may cause subtle bugs or security problems in your code.

I recently wrote code that checked the validity of JSON web tokens. JWTs
may contain a claim (exp

) that specifies the expiry date of the token in
seconds since the epoch (the time 00:00:00 UTC on 1 January 1970). I wrote
that in Perl and it is possible that there are still Perls around that have
an integer size of 32 bits.

But 32 bit integers for the epoch will overflow in January 2038 and then
start again at -2147483648 (2^{31}). On a 32 bit system, the seconds
since the epoch in 2039 will be negative, and so they are considered less
than the current time (if you read this before January 2038).

Let's see how arbitrarily large decimal integers can be compared in C. Like most comparison functions, ours will return a negative value, when the first number is less than the second, a positive one when it is greater, and zero, when both numbers are equal.

## The Naive Approach With `strcmp()`

The function `strcmp()`

compares two strings lexically. And because the
ASCII code of the number `0`

is 48, while that of `9`

is 57, why not just
compare the two numbers as strings:

```
int
compare_integers(const char *num1, const char *num2)
{
return strcmp(num1, num2);
}
```

But that has several issues:

- Integers may have a leading
`+`

or`-`

. - The string
23

is considered greater than1303

because2

is greater than1

as a string. - Integers may have gratuitous leading zeros which leads to more problems.

So that method will not work.

## Checking Whether a String is an Integer

The first step should be to check whether the input strings are really integers. That is easy with regular expressions. In C it is a little bit more complicated:

```
static int
is_integer(const char *ptr)
{
if (ptr[0] == '-') {
++ptr;
} else if (ptr[0] == '+') {
++ptr;
}
if (ptr[0] == '\0') return 0;
while (ptr[0] != '\0') {
if (!isdigit(ptr[0])) return 0;
++ptr;
}
return 1;
}
```

First a possible leading `-`

or `+`

sign is skipped.

In line 10, we check whether the first character after the possible sign is
a null byte which terminates strings in C. If that is the case, the
integer

was either an empty string or a lone `+`

or `-`

without any digits.

The `while`

loop beginning in line 12 checks whether all remaining characters
are decimal digits.

## The Comparison Function

If both input strings are valid interpretations of integers, we can compare them.

The comparison is very simple if one number is positive and the other
negative. So we first check for a possible `+`

or `-`

sign:

```
static int
compare_integers(const char *num1, const char *num2)
{
char sign1 = '+', sign2 = '+';
if (*num1 == '-' || *num1 == '+') {
sign1 = *num1;
++num1;
}
if (*num2 == '-' || *num2 == '+') {
sign2 = *num2;
++num2;
}
/* ... */
}
```

Numbers without a sign are positive. So we initialize the two variables with
a `+`

sign.

If either of the strings starts with a sign, we store it and skip the sign. In C, incrementing a pointer to a string has the effect that the first character is skipped.

```
while (num1[0] == '0') ++num1;
if (!num1[0]) --num1;
while (num2[0] == '0') ++num2;
if (!num2[0]) --num2;
```

Next we skip all leading zeros, again by incrementing the pointer to the
string. But that would turn the string 0

into an empty string. So we
undo the last skip, if the resulting string is empty.

Note: The if clause `if (!num1[0])`

checks, whether the string `num1`

is
empty. If the first character of the string is a null byte, the string is
terminated there, and that means it is empty.

There is one edge case that has to be solved. The numbers -0

and
+0

are equal. We therefore change -0

into +0

before
we start comparing:

```
if (sign1 == '-' && num1[0] == '0' && num1[1] == '\0')
sign1 = '+';
if (sign2 == '-' && num2[0] == '0' && num2[1] == '\0')
sign2 = '+';
```

This can be read as: if the sign is `-`

*and* the first character is a `0`

*and* the second character is the terminating null byte, change the minus
into a plus sign.

Now that everything is normalized, there is an early exit, when the two signs differ:

```
if (sign1 == '-' && sign2 == '+') {
return -1;
}
if (sign1 == '+' && sign2 == '-') {
return -1;
}
```

Before we continue with the comparison, we have to handle the case that both
numbers are negative. 100 is greater than 10 but -100 is *less* than 10.
In other words, if both numbers are negative, the result of the comparison
must be inverted, +1 must become -1 and vice versa. This can be easily
achieved by multiplying the result with -1 later.

So we check whether the numbers are negative:

```
int result_sign = sign1 == '-' ? -1 : +1;
```

There is no need to check that `sign2`

is `-`

because at this point we know
that `sign1`

and `sign2`

are the same.

And now comes the actual trick! Even without converting the strings into real integers (which can cause overflows) it is trivial to decide which of the two strings represent the greater integer. We just compare the lengths of the strings:

```
ssize_t l1 = strlen(num1);
size_t l2 = strlen(num2);
if (l1 > l2) {
return result_sign * +1;
} else if (l1 < l2) {
return result_sign * -1;
}
```

A longer

numbers is always greater than a shorter

number, at least after
a possible leading sign or gratuitous leading zeros have been removed. This
is the case here.

And what if two numbers have equal length? Then you can just use the already
mentioned function `strcmp()`

and do a string comparison. 2304

is
greater than 1303

, both as a string and as a number:

```
return result_sign * strcmp(num1, num2);
```

By the way, since Sunday, September 9th 2001 at 01:46:40 AM UTC Unix timestamps have 10 digits. This will last until Saturday, November 20th 2286 05:46:39 PM UTC, leap seconds ignored. That means that for that period of time it will work to compare Unix timestamps as strings! But you are still a lousy developer, if you do so! 😁

## Floating Point Numbers

Comparing floating point numbers without overflow is left as an exercice to the reader because it is simple. You first compare the part before the dot, and if that is equals you always compare the fractional parts as strings.

## Source Code

The complete source code (download link)
follows below. Save it as `compare-integers.c`

and create a binary
executable `compare-integers`

from it with the command `make compare-integers`

.

You can test it like `./compare-integers 1303 2304`

.

`#include `
#include
#include
#include
#include
static int is_integer(const char *str);
static int compare_integers(const char *num1, const char *num2);
int
main(int argc, char *argv[])
{
if (argc != 3) {
fprintf(stderr, "Usage: %s INTEGER1 INTEGER2\n", argv[0]);
exit(1);
}
if (!is_integer(argv[1])) {
fprintf(stderr, "'%s' is not an integer.\n", argv[1]);
exit(1);
}
if (!is_integer(argv[2])) {
fprintf(stderr, "'%s' is not an integer.\n", argv[2]);
exit(1);
}
int result = compare_integers(argv[1], argv[2]);
if (result > 0) {
printf("%s is greater than %s.\n", argv[1], argv[2]);
} else if (result < 0) {
printf("%s is less than %s.\n", argv[1], argv[2]);
} else {
printf("%s equals %s.\n", argv[1], argv[2]);
}
return 0;
}
static int
compare_integers(const char *num1, const char *num2)
{
char sign1 = '+', sign2 = '+';
if (*num1 == '-' || *num1 == '+') {
sign1 = *num1;
++num1;
}
if (*num2 == '-' || *num2 == '+') {
sign2 = *num2;
++num2;
}
while (num1[0] == '0') ++num1;
if (!num1[0]) --num1;
while (num2[0] == '0') ++num2;
if (!num2[0]) --num2;
if (sign1 == '-' && num1[0] == '0' && num1[1] == '\0')
sign1 = '+';
if (sign2 == '-' && num2[0] == '0' && num2[1] == '\0')
sign2 = '+';
if (sign1 == '-' && sign2 == '+') {
return -1;
}
if (sign1 == '+' && sign2 == '-') {
return -1;
}
int result_sign = sign1 == '-' ? -1 : +1;
size_t l1 = strlen(num1);
size_t l2 = strlen(num2);
if (l1 > l2) {
return result_sign * +1;
} else if (l1 < l2) {
return result_sign * -1;
}
return result_sign * strcmp(num1, num2);
}
static int
is_integer(const char *ptr)
{
if (ptr[0] == '-') {
++ptr;
} else if (ptr[0] == '+') {
++ptr;
}
if (ptr[0] == '\0') return 0;
while (ptr[0] != '\0') {
if (!isdigit(ptr[0])) return 0;
++ptr;
}
return 1;
}

blog comments powered by Disqus