CRC / Checksums Again

Shudda noted that the method of doing additive checksums by look-up table values makes for some pretty dumb, fast code able to compute things like 256 bit checksums. Just concatenate 16 16-bit checksums, each computed using a different set of tables. Or whatever.

Also thinking about computing a rolling checksum or incremental checksum:

I gather the standard method of distinguishing the difference between, for instance, “xy” and “yx” (which a normal checksum considers equal) is to sum the sum in a separate super-sum. … The final checksum is a combination of the sum and super-sum. The super-sum effectively indexes the position of each byte by shifting the byte’s value up in the super-sum as a function of how far back the byte is in the data stream. Pulling the byte’s effect out of the super-sum is easy: subtract the byte’s value times the length of the rolling data block. Let’s presume the byte-value multiple is precomputed.

Off hand, I don’t see anything wrong with wrapping the sums something like this (Warning: vanilla C carry bit kludge ahead.):

byte           b
uint32         sum_of_sums, sum

sum         += table[b]
sum          = (sum         >> 16) + (sum         & 0xffff)
sum_of_sums +=  sum
sum_of_sums  = (sum_of_sums >> 16) + (sum_of_sums & 0xffff)

Yes, in this 16-bit example if you have more than 64k bytes of data you could have a problem, but you’d have a problem if the overflow goes to the bit bucket. And, anyway, you’d probably want to use 31 bits or 32 bits in a real application, giving you a 2-4 gig of data without overflow. Etc.

Fast changes to a CRC / Checksum of a block of data

For obscure reasons having to do with wanting a better way to read text containing lots of familiar boiler plate, I wandered back in to CRC-land.

Thinking about CRCs reminded me of how the spider player kept track of known game states.

Specifically, PLSPIDER simply used a 48-bit checksum to detect equal game states. A checksum of a game state could be the summation of all cards (including the none-card) in all possible card positions in the game. But, since PLSPIDER needed to know this checksum fast, and each game state was very, very close to the previous state (1 or more cards move, but most stay put), PLSPIDER used a different method to checksum the game state.

The difference between PLSPIDER’s checksum and a normal checksum was that instead of adding in each card’s value to the checksum, each card’s value was translated through a table to a 48-bit value. That value was added to the checksum. The extra wrinkle was that each possible card position in the game had a unique look-up table. That way an Ace, say, at the bottom of the first stack would cause a different 48-bit value to be added to the checksum as would be added if the Ace were in the second-from-bottom position of the same stack.

So, in spider there are a few hundred possible positions that may contain a card. Each of those positions had a value-table associated with it. And, each of the values in the tables were unique. In fact, the python script that generated the values simply picked random numbers with 24 bits set and 24 bits cleared (well, honestly, it had a bug – it did 32-on/32-off and stripped the high 16 bits). And the script rejected a value if any one of the 3 16-bit words of the 48-bit value had already been used in the same word in any table entry. No dupes, down to the 16-bit level.

This sort of logic could be used for message checksums. Imagine a table of 256 32-bit values for each possible byte position in a message. 1024 tables, if the messages can only be 1024 bytes long.

Easy to change a byte in the message and recompute the checksum.

cs = cs – table[msg_pos][old_byte] + table[msg_pos][new_byte]

DATAERRS, a program I first wrote in the early ’80s to evaluate checksum/CRC algorithms and code, says that this sort of checksum is superior to a normal checksum (which, incidently, is better than an XOR-sum / longitudinal parity).

In fact, DATAERRS finds this sort of checksum just as good as a CRC of the same bit-length.