Shudda noted that the method of doing additive checksums by look-up table values makes for some pretty dumb, fast code able to compute things like 256 bit checksums. Just concatenate 16 16-bit checksums, each computed using a different set of tables. Or whatever.

Also thinking about computing a rolling checksum or incremental checksum:

I gather the standard method of distinguishing the difference between, for instance, “xy” and “yx” (which a normal checksum considers equal) is to sum the sum in a separate super-sum. … The final checksum is a combination of the sum and super-sum. The super-sum effectively indexes the position of each byte by shifting the byte’s value up in the super-sum as a function of how far back the byte is in the data stream. Pulling the byte’s effect out of the super-sum is easy: subtract the byte’s value times the length of the rolling data block. Let’s presume the byte-value multiple is precomputed.

Off hand, I don’t see anything wrong with wrapping the sums something like this (Warning: vanilla C carry bit kludge ahead.):

byte b uint32 sum_of_sums, sum sum += table[b] sum = (sum >> 16) + (sum & 0xffff) sum_of_sums += sum sum_of_sums = (sum_of_sums >> 16) + (sum_of_sums & 0xffff)

Yes, in this 16-bit example if you have more than 64k bytes of data you could have a problem, but you’d have a problem if the overflow goes to the bit bucket. And, anyway, you’d probably want to use 31 bits or 32 bits in a real application, giving you a 2-4 gig of data without overflow. Etc.