How can data be split among three sites such that any two of the site’s data is enough to reconstruct the original data?
This question led to an hours-long web search for key words I can never remember, “Reed-Solomon.” And a lot of useless information.
Time to think for myself.
Which was fun.
I figured out quickly an easy way to solve the problem is to store 2 bits in each of the three sites for every 3 original data bits. Of the 3 data bits, site A has bits 1 and 2. Site B has bits 2 and 3. And site C has bits 1 and 3. Any pair of sites has all 3 bits between them.
This didn’t look optimal – 6 bits stored for every 3 bits of data. Sure, it is only double the original data size, but the two-of-three logic doesn’t really buy any extra reliability over two mirrors of the data. Maybe some obfuscation of the data at each site, and maybe it’s harder for a bad guy to get his hands on two sites’ data than any one, but that’s it.
The other thing I figured quickly was that each site would need to store at least 3 bits for every 6 data bits – 9 stored bits, total. It would be asking a lot to recover 6 bits of data from, say, 5 bits of storage from a pair of sites. Or 100 data bits from 99 site bits. Or 1,000,000,000 data bits from 999,999,999 site bits. Yes. very difficult.
But how to do better than 6 bits stored for every 3 data bits?
That stumped me. I didn’t like either the idea of using ungrocked, off-the-shelf code or trying to make sense out of the horrid, academic writings that explain how this sort of thing is officially, correctly done.
And when have I ever been interested in doing things the official way?
So, I figured:
- There would be some actual minimum at or above 50% of the data that each site would need to store. This minimum would probably be some odd transcendental meditation number depending upon pi to the e’th power or some such.
- The simple data recover operation I already knew for each 3 bits of data has 4 bits from any pair of sites to work with – 2 from each of two sites. 2 of these 4 bits are unique and needed. The other 2 are dupes of each other. We know which bits they are, too, since we know which of the three sites the two recovery sites are! e.g. Sites A and B? Of the 3 data bits, A has bit 1, B has bit 3 and they both have bit 2. Sites B and C? Of the 3 data bits, B has bit 2, C has bit 1 and both have bit 3.
- The thing to do is to use half of each of those duped bits to replace one of those 2, unique data bits.
So, I changed to thinking of the original data being 6 bits rather than 3. No more half-bits. A half-bit is just too much to wrap the mind around. Instead, each site would need 3 or more bits for every 6 bits of data.
For instance, as above, when the data is recovered, each of a pair of sites can contribute 2 bits of unique, required data not at the other site. And each site can have 2 bits that dupe the other site’s. That’s 4 bits, total, at each site. 12 bits of storage for 6 data bits. The dupes are required because if either of a pair of sites didn’t have its copies of the duped bits, things would get difficult when that site is paired up with the third site – the one which does not have a copy of those particular bits.
OK. How to spread those duped bit’s information? How to trade those wasted bits for some of the information contained in the needed, unique 4 bits?
Parity combines parts of bits.
What is parity?
Add a bunch of data bits together. That is, count the ones in a list of ones and zeros. Is the count odd? The parity bit is 1. Even? The parity bit is 0. Effectively, some of the information in each of the bits is “stored” in the parity bit. But it’s all smeared together so you can’t reliably get the original data bits back from the parity bit. Parity loses information. In a way, the parity bit contains parts of each of the data bits. If a parity bit is computed from four data bits, then, in effect, the parity bit has a quarter of each of the four bits’ information. But, a “bit” is, by definition, the minimum quantity of data. So, Mr. Parity is one interesting fellow, what with having a part of each, indivisible data bit.
But how to use parity?
Each site would store 3 parity bits for every 6 bits of original data. Combine the 3 parity bits from two sites and use the 6-bit result to look up the original 6-bit data value in a 64 element translate/lookup table.
But, which bits of the original 6 bits of data would the parity bits each cover?
Since I didn’t know the answer to that question I just wrote down on paper parity masks in an arbitrary pattern. Each of the 9 parity masks defined a parity bit for a different combination of data bits. As an example, site A’s first bit was parity for data bits 1, 2, 3, and 4. Site A’s second parity bit was for data bits 3, 4, 5, and 6. Etc.
Problem: I can’t accurately calculate 9 parity bits for even one 6-bit value, let alone all 64. So, how am I to know if each of the 64 possible data values has a unique parity value for each of the three pairs of sites, AB, AC, and BC?
No problem: God invented computers to replace my clerical ineptitude.
And that’s when I got lucky.
I mistyped two parity bit mask definitions.
And ran the program against all 64 possible values that 6 bits of data can have.
The program said that sites A and B together needed 1 extra bit to disambiguate all their shared parity values. Ditto, sites A and C. But B and C were good to go with just their parity bits! And, not only that, only site A needed the extra bit! It could be used for both the AB and AC combinations.
That meant that site A could store 4 bits and sites B and C could each store 3 bits. 10 bits, total, to recover the original 6 bits. 12 bits was beat on the very first try.
Then I saw a typo.
One of the “parity” bits in site C was a “parity” bit of only one data bit! It did not have the full mask I’d drawn on paper.
Easy to fix.
Oooops. All three sites needed an extra bit.
And, since the code was typed without a lot of thought, I spent a lot of time checking it out. It had been too easy to get the storage requirement down to 10 bits per 6 data bits. Something must be wrong.
Nope. In fact, when I fixed an “x” typo and flipped one of the bits in the mask the typo was in, the program declared that 9 backup bits did the trick. In two tries I was at the true minimum data storage requirement. And all I’d done was type badly.
Pleased, I was.
Rest of the story:
Packing and unpacking 3/6 bit values? Yuck! I made the three sites have 4 parity bits each, covering 8 data bits. That way, the sites’ data would pack evenly in to bytes.
The program had to find the parity masks for the 12 parity bits. Too much work for me even to make the masks in some fancy pattern. After the bugs were out of the program (the key word being “after” – the start of “after” was a looooong time coming, what with my typing and all), my modern, personal super-computer took about a moment and a half to find a gob of perfect parity mask sets.
I fooled around for a while letting the program find parity masks with a minimum or maximum of data bits covered per parity bit. And finding parity masks that were as close to each parity bit covering half of the data bits as possible. And noting that some of the mask bits can be either one or zero and they still work fine.
All fun, but like the slide-out at the bottom of the hill, the thrill was over.
The program to split/join files is two_of_three.py.
It actually works.
Stand back and consider. Isn’t it kind of magical that any two parts of the data can be put together to get the whole? And the two parts are, together, no bigger than the whole.
From an information point of view, notice the sneaky way that information about which site is which is the key to how this whole scheme works so simply. You can’t just take two sites’ 4-bit parity nibbles, make one of them the high nibble and one the low and then do the 256-entry table lookup to find the data byte. No, you have to know which of the sites is the high nibble and which the low. Certainly not a problem. Bad data is pretty easy to spot. Especially encrypted backup data. The program actually stores an extra data byte at the top of the data. The extra byte is used to identify the sites and weakly cross-check that the joining logic uses the same parity tables the splitting logic did. There’s a byte or two at the end of the data, too, to tell whether the file has an even or odd number of bytes in it. The parity nibbles are stored packed in bytes, so half the time there’s an extra nibble.
It turns out that the logic can be done without using table lookup. That’s handy if the lookup tables would be too big for memory. The extra cost of non-table driven is probably parity translations on the sites’ parity bits, themselves. That is, the original data bits would each be a parity bit on pre-calculated subsets of the sites’ parity bits.
Such logic would nicely handle recovery from M sites out N where M time N causes the lookup tables to be too big. The lookup tables are sized at 2 to the power of M times N entries. So, if M is 3 and N is 5, then the tables have 32k, 15 bit entries. When M*N gets up in the 20’s and 30’s, tables get big.
I’ve not yet found a way to quickly generate the parity masks for M and N sizes greater than 3/5 and 6/2. The search program is pretty dumb, though.