Perhaps a larger version placed at crucial nerves near the spinal cord would allow tremors to be eliminated for the entire body.
Which explains the varied outcomes of the two most recent elections.
Because the outcome would be worse if it did.
Now I can keep golf486 and never have to use golf487.
All that will remain is the vomit from people trying out the VR goggles.
A eulogy.
It's a novice mistake for a corporation to violate criminal law. The right way is to pay off a Congressman to get the law changed, while at the same time making it illegal for your competitor so that the government will raid them with machine guns when they try to engage in the behavior that was made legal for you.
Timmy is back and this time he's angry.
A chicken crossed the road to get to the other side.
Lock them in jail and throw away the key.
Realistically speaking, applications aren't going to support the notion of a metaformat like this. If there's demand for this type of redundancy, it would be better achieved by implementing it in the filesystem. And most modern filesystems log their metadata updates anyway, so the likelihood of not being able to reconstruct the file extents is rather low compared with the probability of media-level corruption affecting the user data itself. As for the SBX of hashes not being locatable due to metadata corruption, you can avoid that by applying a header to the SBX blocks themselves.
For archival, perhaps. But then they can't use the original file unless they unpack it first.
I think in practical applications users would prefer just storing the hash. It's just over 6% storage overhead vs 103%.
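To show where figures like these can come from, here is a rough sketch of the arithmetic. It assumes 512-byte blocks, which the comment doesn't state but which is consistent with the numbers quoted, and it ignores the one-off metadata block. The function names are made up for illustration.

```python
HASH_BYTES = 32      # size of a SHA-256 digest
HEADER_BYTES = 16    # per-block SBX header, as discussed in this thread
BLOCK_BYTES = 512    # ASSUMPTION: block size is not stated in the comment


def hash_list_overhead(block_bytes=BLOCK_BYTES):
    """Size of a side file of SHA-256 digests, as a fraction of the original."""
    return HASH_BYTES / block_bytes


def container_overhead(block_bytes=BLOCK_BYTES):
    """Size of a full SBX-style copy (header + payload per block), as a fraction of the original."""
    payload = block_bytes - HEADER_BYTES
    return block_bytes / payload


print(f"hash list: {hash_list_overhead():.2%}")   # 6.25%
print(f"container: {container_overhead():.2%}")   # 103.23%
```

With 4K blocks the hash list would be well under 1% of the original, so the gap only widens.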
Ok, I missed the data layout details at the end of the article. Rather than creating a superset SBX file that holds all the original user data plus 16-byte headers, did you consider just computing the SHA-256 of every block of user data and storing those hashes in the SBX file? You can reconstruct the file during your scan by computing the SHA-256 of every block read and matching it against the list of hashes in the SBX file.
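A minimal sketch of the hash-list idea, to make the suggestion concrete. This is not how SBX works; the function names are hypothetical, and it assumes the file length is a multiple of the block size and that blocks sit at block-aligned offsets in the scanned image.

```python
import hashlib
from typing import Optional


def build_hash_list(data: bytes, block_size: int = 4096) -> list:
    """Return the SHA-256 digest of each block of the original file, in order."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def recover(image: bytes, hash_list: list, block_size: int = 4096) -> Optional[bytes]:
    """Scan a raw image block by block and reassemble the file from matching hashes."""
    # Map each digest to every position it occupies in the file
    # (identical blocks share a digest).
    wanted = {}
    for index, digest in enumerate(hash_list):
        wanted.setdefault(digest, []).append(index)

    found = {}
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        for index in wanted.get(hashlib.sha256(block).digest(), ()):
            found[index] = block

    if len(found) != len(hash_list):
        return None  # at least one block was not found on the medium
    return b"".join(found[i] for i in range(len(hash_list)))
```

The scan never needs filesystem metadata: block order comes from the position of each digest in the hash list, so the blocks can be scattered anywhere in the image.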
So if I'm understanding you correctly, the first 4K block is metadata-only (no user data), then each additional block can contain up to 4096 - 16 bytes of user data (the first 16 bytes of each block are the header).
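Under that reading of the layout, the block count works out as in the sketch below. The function names are hypothetical, and padding the last block out to a full block is an assumption on my part.

```python
import math

HEADER_BYTES = 16  # per-block header, per the layout described above


def sbx_block_count(file_bytes: int, block_bytes: int = 4096) -> int:
    """One metadata-only block plus enough data blocks to hold the file."""
    payload = block_bytes - HEADER_BYTES           # 4080 bytes of user data per 4K block
    data_blocks = math.ceil(file_bytes / payload)
    return 1 + data_blocks


def sbx_size(file_bytes: int, block_bytes: int = 4096) -> int:
    """Total size, ASSUMING the last block is padded out to a full block."""
    return sbx_block_count(file_bytes, block_bytes) * block_bytes


print(sbx_block_count(4080))  # 2: metadata block + one full data block
print(sbx_block_count(4096))  # 3: the last 16 bytes spill into a second data block
print(sbx_size(4096))         # 12288
```

Every block then starts with its own header, so no piece of user data depends on being contiguous with another block.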
The MFT limit is closer to 1K. But if your minimum size is 1K that should be fine.
Next question - for the encoding of the file, you're putting a 16-byte header in front of every blocksize piece of data, correct? If that's the case, and if you're storing the entire block of original data after that prepended header, then how are you ensuring that the spill-over piece of data will be on a contiguous block on the disk? For example, say you're encoding a single 4096-byte file using a 4K blocksize. The SBX-equivalent size would be 4,112 bytes; how are you ensuring the final 16 bytes of data land on a disk block contiguous with the first 4K?
I did a quick read of the code and see that it relies on a magic cookie in the first four bytes of every physical sector to identify a block. This may not work for files small enough to fit entirely within the MFT on NTFS since that data isn't guaranteed to be aligned on a physical sector. There are other filesystems that store small file segments in the metadata structures as well.
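Here's a toy illustration of the alignment concern. The 4-byte cookie value, the 512-byte sector size, and the 56-byte offset are placeholders I picked for the example; they are not taken from the SBX code or from NTFS.

```python
SECTOR = 512
MAGIC = b"SBx\x01"  # HYPOTHETICAL 4-byte cookie, for illustration only


def find_blocks(image: bytes, sector: int = SECTOR) -> list:
    """Return offsets of sectors whose first four bytes match the cookie."""
    return [offset for offset in range(0, len(image), sector)
            if image[offset:offset + 4] == MAGIC]


# Case 1: the block starts exactly on a sector boundary.
aligned = bytearray(4 * SECTOR)
aligned[SECTOR:SECTOR + 4] = MAGIC

# Case 2: the same data stored inside a metadata record, so it starts
# at an arbitrary offset (56 here) within the sector.
resident = bytearray(4 * SECTOR)
resident[SECTOR + 56:SECTOR + 60] = MAGIC

print(find_blocks(bytes(aligned)))   # [512]
print(find_blocks(bytes(resident)))  # []
```

A scanner that only checks sector starts never sees the second case, which is the situation for small files stored inline in filesystem metadata.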
Let's just make it illegal to hire all foreign workers so that every one of our tech companies moves entirely overseas and takes every American tech job with it. Wait, making competition illegal isn't the solution then?
Windows doesn't display the XMP data from executables, at least not without prompting from the user or a Win32 app.
Agreed, but just for a fun comparative reference, the size of an entire Windows 3.1 installation was less than 15 MB :)
CPU caches are set-associative, so having sparse data doesn't hurt CPU performance, especially when the area occupied in memory by XMP would be a single large contiguous block.
There is an asymmetric relationship between the time it takes to access data and the time it takes to transfer it, even with fast SSDs with low command overhead: the per-I/O access cost dominates. So the additional transfer time of 5 MB spread over a large set of files will be a rounding error relative to the total time spent issuing the I/Os.
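A back-of-the-envelope model of that claim, with made-up numbers. The file count, per-I/O latency, and bandwidth below are assumptions chosen only to show the shape of the comparison, not measurements.

```python
def issue_time(n_ios: int, per_io_latency_s: float) -> float:
    """Time spent on per-I/O overhead, independent of how much data moves."""
    return n_ios * per_io_latency_s


def transfer_time(n_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time spent purely moving bytes."""
    return n_bytes / bandwidth_bytes_per_s


N_FILES = 10_000        # ASSUMED: one I/O per file
LATENCY_S = 100e-6      # ASSUMED: 100 microseconds per I/O
BANDWIDTH = 500e6       # ASSUMED: 500 MB/s sequential transfer
EXTRA_BYTES = 5e6       # the 5 MB of additional data from the comment

issuing = issue_time(N_FILES, LATENCY_S)        # about 1 second
extra = transfer_time(EXTRA_BYTES, BANDWIDTH)   # about 10 milliseconds

print(f"issuing I/Os:   {issuing:.3f} s")
print(f"extra transfer: {extra:.3f} s ({extra / issuing:.1%} of issue time)")
```

With these inputs the extra 5 MB adds roughly 1% to the time spent issuing I/Os; different hardware shifts the ratio, but the per-I/O term still grows with the file count while the extra transfer stays fixed.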
They don't need to parse it - it's described as separate sections within the executable.
The Windows executable loader doesn't look at this extraneous XMP data so why would it consume CPU cycles?
The Feds have been investigating her and her company for a long time and here are details of obvious illegal activity (fraud) discovered by a civil investigation during the course of an investor lawsuit.