DANG! Spermicidal ragout... on Google Juice · Score: 1
Silly fingers, press the right keys. You'd think that after this many years ytping, I'd get it right...
Spermicical ragout -- Re:How to Google Whack... on Google Juice · Score: 1
$_ =~ s/q/g/;
I used to work for a company where 10000 files in a single directory was considered
*small*. Try ten or a hundred times that many, and watch EXT(2|3) come to a grinding
halt. Reiser (v3) happily handled these kinds of loads.
Any test can be made to highlight the good/bad given "properly" (ahem) chosen parameters.
- Cost == CPU effort == time == money
- Bulk email mailers cannot afford high cost
- Email that provably cost something to send is likely not bulk
- Higher cost implies the sender believes in (or is targeting) the message
- Verifying the cost must be computationally trivial for the recipient
BACKGROUND
The SHA-512 FIPS standard produces a 512-bit, cryptographically secure (hence collision-resistant) checksum of a piece of data. When compared bit by bit, two random checksums will statistically share 50% of their content; i.e., the expected number of identical bits in a pair of SHA-512 hashes is 256.
In short, it would require a certain amount of EFFORT to find a pair of strings that, when SHA-512 hashed, deviate significantly from sharing 256 bits. This CPU effort is the cost involved in the proposed scheme. The more effort one is willing to make, the more likely one is to find such a pair with a high deviation. However, verification of this deviation by the recipient only involves computing two hashes and counting bits.
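A quick way to sanity-check the 256-bit expectation, as a minimal Python sketch. The helper names (`sha`, `matching_bits`, `average_match`) are made up here for illustration, and random 16-byte inputs stand in for arbitrary strings.

```python
import hashlib
import os


def sha(data: bytes) -> bytes:
    """SHA-512 digest of the input: 64 bytes, i.e. 512 bits."""
    return hashlib.sha512(data).digest()


def matching_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests agree."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    # XOR leaves a 1 wherever the two inputs differ.
    differing = bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")
    return len(a) * 8 - differing


def average_match(samples: int = 2000) -> float:
    """Average number of matching bits over pairs of hashes of random inputs."""
    total = 0
    for _ in range(samples):
        total += matching_bits(sha(os.urandom(16)), sha(os.urandom(16)))
    return total / samples


if __name__ == "__main__":
    print(average_match())  # should land close to 256
```

Individual pairs scatter around 256 by a dozen bits or so; it is the far tails of that scatter that cost CPU time to hit.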
DETAILS
The proposed system would work with older mail clients, requires no cryptographic keys or PKI, and can easily be adjusted for "inflation", i.e., the gradual speedup of CPUs.
The sender of an email precomputes the following (where '.' is the concatenation operator):
X = sha(sha(sender).sha(recipient).sha(date).sha(subject).sha(body))
The sender subsequently produces a random string (alphabetic would be preferred, since the string will ultimately be embedded into the header information of the email being sent), and computes
score = abs(256 - matchingBitsBetween(X, sha(randomString)))
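The two formulas above could be sketched in Python roughly as follows. This is only one reading of the scheme: it assumes the raw 64-byte digests are concatenated (not their hex forms), that the fields are UTF-8 encoded, and the function names are invented for the example.

```python
import hashlib


def sha(data: bytes) -> bytes:
    """SHA-512 digest (64 bytes)."""
    return hashlib.sha512(data).digest()


def matching_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions at which two equal-length digests agree."""
    differing = bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")
    return len(a) * 8 - differing


def message_digest(sender: str, recipient: str, date: str,
                   subject: str, body: str) -> bytes:
    """X = sha(sha(sender).sha(recipient).sha(date).sha(subject).sha(body))."""
    parts = (sender, recipient, date, subject, body)
    # Assumption: '.' means concatenating the raw digests of each field.
    return sha(b"".join(sha(p.encode("utf-8")) for p in parts))


def score(x: bytes, random_string: str) -> int:
    """Deviation from the expected 256 matching bits, in either direction."""
    return abs(256 - matching_bits(x, sha(random_string.encode("utf-8"))))


if __name__ == "__main__":
    x = message_digest("alice@example.org", "bob@example.org",
                       "2004-01-01", "Hello", "Just saying hi.")
    print(score(x, "iwpayzsk"))
```

Because every field goes into X, a token computed for one message is of no use for a different recipient, date, subject or body.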
The sender performs this calculation a number of times with different random strings, keeping track of the string with the highest score. The more CPU time the sender is willing to dedicate to looping and trying out different strings, statistically the higher the best score will be. At some point, the sender decides that either
(a) enough CPU time has been used, OR,
(b) the best score now crosses an acceptability threshold.
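The sender's loop, with both stopping rules (a) and (b), might look something like this in Python. Treat it as a sketch: the 8-letter lowercase token length is only borrowed from the example header, `tries` stands in for "enough CPU time", and `find_token` and `format_header` are hypothetical names.

```python
import hashlib
import random
import string
from typing import Optional, Tuple


def sha(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()


def matching_bits(a: bytes, b: bytes) -> int:
    differing = bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")
    return len(a) * 8 - differing


def score(x: bytes, token: str) -> int:
    return abs(256 - matching_bits(x, sha(token.encode("ascii"))))


def find_token(x: bytes, tries: int, threshold: Optional[int] = None,
               length: int = 8,
               rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Try random alphabetic strings and keep the best-scoring one.

    Stops after `tries` attempts (rule a) or as soon as the best score
    reaches `threshold`, if one is given (rule b).
    """
    rng = rng or random.Random()
    best_token, best_score = "", -1
    for _ in range(tries):
        token = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        s = score(x, token)
        if s > best_score:
            best_token, best_score = token, s
        if threshold is not None and best_score >= threshold:
            break
    return best_token, best_score


def format_header(token: str, token_score: int) -> str:
    """Header line in the style of the example in the text."""
    return f"X-CPU-Token: {token} (+{token_score})"


if __name__ == "__main__":
    x = sha(b"stand-in for the precomputed X")
    print(format_header(*find_token(x, tries=50000)))
```

A pure-Python loop like this is far slower than a C one, so the number of tries needed for a given score says more than the wall-clock time does.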
Now, the email is sent, along with a header item such as:
X-CPU-Token: iwpayzsk (+48)
The recipient, upon opening the email, extracts the sender's email address, date, subject and body information from the message, combines it with their address (which may be pulled from the header, or the email client), and computes:
Y = sha(sha(sender).sha(recipient).sha(date).sha(subject).sha(body))
and
score = abs(256 - matchingBitsBetween(Y, sha(X-CPU-Token)))
This score can then be used for filtering; scores above a user-adjustable threshold could be put into a separate folder.
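On the receiving side, the check comes down to two hashes and a bit count. Here is a rough Python sketch of it; the header parsing assumes the `token (+score)` layout from the example, the claimed score in the header is ignored in favour of the recomputed one, and `parse_token_header` and `check_token` are names invented for this illustration.

```python
import hashlib
import re
from typing import Optional, Tuple


def sha(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()


def matching_bits(a: bytes, b: bytes) -> int:
    differing = bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")
    return len(a) * 8 - differing


def message_digest(sender: str, recipient: str, date: str,
                   subject: str, body: str) -> bytes:
    parts = (sender, recipient, date, subject, body)
    return sha(b"".join(sha(p.encode("utf-8")) for p in parts))


def parse_token_header(value: str) -> Optional[Tuple[str, int]]:
    """Split a header value such as 'iwpayzsk (+48)' into (token, claimed score)."""
    m = re.fullmatch(r"\s*([A-Za-z]+)\s*\(\+(\d+)\)\s*", value)
    return (m.group(1), int(m.group(2))) if m else None


def check_token(header_value: str, sender: str, recipient: str, date: str,
                subject: str, body: str,
                threshold: int) -> Tuple[Optional[int], bool]:
    """Recompute the score and report whether it reaches the threshold."""
    parsed = parse_token_header(header_value)
    if parsed is None:
        return None, False
    token, _claimed = parsed  # the claim is never trusted, only recomputed
    y = message_digest(sender, recipient, date, subject, body)
    actual = abs(256 - matching_bits(y, sha(token.encode("ascii"))))
    return actual, actual >= threshold


if __name__ == "__main__":
    print(check_token("iwpayzsk (+48)", "alice@example.org", "bob@example.org",
                      "2004-01-01", "Hello", "Just saying hi.", threshold=45))
```

What the mail client then does with the boolean (separate folder, flag, and so on) is left to the user's filter rules.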
As CPU power increases, it may become necessary to raise the minimum score of the CPU token threshold. This would be done to thwart bulk mailing agents who find it acceptable to calculate tokens with a deviation of 40 bits, because they consider that an appropriate use of resources. At 45 bits, it would take too much time to calculate the tokens for millions of emails, at least until CPU power increases enough. It would then be recommended that users bump the slider in a dialog box up (to, perhaps, 48 bits).
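To get a feel for how much each extra bit on the slider costs, the expected number of tries can be worked out exactly, if one assumes the 512 hash bits behave like independent fair coin flips (an idealisation, not something the scheme proves). A small Python sketch with hypothetical function names:

```python
from fractions import Fraction
from math import comb


def prob_score_at_least(d: int, bits: int = 512) -> Fraction:
    """Chance that one random token scores at least d.

    The number of matching bits is modelled as Binomial(bits, 1/2), and the
    score is its distance from bits/2 in either direction.
    """
    half = bits // 2
    hits = sum(comb(bits, k) for k in range(bits + 1) if abs(k - half) >= d)
    return Fraction(hits, 2 ** bits)


def expected_tries(d: int, bits: int = 512) -> float:
    """Average number of random tokens needed before one scores at least d."""
    return float(1 / prob_score_at_least(d, bits))


if __name__ == "__main__":
    for d in (40, 45, 48):
        print(d, round(expected_tries(d)))
```

Multiplying the expected tries by the cost of one hash and by the size of a mailing list gives the bulk sender's bill at each threshold.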
RESULTS
A straightforward C implementation of this scheme, running on a 500MHz Pentium 3 machine, is capable of producing tokens with an average score of 48 bits in about a second.
Verifying the token upon receipt occupies a trivial amount of time.
While it doesn't give the exact time spent in a
given function, running 'pstack' against a
process ID under Solaris will give the execution
stack trace of any threads present.
If you find that 80% of your threads are in
slow_function( someParam ) then ya better get to
work fixing it. This also has the added advantage
of not slowing down your program with profiling
code and other hooks.
Obviously this isn't great for fine-grained
profiling, or with applications with few threads,
but I've found it helpful on my larger projects.
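The "80% of your threads" tally is easy to automate once the stack dump has been split into one chunk of text per thread. A minimal Python sketch, assuming that splitting has already been done by hand or by another script; `share_of_threads_in` is a made-up helper, and it does a plain substring match on the function name.

```python
from typing import List


def share_of_threads_in(stacks: List[str], function_name: str) -> float:
    """Fraction of per-thread stack traces that mention the given function.

    `stacks` holds one string per thread, e.g. the text of each thread's
    section from a stack dump. Matching is a simple substring test, so a
    name that is a prefix of other function names will over-count.
    """
    if not stacks:
        return 0.0
    hits = sum(1 for stack in stacks if function_name in stack)
    return hits / len(stacks)


if __name__ == "__main__":
    # Placeholder strings, not real pstack output.
    stacks = ["main worker slow_function", "main worker slow_function",
              "main worker idle_wait", "main worker slow_function",
              "main worker slow_function"]
    print(share_of_threads_in(stacks, "slow_function"))
```

Taking a few dumps some seconds apart and comparing the shares gives a poor man's sampling profile.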
Yum yum, gotta have some of THAT stew! ;)
Folks, folks, that's not what Hello Kitty is all about.
Check out this vibrator !!
I've never looked at Hello Kitty quite the same way...