Dude, they aren't doing it because they can't possibly find a single person who can ask a single technical question. They're doing it because companies are inundated with people looking for technical jobs who have absolutely zero idea what they're talking about, and they have to have some kind of filter to keep their engineers from spending all day interviewing idiots.
On both the occasions I'm writing about, there was no technical interviewer at all, as far as I recall (I forget, this was back in 1996/1997).
Some guy... had some sort of confused concept of a string
If that sort of thing happens I terminate the interview early and send the candidate away. Lying on the CV is the one thing that is absolutely guaranteed to make me reject a job applicant.
When interviewing with someone else (normally these days I interview candidates by myself) we agree a "code word" which signals that the candidate is a no-hoper and we just need to get rid of them quickly. We pick a word that is unlikely to come up in normal conversation but which can be shoehorned in without seeming completely surreal. I've never had to use it for real, as the only real no-hoper I interviewed was on an occasion when I was interviewing alone. This is not surprising; normally the agencies and the HR dept don't pass on completely useless people.
Research in this area is also being conducted by the UK universities of Bath and Liverpool, in collaboration with the Wellcome Trust and Smith & Nephew.
NEWS FLASH: everyone can't hire the top five percent. I'd say a good 99.9% of startups wouldn't know a good tech guy if he rewrote the Linux kernel as a Perl one-liner. This is just a scapegoat for the fact that they have no clue how to hire talented people.
You're so right.
I've had a whole lot of interviews recently, some of them with relatively senior staff. A lot of the time these guys seem to prefer candidates with a noticeable similarity to themselves. I suppose to some extent, this is justifiable - since the interviewer is senior, selecting candidates who exhibit similarities to themselves will cause the company to preferentially recruit people predisposed to advancement in the company. Provided the way the company promotes actually reflects contribution to their corporate success, then the strategy can work pretty well. However, this method does obviously raise the question, what other good candidates are they missing out on?
Most of the interviews I've had were with a pair of interviewers. However, at one company I was interviewed by a larger number of people, but separately. Perhaps that kind of process is less prone to selecting just candidates who are similar to the interviewer.
I've certainly started jobs in the past where I've come into the shop and thought, "Wow. These guys are still rubbing sticks to make fire". In those cases, the arrival of new people can be really good - it can jar the organisation out of a rut and make them more creative. This happens most easily of course with smaller teams.
On previous episodes of job hunting, I've been offered "aptitude tests" at the interview. The kind of thing I'm talking about is where the HR person gives you a printed sheet of C++ exercises. I find those a real turn-off. It means that the company cannot field an interviewer who they can rely on to gauge the technical ability of a candidate. Those question sheets also invariably contain errors. In fact every time this has happened I have done the test anyway, been offered the job, and turned it down.
I recall interviewing at a Belgian financial software company and being asked if I wanted to take the C test or the C++ test (this was in 1996, I think). I said, "both". "But you only have 30 minutes to do the test". "Fine". So I completed both, and circled the errors in the questions, too. When the company contacted me with a job offer I explained that I was not interested in working for them. They came back with an improved offer and explained that they were very keen because I had scored better than anybody else ever had on their C++ test. Of course that made me want to work there even less. If I was the best candidate they'd ever had, everybody already in the company must be worse. Would I want to work in that environment? No thanks. Just imagine the reams of poor code.
Fortunately not all my interview experiences have been like that. In some recent interviews I've been asked questions that have really made me think hard, and we've talked about the issues surrounding the question; that gives me confidence that there are other people in the company who are technically very competent and care about the sorts of things I do (correctness, elegance, performance, maintainability). So I've found a job that I think I'm going to really enjoy doing.
Fast-forward to 2014.
Google offers the most popular network features, the OS, and the applications.
Every time something new comes along Google ties its version of that into its vast array of other services, and people gravitate towards it by default.
How is this different than Microsoft bundling IE?
Motivation. Don't Be Evil, remember? Google really really take that seriously.
Enter Unix, in the guise of SCT Banner. Don't let anyone kid you, it's an ERP, and a big hairy mess.
Banner... wow. I haven't worked with that for years. I worked on a project once to automate interaction with it in order to integrate it with another online system. The automation was via screen-scraping.
The only problem was that the project didn't have the money to buy a screen-scraping product. Enter the propeller-head. It was like a software version of Junkyard Wars. I took a copy of xterm(1) and ripped out all the X11 code. Then I welded it onto the Tcl interpreter. Then I added in a little bit of new functionality modelled after expect(1), except in two dimensions rather than one. Then I (metaphorically) stuck Duct Tape in all the gaps, and we had a screen-scraper.
That took about four weeks. I spent another six weeks or so trying to automate the interaction with just two Banner screens. It was murder. The whole thing was never reliable. That is the problem with screen-scraping; there is usually nobody who can enumerate for you all the possible responses that the program might make. The version of Banner I was using had lots of PL/SQL triggers in it, and so any particular transaction could fail in an almost uncountable number of ways, and each failure would generate either a message on the status line of the screen or a dialog box. Sometimes you even had to scroll the dialog box to see the full error message. It was challenging, but challenging-irritating, not challenging-fun.
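To make the enumeration problem concrete, here is a minimal sketch of the kind of matching a screen-scraper ends up doing. It is not the Tcl/xterm tool described above; the status-line position, the labels and the message patterns are all hypothetical, invented purely for illustration.

```python
import re

# Assumption: the status line is the last row of the captured screen.
STATUS_ROW = -1

# Hypothetical responses; a real application produces far more than these,
# and nobody can hand you the complete list.
KNOWN_RESPONSES = [
    ("committed", re.compile(r"transaction complete", re.IGNORECASE)),
    ("locked", re.compile(r"record is locked", re.IGNORECASE)),
]


def classify_screen(rows: list[str]) -> str:
    """Return a label for the screen, or "unknown" if nothing matches."""
    status = rows[STATUS_ROW] if rows else ""
    for label, pattern in KNOWN_RESPONSES:
        if pattern.search(status):
            return label
    # This is the case that makes screen-scraping unreliable: a response
    # that was never anticipated when the scraper was written.
    return "unknown"


print(classify_screen(["Name: ____", "Transaction complete: 1 record saved"]))
print(classify_screen(["Name: ____", "ORA-something unexpected"]))
```

Every message that falls through to "unknown" needs a human to look at it, which is where the six weeks went.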
For insight into the experience of writing screen scrapers, see this short animation.
One of the problems is that, obviously, exploits can be known by The Bad Guys but not the software maintenance community (i.e. upstream maintainer, Debian package maintainer, Debian security folk). That's obviously bad.
A less obvious but perhaps more frequent problem is where security problems are discovered and announced in upstream packages, but the information doesn't flow down to all the distributions. There's no formalised or automated mechanism by which distribution security teams get alerted to relevant upstream security fixes. You might get discussion of the problem on a mailing list which is specific to the upstream package, but the Debian Security team can't be expected to subscribe to all those lists.
Similarly though, you can't rely on upstream maintainers reliably notifying 19 (or however many) distribution security contacts for each security-relevant release. In the specific case of Debian, this sort of thing is the Debian package maintainer's responsibility. However, there are thousands of Debian packages; some of the maintainers are very responsive and some are less so. Even the responsive ones go on vacation sometimes.
I'm an upstream maintainer. I'm pretty sure that for some of the distributions, nobody has subscribed to the mailing list where security problems would be announced (bug-whatever@gnu.org). In this particular example, Debian isn't one of them - the Debian maintainer in this specific case is very active.
However, having a single point where Linux-relevant security announcements could go would be useful. BUGTRAQ simply isn't it (partly because its mailing list software is somewhat broken, also because of the noise level due to broken out-of-office response programs, and because solving this problem isn't the goal of that mailing list). That way, at least the Debian Security team - among others - could count on being notified reliably about known problems.
Of course then you still have a workload for the security team of analysing problems, deciding on responses and preparing NMUs. That may indeed require more people - I'm not claiming that an aggregated feed of upstream security concerns and fixes solves the whole problem.
I remember earrings made from the cores of Pentium chips with the FDIV bug - the ones Intel recalled. I suppose Intel had to do something with all the recalled CPUs.
This Truck Number tends to imply that all people are contributing equally to a project.
It doesn't as you'll see below...
I've been on projects where there may be 10 folks working on it and the project would be devastated if, say, 2 specific developers were run over by a truck, but could still carry on even if 5 different developers were killed.
The Truck Number for that project would be 2, since the smallest number of developers you'd have to wipe out to derail the project is 2. The importance of the Truck Number is that it pays attention to the worst case scenario. See also MoreFunWithTruckNumbers.
It was for the ability to obtain an "emergency tolerant" skillset.
You could lose a good few staff from any area, and your knowledge base wasn't significantly impacted.
An important metric for any software project is its Truck Number. This is the minimum number of your staff that would need to be hit by a runaway truck hitting a bus queue in order to completely derail your project.
So, if your project truck number is 2, you could afford to lose one member of staff due to a random event (sickness, quitting, etc.) but not two.
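As a rough illustration of the worst-case reasoning, here is a toy model of my own (not a standard definition): list the critical areas of the project and who understands each one, and treat the project as derailed once any area has nobody left. The Truck Number is then the size of the most thinly covered area. The area names and people are made up.

```python
def truck_number(knowledge: dict[str, set[str]]) -> int:
    """Smallest number of people whose loss leaves some area uncovered."""
    if not knowledge:
        raise ValueError("need at least one critical area")
    # Wiping out an area means losing everyone who understands it, so the
    # cheapest way to derail the project is to hit the smallest such group.
    return min(len(people) for people in knowledge.values())


# Hypothetical 10-person project: only two people understand the core.
project = {
    "core engine": {"ann", "bob"},
    "user interface": {"cat", "dan", "eve", "fay", "gus"},
    "build and release": {"hal", "ivy", "jon", "cat", "dan"},
}
print(truck_number(project))  # 2
```

Losing five of the UI people hurts but is survivable in this model; losing ann and bob is not.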
This hardly ever happens, if at all, in the UK. Most police cars on motorways travel at a significant amount (>5mph) below the speed limit. This allows other drivers to overtake them so that the police car doesn't cause congestion on the motorway - since people won't overtake a police car if they have to speed to do it. Once they're safely beyond the police car, they can speed up a bit. The police obviously know this. It's a sensible policy on the police's part.
As for being above the law, my cousin is a police officer. Her boss (also a police officer, obviously) was disciplined for speeding in a police car. The boss is the assistant chief constable of that police force. There must be only about 30 officers of that seniority in the whole of the UK, so it's probably safe to say that the British police are not above the law.
On the other side of this coin, a couple of weeks ago there was a newsworthy court case where a British police officer was prosecuted for speeding, and the court let him off, basically on the grounds that he needed to do what he did.
Might be because we realized that the IPv6 protocol was unnecessary.
Once people were forced to NAT, it suddenly dawned on the great mass of people that workstations shouldn't be getting public IPs for security and management reasons.
You're confusing addressability with reachability. It's right that workstations should not in general be directly reachable from random other points on the internet, but that doesn't mean that this should be done only via NAT. Normal firewalling is the right way to limit reachability.
NAT imposes a number of design constraints and generally makes a lot of complex things even more difficult than they need to be.
For example, I once had to diagnose problems with an FTP transfer between two machines. This would have been easy if it were not for the fact that there were three layers of NAT (two of which translated both source and destination addresses) between the two. These NAT layers were translating the source address of the original DNS query twice, the destination address of the DNS query (three times), the source address of the DNS response packet (three times), the destination address of the DNS response packet (twice), the contents of the DNS response itself (twice), the source (twice) and destination (thrice) addresses of the resulting TCP connection for the FTP control channel, modifying the PORT commands passing over the control channel (twice, I think), and the source (three times) and destination (twice) addresses of the FTP data connection.
Suffice to say that when the FTP transfers weren't working, diagnosing where the problem lay was rather complex, especially as more than one organisation was involved (two of the NAT layers were in one organisation, and the third was in another).
You can't implement NAT fully without performing data changes at the application-level protocol layer (for example FTP PORT commands), and that's evil (in the hackish sense of the word).
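To illustrate why NAT has to reach into the application payload: an active-mode FTP client announces its data-connection endpoint inside a PORT command as six comma-separated decimal bytes (four for the IPv4 address, two for the port, as specified in RFC 959). A NAT box that changes the client's address must rewrite that text too. The sketch below is illustrative only; the function names and the example addresses are made up, and a real NAT helper also has to adjust TCP sequence numbers when the rewritten command changes length.

```python
def parse_port_command(line: str) -> tuple[str, int]:
    """Extract (IPv4 address, port) from an FTP PORT command."""
    verb, _, arg = line.strip().partition(" ")
    if verb.upper() != "PORT":
        raise ValueError("not a PORT command")
    fields = [int(f) for f in arg.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("malformed PORT argument")
    host = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]
    return host, port


def build_port_command(host: str, port: int) -> str:
    """Format (IPv4 address, port) as an FTP PORT command."""
    fields = host.split(".") + [str(port >> 8), str(port & 0xFF)]
    return "PORT " + ",".join(fields)


def nat_rewrite(line: str, public_host: str, public_port: int) -> str:
    """What one NAT layer must do to the control-channel payload."""
    parse_port_command(line)  # validate before replacing
    return build_port_command(public_host, public_port)


original = "PORT 192,168,1,10,19,137"
print(parse_port_command(original))
print(nat_rewrite(original, "203.0.113.5", 40000))
```

With three NAT layers in the path, a rewrite like this can happen more than once, and a mistake at any layer shows up only as a data connection that never arrives.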
Retinal scanning, iris photography, finger prints, hand vein scanning. When will they produce a biometric scanning system which is based on things the bad guys won't cut off or cut out to get into the secure facility?
If you think gas is expensive over there, try here (on Liquid Hydrogen UAV)
Gas is about $2.50 per gallon in the USA at the moment, I think. In the UK it's somewhere around 85 pence a litre, which works out as $5.79 a gallon.
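The conversion can be checked in a few lines. The exchange rate of 1.80 US dollars to the pound is my assumption, chosen because it reproduces the figure quoted; the litres-per-US-gallon constant is the standard definition.

```python
LITRES_PER_US_GALLON = 3.785411784  # exact, by definition of the US gallon


def pence_per_litre_to_usd_per_gallon(pence_per_litre: float,
                                      usd_per_gbp: float) -> float:
    """Convert a UK pump price to US dollars per US gallon."""
    gbp_per_gallon = pence_per_litre / 100 * LITRES_PER_US_GALLON
    return gbp_per_gallon * usd_per_gbp


# 85p a litre at an assumed 1.80 USD/GBP
print(round(pence_per_litre_to_usd_per_gallon(85, 1.80), 2))  # 5.79
```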
With a project as big and important as GCC, you'd think they'd have a server for each platform set up for all their developers to play with. Gentoo has Sparc, MIPS, PPC, etc. boxes for their developers to use for porting software.
I think you have overestimated the level of funding available to the Free Software Foundation, for one thing. Until quite recently, for example, fencepost.gnu.org (the machine that used to be used as the bastion host for the GNU project) was a Commodore Amiga!
The FSF doesn't have the money to manage and administer and maintain a big cluster of disparate machines.
This is occasionally a problem; I have had portability problems in findutils relating to NetBSD and Solaris, for example (and, less significantly, Ultrix and Unicos).
It seems to me that a smart idea would be to have some kind of system where a developer could submit a patch, which would then be sent out to a server farm, where each server would try to compile GCC with the patch, then run a test suite. Doesn't Mozilla do something like this nightly?
Yes, it's called Tinderbox.
This would be a good idea for other projects too. It would probably take a reasonably talented person about a month to set up a farm of machines to do nightly builds of all the GNU project's software. That is, of course, if the hardware infrastructure existed.
However, as other posters have commented, it's not always just accidental portability bugs. It is frequently the case that specific code has to be written to support particular platforms (this has been the case for supporting secure directory tree traversal on Solaris, for example). That extra code is going to be additional effort, and it is not unreasonable to expect that extra coding effort to be invested by those with specific interest in that problem. Sadly, this is not often the case. There are lots of important free software projects that could do with more help.
HP has recently been making the rounds promoting their new company blogging efforts.
[...] So imagine my surprise when I tried to legitimately leave a comment critical of HP at David Gee's HP blog and had my comment quickly erased and my HP passport (required to leave comments) revoked.
If the hash you're talking about is MD5, it has only 128 output bits (as opposed to SHA-1's 160), so you would need at most 2^128 input files to get a 'full set'.
While it may be the case that for some hashed values there are no bit sequences of the right length that produce the given hash result, that won't matter as they'll never be needed (since we're talking about finding bogus data with the same hash as a known file).
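For reference, the digest sizes are easy to check with Python's standard hashlib module, and the size of a 'full set' follows directly from the number of output bits. A small sketch:

```python
import hashlib

for name in ("md5", "sha1", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits} output bits, at most 2^{bits} distinct hash values")

# Upper bound on the number of example blocks needed to cover every MD5 value.
md5_bits = hashlib.md5().digest_size * 8
md5_values = 2 ** md5_bits
print(md5_values)
```

Either way the number is far too large to store, which is why nobody attacks a hash by building the full table.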
So the good news then is that they only need to store a maximum of 50287865403815339155697335390345094011683115395958303487667351851366745611458 example blocks for full coverage...
This is not true. It might work on Kazaa but most other P2P networks use MD5 or better. Okay, they have found collisions but no one has found a way to generate a file for a given key.
Actually they have found a way to find a file that produces a given hash. See this paper and this paper. I think SHA-1 and MD5 are affected. Not sure about MD4.
SHA-256 looks like it's the way to go for now. However, if you are now designing a system which is intended to last a number of years that needs to use hashes to determine if two items are the same, then I would suggest that you use two unrelated hash functions to do the job. This is especially true if anyone else might have an incentive to fool the system.
For example, you send the company a copy of the .mp3 file you want to drive out of circulation. They feed it to a computation cluster and eventually out comes another file which has the same hash. You then publish this new file with the same filename on the victim P2P network and hope that it spreads enough to poison the P2P well, so to speak. There are a number of problems with this scheme (assuming of course that this is the sort of scheme that they offer):
The new 'collision' file might have the same MD5 hash, but is it a valid MP3 file?
All it takes to beat this scheme is for P2P software to use more than one hash function, for example
After all, even though we now know how to find collisions in MD5 and SHA-1 (quite slowly) we don't yet know an efficient way to find a single file that is a hash collision for both of them.
If the company paying the money for the 'collision' file is doing so because somebody has spread their material around the P2P network, then the file must be quite prevalent. So why would they expect the 'collision' file to preferentially spread around the network enough to displace the original file?
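The "more than one hash function" defence is cheap to implement. Here is a minimal sketch using Python's hashlib; the choice of MD5 plus SHA-256 and the function names are mine, purely for illustration, and not taken from any particular P2P client.

```python
import hashlib


def fingerprint(data: bytes) -> tuple[str, str]:
    """Identify content by a pair of digests from two different functions."""
    return (hashlib.md5(data).hexdigest(),
            hashlib.sha256(data).hexdigest())


def looks_identical(a: bytes, b: bytes) -> bool:
    """Treat two blobs as the same only if both digests agree."""
    return fingerprint(a) == fingerprint(b)


original = b"the real file contents"
decoy = b"bogus data engineered against one hash"
print(looks_identical(original, original))
print(looks_identical(original, decoy))
```

A decoy now has to collide under both functions at once, which is a much harder problem than colliding under either one alone.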
Assuming that you have separate documents that include the requirements that you've agreed with the users/client, the system test battery, and how the system will be operated/managed/installed, you should include the following in your high-level design document (hint: high-level design documents include pictures but not pseudocode).
Not all of these things will be appropriate for all systems. This is not a table of contents!
Purpose & Context
why are we doing this?
helps low-level designers (who read the high-level design) to make the inevitable trade-offs in the right way, and also helps them to recognise a gap when they see one
how this fits in with everything else
scope (that is, what are we not trying to do?)
References
list your 'upstream' input documents
sources of extra information
ways to arbitrate conflicting statements or to help drive the process of feeding back reports of defects in this document
what is the baseline? It's important to work from a consistent set of documents: Know what you are trying to build; otherwise you won't know when you haven't managed to build it
Assumptions
Check these! Often the users need to agree to these (but make sure they actually understand the implications!)
System Overview
The Big Picture
what are the interfaces?
what smaller components make up the overall system?
within the components, what are the layers? e.g.
presentation layer
application-specific ("business") logic
data access, protocol handling, etc.
migration to/from (this version of) the system
Explain the end-to-end processing of requests (or whatever)
Main body
perhaps one section per interaction
or per group of web forms,
or per batch process
or per... whatever is appropriate for the things your system does...
include mapping to significant parts of other documents; for example, if you have a database, show how each thing described in your functional spec is stored in the database.
data/business errors - the functional spec demands that these errors must occur when the system gets this input
technical errors - might be fixed by trying them again later
handling environment problems
database went away
pneumatic drill through network cable
no response from remote system
cluster node failure
out of disk space
server obliterated by thunderbolt
...
But not an error catalogue (since you will not yet know the full list of error conditions; don't try to put one in here, otherwise when the code is written people will represent a new kind of error as some weird variant of an old one because the old one is in the error catalogue and the catalogue is carved in stone - bad idea!)
likely failure modes - and how the design prevents things getting worse in each case
Discussion of configurability, scalability, performance issues, user accessibility, compliance with standards, etc.
Security
trust boundary (which things do we trust, what information are we trying to keep safe?)
authentication
authorisation
data integrity
Compliance matrix
Shows how the design meets each agreed functional (or non-functional) requirement; cross-references the functional spec to each section of the design document
Handy when you change the design document - it shows you which functional spec requirement might be affected and hence which of the tests will need to be redone (you do have a test battery, don't you?)
what parts of the design should be included but are not yet complete?
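The compliance matrix item lends itself to a small worked example. The sketch below is one possible representation, not a prescribed format; the requirement IDs, section numbers and test names are all hypothetical. It shows the lookup described above: given a design section that has changed, find the requirements that might be affected and the tests that need to be redone.

```python
# Hypothetical compliance matrix: requirement -> design sections and tests.
MATRIX = {
    "REQ-1": {"sections": {"3.1", "3.2"}, "tests": {"T-01", "T-02"}},
    "REQ-2": {"sections": {"3.2"}, "tests": {"T-03"}},
    "REQ-3": {"sections": {"4.1"}, "tests": {"T-04"}},
}


def affected_by(section: str, matrix: dict) -> tuple[set[str], set[str]]:
    """Return (requirements, tests) touched by a change to one section."""
    reqs = {req for req, row in matrix.items() if section in row["sections"]}
    # Union of the test sets of every affected requirement.
    tests = set().union(*(matrix[req]["tests"] for req in reqs))
    return reqs, tests


reqs, tests = affected_by("3.2", MATRIX)
print(sorted(reqs))
print(sorted(tests))
```

In practice the matrix usually lives in a table in the document itself; the point is only that the cross-reference goes in both directions.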
See relevant web pages from the UK Medical Research Council, the UK Department of Health, the NIBSC and Cambridge University's Stem Cell Institute.
You guys have just named three distinct information sources that could be useful. See what I mean about the necessity of an "aggregator"?
Sounds ideal. Shame it's not more widely publicised. How do upstream maintainers post to it? Where's the archive? Where is it advertised?
The fact that you disagree is no reason to mod articles down; check the moderation rules.