Community Test Data Repository?
BlizzyMadden asks: "Currently I am working on a small utility to convert HTML to plain text. As I test it, I keep creating more and more types of HTML files to regression-test it. I wonder whether the test files I make would be useful to other developers doing similar work. Taking the thought further, I wonder if there is a community-based repository of test data anywhere that developers can use and contribute to. Just curious whether anyone knows of a project website out there that offers this."
"Such a repository would be useful for files like the following:
Complex HTML files.
RTF and Word files with lots of formatting.
Large text files.
Excel files with complex formulas and macros.
Files like these would be great to share, since other developers could use them to debug their own applications."
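To make concrete what such test files would exercise, here is a minimal sketch of an HTML-to-plain-text converter in Python, built on the standard library's `html.parser`. It is not BlizzyMadden's utility; which tags start a new line (`BLOCK_TAGS`) and which have their contents dropped (`SKIP_TAGS`) are assumptions made for illustration, and rules like these are exactly what broken, table-heavy real-world HTML tends to trip up.

```python
from html.parser import HTMLParser

# Assumed set of tags that start a new line in the output (illustrative, not exhaustive).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "table", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose text content should never appear in the output.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Only keep text that is not inside <script> or <style>.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line, then drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text("<p>Hello &amp; <b>world</b></p>")` returns `"Hello & world"`. Note that unclosed `<li>` elements, a common kind of sloppy markup, still come out one per line here because each start tag emits its own line break.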
Another good idea is to pull a couple hundred websites with Wget -r :)
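Once a batch of sites has been mirrored locally with `wget -r`, the downloaded pages still need to be sorted into a usable corpus. A minimal Python sketch, assuming the mirror sits in one local directory and that HTML pages can be recognised by a `.html` or `.htm` extension (pages saved without an extension would be missed): it copies each unique page into a corpus directory named by its SHA-256 hash, so byte-identical copies of the same page are stored only once. The `build_corpus` function and the directory layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}  # assumption: pages are recognised by file extension


def build_corpus(mirror_dir: Path, corpus_dir: Path) -> dict[str, Path]:
    """Copy unique HTML files from mirror_dir into corpus_dir, named by SHA-256.

    Returns a mapping from content hash to the first source file seen with it.
    """
    corpus_dir.mkdir(parents=True, exist_ok=True)
    seen: dict[str, Path] = {}
    # sorted() materialises the listing first, so files copied into corpus_dir
    # are not picked up again even if corpus_dir lives inside mirror_dir.
    for path in sorted(mirror_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in HTML_SUFFIXES:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # byte-identical page already collected
        seen[digest] = path
        shutil.copyfile(path, corpus_dir / f"{digest}{path.suffix.lower()}")
    return seen
```

Hashing the raw bytes only catches exact duplicates; pages that differ by a timestamp or session ID will still be kept separately.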
Of course, Slashdot belongs in the "Broken HTML No-CSS Table Mess" variety of HTML (just as a Crushed Bean No-Froth Dark Latte is a variety of coffee).
Quidquid latine dictum sit, altum videtur
If there isn't a test data project, maybe you could start one. If people agree that it's a good idea, it'll grow... if not...
I believe the idea has merit and should be pursued. It would be useful to the developers of many FOSS applications: a "torture test" of nasty Excel or Word files would help OpenOffice.org and similar suites, and a set of HTML files would be good for the Mozilla team. Maybe those projects would be interested in providing the first few sets of data.
I'd also recommend tying the automated regression tests to this open source test data, so that every developer can download the source and the test data and make sure a new feature doesn't break anything.
Any new troublesome files could be added to the test data and new tests could be built to ensure that the software deals with them.
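One way to tie regression tests to a shared corpus, sketched in Python under an assumed "golden file" convention that the thread does not specify: each input `name.html` sits next to an expected-output file `name.txt`, and the runner reports every input whose actual output differs or whose expected file is missing. The converter is passed in as a plain function, so the same runner works for any tool; `run_golden_tests` and the suffix defaults are hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_golden_tests(corpus_dir: Path, convert: Callable[[str], str],
                     input_suffix: str = ".html",
                     expected_suffix: str = ".txt") -> list[str]:
    """Return a list of failures; an empty list means every corpus file passed."""
    failures = []
    for input_path in sorted(corpus_dir.glob(f"*{input_suffix}")):
        expected_path = input_path.with_suffix(expected_suffix)
        if not expected_path.exists():
            # A new troublesome file was added without its expected output yet.
            failures.append(f"{input_path.name}: missing {expected_path.name}")
            continue
        actual = convert(input_path.read_text(encoding="utf-8"))
        expected = expected_path.read_text(encoding="utf-8")
        if actual != expected:
            failures.append(input_path.name)
    return failures
```

With this layout, adding a newly discovered troublesome file to the test data just means committing the input together with its reviewed expected output; no new test code is required.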
What are you listening to? (http://megamanic.blogetery.com/)
Why the hell is that a troll? In the past I've wanted 100,000 or so mailing addresses to test an indexing routine on, and ended up spending time writing a random address generator. If I had been able to go to a site (like a lorem ipsum generator), ask for 100,000 addresses in CSV format and download them as a zipped file, it would have saved time. I'm sure I'm not the only developer this has happened to. Jeez.
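For comparison, a throwaway random address generator of the kind described can be quite small. This Python sketch writes synthetic mailing addresses as CSV with the standard `csv` module; the street and city lists, the column layout and the function names are hypothetical placeholders, the ZIP codes are random and will not match the cities, and a fixed seed keeps the output reproducible so tests can rely on it.

```python
import csv
import random

# Hypothetical placeholder vocabularies; extend them for more variety.
STREETS = ["Oak", "Maple", "Cedar", "Elm", "Pine", "Birch", "Willow", "Lake"]
SUFFIXES = ["St", "Ave", "Rd", "Ln", "Blvd", "Ct"]
CITIES = [("Springfield", "IL"), ("Riverside", "CA"), ("Franklin", "TN"),
          ("Greenville", "SC"), ("Fairview", "OR")]


def generate_addresses(count: int, seed: int = 0):
    """Yield (number, street, city, state, zip) tuples from a seeded RNG."""
    rng = random.Random(seed)  # private RNG, independent of the global random state
    for _ in range(count):
        city, state = rng.choice(CITIES)
        yield (
            rng.randint(1, 9999),
            f"{rng.choice(STREETS)} {rng.choice(SUFFIXES)}",
            city,
            state,
            f"{rng.randint(0, 99999):05d}",  # random, zero-padded 5-digit ZIP
        )


def addresses_to_csv(count: int, out, seed: int = 0) -> None:
    """Write a header row plus `count` address rows to a text file-like object."""
    writer = csv.writer(out)
    writer.writerow(["number", "street", "city", "state", "zip"])
    writer.writerows(generate_addresses(count, seed))
```

To get the 100,000-row file, open the output with `open("addresses.csv", "w", newline="")` (the `newline=""` setting the csv module documentation recommends) and pass it as `out`, then compress the result.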
Training monkeys for world domination since 1439
The idea of a test data repository is quite interesting but, in practice, useless.
Such a repository would end up as no more than a collection of garbage. Additionally, it is generally not too hard to create test data for most projects, and the chance that someone else has created test data for the exact problem you are working on is quite slim. And then there is always the most important point of them all:
If someone has already created test data for your specific problem, they have probably already solved your problem! Enter repositories like CPAN and SourceForge.
There are 10 types of people in the world. Those who understand binary and those who do not.