Re:When did mediocrity become something to shoot f on Kamikaze Novel Writing · Score: 1
I think that one thing that a lot of people are missing here is that most professional writers don't just throw down 50,000 words of sterling prose. Many professional writers have a habit of treating their writing as a business in which, at the end of the day, they have to produce some product.
In addition, I think one of the things that is killing the arts in the United States is a lack of appreciation for amateurism. Because we think that it is not worth doing anything if you are not the next Claude Monet, Toni Morrison, John Hooker, or Rachel Podger, we have a culture that suffers from pathological stage fright yet is willing to play armchair critic. In the arts, pro-ams provide important services as informed consumers who promote good works by word of mouth, and as fund-raising boosters for art programs.
You have to read the documentation to set it up, but swish-e is an indexing and search system that I've found to be quite effective. It can handle MS Word (with catdoc), PDF (with xpdf) and MP3 meta tags. It's also not very hard to write a script to extract OpenOffice.org documents to stdout. It comes with C and Perl bindings, and there is a Python interface as well.
Re:In the beginning was the command line... on Ask Neal Stephenson · Score: 1
Just to expand on this question and make it more interesting.
If you had the opportunity to write an afterword to Command Line that covers the last 5 years of development, what would be included? How would you re-write the metaphor you used early in the book to contrast Windows, Macintosh, Be and Linux? What do you feel is gained and lost by the Gnome and KDE HIGs, and the addition of a unix shell under OSX?
I'm not so certain I buy his claim that nerds don't market themselves because they focus on substance. I must admit that I wasted no small amount of my meagre grocery-store earnings trying to buy the right clothes, find the right look, listen to the right music. It was not a case that I didn't market myself, it was more a case that marketing is something that I'm completely inept at.
Well, this is interesting, but I'm not convinced that ISO standardization will have much of an impact. ISO's OSI networking protocol lost a standards battle to TCP/IP in spite of the ISO certification playing well with governments. SGML has been an ISO standard for a while, with dialects only recently trickling into common use via XML.
Of course there is the political sniping between many European governments and Microsoft which is a factor.
But it should be noted that just because something is a standard does not mean that everyone will play nice with it. Anybody can subvert the standard through "embrace and extend." Netscape did it with HTML, Microsoft did it with Kerberos, GNU did it with most of the POSIX utilities.
I live in a country with a 300 billion dollar annual PEACETIME military budget, and they can't locate an accidentally dropped nuclear bomb in 12 feet of water to recover it?
*tinfoil hat* it is quite possible that writing it off as unrecoverable was intentional, given the military's habit of dealing with its toxic waste problems with plausible deniability.
Another difference comes to light with the recent Mozilla security vulnerability. Evidently, the bugs were found through a bug bounty and fixed before the exploit was made public.
No moving parts to freeze? Can you say Hard Drive? Keyboard? Power Switch? Last but not least. The user. You got to have some damn heavy mittens for -85C.
The article specifically says that they used solid-state hard drives. The system was operated remotely so I imagine no keyboard was used.
In addition, -85C was only the exterior surface temperature. One computer was installed under the surface with an average operating temperature of -57C. Another experiment was warmed by waste heat from the Stirling engine.
Why are the two in competition? I'm willing to bet that the same projects that distribute solar and human-powered computers to underdeveloped countries are also allied with projects that distribute solar-powered lanterns and projects that develop safe water supplies.
There is also the factor that solar technology is currently cheaper than expanding power grids into rural areas.
Well, if you want to go pedantic about typos, no, it wouldn't return anything because the file extension is .ogg.
True, however the point had very little to do with typos. The point was that it was claimed that searching for files by date was first not possible using hierarchical file systems, and then not easy. My one-line example shows that not only is it possible with existing hierarchical file systems, but it is perhaps even easier than composing an SQL query.
Do that in the root directory and it'll take ten minutes too, whether it returns anything or not, whereas this kind of db thing would be nigh instant (think locate, only with all the filters you can use with find).
Perhaps a bit of a stupid question here: why would the search be directed from the root directory? That is, why wouldn't the person conducting the search limit the search to a subset of the database, rather than search through every record on the disk? An advantage of a hierarchical file system is that it is hierarchical. It is easy to limit an operation to nested subsets of criteria.
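To make the nesting point concrete, here is a minimal sketch rather than anything definitive. The directory names are invented, the tree is built in a throwaway temp directory, and I'm assuming a find that supports -mtime (any POSIX find should).

```shell
# Hypothetical sandbox tree; every path here is invented for illustration.
root=$(mktemp -d)
mkdir -p "$root/music/live" "$root/music/studio" "$root/docs"
touch "$root/music/live/set1.ogg" "$root/music/studio/track1.ogg" "$root/docs/notes.txt"
# Backdate one file to 2004 so the date filter has something to exclude.
touch -t 200401010000 "$root/music/studio/old.ogg"

# Scope 1: the whole music collection, modified within the last 7 days.
recent_music=$(find "$root/music" -name '*.ogg' -mtime -7 | sort)

# Scope 2: one level deeper, only the live recordings.
recent_live=$(find "$root/music/live" -name '*.ogg' -mtime -7)

echo "$recent_music"
echo "$recent_live"
```

Neither search touches docs at all, which is the whole point: the starting directory is the first filter.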
Now of course, you bring up a valid point in that from a user's point of view, find runs slowly. However, the problems with find running slowly have nothing to do with hierarchical file systems, and everything to do with file indexes. There is nothing inherent in the hierarchical file system that makes full-text indexes, date indexes, or keyword indexes difficult.
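As a rough illustration of the index idea (this is roughly what locate does, but the sketch below is my own toy version and not how locate is actually implemented), you pay for the slow directory walk once and answer later queries from a flat list. All paths are made up.

```shell
# Hypothetical sandbox tree.
root=$(mktemp -d)
mkdir -p "$root/music" "$root/docs"
touch "$root/music/a.ogg" "$root/music/b.ogg" "$root/docs/notes.txt"

# Build the index once: one path per line. This is the only full tree walk.
index=$(mktemp)
find "$root" -type f | sort > "$index"

# Later queries read the index file instead of walking the disk again.
ogg_count=$(grep -c '\.ogg$' "$index")
echo "indexed .ogg files: $ogg_count"
```

The index lives beside an ordinary directory tree; nothing about the file system had to change to get it.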
Argh! Don't say things like that - someone will throw down a shell script which WILL do it (probably in one line) combining find, file, grep, perl/python and some other crap.
In this case, all you need is find.
Which, while it may work for some, entirely misses the real point, is that there's no *easy* way to do this.
What is hard about?:
find . -name "*.oog" -mtime -7
It actually seems to be easier given that the expression requires fewer keystrokes, and would return a correct result rather than an SQL error.
We've got GUI bindings for opening/saving/editing files, but nothing for harnessing the power of searching. Don't say it can't be done (because it can somehow) just say that it can't be done easily, which is the bigger point.:)
The need for GUI bindings for searching files is a very different thing from changing the underlying file system. By all means, yes, let's create easy tools that permit users to conduct full-text and metatag searches of file systems.
But, and here is the big issue, there is nothing inherent in hierarchical file systems that makes full-text searching or keyword searching difficult.
Re:Reiserfs, storage and why do you want this? on Database File System · Score: 1
- maintenance overhead on part of the user to create hierachy and maintain it. Every time you save a file you have to think "where do I put this?"
However, for a relational database filesystem to work, every time you save a file, you have to think "what keywords do I want on this file."
The fallacy in this argument is the belief that something magic will happen so that people who have problems using descriptive hierarchical directory names will suddenly use descriptive keywords.
- Finding files can be hard. Is that letter about the planning application in Documents/Letters or Documents/Planning App?
There are a fair number of applications that prove that full-text searching is entirely compatible with traditional hierarchical file systems. (grep, glimpse and swish-e come immediately to mind.)
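For the planning-letter case, a sketch with plain grep (the directory layout and file contents are invented for the example, and -r is a widely available extension rather than strict POSIX):

```shell
# Hypothetical Documents tree with the two candidate folders from the question.
docs=$(mktemp -d)
mkdir -p "$docs/Letters" "$docs/Planning App"
echo "Re: planning application for the extension" > "$docs/Planning App/council.txt"
echo "Happy birthday" > "$docs/Letters/aunt.txt"

# Search by content, case-insensitively, and list only the matching file names.
found=$(grep -rli 'planning application' "$docs")
echo "$found"
```

It does not matter which folder the letter was filed under; the search is on what the file says.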
- keeping files in two or more places at once is hard (as in the previous example). You can use softlinks, but that's hardly ideal and doesn't survive moving things around.
Hardlinks are even better for this purpose. However, these problems don't magically go away with relational database systems. You replace the problem of having a file live in two or more places with the problem of maintaining the file in two or more categories.
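A small sketch of why a hardlink survives the reshuffling that breaks a softlink. The names are invented, and it assumes both directories sit on the same file system, since hardlinks can't cross file systems.

```shell
# Hypothetical layout: one letter that belongs in two places.
work=$(mktemp -d)
mkdir "$work/Letters" "$work/Planning"
echo "Dear council" > "$work/Letters/application.txt"

# Second name for the same file, not a pointer to the first name.
ln "$work/Letters/application.txt" "$work/Planning/application.txt"

# Move things around, then edit through the renamed path.
mv "$work/Letters" "$work/Correspondence"
echo "PS: see attached" >> "$work/Correspondence/application.txt"

# The other name still works and sees the edit.
last_line=$(tail -n 1 "$work/Planning/application.txt")
echo "$last_line"
```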
Why not shift the burden of organising the files onto the enormously powerful computer, rather than take up valuable human mental resources.
Any form of contextual indexing that can be done with files located in a relational database can be done on files in a hierarchical file system.
therefore, doing searches on a relational database filesystem (find me all music files with dates between last week and last month: SELECT * from files WHERE files.type = "music" and files.date NOW() - 7days
you _can't_ do that sort of thing on a traditional filesystem.
Ahem:
find . -name "*.oog" -mtime -7
That's just for the bad SQL you posted (I imagine there must be a missing operator in the where clause between "files.date" and "NOW()".)
A better example of where indexing would be useful is a case where you want to index the content of files. For example "I want a list of all papers authored by me where I've referenced Doe, Doe, and Doe's seminal paper 'Mental Masturbation About the Need for Relational Database Filesystems on Slashdot'". But even then, this is a type of indexing that can be easily accommodated within a hierarchical file system. (For two examples, glimpse and swish-e come immediately to mind.)
To answer your two questions:
some people mentioned here that they already organise their files. great. fantastic.
HOW LONG DID IT TAKE YOU?
A pretty trivial amount of time. Each new project starts with:
mkdir newproject
cd newproject
A maximum of 1 minute, even with my gimpy hands. This is about the same amount of time, and perhaps much less, that I would have to spend adding the keyword "newproject" to every file that gets added to the project.
and how long would it take to reorganise?
Well, it depends on the reorganization. But for example, I reorganized one of my key project directories (version controlled in subversion) with.
(Making two archive "tags" and starting a new branch for the "HEAD" tag.)
A nice thing about hierarchical file systems is that you can reorganize on multiple levels including collections of files. ("proposal" in my case includes about a half-dozen files.) I would argue that it would take me just as much time to recode those files using database keywords as renaming the directories they reside in.
A large part of the argument for relational database filesystems seems to be that the same types of people who are unwilling to do the work necessary to create good hierarchical file trees will be willing to do the work to attach the metadata needed to replace hierarchical file trees. Switching from a hierarchical to a relational model is not going to change the GIGO rule, and the solutions around the GIGO problem (such as full-text searching, and journaling) don't depend on either model.
A hierarchical model also provides some nice facilities for dealing with related collections of files. It is unclear to me how they would be implemented in a relational model, or how a relational model would deal with standard filenames like "README", "CHANGELOG" and "index.html".
And just to add to this, most computer simulations work the same way, whether we are talking about thunderstorms, the collision that broke the moon away from the earth, or even crowds in CGI animated films. Abstraction can certainly be quite useful when what you are interested in are the forces that dominate the system at a specific scale.
How can you accurately simulate the computer that is simulating the entire universe?
The same way you simulate anything else. You simplify the problem down to a manageable number of particles that represent larger units of whatever you are simulating. Since it looks like they are interested in mass and gravity at the galactic supercluster scale, they can use particles that weigh much more than any individual star.
So the fundamental challenge for the Virgo team is to approximate that reality in a way that is both feasible to compute and fine-grained enough to yield useful insights. The Virgo astrophysicists have tackled it by coming up with a representation of that epoch's distribution of matter using 10 billion mass points, many more than any other simulation has ever attempted to use.
THESE DIMENSIONLESS POINTS have no real physical meaning; they are just simulation elements, a way of modeling the universe's matter content. Each point is made up of normal and dark matter in proportion to the best current estimates, having a mass a billion times that of our sun, or 2000 trillion trillion trillion (2 × 10^39) kilograms. (The 10 billion particles together account for only 0.003 percent of the observable universe's total mass, but since the universe is homogeneous on the largest scales, the model is more than enough to be representative of the full extent of the cosmos.)
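The arithmetic in that quote is easy to check. A quick sketch, assuming a solar mass of roughly 2 × 10^30 kg (my round figure, not something from the article):

```shell
summary=$(awk 'BEGIN {
    sun_kg   = 2e30              # assumed solar mass, roughly 2 x 10^30 kg
    point_kg = 1e9 * sun_kg      # one mass point: a billion suns
    total_kg = 1e10 * point_kg   # all 10 billion mass points together
    printf "%.0e %.0e", point_kg, total_kg
}')
# prints: 2e+39 2e+49
echo "$summary"
```

So each "particle" is about 2 × 10^39 kg, and the whole simulation carries about 2 × 10^49 kg, which is why no individual star needs to be represented.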
1: The media does what it does every day in regards to astronomy, inflates every story that might possibly have something to do with space aliens out of proportion? (With the result that both the original claims and more nuanced followups become exaggerated.)
2: The MIB.
Never attribute to conspiracy what can be easily described as sheer incompetence.
The implications are largely academic for now. What seems to have been discovered is that there are some shortcuts in MD5 and SHA0 that make these algorithms less robust than previously expected. The existence of one "hole" in an encryption or hash algorithm suggests that there might be more waiting to be discovered. This might take 1 year or it might never happen.
So, right now the primary risk is that someday we wake up to find that enough holes have been discovered to compromise preimage resistance. If this happens, it might be easier (but probably still horrendously expensive) to perform certain kinds of attacks on digital signatures and password files.
Another risk is to find out that these algorithms have an undiscovered bias in their output. MD5 and SHA are sometimes used to generate keys for encryption algorithms. If a bias is discovered, then there might be a possible (but probably still horrendously expensive) attack on some encrypted data.
Again, we are talking about stuff that could happen tomorrow, or could never happen. The consensus I'm reading is that the odds of a worse break in MD5 being discovered have just gone up by a significant level. The odds are not big enough to justify another Y2K panic (partly because quite a bit of software has already made the transition). However, where possible it is prudent to pick SHA1 over MD5.
That's a strong and interesting point. I have to ask, though, did it really change our lifestyle more than the internet? In the space of 10 years, I've witnessed a very strong change in how we live our lives. (At least where I lived, I happily concede I don't have a strong global perspective, as such I'm open to reason.) Yes, the fight against disease is very important and has had an effect on our lives. But, from my limited views, it means less fear of getting sick. It seems pale in comparison to how much each individual networks to talk with others, and in how we get information. I mean, the very fact we're having this discussion on Slashdot. Who'd have expected that 10 years ago? Anybody outside a university?
I would argue that yes, advances in medicine over the 20th century are the innovation that has had the single biggest impact on our lives so far, and will be a driving factor behind politics and public policy that makes the stuff we talk about here on Slashdot look trivial in comparison. I mean, think about it:
50 years ago, retirement age, pensions and social security were based on assumptions about lifespan and productivity after the age of 65. Now we are looking at a population in America and Europe that can expect to live 20, 30, 40 years after quitting the work force.
The bad side is that Western countries are facing a crisis in their social support networks. People who currently collect pensions or government support make up a sizable voting bloc that will get even larger over the next 10 years.
The good side is that we have greater potential for tapping into human expertise and extended intellectual productivity.
More than just not being worried about getting sick, it means that children of the 20th century are much less likely to have seen a sibling or parent die than any generation in history. We are talking basic demographics here.
Actually, there is considerable evidence that the internet is an evolutionary change rather than a revolutionary one. The same dynamics of gender, ethnicity, class and power are played out in chatrooms and bulletin boards that were already played out in breakrooms and diners. I really don't consider myself more informed in 2004 than I was as a newspaper addict in 1984. In a large part, the internet just fills in a social networking gap that was created by car culture.
Television is the only thing I see on that list that could qualify with your statement. Everything else, though significant, is not in the same league. There are a LOT of people on this planet that if you were to send them back 100 years in the past, the net would be the thing they most ache for. (Unless they had polio:P)
I think what you are missing here is that the Internet is still primarily the playground of an elite part of the world population. The list certainly is flawed in many respects; air conditioning, for example, was developed in the 19th century. However, I would argue that widespread refrigeration has had a huge impact, transforming the agricultural market on a global scale.
The polio vaccine on its own is not very impressive, but when you consider the polio vaccine as part of a scientific war on disease, you come up with something that is quite a bit more transformative than the internet. Several of the deadliest killers of the last century are now either extinct in the wild or controlled leading to increased lifespans, decreased infant mortality and massive demographic shifts around the world.
The green revolution of the 50s, 60s and 70s fundamentally transformed how food is produced and distributed in ways that affect a majority of the people on this planet.
Think about it this way: how many people do you know who don't use the internet on a regular basis? Now how many people do you know who have never had an antibiotic, never had a vaccination, don't buy fresh tomatoes in February, don't use products that were packaged in plastic (another invention more revolutionary than the internet to date), don't use anything that includes a transistor?
Certainly, I think that we do need intelligent searches, however, I think many of the arguments for those searches are dramatically overstated as is the contrast with existing filesystem models.
Why should I remember the filepath to a file (or in my case wich computer). That is not how I pick a book from the shelf is it. I don't need to remember the exactly title of the linux o'reilly guide. I can find it very fast by the general size and color even feel and the fact it is most likely near my desk.
Actually, I (and most people who have lots of books in my experience) do have some sort of a filing system. Glancing for look and feel is good when you have a few dozen books, but when you have a few hundred books, it becomes a bit unwieldy. Unfortunately, the current state of my life is that I can generally find a file I'm working on quite a bit quicker than a book I need to find a reference to.
The ultimate idea is for you to instantly be able to find what you want without having to remember weird filenames and paths.
You should not be using weird filenames and paths anyway now that most operating systems allow verbose filenames.
But it is going to very hard to do. All the databases I seen work on the principle: crap in crap out. The trick is not in creating a database file system. The trick is in writing code that can insert content into the database and get meaningfull info on it.
Which is one reason why I'm not convinced that more metadata will change much of anything. I'm also not convinced that this requires a new filesystem design.
Collision avoidance is one, the other is preimage resistance (the difficulty creating an input to the function that produces a known output.)
Whoops, didn't describe that well.
It is easy to produce "33ab5639bfd8e7b95eb1d8d0b87781d4ffea4d5d" if you know that the input is "Hello world". What is still unknown is if there are shortcuts that permit us to (more) quickly find a solution to sha1(x) = "33ab5639..." This solution does not necessarily need to be "Hello world."
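To show the asymmetry, here is a toy sketch only. It attacks just the first 2 hex digits of a digest, which is nothing like a real preimage attack on the full 160 bits. The sha1hex helper and the "guess-N" candidates are my own inventions, and the digest quoted above may correspond to the input with a trailing newline (as echo would add), which is what the sketch hashes.

```shell
# Helper (hypothetical name): SHA-1 hex digest of stdin.
# Uses sha1sum where available, otherwise falls back to shasum.
sha1hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d ' ' -f 1
    else
        shasum -a 1 | cut -d ' ' -f 1
    fi
}

# Forward direction: cheap. The trailing newline is part of the input.
digest=$(printf 'Hello world\n' | sha1hex)
echo "$digest"

# Backward direction, toy version: find ANY input whose digest starts
# with the same 2 hex digits. Brute force needs about 256 tries on average.
target=$(printf '%s' "$digest" | cut -c 1-2)
i=0
while [ "$i" -lt 100000 ]; do
    candidate="guess-$i"
    prefix=$(printf '%s' "$candidate" | sha1hex | cut -c 1-2)
    [ "$prefix" = "$target" ] && break
    i=$((i + 1))
done
echo "matched prefix $target with input $candidate after $((i + 1)) tries"
```

Each extra hex digit multiplies the brute-force work by 16; a "shortcut" in the sense above would be anything that beats that growth. Note that the input found is not "Hello world", and it doesn't need to be.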
Multiple MD5 and one SHA0 collisions were confirmed at the Crypto 2004 conference in Santa Barbara. Perhaps more important is that these collisions demonstrated the feasibility of "shortcuts" to produce a collision. At this time, these are believed to be of little practical significance because they are still computationally expensive and affect only collision avoidance. There are two aspects to MD5 and SHA that are important. Collision avoidance is one, the other is preimage resistance (the difficulty of creating an input to the function that produces a known output.) However, it is quite possible that these breaks can be expanded into even larger breaks, including preimage cracking.
While not encryption, MD5 and SHA are used in a variety of ways that are important to encryption. For example, PGP and GPG use hash algorithms and salt to convert plaintext passphrases into pseudo-random encryption keys. So one possible threat is finding that MD5 and SHA are biased enough to make an attack feasible. It does not matter if blowfish uses 128-bit encryption if the function used to generate the key is significantly biased. Big huge "if."
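The general shape of "hash plus salt turns a passphrase into a key" looks something like the sketch below. To be clear, this is not the actual string-to-key procedure PGP or GPG use; the derive_key and sha1hex names, the salts and the passphrase are all invented for illustration.

```shell
# Helper (hypothetical name): SHA-1 hex digest of stdin.
sha1hex() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d ' ' -f 1
    else
        shasum -a 1 | cut -d ' ' -f 1
    fi
}

# Toy derivation: hash salt + passphrase, keep the first 32 hex digits (128 bits).
# $1 = salt, $2 = passphrase.
derive_key() {
    printf '%s%s' "$1" "$2" | sha1hex | cut -c 1-32
}

key_a=$(derive_key "salt-one" "correct horse")
key_b=$(derive_key "salt-two" "correct horse")
echo "$key_a"
echo "$key_b"
```

The same passphrase gives unrelated-looking keys under different salts. If the hash output were noticeably biased, the keys would cluster, and an attacker would have fewer than 2^128 candidates to try no matter how strong the cipher is.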
As someone else pointed out, MD5 is used to hash passwords in some password files. If someone expands the shortcut to defeat preimage resistance, it might be easier to find a working passphrase from a password file. Again, this is a big "if."
So the one article is blowing things out of proportion. These are not the kind of breaks that would lead to a practical attack yet. The collisions were created using generated plaintexts so it is not likely that someone can slip a trojan into source code in such a way as to produce the same hash string.
The alternative suggested by the article, LaTeX, is undoubtedly not to everyone's taste either, but at least if you read the article, you will understand the deeper reason Word is frustrating.
Ohh, how I miss LaTeX. I did my qualifying exams and a couple of hefty proposals using LaTeX and they were some of the easiest writing I ever had to do. The primary reason why I don't use LaTeX has to do with the research culture I'm embedded in:
1: APA format style pretty much demands that you pay attention to presentation and content simultaneously. It drives me up the wall how bloody difficult it is to abstract out APA bibliographic citation style.
2: Too many publication venues prefer Word. There are LaTeX -> Word production paths. However, none of those production paths accommodate the complex processing required by LaTeX APA style sheets.
Bingo. The primary business model behind the development and testing of new GMO crops is to maximize profit, not ecological sustainability.
I live in a country with a 300 billion dollar annual PEACETIME military budget, and they can't locate an accidentally dropped nuclear bomb in 12 feet of water to recover it?
*tinfoil hat* it is quite possible that writing it off as unrecoverable was intentional, given the military's habit of dealing with its toxic waste problems with plausible deniability.
Another difference comes to light with the recent Mozilla security vulnerablity. Evidently, the bugs were found through a bug bounty and fixed before the exploit was made public.
No moving parts to freeze? Can you say Hard Drive? Keyboard? Power Switch?
Last but not least. The user. You got to have some damn heavy mittens for -85C.
The article specifically says that they used solid-state hard drives. The system was operated remotely so I imagine no keyboard was used.
In addition, -85C was only the exterior surface temperature. One computer was installed under the surface with an average operating temperature of -57C. Another experiment was warmed by waste heat from the stirling engine.
Why are the two competitive. I'm willing to bet that the same projects that distribute solar and human-powered computers to underdeveloped countries, are also allied with projects that distribute solar-powered lanterns and projects that develop safe water projects.
There is also the factor that solar technology is currently cheaper than expanding power grids into rural areas.
Well, if you want to go pedantic about typos, no, it wouldn't return anything because the file extension is .ogg.
True, however the point had very little to do with typos. The point was that it was claimed that seaching for files by date was first not possible using hierarchal file systems, and then not easy. My one-line example shows that not only is it possible with existing hierarchal file systems, but it is perhaps even easier than composing an sql query.
Do that in the root directory and it'll take ten minutes too, whether it returns anything or not, whereas this kind of db thing would be nigh instant (think locate, only with all the filters you can use with find).
Perhaps a bit of a stupid question here, why would the search be directed from the root directory? That is, why wouldn't the person conducting the search limit the search to a subset of the database, rather than search through every record of the disk? An advantage of a hierarchal file system is that it is hierarchal. It is easy to limit an operation to nesting subsets of criteria.
Now of course, you bring up a valid point in that from a user's point of view, find runs slowly. However, the problems with find running slowly have nothing to do with hierarchal file systems, and everything to do with file indexes. There is nothing inherent in the hierachal file system that makes fulltext indexes, date indexes, or keyword indexes difficult.
In this case, all you need is find.
Which, while it may work for some, entirely misses the real point, is that there's no *easy* way to do this.
What is hard about?:It actually seems to be easier given that the expression requires fewer keystrokes, and would return a correct result rather than an SQL error.
We've got GUI bindings for opening/saving/editing files, but nothing for harnessing the power of searching. Don't say it can't be done (because it can somehow) just say that it can't be done easily, which is the bigger point.
The need for GUI bindings for searching files is a very different thing from changing the underlying file system. By all means, yes, lets create easy tools that permit users to conduct full-text and metatag searches of file systems.
But, and here is the big issue, there is nothing inherent in heirachal file systems that make full-text searching or keyword searching difficult.
- maintenance overhead on part of the user to create hierachy and maintain it. Every time you save a file you have to think "where do I put this?"
However, for a relational database filesystem to work, every time you save a file, you have to think "what keywords do I want on this file."
The fallacy in this argument is in believing that something magic will happen that will encourage people who have problems using descriptive heirachal directory names will suddenly use descriptive keywords.
- Finding files can be hard. Is that letter about the planning application in Documents/Letters or Documents/Planning App?
There are a fair number of applications that prove that full-text searching is entirely compatible with traditional heirachal file systems. (grep, glimpse and swish-e come immediately to mind.)
- keeping files in two or more places at once is hard (as in the previous example). You can use softlinks, but that's hardly ideal and doesn't survive moving things around.
Hardlinks are even better for this purpose. However, these problems don't magically go away with relational database systems. You replace the problem of having a file live in two or more places with the problem of maintaining the file in two or more categories.
Why not shift the burden of organising the files onto the enormously powerful computer, rather than take up valuable human mental resources.
Any form of contextual indexing that can be done with files located in a relational database, can be done on files in heirachal file system.
you _can't_ do that sort of thing on a traditional filesystem.
Ahem:
find . -name *.oog -mtime -7
That's just for the bad SQL you posted (I imagine there must be a missing operator in the where clause between "files.date" and "NOW()".)
A better example of where indexing would be would be useful is a case where you want to index the content of files. For example "I want a list of all papers authored by me where I've referenced Doe, Doe, and Doe's seminal paper 'Mental Masturbation About the Need for Relational Database Filesystems on Slashdot'". But even then, this is a type of indexing that can be easily accomodated within a heirarchal database. (For two examples, glimpse and swish-e come immediately to mind.)
To answer your two questions:
some people mentioned here that they already organise their files. great. fantastic.
HOW LONG DID IT TAKE YOU?
A pretty trivial amount of time. Each new project starts with:A maximum of 1 minute, even with my gimpy hands. This is about the same amount of time, and perhaps much less, that I would have to spend adding the keyword "newproject" to every file that gets added to the project.
and how long would it take to reorganise?
Well, it depends on the reorganization. But for example, I reorganized one of my key project directories (version controlled in subversion) with.(Making two archive "tags" and starting a new branch for the "HEAD" tag.)
A nice thing about hierarchical file systems is that you can reorganize on multiple levels including collections of files. ("proposal" in my case includes about a half-dozen files.) I would argue that it would take me just as much time to recode those files using database keywords as renaming the directories they reside in.
A large part of the argument for relational database filesystems seems to be that the same types of people who are unwilling to do the work necessary to create good hierarchical file trees will be willing to do the work to attach the metadata needed to replace hierarchical file trees. Switching from a hierarchical to a relational model is not going to change the GIGO rule, and the solutions around the GIGO problem (such as full-text searching, and journaling) don't depend on either model.
A hierarchical model also provides some nice facilities for dealing with related collections of files. It is unclear to me how they would be implemented in a relational model, or how a relational model would deal with standard filenames like "README", "CHANGELOG" and "index.html".
And just to add to this, most computer simulations work the same way, whether we are talking about thunderstorms, the collision that broke the moon away from the earth, or even crowds in CGI animated films. Abstraction can certainly be quite useful when what you are interested in are the forces that dominate the system at a specific scale.
The same way you simulate anything else. You simplify the problem down to a manageable number of particles that represent larger units of whatever you are simulating. Since it looks like they are interested in mass and gravity at the galactic supercluster scale, they can use particles that weigh much more than any individual star.
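For anyone curious what that looks like in practice, here is a toy sketch in Python of a single time step for point masses under Newtonian gravity. It is in no way the code used in the simulation being discussed. The function names and the simple update scheme are my own choices, and a real simulation would use far cleverer methods than this all-pairs loop.

```python
import math

G = 6.674e-11  # gravitational constant in SI units


def accelerations(masses, positions):
    """Return the (ax, ay) acceleration on each particle from all the others."""
    result = []
    for i, (xi, yi) in enumerate(positions):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r = math.hypot(dx, dy)
            ax += G * masses[j] * dx / r**3
            ay += G * masses[j] * dy / r**3
        result.append((ax, ay))
    return result


def step(masses, positions, velocities, dt):
    """Advance every particle by one time step: update velocities, then positions."""
    acc = accelerations(masses, positions)
    velocities = [
        (vx + ax * dt, vy + ay * dt) for (vx, vy), (ax, ay) in zip(velocities, acc)
    ]
    positions = [
        (x + vx * dt, y + vy * dt) for (x, y), (vx, vy) in zip(positions, velocities)
    ]
    return positions, velocities
```

Nothing in the code cares what a particle stands for. Give each one the mass of a star, a galaxy, or a whole cluster and the arithmetic is the same, which is the whole trick.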
What is more likely:
1: The media does what it does every day in regards to astronomy, inflates every story that might possibly have something to do with space aliens out of proportion? (With the result that both the original claims and more nuanced followups become exaggerated.)
2: The MIB.
Never attribute to conspiracy what can be easily described as sheer incompetence.
The implications are largely academic for now. What seems to have been discovered is that there are some shortcuts in MD5 and SHA0 that make these algorithms less robust than previously expected. The existence of one "hole" in an encryption or hash algorithm suggests that there might be more waiting to be discovered. This might take 1 year or it might never happen.
So, right now the primary risk is that someday we wake up to find that enough holes have been discovered to compromise preimage resistance. If this happens, it might be easier (but probably still horrendously expensive) to perform certain kinds of attacks on digital signatures and password files.
Another risk is finding out that these algorithms have an undiscovered bias in their output. MD5 and SHA are sometimes used to generate keys for encryption algorithms. If a bias is discovered, then it might enable a (probably still horrendously expensive) attack on some encrypted data.
Again, we are talking about stuff that could happen tomorrow, or could never happen. The consensus I'm reading is that the odds of a worse break in MD5 being discovered have just gone up by a significant level. The odds are not big enough to justify another Y2K panic (partly because quite a bit of software has already made the transition). However, where possible it is prudent to pick SHA1 over MD5.
That's a strong and interesting point. I have to ask, though, did it really change our lifestyle more than the internet? In the space of 10 years, I've witnessed a very strong change in how we live our lives. (At least where I lived, I happily concede I don't have a strong global perspective, as such I'm open to reason.) Yes, the fight against disease is very important and has had an effect on our lives. But, from my limited views, it means less fear of getting sick. It seems pale in comparison to how much each individual networks to talk with others, and in how we get information. I mean, the very fact we're having this discussion on Slashdot. Who'd have expected that 10 years ago? Anybody outside a university?
I would argue that yes, advances in medicine over the 20th century are the innovation that has had the single biggest impact on our lives so far, and will be a driving factor behind politics and public policy that makes the stuff we talk about here on Slashdot look trivial in comparison. I mean think about it:
50 years ago, retirement age, pensions and social security were based on assumptions about lifespan and productivity after the age of 65. Now we are looking at a population in America and Europe that can expect to live 20, 30, 40 years after quitting the work force.
The bad side is that Western countries are facing a crisis in their social support networks. People who currently collect pensions or government support make up a sizable voting block that will get even larger over the next 10 years.
The good side is that we have greater potential for tapping into human expertise and extended intellectual productivity.
More than just not being worried about getting sick, it means that children of the 20th century are much less likely to have seen a sibling or parent die than any generation in history. We are talking basic demographics here.
Actually, there is considerable evidence that the internet is an evolutionary change rather than a revolutionary one. The same dynamics of gender, ethnicity, class and power are played out in chatrooms and bulletin boards that were already played out in breakrooms and diners. I really don't consider myself more informed in 2004 than I was as a newspaper addict in 1984. In a large part, the internet just fills in a social networking gap that was created by car culture.
Television is the only thing I see on that list that could qualify with your statement. Everything else, though significant, is not in the same league. There are a LOT of people on this planet that if you were to send them back 100 years in the past, the net would be the thing they most ache for. (Unless they had polio :P)
I think what you are missing here is that the Internet is still primarily the playground of an elite part of the world population. The list certainly is flawed in many respects; air conditioning, for example, was developed in the 19th century. However, I would argue that widespread refrigeration has had a huge impact transforming the market of agriculture on a global scale.
The polio vaccine on its own is not very impressive, but when you consider the polio vaccine as part of a scientific war on disease, you come up with something that is quite a bit more transformative than the internet. Several of the deadliest killers of the last century are now either extinct in the wild or controlled leading to increased lifespans, decreased infant mortality and massive demographic shifts around the world.
The green revolution of the 50s, 60s and 70s fundamentally transformed how food is produced and distributed in ways that affect a majority of the people on this planet.
Think about it this way: how many people do you know who don't use the internet on a regular basis? Now how many people do you know who have never had an antibiotic, never had a vaccination, don't buy fresh tomatoes in February, don't use products that were packaged in plastic (another invention more revolutionary than the internet to date), don't use anything that includes a transistor?
Certainly, I think that we do need intelligent searches. However, I think many of the arguments for those searches are dramatically overstated, as is the contrast with existing filesystem models.
Why should I remember the filepath to a file (or in my case wich computer). That is not how I pick a book from the shelf is it. I don't need to remember the exactly title of the linux o'reilly guide. I can find it very fast by the general size and color even feel and the fact it is most likely near my desk.
Actually, I (and most people who have lots of books in my experience) do have some sort of a filing system. Glancing for look and feel is good when you have a few dozen books, but when you have a few hundred books, it becomes a bit unwieldy. Unfortunately, the current state of my life is that I can generally find a file I'm working on quite a bit quicker than a book I need to find a reference to.
The ultimate idea is for you to instantly be able to find what you want without having to remember weird filenames and paths.
You should not be using weird filenames and paths anyway now that most operating systems allow verbose filenames.
But it is going to very hard to do. All the databases I seen work on the principle: crap in crap out. The trick is not in creating a database file system. The trick is in writing code that can insert content into the database and get meaningfull info on it.
Which is one reason why I'm not convinced that more metadata will change much of anything. I'm also not convinced that this requires a new filesystem design.
Or leverage one of the existing indexing systems such as ht://Dig or swish-e.
Collision avoidance is one, the other is preimage resistance (the difficulty creating an input to the function that produces a known output.)
Whoops, didn't describe that well.
It is easy to produce "33ab5639bfd8e7b95eb1d8d0b87781d4ffea4d5d" if you know that the input is "Hello world". What is still unknown is if there are shortcuts that permit us to (more) quickly find a solution to sha1(x) = "33ab5639..." This solution does not necessarily need to be "Hello world."
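Here is a small Python sketch of the asymmetry. Going forward is one call to the standard hashlib module. Going backward, absent a shortcut, means guessing inputs until one matches. The helper names `sha1_hex` and `find_preimage` are mine, and the candidate list stands in for whatever search space an attacker would have to enumerate.

```python
import hashlib


def sha1_hex(text):
    """Forward direction: hash a string and return the hex digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_preimage(target_hex, candidates):
    """Backward direction: try each candidate until one hashes to target_hex."""
    for candidate in candidates:
        if sha1_hex(candidate) == target_hex:
            return candidate
    return None  # none of the guesses matched


if __name__ == "__main__":
    target = sha1_hex("Hello world")
    print(find_preimage(target, ["foo", "bar", "Hello world"]))
```

The search only succeeds here because the answer was planted in the list. Preimage resistance is the claim that, for a real search space, nothing does meaningfully better than this kind of guessing.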
Multiple MD5 collisions and one SHA0 collision were confirmed at the Crypto 2004 conference in Santa Barbara. Perhaps more important is that these collisions demonstrated the feasibility of "shortcuts" to produce a collision. At this time, these are believed to be of little practical significance because they are still computationally expensive and affect only collision avoidance. There are two aspects to MD5 and SHA that are important. Collision avoidance is one, the other is preimage resistance (the difficulty creating an input to the function that produces a known output.) However, it is quite possible that these breaks can be expanded into even larger breaks, including preimage cracking.
While not encryption, MD5 and SHA are used in a variety of ways that are important to encryption. For example PGP and GPG use hash algorithms and salt to convert plaintext passphrases into pseudo-random encryption keys. So one possible threat is finding that MD5 and SHA are biased enough to make an attack feasible. It does not matter if blowfish uses 128-bit encryption if the function used to generate the key is significantly biased. Big huge "if."
As someone else pointed out, MD5 is used to hash passwords in some password files. If someone expands the shortcut to defeat preimage resistance, it might be easier to find a working passphrase from a password file. Again, this is a big "if."
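For the general idea of turning a passphrase into a key, here is a deliberately simplified Python sketch. It is not the actual PGP or GPG code, and `derive_key` is a name I made up. It just hashes the salt followed by the passphrase and keeps the first 16 bytes (128 bits).

```python
import hashlib
import os


def derive_key(passphrase, salt, key_bytes=16):
    """Hash salt + passphrase with SHA-1 and keep the first key_bytes bytes as the key."""
    digest = hashlib.sha1(salt + passphrase.encode("utf-8")).digest()
    return digest[:key_bytes]


if __name__ == "__main__":
    salt = os.urandom(8)  # random salt, stored alongside the encrypted data
    print(derive_key("my passphrase", salt).hex())
```

The cipher only ever sees those 16 bytes. If the hash's output were noticeably biased, the effective key space would be smaller than 128 bits, no matter how sound the cipher is.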
So the one article is blowing things out of proportion. These are not the kind of breaks that would lead to a practical attack yet. The collisions were created using generated plaintexts so it is not likely that someone can slip a trojan into source code in such a way as to produce the same hash string.
The alternative suggested by the article, LaTeX, is undoubtedly not to everyone's taste either, but at least if you read the article, you will understand the deeper reason Word is frustrating.
Ohh, how I miss LaTeX. I did my qualifying exams and a couple of hefty proposals using LaTeX and they were some of the easiest writing I ever had to do. The primary reason why I don't use LaTeX has to do with the research culture I'm embedded in:
1: APA format style pretty much demands that you pay attention to presentation and content simultaneously. It drives me up the wall how bloody difficult it is to abstract out APA bibliographic citation style.
2: Too many publication venues prefer Word. There are LaTeX -> Word production paths. However, none of those production paths accommodate the complex processing required by LaTeX APA style sheets.