"Address book" is a misnomer - the list is built from email addresses you have sent mail to (or specifically imported into LOAF). The app monitors your outbound mail (through a sendmail wrapper, for example) and adds all new addresses to its list of seen recipients.
Right now there is a reference implementation for Pine/procmail; we are hoping for help with implementations for Outlook, Mail.app, and other clients.
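To make the "monitor outbound mail" step concrete, here is a rough sketch in Python. It is not the Pine/procmail reference implementation; the function names and the in-memory `seen` set are assumptions made for illustration. It pulls the recipients out of one outgoing message and records any that have not been seen before.

```python
from email import message_from_string
from email.utils import getaddresses


def outbound_recipients(raw_message):
    """Return the lowercased recipient addresses found in To, Cc and Bcc."""
    msg = message_from_string(raw_message)
    headers = []
    for name in ("To", "Cc", "Bcc"):
        headers.extend(msg.get_all(name, []))
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def record_outbound(raw_message, seen):
    """Add any new recipients to `seen` and return only the new ones."""
    new = outbound_recipients(raw_message) - seen
    seen.update(new)
    return new


# Hypothetical outgoing message, as a wrapper might read it from stdin.
raw = (
    "From: me@example.org\n"
    "To: Alice <alice@example.com>, bob@example.net\n"
    "Cc: ALICE@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

seen = set()
first = record_outbound(raw, seen)   # both addresses are new
second = record_outbound(raw, seen)  # nothing new the second time
```

A real wrapper would persist `seen` between runs and then hand the message on to sendmail unchanged.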
No, you're right on the money. In fact this is the approach the authors (self included) do take - the email of the owner is hashed in as one of the keys, to prevent filter theft.
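A minimal sketch of why hashing the owner's email into the keys deters filter theft. This is not LOAF's actual code: it assumes the filter is a Bloom-style bit array, and the class name, SHA-256, the sizes and the way the owner's address is mixed into every hash input are all illustrative choices.

```python
import hashlib


class SaltedFilter:
    """Tiny Bloom-style filter whose bit positions depend on the owner's email."""

    def __init__(self, owner_email, size_bits=1024, num_hashes=4):
        self.owner = owner_email.lower()
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, address):
        # Each hash input mixes in the owner's address (assumed scheme).
        for i in range(self.num_hashes):
            data = f"{i}|{self.owner}|{address.lower()}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def __contains__(self, address):
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


mine = SaltedFilter("owner@example.org")
mine.add("bob@example.net")

# A thief copies the bits but has to present them under a different address.
stolen = SaltedFilter("thief@example.com")
stolen.bits = mine.bits

print("bob@example.net" in mine)    # the owner's filter recognises bob
print("bob@example.net" in stolen)  # the copied bits no longer line up
```

Because the lookup positions are derived from the claimed owner, a copied filter only answers correctly when it is presented under the original owner's address.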
First, it wasn't a different foam material, it was a blowing agent. HCFC instead of CFC, to help the ozone layer.
Second, the foam that actually broke off was hand-applied, and the report clearly states that hand-applied foam continued to use the old blowing agent.
But don't let that stop you from drawing your insightful conclusions.
This reviewer can't even write like a human being; it's all consultant-speak and management jargon. The review is a value-subtracted blather center.
I'm sure the topic of how to build computing into a business is timely and interesting, but it doesn't sound like this book or this reviewer has anything useful to say. Otherwise they would have used English.
Shame on Slashdot for giving this consultant a front-page post to pad his resume with.
Question: When do you intend to release specifics on violations?
Answer: A lot of companies have been coming back to us asking "what is going on here? Am I running Linux that is in fact compromised...who do I talk to?"
We have told people to get advice of counsel... We have invited people who may be running Linux and are concerned to come in... under NDA we will be glad to go through the things that we have found.
In the Novell case, they called last week and were concerned about the letter... we set up a time to meet with them at 11AM yesterday... they didn't show up and later on sent out a letter saying that SCO would not meet with them, demanding that we step up and show them the goods.
I'm listening to the conference call right now - the CEO just said there are 244 people listening in, up from 7 two quarters ago. I wonder why that is? ;-)
I had no trouble getting in, by identifying myself as an individual investor.
He claims "not just a line or two of code problems [in Linux], but significant code problems."
"Millions of lines of code showing up without anyone warrantying [sic] where the lines of code came from"
Its mission has been and always will be to service the ISS.
Meanwhile, the mission of the ISS is to keep the shuttle program alive by requiring constant care and feeding.
The genius of NASA is to create a self-contained system of mutually dependent white elephants, and then justify them all in terms of one another.
It reminds me of a job I had as a rookie programmer, where I wrote an unmaintainable, overfeatured and useless program, left the company, and then racked up many consulting hours patching my bastard creation.
Luckily the CEO was smarter than Congress, and saw the wisdom of axing both me and the software. I wish he'd get put in charge of NASA.
Re:Heat - energy on Mastering Light · Score: 4, Informative
Uh, what exactly is a 'heat wave'?
Heat comes in two flavors - radiated light waves and random molecular motion. The second kind is irrelevant to this discussion. As far as the first kind goes, you can't magically make that radiated light have more energy by converting it up to a higher frequency.
The laws of conservation of energy and thermodynamics would like to have a little word with you out back...
I hadn't heard of either technique - if you have some links, I'd love to look at them.
You're completely right about web-size collections being too big to hold in RAM. It's a question of defining the problem space. My article was aimed at the hobbyist, who is far more likely to be building a search engine for a website (thousands of pages) than for the entire web.
Presumably, if you're trying to index the web, you won't be writing your search code in Perl ;-)
That's not to say you can't scale vector-space search engines. But it makes things complex. There is a lot of handwaving in the article, to try and keep things simple. One example, as you point out, is neglecting to mention practical ways of reducing vector size (number of keywords). That problem goes away if you use latent semantic analysis (also known as latent semantic indexing, or LSI), which reduces the dimensionality of the vectors by several orders of magnitude. And as a nice side effect, you get a search engine with better recall that can give you relevant results without a keyword match.
But LSI has its own scaling issues, and those scaling issues have their own workarounds, and so on and so on. Put it all in an introductory article, and it gets overwhelming.
Still, I think there's room for novel algorithmic approaches in the 100-100K document range. To give you one example, we run an LSI search engine on a database of old Civil War articles for the University of Virginia. There's no way the search could scale to ten million documents, but it doesn't have to. The collection is bounded.
I suspect other people have interesting collections in that kind of size range. Not everything has to scale to the billions of documents to be useful.
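For readers who have not seen one, a vector-space search at hobbyist scale can be sketched in a few lines. This is an illustration in Python rather than the article's Perl; the whitespace tokenizer, the raw term counts and the three sample documents are assumptions, and there is no stemming, stop-word removal or LSI step.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Score every document against the query, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vec), name) for name, vec in docs.items()]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


# Made-up documents standing in for a small, bounded collection.
texts = {
    "gettysburg": "union army wins battle at gettysburg",
    "railroads": "railroad expansion across virginia",
    "antietam": "battle report from antietam creek",
}
docs = {name: vectorize(text) for name, text in texts.items()}

results = search("battle gettysburg", docs)
names = [name for _, name in results]
```

Every query is compared against every document vector, which is the linear cost that works for thousands of documents held in RAM and becomes the scaling problem at web size.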
Since when is linear scaling abysmal? Keep in mind that for very large collections, you can use clustering or other techniques to reduce the number of comparisons. You essentially layer an indexing step on top of things, so that you only search the most promising vectors.
Keep in mind also that vector comparisons are very, very, very fast. People have run vector space search engines on collections in the millions without running into performance issues. And the claim that vector search is faster than a database search is true, for as long as you can fit all the document vectors into RAM.
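The "indexing step on top" can be sketched roughly as follows. This assumes the documents have already been grouped by some clustering step, and the dense toy vectors, the `probe` parameter and the function names are all made up for illustration. The query is compared to one centroid per cluster first, and only the members of the closest cluster(s) are scored.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_index(clusters):
    """clusters: list of {name: vector} dicts produced by some clustering step."""
    return [(centroid(list(members.values())), members) for members in clusters]


def search(query, index, probe=1):
    """Score only the members of the `probe` clusters nearest the query."""
    ranked = sorted(index, key=lambda entry: cosine(query, entry[0]), reverse=True)
    hits, compared = [], 0
    for _, members in ranked[:probe]:
        for name, vec in members.items():
            hits.append((cosine(query, vec), name))
            compared += 1
    return sorted(hits, reverse=True), compared


index = build_index([
    {"a1": [1.0, 0.0, 0.0], "a2": [0.9, 0.1, 0.0]},
    {"b1": [0.0, 1.0, 0.0], "b2": [0.0, 0.9, 0.2]},
])

hits, compared = search([1.0, 0.1, 0.0], index, probe=1)
```

With `probe=1` only two of the four document vectors are compared. The trade-off is that a relevant document sitting in a cluster that was not probed will be missed.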
'Breaking' is a term of art in cryptography. It means finding a solution that requires less time than a brute-force search. Even a 1% improvement qualifies as a 'break', although it might not have any practical value.
It's a valid distinction to make, since a flawed algorithm may be unsafe at any key length.
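A quick bit of arithmetic shows how far apart a technical break and a practical one can be. The sketch below reads "a 1% improvement" as an attack needing 99% of the brute-force work; the 128-bit key and the second attack (one that costs only the square root of brute force) are hypothetical examples, not real results against any cipher.

```python
import math


def brute_force_trials(key_bits):
    """Worst-case number of keys an exhaustive search has to try."""
    return 2 ** key_bits


def effective_bits(key_bits, speedup):
    """Key length whose brute-force cost equals the attack's cost.

    `speedup` is brute-force work divided by the attack's work.
    """
    return key_bits - math.log2(speedup)


# A 1% saving is a 'break' by definition, but costs the key almost nothing.
one_percent = effective_bits(128, 1 / 0.99)

# A hypothetical attack needing only sqrt(2**128) work halves the key length.
square_root = effective_bits(128, 2 ** 64)

print(brute_force_trials(128))
print(one_percent)
print(square_root)
```

The first attack leaves roughly 127.99 bits of strength. The second leaves 64 bits, and a flaw of that kind keeps costing the same fraction of the key however long the key is made.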
Search engines tend not to make good standalone apps / components. They are very intimately tied to the data and the indexing scheme you use, especially if they are large / customizable. Looking for a standalone search engine is like looking for a standalone GUI - they exist, but more commonly are developed as part of a larger project.
The main risk I have seen in using closed-source search engines is that your data gets trapped in a format or taxonomy that you can never alter. For example you might find you need Unicode support, or phrase matching, or support for PDFs, etc. etc. etc. At least with an open source program you have some options.
For me, the main advantage of PHP is precisely that it has a subset of the features Perl offers. For writing web applications, it is often very handy to have a simpler syntax and fewer builtins - it makes it easier to manage code, removes the temptation to obfuscate, and makes it a lot easier to teach others how to code the web app (a consideration for me, working with lots of student assistants). For most web applications, you basically need to do just the following:
Keep track of users and sessions
Print the results of database queries
Update a database
Use conditional logic to select what kind of HTML to display
For that stuff, Perl is overfeatured. PHP is simpler, you can embed it directly in your HTML, and it makes for faster development. For serious text processing, you can always exec Perl programs, or pipe output/input to Perl daemons.
All that being said, mod_perl is wonderful, too, especially if you know the language already, or have a really complex web app.
'Cyc' means 'tit' in Polish. For that matter, CIPA ( which stands for the Children's Internet Protection Act, I think ) means 'cunt'. It's probably a good idea to make sure your project name passes the laugh test with the major language families before you pour millions into it.
This was a lesson bitterly learned by the Warsaw weekly 'FART' back in the early 90's. Fart means stroke of luck in that language, but their luck ran out pretty fast.
Not to mention the marketing team behind the Chevy Nova ['won't go'], Latin American division.
Re:complexity of supercomputers approaching brain on Arguing A.I. · Score: 3, Funny
So the brain runs at 20 Hz, huh?
Talk about an overclocking challenge! Put your ice hat on and think as hard as you can.
This is a great factoid to throw at those who still insist on fetishizing clock speed - AMD take heart!
The difference in weight would be on the same order of magnitude as the difference in weight between an empty and full hard drive. Of course, that itself depends on the ratio of ones to zeroes. The topic is ripe for further speculation.
The whole reason Windows is a pain for expert users to use is that it has been designed with novice users in mind. If I go into "Program Files", for example, I have to click on a special link to even see the contents of the directory. This isn't because Microsoft is clueless, but because so many novices have hosed their critical folders that this 'feature' is necessary.
If you look at any interface designed to maximize ease of use for expert users ( for example, air traffic control radar, or pilot controls in a large airliner ) you will see that it is very spartan, cryptic, and complex-looking. But it is perfect for conveying information once you learn how to use it.
Even plain old writing systems illustrate the point -- hieroglyphics ( with pictures of recognizable objects ) may be easier on the newbie reader, but every culture rapidly standardizes on a complex abstract writing system to favor the advanced reader.
I say let's not obsess so much about making Linux comfortable for everyone, and instead make it so good that it's irresistible to anyone using computers for non-trivial work.
Anybody else find it ironic that the government site about disabilities should present text as a GIF, effectively ensuring that it can't be indexed or read by the visually impaired?
What are the weakest parts of Linux? on IBM Wants Linux · Score: 2, Interesting
All I know about Unix-flavored systems comes through Linux. Could someone post a short list of the areas where Linux is most deficient compared to Unices like AIX?
I know that real-time applications are one issue, as well as multi-processor performance. But how much work has to be done, and what are the prospects?
Thanks in advance for not flaming the newbie. :-)
Unpredictability in complex systems on Mob Software · Score: 5, Interesting
Someone here wondered a few days ago how an Internet worm might succeed if it were able to mutate and evolve in a Darwinian context. This writer is wondering what software will be like when all code can evolve, and interact, and what that emergent behavior might be like.
It is both a wonderful and frightening thought -- the Internet may already be sufficiently complex for self-replicating, self-modifying code to survive in the wild - and if it isn't, it won't be too long before that becomes possible.
We are all busy wondering if Microsoft and .NET will become a monoculture on the internet -- it would be quite a surprise to suddenly find little XML packets flying around in a language nobody could understand, the fruit of some bright hacker who releases a clever little self replicator, evolving at five generations an hour.
How long would it take for Darwinian code to evolve to the point where we couldn't eradicate it?
I think the biological model is worth paying attention to. A plague wipes out cathedral builders and bazaar merchants alike.
I think this brings up an important ethical question for anyone designing public forums (fora?) on the web -- if you allow anonymous postings, you must make it clear to users if you save any item of information that could lead to disclosure of their identity -- IP address, referer, username, etc.
Until there are enough of these encouraging court cases to set an iron-clad precedent, people must be told if information about their identity is going to get stored with an 'anonymous' post.
Of course, the truly paranoid (hello, slashdot readers!) already know to go through anonymizing services to prevent this kind of backtracing. But average users will appreciate knowing whether or not it is even possible to reconstruct their identity from saved information about an anonymous post.
Maybe it would even be possible to sue a site that claimed full anonymity for deceptive practices if they saved an IP address, etc.
People keep talking about Dmitri being held under awful conditions -- but does anyone know what his actual situation is? As far as I remember, he is out on bail -- where is he staying, who is with him, can his family get a visa to come visit him?
I'm sure a lot of us would be willing to pitch in with some donations to help out. Does anyone have the scoop? Anyone out there actually meet with Sklyarov? I will be happy to translate Russian->English if necessary.
Many posters have gone on and on about the brutal imprisonment Dmitri is being subjected to -- it would be nice to replace some of that conjecture with actual facts.
The owner's email is hashed in with each of the keys. Go read the about page if that doesn't make sense to you.
RTFR
No mention of Novell yet.
[disclaimer: I wrote the article]
Note that the correct spelling is Stanislaw Lem ( Not "Stansilaw" ). That's pronounced stah-KNEE-swaf, for the curious.
I like to think of the Wayback Machine as my personal backup server.
I just put all my most vital files in a web folder, and their crawlers take care of the rest.
And for encryption? Two words, baby:
ROT-13
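For anyone tempted to take that seriously, a short Python sketch shows how much protection ROT-13 offers. The helper name is made up; the `rot_13` codec is part of the standard library.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter 13 places; everything else is left alone."""
    return codecs.encode(text, "rot_13")


secret = rot13("my vital files")
recovered = rot13(secret)  # applying it a second time undoes it

print(secret)     # zl ivgny svyrf
print(recovered)  # my vital files
```

There is no key, so anyone who recognizes the scheme can reverse it with the same call.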
Right tool, right job, etc.