Domain: nec.com
Stories and comments across the archive that link to nec.com.
Comments · 437
-
Model for online academic publishing : CiteseerCheck out CiteSeer to get an idea of what this means for online journals.
CiteSeer combines paper library and citation database allowing one to, for example, search for free text search, find appropriate papers and, for example, rank them on the number of citations those papers have received (and in what context).
The full text is often available especially for computer science papers and the paper downloadable in pdf, ps etc formates.
You get to not only see all the papers but critically see the links between them which is somehting that is non-intuitive and very time-consuming on paper journals.
There is also a nascent comment on this paper fuction
It is very cool.
James
-
The IEEE doesn't do this
In the EE and CE fields, the main journal publisher is the IEEE. I'm pretty sure that they do not claim exclusive rights to publish research, or if they do, everyone in the field ignores it. The result is that many researchers (myself included) post all their publications on their web pages, which increases access and exposure to important works in the field. Furthermore, it permits free document search engines based on these web posts, such as Citeseer. The net result is, if you take any IEEE publication, and type the title in your favourite search engine, you will find a link to a free copy of the paper about 30-40% of the time (and this proportion is growing). I can't imagine a scientific discipline in which this wasn't done.
-
Re:Why is thisThere was a good essay about this in the May issue of Communications of the ACM: http://www.acm.org/pubs/articles/journals/cacm/20
0 1-44-5/p25-apt/p25-apt.pdf Ironically, you must be a member to view it, but kudos to the CACM for publishing it.Among the interesting bits was this:
Instead of inventing some economic models, it is much better to rely on public information provided by those who run successful FSP journals. In [4] Minton and Wellman provide a detailed analysis of the economic matters involved in the production of JAIR based on their five years of experience. They find that "the only significant cost involved in pro-ducing the journal is the cost of the review and editing process. Thus, the universities and research labs that employ JAIR's editors effectively subsidize the journal by supporting this work." In turn, Louis, Schneider, and Rehmann [3] published an account of the costs of running Documenta Mathe-matica based on their four years of experience. In their detailed analysis they reach a revealing amount of $210 per year ("including hidden costs"). So, not surprisingly, Rehmann wrote to me: "Our journal was never sponsored by anybody. Needless to say, the journal is hosted on my PC, which I have anyway."
Also, for those who haven't seen it, take a look at ResearchIndex (CiteSeer), should definitely visit it. See http://citeseer.nj.nec.com/cs It's really wonderful to be able to see the citations in context--for a good example, see http://citeseer.nj.nec.com/abiteboul97lorel.html-3. Louis, A.K., Schneider, P. and Rehmann, U. Documenta Mathementa (Aug. 31, 1999); www.mathematik.uni-bielefeld.de/~rehmann/bericht-
e ng. html.4. Minton S. and Wellman, M.P. JAIR at five. AI Magazine, 20, 3 (Sum-mer 1999); www.isi.edu/sims/minton/papers/jairfive.pdf.
- and browse the documents that are available on the web. Right now it's just computer science, but hopefully all research papers will eventually be on the web. -
Re:Why is thisThere was a good essay about this in the May issue of Communications of the ACM: http://www.acm.org/pubs/articles/journals/cacm/20
0 1-44-5/p25-apt/p25-apt.pdf Ironically, you must be a member to view it, but kudos to the CACM for publishing it.Among the interesting bits was this:
Instead of inventing some economic models, it is much better to rely on public information provided by those who run successful FSP journals. In [4] Minton and Wellman provide a detailed analysis of the economic matters involved in the production of JAIR based on their five years of experience. They find that "the only significant cost involved in pro-ducing the journal is the cost of the review and editing process. Thus, the universities and research labs that employ JAIR's editors effectively subsidize the journal by supporting this work." In turn, Louis, Schneider, and Rehmann [3] published an account of the costs of running Documenta Mathe-matica based on their four years of experience. In their detailed analysis they reach a revealing amount of $210 per year ("including hidden costs"). So, not surprisingly, Rehmann wrote to me: "Our journal was never sponsored by anybody. Needless to say, the journal is hosted on my PC, which I have anyway."
Also, for those who haven't seen it, take a look at ResearchIndex (CiteSeer), should definitely visit it. See http://citeseer.nj.nec.com/cs It's really wonderful to be able to see the citations in context--for a good example, see http://citeseer.nj.nec.com/abiteboul97lorel.html-3. Louis, A.K., Schneider, P. and Rehmann, U. Documenta Mathementa (Aug. 31, 1999); www.mathematik.uni-bielefeld.de/~rehmann/bericht-
e ng. html.4. Minton S. and Wellman, M.P. JAIR at five. AI Magazine, 20, 3 (Sum-mer 1999); www.isi.edu/sims/minton/papers/jairfive.pdf.
- and browse the documents that are available on the web. Right now it's just computer science, but hopefully all research papers will eventually be on the web. -
Online Example that WORKS
There is an existing version of this which is used extensively by the Computer Science field - CiteSeer
This site indexes hundreds of thousands of papers, journal articles, and other publications (dissertations mostly). Each is cross-referenced and linked to the others through a widearray of pattern matching steps (sentance similarity for example)
Imagine if this was extended to all other fields of research - it would probably move the fields forward by leaps just by making it much easier for researchers to locate and reference related work by their peers.
Shannon
-
Been there, done that.See my paper (with Scott Johnson), Practical Program Verification, in POPL '83.
It's quite possible to use formal methods, and tools that check them, to produce bug-free programs. But the formalism is too much for most programmers. That's the real problem. Using a verifier means writing code that you can explain in a formal notation. This is a huge pain. It is, though, quite straightforward to check for low-level problems like possible numeric overflow and subscript errors by formal methods. More recent work has extended this to race conditions and deadlocks.
Back when we were working on this, a major problem was that it took 45 minutes on a 1 MIPS VAX 11/780 to verify about 1000 lines of code. Today, that would take 2 seconds. You can be too early with a technology.
Incidentally, undecidability isn't a problem. It's possible to construct programs whose halting is formally undecidable, but they're not very useful.
Although program verification hasn't done much for software, formal techniques are widely used in IC design, where people are serious about engineering and stuff is expected to work.
-
Re:I can tell you why *I* am not using Ruby.It's not that I don't understand it. I do completely understand the idea of an iterator (with a name like that, who couldn't?) What I don't understand is where the syntax came from and why the syntax is necessary.
Let's say I want to iterate over a list in Python. I'd "map(a_function, a_list)". "map" applies a_function to each element of a_list, creating a new list as it goes along. It doesn't entail side effects; the original list isn't modified. You'll find a function similar to this in nearly every language; any implementation of lists in any language should provide such a facility (and Ruby provides that facility via its iterators.)
I prefer the Python/Lisp/O'Caml way. I, like more intelligent people who have gone before me, see iterators in object-oriented languages as a sign of weakness. Ruby's iterators exist to work around the fact that functions aren't first class objects in Ruby. They exist because a programmer can't write higher order functions in Ruby.
As beautiful as "pure" languages like Java and Haskell and Ruby are, they'll always lose in practice to languages that implement multiple programming paradigms well, like O'Caml, and Lisp, and Python. Programmers don't need to be contrained by their language, restricted to programming in a particular way or paradigm because that's the way the author/designer of the language decided it should be. In O'Caml or Python or Lisp, they're not. In Ruby, iterators are a perfect example of how they are.
Jeremy
-- -
Re:LindaI'm a little bit confused. Well first of all most of the pages linked to from that Linda page come back with 404.
The page is old (Linda was done in the 1980's), but the links to the papers and introduction still work. Linda is a well known and pretty widely used approach. JavaSpaces is loosely based on it. (The guy who invented it has the unfortunate distinction of being one of the Unabomber's victims; fortunately, he survived.) For more academic references, look here.
I'm also a bit unclear what you mean regarding asynchronous components. If I read you correctly, COM+ already supports this by way of queued components. It's simply implemented on top of message queues, which is a very good mechanism for asynchronous communication.
Sure, and you can similar things in CORBA--it's all a small matter of programming. But both CORBA and COM are very complex ways of achieving this.
What distinguishes Linda is that it is very simple, it makes asynchronous communications the default, it greatly reduces coupling among software components, and it gives you a bit of database functionality as well. I think it is much better suited to the needs of desktop applications and communications among desktop components than CORBA or COM.
-
Re:Prove?
...the task of proving optimality seems impossible. No, I take that back, it is impossibleActually, you're wrong. It is possible to prove in some cases that a given algorithm is optimal, and that a given implementation is optimal. You can discover this from a mathematically-based analysis of optimality, so that you can't use better hardware or loop unrolling, etc., to get a better solution.
Optimality is multi-faceted, since optimal solutions may exist for time constraints, space constraints, or both. Optimality is tied to computational complexity theory, since for some problem domains you can show that any solution must have a given complexity bounds, especially a theta-bound since that is the tightest bounds of a problem. Then you can show for a given machine set (abstracted assembly), with certain types of operations, that an implementation of the algorithm is in some sense optimal.
For some examples of proven optimal solutions to computational problems, see the following: static dictionary membership queries; generating minimal perfect hash functions; monte carlo estimation; scanning spanning trees of undirected graphs; hyperplane depth; simultaneous buffer and wire sizing (with implementation); maximum independent set of a circular-arc graph. The list could go on, but this gives you some idea of the breadth of solutions available. Sometimes, as with the halting problem, you can show that no solution exists, optimal or not.
As a final note, however, optimal solutions exist in algorithmics for well-defined computational problems. It is an entirely different thing to solve most real-world problems, where many non-theoretical issues enter the fray, such as how quickly can the code be written, is the design easily understood, how well can your code be maintained, does it do what the user or customer wants, does it have a good human interface, etc. These issues, rather than theoretics, dominate most of the actually programming that goes on in the world (despite what your professors may have taught you!
;-). In that sense, the language/optimality/editor/UI/paradigm flame wars will still go on for as long as people are using computers.But it is still fun, on the rare occasion, to point out to your boss that your implementation of merging accounting transactions is theoretically optimal. Not that they really care, they just want it done by Friday.
;-) -
Re:Prove?
...the task of proving optimality seems impossible. No, I take that back, it is impossibleActually, you're wrong. It is possible to prove in some cases that a given algorithm is optimal, and that a given implementation is optimal. You can discover this from a mathematically-based analysis of optimality, so that you can't use better hardware or loop unrolling, etc., to get a better solution.
Optimality is multi-faceted, since optimal solutions may exist for time constraints, space constraints, or both. Optimality is tied to computational complexity theory, since for some problem domains you can show that any solution must have a given complexity bounds, especially a theta-bound since that is the tightest bounds of a problem. Then you can show for a given machine set (abstracted assembly), with certain types of operations, that an implementation of the algorithm is in some sense optimal.
For some examples of proven optimal solutions to computational problems, see the following: static dictionary membership queries; generating minimal perfect hash functions; monte carlo estimation; scanning spanning trees of undirected graphs; hyperplane depth; simultaneous buffer and wire sizing (with implementation); maximum independent set of a circular-arc graph. The list could go on, but this gives you some idea of the breadth of solutions available. Sometimes, as with the halting problem, you can show that no solution exists, optimal or not.
As a final note, however, optimal solutions exist in algorithmics for well-defined computational problems. It is an entirely different thing to solve most real-world problems, where many non-theoretical issues enter the fray, such as how quickly can the code be written, is the design easily understood, how well can your code be maintained, does it do what the user or customer wants, does it have a good human interface, etc. These issues, rather than theoretics, dominate most of the actually programming that goes on in the world (despite what your professors may have taught you!
;-). In that sense, the language/optimality/editor/UI/paradigm flame wars will still go on for as long as people are using computers.But it is still fun, on the rare occasion, to point out to your boss that your implementation of merging accounting transactions is theoretically optimal. Not that they really care, they just want it done by Friday.
;-) -
Re:[ot]Google's data structure?
That's true if the data is changing. However most search engines do web crawls in large chunks, and index the data once in one large block. Under such conditions dynamic management of hit lists and other data structures is not necessary. Basically, the bytes are packed as tight as they can get them so that it all fits into memory.
As far as I can tell from their paper, Google manages its web crawls the same way. It partitions the data into "barrels" and indexes each separately. Once the indices are built, they aren't updated. They also extend the hit lists to include word position and some other attributes for each hit. -
Re:isn't Google always getting itself in the news?There is nothing technological that Google is doing that isn't done by other engines (Excite, Hotbot).
Really. Google uses a patented ranking algorithm, described by Page and Brin (Stanford graduate students which founded Google) in a paper titled The PageRank Citation Ranking: Bringing Order to the Web (1998) . The algorithm does very well at recognizing relevant documents. Last I looked, other search engines used mostly sets of hand-tuned hacks which did not do as well. Has this changed? I'd appreciate some references, refereed if possible.
~
-
Here is the real google info...Here at Slashdot it seems like people only can complain about a service. Most of the posts are rants without understanding of the dynamics below them.
I think we all could use more understanding of the topic. A link to the paper that started it all here.
1. When was the last time that "to" or any other preposition helped the average query. Your Grandmother does not know that this word is meaningles 99.9% of the time, so google ties to improve their relevancy.
2. Google has not sold out. Their ads are the most simple in the industry. They give access to users like you and me at reasonable rate. Who wants to wait for 345x123 pixel banner ads anyways.
3. Have you noticed the spelling feature? Google will correct your spelling. This is a function of the tons of bigrams that they have stored.
4. Here is a link to more papers [Warning: Technical] here.
-
Here is the real google info...Here at Slashdot it seems like people only can complain about a service. Most of the posts are rants without understanding of the dynamics below them.
I think we all could use more understanding of the topic. A link to the paper that started it all here.
1. When was the last time that "to" or any other preposition helped the average query. Your Grandmother does not know that this word is meaningles 99.9% of the time, so google ties to improve their relevancy.
2. Google has not sold out. Their ads are the most simple in the industry. They give access to users like you and me at reasonable rate. Who wants to wait for 345x123 pixel banner ads anyways.
3. Have you noticed the spelling feature? Google will correct your spelling. This is a function of the tons of bigrams that they have stored.
4. Here is a link to more papers [Warning: Technical] here.
-
CiteseerAnother impressive site worth looking at is Citeseer (now the more prosaic 'NEC ResearchIndex'), which has saved me a number of trips to the library for papers on Machine Learning, Artificial Intelligence, Data Compression etc.
Citeseer spiders the web looking for postscript and PDF scientific papers, cacheing whatever it finds, and cross-referencing all the references -- so that even if it doesn't have a particular paper, you can still read what was said about it when other authors cited it. Citeseer also offers documents which it thinks are related because of a similar citation history, and an 'active bibliography' of the most cited documents which cited this one. And wherever it has found a document, its cached copy is available for download.
For a typical Citeseer report page see eg http://citeseer.nj.nec.com/heckerman96tutorial.ht
m l.I have been very impressed by how much value such a system can add, compared to just the dead trees.
-
If anyones curious about what ZK means..
..be sure to read Blum, De Santis, Micali, and Persiano's Non-Interactive Zero Knowledge research index.
-
Re:"Shared Source" and the NT native API?
I'm only replying to this one to correct a blatant falsehood, to get the correction into the slashdot record.
Anonymous Coward wrote: Executive Software write part of the Windows NT/2000/XP OS, so naturally they use the NT APIs for certain things below the Win32 level, but for applications developers, there's no advantage to using the native NT API.
It's just stupid to say that. Emulation layers like Win32 always add time to any system call. The Unix world has lots of experience with Mach, an OS similar in some aspects to NT. Emulations of Unix, Linux, VMS or Sprite on top of Mach always cost an unacceptable amount of time. From Sprite on Mach: "Unfortunately, the Sprite server runs the Andrew benchmark at only 38% of the speed of native Sprite."
From Linux on the OSF Mach3 microkernel: "Often, as much as a 40% performance cost has been reported."
The reasons an application programmer might want to use the "native" API amount to two very important things:
- Speed. See the two references above.
- Function. Some things aren't exported from the "native" API to Win32. The simplest example is that the "native" API can do the equivalent of a Unix fork() system call. Win32 cannot. There are other examples of functionality in NT "native", but not in Win32.
There's a huge advantage to using the "native" API, even for a mere application programmer. That's what I loathe the most about you MS-shills: you think you know better than I do what I want to do. It's the very height of arrogance to say that any programmer shouldn't use a particular API that offers more functionality, faster.
-
Re:"Shared Source" and the NT native API?
I'm only replying to this one to correct a blatant falsehood, to get the correction into the slashdot record.
Anonymous Coward wrote: Executive Software write part of the Windows NT/2000/XP OS, so naturally they use the NT APIs for certain things below the Win32 level, but for applications developers, there's no advantage to using the native NT API.
It's just stupid to say that. Emulation layers like Win32 always add time to any system call. The Unix world has lots of experience with Mach, an OS similar in some aspects to NT. Emulations of Unix, Linux, VMS or Sprite on top of Mach always cost an unacceptable amount of time. From Sprite on Mach: "Unfortunately, the Sprite server runs the Andrew benchmark at only 38% of the speed of native Sprite."
From Linux on the OSF Mach3 microkernel: "Often, as much as a 40% performance cost has been reported."
The reasons an application programmer might want to use the "native" API amount to two very important things:
- Speed. See the two references above.
- Function. Some things aren't exported from the "native" API to Win32. The simplest example is that the "native" API can do the equivalent of a Unix fork() system call. Win32 cannot. There are other examples of functionality in NT "native", but not in Win32.
There's a huge advantage to using the "native" API, even for a mere application programmer. That's what I loathe the most about you MS-shills: you think you know better than I do what I want to do. It's the very height of arrogance to say that any programmer shouldn't use a particular API that offers more functionality, faster.
-
FTP and Finger anyone?
Back in late '87, folks would talk about network security in the way you talk about gopher extermination on the east coast. "They" have to do network security, but "we" run UNIX. UNIX is secure, there's never been a major break-in on a UNIX system from the network that wasn't due to misconfiguration or social engineering.
Of course, there had been, but folks kept it quiet. Why air dirty laundry. Most people who worked on the code knew there were holes, but they weren't top priority.
Enter Robert "Wormer" Morris. He decided to blow the lid on this show, and made one little mistake. The rest is history....
One day, Mr. Bernstien will make a mistake. Everyone does. When that day comes, I dearly hope that I'm not using an OS that would take it for granted that such a mistake will never happen.
Don't get me wrong. I like the man's coding ethic, I just think he should let someone else package, license and distribute his software for him, so that it promotes, not prevents others from using his software (in the same way that RMS should let someone else do his public appearences ;-) -
Read this
In this paper we address the invertibility of invisible watermarking schemes for resolving rightful ownerships, and present attacks which can cause confusion to rightful claims. We shall show that non-invertibility is a necessary but not sufficient condition in resolving ownership disputes. We then define quasi-invertible watermarking schemes, and, present analysis that links invertibility and quasi-invertibility to some classes of watermarking techniqueswith different properties (which may or may not require original versions in watermark decoding), as well as to the different classes of attacks we have developed.
full document
listens to Shake That Ass -- Mystikal -
Still having probs with WatermarkingSo they say yet again they'll be watermarking yet there are still many problems with the way its done.
Abstract: this paper the difficulties and limitations of watermarking as a general tool for document protection. We present first the variety of objectives of users for document protection, in a second Section, we present several profiles of attackers, then we describe the most classical attacks which have been developped, and, as a
conclusion, we present the special cases of the AVOs of the MPEG-4 norm, and the DVD as specially sensitive to attacks. The different techniques for invisible watermarking of images are mostly by adding a label to the image, either in the image bit plane or in the spectrum of the image, but some other different techniques have been proposed.
[Why is watermarking a hard problem] [mirrored]
Authority Figure: How long till watermarking protection is final?
DVD Crypto Engineer: Sir we're still working on it we're not sure
Authority Figure: Great I'll put out the word to Associated Press that we're ready to go.
Can you find the Mole?
-
Well, if you program in Java..
Surprisingly enough, you can end up with binaries which can't be built with a "current" copy of source code if you program in Java. (!) This is because its notion of "binary compatibility" is a bit broken.
Check out this paper:
-
thoughts on Paul Graham and Common Lisp
I'm a big fan of Common Lisp as well as Paul Graham (having read his book ANSI Common Lisp, I can see that he possesses both in-depth technical knowledge and a sense of humor). This article seems to match what I've read of him in quality, and I look forward to reading it in more depth when I get home.
I'm not a Lisp guru by any means, but the one thing that I always get a kick out of is how easy it is to become a low-level guru. After a few weeks of playing, I was doing things I wouldn't have dreamed of doing in C or Java. And it changes one's perspective. It all translates to machine code at the lowest level, of course, but after learning Lisp I can say without a doubt that I'm a better C programmer.
Lisp is an interesting language, because in some ways it's ridiculously high level (closures, generic functions, garbage collection, et cetera), but you're also able to get down and dirty with the cons cells with no trouble. I think this quote expresses it best:
What I like about Lisp is that you can feel the bits between your toes.
Lisp is extremely versatile. While it was originally used in AI, it's honestly the best tool for most situations I come across. (You can see one thing I've done here, and I've done some other stuff that I haven't had a chance to post yet.) Whenever I need to do more in the shell than loop through a few files, I write it in Lisp (I've written 5-line programs to leech an entire Web page's MP3 archive). Lisp is great at processing logs, the output of various subprocesses, and other such things. It's also got a wonderful OO system.-- Drew McDermott
Graham's "Blub" example holds true for everyone I've met who has a disdain for Lisp. The advanced features it provides really do go over their heads, which is sad, because these are often intelligent people. Also, they don't look beyond the syntax differences, and often have a lot of misinformation (Lisp is slow, you can't do iteration in Lisp, Lisp is for lists and AI) fed to them by CS professors or whoever. I also often see a lot of posts from newbies who want to write "C-Lisp" and give up when that doesn't work. Lisp is a different paradigm, and needs to be treated as such.
If you're interested in Lisp, I would recommend reading The Evolution of Lisp (don't be angry at the poor fonts in the PDF; they didn't use scalable TeX fonts, the weenies
:), Paul Graham's ANSI Common Lisp, and Winston and Horn's LISP, 3rd Edition (but ignore them when they disparage car and cdr), in that order.Also, don't be confused by the various Lisps out there. First, ignore Emacs Lisp. Among its quirkyisms, it's dynamically scoped, which means that if you declare a variable, every function you call will also have access to that variable. Secondly, Scheme and Common Lisp are vastly different. Scheme is much leaner, and has 1 namespace, which means you can't name a variable and a function the same thing (I dislike this, but it's a hotly contested issue). Common Lisp has a huge set of standard APIs and much more versatility prebuilt into its core, while Scheme tries to stay small so as to provide an easily implementable standard. I'm a Common Lisp man, myself, but try things out for yourself.
One final thing is that if you hang out with them, you'll realize that most of the long-time posters are extremely knowledgable and have a great sense of heritage. I've learned a great deal by simply lurking through their flamewars, since I find out a lot about what issues may crop up for an individual programmer.
If you want a bit more advocacy, see my recent posts on the subject.
--
-
Re:okay okay.... I'm not informed...
Wouldn't you just be mirroring if you wrote user data to the log?
Not quite. The log/journal is structurally different than the main data areas, with different synchronization and performance characteristics. Writing once to the log and once to the main data area is quite different than writing twice to the main data area.
However, an observation very similar to yours is behind log-structured filesystems. In other words, if you're going to write all the data to the log in a highly robust etc. way, why not just make the log the authoritative copy of the data? There's a whole lot of gunk that has to be worked out after that, such as how you find data and how you reclaim log space, but it all flows pretty cleanly from that initial idea. The result is pretty nifty for some kinds of workloads, but in general changing OS structures and their effects on I/O patterns have sort of left log-structured filesystems behind.
If you're interested in exploring further, the seminal papers in this area are The Design and Implementation of a Log-Structured File System by Rosenblum et al, and (IMO even better) An Implementation of a LogStructured File System for UNIX by Seltzer et al. Enjoy!
-
Re:okay okay.... I'm not informed...
Wouldn't you just be mirroring if you wrote user data to the log?
Not quite. The log/journal is structurally different than the main data areas, with different synchronization and performance characteristics. Writing once to the log and once to the main data area is quite different than writing twice to the main data area.
However, an observation very similar to yours is behind log-structured filesystems. In other words, if you're going to write all the data to the log in a highly robust etc. way, why not just make the log the authoritative copy of the data? There's a whole lot of gunk that has to be worked out after that, such as how you find data and how you reclaim log space, but it all flows pretty cleanly from that initial idea. The result is pretty nifty for some kinds of workloads, but in general changing OS structures and their effects on I/O patterns have sort of left log-structured filesystems behind.
If you're interested in exploring further, the seminal papers in this area are The Design and Implementation of a Log-Structured File System by Rosenblum et al, and (IMO even better) An Implementation of a LogStructured File System for UNIX by Seltzer et al. Enjoy!
-
intro question about cryptography.Kerchkhoff's Principle: The security of the crypto-system must not depend on keeping secret the crypto-algorithm. The security depends only on keeping secret the key. (written in 1883)
Why did Kerchkhoff made such a radical statement? Because over the last, oh roughly 500 years, history has told the sad tale of bold cryptographers who sold their systems as unbreakable, and grossly underestimated the inventiveness of their enemies.
Ciphers (encryption algorithms) need to be designed to withstand the most cunning of oppositions. Who's main method is thinking "out of the box" to come up with diffierental cryptanalysis, timing attacks -- timing how long an encryption takes, differential power analysis -- measuring the power consumption, impossible cryptanalysis -- figuring which differentials aren't possible).
Bruce Schneier at Counterpane Labs and Ross Anderson at Security Group at Cambridge University have several essays about how security systems fail because the enemy "breaks the rules". (Why Cryptosystems Fail, Why Cryptography Is Harder Than It Looks, etc.)
To understand more about how "security through obsurity" does more harm than good, read any one of the dozen accounts about the Engima used during World War II, and the Anglo-American (and Polish) effort which successfully analysed this "unbreakable" system. Like Code Breaking, The Code Breakers, or The Code Book.
-
Must Have Been My Post.Will someone please attempt to assert or refute, using some kind of solid logic or numbers or something, the statement that microkernels are a good idea but Mach is a bad implementation of that idea? What is done wrong in Mach, and can it be fixed?
It seems you are refering to this post of mine a while ago. For proof of Mach's deficiencies I linked to two research papers; On Microkernel Construction and The Impact of Operating System Structure on Memory System Performance in that post. If you want the capsule summary then here's a short list of Mach's deficiencies as posed by Liedtke- Inefficient kernel-user switching (i.e. system calls spend too much unecessary time in the kernel).
- Inefficient address switching (i.e. the number of cache and TLB misses in the Mach microkernel is also absurdly large).
- The performance of IPC operations and thread switching is subpar. (this is related to the above points).
- It isn't optimized for specific hardware and instead has a Hardware Abstraction Layer which slows it down considerably.
If mach is, indeed, a bad implementation of the microkernel, what would be a *good* implementation of the microkernel? Are any well-designed microkernels out there?
Neutrino and EROS to name two.
If there are, then what is it that repeatedly leads projects like xMach/HURD/OS X/mkLinux to embrace Mach as opposed to one of the competing microkernels?
The same reason most of us are using Java and C++ instead of SmallTalk, Lisp or Objective-C. Developer inertia and people falling to the more hyped and/or better sold technology.
and, btw, this is kind offtopic, but while we're VAGUELY near the subject: someone once told me that Mach has the ability to host multiple kernels on the same machine at the same time. Is this true? How does that work in terms of sharing the hardware? How do you go about doing this?
A microkernel can load different modules at runtime that may be OS emulation layers that mimic the behaviors of certain OSes even down to memory management and paging. Since you can theroetically load as many modules as you want, you can load various emulation layers and mimic several OSes at once. For instance OS X has a microkernel that loads modules to mimic BSD as well as old Apple APIs (Classic) as well as teh new stuff (Carbon). Here's a graphical look at how the MacOS X architecture.
-- - Inefficient kernel-user switching (i.e. system calls spend too much unecessary time in the kernel).
-
Re:"Mach is a bad microkernel implementation".. HO
Will someone please attempt to assert or refute, using some kind of solid logic or numbers or something, the statement that microkernels are a good idea but Mach is a bad implementation of that idea?
Gladly. And I don't even have to do it myself.
For starters, check out On u-Kernel Construction, a paper written by Jochen Liedke for the 15th ACM Symposium on Operating System Principles. It contains a thorough technical explanation of why Mach performs poorly, and provides corroborating evidence measured on multiple architectures.
Additionally, using Mach as a "hardware abstraction layer" for a userspace Unix server, rather than as a true microkernel, only compounds the kernel and related subsystems' poor performance.
-
Re:"Mach is a bad microkernel implementation".. HOIf mach is, indeed, a bad implementation of the microkernel, what would be a *good* implementation of the microkernel?
L4
Will someone please attempt to assert or refute, using some kind of solid logic or numbers or something, the statement that microkernels are a good idea but Mach is a bad implementation of that idea? What is done wrong in Mach, and can it be fixed?
search the l-k list archives for some examples. one point is that only one thread at a time can be in any module, as a result of messages being queued. mach is much slower than other microkernels when it comes to message passing. another issue is that when you shove use a monolithic unix server (eg, as do mklinux, bsd lites, next), you lose all the advantage of using a microkernel in the first place. if your single server dies, everthing's shot anyway. also, message passing is not the only way to isolate fault domains. (see here.)
i don't believe that message passing really gains you much abstration. if you had all the vm calls/messages placed in a generic way, it wouldn't let you optimize for the specific pager implementation. but, to allow an easy drop in replacement, you wouldn't be able to depend on the current vm's subtle behaviors. this is just my thought; i haven't exactly done a lot of kernel design, so take that with a grain of salt.
and, btw, this is kind offtopic, but while we're VAGUELY near the subject: someone once told me that Mach has the ability to host multiple kernels on the same machine at the same time. Is this true?
yes, it's definetly true. i know the mklinux server has been debugged using another running instance of the linux server. people have often talked about run time switching between mac os x and mklinux, but nothing's come of it. and to be safe, it would need some resource locking, eg, so that you don't have the two servers writing to the same disk partition.
-
Citeseer is amazingIf anyone is interested in mathematics or computer science and hasn't checked out Citeseer yet, go there now! The sheer quantity and quality of papers it indexes and stores is amazing -- and the intelligent cross referencing is the best I've seen.
Here's an example that I was looking at earlier this week. Want to read the paper where Biham and Shamir rediscover differential cryptanalysis? Here it is. It reports 115 citations of this paper, and if you click on the link, you can see the context it was cited, and then go on to download and read those papers. From browsing the citations, you might notice that linear cryptanalysis is another recently discovered technique. Citeseer doesn't include the original paper for this, but from the citation page we can see the papers which reference it, including this interesting one from the Australasian Journal of Combinatorics.
This is the way paper hunting should be done.
-
Citeseer is amazingIf anyone is interested in mathematics or computer science and hasn't checked out Citeseer yet, go there now! The sheer quantity and quality of papers it indexes and stores is amazing -- and the intelligent cross referencing is the best I've seen.
Here's an example that I was looking at earlier this week. Want to read the paper where Biham and Shamir rediscover differential cryptanalysis? Here it is. It reports 115 citations of this paper, and if you click on the link, you can see the context it was cited, and then go on to download and read those papers. From browsing the citations, you might notice that linear cryptanalysis is another recently discovered technique. Citeseer doesn't include the original paper for this, but from the citation page we can see the papers which reference it, including this interesting one from the Australasian Journal of Combinatorics.
This is the way paper hunting should be done.
-
Citeseer is amazingIf anyone is interested in mathematics or computer science and hasn't checked out Citeseer yet, go there now! The sheer quantity and quality of papers it indexes and stores is amazing -- and the intelligent cross referencing is the best I've seen.
Here's an example that I was looking at earlier this week. Want to read the paper where Biham and Shamir rediscover differential cryptanalysis? Here it is. It reports 115 citations of this paper, and if you click on the link, you can see the context it was cited, and then go on to download and read those papers. From browsing the citations, you might notice that linear cryptanalysis is another recently discovered technique. Citeseer doesn't include the original paper for this, but from the citation page we can see the papers which reference it, including this interesting one from the Australasian Journal of Combinatorics.
This is the way paper hunting should be done.
-
Citeseer is amazingIf anyone is interested in mathematics or computer science and hasn't checked out Citeseer yet, go there now! The sheer quantity and quality of papers it indexes and stores is amazing -- and the intelligent cross referencing is the best I've seen.
Here's an example that I was looking at earlier this week. Want to read the paper where Biham and Shamir rediscover differential cryptanalysis? Here it is. It reports 115 citations of this paper, and if you click on the link, you can see the context it was cited, and then go on to download and read those papers. From browsing the citations, you might notice that linear cryptanalysis is another recently discovered technique. Citeseer doesn't include the original paper for this, but from the citation page we can see the papers which reference it, including this interesting one from the Australasian Journal of Combinatorics.
This is the way paper hunting should be done.
-
Re:What they really want
Heh, I was going to write a cutesy post about how the speed of light had been exceeded by three hundred times in a lab in New Jersey, then I find http://www.salon.com/people/feature/2000/08/03/li
g ht/index.html, a post on Salon.com about how the scientist's results had been misrepresented by the media.On the other hand I did find this http://www.neci.nj.nec.com/homepages/lwan/demo.ht
m from the New Jersey lab. -
Links to more articles re:FPGA's
A quick web search yield this link to an abstract of an article entitled "The Natural Way To Evolve Hardware (1996)" by Adrian Thompson, Inman Harvey, and Philip Husbands. Links to the article in various formats are found in the upper right corner of that page. This page has links to several more related articles about evolutionary robotics and circuitry.
-
Links to more articles re:FPGA's
A quick web search yield this link to an abstract of an article entitled "The Natural Way To Evolve Hardware (1996)" by Adrian Thompson, Inman Harvey, and Philip Husbands. Links to the article in various formats are found in the upper right corner of that page. This page has links to several more related articles about evolutionary robotics and circuitry.
-
Re:I'm worried about this. --- Relax a little!
More modern devices are harder to destroy. If you mis-configure your I/O pins then you're in trouble, but within the array devices like the Xilinx Virtex seem much harder to destroy (I was chatting to someone who was trying just that, but failed without using the I/O pins).
As a flip side, here's a paper about destroying less secure FPGAs and ways to prevent it.
-- Michael
-
Why Cryptosystems FailAs many folks have already observed, the computational complexity of "breaking" the crypto is not even close to the most serious vulnerability in most systems. Ross Anderson has written a fabulous analysis of how ATMs are broken into. None of these attacks involve breaking a cipher system. You can find it here.
Cheers,
-b -
citeseer
For computer science papers, there's also:
http://citeseer.nj.nec.com/directory.html -
Mach is known as a bad microkernel implementation
The main reason that microkernels have not gained more acceptance in OS circles (although Windows NT is based on microkernel design) is that the most popular implementation of the concept (Mach) is also one of the most inefficient and badly designed.
In Joseph Liedtke's 1995 paper, On Microkernel Construction he points out that the myth of Microkernels being more inefficient and thus slower than monolithic kernels is because most benchmarks were done against the Mach microkernel. He stated that the Mach performed poorly at both address switching and context switching and also failed to take processors into account and be optimized thusly. As a test, Liedtke wrote a Microkernel OS called L3 in which he showed that a call to getpid which took 800 cycles of kernel time on the Mach to be performed took 15 - 100 cycles on the L3.
Also he also disproved the notion that using a microkernel leads to memory degradation due to a large number of cache misses a dissects a number of benchmarks from Chen and Bershad's paper The Impact of Operating System Structure on Memory System Performance.
Read his paper if you get the chance, it's very enlightening.
The ideas behind microkernel design are very sound and some of them have found their way into most mainstream OSes (Linux kernel modules can be seen as a take on the Microkernel architecture). Basically, having an OS where everything is hot swappable including memory management, process scheduling and device drivers is kind of cool. Also the fact that the usual OS level APIs can now be swapped out meaning you can have both POSIX layer and a Win32 layer on the same OS is rather nice.
-
Try this ...
I have had really good luck with CiteSeer.
YMMV.
-
Re:Larry Ellison was much more interesting...Unless something's changed since I stopped paying attention, QCs are Turing-equivalent -- no more, possibly less.
I thought so too, then I saw a paper by Calude, Dinneen, and Svozil.
Using the counterfactual effect, we demonstrate that with better than 50% chance we can determine whether an arbitrary universal Turing machine will halt on an arbitrarily given program. As an application we indicate a probabilistic method for computing the busy beaver function--- a classical uncomputable function. These results suggest a possibility of going beyond the Turing barrier.
Counterfactual Effect, the Halting Problem, and the Busy Beaver Function (1999) -
Re:Oh no, not this religious war!
I'd have to ask for a reference on that one. I've seen several arguments, including Bell's Inequality, but none which make so broad and definitive a statement. (ex: this)
-
Re:Not terribly new or surprising
As for multilingual text searching and summarisation, the best technology of its kind known to me is Latent Semantic Analysis - the brain child of Thomas Landauer. It's a fairly recent, but hardly secret or obscure, indexing technique that's gaining ground commercially for data mining applications. It can certainly do the the small number of things being claimed by this article. All the relevant papers are on the web.
If anyone else is interested, you can get the paper on this here. Links to various formats on the top right corner. It also has many links to related documents. -
Re:Standard is bibtex.
One very nice thing about BibTeX is that you can get BibTeX citations for papers directly from the ACM Digital Library (sometimes), from DBLP, or from Citeseer -- so you download the paper, view it, and if you like it, cite it! There are also many large repositories of BibTeX entries for all manner of papers available, especially in the DB and OO communities -- just do a google search for "inproceedings" and a favorite author if you don't believe me.
:-) -
Re:What ARE those introns...Recent Genetic Programming research has hypothesised that introns provide structural protection for fit individuals - see Peter Nordin's paper on the subject. This would carry over to "real" evolution quite nicely.
--
-
Re:There are seminal research papers on this
I just found one of Jim Gray's replication papers, he's now at Microsoft. Is this the one you were talking about?
The Dangers of Replication and a Solution (1996) -
Route through someplace else
This is a passive solution, but it will get you past the censorware.
Just use SOCKS. Find a college buddy or anyone with a box on broadband and throw up a socks 5 server. Then put SocksCap on the boxes at school. Download all of this here.
Now you can use the local applications and they will route all requests through the remote box. This is also a very good way to get around port limitations. If a school only allows port 80, but you want ICQ, just use SocksCap on port 80. Whee!
Of course, if you're paranoid you can also tunnel socks through ssh, which would encrypt your entire Internet session. Who's collecting your data now?
-Justin
-
Route through someplace else
This is a passive solution, but it will get you past the censorware.
Just use SOCKS. Find a college buddy or anyone with a box on broadband and throw up a socks 5 server. Then put SocksCap on the boxes at school. Download all of this here.
Now you can use the local applications and they will route all requests through the remote box. This is also a very good way to get around port limitations. If a school only allows port 80, but you want ICQ, just use SocksCap on port 80. Whee!
Of course, if you're paranoid you can also tunnel socks through ssh, which would encrypt your entire Internet session. Who's collecting your data now?
-Justin
-
Paper will always be with us
The paper medium has survived the "killer apps" of Radio and Television, whose to say it won't survive now? I know I myself enjoying laying in my room reading Asimov, Tolkein, and Faulkner, and the mere tactile feedback of reading a book that is yellowed with age from being from the 70s and before is enough as a reminder that whatever great authors today - Stephenson, Clancy, Crichton - they are merely standing upon the shoulders of the greats who went before them.
Incidentally, check out this study by Xerox/EuroPARC comparing computerized methods of studying versus their paper equivalent. If I recall correctly, they found paper based studying led to higher grades then their computerized equivalents. However, the computer was much more popular for items such as research. Paper and e-Paper both have their roles within society, just as technology and agriculture remain two vitally different but vitally important aspects of human culture.