Domain: nec.com
Stories and comments across the archive that link to nec.com.
Comments · 437
-
Re:Just glad it's not QWERTY.
QWERTY is better because it is consistent.
Actually, this discussion is confusing two different metrics. One is the maximum speed attainable on a given keyboard, based on letter frequencies and Fitts's law. The second is the learning time needed to get close to that maximum speed.
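To make the first metric concrete, here is a minimal sketch of how a layout's speed ceiling could be estimated: weight the Fitts's law movement time for each key-to-key hop by how often that letter pair occurs. The constants a and b, the key coordinates and the digraph counts are made-up illustrative values, not numbers from any paper.

```python
import math

def fitts_time(distance, width, a=0.0, b=0.2):
    """Movement time in seconds, Shannon form of Fitts's law.

    a and b are hypothetical constants chosen for illustration only.
    """
    return a + b * math.log2(distance / width + 1.0)

def mean_time_per_key(layout, digraph_freq, width=1.0):
    """Average hop time over all digraphs, weighted by their frequency."""
    total = sum(digraph_freq.values())
    mean = 0.0
    for (src, dst), count in digraph_freq.items():
        (x1, y1), (x2, y2) = layout[src], layout[dst]
        distance = math.hypot(x2 - x1, y2 - y1)
        mean += (count / total) * fitts_time(distance, width)
    return mean

# Two toy layouts (key centre coordinates, in key widths) and toy digraph counts.
compact = {"t": (0, 0), "h": (1, 0), "e": (2, 0)}
spread = {"t": (0, 0), "h": (5, 0), "e": (9, 0)}
freq = {("t", "h"): 3, ("h", "e"): 3, ("e", "t"): 1}

for name, layout in (("compact", compact), ("spread", spread)):
    t = mean_time_per_key(layout, freq)
    print(name, "layout: about", round(1.0 / t, 2), "keys per second at best")
```

Learning time, the second metric, does not show up in this calculation at all, which is why the two get confused.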
If you are interested you can get some background from this paper, for example. The paper references previous work on a "metropolis" keyboard with hexagonal keys to improve key packing. Better key packing means maximum key sizes with minimal stylus/finger travel. -
Re:Handcuffs
Yeah, but it insults my intelligence when "Legally Blonde 2000" is called Information.
Real info was, is and remains free. -
Re:64bit performance gains...
For programs that fit nicely in a 32 bit address space, perhaps you could designate one of the 64-bit registers as a base pointer, and store all the addresses as offsets? This may be cheaper than you think, since we now have 8 additional registers to play with.
No, I'm pretty sure that would suck hard. Whatever addressing modes your CPU provides, you just lost one register. For instance, AMD64 provides [base + stride*index + offset] to access arrays. Your scheme could no longer do that access in one memory reference; you'd need an extra add instruction.
On the other hand, for some applications it may be useful and convenient to map large files into memory.
You should see the alloc stream facility. It has performance at least as good as a whopping big memory-mapped file, but without eating a lot of address space. It's a nice interface; too bad nobody uses it. -
Found the paper
A decent paper discussing the theory behind ISP to ISP peering is linked through Citeseer here. To download a copy of the paper, you click on the appropriate cached format in the top right corner of the page.
-
So?
Big whoop, I can post a link to a billion papers too.
The vast majority of what is happening on the "bleeding edge" is happening at places other than Microsoft.
-
Not new.
Open Sesame (1993!) by Charles River Analytics for the Mac did stuff like this: it would 'learn' when you did things and open programs for you, where you saved files, how often you rebuilt the desktop, etc.
You could also direct it by voice command. I had this program back in the day, heady stuff at the time.
Here's a pile of other stuff on Software Assistants.
-
motion blending
There's actually been quite a bit of research on motion blending, so that the transitions between states are not noticeably unnatural.
So the real answer is, it's not a limitation of mocap, but of the current application of the technology. -
Re:Comparison of Bayesian spam filters
I've always wondered how Paul Graham has managed to get so much hype built up about his work. The idea of using Bayesian filters to classify spam had been around for about 5 years prior to his "A Plan For Spam" - check out, for example, this paper by Mehran Sahami (a very cool guy who works here at Stanford as well as at Google) from 1998: http://citeseer.nj.nec.com/sahami98bayesian.html (and if you search around on Citeseer you'll undoubtedly find many other papers on spam classifying from even earlier, though not all use Naive Bayes).
Mathematically, Graham's version of Naive Bayes is pretty weak - look at the original A Plan for Spam: he chooses all kinds of arbitrary numbers based purely on trial and error, rather than backing them up with mathematical reasoning:
I want to bias the probabilities slightly to avoid false positives, and by trial and error I've found that a good way to do it is to double all the numbers in good. This helps to distinguish between words that occasionally do occur in legitimate email and words that almost never do. I only consider words that occur more than five times in total (actually, because of the doubling, occurring three times in nonspam mail would be enough). And then there is the question of what probability to assign to words that occur in one corpus but not the other. Again by trial and error I chose .01 and .99. There may be room for tuning here, but as the corpus grows such tuning will happen automatically anyway.
That's just one paragraph; stuff like that is all over the paper. There are many more principled ways to bias the classifier away from false positives, which I'm not sure it's worth getting into. Having spent the summer implementing many different variations on spam filtering, I can say confidently that Graham's variation is far from the best. -
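For reference, the per-word heuristic in the Graham passage quoted in the spam-filter comment above can be sketched roughly as follows. This is a paraphrase in Python of the rule as the quote describes it (double the good counts, skip rare words, clamp to .01 and .99). The function name, the dictionary-based counts and the toy numbers are assumptions for illustration, not Graham's code.

```python
def graham_word_probability(word, good, bad, ngood, nbad):
    """Estimate P(spam | word) using the trial-and-error rules in the quote.

    good / bad: dicts mapping word -> occurrence count in each corpus.
    ngood / nbad: number of messages in each corpus (assumed to be > 0).
    Returns None for words seen too rarely to be considered.
    """
    g = 2 * good.get(word, 0)      # "double all the numbers in good"
    b = bad.get(word, 0)
    if g + b < 5:                  # ignore words with too few occurrences
        return None
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # the hand-picked .01 and .99 limits

# Toy corpora, purely illustrative.
good_counts = {"meeting": 3}
bad_counts = {"viagra": 10, "meeting": 1, "rare": 2}

print(graham_word_probability("viagra", good_counts, bad_counts, 100, 100))
print(graham_word_probability("meeting", good_counts, bad_counts, 100, 100))
print(graham_word_probability("rare", good_counts, bad_counts, 100, 100))
```

Every constant here (the factor 2, the threshold 5, the two clamps) is one of the hand-tuned numbers the comment objects to.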
other ways...
red team sounds like something a closed package would need. linux and other free software offer additional options for testing. openbsd does a continuous code audit. linux has the kernel janitors. in addition there are numerous citations for fuzz - here's one.
i get the idea you want a company to do all this work and then place a certification on distros or packages. you confuse the issue with the buzzword scented "red team" references, but it really sounds like you want to use the services of such a company - or create one and create buzz for such a company. -
Re:Microsoft Research or Ripoff?
The American Sign Language translation glove was actually introduced at the 2002 Intel Science Talent Search competition by Ryan Patterson of Grand Junction, CO. Patterson's glove uses custom designed electronics to detect hand and finger movements and translate those movements from ASL into their English forms, letters and punctuation. Of course, sign language translation goes back a lot further than 2002; as early as 1995 there were working examples of this, as evidenced by the paper here.
-
Re:6 degrees of separation
There is all this BS about every IP packet being traceable, but people have been researching anonymous P2P for a while now: e.g., link. What's difficult about using proxies, multicast, and public/private key encryption to make the true users of a content-distribution network hard to identify? You might take a bandwidth hit, but network speeds haven't gotten close to their full potential.
-
Scientific Papers
Being an undergrad hoping to do research in this area in the next few years, I've already read a few of Och's papers and others in the field. Some of the best that I remember are:
- Improved Statistical Alignment Models (2000) - Franz Josef Och, Hermann Ney, which investigates and compares several models
- A Syntax-based Statistical Translation Model - Yamada, Knight (2001), which tries to treat sentences structurally instead of just a stream of words
- A Finite-State Approach to Machine Translation - Bangalore, Riccardi (2001), which uses a different way of looking at the problem than usual
-
Not that terribly new
- Just like all kinds of other things on Slashdot, this is early 90's technology that people here are just starting to hear about, cf. any number of citeseer refs on the subject of statistical machine translation. It even says so in the article, though I'm sure most of us didn't bother reading the whole thing.
- You do need parallel texts to make this work, i.e. things like the Canadian parliamentary transcripts (French and English), or computer/car/equipment manuals that were translated into several languages.
- I'd bet anyone a pretty penny that this is only an incremental improvement upon what everyone's been working towards the last few decades.
- It's annoying that the article was so laudatory of Mr. Och, because it just does him and everyone else who's working on these problems a disservice when naive people expect more than they were promised.
-
Re:Not so much a crisis...
I'd love if we could throw out the whole internet and use IPv6 instead. I'd love if we could throw out a lot of legacy technologies, like x86 and C and ASCII and gas engines. But doing this requires a lot of changes to infrastructure!
NAT is great because it's an endpoint-based solution. Stuff like UPnP can be deployed at a single site to make NAT better, and we don't have to touch a single router. (Network guys should read their own influential papers!) -
Re:subtle effect on research?
-
Yes, unfortunately
There are a lot of really crappy things about LaTeX, but it is definitely the standard. All the journals and conferences I've submitted to assume you are preparing your document in LaTeX, and give you style files to set everything up correctly. CiteSeer, as far as I know, can only automatically get information from LaTeX-generated PS and PDF files.
-
Re:udpp2p
Better yet, what if no file is actually ever sent, but randomish blocks of bits that must be XOR'ed together to reconstitute the file.
Clever idea. Too bad you're not the first to think of it.
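For anyone who hasn't seen the XOR trick spelled out, here is a minimal sketch of the idea as described in the quoted post: every block on the wire looks random on its own, and only the XOR of all of them gives back the file. The function names are made up for this example, and it is not the scheme from any particular paper.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_into_blocks(data, n=3):
    """Return n random-looking blocks whose XOR equals data (n >= 2)."""
    blocks = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = data
    for block in blocks:
        last = xor_bytes(last, block)   # fold each random pad into the final block
    return blocks + [last]

def reconstitute(blocks):
    """XOR all blocks together to recover the original data."""
    out = bytes(len(blocks[0]))         # all-zero starting value
    for block in blocks:
        out = xor_bytes(out, block)
    return out

original = b"not actually a copyrighted file"
blocks = split_into_blocks(original, n=4)
print(reconstitute(blocks) == original)
```

Leave out any one block and the rest tells you nothing about the file, which is the whole point of the idea.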
-
Re:LOC Metric
Yeah yeah, it's the same old story. I think everyone who has really thought about LOC has had the same idea. The truth is that plain old lines-of-code is the most effective measure of software development.
The problems you are trying to solve really are not problems in practice. Most files don't have many blank lines. Comments take just as long to write as any other code, and arguably deserve to be counted. Statements broken among multiple lines usually indicate that the author thought they were complex enough to warrant multiple lines, and therefore they probably deserve to be counted that way. Et cetera.
And above all else, I don't think people consider a difference of 50% in LOC to be significant anyway, so there's no point worrying about whether people put the curly brace on a separate line. If you ever see someone say that system X is simpler because it has 25kLOC instead of 30kLOC, then you ought to mention that this could be attributed to differences in coding style.
In short: if it ain't broke, don't fix it.
I once had a code counter that went one step farther than you suggest: it simulated the effect of a very high-quality language-specific data compression algorithm, and computed the entropy of a piece of code. I'm not sure the results were any more valid than any other metric.
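A rough sketch of the two kinds of counter discussed above, for comparison. The compressed-size function uses general-purpose zlib as a crude stand-in for the language-specific compressor mentioned; that substitution, the function names and the two sample snippets are assumptions made for illustration.

```python
import zlib

def count_loc(source):
    """Plain old lines of code: every line counts, blank or not."""
    return len(source.splitlines())

def count_nonblank_loc(source):
    """Lines of code, ignoring lines that contain only whitespace."""
    return sum(1 for line in source.splitlines() if line.strip())

def compressed_size(source):
    """Bytes after zlib compression, a crude proxy for information content."""
    return len(zlib.compress(source.encode("utf-8"), 9))

# The same function written in two brace styles.
same_line_braces = (
    "int f(int x) {\n"
    "    if (x > 0) {\n"
    "        return x;\n"
    "    }\n"
    "    return -x;\n"
    "}\n"
)
next_line_braces = (
    "int f(int x)\n"
    "{\n"
    "    if (x > 0)\n"
    "    {\n"
    "        return x;\n"
    "    }\n"
    "    return -x;\n"
    "}\n"
)

for name, src in (("same-line", same_line_braces), ("next-line", next_line_braces)):
    print(name, count_loc(src), count_nonblank_loc(src), compressed_size(src))
```

On inputs this small the compressed sizes are dominated by format overhead, so any comparison only starts to mean something on whole files.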
-
Re:Who cares?
Obviously you didn't read this link as suggested by a previous submitter.
-
Do you trust your god?
More disturbing to me is that Google is rapidly becoming the accepted repository of all knowledge - want to know something? Type it into google and accept the results as fact. It's on the internet, so it must be true!
This is becoming evident in a number of fields - but two I'm familiar with are academic research and journalism.
An academic looking for research papers is more than likely to do an online search. There is evidence that papers available online receive more attention. The quality of online research can also be questionable - papers are put online as prepress papers, before they have been through the review process required for publication in most journals.
As for journalists, I have read countless pieces in print that have been clearly researched on the internet alone. Although these are mostly features or fluff pieces, I see it in news stories too, where background material has clearly been garnered by spending a half hour with google.
Think of any contentious current affairs or political issue, and think of who is going to put the time and effort into putting material online about that issue - it's not going to be some altruistic, unbiased observer. -
Here's an important one.
Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications, also available here if you're one of the unwashed masses. It has algorithms to see if your app is facing floating point trouble.
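A quick way to see the kind of trouble in question: floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can give a different answer. The sketch below contrasts a plain left-to-right sum with Python's math.fsum, which tracks the rounding error exactly. It only illustrates the problem and is not one of the algorithms from the paper.

```python
import math

def naive_sum(values):
    """Left-to-right floating-point accumulation, one rounding per step."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers in two different orders, as two different
# reduction schedules on a parallel machine might see them.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

print("naive:", naive_sum(order_a), naive_sum(order_b))
print("fsum: ", math.fsum(order_a), math.fsum(order_b))
```

If the naive results disagree between orderings while the accurate sum does not, that is a cheap sign the app depends on summation order.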
-
Re: well...
> A G5 is faster than the fastest Intel box with Linux. Read the benchmark whitepaper. It describes the testing methodology in precise detail. In a side-by-side, controlled test, the single-processor G5 was 10% slower on integer performance but 20% faster on floating point performance than the Pentium 4 with Linux.
Apparently they never got so far as Chapter 1 in Hennessy & Patterson, where you learn the mantra of "make the common case fast". -
Re:Oz
At the time you write your code you don't know the actual value of some symbols. It varies depending on other run-time values. That's why it's called a "variable".
Your question shows that you don't know what functional programming is. To understand it, I advise you to read "Why Functional Programming Matters" (HTML short version).
-
Re:Analysis a bit weak
I agree. The weaknesses mentioned are not Bluetooth-specific. If you use the full 16 octets for the PIN, there is nothing wrong with the security, and building a Diffie-Hellman exchange at the application level to obtain the PIN automatically should not pose a problem.
There are some security issues in the E0 algorithms, but the effective key length is still around 73-84 bits, which is more than enough (article here).
-
Citeseer
Citeseer is a great source of Computer Science related papers. Its best feature is that it automatically lists alongside each paper every paper that it cites, as well as the papers that cite it. (There is also a frequency chart showing how often the paper was cited in the years after its publication.)
Good way to keep track of the latest papers in different fields. -
Article Text for those too lazy to click the link
Introduction
WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:
- A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
- Algorithms for compressing web graphs that exploit referentiation (à la LINK), intervalisation and ζ codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
- Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
- A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow you to modify (e.g., transpose) or recompress a graph, so as to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for command-line parsing.
- Data sets for very large graphs (e.g., a billion links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.
In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.
You are welcome to use and improve WebGraph!
Installation
You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find it useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein as the packages above, WebGraph is also distributed as a JPackage-like RPM. -
Re:Not quite...
And remember how quickly RSA-129 (a 426-bit or so key) was factored by distributed.
You could break DH-128 on a PC if you had to,
... Note that this isn't 128-digit DH, but rather 128 bits. That works out to about 39 decimal digits. I can't find exact numbers for modern hardware, but in 1991 LaMacchia and Odlyzko published Computation of Discrete Logarithms in Prime Fields, describing how they broke a larger, 192-bit key using about 1200 machine hours of a 25MHz R3000 chip, 40 hours on an unspecified machine (probably the SGI), and a few hours on a VAX. They concluded (over a decade ago) that using fewer than 200 bits is "very insecure".
The authors state that the implementation could be improved upon, the 128-bit problem is smaller, and modern machines are much faster, so I was guessing a couple of days on modern desktop hardware.
Trillian does appear to change the prime, so the computation could only be used for one intercepted conversation.
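The bit and digit counts above are easy to mix up, so here is a tiny sanity check. It just counts the decimal digits of the largest number that fits in a given number of bits; the function name is made up for this example.

```python
def decimal_digits(bits):
    """Decimal digits in the largest integer representable in `bits` bits."""
    return len(str((1 << bits) - 1))

for bits in (128, 192, 426):
    print(bits, "bits is about", decimal_digits(bits), "decimal digits")
```

So a 128-bit modulus is roughly a 39-digit number, nowhere near the 129 digits of RSA-129.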
-
Speed & Congestion
The whole point of their paper was that TCP breaks down when the bandwidth-delay product gets really high, because of the high number of packets "in flight" per control iteration, and because of the comparatively high (per-rtt) probability that non-congestion-induced packet losses will occur. So yeah, they are using a high-bandwidth, high-latency line with relatively few flows, but because that's the situation they're working on, not because it's a rigged test. I think the New Scientist article did a bad job in making it clear that this is about how to take advantage of obscene amounts of bandwidth, not how to squeeze performance out of more meager links.
If you look at the applications they're interested in, namely multi-terabyte scientific data set transfers, UDP wouldn't be an ideal choice because they need the reliability features of TCP as well as the congestion control. Also, I'd expect this to achieve similar throughput to (well-behaved) UDP streaming protocols, because they have similar origins. FAST TCP and modern congestion-controlled UDP applications both use rate-based congestion control, largely based on the ideas introduced in TCP Vegas.
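To put rough numbers on the bandwidth-delay argument, here is a back-of-the-envelope sketch. The link speeds, round-trip times, 1500-byte packet size and the random loss rate are all illustrative assumptions, not figures from the paper.

```python
def packets_in_flight(bandwidth_bps, rtt_s, packet_bytes=1500):
    """Bandwidth-delay product expressed in packets."""
    bdp_bits = bandwidth_bps * rtt_s
    return bdp_bits / (8 * packet_bytes)

def loss_probability_per_rtt(per_packet_loss, packets):
    """Chance that at least one packet in a window is lost at random."""
    return 1.0 - (1.0 - per_packet_loss) ** packets

# A long fat pipe versus a modest link (hypothetical values).
fat = packets_in_flight(1e9, 0.100)      # 1 Gbit/s, 100 ms RTT
modest = packets_in_flight(10e6, 0.020)  # 10 Mbit/s, 20 ms RTT

for name, n in (("fat pipe", fat), ("modest link", modest)):
    p = loss_probability_per_rtt(1e-5, n)
    print(name, round(n), "packets in flight,", round(100 * p, 2), "% chance of a random loss per RTT")
```

With thousands of packets in flight per round trip, even a tiny non-congestion loss rate trips loss-based congestion control regularly, which is the breakdown the comment describes.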
-
Re:New Scientist didn't put it very well...
How about TCP Vegas? They use RTT measurements to proactively determine congestion.
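For those who haven't looked at it, the Vegas idea can be sketched in a few lines: compare the throughput you would expect at the minimum observed RTT with what you are actually getting, and treat the difference as packets sitting in queues. This is a simplified sketch of the congestion-avoidance rule only; the alpha and beta thresholds are commonly cited defaults taken here as assumptions, and the function name is made up.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """One Vegas-style window adjustment per round trip.

    cwnd: congestion window in packets
    base_rtt: smallest RTT seen so far (seconds), taken as the uncongested RTT
    rtt: most recent RTT measurement (seconds)
    """
    expected = cwnd / base_rtt            # packets/s if nothing were queued
    actual = cwnd / rtt                   # packets/s actually achieved
    queued = (expected - actual) * base_rtt  # estimated packets in queues
    if queued < alpha:
        return cwnd + 1                   # path looks underused: speed up
    if queued > beta:
        return cwnd - 1                   # queues are building: back off early
    return cwnd                           # within the target band: hold steady

print(vegas_update(10, 0.100, 0.100))
print(vegas_update(10, 0.100, 0.125))
print(vegas_update(10, 0.100, 0.200))
```

The point is that the sender reacts to rising delay before any packet is dropped, rather than waiting for a loss.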
-
Correction
It seems that the second patent was also filed in 1999, so Freenet's CHK might not constitute prior art after all. However, suitable prior art isn't hard to find; for example, 5 minutes of searching revealed this 1997 paper.
-
Re:Simplex and Operational Research
-
Re:Compare with computron
I seem to recall reading something on /. years ago about computing that recycles the contents of registers to lower waste heat.
Almost. Reversible computing builds all of its primitives to prevent losing information -- evidently, this directly causes it to produce less heat. See Baker's papers for more information.
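As a small illustration of what "not losing information" means at the gate level, here is the Toffoli gate, a standard textbook reversible primitive (it is used here as a generic example and is not taken from Baker's papers). An ordinary AND throws away its inputs; the Toffoli gate computes the same AND but keeps enough bits around that the inputs can always be recovered.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)

states = list(product((0, 1), repeat=3))

# Reversible: the gate is its own inverse, so no input is ever lost.
print(all(toffoli(*toffoli(*s)) == s for s in states))

# Bijective: eight distinct inputs map to eight distinct outputs.
print(len({toffoli(*s) for s in states}) == len(states))

# With c fixed to 0, the third output is a AND b.
print([toffoli(a, b, 0)[2] for a, b in product((0, 1), repeat=2)])
```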
Am I on drugs?
Um... Hard to say. Perhaps this would make a good Ask Slashdot question? :-)
-Billy -
Re:Apple's Mail app...
Actually, the latent semantic analysis (LSA) that Apple uses is not a form of Bayesian reasoning; it uses a singular value decomposition (SVD) to perform generalized factor analysis. However, there is a probabilistic version of LSA out there.
-
Re:Smart Compilers
-
Puzzles -- Dwork & Naor, 1992
BTW, the idea was published in 1992:
C. Dwork and M. Naor. Pricing via processing or combatting junk mail. In Advances in Cryptology---CRYPTO '92, Springer-Verlag Berlin Heidelberg 1993.
See also papers that cite it.
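To show the flavour of "pricing via processing", here is a minimal sketch of a puzzle that is expensive to solve and cheap to check. Note that it uses a hash-based puzzle in the style of later proof-of-work schemes, which is a substitution: it is not one of the pricing functions from the Dwork and Naor paper. The function names, the message format and the difficulty value are assumptions for illustration.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def puzzle_digest(message, nonce):
    """Hash the message together with a candidate nonce."""
    return hashlib.sha256(message + b":" + str(nonce).encode("ascii")).digest()

def solve(message, difficulty_bits):
    """Sender's side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in count():
        if leading_zero_bits(puzzle_digest(message, nonce)) >= difficulty_bits:
            return nonce

def verify(message, nonce, difficulty_bits):
    """Recipient's side: a single hash is enough to check the stamp."""
    return leading_zero_bits(puzzle_digest(message, nonce)) >= difficulty_bits

stamp = solve(b"to:alice@example.org", 12)
print(stamp, verify(b"to:alice@example.org", stamp, 12))
```

One message costs the sender a fraction of a second; a million messages cost real CPU time, which is the economic lever against junk mail.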
-
Distributed Filesystems for Linux?
What about CFS, OceanStore or Ivy for a really distributed file system? :) -
Re:The problem with your argument.
The problem with chess as an AI problem is that it turns out there is a better way to play chess than the way humans play it. Due to constraints on mental processing, humans simply cannot play with eight-move look-ahead in multiple paths. Studies of grandmasters have suggested that they never even consider most moves on the board, and that for those they do consider they generally only look ahead two or sometimes three moves. The only time a grandmaster ever looks ahead more than about three moves is when they are trying to make sure that a complicated forced mate is actually a forced mate.
Chess is interesting to AI because it is a formal case of problem solving which is well understood and can be easily studied. Traditional AI researchers and adherents to more modern approaches such as connectionism both generally agree that problem solving is central to thought. A computer which simulates the way human beings play chess would be a tremendous breakthrough in AI. There are a few research projects which are attempting to do this (this one, for example). The problem is that early on in the Chess/AI initiative the goals became confused. As soon as computer chess began to show some promise, many AI researchers got diverted from the goal of trying to model human problem solving and got sucked in to the competitive game of trying to make a computer which is REALLY REALLY good at chess.
It so happens that with sufficiently powerful hardware, there exist algorithms which are extraordinarily good at chess yet which in no way model human problem solving. So long as the chess/AI initiative is primarily the pursuit of those algorithms chess will remain a non-interesting task to AI.
transient0
cognitive scientist -
Re:Some coding expertise...
A lot of these issues were taken care of a long time ago. In 1996 several of my colleagues published a simple system for doing this in realtime (including integrating sound and video together for speech recognition) at the European Conference on Computer Vision (CiteSeer link to paper), and there are several other papers from that same period describing similar systems. Clearly Intel has a more complete system than these papers (as you would expect given 7 years), but it's not as hard as you're making it seem.
-
Re:FX32! for Itanium
Not necessarily. I don't know the exact details of the emulation, but I can speculate about what is happening. Over the past few years, papers and projects on dynamic compilation, dynamic optimization and dynamic execution layer interfaces have been doing the rounds in the academic community. For example, Dynamo, a dynamic optimizer (which takes code and performs run-time optimization on it - not emulation or translation), showed that apps in fact ran *faster* even counting the overhead of optimization. This idea spawned DELI, or dynamic execution layer interface, which can dynamically translate instructions *as well as* perform optimizations on them using run-time information. Researchers claim that execution is *faster* than running the same app on the native machine. All of these are somewhat like a software equivalent of Transmeta.
Now the interesting thing: both Dynamo and DELI are from HP Labs. So was HPL-PD, an architecture that is a superset of Itanium, invented and evaluated by HP waaaaaaay back (Itanium is in fact based on HPL-PD). Now, could the dynamic execution layer emulating x86 be based on DELI? That is speculation. -
Re:They want this information to be freely availab
In my research field they found out that a paper that's freely available on the internet gets quoted at least three times as often as a paper that's "locked away" in a "proprietary" journal... (Couldn't find the link I was searching for for that figure, sorry).
I believe this paper by Steve Lawrence is what you were looking for :-)
-
Snow Crash guard dogs
Pasting content from floating atoll:
Take an army of the recently-described feral hunting robots. To each robot, add a GPS chip and wireless mesh networking.
Give the people and dogs smart name tags, and have your dogs exchange your "business card" with the other smart name tags. Publish the FOAF URL in it, so you can immediately check for compatibility and give the new information to the dogs.
Study the discovered FOAF files, each describing individual traits ("attributes").
Instruct the feral robots to find other people with compatible personalities, but to stay near you. They'll roam around, seeking people whose interests relate to yours.
For bonus points, add solar panels to generate power as they roam around, and electronic boundaries to keep them in safe areas, away from motor traffic.
-
Re:I say publish all the details overseas
Much of the information you "wish we had" is described in educational research papers.
For example in 2000 Marc Waldman, Aviel Rubin, and Lorrie Cranor published a paper describing Publius:
Publius: A robust, tamper-evident, censorship-resistant web publishing system (2000) Marc Waldman, Aviel Rubin, Lorrie Cranor Proc. 9th USENIX Security Symposium
As you can see by the link, many others have written how-to's for anonymizing network communications. The papers are archived in CiteSeer.
-
Re:Six degrees of separation
There is also an algorithmic analysis of this phenomenon by Jon Kleinberg at CiteSeer. This work is related to unstructured P2P networks and gives some insight into why the "6 degrees of separation" occur.