Domain: cmu.edu
Stories and comments across the archive that link to cmu.edu.
Comments · 2,977
-
Re:600?
not really bots. Mellon is into robotic machines. Things that like to dig REALLY BIG HOLES for example. Farm equipment, military equipment.. So it wants 600 acres of land it can tear to shreds and nobody says UNCLE!
actually.. my own personal opinion is they just need a bigger place to party -
This kind of robot.
This is one of the bots that will be tested.
-
Separated at Birth?
GRACE and her Twin? Maybe just cousins.
-
CMU mobot competition
The annual Mobot competition at Carnegie Mellon seems a lot like this except the robots need to be able to stay on course that also has downhill slopes along it.
My tank from the 1998 competition woulda worked fine except for endianness problems between my roommates and me. -
Prior art...
Check out the links to the Monarch project from Dave Maltz's home page.
Dave did his PhD thesis on the idea of routing packets between a bunch of wavelan cards moving all over the place. If you play up the military side of it (imagine every soldier/tank with wavelan, routing packets between them!) DARPA likes to fund this kind of stuff.
Anyways, the most fun was had when Dave and his colleagues rented a fleet of cars, put a wavelan equipped laptop in each one (since this was a while ago, they were using the original 2Mb wavelan, not this 802.11b stuff), and were driving all over Pittsburgh trying to see how well packets would get through between cars....
-
What could be geekier?
This is pretty geeky. What could be geekier? Well, for one, hooking it up to this
-
CMU research...informatix or something?
I remember being shown "current" research into this a couple years ago (winter 1999-2000) on a campus tour at Carnegie Mellon--anyone remember this? It was called "Informedia", and it promised to monitor closed captioning on all channels for keywords, and record the A/V stream as well as save the closed captioning.
Oh here we go, I found a link to it. Very interesting stuff. As it turns out, the use is to store this video in libraries...it would be recorded from WQED and similar educational stations and accessible for playback later. Very entertaining project, IMO.
Here's an early overview of the project.
"RATIONALE of the Informedia Digital Video Library Goal:
The Informedia(tm) Digital Video Library Project at Carnegie Mellon University is creating a digital library of text, images, videos and audio data available for full content retrieval. The initial testbed will be installed in several K-12 schools and students will use the Informedia System to explore multi-media data for educational purposes. The Informedia system for video libraries goes far beyond the current paradigm of video-on-demand, by retrieving a short video paragraph in response to the user's query.
(Why is this project needed, why now)
Vast digital libraries of information will soon become available on the nation's Information Superhighway as a result of emerging multimedia computing technologies. These libraries will have a profound impact on the conduct of business, professional, and personal activities. However, it is not enough to simply store and play back information as in commercial video-on-demand services. New technology is needed to organize and search these vast data collections, retrieve the most relevant selections, and effectively reuse them.
The Informedia Library project proposes to develop these new technologies and to embed them in a video library system primarily for use in education and training. The nation's schools and industry together spend between $400 and $600 billion per year on education and training, an activity that is 93% labor-intensive, with little change in teacher productivity ratios since the 1800s. The new digital video library technology will allow independent, self-motivated access to information for learning, exploration, and research. This will bring about a revolutionary improvement in the way education and training are delivered and received." -
Re:We're going to have to deal with this...
Somebody - please, come up with a better solution. Soon.
The only solution leading to peace is education.
The argument goes like this: terrorism is bred by poverty, and the only way out of poverty is literacy.
Yes, it is that simple. Now, if I could only get a computer-assisted oral reading system to fit in 4MB on an 85 MTOPS handheld, we could actually afford to package them in with Meals Ready To Eat and such humanitarian packages. If you doubt the inevitability of this, just plot Moore's law and ask how much a Texas Instruments Speak-n-Spell would set you back on eBay these days.
Eventually, educational computer systems (which may or may not have other features like email and web browsing) will be very inexpensive and commonplace. Just like cellphones are springing up in the poorest nations that still can't afford wires. As long as speech software, computer, and communications engineers are striving for improvements, things will head generally in the direction leading up to it.
As someone who has designed software at that cutting edge, for some of the largest language learning software companies in the world, I can say with some certainty that I need more money.
-
Re:as I've said on this site before
-
pictures of grace
Have a look at this!
-
Well, I did it...
-
Suggestions: RecipeML, Docbook
I'd rather see the "book" released as a collection of RecipeML files, so that I could re-arrange/import/manipulate them down the road...
I care nothing about Word nor PDF. Give me docbook sources, so that I can [again] reformat to my eBook for use in the kitchen... or to my custom kitchen appliance, should I ever make that exist.
Given the state of the populace [sedentary, generally-low-metabolism males] I'd try to focus on healthier stuff. For instance, Chicken in Mango Sauce is quite tasty [just made it last night], and is better for you than corndogs.
But, I don't see why this is better than SOAR^H^H^H^Hrecipesource.
-
Re:Won't work ...
As long as the picture has to be decoded before it is sent to the TV (via your SVHS/Scart/RF lead) we'll be able to record it. Admittedly not digitally.
It would be just like saying "DVD's will not be copied" 5 years ago. We all know that isn't true.
Someone will always find a way around it, just as the MPAA will always find a way to stop it. This article shows that it is seemingly more difficult for the MPAA to put these procedures in place than it is for people to circumvent them.
This is a Good Thing--it shows the government is protecting fair-use for the most part. Just as people will not stop circumventing stupid technologies that restrict fair use (e.g. DeCSS), the MPAA will never stop their crusade either.
They have a flawed business model, and think we are all thieves, and while they continue to have enough money to buy senators (Fritz et al.), that image will prevail, and the leapfrogging will continue.
-
Re:AI through simulation? - the right question?
I do not share the consensus that computing power equals intelligence. It would seem that today's computers have more flops than simple insects (some numbers: just assume something like 10^5 neurons, 10^2 spikes per second, 10^3 connections per neuron, giving you 10^10 connection updates per second). Neural network simulators are not far away from that number. Some are even faster using parallel computers.
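The estimate above is just a product of three rough figures. A minimal sketch of that arithmetic, using only the numbers assumed in the comment (the function name is made up for illustration):

```python
# Back-of-the-envelope figures taken from the comment; all three are
# rough assumptions, not measurements.
neurons = 10**5
spikes_per_second = 10**2
connections_per_neuron = 10**3


def connection_updates_per_second(n, rate, fan_out):
    """Each spike is assumed to update every outgoing connection once."""
    return n * rate * fan_out


updates = connection_updates_per_second(
    neurons, spikes_per_second, connections_per_neuron
)
print(f"{updates:.0e} connection updates per second")
```

Changing any one of the three inputs by a factor of ten moves the result by the same factor, so the figure is only good to an order of magnitude.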
But still insects outperform any computer system in most recognition tasks; they show intelligent (or at least useful) behavior.
We are not in need of more flops or something. We are desperately in need of the slightest hint about how this great software that defines our brain functions works.
I bet that today's computers with the right software (that is learning, imitation etc) could seem astonishingly intelligent. We need brain power
... not computer power to understand the brain. -
Re:Similar to Mars Pathfinder
There is an interesting account of that here: What Happened on Mars?
-
Why is everybody making the user do all the work?
If you can get the account info out of the domain/AD via LDAP etc, you're halfway there.
Given that IMAP can be enabled for Exchange 5.5/2k, you should then be able to automate the moving of data from Exchange into your new IMAP server (thoroughly recommend Cyrus), at least for mail anyway; contacts & calendar info is next to useless outside Exchange (but retrievable via IMAP). -
DMCA Counter-notice HOWTO
Prof. Dave Touretzky at Carnegie-Mellon (yes, the same one with the gallery of CSS descrambler implementations) has a nice information page for how to provide counter-notification if you get one of these form letters and you believe the copyright infringement accusation contained in it is wrong.
-
Just the next step
Increasingly capable autonomous vehicles have been around for a while. I was at the Robotics Institute at CMU when their vehicle was driving around local streets, in the late 1980's. Later I believe one of their vehicles drove autonomously from Pittsburgh to LA, or something... I don't recall exactly but I recall the video of the 'driver' sitting with his hands behind his head.
We've all seen Short Circuit too!! :O) We've also seen Popular Science articles about autonomous flyers only a few inches long with facial recognition and ability to follow someone through the streets - I think this is probably at least feasible, if not deployed yet.
The point is that autonomous vehicles have been around, increasing in capability gradually, in part as computing power increases. So I expect this project is more about integrating existing tech into a working mass-deployment plan and developing battle and survival strategies. This is going to the next phase from just making a vehicle work.
Check out the Field Robotics Center at CMU - the latter link is to the whole Robotics Institute.
I've thought for some time it might be fun to sponsor a robotic "Race across the desert" out here in Central Oregon. I think 100 miles would probably be a sufficient test. If anyone's interested, try garyb at fxt dot com. -
Re:IANAL, but..
Wasn't it just a tax issue anyway?
Yup, it was. It's a $200 tax for the weapon in question and that was quite a bit of money back then, especially for a couple of dirt-poor moonshiners.
The Court had already made that decision in Sonzinsky v. United States, 300 U.S. 506 (1937), which basically says the Court is "not free to speculate as to the motives" for the law. But, the Congressional Record clearly shows that Congress was indeed concerned that it violated the 2nd Amendment. They were assured by the Attorney General that a punitive tax measure would withstand Court scrutiny, since the Harrison Narcotics Act had already succeeded.
I agree that anti-gun people using Miller to defend their position is stupid, since the Court basically said that the 2nd Amendment *only* protects the right to own military grade hardware.
Yes, but the lower courts (the 9th Circuit is the worst offender) stretched it into meaning that it only protected the states. Other circuits followed their lead. The Supreme Court rarely takes a case except to settle conflicts among the lower courts, so they haven't had occasion to overturn any of them.
What is really stupid is that the entire premise of constitutionality of gun control hinges on a dubious interpretation of a decision in which the defense didn't even make an appearance. And, that decision was based on a precedent set by interpretation of a provision in a state constitution that was demonstrably different from the 2nd Amendment in the very aspect that justified their decision.
In the academic world, there is basically very little debate about intended meaning of the 2nd Amendment -- the individual-rights interpretation has become the "standard model". There are numerous people supporting gun control that admit it is correct, and one openly expresses concern that allowing the 2nd Amendment to be nullified so easily puts other aspects of the Bill of Rights at risk.
I think it will change in the next decade or so. It will depend on what the lower courts do and who is appointed to the Supreme Court. But, the issue will eventually return to the Supreme Court, and the evidence for the individual-rights interpretation is too strong to be ignored.
-
Re:Snort and GUI
If you do want a GUI to run on top of Snort, ACID is the best that I have seen. It needs Apache, PHP, and MySQL, I believe.
ACID Homepage -
Re:Snort and GUI
We use Snort on several boxes as sensors, reporting to a central database server. It's a sweet setup that's tedious, but not difficult to maintain. The database (ACID) is easy to use, and works as a PHB pacifier.
Obviously we didn't get the whole thing up and going in a day, and we still spend time updating/tweaking signatures; but it wasn't rocket science. -
No GUI for Snort? Acid!
The author doesn't mention ACID, a very good and useful interface to Snort (or at least I haven't seen it). Since he also complains about the lack of GUI (Puh-leese, an IDS is not for interns!), I suppose he hasn't heard of it. Quoting the website:
The Analysis Console for Intrusion Databases (ACID) is a PHP-based analysis engine to search and process a database of security events generated by various IDSes, firewalls, and network monitoring tools. The features currently include:
- Query-builder and search interface for finding alerts matching on alert meta information (e.g. signature, detection time) as well as the underlying network evidence (e.g. source/destination address, ports, payload, or flags).
- Packet viewer (decoder) will graphically display the layer-3 and layer-4 packet information of logged alerts
- Alert management by providing constructs to logically group alerts to create incidents (alert groups), deleting the handled alerts or false positives, exporting to email for collaboration, or archiving of alerts to transfer them between alert databases.
- Chart and statistics generation based on time, sensor, signature, protocol, IP address, TCP/UDP ports, or classification
- using Snort (www.snort.org)
- Snort alerts
- tcpdump binary logs
- Cisco PIX
- ipchains
- iptables
- ipfw
-
Re:I don't really agree here...
So just how does Coda support High Availability? While yes, those are its features and it does support server replication, disconnected operation, low bandwidth connections, etc, it is technically STILL in development and can thus have crashes and buggy behavior in many instances. I know.... I have worked with Coda, developed software (CodaVis) for it and am at Carnegie Mellon right now.
Now that said, Coda is GREAT! It supports a number of features that no other Open FS does and it works pretty well for the research purposes I need it for (look up Internet Suspend/Resume here). -
Re:I know this is terribly Politically Incorrect b
To learn to read. It's got text to speech.
To learn to read, you need speech recognition
People are forgetting Moore's law. We had the technology to pepper the third-world with these years ago, and in an indirect way, we did. Now we must follow through.
-
Re:slashdotted!
2.2 Defining SLOC
The ``physical source lines of code'' (physical SLOC) measure was used as the primary measure of SLOC in this paper. Less formally, a physical SLOC in this paper is a line with something other than comments and whitespace (tabs and spaces). More specifically, physical SLOC is defined as follows: ``a physical source line of code is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.'' Comment delimiters (characters other than newlines starting and ending a comment) were considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) were not included.
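The definition above can be illustrated with a minimal sketch. This is not how ``sloccount'' is implemented; it assumes a language whose only comment syntax is a line comment introduced by a single marker (here `#`), and it does not understand strings or block comments, so a marker inside a string literal is wrongly treated as the start of a comment. The function name and sample text are made up for illustration.

```python
def count_physical_sloc(text, comment_marker="#"):
    """Count lines with at least one non-whitespace, non-comment character.

    Simplifying assumption: everything from `comment_marker` to the end of
    the line is a comment. Strings and block comments are not handled.
    """
    count = 0
    for line in text.splitlines():
        code = line.split(comment_marker, 1)[0]  # drop the comment part
        if code.strip():                         # anything left besides whitespace?
            count += 1
    return count


sample = "# header comment\n\nx = 1  # trailing comment\n    \ny = x + 1\n"
print(count_physical_sloc(sample))  # prints 2
```

A real counter needs one such rule set per language, which is why every file first has to be categorized by language type.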
Note that the ``logical'' SLOC is not the primary measure used here; one example of a logical SLOC measure would be the ``count of all terminating semicolons in a C file.'' The ``physical'' SLOC was chosen instead of the ``logical'' SLOC because there were so many different languages that needed to be measured. I had trouble getting freely-available tools to work on this scale, and the non-free tools were too expensive for my budget (nor is it certain that they would have fared any better). Since I had to develop my own tools, I chose a measure that is much easier to implement. Park [1992] actually recommends the use of the physical SLOC measure (as a minimum), for this and other reasons. There are disadvantages to the ``physical'' SLOC measure. In particular, physical SLOC measures are sensitive to how the code is formatted. However, logical SLOC measures have problems too. First, as noted, implementing tools to measure logical SLOC is more difficult, requiring more sophisticated analysis of the code. Also, there are many different possible logical SLOC measures, requiring even more careful definition. Finally, a logical SLOC measure must be redefined for every language being measured, making inter-language comparisons more difficult. For more information on measuring software size, including the issues and decisions that must be made, see Kalb [1990], Kalb [1996], and Park [1992].
Note that this required that every file be categorized by language type (so that the correct syntax for comments, strings, and so on could be applied). Also, automatically generated files had to be detected and ignored. Thankfully, my tool ``sloccount'' does this automatically.
2.3 Estimation Models
This decision to use physical SLOC also implied that for an effort estimator I needed to use the original COCOMO cost and effort estimation model (see Boehm [1981]), rather than the newer ``COCOMO II'' model. This is simply because COCOMO II requires logical SLOC as an input instead of physical SLOC.
Basic COCOMO is designed to estimate the time from product design (after plans and requirements have been developed) through detailed design, code, unit test, and integration testing. Note that plans and requirement development are not included. COCOMO is designed to include management overhead and the creation of documentation (e.g., user manuals) as well as the code itself. Again, see Boehm [1981] for a more detailed description of the model's assumptions. Of particular note, basic COCOMO does not include the time to develop translations to other human languages (of documentation, data, and program messages) nor fonts.
There is reason to believe that these models, while imperfect, are still valid for estimating effort in open source / free software projects. Although many open source programs don't need management of human resources, they still require technical management, infrastructure maintenance, and so on. Design documentation is captured less formally in open source projects, but it's often captured by necessity because open source projects tend to have many developers separated geographically. Clearly, the systems must still be programmed. Testing is still done, although as with many of today's proprietary programs, a good deal of testing is done through alpha and beta releases. In addition, quality is enhanced in many open source projects through peer review of submitted code. The estimates may be lower than the actual values because they don't include estimates of human language translations and fonts.
Each software source code package, once uncompressed, produced zero or more ``build directories'' of source code. Some packages do not actually contain source code (e.g., they only contain configuration information), and some packages are collections of multiple separate pieces (each in different build directories), but in most cases each package uncompresses into a single build directory containing the source code for that package. Each build directory had its effort estimation computed separately; the efforts of each were then totalled. This approach assumes that each build directory was developed essentially separately from the others, which in nearly all cases is quite accurate. This approach slightly underestimates the actual effort in the rare cases where the development of the code in separate build directories are actually highly interrelated; this effect is not expected to invalidate the overall results.
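The per-directory approach can be sketched as follows. The excerpt does not say which Basic COCOMO mode or coefficients were used, so this sketch assumes the organic-mode values from Boehm [1981] (effort in person-months = 2.4 * KSLOC^1.05). The package names and line counts are hypothetical, and the cost helper simply multiplies person-years by a salary and a wrap rate supplied by the caller (the example call uses the $56,286 and 2.4 figures quoted in the next paragraph of the excerpt).

```python
def basic_cocomo_effort(sloc, a=2.4, b=1.05):
    """Person-months for `sloc` lines; organic-mode coefficients are assumed."""
    return a * (sloc / 1000.0) ** b


def total_effort(build_dirs):
    """Estimate each build directory separately, then total the efforts."""
    return sum(basic_cocomo_effort(sloc) for sloc in build_dirs.values())


def cost_dollars(person_months, annual_salary, wrap_rate):
    """Convert effort to cost: person-years * salary * overhead multiplier."""
    return person_months / 12.0 * annual_salary * wrap_rate


# Hypothetical build directories and physical SLOC counts.
build_dirs = {"pkg-a": 10_000, "pkg-b": 40_000}

separate = total_effort(build_dirs)
combined = basic_cocomo_effort(sum(build_dirs.values()))
print(f"separate: {separate:.1f} pm, combined: {combined:.1f} pm")
print(f"cost: ${cost_dollars(separate, 56_286, 2.4):,.0f}")
```

Because the exponent is greater than 1, estimating directories separately always gives a smaller total than estimating the combined code as one project, which matches the underestimate the paragraph describes for interrelated directories.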
For programmer salary averages, I used a salary survey from the September 4, 2000 issue of ComputerWorld; their survey claimed that this annual programmer salary averaged $56,286 in the United States. I was unable to find a publicly-backed average value for overhead, also called the ``wrap rate.'' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate. Some Defense Systems Management College (DSMC) training material gives examples of 2.3 (125.95%+100%) not including general and administrative (G&A) overhead, and 2.81 when including G&A (125% engineering overhead, plus 25% on top of that amount for G&A) [DSMC]. This at least suggests that 2.4 is a plausible estimate. Clearly, these values vary widely by company and region; the information provided in this paper is enough to use different numbers if desired. These are the same values as used in my last report.
2.4 Determining Software Licenses
A software license determines how that software can be used and reused, and open source software licensing has been a subject of great debate. The Software Release Practice HOWTO [Raymond 2001] discusses briefly why license choices are so important to open source / free software projects:
The license you choose defines the social contract you wish to set up among your co-developers and users
...Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.
In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.
In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.
Well-known open source licenses include the GNU General Public License (GPL), the GNU Library/Lesser General Public License (LGPL), the MIT (X) license, the BSD license, and the Artistic license. The GPL and LGPL are termed ``copylefting'' licenses, that is, the license is designed to prevent the code from becoming proprietary. See Perens [1999] for more information comparing these licenses. Obvious questions include ``what license(s) are developers choosing when they release their software'' and ``how much code has been released under the various licenses?''
An approximation of the amount of software using various licenses can be found for this particular distribution. Red Hat Linux uses the Red Hat Package Manager (RPM), and RPM supports capturing license data for each package (these are the ``Copyright'' and ``License'' fields in the specification file). I used this information to determine how much code was covered by each license. Since this field is simply a string of text, there were some variances in the data that I had to clean up, for example, some entries said ``GNU'' while most said ``GPL''. In some cases Red Hat did not include licensing information with a package. In that case, I wrote a program to attempt to determine the license by looking for certain conventional filenames and contents.
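The cleanup step described above might look something like the sketch below. The alias table, function names, and sample data are all hypothetical; the excerpt does not show the author's actual program or the full set of variants encountered.

```python
from collections import defaultdict

# Hypothetical alias table: maps lowercased variants to a canonical label.
ALIASES = {"gnu": "GPL", "gpl": "GPL", "lgpl": "LGPL", "bsd": "BSD", "mit": "MIT"}


def normalize_license(raw):
    """Map a free-text license field to a canonical label when one is known."""
    key = raw.strip().lower()
    return ALIASES.get(key, raw.strip())  # unknown strings pass through as-is


def sloc_by_license(packages):
    """Sum SLOC per normalized license from (license_field, sloc) pairs."""
    totals = defaultdict(int)
    for raw_license, sloc in packages:
        totals[normalize_license(raw_license)] += sloc
    return dict(totals)


# Made-up sample data standing in for per-package RPM fields.
packages = [("GPL", 1200), ("GNU", 300), (" gpl ", 50), ("distributable", 75)]
print(sloc_by_license(packages))  # {'GPL': 1550, 'distributable': 75}
```

Labels the table does not recognize are kept verbatim rather than guessed at, in the same spirit as the paper keeping Red Hat's ``Distributable'' label.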
This is an imperfect approach. Some packages contain different pieces of code with different licenses applying to different pieces. Some packages are ``dual licensed'', that is, they are released under more than one license. Sometimes these other licenses are noted, while at other times they aren't. There are actually two BSD licenses (the ``old'' and ``new'' licenses), but the specification files don't distinguish between them. Also, if the license wasn't one of a small set of common licenses, Red Hat tended to assign nondescriptive phrases such as ``distributable''. My automated techniques were limited too, in particular, while some licenses (e.g., the GPL and LGPL) are easy to recognize automatically, BSD-like and MIT-like licenses vary the license text and so are more difficult to recognize automatically (and some changes to the license would render them non-open source, non-free software). Thus, when Red Hat did not identify a package's license, a program dual licensed under both the BSD and GPL license might only be labelled as having the GPL using these techniques. Nevertheless, this approach is sufficient to give some insight into the amount of software using various licenses. Future research could examine each license in turn and categorize them; such research might require several lawyers to determine when two licenses in certain circumstances are ``equal.''
One program worth mentioning in this context is Python, which has had several different licenses. Version 1.6 and later (through 2.1) had more complex licenses that the Free Software Foundation (FSF) believes were incompatible with the GPL. Recently this was resolved by another change to the Python license to make Python fully compatible with the GPL. Red Hat Linux 7.1 includes an older version of Python (1.5.2), presumably because of these licensing issues. It can't be because Red Hat is unaware of later versions of Python; Red Hat uses Python in its installation program (which it developed and maintains). Hopefully, the recent resolution of license incompatibilities with the GPL license will enable Red Hat to include the latest versions of Python in the future. In any case, there are several different Python-specific licenses, all of which can legitimately be called the ``Python'' license. Red Hat has labelled Python itself as having a ``Distributable'' license, and package Distutils-1.0.1 is labelled with the ``Python'' license; these labels are kept in this paper.
-
Re:slashdotted!2.2 Defining SLOC
The ``physical source lines of code'' (physical SLOC) measure was used as the primary measure of SLOC in this paper. Less formally, a physical SLOC in this paper is a line with something other than comments and whitespace (tabs and spaces). More specifically, physical SLOC is defined as follows: ``a physical source line of code is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.'' Comment delimiters (characters other than newlines starting and ending a comment) were considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) were not included.
Note that the ``logical'' SLOC is not the primary measure used here; one example of a logical SLOC measure would be the ``count of all terminating semicolons in a C file.'' The ``physical'' SLOC was chosen instead of the ``logical'' SLOC because there were so many different languages that needed to be measured. I had trouble getting freely-available tools to work on this scale, and the non-free tools were too expensive for my budget (nor is it certain that they would have fared any better). Since I had to develop my own tools, I chose a measure that is much easier to implement. Park [1992] actually recommends the use of the physical SLOC measure (as a minimum), for this and other reasons. There are disadvantages to the ``physical'' SLOC measure. In particular, physical SLOC measures are sensitive to how the code is formatted. However, logical SLOC measures have problems too. First, as noted, implementing tools to measure logical SLOC is more difficult, requiring more sophisticated analysis of the code. Also, there are many different possible logical SLOC measures, requiring even more careful definition. Finally, a logical SLOC measure must be redefined for every language being measured, making inter-language comparisons more difficult. For more information on measuring software size, including the issues and decisions that must be made, see Kalb [1990], Kalb [1996], and Park [1992].
Note that this required that every file be categorized by language type (so that the correct syntax for comments, strings, and so on could be applied). Also, automatically generated files had to be detected and ignored. Thankfully, my tool ``sloccount'' does this automatically. 2.3 Estimation Models
This decision to use physical SLOC also implied that for an effort estimator I needed to use the original COCOMO cost and effort estimation model (see Boehm [1981]), rather than the newer ``COCOMO II'' model. This is simply because COCOMO II requires logical SLOC as an input instead of physical SLOC.
Basic COCOMO is designed to estimate the time from product design (after plans and requirements have been developed) through detailed design, code, unit test, and integration testing. Note that plans and requirement development are not included. COCOMO is designed to include management overhead and the creation of documentation (e.g., user manuals) as well as the code itself. Again, see Boehm [1981] for a more detailed description of the model's assumptions. Of particular note, basic COCOMO does not include the time to develop translations to other human languages (of documentation, data, and program messages) nor fonts.
There is reason to believe that these models, while imperfect, are still valid for estimating effort in open source / free software projects. Although many open source programs don't need management of human resources, they still require technical management, infrastructure maintenance, and so on. Design documentation is captured less formally in open source projects, but it's often captured by necessity because open source projects tend to have many developers separated geographically. Clearly, the systems must still be programmed. Testing is still done, although as with many of today's proprietary programs, a good deal of testing is done through alpha and beta releases. In addition, quality is enhanced in many open source projects through peer review of submitted code. The estimates may be lower than the actual values because they don't include estimates of human language translations and fonts.
Each software source code package, once uncompressed, produced zero or more ``build directories'' of source code. Some packages do not actually contain source code (e.g., they only contain configuration information), and some packages are collections of multiple separate pieces (each in different build directories), but in most cases each package uncompresses into a single build directory containing the source code for that package. Each build directory had its effort estimation computed separately; the efforts of each were then totalled. This approach assumes that each build directory was developed essentially separately from the others, which in nearly all cases is quite accurate. This approach slightly underestimates the actual effort in the rare cases where the development of the code in separate build directories is actually highly interrelated; this effect is not expected to invalidate the overall results.
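As a rough illustration of the per-directory estimate described above, here is a minimal Python sketch. It assumes the organic-mode coefficients of Basic COCOMO (effort = 2.4 * KSLOC^1.05 person-months, from Boehm [1981]); the text here does not say which mode was used, so treat that as an assumption. The function names are made up for this example, and the salary and overhead defaults are simply the figures quoted in this section. Note that the 2.4 in the effort formula and the 2.4 overhead rate are unrelated numbers that happen to match.

```python
# Basic COCOMO, organic mode (assumed): effort = 2.4 * KSLOC**1.05 person-months.
ORGANIC_A = 2.4
ORGANIC_B = 1.05


def effort_person_months(physical_sloc):
    """Basic COCOMO effort for one build directory, in person-months."""
    if physical_sloc <= 0:
        return 0.0  # a directory with no source code contributes no effort
    return ORGANIC_A * (physical_sloc / 1000.0) ** ORGANIC_B


def total_cost(build_dir_sloc, annual_salary=56286.0, overhead=2.4):
    """Estimate each build directory separately, total the efforts,
    then convert person-months to dollars via salary and overhead."""
    total_pm = sum(effort_person_months(s) for s in build_dir_sloc)
    cost = (total_pm / 12.0) * annual_salary * overhead
    return total_pm, cost


if __name__ == "__main__":
    # Hypothetical SLOC counts for three build directories.
    pm, dollars = total_cost([12000, 3500, 0])
    print(f"{pm:.1f} person-months, about ${dollars:,.0f}")
```

Because the exponent is greater than 1, estimating two directories separately and adding the results gives less effort than estimating their combined SLOC at once, which is why treating interrelated directories as independent underestimates slightly.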
For programmer salary averages, I used a salary survey from the September 4, 2000 issue of ComputerWorld; their survey claimed that this annual programmer salary averaged $56,286 in the United States. I was unable to find a publicly-backed average value for overhead, also called the ``wrap rate.'' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate. Some Defense Systems Management College (DSMC) training material gives examples of 2.3 (125.95%+100%) not including general and administrative (G&A) overhead, and 2.81 when including G&A (125% engineering overhead, plus 25% on top of that amount for G&A) [DSMC]. This at least suggests that 2.4 is a plausible estimate. Clearly, these values vary widely by company and region; the information provided in this paper is enough to use different numbers if desired. These are the same values as used in my last report.
2.4 Determining Software Licenses
A software license determines how that software can be used and reused, and open source software licensing has been a subject of great debate. The Software Release Practice HOWTO [Raymond 2001] discusses briefly why license choices are so important to open source / free software projects:
The license you choose defines the social contract you wish to set up among your co-developers and users
...Who counts as an author can be very complicated, especially for software that has been worked on by many hands. This is why licenses are important. By setting out the terms under which material can be used, they grant rights to the users that protect them from arbitrary actions by the copyright holders.
In proprietary software, the license terms are designed to protect the copyright. They're a way of granting a few rights to users while reserving as much legal territory as possible for the owner (the copyright holder). The copyright holder is very important, and the license logic so restrictive that the exact technicalities of the license terms are usually unimportant.
In open-source software, the situation is usually the exact opposite; the copyright exists to protect the license. The only rights the copyright holder always keeps are to enforce the license. Otherwise, only a few rights are reserved and most choices pass to the user. In particular, the copyright holder cannot change the terms on a copy you already have. Therefore, in open-source software the copyright holder is almost irrelevant -- but the license terms are very important.
Well-known open source licenses include the GNU General Public License (GPL), the GNU Library/Lesser General Public License (LGPL), the MIT (X) license, the BSD license, and the Artistic license. The GPL and LGPL are termed ``copylefting'' licenses, that is, the license is designed to prevent the code from becoming proprietary. See Perens [1999] for more information comparing these licenses. Obvious questions include ``what license(s) are developers choosing when they release their software'' and ``how much code has been released under the various licenses?''
An approximation of the amount of software using various licenses can be found for this particular distribution. Red Hat Linux uses the Red Hat Package Manager (RPM), and RPM supports capturing license data for each package (these are the ``Copyright'' and ``License'' fields in the specification file). I used this information to determine how much code was covered by each license. Since this field is simply a string of text, there were some variances in the data that I had to clean up, for example, some entries said ``GNU'' while most said ``GPL''. In some cases Red Hat did not include licensing information with a package. In that case, I wrote a program to attempt to determine the license by looking for certain conventional filenames and contents.
This is an imperfect approach. Some packages contain different pieces of code with different licenses applying to different pieces. Some packages are ``dual licensed'', that is, they are released under more than one license. Sometimes these other licenses are noted, while at other times they aren't. There are actually two BSD licenses (the ``old'' and ``new'' licenses), but the specification files don't distinguish between them. Also, if the license wasn't one of a small set of common licenses, Red Hat tended to assign nondescriptive phrases such as ``distributable''. My automated techniques were limited too; in particular, while some licenses (e.g., the GPL and LGPL) are easy to recognize automatically, BSD-like and MIT-like licenses vary the license text and so are more difficult to recognize automatically (and some changes to the license would render them non-open source, non-free software). Thus, when Red Hat did not identify a package's license, a program dual licensed under both the BSD and GPL license might only be labelled as having the GPL using these techniques. Nevertheless, this approach is sufficient to give some insight into the amount of software using various licenses. Future research could examine each license in turn and categorize them; such research might require several lawyers to determine when two licenses in certain circumstances are ``equal.''
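The two steps described above (cleaning up the free-text license field, then falling back to conventional filenames and contents) might look roughly like the Python sketch below. This is not the author's program: the alias table, the filename list, and the matching phrases are all assumptions chosen for illustration, and only the GPL/LGPL case is handled because those are the ones the text calls easy to recognize.

```python
import os

# Hypothetical clean-up table for the free-text RPM license field.
ALIASES = {"GNU": "GPL", "GPL": "GPL", "LGPL": "LGPL",
           "BSD": "BSD", "MIT": "MIT", "ARTISTIC": "Artistic"}

# Phrases that identify a license text, with the label each one implies.
PHRASES = [
    ("GNU LESSER GENERAL PUBLIC LICENSE", "LGPL"),
    ("GNU LIBRARY GENERAL PUBLIC LICENSE", "LGPL"),
    ("GNU GENERAL PUBLIC LICENSE", "GPL"),
]

# Filenames assumed to conventionally hold a license.
CONVENTIONAL_NAMES = ("COPYING", "COPYING.LIB", "LICENSE", "LICENCE")


def normalize_field(field):
    """Map variant spellings in the license field onto one label."""
    key = field.strip().upper()
    return ALIASES.get(key, field.strip())


def guess_from_text(text):
    """Guess GPL vs. LGPL from whichever identifying phrase appears first.

    The earliest match wins because a license text may mention the other
    license later on (the GPL text refers to the library license near its end).
    """
    upper = " ".join(text.upper().split())  # collapse whitespace and newlines
    hits = [(upper.find(p), label) for p, label in PHRASES if p in upper]
    return min(hits)[1] if hits else None


def guess_license(build_dir):
    """Look for conventional license files in a build directory."""
    for name in CONVENTIONAL_NAMES:
        path = os.path.join(build_dir, name)
        if os.path.isfile(path):
            with open(path, errors="replace") as f:
                guess = guess_from_text(f.read())
            if guess:
                return guess
    return None  # unknown: BSD-like and MIT-like texts vary too much
```

A dual-licensed package would still come back with a single label here, which is exactly the limitation the paragraph above points out.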
One program worth mentioning in this context is Python, which has had several different licenses. Version 1.6 and later (through 2.1) had more complex licenses that the Free Software Foundation (FSF) believes were incompatible with the GPL. Recently this was resolved by another change to the Python license to make Python fully compatible with the GPL. Red Hat Linux 7.1 includes an older version of Python (1.5.2), presumably because of these licensing issues. It can't be because Red Hat is unaware of later versions of Python; Red Hat uses Python in its installation program (which it developed and maintains). Hopefully, the recent resolution of license incompatibilities with the GPL license will enable Red Hat to include the latest versions of Python in the future. In any case, there are several different Python-specific licenses, all of which can legitimately be called the ``Python'' license. Red Hat has labelled Python itself as having a ``Distributable'' license, and package Distutils-1.0.1 is labelled with the ``Python'' license; these labels are kept in this paper.
-
Interesting with regard to my DMCA threats...
You guys might remember the DMCA Threats I received over my program embed (for TrueType fonts). I think this case has some interesting consequences for my fight, because I published my program in 1997 and haven't modified the page since. 1997 was before the DMCA became law. If in fact that publishing was found to be the only act of "trafficking", then they would not be able to sue me because of the ban on ex post facto laws in the Constitution.
Of course, that defense is much weaker than the dozens of other reasons why their threats are totally stupid.
;) -
State of the art in robot localization and mapping
It sounds so easy, but localization in mobile robots is actually a very difficult problem. GPS is great for some applications (for example, helicopter robots), but (of course) it doesn't work indoors and it doesn't work well at all in built-up areas (due to lack of line-of-sight and multipath problems - just like your cellphone).
One of the main potential military applications of robots is working in built-up areas, because these are so hazardous for soldiers. DARPA sponsors a LOT of work in this area, for example the MARS program.
The current most successful approaches are all broadly statistical, providing a means to "see through" the noise, drift and variations in robot sensor readings. Sebastian Thrun's group at CMU has some of the best work in this area (for an overview, see this review paper). Andrew Howard at USC has some cool movies here showing his technique based on a physical spring/damper metaphor. Great stuff.
This problem is here to stay. If you have ideas, join a grad school program and help out!
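To give a flavour of what "broadly statistical" means here, below is a toy histogram (discrete Bayes) filter in Python. It is a textbook-style sketch, not the method of either group mentioned above: the five-cell circular corridor, the door/wall sensor, and all the probabilities are made up for illustration.

```python
def normalize(belief):
    """Rescale a list of weights so it sums to 1."""
    total = sum(belief)
    return [b / total for b in belief]


def sense(belief, world, reading, p_hit=0.8, p_miss=0.2):
    """Bayes update: weight each cell by how well it explains the reading."""
    weighted = [b * (p_hit if cell == reading else p_miss)
                for b, cell in zip(belief, world)]
    return normalize(weighted)


def move(belief, step, p_exact=0.8, p_slip=0.1):
    """Shift the belief by `step` cells on a circular corridor,
    allowing for undershooting or overshooting by one cell."""
    n = len(belief)
    moved = [0.0] * n
    for i, b in enumerate(belief):
        moved[(i + step) % n] += p_exact * b
        moved[(i + step - 1) % n] += p_slip * b
        moved[(i + step + 1) % n] += p_slip * b
    return moved


# Hypothetical map: the robot can only tell doors from walls.
world = ["door", "wall", "door", "wall", "wall"]
belief = [1.0 / len(world)] * len(world)  # start with no idea where we are

belief = sense(belief, world, "door")  # noisy reading says "door"
belief = move(belief, 1)               # noisy motion, one cell forward
belief = sense(belief, world, "wall")  # noisy reading says "wall"
```

The robot never knows its position outright; it keeps a probability for every cell, and each noisy reading or noisy move reshapes that distribution rather than overwriting a single guess.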
-
Re:GA for optimization, not solution
Just to follow up on my previous post, here are some resources:
- Memetic Algorithms' Home Page
- A Simple Heuristically Guided Search for the Timetable Problem
- (extract from the comp.ai.genetic FAQ)
- An honour's thesis on the topic
- Taiwanese site (in English) with links to papers, etc
- paper and paper available on citeseer
No, this is not a problem for the faint of heart.
-
scores
Here you can find the scores of last year's competitions.
As you can see, UNSW totally flattened everyone. -
Re:a new idea?
It's not a new idea. See Greg Aist's MIT Media Lab report on the subject. Dr. Aist is the one who helped prove that computer-assisted oral reading works about as well as one-on-one human reading tutoring.
The real question is, who knows how to build the smallest speech recognition hardware and software system effective in such situations, e.g., which requires the least amount of CPU, cache, and battery support? Speech recognition is not easy, but a 200 MHz system with the kind of cache common in Pentium systems is overkill. StrongArm and other RISCs without FPUs aren't that great for the task, although fixed-point versions of the DSP routines involved are feasible.
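For anyone wondering what a fixed-point version of a DSP routine looks like, here is a small Python sketch of Q15 arithmetic (16-bit signed values with 15 fractional bits), which is one common choice on CPUs without an FPU. The Q15 format and the helper names are assumptions for illustration; a real speech front end would be written in C or assembly, but the integer-only arithmetic is the same idea.

```python
Q = 15
ONE = 1 << Q  # 32768 represents 1.0 in Q15


def saturate(v):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(-ONE, min(ONE - 1, v))


def to_q15(x):
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return saturate(int(round(x * ONE)))


def from_q15(v):
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / ONE


def q15_mul(a, b):
    """Multiply two Q15 values: the raw product is Q30, so round
    and shift right by 15 bits to get back to Q15."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


def q15_dot(xs, ys):
    """Dot product using a wide accumulator, the way an integer
    multiply-accumulate loop would, with one final rounding shift."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += a * b  # each partial product is Q30
    return saturate((acc + (1 << (Q - 1))) >> Q)
```

Everything in the inner loop is an integer multiply and add, so a filter bank or an FFT butterfly built this way needs no floating-point hardware at all; the price is careful scaling to avoid overflow.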
-
Re:Notice how the truly stupid throw that word aro
Read this then go look in the mirror and realize how much of a fool you just made of yourself.
Just because you post AC doesn't mean you're not a moron. -
Robobowl III
Carnegie Mellon University's hypothetical Robotics Channel covers yet another exciting robot sporting event.
-
Links from my bookmark list...
I've browsed other replies, and I think they've missed the following:
- Programming Language Research - Links maintained by a CMU student.
- Compilers.Net
- Lambda the Ultimate - I found this from Meerkat. While somewhat more esoteric than straight up parsing talk, I'm seeing it spawn a lot of programming language discussion across blogs.
*Smirk*
-
Downside to Code Reviews
Many others have pointed out that studies consistently show that formal reviews (especially of specifications and designs) are the most cost effective ways of removing defects. Others have provided references to the classic books on the subject. Anyone considering doing formal reviews should read them. I personally like Tom Gilb's books.
There is a downside to consider, however, which is little mentioned, even in the formal review literature. Formal reviews require a particular type of company culture, and not all companies have or want that kind of culture. Trying to introduce formal reviews in a company that has an incompatible culture will be some mixture of painful, counterproductive and political suicide.
The idea that a company would, in any way, be opposed to using the most cost effective way of removing defects seems bizarre. The truth is, not all companies care about product quality. Sure, everyone will say they care, but words are cheap. To find out what a company really cares about, see what decisions they make under pressure. See what they sacrifice, and what they keep.
- Do they cut testing to release the product on time? That means schedule is more important than quality.
- Do they refuse to return an error-riddled specification to the customer so the errors are corrected before work even begins? That means saying yes (a 'can do' attitude) is more important than quality.
- When engineers start pointing out defects, are the whistleblowers labelled as 'troublemakers'? That means internal politics or a harmonious working environment are more important than quality.
The difficulty is that before you can introduce formal reviews into an organisation, that organisation must already be highly committed to quality. Quite simply, many organisations have to introduce other, fundamental, improvements before they can use the advanced technique of formal reviews. The Capability Maturity Model (CMM) produced by the Software Engineering Institute (SEI) is a useful way of prioritising these improvements. I recommend it; I've used it in a project, and ISO 9001/9000-3 in another, and I conclude that CMM is the better of the two. They have a website.
-
The Point of Testing is NOT to find bugs ...
The Point of Testing is NOT to find bugs in your code, but rather in your software engineering process! if testing discovers a problem with your code that means your software engineering process is flawed and needs to be changed so as to ensure that this kind of bug (1) will never be able to occur again and (2) cannot hide in all your old code.
it sounds like your SW engineering process is totally crap and urgently needs to be changed.
striving for error free testing sounds complicated at first and requires discipline but it really pays off, economically and otherwise!
www.sei.cmu.edu/
i would refuse to work for shops that don't strive for and regularly achieve 100% error free software at the point before testing. -
Re:Machining Parts
That reminds me of this number:
485650789657397829309841894694286137707442087351 35 79240196520736686985134010472374469687974399261175 10973777701027447528049058831384037549709987909653 95522701171215702597466699324022683459661960603485 17424977358468518855674570257125474999648219418465 57100841190862597169479707991520048667099759235960 61320725973797993618860631691447358830024533697278 18139147979555133999493948828998469178361001825978 90103160196183503434489568705384520853804584241565 48248893338047475871128339598968522325446084089711 19771276941207958624405471613210050064598201769617 71809478113622002723448272249323259547234688002927 77649790614812984042834572014634896854716908235473 78356619721862249694316227166639390554302415647329 24855248991225739466548627140482117138124388217717 60298412552446474450558346281448833563190272531959 04392838737640739168912579240550156208897871633759 99107887084908159097548019285768451988596305323823 49055809203299960323447114077601984716353116171307 85760848622363702835701049612595681846785965333100 77017991614674472549272833486916000647585917462781 21269007351830924153010630289329566584366200080047 67789679843820907976198594936463093805863367214696 95975027968771205724996666980561453382074120315933 77030994915274691835659376210222006812679827344576 09380203044791227749809179559383871210005887666892 58448700470772552497060444652127130404321182610103 59118647666296385849508744849737347686142088052944 3
An illegal prime number :) -
Re:Is this a trend?
Everything2, which has been around for a few years now, has had a ton of gatherings of people in various areas. Unfortunately, during the one gathering I could have attended (New York, January 5, 2002) I was out of town. Bummer. I was able to see pictures of the aftermath, including all the fellow users (noders) I could have schmoozed with.
Everything2 has spawned some particularly close ties: there has been one marriage between two users, and there is a semi-official "compound" of users who now live together in New York. There's even talk about taking over a small town in Kansas on behalf of Everything2.
Me, I haven't been quite so lucky. I have never had a person come up to me and introduce him/herself as being from Everything2, even though I've been about 100 feet from one user in a computer cluster at Carnegie Mellon University. -
Atomic Energy Merit Badge requirement?
Hm...I have this one. Here is a page with info and the requirements (and, interestingly, a link to this same article): Atomic Energy.
At first I thought this wouldn't actually fulfill any of the requirements, but another look (it's been a while) shows that you CAN do a model of a reactor and label all of the parts. The article about him didn't mention anything about labels, and some MB counselors can be real sticklers about the wording of the requirements. Betcha he didn't get any credit for it, or had to go back and label his parts! On the other hand...it didn't say "non-functional model using soup cans, thimbles, and elbow macaroni", either. Guess it would have been alright, providing he had his parts and their functions clearly labeled. -
Social Networks progress in the open source world
Social network analysis has been pretty slow to come to the open source world. One of the few pieces of software I know that uses it is the R project, which now has some social network analysis tools.
For visualization, though, I'm currently unaware of any open-source tools. Krackplot has a free web interface, and there is a simple Java program that uses spring-based algorithms for node positioning, but I know of nothing open-source that uses Krackplot's simulated annealing algorithm.
In general, social network analysis can be very useful, but its results are often subject to misinterpretation. For example, a social isolate in a business might be isolated for a good reason (they are doing research, for example), so you wouldn't want to tell them to integrate themselves more. But in general, it's a great tool to get another look at data you would not normally find out about. -
This has been around for years... Even for wavelan
Carnegie Mellon University has had a wireless network for years now. A few years ago all of the academic buildings had full coverage, and in the past year this has been extended to dorms and most outdoor areas.
The computer science department at CMU as well as the Human Computer Interaction Institute (HCII) and the department of Electrical and Computer Engineering have been putting out papers on actual implementations of campus location systems. Most deal with its use for contextual/location aware computing (one of the more recent papers), although some have dealt with the privacy implications (I should know, I was an author of one published at IEEE Wireless 2001). Project Aura deals with quite a bit of research around what can be done positively with this technology as well.
As one last thing, I wrote software to poll wavepoints and figure out a location over 1.5 years ago... It was less than 50 lines of C, so I have trouble being impressed by this.
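For the curious, the simplest form of this kind of location lookup fits in a few lines. The sketch below is in Python rather than C and is only a guess at the general approach (take the strongest access point you can hear and look up where it is installed); the access point names, locations, and signal values are entirely hypothetical, and actually polling the hardware is left out.

```python
# Hypothetical table: access point identifier -> where it is installed.
AP_LOCATIONS = {
    "ap-01": "Building A, 3rd floor",
    "ap-02": "Building B, lobby",
}


def locate(readings, ap_locations=AP_LOCATIONS):
    """Return the location of the known access point with the strongest signal.

    `readings` maps access point id -> signal strength in dBm
    (values closer to 0 are stronger). Unknown access points are ignored.
    """
    known = {ap: dbm for ap, dbm in readings.items() if ap in ap_locations}
    if not known:
        return None  # nothing recognisable in range
    best = max(known, key=known.get)
    return ap_locations[best]


if __name__ == "__main__":
    print(locate({"ap-01": -70, "ap-02": -45}))
```

This only gives "nearest access point" resolution; anything finer needs signal strengths from several access points and some calibration data.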
-
Unanswered questions
This is a great idea, but the NYT article leaves a number of questions unanswered.
First: It says they used 'software' to extend the range of the system. I don't see how that's possible unless there's some software tweak that increases the transmitter's output power beyond legal limits. Even then, I question whether the transmitter could handle such overdrive for extended periods as a device designed under FCC Part 15.
Now, with that said: It -is- possible to enhance existing WiFi hardware with a better antenna, but the transceiver in question would have to have a connector for an external antenna designed right in. You can't just attach something with a clip-lead and hope it'll work; not at 2.4 GHz!
Next up: I've checked Etherlinx's web site as well. It is, if possible, even less detail-rich than the article. I plan to send an E-mail query to try and dig some details out of them.
Another point: Something that the WiFi peddlers are all neglecting to mention is that 2.4 GHz is (among other things) an amateur ('ham') radio band, and that ATV (Amateur Television) on that band is getting to be mighty popular, especially in the Bay Area. Slashdot has already run an article on the issue of low-power interference on 2.4 gigs... I can't help but wonder how well a big WiFi network would deal with the output signal from an ATV repeater when said signal could range anywhere from a couple of watts to the amateur max limit of a thousand watts.
And no, there is no regulation protecting Part 15 devices from interference. Quite the opposite. Read the label on any such device, and you will find that it is 'required to accept any interference, including that which may cause undesired operation.'
Just as one example, Carnegie Mellon University has, apparently, already taken this problem into account. Note this article from their Computing Services folk. They don't even want other 2.4 gig devices in operation on campus because of their own WiFi network.
Finally, the issue of security on WiFi has already been beat to death, but I'll mention it again anyway. I don't believe it's possible right now, outside of using some heavy-hitting 3rd party encryption hardware at each end of a link, to get security that's as good as that available on hardwire networks (One word: AirSnort). If anyone can prove me wrong on that point, please do so and I will cheerfully shut up about it! ;-)
The 'death' of cable or DSL? Not bloody likely. Not until it can offer the same security as hardwire, be interference-free in both transmission and reception, offer the same SPEED as you can get from hardwire, and can do so for a price that won't run us all into the poorhouse.