There's lots of other subtle biases. For example, in the "grep" example, the sh code simply called grep. If he wanted to be pure about the scripting he'd not have had any way of doing a RE in sh (since it doesn't have true RE handling builtin, it only has globs through 'case') and if he was going to use external calls then why didn't he just do 'exec grep "$@"' or otherwise afford the use of external command execution to the other languages too?
OK, here it is:
#!/bin/sh
# 2004-06-13T12:33:55+0000
# pth shgrep - a minimal shell grep implementation
# Copyright (C) 2004 Pan Tarhei Hosé, PhD.
# http://developers.slashdot.org/~Pan%20T.%20Hose/
# http://developers.slashdot.org/comments.pl?sid=110875&cid=9411049
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
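# Print usage and quit when no pattern is given.
[ -z "$1" ] && echo "Usage: $0 pattern < file" && exit 1
# ${a/$1/} is a bash extension, so this relies on /bin/sh being bash.
while :; do read -r a || exit; [ "${a/$1/}" = "$a" ] || echo "$a"; done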
Just a quick hack I wrote right now in less than a minute,
I am sure one could write it better. A quick test:
Two languages missing are: Io [...], REXX [...], ficl [...].
You have provided great examples. I would add another two: Unlambda, bf and maybe also Ook. Furthermore, let us not forget about Assembly. Seriously, I strongly believe that if kids today learned those languages and tried to understand how computers really work, we would have far fewer Flash/JavaScript/PHP/MySQL "elite" (or "leet," if you will) websites shamefully vulnerable to trivial cross-site scripting and SQL-injection exploits. The problem is that script kiddies today don't want to learn anything, be it REXX, Unlambda, IMCC, Perl 6 or even valid ANSI C for God's sake. We have to do something about it. I agree with you.
Perl can do a lot, but nothing is more painful than having to look at Perl source code. While a program can be written to be semi-readable, Perl programs, when compared to some others, like PHP or Python, typically make me want to stick very large needles in my eyes.
This supposed Perl "readability" myth, or the lack thereof, is one of the reasons we will have Perl 6. You are not up to date with Perl development, are you?
Added bonus:
switch the live and earth and you make it impossible for terrorists to climb on the tank
This is not funny. Personally I find British MaD scientists and their RPG electric armor +1 quite frightening.
"The new electric armour is made up of a highly-charged capacitor that is connected to two separate metal plates on the tank's exterior. When an RPG warhead fires its jet of molten copper, it penetrates both the outer plate and the insulation of the inner plate."
Just imagine a cluster of those capacitors... Scary.
You mean a nested webserver that only works as long as you keep your browser window open? Gee, that's technology!
Actually, I have seen this very idea in Perl--a CGI script or a mod_perl module using HTTP::Daemon or raw IO::Socket::INET sockets to start a temporary HTTP daemon listening on a random port for the purpose of serving graphics made on the fly embedded in a generated web page. Very good for statistics and charts, so you can serve everything--HTML and graphics--with one instance of the script/module without the need to include complex data in URIs of embedded images, which would run some other script to generate graphics, and without the problem of getting the right dimensions of images if they are not constant. This is actually quite a good idea.
For those who don't know Parrot, I forgot to mention that Parrot is a VM which will serve as a backend for such languages as Perl5, Perl6, Python, Ruby, Tcl, et al. IMCC is an intermediate language which will be a target of those high-level languages' compilers, and Parrot itself will serve as a portable interpreter of compiled byte-code, with JIT support on many platforms.
Performance is much more a matter of structure (exponential complexity) than language (poor linear complexity). As to level, "high level" languages limit you to their implementation of a few concepts. Depending on where the heavy lifting is, Perl could easily outperform optimized C.
Speaking about Perl and Assembly, it is important to mention that there are modern object-oriented assembly languages with asynchronous I/O, events, threads, multiple inheritance, garbage collection, built-in Unicode support, etc. See: Parrot Assembly, and IMCC:
"IMC stands for Intermediate Code; IMCC stands for Intermediate Code Compiler. You will also see the term PIR which is for Parrot Intermediate Representation and means the same as IMC, but for some each Parrot developer has his favorite term. PIR was the original term, where IMC seems to be the vernacular. It is an intermediate language that compiles either directly to Parrot Byte code, or translates to Parrot Assembly language. It is the preferred target language for compilers for the Parrot Virtual Machine. PIR is halfway between a High Level Language (HLL) and Parrot Assembly (PASM)."
How Is IMCC different than Parrot Assembly language?
"PASM is an assembly language, raw and low-level. PASM does exactly what you say, and each PASM instruction represents a single VM opcode. Assembly language can be tough to debug, simply due to the amount of instructions that a high-level compiler generates for a given construct. Assembly language typically has no concept of basic blocks, namespaces, variable tracking, etc. You must track your register usage and take care of saving/restoring values in cases where you run out of registers. This is called spilling.
"IMC is medium level and a bit more friendly to write or debug. IMCC also has a builtin register allocator and spiller. IMC has the concept of a "subroutine" unit, complete with local variables and high-level sub call syntax. IMCC also allows unlimited symbolic registers. It will take care of assigning the appropriate register to your variables and will usually find the most efficient mapping so as to use as few registers as possible for a given piece of code. If you use more registers than are currently available, IMCC will generate instructions to save/restore (spill) the registers for you. This is a significant piece of every compiler.
"While it is possible to write more efficient code by hand directly in PASM, it is rare. IMC is still very close to PASM as far as granularity. It is also common for IMCC to generate instructions that use less registers than handwritten PASM. This is good for cache performance."
For a good introduction to Parrot, read Parrot: Some Assembly Required by Simon Cozens. There is a great article (also on ONLamp.com) Building a Parrot Compiler by Dan Sugalski (I have no idea why it wasn't posted on the Slashdot front page).
(By the way, for those who read off-line, here is a printable version of the linked Why Learning Assembly Language Is Still a Good Idea article in one piece.)
The similarity between open source and the academic process with their 'you share, I share' principles is shown by the human genome project.
Very true.
"If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas." -- George Bernard Shaw (1856 - 1950)
This is probably even more insightful when applied to biotechnology than to software at large.
Speaking about biotechnology and free software, check out the bioperl project:
"Officially organized in 1995 and existing informally for several years prior, The Bioperl Project is an international association of developers of open source Perl tools for bioinformatics, genomics and life science research.
"Facilitated by the Open Bioinformatics Foundation we work closely with our friends and colleagues across many projects including biojava.org, biopython.org, DAS, bioruby.org, biocorba.org, EnsEMBL and EMBOSS.
"The Bioperl server provides an online resource for modules, scripts, and web links for developers of Perl-based software for life science research. We can also provide web, FTP and CVS space for individuals and organizations wishing to distribute or otherwise make freely available standalone scripts & code."
In order to save some memory on my system, I started rewriting the script in C, using GTK2 (a good excuse to learn this library). After implementing most of the functionality, I found that it took about 17MB. I wonder how much memory it would use if I ported it to Motif (or Athena widgets).
This can be quite misleading. If some other process is already using GTK on your system--like, say, the Gimp--then running your program does not really use much more memory, because most of the memory "used" by your program (mapped into its process) is in the shared object, which is already loaded anyway. (Provided your program is dynamically linked with GTK.) This is why adding up the memory used by all processes can (and usually does) give more memory than there really is on the system, including swap. For example, run this from the shell:
cat /proc/*/stat | cut "-d " -f23 | perl '-e$s+=$_ while<>;print int$s/10**6'; echo MB of memory is used by `ls /proc | wc -l` processes; free -tm | perl '-nleprint"but only $2MB of real $1MB total memory (RAM + swap) is really used."if/^T\S+\s+(\d+)\s+(\d+)/'
It was supposed to be all in one big line, but it's ugly, so let's turn it into a script:
#!/bin/sh
# Add up field 23 (vsize, in bytes) of every process and print the sum in MB.
cat /proc/*/stat | cut "-d " -f23 \
| perl -e '$s+=$_ while<>; print int $s/10**6'
echo MB of memory is used by `ls /proc | wc -l` processes
# Pull the total and used columns from the "Total:" line of free.
free -tm \
| perl -ne 'print "but only $2MB out of $1MB " if /^T\S+\s+(\d+)\s+(\d+)/'
echo total memory is really used.
On my system, a Debian desktop with two weeks of uptime, it prints:
1564MB of memory is used by 187 processes
but only 315MB out of 752MB total memory is really used.
This machine has only 256MB of RAM and is using only 67MB of swap--hardly the 1.5GB that is supposedly "used" by all of those processes.
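The double counting can be sketched with some simple arithmetic. This is only an illustration under made-up assumptions: the helper name naive_vs_real and the numbers in the usage note are hypothetical, and it pretends every process maps one shared library of the same size plus some private memory of its own.

```shell
#!/bin/sh
# Hypothetical helper (not from the post above).
# naive_vs_real N SHARED_MB PRIVATE_MB
#   N          - number of processes mapping the same shared library
#   SHARED_MB  - size of that shared library, in MB
#   PRIVATE_MB - private memory of each process, in MB
naive_vs_real() {
    n=$1 shared=$2 private=$3
    # Summing per-process sizes counts the shared library once per process...
    naive=$(( n * (shared + private) ))
    # ...while in reality it is loaded only once and shared by all of them.
    real=$(( shared + n * private ))
    echo "naive sum: ${naive}MB, actually resident: ${real}MB"
}
```

For example, `naive_vs_real 10 15 2` reports a naive sum of 170MB against 35MB actually resident, which is the same kind of gap as the one shown by the script above.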
Is the Linux desktop getting heavier and slower? Yes, indeed. Is the desktop in general getting heavier and slower? Again, it is.
I like using GNUstep/Window Maker on my *nix boxes. It looks great and it's a lean, mean window moving machine.
That's exactly what I am using right now: Debian GNU/Linux, Window Maker and Galeon. So no, my desktop is not getting heavier, nor is it getting slower. So I guess the correct answer to the very question whether the Linux desktop is getting heavier and slower should be: Whose Linux desktop?
It's time that Linus fold. Brown clearly has him by the teeth and isn't going to let go until Linux admits what has been so clearly proven to us. Linus must reveal his theft of code from Santa Claus [...]
"Oh, the white airbags don't work? Here, let me paint it blue."
Great, that's exactly what I need just before my death: a blue screen of death! On the other hand, I always suspected that my last words would be "Damn you, Bill Gates!"
That's true.
As a matter of fact I do happen
to have a /32 IP block
("/32 routable space," if you will,
or a "Class D network"
with subnet mask no less than 255.255.255.255)
and also another /8 one--namely 127.0.0.0/8--a real Class A network
with subnet mask of 255.0.0.0,
i.e. all IP addresses from 127.0.0.1
up to 127.255.255.255,
exactly 16777215 (sic!) routable IP addresses,
which I proudly administer, and which happily
"capture naughty traffic" on a daily basis
(like there was no tomorrow, in fact)
thanks to images.google.com.
That is why I find this article especially interesting
and insightful.
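For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic, assuming plain IPv4 prefix lengths. The helper name cidr_size is made up for this example; it just computes 2^(32 - prefix).

```shell
#!/bin/sh
# Hypothetical helper: number of IPv4 addresses in a block with the
# given prefix length (0-32), i.e. 2 to the power of (32 - prefix).
cidr_size() {
    echo $(( 1 << (32 - $1) ))
}
```

`cidr_size 32` gives 1, and `cidr_size 8` gives 16777216 for the whole of 127.0.0.0/8. Counting only from 127.0.0.1 up to 127.255.255.255 leaves out 127.0.0.0, which is where 16777215 comes from.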
Other than core system configuration and core libraries the whole system uses, I ideally think *any app should be totally confined to one directory level. IMO this is one thing Windows does right.
That's true and very Score:5, Insightful.
The only two things
which should not be in the application's own directory
(i.e. in standard /opt/application_name on Unix)
are configuration files and public libraries.
After all, why should anything besides the config files,
libraries and executables be in any fixed place?
Oh, yes, the executables,
it's good to have them in PATH without
having every program add its own dir to PATH.
So anyway, there are only three things
which should not be in the application's own directory,
but everything other than configuration,
libraries, executables and manual pages,
that's four, four things,
just four things,
configuration files,
libraries, executables, man pages and
logs, the five things,
the only five things which could be in a central place are
configuration files,
libraries, executables, man pages, logs
and temporary files, I mean six, that's six things,
libraries, executables, man pages, logs,
temp files and pid files,
seven, only seven, the only seven things
are temp files, pid files, lock files...
Amongst the things...
Amongst those directories...
are such elements as fear, surprise.... I'll come in again.
If this post does not get moderated +5 informative, the mods are on crack.
I think I can honestly say I wholeheartedly agree with you. It took me over two hours to write, finding all of the relevant links and manually formatting the HTML, but I didn't do all of that hard work--very hard work, I might add--only to get moderated as Score:5, Informative--not at all!--but rather to provide some useful information to the Slashdot community. Please don't thank me. I only did my duty as a Slashdotter. I am glad I could be helpful. Thank you.
"Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"
I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...
A quick test of the shgrep script:
pth@ws0:43:~/sh/shgrep$ ./shgrep arse < /usr/share/dict/words
arsehole
arseholes
arsenal
arsenals
arsenate
arsenic
arsenide
[...]
Seems to work fine. Is it pure enough?
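For the purists: the ${a/$1/} expansion used in shgrep is a bash extension rather than POSIX sh. Below is a minimal sketch of the same filter built only on 'case' globbing. It assumes a fixed-string match is good enough (unlike shgrep, the pattern is matched literally, not as a glob), and the function name shgrep_posix is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original shgrep): print every line
# on stdin that contains the fixed string $1, using only POSIX sh features.
shgrep_posix() {
    # IFS= and -r keep leading blanks and backslashes intact; the extra
    # -n test also catches a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;   # quoted, so $1 is literal
        esac
    done
}
```

It is used the same way, e.g. `shgrep_posix arse < /usr/share/dict/words`.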
Can this web server run PHP applications? If so, can it run the TCP/IP stack and a web server in-- Oh, God, my head!
Very interesting.
If I were to be awakened after sleeping forever, my first question would be:
"Has Duke-- ah, never mind..."
That's true. I am using a good old 1.2.5 right now.
User-Agent: Mozilla/5.0 Galeon/1.2.5 (X11; Linux i586; U;) Gecko/20020623 Debian/1.2.5-0.woody.1
It's old. It's small. It's ugly. It's responsive. It's fast.
Realistic human graphics? You mean pornography?
...to listen to the universe. I do it pretty much all the time.
You mean, the Santa Claus Operation?
Amazing. They have 404 accomplishments. Pretty impressive, I must say.
Debian.
Well... Yes, indeed. [1] [2] [3] [4]
Am I the only one who thinks that with such headlines it is not surprising that we have no lives?
"Hey, baby! Did you hear that webmasters pounce on wiki sandboxes?"
"OMG! WTF?"
Sad but true.