guy i worked with was doing an internal directory system for a major multinational telecom equipment manufacturer. one of the functions was giving information about various corporate locations in long-form text. we had a manufacturing facility in taiwan. he anticipated the issue, so did what he thought was the best option: he used the official ISO standard on the subject. should be safe, right? nice international standard, and the name even gets picked by the entity represented. so who could argue with that?
well, the official ISO name for taiwan is "Taiwan, Province of China". he got no end of abusive and arguably threatening email on the subject. pointing to the international standard was no use; pointing out that taiwan themselves got to pick what went in the standard was no use. and people from both taiwan and the mainland were upset. eager to reach a nicer solution, he went to what probably seemed like a reasonable authority - the taiwanese consulate in the US - and asked what probably seemed like a reasonable question - what do you claim is the name of your country? it took weeks to get a response. and that wasn't time spent trying to get someone on the phone; he managed to talk to people pretty quickly. but they all responded with something to the effect of "um, i'm not sure. i'll check and get back to you." he eventually got an answer that irritated a different but substantially smaller set of people and went with that.
mostly good advice. you might consider using ssh keys instead of passwords, depending on your environment. the only thing i'd outright disagree with is pre-denying IP ranges based on a guess of where you're likely to log in from. i've had to leave the country on business unexpectedly on very short notice; it'd suck to have been locked out when i landed.
copy and paste is (was? what's the right verb tense with beta code out there?) a legitimate UI issue; app locking isn't UI.
compare the SMS interface. the chat-like representation blows away pretty much everything else. application selection beat most things (on par with palm), scrolling beat pretty much everything, zooming beat pretty much everything, soft keyboard beat pretty much everything (despite still being the weakest part of the UI, imho), popup notification beat most things even before 3.0 (and is even better now). all of these have a measurable effect on time per action and errors per action.
before the iPhone, most vendors mostly didn't bother with UI design (Nokia did, somewhat; Microsoft sorta did, but making the bet that familiarity would yield better results than changing the model to fit the device; bad bet).
this was not uniform. maybe you really do have one of the few phones by a vendor who cared about UI. but the industry, as a whole, gave the entire topic of UI very short shrift. the iPhone and Android finally have vendors paying attention.
most mobile phones, from free up through the $300 models with WinCE on them, have just plain horrid UIs. the iPhone came along and, far from being perfect or even close to it, didn't totally suck. and people were amazed, and wanted to know why their phone didn't not suck. Android also doesn't suck, so now people will have more options. and the best thing the iPhone's done for the industry is making other manufacturers realize that, in the near future, their interfaces are going to have to not suck, too, if they want to remain competitive.
i'm really tired of this whole "it's just marketing" meme. 9 times out of 10 it just indicates that the speaker doesn't know anything about usability. better than half the time it also indicates that they don't know anything about engineering generally.
give the gnu folks some time to add more options. you really will need a separate manual eventually. they'll incorporate most of the uses in the interview as --prepend-input or --bookend-input. right before they make the man page just point to the info page, and start adding networking support like the stupid gnu awk has.
my cat, from Plan 9, has 0 options. cat concatenates files. end of story.
if you really don't feel comfortable in any of those languages, C is the only real choice. both C++ and Java, but especially C++, are absolutely horrid languages to start with. C++ will positively rot your brain. C is still very widely used, although it doesn't get the hype and isn't growing very fast (comparatively), and perhaps more importantly is a great basis for learning - and later, actually programming in - other languages. it much more clearly illustrates the important concepts common to a very wide set of languages, without obscuring things in a ton of junk.
i've done hiring for a handful of projects now, mostly C, Perl, and Limbo (one C++ project, but that was only because i knew the developer was smart enough to ignore most of C++ which isn't C). finding Limbo programmers is, well, very hard, but that's not really all that important if you're hiring real engineers or computer scientists. i've never been on a project where we needed a bunch of code monkeys; that sort of thing has a different set of constraints. even for the perl stuff, i hired real CS people who didn't know perl before hiring code monkeys who did. cognitive ability, mental agility, and real CS skill all trump knowing a particular language. if you know C well, you're in a much better position to learn whatever's needed for the job at hand than if you learn C++ or Java.
note that i'm assuming here that you want to be a Computer Scientist or an Engineer, not just a code monkey. if that's a bad assumption, learn C# and VB and we probably shouldn't talk to each other about technology very much.
also, complain to your CS department. you got screwed.
I have some background with C++ so I'm not starting entirely from scratch.
it'd be better for you if you were. the more C++ you know the more your brain gets twisted. if you know enough other languages, adding C++ on top doesn't have to be killer, but i have a hard time thinking of a language that'd be worse to know as your only one. COBOL, maybe.
They tell you this bat just sorta showed up, but really it was planned all along. The bat is a planned counter-measure: he's going after that spider they lost last year! NASA needs to take the spider out before it learns how to work the bag of tools it stole.
i think at this point, we just have (potentially unresolvable) philosophical differences. you talk about linking applications against a particular library for the shell, and the shell automatically transforming data, and such without recognizing that such things are antithetical to the design, intent, and evolution of the Unix shell. you seem to think that the primary reason for representing things as text is human readability, which isn't why unix does it (or didn't used to be); the reason is that text is an easy, consistent way for applications to deal with things. and a debugger is your example of a tool where intelligence is good, when it isn't even really a "tool" in the Pike/Kernighan/Unix sense.
i mean it when i say your ideas are interesting, but i think it's a mistake both in implementation and in thinking about it for yourself to frame them in terms of "growing" the Unix shell or whatnot. you really are talking about a parallel universe. but just because i think it's antithetical to Unix's design doesn't mean i'd think it's bad. the original Mac, for example, was very anti-Unix, but the coherent internal design philosophy worked very well for it. similarly, there was an "anti-Mac" design philosophy going around for a while that was similarly logically coherent that i would have loved to explore an implementation of. the opposite of a great idea is often another great idea. just don't try to cram them together; that's always a mess.
the fork thing is just a red herring; that was suggested only as one method of optimization if fork really were a big problem. i didn't believe it was, so the suggestion is really irrelevant.
it is true that perl can be used in such a way that you get better "out of the box" portability with your applications with fewer dependencies. but even in the cases where that's relevant, it's a rare perl program of any size that'll do that. just removing the availability of the backtick operator lops off a bunch.
you're certainly right that any language can be abused, but some lend themselves to it more than others. perl begs for it.
well, i didn't see your message until just after midnight, so i guess it's technically tomorrow. certainly didn't take a day. this is the first thing i came up with which satisfied your requirements; it actually does much more than the minimum work to satisfy that, but this was what i thought of first. it took less than five minutes to write, about one minute of which was because i forgot awk's NF isn't addressed as $NF, and about one was because i kept typing "sed" when i meant "seq".
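for anyone who hasn't tripped over the NF-versus-$NF distinction mentioned above, here's a minimal sketch in plain POSIX sh and awk. the function names (nf_demo, one_k) and the sample words are made up for illustration. NF is the number of fields on the line; $NF is the contents of the last field. the second function shows the same idea with the field separator set to "k", so a line has exactly two fields when it contains exactly one k.

```shell
# NF is the field count; $NF is the field at that position, i.e. the last one
nf_demo() {
	echo 'a b c' | awk '{ print NF, $NF }'
}

# with FS set to "k", a line splits into exactly two fields
# when it contains exactly one k
one_k() {
	printf 'book\nkick\ntree\n' | awk 'BEGIN { FS = "k" } NF == 2'
}

nf_demo   # prints: 3 c
one_k     # prints: book
```

"kick" has two k's (three fields) and "tree" has none (one field), so only "book" gets through.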
this OS X box is working in an external firewire disk. i don't happen to have 100,000 small files lying around to poke at, so i started by creating them, in a hugely seek-heavy way. typing this now, i realize i should've scripted my editor to dump segments; oh, well. as expected, the seeks on disk I/O made the creation step take a long time. regardless, the code is nice and easy to read, very straight-forward. each tool does just its job, and the shell puts things together nicely.
: Yud; pwd
/Volumes/Twin/sandbox
: Yud; ls
: Yud; date
Fri Mar 6 00:09:53 EST 2009
: Yud; for (i in `{seq 1 100000}) {sed $i^q /usr/share/dict/words | tail -10 > $i}
: Yud; date
Fri Mar 6 00:52:39 EST 2009
: Yud; >/tmp/youarea
: Yud; start=`{9 date -n}
: Yud; for (i in *) {
	echo $i:
	echo -n '5-letter words:' ; grep '^.....$' $i | wc -l
	echo -n 'fake multi-words:' ; awk 'BEGIN {FS="k"}; NF==2' $i | wc -l
	if (grep -q dick $i) echo dick!
}>>/tmp/youarea
: Yud; stop=`{9 date -n}
: Yud; wc -l /tmp/youarea
300020 /tmp/youarea
: Yud; echo $stop - $start | hoc
2357
that's a bit under 45 minutes to generate all the files, and a bit less than that again to process them, from my firewire 400 disk. how much of that do you think was fork? my money says disk I/O dominates; perl will do zero for you there.
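one rough way to put a number on the fork question, rather than guess: time a loop that does nothing but start a trivial external process. this is only a sketch, in POSIX sh rather than rc. fork_cost is a hypothetical helper name, /bin/sh -c : is just a stand-in for "some small program", and date +%s is assumed to be available (it is on the GNU and BSD userlands, but it only gives whole seconds, so use a big enough count).

```shell
# fork_cost N: spawn N trivial external processes, print elapsed whole seconds.
# hypothetical helper; assumes date +%s is supported on this system.
fork_cost() {
	n=$1
	i=0
	start=$(date +%s)
	while [ "$i" -lt "$n" ]; do
		/bin/sh -c :          # fork + exec of a process that does nothing
		i=$((i + 1))
	done
	stop=$(date +%s)
	echo $((stop - start))
}

echo "seconds for 1000 spawns: $(fork_cost 1000)"
```

the processing loop in the session starts at least five processes per file (grep, wc, awk, wc, grep), so about 500,000 spawns over the 2357 seconds. scaling the number this prints up to that count gives a ballpark for how much of the run could be process creation at all, with the rest being disk and real work.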
were i doing this for more than a throw-away project, there's a number of obvious optimizations, like loading at least the words file into RAM up front, which would improve performance, too. but these times aren't too bad. it'd be interesting to know how much of it was disk-bound, too; i suspect a good bit.
i don't write perl for the same reason i don't eat moldy bread (if i had to i might, but if there's anything better around, i'll do that instead), but if you'd like to provide a perl program that does the same thing, i'd be happy to benchmark it for you. you're aiming at 1/1000th the time. that gives you about 2.7 seconds to do the file creation and 2.4 seconds to do the processing. good luck. let me know when you have perl code for me to try that comes within an order of magnitude of your target.
and perhaps i'm being discourteous, but your presumption and ignorance were overwhelming. you need educating.
By the "tools model of environment and application building" - do you mean the Unix tradition of "many small, simple, single-job utilities"?
yes.
To me, that model can only be made better by giving these tools a more powerful standard of communication...
there's some truth to that. but it's not clear how you can do that without increasing the complexity of the individual tools, and making them more aware of their environment. the tools model works as well as it does precisely because the individual tools are, well, dumb.
The thing about that kind of design decision is that it's easy - you don't have to think too hard about the design direction, or do much work to get people to use it - but it's not necessarily the best decision in terms of what you can do with it.
again, you're right in a significant way, but something else is missed. the point is that it's "easier" for every tool. it works best in an environment where you're going to have not a dozen but a hundred tools, written by different people. as the number of tools increases, the incremental cost of increased complexity requirements goes up by that multiple.
It's not a design that scales well...
i think that's exactly wrong, actually. it scales very well, because the incremental costs are so low. although, really, it depends what dimension you're trying to scale across, and i suspect we mean different things here.
...as a text format becomes more complex the problem of handling it (parsing, serializing, etc.) becomes more complex as well - and the advantages of it being a text format start to disappear.
i'm not entirely sure what you mean here, but it sounds like an entirely valid criticism against tools that do a bad job of implementing the model. look at ping, or ps. on modern Unix systems, their output isn't just the data, but some "pretty" header and footer lines. if you want to use this as input to another program, you're right - you need to pre-process the output, and that adds complexity. but that's because those tools are breaking the model. as Doug McIlroy said:
Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
compare the behavior of these tools on a modern Unix to Plan 9, where this tenet is much more consistently followed.
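to make the cost of "pretty" output concrete, here's a small sketch in POSIX sh. both report functions are hypothetical stand-ins, not real tools: pretty_report decorates its data with a header, a rule, and a footer, and plain_report emits just the data. the consumer in both cases is the same one-line awk sum.

```shell
# hypothetical tool that clutters its output with a header, rule, and footer
pretty_report() {
	echo 'NAME   SIZE'
	echo '-----------'
	echo 'alpha  10'
	echo 'beta   32'
	echo '2 entries listed'
}

# hypothetical tool that emits the same data and nothing else
plain_report() {
	echo 'alpha 10'
	echo 'beta 32'
}

# plain output feeds the next program directly
plain_report | awk '{ s += $2 } END { print s }'

# pretty output has to be scrubbed first: drop the first two lines and the last
pretty_report | sed -e '1,2d' -e '$d' | awk '{ s += $2 } END { print s }'
```

both pipelines print 42, but the second one only works for as long as the decoration stays exactly two lines on top and one on the bottom. every consumer of the pretty tool has to carry that knowledge around.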
i'm not sure what you mean by the shell providing automatic translation. if the tools follow the above rule, that's a no-op, mostly, and to the extent that some translation is needed, it seems like that'd be a general enough use to warrant a tool.
i'd rather be asked to add a field to a cron taking plain text input than xml. looked at launchd? i've tried.
A shell that can understand and process the data output of the commands it runs makes the tools easier to understand.
i think that's less likely to be true than you think. if you try to do something "unusual", it's likely that the shell will try to impose its view of the world on you. "intelligence", in software tools, is generally a bad thing.
A shell that establishes standard policies for a program to publish its valid command line switches makes the program easier to use.
this would be a good capability to have, certainly, but i don't really see what it has to do with a shell. you'd need every tool to support it (or at least all the tools you want to get the benefit from), for starters. and what would be the role of the shell here? in order for it to make any sort of practical use of the information, you'd need to understand lots of semantics about those options, not just what they are.
for example, let's say the ease of use of scripting means that until a program runs longer than, say, ten minutes to do a job you will prefer script over compiled lnaguages. the equivalent bash program if it does any serious parsing might take 3 hours. This is why one might still use a script language even when "speed" is an issue.
what a stupid non-example. not only is there no example in your example, but there's so many other things wrong with it i don't even know where to start.
first, you've picked a totally arbitrary number for the scripting/compiled schism. how exactly did you come up with that? and is it really true in all cases? what're the constraints? and, of course, the dichotomy is largely false. with on-the-fly compilers (not to mention the old fashioned kind that just apply to new languages), many "scripting languages" are compiled languages. is Java faster because it's compiled? faster than what? for what tasks? i don't know much about bash's internals, but several shells actually include on-the-fly compilers for their input. so, again, i'm curious if you've actually measured anything, or are just talking out your rear. how the heck are you structuring these shell programs that you have enough fork calls to make things 1000x slower? grep certainly isn't going to slow you down itself, nor is awk (i'd be a little surprised if it wasn't the opposite, actually). i've read entire lp systems written in shell scripts that don't have that many forks. and when writing in shell scripts, the shell itself is, for the most part, glue; most of the work happens in other programs (things like sed, awk, and so on), as you note. all of which, of course, are compiled in pretty much every environment. but i guess, sure, you could construct an example where it "might take 3 hours". or it might take 3 minutes. show me some examples. show me the results of your timing trials. otherwise i'll just counter with "perl takes ten times longer to write than rc, runs half as fast, and the average bug count is twice as high". see? i can make up numbers, too.
shell script is preferable to perl because it's easier to read, easier to write reasonably, and has a low overhead cost. your "1000 times slower" number is a figment of your imagination.
perl exactly falls apart when the complexity of the task increases. the language is a mess. i mean, really: compared to most anything out there, it's a hideous mess. the average quality of perl code i've seen is substantially below any language except VB and friends. one can write reasonable perl code, sure - i work with a guy who does a pretty good job - but it's very hard. and that's true both in the sense that there's a strong temptation to write poor code, given the dozens of ways to do everything, and in the sense that being clear in the language slows you down more than being clear in a more properly-structured language.
your biggest concern is fork? really? if it's that bad, you need to get on a more reasonable platform (or try, say, using statically-linked binaries). but being able to hand off tasks to tools designed for them is a huge benefit. you think making socket connections in perl is a reasonable thing to do? oh, lord.
i find it very telling that your big comparison is bash, perhaps the most bloated shell available. if you want a nice, simple, clean, and clear shell, try rc. if the complexity really is as dramatic as you say, and the speed is as important as you imply, you want to be writing in a systems language. try C.
you say i should think of perl as a shell on steroids. if by that you mean it's got a heart problem, looks ridiculous, tends to be overly aggressive, grows out of proportion to utility, and increases depression and suicide, okay, i'll give it to you.
teeth?!? you had teeth? we had to have ours pulled to form the console switches for our micro! tell that to kids these days, and they won't believe you.
PowerShell improves on the functionality offered by Unix shells in that it gives you a richer set of expressive fundamentals; the shell is smarter, knows more about the environment, and that lets it do more with what's around.
the problem is that this isn't simply an extension of the Unix shell; it's actually antithetical to it. the issue isn't the "shell" per se, but the "tools" model of environment and application building. the unix shells are all intentionally fairly ignorant of the environment: they rely on some basic conventions about how programs behave, but beyond that they just move bytes (typically text) around. you can't "improve" the unix shell environment by making it have a more PowerShell-like vocabulary, because it's a vocabulary for things it simply has no knowledge of in the first place.
this is actually a really good thing within the unix- or tools-based environment. in this model, it's a lot easier to generate new bits of functionality without having to muck around with the existing stuff. adding new functionality to the environment doesn't involve changing your shell, just adding a new tool. each tool is easy to digest and understand, and that ability to fully understand the building blocks is what leads to things getting put together in new, interesting, and unexpected ways.
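a sketch of what "just adding a new tool" looks like in practice, in POSIX sh. everything named here is made up for illustration: fivers is a hypothetical tool, and the scratch directory comes from mktemp -d, which is assumed to exist (it's common but not in POSIX). the point is that the shell isn't modified or even informed; the new program shows up on the path and composes with wc like anything else.

```shell
# make a scratch directory to hold the new tool (assumes mktemp -d exists)
tooldir=$(mktemp -d)

# fivers: a hypothetical new tool that passes through only five-letter lines
cat > "$tooldir/fivers" <<'EOF'
#!/bin/sh
exec grep '^.....$'
EOF
chmod +x "$tooldir/fivers"

# "installing" it is just making it findable
PATH=$tooldir:$PATH

# it composes with existing tools immediately; the shell knows nothing about it
printf 'apple\nfig\nmango\nkiwi\n' | fivers | wc -l
```

that counts the two five-letter words (apple and mango). nothing about the shell, or about wc, had to change to make the new piece usable.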
if, as you say, you're interested in looking into how that environment could best be improved upon, i'd advise you to check out Plan 9 or Inferno. they take the "everything is a file" idea from unix and extend it much, much farther. the result is the same sort of "feel" of unix, as far as user interaction goes, but a much, much richer set of expressive abilities. think about Linux's (Plan9-inspired) proc file system, extended to almost every resource in the system (and presented in a more consistent way).
there are interesting ideas in PowerShell; this is in no way intended to be Microsoft bashing. i do, philosophically and practically, believe the tools-based approach to be superior in terms of flexibility, extensibility, and learnability, but that shouldn't be read as an assertion that other ideas shouldn't be tried. all i'm saying is that shoehorning PowerShell concepts into a Unix shell isn't likely to give you the results you want.
honestly, it's more similar to something like AppleScript. which is great: i like AppleScript's power, but have often found it to be pretty awkward to use and wished it "felt" more like a "regular" shell. maybe PowerShell can do that, which would be great.
except iTunes and the iPod are not examples of that model. rather, using the iTunes Music Store to get content for your iPod is an example of "Anybody can provide, but we do it very well."
i regularly buy content from emusic.com to play on my iPod and iTunes, because if you buy enough the price works out very well. i also have gotten content off things like BitTorrent (both legally and illegally). but having also bought a bunch of stuff through iTMS, they really do provide a much better service. you know what you're getting (i've never found anything there mislabeled); you know the quality is going to be good (quality of the encoding; i make no claims as to your taste in music); you get useful previews of things before you buy ("is this the right live version of All the Single Ladies?"); you know the format will be compatible (not really an issue for audio, but a bigger deal for video); integration is pretty much automatic (some 3rd party things come close, but never as good, and most are very poor); you can buy just what you want, on your schedule (as opposed to something like eMusic). you don't even seem to pay any significant premium (compared to any other legal means) for these benefits.
up until a month or so ago, maybe you could've made some sort of inverse argument, if you had a weird definition of "provide", based on the fact that most iTMS content only played on the iPod, but now even that's gone. the iTMS model is almost exactly what we want to see: open competition, with the vendor competing based on providing the best experience, not based on technical lock-in.
first off, i think the CC licenses (not the deeds) read much better than the *GPL* licenses, and i think this is mostly because the GPL is the way it is for reasons other than strictly needing to be long. when GPL3 was being drafted, Stallman and Moglen laid out the four purposes it was supposed to serve; only the first one was properly the role of a license, per se.
i haven't personally read any of the Apache licenses, so i can't comment.
i also don't think it's fair to say that noncommercial software licensing has failed; plenty of applications are distributed under those terms. i think it's a niche role, rightly not taking the place of anything used for more wide-scale distribution, but it's out there and is useful.
to be fair, it certainly isn't an either/or situation, and i don't think the GP post was intending to imply it was. given the huge stores of cultural value locked away behind insane copyright laws, fixing that system is certainly a much bigger deal.
i think you did well until that last one: the answer is really "maybe".
in any legal, moral, or conversational context, the definition of plagiarism is fuzzy. if 99% of the new work is original, i doubt anyone would consider it plagiarism; if 1% is original, nearly everyone (including relevant legal contexts) would say it is. where that bound is varies with context, but there's a large grey border rather than a thin hard line. also, note that public domain dedications affect the legal status of things, but not anything else. schools, for their part, are usually less concerned about the legal status than the moral or intellectual status of the work (at least when evaluating things like plagiarism in an admissions context). a public domain dedication might make it entirely reasonable for me to grab your essay, stamp my name on it, and pass it off as my own, but nobody outside a court room is likely to respect that. especially not if you're willing to stand up and say "um, no." which brings us to the most important point, at least as far as the original question: schools have their own standards for what is or isn't plagiarism. some try to be very strict with citations, and your example use would certainly run afoul of those rules. the answer depends on the definition of the term in that particular context.
more to the point: can someone point me at anything in any US jurisdiction that actually says public domain dedications "don't work"? i've heard this lots over the past few months, but never before then, and it seems to be just plain fabricated.
i know that's the official position, but i disagree, for several reasons, and have software licensed under CC licenses.
first of all, the whole point of the CC licenses is that they make it easy for users to understand what they're getting. CC isn't doing anything "new", legally: licenses with the same effects have existed forever. the same need for clarity exists in software licenses as it does in other domains; CC's win here doesn't stop holding true just because we're talking about software. second, maybe i don't want a "Free" license for whatever reason (like, say, i'm being paid to provide different terms). CC has the advantage of providing easy to understand terms regardless of the degree or type of "Freeness" desired. third, for the non-"Free" licenses in the CC suite, i think it's generally false that there are pre-existing software licenses that are widely known/used and cover the same ground. for example, i'd like to offer some code under either (at the recipient's discretion) CC's BY-NC or BY-SA. BY-SA is normally seen as covered for software by the GPL (but see my next point), but what about BY-NC? i've licensed work (not code in this case, but it certainly could have been) under BY-ND, too. if the CC licenses didn't exist or weren't known to me, i'd almost certainly have written my own (and less well) with the same effects. finally, i take issue with the idea that there are "plenty" of good software licenses out there. most that're larger than the BSD/MIT licenses suck: they're overly long, poorly organized, and often poorly written.
i'm disappointed that CC has an official position recommending against using their work for software; they've done great stuff and i'd love to see it used much more widely. unless or until i see something with a similar breadth of coverage more targeted at software, i'll continue to use their licenses where the terms match what i need (which is almost always).
what matters is that the interface doesn't suck.
C.
no, he meant MAC. he really liked the guy's makeup.
i think at this point, we just have (potentially unresolvable) philosophical differences. you talk about linking applications against a particular library for the shell, and the shell automatically transforming data, and such without recognizing that such things are antithetical to the design, intent, and evolution of the Unix shell. you seem to think that the primary reason for representing things as text is human readability, which isn't why unix does it (or didn't used to be); it does it because text is an easy, consistent way for applications to deal with things. and a debugger is your example of a tool where intelligence is good, when it isn't even really a "tool" in the Pike/Kernighan/Unix sense.
i mean it when i say your ideas are interesting, but i think it's a mistake both in implementation and in thinking about it for yourself to frame them in terms of "growing" the Unix shell or whatnot. you really are talking about a parallel universe. but just because i think it's antithetical to Unix's design doesn't mean i'd think it's bad. the original Mac, for example, was very anti-Unix, but the coherent internal design philosophy worked very well for it. similarly, there was an "anti-Mac" design philosophy going around for a while that was similarly logically coherent that i would have loved to explore an implementation of. the opposite of a great idea is often another great idea. just don't try to cram them together; that's always a mess.
no, that wouldn't make me happy. ;-)
the fork thing is just a red herring; that was suggested only as one method of optimization if fork really were a big problem. i didn't believe it was, so the suggestion is really irrelevant.
it is true that perl can be used in such a way that you get better "out of the box" portability with your applications with fewer dependencies. but even in the cases where that's relevant, it's a rare perl program of any size that'll do that. just removing the availability of the backtick operator lops off a bunch.
you're certainly right that any language can be abused, but some lend themselves to it more than others. perl begs for it.
well, i didn't see your message until just after midnight, so i guess it's technically tomorrow. certainly didn't take a day. this is the first thing i came up with which satisfied your requirements; it actually does much more than the minimum work to satisfy that, but this was what i thought of first. it took less than five minutes to write, about one minute of which was because i forgot awk's NF isn't addressed as $NF, and about one was because i kept typing "sed" when i meant "seq".
this OS X box is working off an external firewire disk. i don't happen to have 100,000 small files lying around to poke at, so i started by creating them, in a hugely seek-heavy way. typing this now, i realize i should've scripted my editor to dump segments; oh, well. as expected, the seeks on disk I/O made the creation step take a long time. regardless, the code is nice and easy to read, very straight-forward. each tool does just its job, and the shell puts things together nicely.
that's a bit under 45 minutes to generate all the files, and a bit less than that again to process them, from my firewire 400 disk. how much of that do you think was fork? my money says disk I/O dominates; perl will do zero for you there.
were i doing this for more than a throw-away project, there's a number of obvious optimizations, like loading at least the words file into RAM up front, which would improve performance, too. but these times aren't too bad. it'd be interesting to know how much of it was disk-bound, too; i suspect a good bit.
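if you'd rather measure the fork question than argue about it, something like the following gives a rough number. this is only a sketch in python, not the shell code discussed above; `spawn_cost` is a made-up name, and using `true` as the child is an assumption chosen because it does no work of its own. the figure includes python's own overhead for launching the child, so treat it as an upper-ish bound on process startup, not a precise fork cost.

```python
import shutil
import subprocess
import sys
import time

def spawn_cost(n=50):
    """Average wall-clock seconds to start and reap one trivial child process."""
    # prefer the tiny `true` binary; fall back to the Python interpreter,
    # which is much heavier to start and so overstates the cost
    true_path = shutil.which("true")
    cmd = [true_path] if true_path else [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / n

per_child = spawn_cost()
print(f"about {per_child * 1000:.2f} ms per child process")
# scale up to the 100,000-file case discussed above
print(f"100,000 of them: about {per_child * 100_000:.0f} seconds")
```

compare the second number against the total run time: whatever is left over is the tools' actual work plus disk I/O.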
i don't write perl for the same reason i don't eat moldy bread (if i had to i might, but if there's anything better around, i'll do that instead), but if you'd like to provide a perl program that does the same thing, i'd be happy to benchmark it for you. you're aiming at 1/1000th the time. that gives you about 2.7 seconds to do the file creation and 2.4 seconds to do the processing. good luck. let me know when you have perl code for me to try that comes within an order of magnitude of your target.
and perhaps i'm being discourteous, but your presumption and ignorance was overwhelming. you need educating.
yes.
there's some truth to that. but it's not clear how you can do that without increasing the complexity of the individual tools, and making them more aware of their environment. the tools model works as well as it does precisely because the individual tools are, well, dumb.
again, you're right in a significant way, but something else is missed. the point is that it's "easier" for every tool. it works best in an environment where you're going to have not a dozen but a hundred tools, written by different people. as the number of tools increases, the incremental cost of increased complexity requirements goes up by that multiple.
i think that's exactly wrong, actually. it scales very well, because the incremental costs are so low. although, really, it depends what dimension you're trying to scale across, and i suspect we mean different things here.
i'm not entirely sure what you mean here, but it sounds like an entirely valid criticism against tools that do a bad job of implementing the model. look at ping, or ps. on modern Unix systems, their output isn't just the data, but some "pretty" header and footer lines. if you want to use this as input to another program, you're right - you need to pre-process the output, and that adds complexity. but that's because those tools are breaking the model: as Doug McIlroy said:
compare the behavior of these tools on a modern Unix to Plan 9, where this tenet is much more consistently followed.
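to make the pre-processing cost concrete, here's a small sketch of the extra step a consumer needs when a tool decorates its output. it's python rather than shell, `strip_decoration` is a hypothetical helper, and the sample lines are invented to look like ps output, not captured from a real system.

```python
def strip_decoration(lines, header=1, footer=0):
    """Drop leading header lines and trailing footer lines, keeping only data rows."""
    end = len(lines) - footer if footer else len(lines)
    return lines[header:end]

# invented ps-style output: one "pretty" header line, then the actual data
sample = [
    "  PID TTY          TIME CMD",
    " 4021 pts/0    00:00:00 rc",
    " 4188 pts/0    00:00:00 ps",
]

# only after the header is gone can every row be treated the same way
pids = [row.split()[0] for row in strip_decoration(sample)]
print(pids)
```

every consumer has to know how many header and footer lines each producer emits, which is exactly the per-tool knowledge the model is supposed to avoid.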
i'm not sure what you mean by the shell providing automatic translation. if the tools follow the above rule, that's a no-op, mostly, and to the extent that some translation is needed, it seems like that'd be a general enough use to warrant a tool.
i'd rather be asked to add a field to a cron taking plain text input than xml. looked at launchd? i've tried.
i think that's less likely to be true than you think. if you try to do something "unusual", it's likely that the shell will try to impose its view of the world on you. "intelligence", in software tools, is generally a bad thing.
this would be a good capability to have, certainly, but i don't really see what it has to do with a shell. you'd need every tool to support it (or at least all the tools you want to get the benefit from), for starters. and what would be the role of the shell here? in order for it to make any sort of practical use of the information, it'd need to understand lots of semantics about those options, not just what they are.
what a stupid non-example. not only is there no example in your example, but there's so many other things wrong with it i don't even know where to start.
first, you've picked a totally arbitrary number for the scripting/compiled schism. how exactly did you come up with that? and is it really true in all cases? what're the constraints?
and, of course, the dichotomy is largely false. with on-the-fly compilers (not to mention the old fashioned kind that just apply to new languages), many "scripting languages" are compiled languages. is Java faster because it's compiled? faster than what? for what tasks?
i don't know much about bash's internals, but several shells actually include on-the-fly compilers for their input. so, again, i'm curious if you've actually measured anything, or are just talking out your rear.
how the heck are you structuring these shell programs that you have enough fork calls to make things 1000x slower? grep certainly isn't going to slow you down itself, nor is awk (i'd be a little surprised if it wasn't the opposite, actually). i've read entire lp systems written in shell scripts that don't have that many forks.
and when writing in shell scripts, the shell itself is, for the most part, glue; most of the work happens in other programs (things like sed, awk, and so on), as you note. all of which, of course, are compiled in pretty much every environment.
but i guess, sure, you could construct an example where it "might take 3 hours". or it might take 3 minutes. show me some examples. show me the results of your timing trials. otherwise i'll just counter with "perl takes ten times longer to write than rc, runs half as fast, and the average bug count is twice as high". see? i can make up numbers, too.
shell script is preferable to perl because it's easier to read, easier to write reasonably, and has a low overhead cost. your "1000 times slower" number is a figment of your imagination.
perl exactly falls apart when the complexity of the task increases. the language is a mess. i mean, really: compared to most anything out there, it's a hideous mess. the average quality of perl code i've seen is substantially below any language except VB and friends. one can write reasonable perl code, sure - i work with a guy who does a pretty good job - but it's very hard. and that's true both in the sense that there's a strong temptation to write poor code, given the dozens of ways to do everything, and in the sense that being clear in the language slows you down more than being clear in a more properly-structured language.
your biggest concern is fork? really? if it's that bad, you need to get on a more reasonable platform (or try, say, using statically-linked binaries). but being able to hand off tasks to tools designed for them is a huge benefit. you think making socket connections in perl is a reasonable thing to do? oh, lord.
i find it very telling that your big comparison is bash, perhaps the most bloated shell available. if you want a nice, simple, clean, and clear shell, try rc. if the complexity really is as dramatic as you say, and the speed is as important as you imply, you want to be writing in a systems language. try C.
you say i should think of perl as a shell on steroids. if by that you mean it's got a heart problem, looks ridiculous, tends to be overly aggressive, grows out of proportion to utility, and increases depression and suicide, okay, i'll give it to you.
amen, brother, amen.
teeth?!? you had teeth? we had to have ours pulled to form the console switches for our micro! tell that to kids these days, and they won't believe you.
PowerShell improves on the functionality offered by Unix shells in that it gives you a richer set of expressive fundamentals; the shell is smarter, knows more about the environment, and that lets it do more with what's around.
the problem is that this isn't simply an extension of the Unix shell; it's actually antithetical to it. the issue isn't the "shell" per se, but the "tools" model of environment and application building. the unix shells are all intentionally fairly ignorant of the environment: they rely on some basic conventions about how programs behave, but beyond that they just move bytes (typically text) around. you can't "improve" the unix shell environment by making it have a more PowerShell-like vocabulary, because it's a vocabulary for things it simply has no knowledge of in the first place.
this is actually a really good thing within the unix- or tools-based environment. in this model, it's a lot easier to generate new bits of functionality without having to muck around with the existing stuff. adding new functionality to the environment doesn't involve changing your shell, just adding a new tool. each tool is easy to digest and understand, and that ability to fully understand the building blocks is what leads to things getting put together in new, interesting, and unexpected ways.
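the "just move bytes around" point can be sketched in a few lines. this is an illustration in python, not how any real shell is implemented; `pipeline` is a hypothetical helper, and the two child commands are stand-ins for a producer tool and a sort-like consumer tool. note that the glue never looks at the data: it connects one program's output to the next one's input and gets out of the way.

```python
import subprocess
import sys

def pipeline(producer_cmd, consumer_cmd):
    """Connect producer stdout to consumer stdin; return the consumer's raw output bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # close our copy so the consumer sees end-of-file when the producer exits
    producer.stdout.close()
    out, _ = consumer.communicate()
    producer.wait()
    return out

# stand-ins for two independent tools: one emits lines, one sorts its input
emit = [sys.executable, "-c", "print('pear'); print('apple')"]
sort_lines = [sys.executable, "-c",
              "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

result = pipeline(emit, sort_lines)
print(result.split())
```

swapping in a different consumer means changing one argument; nothing in `pipeline` has to learn about the new tool.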
if, as you say, you're interested in looking into how that environment could best be improved upon, i'd advise you to check out Plan 9 or Inferno. they take the "everything is a file" idea from unix and extend it much, much farther. the result is the same sort of "feel" of unix, as far as user interaction goes, but a much, much richer set of expressive abilities. think about Linux's (Plan9-inspired) proc file system, extended to almost every resource in the system (and presented in a more consistent way).
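as a small taste of what "resources as files" buys you, here's a sketch that reads process information on Linux with nothing but ordinary file reads. it's python, `read_proc_field` is a hypothetical helper, and it assumes a Linux-style /proc where /proc/<pid>/status holds "Name: value" lines; on a system without that it just returns None. Plan 9's own layout differs, so this shows the idea, not the Plan 9 interface.

```python
from pathlib import Path

def read_proc_field(field, pid="self"):
    """Return the value of one 'Name: value' line from /proc/<pid>/status, or None."""
    status = Path("/proc") / str(pid) / "status"
    if not status.exists():  # not Linux, or no procfs mounted
        return None
    for line in status.read_text().splitlines():
        name, _, value = line.partition(":")
        if name == field:
            return value.strip()
    return None

# no special system call or library: the same open/read used for any text file
print(read_proc_field("Name"))
```

because it's just text in a file, the same information is equally reachable from cat, grep, or awk.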
there are interesting ideas in PowerShell; this is in no way intended to be Microsoft bashing. i do, philosophically and practically, believe the tools-based approach to be superior in terms of flexibility, extensibility, and learnability, but that shouldn't be read as an assertion that other ideas shouldn't be tried. all i'm saying is that shoehorning PowerShell concepts into a Unix shell isn't likely to give you the results you want.
honestly, it's more similar to something like AppleScript. which is great: i like AppleScript's power, but have often found it to be pretty awkward to use and wished it "felt" more like a "regular" shell. maybe PowerShell can do that, which would be great.
except iTunes and the iPod are not examples of that model. rather, using the iTunes Music Store to get content for your iPod is an example of "Anybody can provide, but we do it very well."
i regularly buy content from emusic.com to play on my iPod and iTunes, because if you buy enough the price works out very well. i also have gotten content off things like BitTorrent (both legally and illegally). but having also bought a bunch of stuff through iTMS, they really do provide a much better service. you know what you're getting (i've never found anything there mislabeled); you know the quality is going to be good (quality of the encoding; i make no claims as to your taste in music); you get useful previews of things before you buy ("is this the right live version of All the Single Ladies?"); you know the format will be compatible (not really an issue for audio, but a bigger deal for video); integration is pretty much automatic (some 3rd party things come close, but never as good, and most are very poor); you can buy just what you want, on your schedule (as opposed to something like eMusic). you don't even seem to pay any significant premium (compared to any other legal means) for these benefits.
up until a month or so ago, maybe you could've made some sort of inverse argument, if you had a weird definition of "provide", based on the fact that most iTMS content only played on the iPod, but now even that's gone. the iTMS model is almost exactly what we want to see: open competition, with the vendor competing based on providing the best experience, not based on technical lock-in.
first off, i think the CC licenses (not the deeds) read much better than the *GPL* licenses, and i think this is mostly because the GPL is the way it is for reasons other than strictly needing to be long. when GPL3 was being drafted, Stallman and Moglen laid out the four purposes it was supposed to serve; only the first one was properly the role of a license, per se.
http://mailman.uwc.ac.za/pipermail/nextgen-users/2005-June/000005.html
i haven't personally read any of the Apache licenses, so i can't comment.
i also don't think it's fair to say that noncommercial software licensing has failed; plenty of applications are distributed under those terms. i think it's a niche role, rightly not taking the place of anything used for more wide-scale distribution, but it's out there and is useful.
to be fair, it certainly isn't an either/or situation, and i don't think the GP post was intending to imply it was. given the huge stores of cultural value locked away behind insane copyright laws, fixing that system is certainly a much bigger deal.
i think you did well until that last one: the answer is really "maybe".
in any legal, moral, or conversational context, the definition of plagiarism is fuzzy. if 99% of the new work is original, i doubt anyone would consider it plagiarism; if 1% is original, nearly everyone (including relevant legal contexts) would say it is. where that bound is varies with context, but there's a large grey border rather than a thin hard line.
also, note that public domain dedications affect the legal status of things, but not anything else. schools, for their part, are usually less concerned about the legal status than the moral or intellectual status of the work (at least when evaluating things like plagiarism in an admissions context). a public domain dedication might make it entirely reasonable for me to grab your essay, stamp my name on it, and pass it off as my own, but nobody outside a court room is likely to respect that. especially not if you're willing to stand up and say "um, no."
which brings us to the most important point, at least as far as the original question: schools have their own standards for what is or isn't plagiarism. some try to be very strict with citations, and your example use would certainly run afoul of those rules. the answer depends on the definition of the term in that particular context.
more to the point: can someone point me at anything in any US jurisdiction that actually says public domain dedications "don't work"? i've heard this lots over the past few months, but never before then, and it seems to be just plain fabricated.
i know that's the official position, but i disagree, for several reasons, and have software licensed under CC licenses.
first of all, the whole point of the CC licenses is that they make it easy for users to understand what they're getting. CC isn't doing anything "new", legally: licenses with the same effects have existed forever. the same need for clarity exists in software licenses as it does in other domains; CC's win here doesn't stop holding true just because we're talking about software.
second, maybe i don't want a "Free" license for whatever reason (like, say, i'm being paid to provide different terms). CC has the advantage of providing easy to understand terms regardless of the degree or type of "Freeness" desired.
third, for the non-"Free" licenses in the CC suite, i think it's generally false that there are pre-existing software licenses that are widely known/used and cover the same ground. for example, i'd like to offer some code under either (at the recipient's discretion) CC's BY-NC or BY-SA. BY-SA is normally seen as covered for software by the GPL (but see my next point), but what about BY-NC? i've licensed work (not code in this case, but it certainly could have been) under BY-ND, too. if the CC licenses didn't exist or weren't known to me, i'd almost certainly have written my own (and less well) with the same effects.
finally, i take issue with the idea that there are "plenty" of good software licenses out there. most that're larger than the BSD/MIT licenses suck: they're overly long, poorly organized, and often poorly written.
i'm disappointed that CC has an official position recommending against using their work for software; they've done great stuff and i'd love to see it used much more widely. unless or until i see something with a similar breadth of coverage more targeted at software, i'll continue to use their licenses where the terms match what i need (which is almost always).
great, just what the libertarians need: another excuse to never shut up.
":-)" or something.