Taco Bell Programming

My order by Wingman 5 · 2010-10-24 09:49 · Score: 4, Funny

Can I get a server logging system, hold the email notifications. Can I get extra rotating log files with that?

which language is best? by phantomfive · 2010-10-24 09:49 · Score: 5, Insightful

Reminds me of a job interview I did once with an old guy, he had around 30 different programming languages on his resume. I asked him which programming language was his favorite, expecting it to be something like Lisp or Forth, but he said, "shell script." I was a bit surprised, but he said, "it lets me tie pieces in from everywhere and do it all with the least amount of code."

I wasn't entirely convinced, but he did have the resume. Seems Mr Dziuba is from the same school of thought. I read the full introduction to the DevOps page and I'm still not entirely sure what it's about. We should work together and deliver on time, or something like that.

--
Qxe4

Re:which language is best? by visualight · 2010-10-24 10:02 · Score: 4, Insightful

The DevOps thing is yet another crock of shit on par with 'managing programmers is like herding cats' and web2.0

--
Samsung took back my unlocked bootloader because Google wants me to rent movies. They're both evil.
Re:which language is best? by Anonymous Coward · 2010-10-24 10:12 · Score: 5, Funny

The DevOps thing is yet another crock of shit on par with 'managing programmers is like herding cats' and web2.0
I volunteer at a cat rescue. Herding cats is much easier than dealing with programmers.
Re:which language is best? by martin-boundary · 2010-10-24 10:59 · Score: 4, Insightful

Sadly, Mr Dziuba has the right idea but uses terrible examples in his blogpost.
Wget for crawling tens of millions of web pages using a 10 line script? He doesn't understand crawling at scale.
There's a lot more to it than just following links. For example, lots of servers will block you if you start ripping them in full, so you need to have a system in place to crawl sites over many days/weeks a few pages at a time. You also want to distribute the load over several IP addresses, and you need logic to handle things like auto generated/tar pits/temporarily down sites, etc. And of course you want to coordinate all that while simultaneously extracting the list of URLs that you'll hand over to the crawlers next.
His other example is also bullshit. Tens of millions of webpages are not that much for a single PC, it hardly justifies using MapReduce, especially if you're only going to process pages independently with zero communication between processes.
MapReduce is all about cutting the dataset into chunks, then alternating between 1) an (independent) processing phase on each chunk, and 2) a communication phase where the partial results are combined. And where this really pays off is when you have so much data that you need a distributed filesystem.
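For anyone who wants to see that chunk/process/combine alternation in miniature, here is a rough single-machine sketch using only shell tools. It is not a real MapReduce framework: the function name `wordcount_chunked`, the word-count task, and the default chunk size are all assumptions made up for illustration.

```sh
# wordcount_chunked FILE [LINES_PER_CHUNK]
# Sketch only: cut the input into chunks, count words in each chunk
# independently, then combine the partial counts.
wordcount_chunked() {
    input=$1
    lines=${2:-1000}
    workdir=$(mktemp -d) || return 1

    # Cut the dataset into chunks of $lines lines each.
    split -l "$lines" "$input" "$workdir/chunk."

    # Phase 1 (independent): one background job per chunk.
    for chunk in "$workdir"/chunk.*; do
        [ -f "$chunk" ] || continue   # empty input produces no chunks
        tr -s '[:space:]' '\n' < "$chunk" | sed '/^$/d' |
            sort | uniq -c > "$chunk.counts" &
    done
    wait

    # Phase 2 (communication): sum the partial counts per word.
    cat "$workdir"/chunk.*.counts 2>/dev/null |
        awk '{ total[$2] += $1 } END { for (w in total) print total[w], w }' |
        sort -k1,1nr -k2,2

    rm -rf "$workdir"
}
```

On one box this buys little beyond using several cores, which is rather the point being made above: the approach pays off when the chunks live on different machines.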
Re:which language is best? by ShakaUVM · 2010-10-24 11:02 · Score: 4, Insightful

>>I asked him which programming language was his favorite, expecting it to be something like Lisp or Forth, but he said, "shell script."
Shell script is awesome for a large number of tasks. It can't do everything (otherwise we'd just teach scripting and be done with a CS degree in a quarter), but there's a lot of times when someone thinks they're going to have to write this long program involving a lot of text parsing and you just go, "Well, just Cut out everything except the field you want, pipe it through sort|uniq, and then run an xargs on the result." You get done in an hour (including writing, args checking, and debugging) what another person might spend a week doing in C (which is spectacularly unsuited for such tasks anyway).
Re:which language is best? by gangien · 2010-10-24 12:37 · Score: 2, Insightful

(otherwise we'd just teach scripting and be done with a CS degree in a quarter)
because the programming language has so much to do with CS?
Re:which language is best? by Anonymous Coward · 2010-10-24 13:26 · Score: 2, Interesting

Uhh, what? I can do everything you just mentioned in shell, using standard UNIX utilities and cron for retries. You do not need a magical distributed application to simply crawl websites. You can even have it crawl from multiple boxes if you really want, I can think of very simple UNIX solutions to that too. Web crawling isn't that magical.
Re:which language is best? by MightyMartian · 2010-10-24 14:07 · Score: 2, Interesting

And for real age, it's something that's been known since Unix went into wide-scale usage in the 1970s. The original Bourne shell with the toolset of the time, while obviously limited in some respects, was pretty damned powerful. Pop in some of the newer updates like bash and you have a helluva environment.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:which language is best? by itlurksbeneath · 2010-10-24 14:13 · Score: 5, Insightful

Bingo. CS has nothing to do with programming languages. It's about PROGRAMMING. Lots of CS grads still don't get this. They are typically the mediocre programmers that move on to project management (or something else that doesn't involve programming) fairly quickly. Or they end up doing horrible ASP web apps and Microsoft Access front ends.

--
Have you ever considered piracy? You'd make a wonderful Dread Pirate Roberts.
Re:which language is best? by ShakaUVM · 2010-10-24 18:02 · Score: 2, Insightful

You prolly should pipe through uniq before sort. You'll get the same results, but sort will be passed less data leading to faster execution and smaller memory footprint.
IIRC, uniq only collapses adjacent lines that are identical. So hence sort|uniq.
Maybe there's a flag or something that will change that behavior? It'd probably have to do something on the order of sort anyway, though.
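The adjacency behaviour is easy to check. As far as I know POSIX uniq has no flag for non-adjacent duplicates; the usual shortcuts are `sort -u`, or an awk idiom that skips sorting but remembers every distinct line it has seen. The function names below are made up for the demonstration.

```sh
# uniq on its own only collapses duplicates that sit next to each other.
adjacent_only() { uniq; }

# Sorting first makes duplicates adjacent; sort -u does both steps at once.
sorted_unique() { sort -u; }

# Keeps the first occurrence of each line in input order, without sorting.
# awk stores every distinct line, so memory grows with the number of distinct lines.
first_seen_unique() { awk '!seen[$0]++'; }
```

Feeding the three lines b, a, b through each one shows the difference: the first leaves all three lines alone, the second prints a then b, the third prints b then a.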
Re:which language is best? by Steauengeglase · 2010-10-24 18:10 · Score: 2, Funny

Meh, as long as you have a hunk of meat and a fishing pole, both tasks are the same.
Re:which language is best? by flnca · 2010-10-24 18:15 · Score: 2, Interesting

what another person might spend a week doing in C (which is spectacularly unsuited for such tasks anyway).
A skilled C programmer also needs less than 1 hour for something like that. The standard C library has a lot of text processing functions (like sscanf()), plus it has a qsort(). Ever wonder why the C I/O library is suitable for managing database files? All the field functions in fscanf()/fprintf() etc. are suitable for database management.

Also, C is still one of the prime choice languages for writing compilers, which do a lot of text processing.
Re:which language is best? by ShakaUVM · 2010-10-24 19:40 · Score: 5, Insightful

>>A skilled C programmer also needs less than 1 hour for something like that.
Hmm, well if you want to time yourself, here's a common enough task that I automate with shell scripts. I just timed myself. Including logging in, doing a detour into man and a 'locate access_log' to find the file, it took a bit less than 4 minutes.
tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort | uniq
Grabs the end of the access_log and shows you the last few ip addresses that have connected to your site. I do something like this occasionally. Optionally pipe it into xargs host to do DNS lookups on them, if that's how you prefer to roll.
I'm honestly curious how long it will take you to do it in C, with/without the DNS lookup. Post source if you don't mind.
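If this gets run often, one possible way to wrap the one-liner is sketched below. The function names and the optional line-count argument are my own assumptions; the pipeline itself is the one from the post, plus a variant that counts hits per address.

```sh
# recent_ips LOGFILE [LINES]
# Distinct client addresses in the last LINES lines (default 100).
recent_ips() {
    tail -n "${2:-100}" "$1" | cut -d ' ' -f 1 | sort -u
}

# recent_ip_counts LOGFILE [LINES]
# Same input, but with a hit count per address, busiest first.
recent_ip_counts() {
    tail -n "${2:-100}" "$1" | cut -d ' ' -f 1 | sort | uniq -c | sort -rn
}

# Possible usage (log path as in the post; needs the `host` utility installed):
#   recent_ips /var/log/apache2/access_log | xargs -n 1 host
```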
Re:which language is best? by meyekul · 2010-10-24 22:47 · Score: 5, Funny

tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort | uniq
...
I'm honestly curious how long it will take you to do it in C, with/without the DNS lookup. Post source if you don't mind.
Not long at all...

system("tail -n 100 /var/log/apache2/access_log | cut -f1 -d' ' | sort | uniq");
Re:which language is best? by ShakaUVM · 2010-10-24 23:48 · Score: 3, Interesting

Show me a new grad who is good at programming and I'll bet they didn't learn programming at university. A lot of new grads *think* they are good at programming. But apart from a handful here and there that cut their teeth on other projects, a new grad writing good code out of the gate is virtually unheard of. Hell, most people with 5 years working experience are crap.
Most "real" CS people have been playing around with writing code since a young age. I'd written motion prediction code for a robot, an Axis and Allies simulator, a full AI suite, and a bunch of other stuff before I started college, but I think the university classes really polished my skills. Finite math taught me how to think about structuring loops so they always run correctly, my Theory class let me think about FSMs, CFGs, and Turing machines in a more logical manner, my programming languages and compilers classes really made me understand what was really happening when I hit cc (and also helped explain some of the bizarre compiler errors I'd seen over the years when my own compiler did the same thing), and most importantly, the UCSD CS TAs were absolute Nazis about proper coding technique. Not arbitrarily so, but if you've ever seen some code that made you want to punch someone, that's the sort of thing they knock 25% of your grade off for. Honestly, it really helped.
You're right, though - Computer Science is a very weird mishmash of different stuff all jumbled together.
>>And even given the complete failure to actually learn anything that could be called science in their computer science degree 95% of the graduating class hasn't written more than 10K lines of code in their entire life.
Mmm, just looking at my class assignments (that I saved) across 16 classes (quarters, not semesters), I wc at 20k lines of code. This doesn't count stuff that I wrote for fun, for work, or stuff that I deleted because it doesn't matter any more. The actual number should be several times that, that I wrote for school.
IMO, if you're not writing software as a CS student, you're doing something wrong.
Re:which language is best? by itlurksbeneath · 2010-10-25 00:48 · Score: 2, Insightful

Actually, I have a masters in CS. I was trying to make a lexical pun of sorts saying it's not about "programming languages" but about "programming", which, in my mind, is more about the problem solving and design than the actual implementation of a particular program. Once you learn how to program - how to solve problems and design a solution - implementing it in any particular language is just a matter of getting the syntax right.
The programming language itself is a tool. Any particular problem can have solutions in any number of languages, and certain languages are more suited than others for particular problem classes.

--
Have you ever considered piracy? You'd make a wonderful Dread Pirate Roberts.
Re:which language is best? by Schadrach · 2010-10-25 01:31 · Score: 2, Funny

The problem being when the "piece of meat" in one case sues for sexual harassment. =p
Re:which language is best? by bigrockpeltr · 2010-10-25 01:52 · Score: 2, Interesting

what most people fail to realise is that tail,cut,sort,uniq are most likely written in C so why reinvent the wheel? that is the only reason to use shell scripting (when there are existing tools) but surely good luck implementing your command from scratch purely in bash. technically in C you can just write

system("tail -n 100 /var/log/apache2/access_log | cut -f1 -d\" \" | sort | uniq");
and achieve the same result in 1 line as well :P

--
$ unzip, strip, touch, finger, grep, mount, fsck, more, yes,fsck,fsck,fsck,umount, sleep
Re:which language is best? by Andrew Cady · 2010-10-25 04:01 · Score: 2, Informative

Wget for crawling tens of millions of web pages using a 10 line script? He doesn't understand crawling at scale.

Wget is made for crawling at scale.

There's a lot more to it than just following links. For example, lots of servers will block you if you start ripping them in full, so you need to have a system in place to crawl sites over many days/weeks a few pages at a time.
wget --random-wait

You also want to distribute the load over several IP addresses
The way I do this with wget is to use wget to generate a list of URLs, then launch a separate wget process with varying source IPs specified with --bind-address. It would, however, be trivial to add a --randomize-bind-address option to wget source.

and you need logic to handle things like auto generated/tar pits/temporarily down sites, etc.
What makes you think you can't handle these things with wget?

And of course you want to coordinate all that while simultaneously extracting the list of URLs that you'll hand over to the crawlers next.

Again, why do you think wget is inadequate to this? It's not.
Any custom-coded wget alternative will be implementing a great deal of wget. Most limitations of wget can be avoided by launching multiple wget processes, putting a bit of intelligence into the glue that does so. If that isn't enough, it probably makes sense to make minor alterations to wget source instead of coding something new.
My point here is just that wget is way more awesome than you give credit.
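A minimal sketch of the "glue" idea, under several assumptions: the function name `plan_crawl`, the `urls.N` file naming, and the two-second base wait are invented for illustration, and the function only prints wget command lines instead of running them. The wget options used (`--wait`, `--random-wait`, `--bind-address`, `--input-file`) are documented ones.

```sh
# plan_crawl URL_LIST OUTDIR ADDRESS...
# Splits URL_LIST round-robin into one file per source address and prints
# (does not run) one wget command line per address.
# Assumes OUTDIR and the addresses contain no spaces.
plan_crawl() {
    urls=$1
    outdir=$2
    shift 2
    n=$#
    [ "$n" -gt 0 ] || { echo "usage: plan_crawl URL_LIST OUTDIR ADDRESS..." >&2; return 1; }
    mkdir -p "$outdir" || return 1
    rm -f "$outdir"/urls.*   # drop parts left over from an earlier run

    # Line i of the list goes to part (i mod n).
    awk -v n="$n" -v dir="$outdir" '{
        part = (NR - 1) % n
        print > (dir "/urls." part)
    }' "$urls"

    i=0
    for addr in "$@"; do
        part="$outdir/urls.$i"
        if [ -f "$part" ]; then
            printf 'wget --wait=2 --random-wait --bind-address=%s --input-file=%s\n' "$addr" "$part"
        fi
        i=$((i + 1))
    done
}
```

Printing the commands keeps the sketch harmless; piping the output to `sh`, or to a scheduler of your choice, is where the crawl would actually start.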

Re:8 keywords? by Anonymous Coward · 2010-10-24 09:52 · Score: 5, Insightful

exactly.

Those 8 keywords are + - > < [ ] . ,

More crap from Ted Dziuba. by Anonymous Coward · 2010-10-24 09:52 · Score: 3, Insightful

Good grief, I think this is yet another useless article from the Ted Dziuba/Jeff Atwood/Joel Spolsky crowd. They spew out article after article after article with, in my opinion, bullshit "insights" that don't hold any water in the real world. Yet they've developed such a large online following, mainly of "web designers", "JavaScript programmers" and "NoSQL DBAs", that it tricks a lot of people in the industry into thinking what they say actually has some usefulness, when it usually doesn't.

Yeah, it's great when we can write a few shell or Perl scripts to perform simple tasks, but sometimes that's just not sufficient. Sometimes we do have to write our own code. While UNIX offers a very practical and powerful environment, we shouldn't waste our time trying to convolute its utilities to all sorts of problems, especially when it'll be quicker, easier and significantly more maintainable to roll some tools by hand.

Re:Software is not food by Anonymous Coward · 2010-10-24 09:57 · Score: 5, Funny

"Compilers are like boyfriends, you miss a period and they go crazy on you."

I limit myself to 2 bits by topham · 2010-10-24 10:06 · Score: 2, Funny

I limit myself to two bits. A 0 and a 1.

Why would I need 8?

Re:I limit myself to 2 bits by The_mad_linguist · 2010-10-24 14:22 · Score: 2, Funny

I limit myself to two ones, very carefully timed.

Re:8 keywords? by hardburn · 2010-10-24 10:07 · Score: 4, Insightful

Ook! Ook?

--
Not a typewriter

Once again, The Onion shows us the way by sootman · 2010-10-24 10:08 · Score: 3, Funny

From over a decade ago: Taco Bell's Five Ingredients Combined In Totally New Way

I think of that every time Taco Bell adds a "new" item to their menu.

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.

From TFA by Jaime2 · 2010-10-24 10:10 · Score: 3, Interesting

From the article:

I made most of a SOAP server using static files and Apache's mod_rewrite. I could have done the whole thing Taco Bell style if I had only manned up and broken out sed, but I pussied out and wrote some Python.

It seems that only software he knows counts as "Taco Bell ingredients". I'd trust Axis (or any other SOAP library) much more than sed to parse a web service request. Heck, if you discount code that you don't directly maintain, SOAP requires very little code other than the functionality of the service itself. I had a boss like this once. He would let you do anything as long as you used tools he was familiar with, but if you brought in a tool that he didn't know, you had to jump through a thousand extra testing hoops. He stopped doing actual work and got into management in the early 90's, so he pretty much didn't know any modern tool. He once made me do a full regression test on a 50KLOC application to get approval to add an index to a Microsoft SQL Server table.

Re:From TFA by metamatic · 2010-10-24 14:32 · Score: 2, Interesting

Heck, if you discount code that you don't directly maintain, SOAP requires very little code other than the functionality of the service itself.
However, any time you change the API--even to make a change that no client should notice--you have to regenerate the glue code from the WSDL and recompile all your client programs. Which is why these days, I build REST-based web services.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak

Simplicity by SimonInOz · 2010-10-24 10:11 · Score: 5, Insightful

The complexity people seem to delight in putting into things always amazes me. I was recently working at a major bank (they didn't like me eventually as I'm bad at authority structures). Anyway the area I was working on involved opening bank accounts from the web site. Complicated, right? The new account holder has to choose the type of account they want (of about 7), enter their details (name, address, etc), and press go. Data gets passed about, the mainframe makes the account, and we return the new account number.

Gosh.

So why, oh tell me why, did they use the following list of technologies (usually all on the same jsp page) [I may have missed some]
HTML
CSS
JSP (with pure java on the page)
Javascript (modifying the page)
JQuery
XML
XSLT
JDBC with Hibernate
JDBC without Hibernate
Custom Tag library
Spring (including AOP)
J2EE EJBs
JMS

Awesome. All this on each of the countless pages, each custom designed and built. Staggering. In fact, the site needed about 30 pages, many of them minor variations of each other. The whole thing could have been built using simple metadata. It would have run faster, been easier to debug and test (the existing system was a nightmare), and easily changeable to suit the new business requirements that poured in.

So instead of using one efficient, smart programmer for a while, then limited support after that, they had a team of (cheap) very nervous programmers, furiously coding away, terrified lest they break something. And yes, there were layers and layers of software, each overriding the other as the new programmer didn't understand the original system, so added their own. Palimpsest, anyone?

And yet, despite my offers to rebuild the whole thing this way (including demos), management loved it. Staggering.

But I still like to keep things simple. And yes, my name is Simon. And yes, I do want a new job.

--
"Cats like plain crisps"

Re:Simplicity by MichaelSmith · 2010-10-24 10:14 · Score: 5, Insightful

Complexity creates bugs
Bugs create employment

--
http://michaelsmith.id.au
Re:Simplicity by swamp boy · 2010-10-24 10:24 · Score: 3, Interesting

Sounds like your coworkers are busily filling out their resumes with all the latest fad software tools. Like you, I despise such thinking, and it's why I pass on any job opportunity where 'web apps' and 'java' are used in the same description.
Re:Simplicity by NorbrookC · 2010-10-24 10:53 · Score: 3, Insightful

It seems to me that the point is that programmers have a variant of "if all you have is a hammer, every problem is a nail" saying. In this case, they have a huge toolbox, so every time they need to drive a nail, it means that they must design and use a methodology that will, eventually, cause the nail to be pushed into place, instead of just reaching for the hammer and getting the job done.
Re:Simplicity by SimonInOz · 2010-10-24 12:20 · Score: 3, Interesting

“Debugging is twice as hard as writing code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan

--
"Cats like plain crisps"

Re:Code reuse, junk food example? by Tablizer · 2010-10-24 10:13 · Score: 5, Interesting

I've found the best reuse comes from simple modules, not from complex ones that try to do everything. The one that tries to do everything will still be missing the one feature you need. It's easier to add the features you need to the simple one because it's, well, simpler. With the fancier one you have to work around all the features you don't need to add those that you do need, creating more reading time and more mistakes.

--
Table-ized A.I.

Re:Software is not food by GiveBenADollar · 2010-10-24 10:13 · Score: 4, Insightful

You can easily have a little more or less salt, sugar or flour in your food. However, software is not so forgiving. Change one character and you screw up badly..

Just try substituting a tsp with a tbsp of salt in your favorite recipe and then tell me food is forgiving.

Unexpected by DWMorse · 2010-10-24 10:18 · Score: 3, Interesting

Unexpected comparison of trained coders / developers, many with certifications and degrees, to untrained sub-GED Taco Bell employee... well... frankly, knuckle-draggers.

Also, I don't care if your code is minimal and profitable, if it gives me a sore stomach as Taco Bell does, I'm opting for something more complex and just... better. Better for me, better for everyone.

I get the appeal of promoting minimalistic coding styles with food concepts, and it's a refreshing change from the raggedy car analogies... but come on. Taco Bell? Really??

--
There's a spot in User Info for World of Warcraft account names? Really?

Re:Code reuse, junk food example? by syousef · 2010-10-24 10:33 · Score: 4, Insightful

I've found the best reuse comes from simple modules, not from complex ones that try to do everything. The one that tries to do everything will still be missing the one feature you need. It's easier to add the features you need to the simple one because it's, well, simpler. With the fancier one you have to work around all the features you don't need to add those that you do need, creating more reading time and more mistakes.

Agreed. With most complex frameworks there is also the additional overhead of having to do things in a particular way. If you try to do it differently or need to add a feature that wasn't designed for in the original framework, you often find yourself fighting it rather than working with it. At that point you should ditch the framework, but often it's not your decision to make, and then cost of redoing things once the framework is removed makes it impractical.

--
These posts express my own personal views, not those of my employer

It's easy to overthink even in the simplest cases by Meriahven · 2010-10-24 10:39 · Score: 3, Insightful

I once had a pair of command line tools that both printed lists of words (usernames, actually, one per row), and I wanted to find out how many unique ones there were. Obviously, the right hand side part of the pipeline was going to be something along the lines of " | sort -u | wc -l", but then I got utterly stuck by the left hand side. How can I combine the STDOUTs of two processes? Do I really need to resort to using temporary files? Is there really no tool to do the logical opposite of the "tee" command?

You are probably thinking: "Oh, you silly person, that's so trivial, you must be very incompetent", but in case you aren't, you might want to spend a minute trying to figure it out before reading on. I even asked a colleague for help before realizing that the reason I could not find a tool for the task was quite an obvious one: such a tool does not exist. Or actually it kinda does, but only in an implied sense: what I was hoping to achieve could be done by the humble semicolon and a pair of parens. I only had to put the two commands in parens to run them in a subshell, put a semicolon in between, so one will run after the other is finished, and I was done. I guess it was just that the logical leap from "This task is so simple, there must be a tool for this" to "just run the commands one after another" was too big for my feeble mind to accomplish.

So I guess the moral of the story is, even if you want to use just one simple tool, you may be overthinking it :-)
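Spelled out, the parens-and-semicolon trick looks roughly like this. The two `list_users_*` functions are hypothetical stand-ins for the pair of command line tools, which the post does not name.

```sh
# Hypothetical stand-ins for the two username-listing tools.
list_users_a() { printf '%s\n' alice bob carol; }
list_users_b() { printf '%s\n' bob dave; }

# The parentheses run both commands in one subshell, one after the other,
# so the pipeline sees their output as a single stream.
count_unique_users() {
    (list_users_a; list_users_b) | sort -u | wc -l
}
```

Braces work as well, `{ list_users_a; list_users_b; } | sort -u | wc -l`, and avoid the extra subshell; note that the brace form needs the spaces and the trailing semicolon.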

What is Devops? by hawguy · 2010-10-24 10:41 · Score: 3, Insightful

I read the linked Devops article and know even less about it than before I read the article. It's full of management buzzwords and I'm sure a CIO would love it, but what does it mean?

How does Devops help?

The Devops movement is built around a group of people who believe that the application of a combination of appropriate technology and attitude can revolutionize the world of software development and delivery.

...

Beyond this multi-disciplinary approach, the Devops movement is attempting to encourage the development of communication skills, understanding of the domain in which the software is being written, and, crucially, a sensitivity and passion for the underlying business, and for ensuring it succeeds.

oh yeah, that clears it up. All it takes is a passion for the underlying business and it's sure to succeed!

Re:Reference ? by lostmongoose · 2010-10-24 10:49 · Score: 2, Insightful

I'm more disturbed by the fact that you know what dead donkey ass tastes like...

Jim Gaffigan's experience: by gblackwo · 2010-10-24 11:00 · Score: 3, Funny

Mexican food's great, but it's essentially all the same ingredients, so there's a way you'd have to deal with all these stupid questions. "What is nachos?" "...Nachos? It's tortilla with cheese, meat, and vegetables." "Oh, well then what is a burrito?" "Tortilla with cheese, meat, and vegetables." "Well then what is a tostada?" "Tortilla with cheese, meat, and vegetables." "Well then what i-" "Look, it's all the same shit! Why don't you say a spanish word and I'll bring you something." - Jim Gaffigan

Let me tell you a story by Giant Electronic Bra · 2010-10-24 11:07 · Score: 5, Interesting

Once, about 20 years ago, I worked for a company whose line of business generated a VERY large amount of data which for legal reasons had to be carefully reduced, archived, etc. There were various clusters of VMS machines which captured data from different processes to disk, from where it was processed and shipped around. There were also some of the 'new fangled' Unix machines that needed to integrate into this process. The main trick was always constantly managing disk space. Any single disk in the place would probably have 2-10x its capacity worth of data moving on and off it in any given day. It was thus VITAL to constantly monitor disk usage in pretty much real time.

On VMS the sysops had developed a system to manage all this data which weighed in at 20-30k lines of code. This stuff generated reports, went through different drives and figured out what was going in where, compared it to data from earlier runs, created deltas, etc. It was a fairly slick system, but really all it did was iterate through directories, total up file sizes, and write stuff to a couple report files, and send an email if a disk was filling up too fast.

So one day my boss asks me to write basically the same program for the Unix cluster. I had a reputation as the guy that could figure out weird stuff. Even had played a small amount with Unix systems before. So I whipped out the printed Man pages and started reading. Now I figured I'd have to write a whole bunch of code, after all I'm duplicating an application that has like 30k lines of code in it, not gigantic but substantial. Pretty soon though I learned that every command line app in Unix could feed into the other ones with a pipe or a temp file. Pretty soon I learned that those apps produced ALL the data that I wanted and produced it in pretty much the format that I needed. All that I really had to do was glue it together properly. Pretty soon (thank God it starts with A) I found awk, and then sed. 3 days after that I had 2 awk scripts, a shell script that ran a few things through sed, a cron job, and a few other bits. It was maybe 100 lines of code, total. It did MORE than the old app. It was easy to maintain and customize. It saved a LOT of time and money.
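To give a flavour of that kind of glue, here is a small sketch of just the "warn when a disk is filling up" piece. It is not the original scripts, which aren't shown; the function name `over_threshold`, the 90% default, and the crontab line are assumptions for illustration.

```sh
# over_threshold [PERCENT]
# Reads `df -P` style output on stdin and prints "mountpoint use%" for every
# filesystem at or above PERCENT (default 90).
# Assumes mount points contain no spaces.
over_threshold() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)               # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Possible crontab entry (script path is a placeholder); cron normally mails
# any output to the job's owner, so silence means nothing is over the limit:
#   */5 * * * * /usr/local/bin/disk_check.sh
# where disk_check.sh would define over_threshold and run: df -P | over_threshold 90
```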

There's PLENTY to recommend the KISS principle in software design. Not every problem can be solved with a bit of shell coding of course, but it is always worth remembering that those tools are tried and true and can be deployed quickly and cheaply. Often they beat the pants off fancier approaches.

One other thing to remember from that project. My boss was the one that wrote the 30k LoC monstrosity. The week after I showed her the new Unix version, I got downsized out the door. People HATE it when you show them up...

--
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson

Re:Let me tell you a story by oldhack · 2010-10-24 11:48 · Score: 2, Funny

You had it coming, smart ass.

--
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
Re:Let me tell you a story by 19thNervousBreakdown · 2010-10-24 11:53 · Score: 3, Interesting

How, exactly, are they brittle? I've heard this term used a number of times, but never actually seen a prediction of brittleness be an accurate predictor of any amount of bugs, maintenance issues, or really any negative outcome. As far as I can tell, it's just a weasel word to be used when you don't like something for aesthetic reasons or understand it fully.
So, prove me wrong. Explain exactly what's bad about using code that's been more heavily used and tested in production systems than just about anything else for more than 20 years.

--
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Re:Let me tell you a story by Jaime2 · 2010-10-24 11:59 · Score: 2, Insightful

BTW, awk is a programming language. Really, all you did was to write their process in a different language, not convert it from a custom program to some built in tools.

As a side note, I have a hard time with the concept that it took the VMS guys 30000 lines of code to do what could be done with a handful of regular expressions. They were either really bad at it, or it had grown for years and nobody had the guts to purge the dead code.
Re:Let me tell you a story by Giant Electronic Bra · 2010-10-24 12:55 · Score: 3, Informative

Sure, awk is a programming language. It is also a command line tool. A bit more flexible than most, but you can't really draw a line between something that is a programming language and something that is a 'built in tool'.
I really have no idea WHY their code was so large. It was all written in FORTRAN and VMS is honestly a nightmarishly complicated operating environment. A lot of it is probably related to the fact that Unix has a much simpler and more orthogonal environment. Of course this is also WHY Unix killed VMS dead long ago. Simplicity is a virtue. This is why Windows still hasn't entrenched itself forever in the server room. It lacks the simple elegance of 'everything is a byte stream' and 'small flexible programs that simply process a stream'. Those are powerful concepts upon which can be built a lot of really complex stuff in a small amount of code.

--
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
Re:Let me tell you a story by symbolic · 2010-10-24 13:08 · Score: 2, Insightful

This kind of story makes me laugh when I see/hear anecdotes that have management talking about metrics like LoC.
Re:Let me tell you a story by Securityemo · 2010-10-24 13:32 · Score: 3, Insightful

The likelihood of pipe I/O changing in the basic UNIX toolkit is near zero. Or is it just that you (and/or managers and other "people-person" types) need someone to sign the dotted line for you to feel certain that things are as they should?

--
Emotions! In your brain!
Re:Let me tell you a story by Americano · 2010-10-24 13:36 · Score: 4, Insightful

This seems to be an odd criticism.
It's like calling perl/python/C subroutines "brittle" because if you change the arguments or return values of any of them, all hell can break loose.
'Brittle' to me means that ridiculous assumptions don't hold true *often*, leading to breakage, like this gem I found the other day in a developer's installation script:
envfile = `ls -1tr /tmp/*.tar | tail -1`
cp ${envfile} /apps/prod
tar -xvf /apps/prod/${envfile}
In other words - the envfile is "the single most recent tar file in the /tmp directory," with no checks or verification to ensure that it was the right one before you blasted it on into production.
That's what I'd call 'brittle' programming, anyway - likely to break, and break spectacularly because you haven't thought through your requirements clearly, or bothered to verify that inputs are reasonable and sane.
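For contrast, a rough sketch of the same step with the checks put in. The function name install_env and its two arguments are made up for illustration; the point is only that the archive is named by the caller and verified before anything lands in the target directory. It assumes a tar that supports -C (GNU and BSD tar do).

```shell
#!/bin/sh
# Hypothetical helper: install one explicitly named tarball into a target directory.
# Usage: install_env /path/to/env.tar /apps/prod
install_env() {
    envfile=$1
    dest=$2

    # Refuse to guess: the caller must name both the archive and the destination.
    [ -n "$envfile" ] && [ -n "$dest" ] || { echo "usage: install_env TARBALL DEST" >&2; return 2; }
    # The archive must be a regular, non-empty file.
    [ -f "$envfile" ] && [ -s "$envfile" ] || { echo "not a usable file: $envfile" >&2; return 1; }
    # It must be listable as a tar archive before anything is touched.
    tar -tf "$envfile" >/dev/null 2>&1 || { echo "not a readable tar archive: $envfile" >&2; return 1; }
    # The destination must already exist.
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }

    # Extract straight into the destination; quoting keeps odd filenames intact.
    tar -xf "$envfile" -C "$dest"
}
```

Calling it as install_env /tmp/release-env.tar /apps/prod (both paths hypothetical) fails loudly on a missing or corrupt archive instead of unpacking whatever happened to be newest in /tmp.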
Re:Let me tell you a story by 19thNervousBreakdown · 2010-10-24 14:20 · Score: 2, Funny

It's like you've purposefully made an entire post full of weasel words, and even sentences! "Metaphorically if you try to bend them at all they shatter rather spectacularly, they are brittle." Well done, sir.

--
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Re:Let me tell you a story by badboy_tw2002 · 2010-10-24 14:28 · Score: 2, Insightful

Why would you assume he's talking about pipe I/O? If you're talking portability and dependencies, then yes, I'd say something that uses a bunch of smaller tools might have more brittleness than something that is entirely contained in code controlled by the maintainer. It's really not that far a stretch to say that you upgrade a machine and a newer version of some utility changes some output your script depends on, and boom, your process now comes to a halt. That's what brittle means in this situation.
Re:Let me tell you a story by LingNoi · 2010-10-24 16:15 · Score: 4, Insightful

And how is that any different from a Python library being updated and breaking your program? Completely pointless argument.
Re:Let me tell you a story by LingNoi · 2010-10-24 16:20 · Score: 2, Insightful

C++ libraries are brittle because as a whole, and as far as their individual parts go, they assume a certain input set and generate a certain output set. If these assumptions turn out to be incorrect they will fail, sometimes spectacularly, and often it will take a serious amount of time and effort to determine exactly where the problem is.
Fixed that for you. In other words your argument can be applied to any programming anywhere.
Re:Let me tell you a story by Kjella · 2010-10-24 17:19 · Score: 3, Insightful

LOCs is roughly as meaningless as valuing a document by its word count. You could spend tons on research on something summed up in a few pages, or get an endless word diarrhea of mindless babble spewed out at 300 WPM. But people need to measure progress. Yes, I've seen how it gets when nobody measures progress and everyone pretends the last 10% of code will suddenly turn a turd into a gem, if so expect the people with survival skills to disappear some 80% into the project. Another disastrous variation is to leave it entirely up to the subjective opinion of the manager, which in any sizable company means your career depends on your favor with the PHB and his lying skills compared to the other PHBs.
Saying it's bad is like shooting fish in a barrel. Coming up with a good system of objectively measuring code design and quality that works in a large organization is ridiculously hard. Particularly since everybody tries to wiggle out of the definitions and go with what you measure. If you made avoiding LoC a metric, the lines would be compacted to the point of obfuscation, with hideous cross calling to save lines. You want people to hit a sane level of structuring and code reuse, neither LoC bloat nor 4k compos.

--
Live today, because you never know what tomorrow brings
Re:Let me tell you a story by cratermoon · 2010-10-24 17:33 · Score: 3, Informative

Well, Linux IS Unix, just without the trademark, but I didn't really come here to correct your misconception on that.
What I wanted to highlight was the reality behind your statements "we have fifty times as many Windows servers as the other two combined" and "The building where I work has a ratio of about 1 production Windows server for every four employees. If you count non-production servers, we have more Windows servers than people."
This is most certainly not because Windows is so much better or more popular than the other platforms at your place of work. Any experienced sysadmin who is not a Microsoft apologist will confirm that for any typical datacenter server function, it's necessary to have more instances of Windows to get the same capacity, reliability and uptime as few instances of other server operating systems. It's just the nature of the Microsoft stack that effective load-sharing and failover are a necessity in capacity planning. Anyone who argues that a single instance of Windows is equal to a single instance of AIX or Linux has simply never been part of real world datacenter administration.
In short, your employer may have a lot more Windows servers than anything else, but that certainly doesn't mean Windows is better or more popular -- it just demonstrates how the TCO of Windows is terrible.
Re:Let me tell you a story by dkf · 2010-10-24 20:16 · Score: 3, Interesting

something that uses a bunch of smaller tools might have more brittleness than something that is entirely contained in code controlled by the maintainer
Not necessarily. The unix tools are very well specified by comparison with most libraries used in nearly any language you care to name (they're in the POSIX spec) so there's a substantial amount that you can rely on, and rely on long-term. They can be composed poorly, of course, but bad programmers can write bad programs in anything so it's (close to) a null argument.
Brittleness in shell scripts typically refers to assumptions of particular filesystem layouts or that nobody will be silly enough to put odd characters in filenames (if only that were true!) but piped IO is very stable and well tested.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Let me tell you a story by dkf · 2010-10-24 20:20 · Score: 2, Informative

How, exactly, are they brittle?
The principal brittleness of shell scripts is their assumption that filenames do not contain odd characters like spaces. Most other languages don't do auto-splitting of every argument and so won't break when some user insists on creating a directory called "Documents and Settings"...
(You can write armored shell scripts that cope just fine with this - I've done that quite a bit over the years - but a lot of people don't.)
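A minimal sketch of what "armored" means in practice, assuming nothing beyond a POSIX sh. The helper name count_files is invented for the example. Every expansion is double-quoted, so a directory called "Documents and Settings" stays one argument instead of being split into three.

```shell
#!/bin/sh
# Hypothetical helper: count the regular files directly inside a directory,
# coping with spaces (and other odd characters) in names.
count_files() {
    dir=$1
    n=0
    # Quote every expansion; an unquoted $dir would be split on whitespace.
    for f in "$dir"/*; do
        # If the glob matched nothing, $f is the literal pattern; -f skips it.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

The unarmored version, for f in $dir/*, looks identical in testing and breaks the first time a user creates a path with a space in it.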

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Let me tell you a story by Hognoxious · 2010-10-24 22:03 · Score: 2, Insightful

If any single one of the tools you're running on changes its input or output even slightly the whole thing can fall apart in a rather spectacular manner.
One, that very rarely happens with the common unix utilities.
Two, ever heard of change management, testing and QA? If you're the kind of idiot who flings patches and updates willy-nilly onto a production box then sorry, but you deserve to have the sky fall on your head.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

8? I thought it was 3 ... by Zero__Kelvin · 2010-10-24 11:15 · Score: 2, Funny

I thought there were three basic ingredients:

Protons
Neutrons
Electrons

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun

Re:Code reuse, junk food example? by aztracker1 · 2010-10-24 11:25 · Score: 2, Insightful

I think this is one of the reasons why jQuery has become so popular... it does "just enough" in a consistent (relatively) way, with a decent plug-in model... so stringing things together works pretty well, and there is usually a plugin for just about anything you'd want/need. Though it's maybe a bit heavier than hand crafted code, stringing jQuery and plugins is less debt, with more reuse. I do have a few things in my current js toolbox... namely some JS extensions, json2 (modified), date.js, jquery, jquery templates and jquery validation... from there I tweak the ui with PIE and wire up the validators... jquery is about 70k (min uncompressed), with 60k for my js/browser extensions, and 60k for jquery extensions... another 20K in site/app scripts... usually dead cache load is around 400k on a page (around 220k with gzip) and in about 12-16 http requests... using deferred script calls, under a second for initial load, and under another 1.4 seconds for js & binding... A bit thick, but works very well.

--
Michael J. Ryan - tracker1.info

Re:Software is not food by sakdoctor · 2010-10-24 11:27 · Score: 2, Interesting

Had a friend confuse bulbs of garlic with cloves of garlic. Niiice.

Re:Software is not food by T+Murphy · 2010-10-24 11:36 · Score: 3, Funny

Had a friend confuse bulbs of garlic with cloves of garlic.

My uncle made that mistake once. It resulted in everyone asking him for the recipe (true story).

--
My webcomic

Re:8? I thought it was 3 ... by dakameleon · 2010-10-24 11:44 · Score: 2, Insightful

I thought there were 6 - up, down, strange, charm, top and bottom?

--
Man who leaps off cliff jumps to conclusion.

Re:8 keywords? by tompaulco · 2010-10-24 11:49 · Score: 2, Insightful

I find that most defects in the English that is produced are due to the use of words that are not in the vocabulary. A sufficiently intelligent compiler (listener) is able to successfully compile the code even though the programmer (speaker) did not write it correctly, which unfortunately only reinforces the bad habit of the programmer.
I saw this same behavior in Internet Explorer a few days ago. Someone complained that "Firefox isn't working", because an ASP page had a malformed link in it. IE was "smart" enough to unmangle it and display it. Firefox chose not to try to outthink the programmer and reinterpret the mess that had been passed to it. The user's assumption was that Firefox was broken. I would argue the opposite.

--
If you are not allowed to question your government then the government has answered your question.

Re:8? I thought it was 3 ... by digitig · 2010-10-24 11:51 · Score: 2, Funny

No, it's up, down, sideways, sex-appeal and peppermint.

--
Quidnam Latine loqui modo coepi?

Weak error handling by Animats · 2010-10-24 11:54 · Score: 4, Informative

A big problem with shell programming is that the error information coming back is so limited. You get back a numeric status code, if you're lucky, or maybe a "broken pipe" signal. It's difficult to handle errors gracefully. This is a killer in production applications.

Here's an example. The original article talks about reading a million pages with "wget". I doubt the author of the article has actually done that. Our sitetruth.com system does in fact read a million web pages or so a month. Blindly getting them with "wget" won't work. All of the following situations come up routinely:

There's a network error. A retry in an hour or so needs to be scheduled.
There's an HTTP error. That has to be analyzed. Some errors mean "give up", and some mean "try again later".
The site's HTML contains a redirect, which needs to be followed. "wget" won't notice a redirect at the HTML level.
The site's "robots.txt" file says we shouldn't read the file from a bulk process. "wget" does not obey "robots.txt".
The site is really, really slow. Some sites will take half an hour to feed out a page. Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing. This requires a special timeout.
The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
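
To make the first two cases concrete, here is a rough sketch of how a shell wrapper might turn the numeric status into a decision. The status meanings (4 for network failure, 8 for a server error response, and so on) are the ones listed in the GNU Wget manual's exit status section; older builds may only return 0 or 1. The action names and both function names are invented for this example.

```shell
#!/bin/sh
# Map a wget exit status to a coarse action. The action names are made up
# for this sketch; the status meanings follow the GNU Wget manual.
classify_wget_status() {
    case $1 in
        0) echo ok ;;              # no problems
        4) echo retry_later ;;     # network failure
        8) echo inspect_http ;;    # server issued an error response
        5|6) echo give_up ;;       # SSL verification or authentication failure
        *) echo investigate ;;     # generic, parse, file I/O or protocol error
    esac
}

# Hypothetical wrapper: fetch one URL into a file and print the action to take.
fetch_and_classify() {
    url=$1
    out=$2
    status=0
    wget --quiet --tries=1 --output-document="$out" "$url" || status=$?
    classify_wget_status "$status"
}
```

A single status code still cannot say which HTTP error came back, so the inspect_http branch would have to look at the server response separately. That gap is the limitation being described here.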

That's just reading the page text. More things can go wrong in parsing.

Even routine reading of some known data page requires some effort to get it right. We read PhishTank's entire XML list of phishing sites every three hours. Doing this reliably is non-trivial. PhishTank just overwrites their file when they update, rather than replacing it with a new one. (This is one of the design errors of UNIX, as Stallman once pointed out. Yes, there are workarounds they could do.) So we have to read the file twice, a minute apart, and wait until we get two identical copies. Then we have to check for 1) an empty file, 2) a file with proper XML structure but no data records, and 3) an improperly terminated XML file, all of which we've encountered. Then we pump the data into a MySQL database, prepared to roll back the changes if some error is detected.
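
The checks described above can be sketched roughly as follows. This is not the actual sitetruth.com code, and it is not a real XML parser: the closing root tag and the record tag are passed in as arguments because the feed's real element names are not given here, and the function name copy_is_usable is invented.

```shell
#!/bin/sh
# Hypothetical check: accept a download only if two copies fetched a minute
# apart are identical and the content passes basic sanity tests.
# $3 is the closing root tag and $4 a record tag; both are assumptions
# about the feed, not its real element names.
copy_is_usable() {
    first=$1; second=$2; closing_tag=$3; record_tag=$4

    # 1) Reject an empty file.
    [ -s "$first" ] || return 1
    # The two fetches must match byte for byte, else an update was in progress.
    cmp -s "$first" "$second" || return 1
    # 2) Reject a structurally plausible file that holds no records.
    grep -q -F -e "$record_tag" "$first" || return 1
    # 3) Reject a truncated file: the last non-blank line must hold the closing root tag.
    last=$(grep -v '^[[:space:]]*$' "$first" | tail -n 1)
    case $last in
        *"$closing_tag"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Only a copy that passes all of these would then be handed to the database load, which still needs its own rollback.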

The clowns who try to do stuff like this with shell scripts and cron jobs spend a big fraction of their time dealing manually with the failures. If you do it right, it just keeps working. One of my other sites, "downside.com", has been updating itself daily from SEC filings for over a decade now. About once a month, something goes wrong with the nightly update, and it's corrected automatically the next night.

Re:Weak error handling by Eskarel · 2010-10-24 13:33 · Score: 2, Informative

That was HTML redirects (well likely more specifically javascript redirects), not HTTP redirects.
Re:Weak error handling by arth1 · 2010-10-24 14:21 · Score: 4, Informative

The site's HTML contains a redirect, which needs to be followed. "wget" won't notice a redirect at the HTML level.
Actually, it does. But in any case, this is why you parse the HTML after fetching it with wget -- how else can you get things like javascript generated URLs to work?

The site's "robots.txt" file says we shouldn't read the file from a bulk process. "wget" does not obey "robots.txt".
From the wget man page:
Wget can follow links in HTML, XHTML, and CSS pages, to create local
versions of remote web sites, fully recreating the directory structure
of the original site. This is sometimes referred to as "recursive
downloading." While doing that, Wget respects the Robot Exclusion
Standard (/robots.txt).

The site is really, really slow. Some sites will take half an hour to feed out a page.
And you still haven't looked at the wget(1) man page, or you'd know about the --read-timeout parameter.

Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
Not holding up your operation is why you use multiple tools that can run concurrently. A wget of orbitz.com taking forever won't prevent the wget of soggy.com that you scheduled for half an hour later, and neither will stop the parser.
Of course, if you design an all-eggs-in-one-basket solution that depends on sequential operations, you deserve what you get.

The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing.
This requires a special timeout.
Yes, the --connect-timeout.

The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
wget limits to a single connection with keep-alive per instance. (If you want more, spawn more wget -nc commands)

Even routine reading of some known data page requires some effort to get it right. We read PhishTank's entire XML list of phishing sites every three hours. Doing this reliably is non-trivial. PhishTank just overwrites their file when they update, rather than replacing it with a new one.
That's no problem as long as you pay attention to the HTTP timestamp.

(This is one of the design errors of UNIX, as Stallman once pointed out. Yes, there are workarounds they could do.) So we have to read the file twice, a minute apart, and wait until we get two identical copies. Then we have to check for 1) an empty file, 2) a file with proper XML structure but no data records, and 3) an improperly terminated XML file, all of which we've encountered.
Oh. My.
I'd do a HEAD as the second request, and check the Last-Modified time stamp.
If the Date in the fetch was later than this, and you got a 2xx return code, all is well, and there's no need to download two copies, blatantly disregarding the "X-Request-Limit-Interval: 259200 Seconds" as you do.
It'd be much faster too. But what do I know...
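A rough sketch of the header check, in a slightly simpler variant: instead of comparing Date against Last-Modified, it compares the Last-Modified value seen before the download with the one seen after it. Both function names are invented, and the header text is assumed to have been captured already (for example from wget --spider --server-response, which prints the server's headers).

```shell
#!/bin/sh
# Hypothetical helper: print the Last-Modified value found in a block of
# HTTP response headers read from standard input. Header names are
# case-insensitive, so both cases of each letter are matched explicitly.
last_modified() {
    sed -n 's/^[[:space:]]*[Ll][Aa][Ss][Tt]-[Mm][Oo][Dd][Ii][Ff][Ii][Ee][Dd]:[[:space:]]*//p' \
        | tr -d '\r' | head -n 1
}

# Hypothetical check: succeed only if the headers seen before and after the
# download report the same Last-Modified value, i.e. no overwrite in between.
unchanged_during_fetch() {
    before=$(printf '%s\n' "$1" | last_modified)
    after=$(printf '%s\n' "$2" | last_modified)
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

This only works if the server sends an honest Last-Modified header, which is itself an assumption about the remote site.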

The clowns who try to do stuff like this with shell scripts and cron jobs spend a big fraction of their time dealing manually with the failures.
The clowns who do stuff like this with the simplest tools that do the job (
Re:Weak error handling by jklovanc · 2010-10-24 20:11 · Score: 4, Insightful

It is interesting that wget does not handle errors other than ignoring them and trying to continue. The original poster's first and second point are not addressed. Does that mean the operator has to manually monitor the crons and restart the ones that failed?

The site is really, really slow. Some sites will take half an hour to feed out a page.
And you still haven't looked at the wget(1) man page, or you'd know about the --read-timeout parameter.

Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
Not holding up your operation is why you use multiple tools that can run concurrently. A wget of orbitz.com taking forever won't prevent the wget of soggy.com that you scheduled for half an hour later, and neither will stop the parser.
Of course, if you design an all-eggs-in-one-basket solution that depends on sequential operations, you deserve what you get.
How do you schedule orbitz.com to go off and then soggy.com to go off later? What if you are handling hundreds of different web sites? Hundreds of crons? How do you retry later on sites that are very slow at the moment? How would you know that wget timed out due to slow download?

The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing.
This requires a special timeout.
Yes, the --connect-timeout.
The connection has been made so it is not --connect-timeout it is --read-timeout. That is the problem, there is no different timeout when you are slowly getting data vs getting no data.

The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
wget limits to a single connection with keep-alive per instance. (If you want more, spawn more wget -nc commands)
You missed the point; it is not about more connections, it is about limiting connections. Say I am crawling five different sites using host spanning and they all link to the same site. Since there is no coordination between the wgets it is possible for all of them to connect to the same site at the same time. What if I have 100 crawlers at the same time?
The original poster is right; using wget ignores errors (timeouts) and does not report them so there is no way of programmatically figuring out what went wrong and reacting to it.
Things wget does not do: avoid known non-responsive pages, requeue requests that have timed out or log them so that they are not tried again, coordinate multiple crawls so they do not hit the same server simultaneously, handle errors itself. There are probably more.
This is a perfect example of the 80/20 rule. The "solution" may cover 80% of the problem but that final 20% will require so much babysitting as to make it unusable. Wget is not an enterprise level web crawler.
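
For scale, here is about as far as plain shell gets on the connection limit, as a sketch only. It covers the easy case where one process owns every URL for a given host; it does nothing about separate crawlers that discover the same host independently, which would need a shared lock or queue. The function name fetch_limited is invented, and -P is a GNU/BSD xargs extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: fetch the URLs listed in a file (one per line, all on
# the same host) with at most MAX fetches running at once.
# Usage: fetch_limited LIST MAX [FETCH_COMMAND ARGS...]
fetch_limited() {
    list=$1
    max=$2
    shift 2
    # Default fetcher: wget with separate connect and read timeouts.
    [ $# -gt 0 ] || set -- wget --quiet --connect-timeout=10 --read-timeout=60 --tries=2
    # xargs appends one URL per invocation and keeps at most $max running.
    xargs -n 1 -P "$max" "$@" < "$list"
}
```

Something like fetch_limited urls-for-one-host.txt 3 keeps to three simultaneous connections, but retries, requeueing and cross-host scheduling are all still missing.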

Re:Reference ? by couchslug · 2010-10-24 11:59 · Score: 2, Funny

That post is worthless without pics!

--
"This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."

Re:It's easy to overthink even in the simplest cas by _LORAX_ · 2010-10-24 12:42 · Score: 3, Informative

Psst,

" | sort | uniq -c "

Will sort and then count repetitive lines and output count, line. You can pipe the result back through sort -n if you want a frequency sort or sort -k 2 for item sorting.
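
Spelled out as a small sketch (the function name frequency_table is made up), with the frequency sort folded in:

```shell
#!/bin/sh
# Read lines on standard input and print "count line", most frequent first.
# Ties are broken by the line text so the output order is deterministic.
frequency_table() {
    sort | uniq -c | sort -k1,1nr -k2
}
```

The first sort matters because uniq only collapses adjacent duplicates; the second one orders the counted lines by the number in the first column.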

Re:Reference ? by msaavedra · 2010-10-24 12:54 · Score: 2, Funny

Taco Bell ingredients are great for quickly passing through your pipeline

That's why one of my friends calls the place Taco Bowel. It's much more descriptive than the commonly-heard Taco Hell.

--
"Any fool can make a rule, and any fool will mind it."
--Henry David Thoreau

My code is Taco Bell food at its finest by rwwyatt · 2010-10-24 13:09 · Score: 3, Funny

It leaves results in your shorts.

Re:Code reuse, junk food example? by scourfish · 2010-10-24 13:27 · Score: 2, Insightful

Your argument is terribly flawed; Taco Bell doesn't serve French fries.

Re:8 keywords? by Suzuran · 2010-10-24 14:13 · Score: 2, Interesting

I think programming on an old machine should be required for any sort of programming course. It would teach people to conserve resources and think about how the machine works.
He who cannot program in 64K cannot program in more.

Re:8 keywords? by Nursie · 2010-10-24 15:10 · Score: 3, Funny

You had 1s?

Luxury! When I was a lad we had to program everything using only zeros!

Re:8 keywords? by iamnobody2 · 2010-10-24 15:23 · Score: 4, Informative

8 ingredients, no. i've worked at a taco bell, there's a few more than that. this is most of it. hot line: beef, chicken, steak, beans, rice, potatoes, red sauce, nacho cheese sauce, green sauce (only used by request). cold line: lettuce, tomatoes, cheddar cheese, 3 cheese blend, onions, fiesta salsa (pico de gallo, the same tomatoes and onions mixed with a sauce), sour cream, guacamole, baja sauce, creamy jalapeno sauce. plus 5 kinds/sizes of tortillas (3 sizes of regular, 2 sizes of die cut), nacho chips, etc etc. here's an interesting fact: those Cinnamon Twists you may or may not love? they're made of deep fried rotini (a type of pasta, usually boiled)

--
nobody's perfect

Re:8 keywords? by Shadow+of+Eternity · 2010-10-24 16:10 · Score: 2, Insightful

"When you work in a monkeyhouse you're more used to having shit thrown at you".

--
A bullet may have your name on it but splash damage is addressed "To whom it may concern."

Re:8 keywords? by Anonymous Coward · 2010-10-24 18:22 · Score: 5, Funny

Noli strepere.

Per tempus mei, "zero" non habemus. Numerorum Romanorum usi eramus.

Re:8 keywords? by drjzzz · 2010-10-25 01:43 · Score: 2, Informative

So if I limit myself to 8 keywords my code has less defects and is more maintainable?

... fewer defects. Never mind.

--
to err is human, to forgive is divine, to forget is... umm...

Re:It's easy to overthink even in the simplest cas by eap · 2010-10-25 07:09 · Score: 2, Informative

Psst,

" | sort | uniq -c "

Will sort and then count repetitive lines and output count, line. You can pipe the result back through sort -n if you want a frequency sort or sort -k 2 for item sorting.

The problem was not figuring out how to count the unique items. It's the part before the pipe that was difficult. The poster needed to combine the results of two different commands and then compute the unique items. The solution would have to be, logically, "command1 + command2 | sort | uniq -c".

Unless you can find a way to pass the output from command1 through command2, you will lose command1's data. The solution he/she found was elegant: (command1):(command2) | someKindOfSort. My syntax is probably wrong. If you were simply pointing out a better way to sort, then please disregard.
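
For what it's worth, the usual sh spelling of that idea is a brace group (or a subshell in parentheses) with the commands separated by semicolons. A small sketch, where command1 and command2 are stand-ins for the two real commands and combined_counts is an invented name:

```shell
#!/bin/sh
# command1 and command2 stand in for the two real commands.
command1() { printf 'apple\nbanana\n'; }
command2() { printf 'banana\ncherry\n'; }

# The brace group runs both commands in turn and sends their combined
# output down a single pipe, so neither command's data is lost.
combined_counts() {
    { command1; command2; } | sort | uniq -c
}
```

Writing (command1; command2) | sort | uniq -c does the same thing in a subshell; the braces just avoid the extra process.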
