Slashdot Mirror


Researchers Expanding Diff, Grep Unix Tools

itwbennett writes "At the Usenix Large Installation System Administration (LISA) conference being held this week in Boston, two Dartmouth computer scientists presented variants of the grep and diff Unix command line utilities that can handle more complex types of data. The new programs, called Context-Free Grep and Hierarchical Diff, will provide the ability to parse blocks of data rather than single lines. The research has been funded in part by Google and the U.S. Energy Department."

27 of 276 comments

  1. Strange names by gnasher719 · · Score: 4, Funny

    Space characters in the name of a Unix command line tool is asking for trouble.

    1. Re:Strange names by realyendor · · Score: 4, Insightful

      I expect those are just the spoken names and that the commands will still be single words, similar to:
      "GNU awk" -> gawk
      "enhanced grep" -> egrep

    2. Re:Strange names by adonoman · · Score: 3, Interesting

      But having to use quotes every time you call a command is a sure way to make sure your command is never used.

      Would you rather type this:
      ./"Context-Free Grep" ...
      or this:
      ./cfgrep ..

    3. Re:Strange names by ripler · · Score: 4, Funny

      Next thing you know we'll have CSIgrep. (enhance enhance enhance grep)

    4. Re:Strange names by Longjmp · · Score: 4, Insightful

      Definitely
      I mean, where would we end up if unix commands actually gave a hint of what they are doing ;-)
      As a unix novice, if I wanted to search for something, my first choice of course would be grep
      Also if I wanted help on something, the first word that jumps to my mind would be man

      heh.

      --
      There are fewer illiterates than people who can't read.
    5. Re:Strange names by mytec · · Score: 4, Informative

      According to this paper, they are called bgrep and bdiff.

    6. Re:Strange names by EdIII · · Score: 3, Informative

      and I really should spend a few more seconds thinking about what I'm responding to

      That's not what Slashdot is about........

    7. Re:Strange names by Anne+Thwacks · · Score: 4, Funny

      CSIgrep would take 30 mins to get the result! (With ad breaks)

      --
      Sent from my ASR33 using ASCII
    8. Re:Strange names by iluvcapra · · Score: 4, Insightful

      If you don't like a tool's name, export an alias.

      It's not about typing commands as much as it's about making these work:

      $ find . -name "*.txt" | xargs wc
      $ for file in $*; do
          mv $file old/$file
        done

      Versus these:

      $ find . -name "*.txt" -print0 | xargs -0 wc
      $ for file in "$@"; do
          mv "$file" "old/$file"
        done

      A lot of scripts you run into are just broken because of braindead assumptions.
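
      To make the difference concrete, here is a minimal sketch, assuming a POSIX sh or bash with the default IFS. The file name "my notes.txt", the scratch directory, and the count_args helper are invented for the demo; they are not from any real script.

```shell
#!/bin/sh
# Sketch: how an unquoted expansion splits a filename containing a space.
# The file name "my notes.txt" and the scratch directory are made up for the demo.
tmp=$(mktemp -d)
mkdir "$tmp/old"
touch "$tmp/my notes.txt"

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

file="my notes.txt"
unquoted=$(count_args $file)      # split on the space: two arguments
quoted=$(count_args "$file")      # kept whole: one argument
echo "unquoted=$unquoted quoted=$quoted"

# The quoted form of the mv from the loop above handles the space correctly.
mv "$tmp/$file" "$tmp/old/$file"
if [ -f "$tmp/old/my notes.txt" ]; then moved=yes; else moved=no; fi
echo "moved=$moved"

rm -rf "$tmp"
```

      The unquoted mv would have received "my" and "notes.txt" as two separate source arguments, which is the kind of breakage a tool name containing a space would trigger all over the place.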

      --
      Don't blame me, I voted for Baltar.
    9. Re:Strange names by toadlife · · Score: 4, Funny

      "I have only been able to come up with one algorithm for creating Unix command names: think of a good English word to describe what you want to do, then think of an obscure near- or partial-synonym, throw away all the vowels, arbitrarily shorten what's left, and then, finally, as a sop to the literate programmer, maybe reinsert one of the missing vowels."

      Rachel Padman

      --
      I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
    10. Re:Strange names by mfnickster · · Score: 4, Insightful

      There's nothing that says the name of the tool and the command you type must be the same

      Very true. Unix programmers seem to follow these rules:

      1. delete any spaces in the name
      2. delete any vowels in the name
      3. delete any superfluous consonants
      4. chuck the entire thing and just abbreviate it to the first letter of each word in the name

      So these tools will likely be run as "ctxtfrgrp" and "hierdiff" or just "cfgrep" and "hdiff"

      --
      "Slow down, Cowboy! It has been 3 years, 7 months and 26 days since you last successfully posted a comment."
    11. Re:Strange names by jd · · Score: 5, Funny

      You have to figure in two's complement notation. If it's sufficiently counter-intuitive, the sign bit flips over and it becomes totally intuitive.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    12. Re:Strange names by jejones · · Score: 3, Interesting

      Alas, history and lots of shell scripts have probably made existing command names unchangeable. History in this case goes back to the time people got RSI from ASR-33 Teletypes and didn't want to have to type very much, and names that make sense only if you know other programs (in ed, "g/re/p" prints all lines containing the specified regular expression, hence the name "grep").

      That said, we programmers are users of programming languages as much as Joe Sixpack is a user of the desktop, and surely we deserve good design as much as they do, so we can get things done rather than taking perverse pride in mastering needlessly ghastly syntax.

    13. Re:Strange names by morgauxo · · Score: 4, Insightful

      GP was a joke I am sure.

      As to yours though.. I wouldn't want spaces in my commands. How do you tell where the command ends and the arguments begin?

      As for man... man is the MANual. That's not that bad is it? Ok, help might be a little better but it's not a big deal unless you are very closed minded. It's really a history thing. Man wasn't just somebody's idea of a help command. Unix (or Unics as it was called back then) originally actually had a manual. As in dead trees paper! It got big. Real big. One day Dennis Ritchie accidentally dropped a copy and killed his dog. Flattened the poor girl like a pancake. After that he decided it needed to be digital. Man is a digital copy of that original dog killing book plus decades of additions and updates. Thus it is man(ual).

      Now should manual have been "manual" or maybe the real whole title "Unix Programmers Manual"? It might be easier to remember. 5 years after you learned that command and you are still typing it 5 times a day would you still appreciate the ease of using real whole English words? Are you that abc? (abbreviationally challenged) Or do you just really love typing? Is your r/l name Mavis Beacon?

      That's how a lot of Unix commands are, they make plenty of sense with history. I'm sure grep and the others all have their own stories. Well.. not all. How much of a story does it take to come to ls is a lazy way to type list? Oh, yah, you are AbC. Sorry about that.

      Yes, the history of decades old programming decisions isn't really something you want to learn to use an OS (or any other software). But what's the alternative? Throw everything out every x number of years and start over? It sounds great when you are a hopeless newbie but once you actually learn something do you want to do it all over again every 10 years just to make it easy for the next batch of basement kiddies? Your clock is ticking too you know! Now get off my lwn!!!! (lawn)

      P.S. Ok, Ok, I made up the dog part of the story. But it COULD have happened! The rest was real. Actually, I don't KNOW that it didn't happen... hmm....

    14. Re:Strange names by 93+Escort+Wagon · · Score: 3, Funny

      Just wait until Microsoft sees your post and we'll have eeegrep.

      No, I expect they'd call it grep#. And when Apple forks their own version, it'll be objective grep.

      --
      #DeleteChrome
    15. Re:Strange names by rk · · Score: 4, Funny

      Unix is user-friendly, it's just picky about who its friends are.

  2. Interesting... by DangerOnTheRanger · · Score: 3, Interesting

    With these tools, you could make grep and diff work with binary files in a meaningful way - very useful at times. I bet you could even adapt the "Context-Free Grep" into a sort of packet sniffer with enough work. I'd sure like to try these new programs sometime.

  3. Re:How's it compare to Meld? by Anonymous Coward · · Score: 3, Insightful

    It is surprising that Slashdot even let you post a deb: url, as the filter usually seems to destroy most non-http(s) links. However, not everyone uses a Debian-based distro, and not everyone tries some random package (even from the repository) before reading a little about it, so posting the home page would have been a bit more useful.

  4. Link to one of their papers on these tools by treerex · · Score: 4, Informative
  5. Re:DOE?????? by iced_tea · · Score: 3, Interesting

    They have HUGE amounts of data kicking around from various simulations/experiments.

    Check out the wikipedia article for supercomputers, and you'll see DOE mentioned.

    Tools like this could help with analysis and finding certain data sets. IIRC, regex are already used in DNA sequencing. There is probably a similar application and use for tools like this with their data.

  6. Re:Follow the money...? by Tanktalus · · Score: 5, Insightful

    Context-free grep/diff can be used to search for data/changes in arbitrary non-line-record-based files. Such as XML, HTML, JSON, SQLite databases, other databases, Apache configs, and many other pieces of data. Heck, even most programming languages are not line-based, but statement terminated/separated. Imagine being able to grep for a function name, and getting its entire prototype/usage even when it spans multiple lines (very common in standard glibc headers). And, depending on the plugin's capabilities, you could grep for a function name as a function name and not get back any usage of that text as a variable or embedded in a string, or a comment (skip commented-out calls!).

    If there's sufficient configurability, you could ask for the entire block that given text is in, and such a grep would be able to display everything in the corresponding {...}. Makes grep that much more valuable.
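
    As a rough illustration of the "give me the whole enclosing block" idea, here is a sketch using plain awk and brace counting. It is not how the researchers' tools work (the article does not say); the sample C-like input and the search word "factor" are invented, and the counter is fooled by braces inside strings or comments.

```shell
#!/bin/sh
# Hypothetical sample input: a tiny C-like file (not taken from the article).
sample=$(mktemp)
cat > "$sample" <<'EOF'
int add(int a, int b)
{
    return a + b;
}

int scale(int x)
{
    if (x > 0) {
        return x * factor;
    }
    return 0;
}
EOF

# Print every top-level {...} block, with its header line, that mentions "factor".
block=$(awk -v pat='factor' '
    {
        buf = buf $0 "\n"                 # collect lines of the current block
        if ($0 ~ pat) hit = 1             # remember that this block matched
        opens  = gsub(/[{]/, "{")         # count opening braces on this line
        closes = gsub(/[}]/, "}")         # count closing braces on this line
        depth += opens - closes
        if (depth == 0 && closes > 0) {   # a top-level block just closed
            if (hit) printf "%s", buf
            buf = ""; hit = 0
        } else if (depth == 0 && $0 ~ /^[ \t]*$/) {
            buf = ""; hit = 0             # blank line between blocks: start over
        }
    }
' "$sample")
printf '%s\n' "$block"

rm -f "$sample"
```

    With this input the whole scale function comes back, nested if block included, while add is skipped. A real grammar-aware tool would know the syntax instead of guessing from braces.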

    So, my question is, why aren't more IT-heavy corporations/government departments involved?

  7. RTFA? by DragonWriter · · Score: 4, Informative

    funded in part by Google and the U.S. Energy Department

    I wonder what's the interest of these two in this.

    FTFA:

    Google's interest in this technology springs from the company's efforts in cloud computing, where it must automate operations across a wide range of networking gear, Weaver said. The DOE foresees that this sort of software could play a vital role in smart grids, in which millions of energy consuming end-devices would have connectivity of some sort. The software would help "make sense of all the log files and the configurations of the power control networks," Weaver said.

  8. Ooooh! by gstoddart · · Score: 3, Interesting

    As soon as I see "Context-Free Grep", I immediately think of a Context Free Grammar.

    That basically implies we can have much more sophisticated rules that match other structural elements the way a language compiler does. Which means that in theory you could do greps that take into account structures a little more complex than just a flat file.

    Grep and diff that can be made aware of the larger structure of documents potentially have a lot of uses. Anybody who has had to maintain structured config files across multiple environments has likely wished for this before.
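
    A small sketch of why nesting is the sticking point, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX). The input string is made up. The regular expression can only describe brace pairs with nothing nested inside, while a simple depth counter, standing in here for a real grammar, recovers the outermost block.

```shell
#!/bin/sh
# Invented input with two levels of nesting.
text='a { b { c } d } e'

# Regex only: matches a brace pair that contains no other braces,
# so it finds the innermost block and cannot see the enclosing one.
inner=$(printf '%s\n' "$text" | grep -o '{[^{}]*}')

# Depth counter: walk the characters and print each outermost {...} span.
outer=$(printf '%s\n' "$text" | awk '{
    depth = 0; start = 0
    for (i = 1; i <= length($0); i++) {
        ch = substr($0, i, 1)
        if (ch == "{") {
            if (depth == 0) start = i     # remember where the outer block opens
            depth++
        } else if (ch == "}") {
            depth--
            if (depth == 0) print substr($0, start, i - start + 1)
        }
    }
}')

echo "inner: $inner"
echo "outer: $outer"
```

    The counter is the simplest thing that goes beyond a regular expression; a context-free grammar generalizes it to many kinds of delimiters and rules at once.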

    Sounds really cool.

    --
    Lost at C:>. Found at C.
  9. Microsoft Ad by lucm · · Score: 3, Interesting

    I know I'll be modded down, but I have to say it: what they describe is already available in PowerShell, where objects can be piped in search/filter functions.

    --
    lucm, indeed.
  10. They should call it... by goombah99 · · Score: 3, Insightful

    perl. Isn't this exactly why perl was invented?

    --
    Some drink at the fountain of knowledge. Others just gargle.
  11. Terrible idea by deblau · · Score: 4, Insightful

    This violates so many rules of the Unix philosophy that I don't even know where to begin...

    FTFA:

    Grep has issues with data blocks as well. "With regular expressions, you don't really have the ability to extract things that are nested arbitrarily deep," Weaver said.

    If your data structures are so complex that diff/grep won't cut it, they should probably be massaged into XML, in which case you can use XSLT off the shelf. It's already customizable to whatever data format you're working with.

    FTFA:

    With [operational data in block-like data structures], a tool such as diff "can be too low-level," Weaver said. "Diff doesn't really pay attention to the structure of the language you are trying to tell differences between." He has seen cases where diff reports that 10 changes have been made to a file, when in fact only two changes have been made, and the remaining data has simply been shifted around.

    No, 10 changes have been made. The fact that only two substantive changes have been made based on 10 edits is a subjective determination. That is, unless you want to detect that moving a block of code or data from one place to another in a file has no actual effect, in which case good luck because that's a domain-specific hard problem.
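
    Both readings can be seen in a minimal sketch, assuming a standard diff and sort. The two four-line files are invented. Line-based diff reports the moved block as a deletion plus an addition, while a crude order-insensitive comparison shows that no line was actually added or removed. Sorting is only a stand-in here; it is not what the new tools do, and it would be wrong for any format where order matters.

```shell
#!/bin/sh
# Sketch: a pure block move, as seen by line-based diff.
# File contents are invented for the demo.
old=$(mktemp)
new=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$old"
printf 'gamma\ndelta\nalpha\nbeta\n' > "$new"   # same lines, first two moved to the end

# diff exits with status 1 when files differ, so "|| true" keeps "set -e" scripts alive.
report=$(diff "$old" "$new" || true)
printf '%s\n' "$report"

# Crude check that nothing was added or removed: compare the sorted lines.
sorted_old=$(sort "$old")
sorted_new=$(sort "$new")
if [ "$sorted_old" = "$sorted_new" ]; then same_lines=yes; else same_lines=no; fi
echo "same lines after sorting: $same_lines"

rm -f "$old" "$new"
```

    Whether the move counts as a change is exactly the domain-specific question: in a list of firewall rules order matters, in a set of user accounts it usually does not.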

    --
    This post expresses my opinion, not that of my employer. And yes, IAAL.
  12. Re:Follow the money...? by bobaferret · · Score: 3, Interesting

    I wouldn't call it a cancer. But it's definitely useful if you don't ever want commercial companies to use your code in public. It matches up well with the open core model. Commercial people will only use it if you can give them a differently licensed copy of the code. Apache, MIT, and BSD are great if you truly want to give your code away and don't care what people do with it behind closed doors. AGPL is nice to make sure people always give back. LGPL and GPL are nice if you only want them to give back if they change it. Should people pay, and how much, is an age-old question. I have to balance the cost of support and development vs. the cost of the product. The more I lean on the community the less I can charge and the more exposure I get. While in the other direction I get more money, but have to spend more of it. And there is no one-size-fits-all solution to any of this.