Slashdot Mirror


New Web Application Attack - Insecure Indexing

An anonymous reader writes "Take a look at 'The Insecure Indexing Vulnerability - Attacks Against Local Search Engines' by Amit Klein. This is a new article about 'insecure indexing.' It's a good read -- shows you how to find 'invisible files' on a web server and moreover, how to see contents of files you'd usually get a 401/403 response for, using a locally installed search engine that indexes files (not URLs)."

120 comments

  1. but its fixed in firefox now by Prophetic_Truth · · Score: 2, Funny

    right?

    --
    time is a perception of a being's consciousness
    time is your 6th sense, the wierd ones are 7+
    1. Re:but its fixed in firefox now by jacquesm · · Score: 2, Insightful
      Sure, and Konqueror never had it :)


      That's all nice and good; personally, I think files that were never meant to be indexed make for the best reading by far!


    2. Re:but its fixed in firefox now by Anonymous Coward · · Score: 0

      nice work on the 3 consecutive first posts

  2. should have been from.... by Anonymous Coward · · Score: 5, Funny

    the department-of-the-bleedingly-obvious...

    1. Re:should have been from.... by tagish · · Score: 2, Insightful

      Bleedingly obvious and written in sufficiently pompous style that you feel obliged to read the whole thing just to verify that there really is nothing there that hasn't been common knowledge for the better part of the last decade.

      Of course in those days people actually built their sites using static HTML...

      --
      Andy Armstrong
  3. this is'nt new by rkv · · Score: 0

    was'nt there already one?

    1. Re:this is'nt new by Refrozen · · Score: 1, Offtopic

      Yeah, you're apostraphy goes one character to the right.

    2. Re:this is'nt new by Refrozen · · Score: 0, Offtopic

      And that "you're" was supposed to be your.... :P

    3. Re:this is'nt new by ikkonoishi · · Score: 0, Offtopic

      You really should have put quotation marks around the "your".

    4. Re:this is'nt new by jacksonj04 · · Score: 0, Offtopic

      I think you should have... what... but...

      Dammit! A perfect Grammar Nazi!

      --
      How many people can read hex if only you and dead people can read hex?
    5. Re:this is'nt new by Anonymous Coward · · Score: 0

      Yeah, you're apostraphy goes one character to the right.
      Y'ore seppeling of apostrophe is a catastraphy.

    6. Re:this is'nt new by Anonymous Coward · · Score: 0

      and your apostraphy should be apostrophy.

    7. Re:this is'nt new by Anonymous Coward · · Score: 0
      Yeah, you're apostraphy goes one character to the right.

      ...and "your" doesn't have an apostrophe.

    8. Re:this is'nt new by ikkonoishi · · Score: 1

      Actually I think standard english says that the period should go inside the quotation marks, but my programming trained mind refuses to let me do so.

    9. Re:this is'nt new by Anonymous Coward · · Score: 0

      That's "Standard Written English."

    10. Re:this is'nt new by EnronHaliburton2004 · · Score: 1

      Your humor capability does not contain any decent formatting at all.

    11. Re:this is'nt new by Anonymous Coward · · Score: 0

      Depends.

      In US English the period indeed goes inside the quotes, or so they say. However, in British English it depends what the quotes are doing.

      If it's a fragment, the full stop goes outside:- it usually looks "something like this".

      If it is a quote which already had a full stop or equivalent, "You'd have the full stop inside the quotes."

      Or, you know. Something like that.

    12. Re:this is'nt new by pla · · Score: 1

      Actually I think standard english says that the period should go inside the quotation marks, but my programming trained mind refuses to let me do so.

      Same here, that rule drives me absolutely batty... In my opinion, if I put the period in the quotes, I effectively tell the parser (aka "person reading my text" in the case of normal English communication) that I attribute the period to the source of the quote, while simultaneously leaving my own sentence un-terminated.

      However, I realized that you just need to apply the rules a little bit more literally to find a simple exploit that lets you do whichever you want. The rule says that, if you end your sentence with a quote, you put the period inside the quotes. But, if you put a period outside the close-quote, then technically the quote hasn't ended your sentence - the period following it has.

    13. Re:this is'nt new by Anonymous Coward · · Score: 0

      No, it's "American Written English". The English English use the more logical approach of having the period only appear inside the quotes if it's part of the quotation. Which makes writing technical documentation much easier...

    14. Re:this is'nt new by rkv · · Score: 0

      k so my english pretty much suckx and i know that but seriously i thought this new was old

    15. Re:this is'nt new by DarkMantle · · Score: 1
      Grammar Nazi's asside...

      Um, it IS new, because /. didn't post anything about it before. Even tho I've been using google cache to see files that I usually get a 403 on for a few months now.

      Besides, in a few hours it will be new all over again when they post the dupe.

      You can see evidence of that
      --
      DarkMantle I been bored, so I started a blog.
  4. and don't forget... by DrKyle · · Score: 4, Interesting

    to see if you can get the site's robots.txt as the files/directories in that file are sometimes full of goodies.

    1. Re:and don't forget... by MrEcho.net · · Score: 1

      Not when the file has something like this:
      User-agent: *
      Disallow: /
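Python's standard `urllib.robotparser` module shows how a compliant client reads the file quoted above. This is a minimal sketch: the user-agent string and URLs are placeholders, and the second robots.txt is a hypothetical example. Keep in mind that robots.txt is advisory and anyone can read it. That is DrKyle's point: every path it lists is disclosed to whoever fetches it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above: every user agent is disallowed from everything.
DISALLOW_ALL = ["User-agent: *", "Disallow: /"]


def allowed(user_agent: str, url: str, robots_lines=DISALLOW_ALL) -> bool:
    """Return True if a robots.txt-compliant client may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() takes an iterable of lines; no network I/O
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical file that hides one directory, and in doing so advertises it.
    partial = ["User-agent: *", "Disallow: /private/"]
    for path in ("/index.html", "/private/report.doc"):
        url = "http://site.example.com" + path
        print(path, allowed("ExampleBot", url), allowed("ExampleBot", url, partial))
```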

    2. Re:and don't forget... by spdt · · Score: 1

      "sometimes"

    3. Re:and don't forget... by Anonymous Coward · · Score: 0

      Sometimes "wget http://site.example.com/robots.txt" works to get the robots.txt file, but some smart webmasters are aware of this security hole and hide their robots.txt under a different name. B*stards!

    4. Re:and don't forget... by MostlyHarmless · · Score: 1

      Of course, that's assuming that you don't want your site indexed by any search engine (in which case, why is it exposed to the outside Internet to begin with?)

      Incidentally, it also breaks properly-designed retrieval mechanisms (like, say, RSS readers -- yes, dailykos.com, I'm talking about you!)

      --
      Friends don't let friends misuse the subjunctive.
    5. Re:and don't forget... by myov · · Score: 1

      For this reason, I tended not to create a robots.txt file. At minimum, sensitive sites wouldn't go in it.

      If anything, I'd block googlebot/others in .htaccess files, assuming it wasn't a passworded site to begin with.

      --
      I use Macs to up my productivity, so up yours Microsoft!
    6. Re:and don't forget... by Mekanix · · Score: 1

      Simple: to save bandwidth.

      A friend of mine does some genealogy research. He decided to put all his data online. Even before all the files were uploaded, his site took a massive load. He thought his site was really popular, but after looking at his logs I could tell him that most of the traffic came from search-engine robots and that his ISP might not be so happy about this massive traffic.

    7. Re:and don't forget... by DrSkwid · · Score: 2, Insightful

      Incidentally, it also breaks properly-designed retrieval mechanisms

      If they break, how can they be properly designed?

      --
      There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    8. Re:and don't forget... by MostlyHarmless · · Score: 1

      They are properly designed if they obey robots.txt files at all times... which prevents them from downloading certain files that the web site's author probably meant to allow them to download. Like an RSS feed, again.

      --
      Friends don't let friends misuse the subjunctive.
    9. Re:and don't forget... by elemental23 · · Score: 1

      RSS readers, being essentially special-use web browsers, are not obligated to honor robots.txt. They certainly aren't robots/web crawlers. If your RSS reader is checking robots.txt restrictions before retrieving RSS feeds, it's misguided at best, if not broken.

      --
      I like my women like my coffee... pale and bitter.
  5. indexing google by page275 · · Score: 5, Interesting

    Even though this is about internal indexing, it reminded me of old-fashioned Google indexing: search Google with some sensitive terms such as 'index of /' *.pdf *.ps

    1. Re:indexing google by Sweetshark · · Score: 1

      better use "topic filetype:pdf"

    2. Re:indexing google by Neil+Blender · · Score: 2, Informative

      Even though this is about internal indexing, it reminded me of old-fashioned Google indexing: search Google with some sensitive terms such as 'index of /' *.pdf *.ps

      This is an excellent trick for searching for porn (i.e. "index of /" lesbian).

    3. Re:indexing google by ikkonoishi · · Score: 2, Interesting

      intitle:"axis storpoint CD" intitle:"ip address"

      DVD/CD servers...

  6. permissions permissions permissions by Capt'n+Hector · · Score: 4, Insightful

    Never give web-executable scripts more permissions than absolutely required. If the search engine has permission to read sensitive documents, and web users have access to this engine... well duh. It's just common sense.

    --
    Quid festinatio swallonis est aetherfuga inonusti?
    Africus aut Europaeus?
    1. Re:permissions permissions permissions by WiFiBro · · Score: 4, Insightful

      The first paragraphs of this document describe how to get to files that are not public. So you also need to move sensitive files out of the public directory, which is easy but hardly ever done. (You can easily write a script that serves files from non-public directories to those entitled to them.)
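A script of the kind WiFiBro describes might look like the following minimal sketch. It makes several assumptions. The directory path is hypothetical. The `is_entitled` callback stands in for whatever authorization the site already has. The function returns bytes rather than speaking HTTP. The point it shows is that the files sit outside the webroot and are only read after a path-traversal check and a permission check.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical directory that lives outside the web server's document root.
PRIVATE_ROOT = Path("/srv/private-docs")


def serve_private(
    requested: str,
    user: str,
    is_entitled: Callable[[str, str], bool],
    root: Path = PRIVATE_ROOT,
) -> Optional[bytes]:
    """Return the bytes of `requested` if `user` may read it, else None.

    `is_entitled(user, relative_path)` stands in for whatever authorization
    the site already has (sessions, groups, ACLs); it is hypothetical here.
    """
    base = root.resolve()
    target = (base / requested).resolve()
    # Refuse path traversal such as "../../etc/passwd": the resolved file
    # must stay inside the private root.
    if base not in target.parents:
        return None
    if not target.is_file():
        return None
    if not is_entitled(user, target.relative_to(base).as_posix()):
        return None
    return target.read_bytes()
```

The web server would route a URL to this function, for example through a CGI script, instead of exposing the directory itself. Neither httpd nor a filesystem-level indexer pointed at the webroot ever sees these files.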

    2. Re:permissions permissions permissions by a55mnky · · Score: 1

      Expecting common sense is rather presumptuous of you, don't you think?

      --
      Where oh where has my Underdog gone?
    3. Re:permissions permissions permissions by Anonymous Coward · · Score: 1, Insightful

      Give me a freaking break. This is the same guy who found the "HTTP RESPONSE SPLITTING" vulnerability, last year's catchphrase among the wankers at Ernest and Young and Accidenture. The same type of people who consider an HTTP TRACE XSS a vulnerability. I guess it's been a slow freaking year for security research.

      Amit Klein at least used to work for Watchfire, formerly known as Scrotum (Sanctum), the same company that tried to patent the application security assessment process. I guess it's been a really slow year for vulnerability research. They need new terminology to scare the executives at Fortune 500 corporations and sell their useless products.

      People tend to forget that to compromise data, it's easier to steal the tape from the back of a plane than it is to hack up some stupid search engine.

    4. Re:permissions permissions permissions by Anonymous Coward · · Score: 0
      Moderators: Please note that "twitter" is a known fanatical sycophant whose obnoxious offtopic rants are legend here on Slashdot. It doesn't matter what the topic is, he'll find a way to scrape in some pointless Microsoft bashing. While nobody expects us to love Microsoft in any way, his particularly tepid style of calling anyone he replies to "troll" or "liar" or "fanboy" because he happens to disagree with whatever they're saying is well documented and should not be rewarded. If anything, twitter is the type of person that should not be part of the open source/free software community. He is an anathema to all that is good about free software.

      I'm posting this so that you (the moderator) have some context to consider twitter and not mod him up whenever he posts his filler preformatted rants about installing Knoppix or Mepis or whatever that unfortunately get him karma every single time and allow him to continue posting his trademark toxic crap (read on) day in and day out. You may consider this a troll - I consider it community service. And I ain't kidding.

      If you're a /. subscriber, I invite you to look through some of his posting history. I guarantee that you'll be hard pressed to find someone that is more "out there" than twitter. You'll also probably notice he's got quite an AC following. Don't just read his posts, make sure you go through the replies.

      To get an idea of what I'm talking about, check this post out. This is an article about email disclaimers. The parent of the post is complaining about the ads in the linked page and so on, and twitter actually goes off on a rant to blame it on Microsoft and recommend Lynx, because "is teh free".

      Here's another. In this post twitter not only calls the OP a troll but attempts to "tell it like it is" while making some vague argument about "GNU". Yes, if you're confused, you're not alone. The reply (modded +4) proceeds to simply destroy his bogus argument. You will notice he did not reply. This is what some people call "drive-by advocacy". A sort of I'll just leave you with my thoughts here and move on to the next flamebait kind of deal. In fact, he almost never replies because he knows that his fanatical arguments simply do not hold up to any sort of discussion. It's not that he's chosen the wrong cause - he's just going at it in a completely wrong way.

      Here's that drive-by advocacy and FUD in motion: twitter goes on about some topic and then drops the usual "oh and M$ is teh evil" because "WMP phones home" or some such. Called on his FUD, he then claims that WMP stores every song and movie you've ever played in a file, somewhere. Pressed further, he just sort of slithers out of sight, his FUD-spreading complete. This is not about some Microsoft technology that nobody likes anyway; it's about lying for the sake of lying. Way too many of his posts are exactly like this one.

      More? Just read through this post and the subsequent replies. I guess this stands on its own. Or these two. Or this one. Or this one.

      Still not convinced? This is what twitter considers "humour" while going about his daily "M$" routine.

      M

    5. Re:permissions permissions permissions by Anonymous Coward · · Score: 0

      Yes precisely, don't let the search tool index pages you don't want accessed. I'm no expert but I've recently read "Innocent Code", and so it immediately jumped out at me that (part of) the answer is whitelisting or blacklisting. If the search tool doesn't support one or the other, don't use it. This will need to be done in conjunction with other common sense measures: "security in depth". If you don't want a document accessed, don't give it a predictable name, such as one based on date.
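The whitelist-plus-blacklist idea can be sketched as a filter the indexer consults before reading each file. Everything here is hypothetical: the prefixes, the patterns, and the assumption that the indexer passes in paths relative to the web root. Denying on top of allowing gives the "security in depth" the comment mentions: a stray spreadsheet inside an allowed directory is still skipped.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical policy. Whitelist: only these subtrees of the web root are indexed.
ALLOW_PREFIXES = ("public/", "press/", "docs/")
# Blacklist: never index these, even inside an allowed subtree.
DENY_PATTERNS = ("*.mdb", "*.xls", "*.bak", "*/.htaccess", "*/.htpasswd", "*private*")


def should_index(relative_path: str) -> bool:
    """Decide whether a file (path relative to the web root) may be indexed."""
    path = PurePosixPath(relative_path)
    if ".." in path.parts:
        return False  # never follow paths that climb out of a subtree
    text = path.as_posix()
    if not text.startswith(ALLOW_PREFIXES):
        return False  # anything not explicitly allowed is skipped
    return not any(fnmatch(text, pattern) for pattern in DENY_PATTERNS)
```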

  7. Interesting. Brief summary. by caryw · · Score: 4, Insightful

    Basically the article says that some site-installed search engines that simply index all the files in /var/www or whatever are insecure because they will index things that httpd would return a 401 or 403 for. Makes sense. A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files, that way .htaccess and the such would be preserved throughout.
    Does anyone know if the Google search appliance is affected by this?
    - Cary
    --Fairfax Underground: Where Fairax County comes out to play
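Cary's suggestion is to crawl through the web server instead of reading /var/www directly. It can be sketched in a few lines under these assumptions:

- `fetch` is a hypothetical callable, in practice a thin wrapper around an HTTP client pointed at http://localhost, that returns the status code and body.
- Only same-host links are followed.
- `seeds` may hold several start URLs. This also covers the later suggestion of giving the crawler an unlinked page of old content as a second starting point.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    limit: int = 1000,
) -> Dict[str, str]:
    """Index pages reachable over HTTP from the seed URLs, staying on their hosts.

    `fetch(url)` must return (status, body) exactly as the web server answers an
    anonymous visitor, so pages that get a 401/403 are never indexed.
    """
    seeds = list(seeds)
    hosts = {urlparse(url).netloc for url in seeds}
    queue, seen = deque(seeds), set(seeds)
    index: Dict[str, str] = {}
    while queue and len(index) < limit:
        url = queue.popleft()
        status, body = fetch(url)
        if status != 200:
            continue  # whatever httpd refuses to serve stays out of the index
        index[url] = body
        collector = LinkCollector()
        collector.feed(body)
        for href in collector.links:
            target = urldefrag(urljoin(url, href))[0]  # drop any #fragment
            if urlparse(target).netloc in hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return index
```

Because every document arrives through httpd, .htaccess rules and other server-side restrictions apply to the indexer just as they do to visitors. The trade-off, as XorNand points out in reply, is that the crawler misses pages nothing links to unless they are seeded.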

  8. News at 11! by tetromino · · Score: 2, Insightful

    Search engines let you find stuff! This is precisely why Google, Yahoo, and all the rest obey robots.txt. Personally, I would be amazed if local search engines didn't have their own equivalent of robots.txt that limited the directories they are allowed to crawl.

    1. Re:News at 11! by WiFiBro · · Score: 1, Insightful

      With a scripting language capable of giving directory contents and opening files (php, asp, python, etc), anyone can write such a search engine. No degree required.

    2. Re:News at 11! by WiFiBro · · Score: 1

      I forgot to say, many scripters are blissfully ignorant about most security issues.

    3. Re:News at 11! by Anonymous Coward · · Score: 0

      equivalent of robots.txt that limited the directories they are allowed to crawl.

      Given the number of people/companies that don't even configure a robots.txt, you're asking too much of the end-user.

    4. Re:News at 11! by digital+bath · · Score: 1

      Read the article. This does not apply to "external" search engines such as Google and Yahoo, only to internal search engines that access the files through the filesystem rather than through the web server. These "internal" search engines can index files that would return a 403/401 via HTTP.

      --
      find / -name "*.sig" | xargs rm
  9. sounds like fun by h4ter · · Score: 2, Funny

    The attacker first loops through all possible words in English...

    I get the idea this might take a while.

    1. Re:sounds like fun by h4ter · · Score: 2, Funny

      Wait a minute. All possible? Couldn't be satisfied with just actual words? This is going to take a lot longer than I first thought.

      (Sorry for the reply to self. It's like my own little dupe.)

    2. Re:sounds like fun by gstoddart · · Score: 1
      Wait a minute. All possible? Couldn't be satisfied with just actual words? This is going to take a lot longer than I first thought.

      Well, just record the guessed words, you might stumble on Hamlet. :-P
      --
      Lost at C:>. Found at C.
  10. Does he really mean this by iMaple · · Score: 0, Redundant

    The article says, "The attacker first loops through all possible words in English."

    I mean, isn't this a bit ridiculous? (Especially if the inaccessible file is someone's outdated personal webpage.) If it is anything useful (to a hacker or others involved in illegitimate activity), then the technique will most probably fail.
    I am not saying that there is no vulnerability (getting data from search snippets is a good idea), but the third option I just quoted above seems pretty lame.

    1. Re:Does he really mean this by Anonymous Coward · · Score: 0

      Yep, I agree. The only good application for this is to hope that someone is dumb enough to store credit card numbers, SSNs, passwords or even email addresses in some file. And none of these will turn up if you just loop through all the words.

    2. Re:Does he really mean this by Moraelin · · Score: 1

      Don't think of this step as requiring someone to sit there and type each word by hand. A script going through a dictionary file and writing the results to another file will do this step quite nicely.

      And, frankly, there aren't _that_ many words in English. Even assuming that your server is really slow and can return as few as 10 searches per second (via more than one thread, if needed), we're talking less than an hour for this script to do its job.

      And woe if someone decides to use an army of zombies to scan the whole Internet.

      Basically thinking that everything must be done by hand is _the_ antithesis and nemesis of security nowadays.

      We all laugh at users running without a firewall on the implicit assumption that "bah, nobody knows my IP address and no one is bored enough to try all IP addresses and known vulnerabilities by hand, so I'm safe". Think Blaster. No one needs to do that by hand any more. In the beginning there were script kiddie kits that you could just start in the evening and have a list of vulnerable servers by morning. But even those have been made obsolete by viruses that do the whole infection _and_ scanning for new hosts automatically and in a distributed fashion.

      So what's to keep the same from applying here?

      There's good stuff to be picked from unsuspecting companies' servers. A list of credit cards can be as good as finding oil. Or a huge list of valid and checked emails can be sold for good money to spammers. Etc. So the incentive is there.

      Do you think anyone will stop trying to get at it just because a brute-force automated attack is needed? I sure hope you don't, because you might be in for a surprise.

      --
      A polar bear is a cartesian bear after a coordinate transform.
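The dictionary loop Moraelin describes is equally useful to a site owner auditing their own search. Run every word from a wordlist through it and look at which documents come back. This is a minimal sketch: `search` is a hypothetical stand-in for a site's search form, and the auditor supplies the wordlist. The arithmetic behind "less than an hour" is simple. At 10 queries per second, one hour allows 3,600 × 10 = 36,000 queries, so any wordlist shorter than that finishes within the hour.

```python
from typing import Callable, Dict, Iterable, List, Set


def enumerate_results(
    words: Iterable[str], search: Callable[[str], List[str]]
) -> Dict[str, Set[str]]:
    """Run one query per word; map each result URL to the words that hit it.

    `search(word)` is a hypothetical wrapper around your *own* site search that
    returns the result URLs for a query.
    """
    hits: Dict[str, Set[str]] = {}
    for word in words:
        for url in search(word):
            hits.setdefault(url, set()).add(word)
    return hits


def hours_needed(word_count: int, queries_per_second: float) -> float:
    """Wall-clock hours to send one query per word at a steady rate."""
    return word_count / queries_per_second / 3600
```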
  11. Vs. Database-Driven Sites? by Eberlin · · Score: 3, Insightful

    The instances mentioned all seem to revolve around the idea of indexing files. Could the same be used for database-driven sites? You know, like the old "or 1=1" trick?

    Then again, it's about being organized, isn't it? A check of what should and shouldn't be allowed to go public, some sort of flag where even if it shows up in the result, it better not make its way onto the HTML being sent back. (I figure that's more DB-centric though)

    Last madman rant -- Don't put anything up there that shouldn't be for public consumption to begin with!!! If you're the kind to leave private XLS, DOC, MDB, and other sensitive data on a PUBLIC server thinking it's safe just because nobody can "see" it, to put it delicately, you're an idiot.
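For database-driven sites, both of Eberlin's points can live in the query. Bind user input as a parameter so the "or 1=1" trick has nothing to inject into. Filter on a publish flag so hidden rows never make it into the HTML. This is a minimal sketch using Python's built-in sqlite3 module; the `docs` table and its `is_public` column are hypothetical.

```python
import sqlite3
from typing import List


def search_titles(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return titles of published documents whose title contains `term`.

    The term is sent as a bound parameter, never pasted into the SQL text, so
    input like "' OR 1=1 --" is matched as a literal string. The is_public flag
    is checked in the query itself, so unpublished rows never reach the page.
    """
    rows = conn.execute(
        "SELECT title FROM docs WHERE is_public = 1 AND title LIKE ? ORDER BY title",
        ("%" + term + "%",),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, is_public INTEGER)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?)",
        [("Annual report", 1), ("Salaries 2005", 0)],
    )
    print(search_titles(conn, "report"))
    print(search_titles(conn, "' OR 1=1 --"))
```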

    1. Re:Vs. Database-Driven Sites? by jnf · · Score: 2, Insightful

      Thank you. That's the real security risk: not the indexing agent, but rather why there is internal documentation that is 'private' or 'confidential' within the webroot on an externally accessible webserver.

    2. Re:Vs. Database-Driven Sites? by illumin8 · · Score: 0, Troll

      If you're the kind to leave private XLS, DOC, MDB, and other sensitive data on a PUBLIC server thinking it's safe just because nobody can "see" it, to put it delicately, you're an idiot.

      Or, you're a Diebold employee...

      --
      "When the president does it, that means it's not illegal." - Richard M. Nixon
  12. Re:Interesting. Brief summary. by XorNand · · Score: 4, Insightful
    A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files, that way .htaccess and the such would be preserved throughout.
    Yes, that would be safer. But one of the powers of local search engines is the ability to index content that isn't linked elsewhere on the site, e.g. old press releases, discontinued product documentation, etc. Sometimes you don't want to clutter up your site with irrelevant content, but you want to allow people who know what they're looking for to find it. This article isn't really groundbreaking. It's just another example of how technology can be a double-edged sword.
    --
    Entrepreneur : (noun), French for "unemployed"
  13. sort of like googling for visa cards by Triumph+The+Insult+C · · Score: 1

    4750 ....

    --
    vodka, straight up, thank you!
    1. Re:sort of like googling for visa cards by Anonymous Coward · · Score: 0

      It doesn't work very well now that so many people know about it.

  14. Re:Interesting. Brief summary. by tetromino · · Score: 4, Informative

    Does anyone know if the Google search appliance is affected by this?

    No. First of all, the Google Search Appliance crawls over http, and therefore obeys any .htaccess rules your server uses. Second, you can set it up so that users need to authenticate themselves. Third, there are many filters you can set up to prevent it from indexing sensitive content in the first place (except that since any sensitive content the google appliance indexes must already be accessible via an external http connection, one hopes it's not too sensitive).

  15. Isn't this by jacksonj04 · · Score: 1

    by design? Surely something with permission to index internal files (even those specified to give 403s etc) is inherently designed to make them available to view.

    Either that, or it's a user error (configuration).

    --
    How many people can read hex if only you and dead people can read hex?
  16. that gives me an idea... by edeus · · Score: 1

    Is it possible, given the time and perseverance, to exploit a vulnerability in a search engine's parsing of a webpage that, say, you maliciously published somewhere? Obviously one would expect Google and the like to have good security (well, apart from the Gmail exploit and... well, let's not go there), so I was curious: has it ever been done? (ponders)

  17. search indexer = magic by EvilSheep · · Score: 1

    Summary: if you are going to use magic to index your web site, be smart about it. Don't just blindly use a tool that "does the job".

    Nothing new here.

    --
    ---
  18. obvious? by jnf · · Score: 5, Insightful

    I read the article and it seems like a good chunk of today's security papers: 'here's a long, drawn-out explanation of the obvious'. I suppose it wasn't as long as it could be, but really... using a search engine to find a list of files on a website? I suppose someone has to document it.

    I mean, I understand it's a little more complex as described in the article, but I would hardly call this a 'new web application attack'. At best it's one of those humorous advisories where the author overstates things and makes much ado about nothing. Or at least that's my take.

    -1 not profound

  19. does this mean more PRON? by jephthah · · Score: 2, Funny

    bastards always hiding their stash. this'll show 'em

  20. P2P by Turn-X+Alphonse · · Score: 4, Interesting

    Go to any P2P network and type @hotmail.com, @Gmail.com or @yahoo.com and see what documents turn up. I'm willing to put money on them all being e-mails saved on idiots' PCs, which will contain everything from stuff to sell to spammers (if you're so inclined) to sexual stuff and passwords/credit card info.

    Nothing really new here..

    --
    I like muppets.
    1. Re:P2P by 12+inch+pianist · · Score: 0

      "Resume" is another fun p2p search. Usually has the name, address, and phone number. Then browse host and check out their kiddie pr0n collection.

    2. Re:P2P by mibus · · Score: 1

      That should give you plenty of cookies with authentication info...

      Search for the right extension and you're likely to find MSN Messenger logs from people who have shared out all of "My Documents" without thinking!

    3. Re:P2P by glesga_kiss · · Score: 1
      Outlook *.pst files are another interesting one to search for. And most cameras prefix all photographs with something, e.g. DSGXXXXXX.jpg, so you can search for them.

      One interesting thing to note is that the site spammers are onto these things already. The photo one now pulls in lots of sample advert images for adult sites, as did a couple of the older searches that are linked on the site the article refers to.

  21. Re:Interesting. Brief summary. by Qzukk · · Score: 4, Interesting

    If you could give the crawler multiple starting points then you could simply have an unlinked page that links to all the old content, and give that page to the crawler as a second starting point.

    --
    If I have been able to see further than others, it is because I bought a pair of binoculars.
  22. Uh huh.... by conran · · Score: 1

    "Reconstructing" files by searching every word in the english language in different orders? I want the last 5 minutes of my life back...

    1. Re:Uh huh.... by SharpFang · · Score: 1

      Did you RTFA?

      Search foo. You get: .. first version of Foo, the world leading ...
      Then search just the above. You get: ... to release the first version of Foo, the world leading anti-gravity engine ...
      Repeat... ... We are happy to release the first version of Foo, the world leading anti-gravity engine that works on ...
      Doesn't sound too hard?

      Of course the length is limited, but that can be solved with a "moving frame." Say that, given the above, the engine says your query is too long.
      Search: "anti-gravity engine that works on" and get
      "... world leading anti-gravity engine that works on salted water and cheap..."
      Then put "works on salted water and cheap" and get
      "...engine that works on salted water and cheap components like..."
      Search "water and cheap components like" and so on...

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
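SharpFang's moving-frame walk can be tried offline against a toy engine. This sketch makes several assumptions:

- The engine is hypothetical. It returns a few words of context around the first exact phrase match and rejects queries longer than five words.
- The window size is arbitrary.

The walk only moves forward from the seed. A phrase that occurs twice in the document could send it down the wrong branch, and `max_steps` bounds that. As conran notes in reply, none of this helps when the engine shows no excerpts at all.

```python
from typing import Callable, List, Optional

Search = Callable[[str], Optional[List[str]]]


def make_snippet_engine(document: str, context: int = 3, max_query_words: int = 5) -> Search:
    """Toy, hypothetical site search over one hidden document.

    search(query) returns up to `context` words on each side of the first
    exact phrase match, or None if there is no match or the query is too long.
    """
    words = document.split()

    def search(query: str) -> Optional[List[str]]:
        q = query.split()
        if not q or len(q) > max_query_words:
            return None  # "your query is too long"
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] == q:
                return words[max(0, i - context): i + len(q) + context]
        return None

    return search


def reconstruct_forward(seed: str, search: Search, window: int = 4, max_steps: int = 1000) -> str:
    """Extend known text to the right, re-querying with its last `window` words."""
    known = seed.split()
    for _ in range(max_steps):
        frame = known[-window:]
        snippet = search(" ".join(frame))
        if snippet is None:
            break
        # Find the frame inside the snippet; the words after it are new text.
        starts = [i for i in range(len(snippet) - len(frame) + 1)
                  if snippet[i:i + len(frame)] == frame]
        if not starts:
            break
        tail = snippet[starts[0] + len(frame):]
        if not tail:
            break  # reached the end of the document
        known.extend(tail)
    return " ".join(known)
```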
    2. Re:Uh huh.... by conran · · Score: 2, Informative

      Did you RTFA?

      Yep. Did you keep reading it? I'm referring to the methods for when no excerpts are given.

  23. RTFM by Tuross · · Score: 5, Informative

    My company specialises in search engine technology (for almost a decade now). I've worked quite in-depth with all the big boys (Verity, Autonomy, FAST, ...) and many of the smaller players too (Ultraseek, ISYS, Blue Angel, ...)

    I can't recall the last time this kind of attack wasn't mentioned in the documentation for the product, along with instructions on how to disable it. If you choose to ignore the product documentation, you get what you deserve.

    It's quite simple folks. Don't open the search engine. ACL query connections. Sanitize queries like you (should?) do other CGI applications. Authenticate queries and results. If you can't be bothered, hire someone who can.

    --
    Matt
    1. Read Slashdot
    2. ???
    3. Profit
  24. That'll make it easy... by Jaidon · · Score: 0

    ...to find all the "free sample" pr0n hidden in the maze of otherwise unintelligble directories. In the end, isn't that what the Internet is all about -- finding more efficient ways to see boobies? Yes...yes I think so.

  25. Re:Interesting. Brief summary. by BigGerman · · Score: 4, Interesting

    This is even more important when a search engine (appliance) is capable of crawling the file shares directly (not just over HTTP).
    The EnterFind appliance (which I participated in developing) has this (still unique) feature, and its clients were amazed by what the crawler could dig out, especially in those "hidden" fields in Office documents.

  26. Assumptions by shird · · Score: 1

    All these "attacks" assume the indexing program will index and return results for files you don't have access to.

    I'm pretty sure the indexing server on Windows won't return 'search results' for files you don't have permissions to list, as with any other sensible indexing scheme, except perhaps the newer silly 'desktop search' tools. Seems pretty obvious to me.

    --
    I.O.U One Sig.
    1. Re:Assumptions by SharpFang · · Score: 2, Informative

      I'm pretty sure the indexing server on Windows won't return 'search results' for files you don't have permissions to list.
      The problem and vulnerability lie in the definition of "you".
      The indexing program runs with the privileges of a local user with direct access to the hard drive: listing directory contents, reading user-readable files. Here "you" are that user, like the one behind the console, maybe without access to sensitive system files, but with access to almost everything in the htroot tree that the administrator hasn't blocked using OS permissions rather than httpd features.
      As a webpage visitor, "you" are "guest", filtered through httpd with all httpd restrictions applied: no directory listing, and obscure blocking methods (.htaccess, config files, redirects, CGI wrapping) in effect. Your access is limited to what httpd lets you do, not just what the OS does. Now if you access the search engine database, you can see almost everything the engine saw, including things it wouldn't see if it were running through httpd instead of directly accessing the filesystem.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    2. Re:Assumptions by shird · · Score: 1

      Yes, the indexing service may have access to everything. That's why I said it won't return search results for files *you* don't have permissions to list.

      I.e., the indexing service checks the permissions of the requesting user and only lists files they would be able to list in the OS. It's only common sense.

      --
      I.O.U One Sig.
    3. Re:Assumptions by robertwall · · Score: 1

      The article's talking about search engines that are run locally on websites, not indexing services on local computer terminals.

  27. application in porn by Anonymous Coward · · Score: 1, Funny

    My mind being the way it is, I can't help but think of an application for this in porn ;). A lot of porn sites have extensive free previews, but it's hard to find all the free preview pics for a certain site (especially useful for a single model's site) unless you can find a direct link to every single unique free preview gallery from somewhere, and you'll undoubtedly miss some good stuff. I want to see a Firefox extension that gets me all the free pics from a given site, dammit!

  28. Re:Mozilla Firefox fucking sucks by Anonymous Coward · · Score: 2, Insightful

    Oh, we are terribly sorry for taking so long!
    Don't worry, we will give you a full refund.

  29. Google Hacks Database by giant_toaster · · Score: 5, Informative

    I guess a lot of people have seen this site before, but http://johnny.ihackstuff.com/index.php?module=prodreviews has a lot of these Google exploits etc.; he is posting them up so people can check if their sites are secure. There are some interesting presentations by him on the main site about how search engines can be exploited.

    1. Re:Google Hacks Database by veg_all · · Score: 1

      he is posting them up so people can check if their sites are secure

      Uh-huh. I imagine most of his readers are using them to make sure everyone else's site is secure : )

      --
      grammar-lesson free since 1999. (rescinded - 2005)
    2. Re:Google Hacks Database by Anonymous Coward · · Score: 0

      In case you didn't know, it's actually a her.

    3. Re:Google Hacks Database by giant_toaster · · Score: 1

      "Who's johnny? ... Secondly, I am a family guy. I am very close to my family and make them the second-highest priority in my life." (http://johnny.ihackstuff.com/modules.php?op=modload&name=FAQ&file=index&myfaq=yes&id_cat=1) I'm sure he's a guy?

  30. Speaking of firefox by ad0gg · · Score: 4, Interesting

    Another exploit came out this weekend. The funny thing is that Microsoft AntiSpyware Beta 1 detects the execution of the payload file and shows a prompt asking whether you want to continue or stop the execution.

    --

    Have you ever been to a turkish prison?

    1. Re:Speaking of firefox by irix · · Score: 1

      Another exploit came out this weekend.

      I don't think it is so new - it is fixed by 1.0.1. From the description:

      Status: The exploit is based on multiple vulnerabilities: bugzilla.mozilla.org #280664 (fireflashing), bugzilla.mozilla.org #280056 (firetabbing), and bugzilla.mozilla.org #281807 (firescrolling). Upgrade to Firefox 1.0.1 or disable JavaScript.
      --

      Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
    2. Re:Speaking of firefox by Chris+Kamel · · Score: 1

      The funny thing is that Microsoft AntiSpyware Beta 1 detects the execution of the payload file and shows a prompt asking whether you want to continue or stop the execution.
      Now, what's funny about that? Should it always be the other way round? Yeah, I know this is against the "majority mindset", as someone just said. I don't care.

      --
      The following statement is true
      The preceding statement is false
  31. New option for robots.txt by michelcultivo · · Score: 5, Funny

    Please put this new undocumented tag on your robots.txt file: "hackthis=false" "xss=false" "scriptkiddies=log,drop" And all your problems will be solved.

    1. Re:New option for robots.txt by greyhoundpoe · · Score: 1

      New option for robots.txt (Score:3, Interesting)
      Please put this new undocumented tag on your robots.txt file: "hackthis=false" "xss=false" "scriptkiddies=log,drop" And all your problems will be solved.


      Note to mods: *slap*

  32. Re:Interesting. Brief summary. by Grax · · Score: 4, Insightful

    On a site with mixed security levels (i.e. some anonymous and some permission-based access) the "proper" thing to do is to check security on the results the search engine is returning.

    That way an anonymous user would see only results for documents that have read permissions for anonymous while a logged-in user would see results for anything they had permissions to.

    Of course this idea works fine for a special purpose database-backed web site but takes a bit more work on just your average web site.

    Crawling the site via localhost:80 is the most secure method for a normal site. This would index only documents available to the anonymous user already and would ignore any unlinked documents as well.
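    One way to do the result-side check described above is to store, next to each index entry, the set of principals allowed to read the document, and filter hits against the visitor's identity. This is a minimal Python sketch: `IndexedDoc`, `search_with_acl`, and the `"anonymous"` principal are hypothetical names, and a real engine would have to derive each reader set from the web server's or OS's access rules at index time.

```python
from dataclasses import dataclass

ANONYMOUS = "anonymous"  # hypothetical principal that every visitor has


@dataclass(frozen=True)
class IndexedDoc:
    path: str
    text: str
    readers: frozenset  # principals allowed to read this document


def search_with_acl(index, query, user=ANONYMOUS, groups=()):
    """Return paths of matching documents that the requesting user may read."""
    principals = {ANONYMOUS, user, *groups}
    q = query.lower()
    # A hit is kept only if the visitor shares at least one principal with the reader set.
    return [doc.path for doc in index
            if q in doc.text.lower() and principals & doc.readers]


index = [
    IndexedDoc("/public/faq.html", "shipping faq", frozenset({ANONYMOUS})),
    IndexedDoc("/hr/salaries.xls", "salary faq for staff", frozenset({"hr"})),
]
print(search_with_acl(index, "faq"))                               # ['/public/faq.html']
print(search_with_acl(index, "faq", user="alice", groups=["hr"]))  # ['/public/faq.html', '/hr/salaries.xls']
```

    The anonymous principal is added for everyone, so logged-in users see public documents plus whatever their groups unlock.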

  33. Re:Mozilla Firefox fucking sucks by Anonymous Coward · · Score: 0
    I want a translation in my language


    Yuhn. we wannt the langwich opshun fer "idiot" ...

  34. how to solve by foo(foo(foo(bar))) · · Score: 1


    1) write your own web applications
    2) Use lucene
    3) only index what you want to index
    4) ????
    5) profit

  35. RTFA by SharpFang · · Score: 1

    The problem is that these are perfectly legal search engine queries. No matter how you "sanitize" the queries, that won't help, because they contain valid requests. The vulnerability lies on the side of the indexing program, not the query/search/display one. The indexer indexes things it shouldn't: files normally inaccessible through httpd are accessible in the search database.

    One method I see for that would be to run the indexing through httpd, making even local indexing go the same way remote indexing does: not indexing /var/www/... but http://localhost/. This way the indexer won't be able to access anything a common user can't.
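    A crawler that goes through httpd instead of walking /var/www can be sketched with the Python standard library. `crawl`, `http_fetch`, and the pluggable `fetch` parameter are hypothetical names (the hook just makes the sketch testable without a server). It follows only `<a href>` links on the same host, treats every response as text, and anything the server answers with 401/403/404 is simply never indexed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def http_fetch(url):
    """Fetch a page through httpd; 401/403/404 and other HTTP errors yield None."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError:
        return None


def crawl(start="http://localhost/", fetch=http_fetch, limit=1000):
    """Index only what an anonymous HTTP client reaches by following links."""
    host = urlparse(start).netloc
    seen, queue, index = {start}, deque([start]), {}
    while queue and len(index) < limit:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # refused by the server, so it never enters the index
            continue
        index[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

    Unlinked files are never discovered at all, which is the point; the trade-off is that anything the server grants to requests from localhost itself still gets indexed.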

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    1. Re:RTFA by Tuross · · Score: 1

      Maybe, just maybe, someone wants to see a file that's inaccessible by anyone else (or perhaps limited to a select few). Like, personal info, classified information (be it military classification or simply commercial-in-confidence), employment records, blah blah blah blah. Most search engines handle this, as I mentioned before, through various means that are more or less secure.

      You are implying that search engines should only index public information, essentially crippling their usefulness. Glad you don't work here.

      (and FWIW I did RTFA.)

      --
      Matt
      1. Read Slashdot
      2. ???
      3. Profit
  36. This is old. by brennz · · Score: 4, Insightful

    Why is this being labeled as something new? I remember this being a problem back in 1997 when I was still working as a webmaster.

    Whoever posted this as a "new" item is behind the times.

    OWASP covers it!

    Let's not rehash old things!

    1. Re:This is old. by lux55 · · Score: 1

      Not to be all "I'm so smart", but isn't this also rather obvious? If you're indexing private documents, don't return private results for public visitors. Simple as that.

      All it takes to implement this is an "access level" field stored with each index entry, and assigning an "access level" session value to each visitor (defaulting to 0 for anonymous visitors).

      Plus, this way you'll avoid pissing off visitors who click on essentially broken links in their search results.

      No wonder the search capabilities of most sites are rated so poorly...
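      The access-level scheme above fits in a few lines of Python. `Entry`, `search_by_level`, and the `"access_level"` session key are hypothetical names. The sketch assumes levels are strictly ordered, so a visitor sees every entry at or below their own level. That is simpler than per-group permissions, but it cannot express "HR only" versus "finance only".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    url: str
    text: str
    access_level: int  # minimum level required to see this entry; 0 = public


def visitor_level(session):
    """Anonymous visitors have no stored level and default to 0."""
    return session.get("access_level", 0)


def search_by_level(entries, query, session):
    level = visitor_level(session)
    q = query.lower()
    return [e.url for e in entries if q in e.text.lower() and e.access_level <= level]


entries = [
    Entry("/news.html", "quarterly results", 0),
    Entry("/board/minutes.doc", "draft quarterly results", 2),
]
print(search_by_level(entries, "quarterly", {}))                   # ['/news.html']
print(search_by_level(entries, "quarterly", {"access_level": 2}))  # ['/news.html', '/board/minutes.doc']
```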

  37. Why bother with phisching scams... by B747SP · · Score: 3, Interesting
    This is hardly news to me. When I need a handy-dandy credit card number with which to sign up for one of those, er, 'adult hygiene' web sites, I just google for a string like "SQL Dump" or "CREATE TABLE" or "INSERT INTO" with filetype:sql and reap the harvest. No need to piss about with hours of spamming, setting up phishing hosts, etc., etc. :-)

    --
    I find your ideas intriguing and I wish to subscribe to your newsletter.
    1. Re:Why bother with phisching scams... by NaDrew · · Score: 1

      Your ideas intrigue me, and I wish to subscribe to your newsletter.

      --
      Vista:XPSP2::ME:98SE
  38. solution by Anonymous Coward · · Score: 3, Insightful

    Here's a solution that's been tried and seems to work: create metadata for each page as an XML/RDF file (or DB field). XPath can be used to scrape content from HTML et al. to automate the process, as can capture from a CMS or other document management solutions. Create a manifest per site or subsite that is an XML-RDF tree structure containing references to the metadata files and mirroring your site structure. Finally, assuming you have an API for your search solution (and don't b*gger around with ones that don't), code the indexing application to parse only the XML-RDF files, beginning with the structural manifest and then working down into the metadata files. Your index will then contain relevant data, site structure, and, thanks to XPath, hyperlinks for the web site. No need to traverse the HTML directly, and it is still standards-based. Permissions only need to give the indexer access to the XML-RDF files, which means only process permissions matter; user permissions are irrelevant.

    There are variations and contingencies, but the bottom line is that even if someone cracked into the location of an XML metadata file, it's not the data itself. It may reveal a few things about the page or file it relates to, but it is certainly much less of a risk than full access to other file types on the server.

    Here's another tip for free: because you now have metadata in RDF, with a few more lines of code you can output it as RSS.
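    The indexing side of this approach can be sketched with Python's `xml.etree.ElementTree`. The manifest and metadata layout below is a simplified, hypothetical XML format standing in for real RDF (no namespaces or triples), `build_index` is an illustrative name, and the XPath scraping step that produces the metadata files is not shown. The indexer reads only these two kinds of files, never the pages themselves.

```python
import xml.etree.ElementTree as ET

MANIFEST = """\
<site>
  <section title="Products">
    <page meta="meta/widget.xml"/>
  </section>
</site>"""

METADATA = {
    "meta/widget.xml": """\
<page url="/products/widget.html">
  <title>Widget</title>
  <description>A blue widget</description>
</page>""",
}


def build_index(manifest_xml, load_metadata):
    """Build index entries from the manifest and metadata files only, never the HTML."""
    index = []
    for section in ET.fromstring(manifest_xml).iter("section"):
        # findall() takes direct children only, so nested sections are not double-counted.
        for ref in section.findall("page"):
            meta = ET.fromstring(load_metadata(ref.get("meta")))
            index.append({
                "section": section.get("title"),
                "url": meta.get("url"),
                "title": meta.findtext("title", default=""),
                "description": meta.findtext("description", default=""),
            })
    return index


print(build_index(MANIFEST, METADATA.__getitem__))
```

    Passing `load_metadata` as a function keeps the file access in one place, so it can be restricted to the metadata directory.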

  39. I just... by Anonymous Coward · · Score: 0

    ...let j0hnny do all the work for me.

    I mean with the 0 in his name and everything, I know he's good.

  40. Re:Interesting. Brief summary. by Anonymous Coward · · Score: 0

    Crawling over HTTP with a single privilege level would address this. Multiple privilege levels are exactly the problem at hand. Presumably the crawler has a tasty privilege level...

  41. WASC is looking for content authors by Anonymous Coward · · Score: 0
  42. It's just more news you won't see on /. by Anonymous Coward · · Score: 0

    Firefox having another exploit, and Microsoft's new beta software fixing it. You won't see it on Slashdot's front page.

    Posting anon because this is both off-topic and against the majority mindset.

    1. Re:It's just more news you won't see on /. by Anonymous Coward · · Score: 0

      Posting anon because this is both off-topic and against the majority mindset.

      you rebel you

  43. credit where due.... by Anonymous Coward · · Score: 0

    Anonymous? I sent that in and I demand recognition!

  44. Why worry about this? by Anonymous Coward · · Score: 1, Insightful

    Anything I put on a publicly accessible web server, I want publicly accessible, and I want it to be as easily accessed as possible.

    Anything else goes on a pocket network or not at all.

    The only exception would be an order form, and that will be very narrowly designed to do exactly one thing securely.

  45. Should search be addressed at file system level by BrianUofR · · Score: 1

    What if the file system supported an index attribute that proper search programs (Windows search, Google Desktop, Unix locate, etc.) could respect?

    chmod -i file

    With the search vendors racing to own desktop search and Microsoft working on WinFS, is "indexability" now an important security attribute for a file?

    1. Re:Should search be addressed at file system level by mrmagos · · Score: 1
      Why not just chmod 770 the directory that contains the file? If the directory is inaccessible to those without permission, its contents can't be viewed or indexed. Just be wary of whom you're giving permission to, as you already (should) be doing. There's no need to add another file attribute.
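      An indexer can honour the existing permission bits instead of a new attribute. This is a minimal, POSIX-only Python sketch (`world_accessible_files` is a hypothetical name) that picks up only what an account outside the file's owner and group could open. It assumes the web server runs as such an account, and it ignores ACLs and the permissions of directories above `root`.

```python
import os
import stat


def world_accessible_files(root):
    """Yield files under root that an unprivileged "other" user could read.

    Directories need the others-execute bit to be entered and files need the
    others-read bit, which is what a low-privilege httpd account relies on.
    """
    if not os.stat(root).st_mode & stat.S_IXOTH:
        return
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune, in place, subdirectories that "others" cannot enter.
        dirnames[:] = [d for d in dirnames
                       if os.stat(os.path.join(dirpath, d)).st_mode & stat.S_IXOTH]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mode & stat.S_IROTH:
                yield path
```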

      --
      Help me help you get a free mini Mac.

      --
      Never start vast projects with half-vast ideas.
  46. To those who put sensitive documents on web server by MaGogue · · Score: 1

    ... leave IT decisions to engineers, not the managers!
    Once upon a time, intelligent people were responsible for computers and IT.
    Now, it's either a manager, or a bunch of kids ("web developers") who don't know what they are playing with.
    Of course there are plenty of exploits waiting to be discovered that WILL get those documents off your web server... UNLESS you are smart enough to keep them elsewhere.
    I realize this is flamebait as good as it gets, but please understand that I will just duck. It was not intended as such.

  47. Re:Interesting. Brief summary. by viktor · · Score: 1

    A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files; that way .htaccess and the like would be respected throughout.

    That would not help much. Most sites serve different content depending on the IP address accessing it, i.e. internal IPs get content that external IPs cannot access. Crawling on localhost:80 would exclude the non-linked files, but would still give the search engine access to a lot of content that should not be indexed.

    The only safe crawler is one that is located outside your network.

    What really scares me, though, is that this idea is somehow seen as new. It is blatantly obvious that one does not get good or proper results by indexing files locally. For example, you get an index of your PHP script's source code (including the database passwords they likely contain) instead of the output from them. And it doesn't follow any .shtml includes etc. either.

    Even the fact that a search engine crawler running from an internal IP will be able to access and index content that shouldn't be externally available is very obvious.

    What the article possibly adds is a list of ideas about what to search for in the affected organization's index. But I wouldn't consider the idea new in any way.

    This, or rather its sibling with internal IPs, was something that we designed the robots.txt file for back in '97 when our university bought its first search engine. I refuse to believe that nobody has written an article about this idea until now.

    But if this is the first article about this, and if people actually find it interesting and revealing, then it was really fortunate that it got written now rather than in ten more years.
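    An internal crawler that honours such rules can use Python's standard `urllib.robotparser`. The rules, the `site-indexer` user agent, and the example.edu URLs are made up for illustration, and `crawlable` is a hypothetical helper. Keep in mind that robots.txt only binds well-behaved crawlers, and it is itself publicly readable, so it can advertise the very paths it lists.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping the indexer out of internal-only areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
Disallow: /cgi-bin/
"""


def crawlable(urls, robots_txt=ROBOTS_TXT, agent="site-indexer"):
    """Filter a URL list down to what robots.txt allows for this crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


print(crawlable([
    "http://www.example.edu/index.html",
    "http://www.example.edu/intranet/staff.html",
]))
```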

  48. In Other News ..... by Anonymous Coward · · Score: 0

    You can get some idea of just how easy it is to commit "identity theft" by visiting this secret URL.

    On that basis alone, I'm not massively bothered about putting intact gas bills &c. in the recycling. Other people's identities are easier to steal! {And besides, there are CCTV cameras to make sure nobody is putting the wrong stuff in the wrong bin ..... so they probably would see someone taking stuff out .....}

  49. yes and no by Moraelin · · Score: 1

    The old break-out-of-quotes trick is IMHO a different kind of vulnerability, in that it's really a programming bug. There is no reason for that happening, other than a programmer being too stupid/ignorant to escape quotes (or, for most burger-flippers-turned-programmers, to even know that it's possible to escape quotes or to use prepared statements). For that matter, they are also too ignorant to know that the "LIKE" operator isn't really a full-text search engine.

    The search index problem is similar, but not quite. The search machine works as intended, it just has access to more data than the site owners realize.

    Now, it _can_ also be a programmer error, but most often it's a design error. People just haven't given any thought to security there, and thus implemented a system that is broken as designed.

    You'd be surprised how quickly people can skip over any security considerations. Especially when they can find half an excuse. Even a stupid one, like "but we don't link to that file, so it's safe." Or worse, "but we're using SSL and we're behind a firewall, so _of_ _course_ we're secure. No need to worry about security."
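    The difference between pasting a quote-breaking search term into SQL and binding it as a parameter can be shown with Python's built-in `sqlite3`. The `docs` table, its rows, and both search functions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, public INTEGER)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [("Press release", 1), ("Layoff plan", 0)])


def search_unsafe(term):
    # BUG: the term is spliced into the SQL text, so a quote in it ends the literal.
    sql = f"SELECT title FROM docs WHERE public = 1 AND title LIKE '%{term}%'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(term):
    # The driver binds the term as data; quotes in it cannot change the query.
    sql = "SELECT title FROM docs WHERE public = 1 AND title LIKE ?"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


attack = "' OR 1=1 --"
print(search_unsafe(attack))  # leaks the non-public row as well
print(search_safe(attack))    # []
```

    With the unsafe version, the injected `OR 1=1` overrides the `public = 1` filter and the trailing `--` comments out the leftover `%'`.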

    --
    A polar bear is a cartesian bear after a coordinate transform.
  50. Twitter: Life and times of a petulant cock-gobbler by Anonymous Coward · · Score: 0

    Twitter, you're a petulant cock-gobbling sycophant to Linux Torvaldyos! Quit taking DP from ESR and RMS's feculent cocks and why don't you try to stop sucking quite so much? Get out of your parents' basement and see the real world - maybe then you'll see how pathetic you sound, with your neverending stream of bullshit about how Microsoft is stalking you. Wasn't it you who said that Microsoft believes your insane ranting is actually a threat to them, so they PAY PEOPLE to reply to you on Slashdot? No sir, I don't get any money. I do it for the love. Someone has to go up against your paranoid whining. So get back in your cage and shut the fuck up already.