New Web Application Attack - Insecure Indexing

Posted by timothy on Monday February 28, 2005 @11:53AM from the trawling-for-patterns dept.

An anonymous reader writes "Take a look at 'The Insecure Indexing Vulnerability - Attacks Against Local Search Engines' by Amit Klein. This is a new article about 'insecure indexing.' It's a good read -- shows you how to find 'invisible files' on a web server and moreover, how to see contents of files you'd usually get a 401/403 response for, using a locally installed search engine that indexes files (not URLs)."

but its fixed in firefox now by Prophetic_Truth · 2005-02-28 11:54 · Score: 2, Funny

right?

--
time is a perception of a being's consciousness
time is your 6th sense, the wierd ones are 7+
1. Re:but its fixed in firefox now by jacquesm · 2005-02-28 12:05 · Score: 2, Insightful
  
  Sure, and Konqueror never had it :)
  
  That's all nice and good, but personally I think files that were never meant to be indexed make for the best reading by far!
  
  --
  MP3 Search Engine
should have been from.... by Anonymous Coward · 2005-02-28 11:55 · Score: 5, Funny

the department-of-the-bleedingly-obvious...
1. Re:should have been from.... by tagish · 2005-02-28 20:24 · Score: 2, Insightful
  
  Bleedingly obvious and written in sufficiently pompous style that you feel obliged to read the whole thing just to verify that there really is nothing there that hasn't been common knowledge for the better part of the last decade.
  
  Of course in those days people actually built their sites using static HTML...
  
  --
  Andy Armstrong
and don't forget... by DrKyle · 2005-02-28 11:58 · Score: 4, Interesting

to see if you can get the site's robots.txt as the files/directories in that file are sometimes full of goodies.
1. Re:and don't forget... by DrSkwid · 2005-03-01 04:24 · Score: 2, Insightful
  
  Incidentally, it also breaks properly-designed retrieval mechanisms
  
  If they break, how can they be properly designed?
  
  --
  There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
indexing google by page275 · 2005-02-28 11:59 · Score: 5, Interesting

Even though this is about internal indexing, it reminded me of old-fashioned Google indexing: search Google with some sensitive terms such as: 'index of /' *.pdf *.ps
1. Re:indexing google by Neil+Blender · 2005-02-28 12:09 · Score: 2, Informative
  
  Even though this is about internal indexing, it reminded me of old-fashioned Google indexing: search Google with some sensitive terms such as: 'index of /' *.pdf *.ps
  
  This is an excellent trick for searching for porn (i.e. "index of /" lesbian).
2. Re:indexing google by ikkonoishi · 2005-02-28 12:15 · Score: 2, Interesting
  
  intitle:"axis storpoint CD" intitle:"ip address"
  
  DVD/CD servers...
permissions permissions permissions by Capt'n+Hector · 2005-02-28 12:00 · Score: 4, Insightful

Never give web-executable scripts more permissions than absolutely required. If the search engine has permission to read sensitive documents, and web users have access to this engine... well duh. It's just common sense.

--
Quid festinatio swallonis est aetherfuga inonusti?
Africus aut Europaeus?
1. Re:permissions permissions permissions by WiFiBro · 2005-02-28 12:11 · Score: 4, Insightful
  
  The first paragraphs of this document describe how to get to files which are not public. So you also need to take the sensitive files out of the public directory, which is easy but hardly ever done. (You can easily write a script that serves files from non-public directories to those entitled to them.)
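
A minimal sketch of such a script's core check, assuming a hypothetical private directory outside the web root and a hypothetical per-file access list (none of these names come from the article). It only decides which path may be served; wiring it into a CGI script or web framework is left out.

```python
from pathlib import Path

# Hypothetical layout: sensitive documents live outside the web root.
PRIVATE_DIR = Path("/srv/private-docs")

# Hypothetical access list: file name -> users allowed to read it.
ACL = {
    "q4-forecast.xls": {"alice", "bob"},
    "press-draft.doc": {"alice"},
}

def resolve_private_file(name, user, base_dir=PRIVATE_DIR, acl=ACL):
    """Return the path to serve, or raise PermissionError."""
    if user not in acl.get(name, set()):
        raise PermissionError(f"{user!r} may not read {name!r}")
    base = base_dir.resolve()
    target = (base / name).resolve()
    # Reject names such as "../../etc/passwd" that escape the base directory.
    if base not in target.parents:
        raise PermissionError("path escapes the private directory")
    return target
```

The point of the design is that httpd never maps a URL onto these files, so the only way to reach them is through the permission check.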
Interesting. Brief summary. by caryw · 2005-02-28 12:00 · Score: 4, Insightful

Basically the article says that some site-installed search engines that simply index all the files in /var/www or wherever are insecure because they will index things that httpd would return a 401 or 403 for. Makes sense. A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files; that way .htaccess and the like would be respected throughout.
Does anyone know if the Google search appliance is affected by this?
- Cary
--Fairfax Underground: Where Fairax County comes out to play
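
The crawl-over-HTTP idea can be sketched roughly as follows. This is an illustration under assumptions, not how any particular product works: `crawl`, `http_fetch` and `LinkParser` are hypothetical names, and the fetch function is injectable so the logic can be tried without a live server. Anything the web server refuses (401, 403, 404) simply never enters the index.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)

def http_fetch(url):
    """Return the page body, or None if the server refuses or fails."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # HTTPError and URLError are OSErrors
        return None

def crawl(start_url, fetch=http_fetch, limit=1000):
    """Index only what an anonymous HTTP client can reach by following links."""
    host = urlparse(start_url).netloc
    seen, queue, index = set(), [start_url], {}
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue  # refused by httpd, so it stays out of the index
        index[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            nxt = urljoin(url, href).split("#")[0]
            if urlparse(nxt).netloc == host:
                queue.append(nxt)
    return index
```

Calling `crawl("http://localhost:80/")` would use the real fetcher; passing a dictionary's `get` method as `fetch` is enough to experiment offline.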
News at 11! by tetromino · 2005-02-28 12:01 · Score: 2, Insightful

Search engines let you find stuff! This is precisely why Google, Yahoo, and all the rest obey robots.txt. Personally, I would be amazed if local search engines didn't have their own equivalent of robots.txt that limited the directories they are allowed to crawl.
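
For illustration, here is roughly how an indexer could honour robots.txt using Python's standard `urllib.robotparser`. The rules, the `local-indexer` agent name and the URLs are made up for the example. Keep in mind that robots.txt is advisory and publicly readable, so it is not an access control.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real indexer would read the site's own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

def build_filter(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def allowed_urls(parser, urls, agent="local-indexer"):
    """Keep only the URLs the rules allow this agent to fetch."""
    return [u for u in urls if parser.can_fetch(agent, u)]
```

An indexer would run every candidate URL through `allowed_urls` before reading the file behind it.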
sounds like fun by h4ter · 2005-02-28 12:04 · Score: 2, Funny

The attacker first loops through all possible words in English...

I get the idea this might take a while.
1. Re:sounds like fun by h4ter · 2005-02-28 12:07 · Score: 2, Funny
  
  Wait a minute. All possible? Couldn't be satisfied with just actual words? This is going to take a lot longer than I first thought.
  
  (Sorry for the reply to self. It's like my own little dupe.)
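
For what it's worth, the reason the word loop works at all can be shown with a toy, in-memory stand-in for a file-based search engine. Everything here (`INDEX`, `search_hits`, `words_leaked`) is hypothetical and only models the idea: even without excerpts, each hit or miss tells you whether a word occurs in a file you cannot fetch.

```python
# Toy stand-in for a local search engine that indexes files straight from disk.
INDEX = {
    "/index.html": "welcome to our public site",
    "/private/plan.txt": "merger budget secret",  # httpd would answer 403 here
}

def search_hits(word):
    """Return the paths whose text contains the word (no excerpts shown)."""
    return {path for path, text in INDEX.items() if word in text.split()}

def words_leaked(path, wordlist):
    """Words from wordlist that the search engine confirms appear in path."""
    return [w for w in wordlist if path in search_hits(w)]
```

With a realistic dictionary this is slow, as noted above, but each query is cheap and the loop is trivially automated.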
Vs. Database-Driven Sites? by Eberlin · 2005-02-28 12:08 · Score: 3, Insightful

The instances mentioned all seem to revolve around the idea of indexing files. Could the same be used for database driven sites? You know, like the old search for "or 1=1" trick?

Then again, it's about being organized, isn't it? A check of what should and shouldn't be allowed to go public, some sort of flag where even if it shows up in the result, it better not make its way onto the HTML being sent back. (I figure that's more DB-centric though)

Last madman rant -- Don't put anything up there that shouldn't be for public consumption to begin with!!! If you're the kind to leave private XLS, DOC, MDB, and other sensitive data on a PUBLIC server thinking it's safe just because nobody can "see" it, to put it delicately, you're an idiot.
1. Re:Vs. Database-Driven Sites? by jnf · 2005-02-28 12:40 · Score: 2, Insightful
  
  Thank you. That's the real security risk: not the indexing agent, but rather why there is internal documentation that is 'private' or 'confidential' within the webroot on an externally accessible webserver.
Re:Interesting. Brief summary. by XorNand · 2005-02-28 12:09 · Score: 4, Insightful

A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files; that way .htaccess and the like would be respected throughout.
Yes, that would be safer. But one of the powers of local search engines is the ability to index content that isn't linked elsewhere on the site, e.g. old press releases, discontinued product documentation, etc. Sometimes you don't want to clutter up your site with irrelevant content, but you want to allow people who know what they're looking for to find it. This article isn't really groundbreaking. It's just another example of how technology can be a double-edged sword.

--
Entrepreneur : (noun), French for "unemployed"
Re:Interesting. Brief summary. by tetromino · 2005-02-28 12:10 · Score: 4, Informative

Does anyone know if the Google search appliance is affected by this?

No. First of all, the Google Search Appliance crawls over http, and therefore obeys any .htaccess rules your server uses. Second, you can set it up so that users need to authenticate themselves. Third, there are many filters you can set up to prevent it from indexing sensitive content in the first place (except that since any sensitive content the google appliance indexes must already be accessible via an external http connection, one hopes it's not too sensitive).
obvious? by jnf · 2005-02-28 12:15 · Score: 5, Insightful

I read the article and it seems to be like a good chunk of today's security papers: 'here's a long, drawn-out explanation of the obvious'. I suppose it wasn't as long as it could be, but really ... using a search engine to find a list of files on a website? I suppose someone has to document it.

I mean, I understand it's a little more complex as described in the article, but I would hardly call this a 'new web application attack'. At best it's perhaps one of those humorous advisories where the author overstates things and creates much ado about nothing, or at least that's my take;

-1 not profound
does this mean more PRON? by jephthah · 2005-02-28 12:19 · Score: 2, Funny

bastards always hiding their stash. this'll show 'em
P2P by Turn-X+Alphonse · 2005-02-28 12:20 · Score: 4, Interesting

Go to any P2P network and type @hotmail.com, @Gmail.com or @yahoo.com and see what documents turn up. I'm willing to put money on them all being e-mails saved on idiots' PCs, which will contain everything from stuff to sell to spammers (if you're so inclined) to sexual stuff and passwords/credit card info.

Nothing really new here..

--
I like muppets.
Re:Interesting. Brief summary. by Qzukk · 2005-02-28 12:26 · Score: 4, Interesting

If you could give the crawler multiple starting points then you could simply have an unlinked page that links to all the old content, and give that page to the crawler as a second starting point.

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
RTFM by Tuross · 2005-02-28 12:27 · Score: 5, Informative

My company specialises in search engine technology (for almost a decade now). I've worked quite in-depth with all the big boys (Verity, Autonomy, FAST, ...) and many of the smaller players too (Ultraseek, ISYS, Blue Angel, ...)

I can't recall the last time this kind of attack wasn't mentioned in the documentation for the product, along with instructions on how to disable it. If you choose to ignore the product documentation, you get what you deserve.

It's quite simple folks. Don't open the search engine. ACL query connections. Sanitize queries like you (should?) do other CGI applications. Authenticate queries and results. If you can't be bothered, hire someone who can.
--
Matt
1. Read Slashdot
2. ???
3. Profit
Re:Interesting. Brief summary. by BigGerman · 2005-02-28 12:35 · Score: 4, Interesting

This is even more important when a search engine (appliance) is capable of crawling the file shares directly (not just over HTTP).
The EnterFind appliance (which I participated in developing) has this (still unique) feature, and their clients were amazed by what the crawler can dig out, especially from those "hidden" fields in Office documents.
Re:Mozilla Firefox fucking sucks by Anonymous Coward · 2005-02-28 12:37 · Score: 2, Insightful

Oh, we are terribly sorry for taking so long!
Don't worry, we will give you a full refund.
Google Hacks Database by giant_toaster · 2005-02-28 12:38 · Score: 5, Informative

I guess a lot of people have seen this site before, but http://johnny.ihackstuff.com/index.php?module=prod reviews has a lot of these Google exploits etc. He is posting them so people can check whether their sites are secure. There are some interesting presentations by him on the main site about how search engines can be exploited.
Speaking of firefox by ad0gg · 2005-02-28 12:44 · Score: 4, Interesting

Another exploit came out this weekend. The funny thing is that Microsoft AntiSpyware beta 1 detects the execution of the payload file and shows a prompt asking whether you want to continue or stop the execution.

--
Have you ever been to a turkish prison?
New option for robots.txt by michelcultivo · 2005-02-28 12:44 · Score: 5, Funny

Please put this new undocumented tag in your robots.txt file: "hackthis=false" "xss=false" "scriptkiddies=log,drop" And all your problems will be solved.

--
http://www.michel.eti.br
Re:Interesting. Brief summary. by Grax · 2005-02-28 12:44 · Score: 4, Insightful

On a site with mixed security levels (i.e. some anonymous and some permission-based access) the "proper" thing to do is to check security on the results the search engine is returning.

That way an anonymous user would see only results for documents that have read permissions for anonymous while a logged-in user would see results for anything they had permissions to.

Of course this idea works fine for a special purpose database-backed web site but takes a bit more work on just your average web site.

Crawling the site via localhost:80 is the most secure method for a normal site. This would index only documents available to the anonymous user already and would ignore any unlinked documents as well.

--
Coding Blog
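
A minimal sketch of checking security on the results, assuming each indexed document carries the set of groups allowed to read it. The `Doc` type, the group names and the sample index are all hypothetical. The filter runs at query time, so an anonymous visitor never learns that a restricted document matched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    url: str
    text: str
    readers: frozenset  # groups allowed to read this document

# Hypothetical index: one public page, one staff-only page.
INDEX = [
    Doc("/index.html", "welcome to the public site", frozenset({"anonymous"})),
    Doc("/staff/plan.html", "internal budget plan", frozenset({"staff"})),
]

def search(term, user_groups, index=INDEX):
    """Return URLs of matching documents the caller is allowed to read."""
    groups = frozenset(user_groups) | {"anonymous"}  # everyone is anonymous
    term = term.lower()
    return [d.url for d in index
            if term in d.text.lower() and d.readers & groups]
```

Here `search("plan", set())` yields nothing for an anonymous visitor, while `search("plan", {"staff"})` returns the staff page.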
Re:Assumptions by SharpFang · 2005-02-28 13:11 · Score: 2, Informative

I'm pretty sure the indexing server on Windows won't return 'search results' for files you don't have permissions to list.
The problem and vulnerability lie in the definition of "you".
The indexing program runs with the privileges of a local user with direct access to the hard drive: listing directory contents, reading user-readable files. "You" are the user, like one behind the console, maybe without access to sensitive system files, but with access to nearly everything in the htroot tree that the administrator hasn't blocked using OS permissions (as opposed to httpd features).
As a webpage visitor "you" are "guest", filtered through httpd, with all httpd restrictions applied: no directory listing, and the obscure blocking methods (.htaccess, config files, redirects, CGI wrapping) all working. Your access is limited to what httpd lets you do, not just what the OS does. Now if you access the search engine database, you can see nearly everything the engine saw, including things it wouldn't have seen if it had been running through httpd rather than directly accessing the filesystem.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
This is old. by brennz · 2005-02-28 13:29 · Score: 4, Insightful

Why is this being labeled as something new? I remember this being a problem back in 1997 when I was still working as a webmaster.

Whoever posted this as a "new" item, is behind the times.

OWASP covers it!

Let's not rehash old things!
Why bother with phisching scams... by B747SP · 2005-02-28 13:45 · Score: 3, Interesting

This is hardly news to me. When I need a handy-dandy credit card number with which to sign up for one of those, er, 'adult hygiene' web sites, I just google for a string like "SQL Dump" or "CREATE TABLE" or "INSERT INTO" with filetype:sql and reap the harvest. No need to piss about with hours of spamming, setting up phishing hosts, etc, etc :-)

--
I find your ideas intriguing and I wish to subscribe to your newsletter.
Re:Uh huh.... by conran · 2005-02-28 13:55 · Score: 2, Informative

Did you RTFA?

Yep. Did you keep reading it? I'm referring to the methods for when no excerpts are given.
solution by Anonymous Coward · 2005-02-28 13:58 · Score: 3, Insightful

Here's a solution that's been tried and seems to work: create metadata for each page as an XML/RDF file (or DB field). XPath can be used to scrape content from HTML et al. to automate the process, as can capture from a CMS or other doc management solution. Create a manifest per site or sub-site that is an XML-RDF tree structure containing references to the metadata files and mirroring your site structure. Finally, assuming you have an API for your search solution (and don't b*gger around using ones that don't), code the indexing application to only parse the XML-RDF files, beginning with the structural manifest and then down into the metadata files. Your index will then contain relevant data, site structure, and, thanks to XPath, hyperlinks for the web site. No need to directly traverse the HTML. Still standards-based. Security perms only need to allow the indexer access to the XML-RDF files, which means only process perms are needed; user perms are irrelevant.

There are variations and contingencies, but the bottom line is that even if someone cracked into the location of an XML metadata file, it's not the data itself, and while it may reveal a few things about the page or file it relates to, it is certainly much less of a risk than full access to other file types on the server.

Here's another tip for free: because you now have metadata in RDF, with a few more lines of code you can output it as RSS.