New Web Application Attack - Insecure Indexing
An anonymous reader writes "Take a look at 'The Insecure Indexing Vulnerability - Attacks Against Local Search Engines' by Amit Klein. This is a new article about 'insecure indexing.' It's a good read: it shows you how to find 'invisible files' on a web server and, moreover, how to see the contents of files you'd usually get a 401/403 response for, using a locally installed search engine that indexes files (not URLs)."
Never give web-executable scripts more permissions than absolutely required. If the search engine has permission to read sensitive documents, and web users have access to this engine... well duh. It's just common sense.
What is the airspeed velocity of an unladen swallow?
African or European?
Basically, the article says that some site-installed search engines, which simply index all the files in /var/www or wherever, are insecure because they will index things that httpd would return a 401 or 403 for. Makes sense. A smarter approach would be to "crawl" the whole site on localhost:80 instead of indexing the files directly; that way, .htaccess rules and the like would be honored throughout.
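To make the problem concrete, here is a minimal sketch of the kind of naive file-based indexer being described, assuming a simple word-to-path inverted index built with Python's os.walk. The function name naive_file_index and the sample layout in the usage note are hypothetical, not taken from the article or from any particular product. Because it reads files straight off disk, nothing httpd enforces (.htaccess, 401/403) ever comes into play:

```python
import os
from collections import defaultdict


def naive_file_index(doc_root):
    """Map each lowercase word to the set of files (relative paths) containing it.

    Files are read directly from disk, so web-server access rules such as
    .htaccess are never consulted: protected files get indexed like any other.
    """
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, doc_root)
            with open(full_path, encoding="utf-8", errors="ignore") as fh:
                for word in fh.read().lower().split():
                    index[word].add(rel_path)
    return index
```

If the document root holds private/salaries.txt behind a private/.htaccess that denies all access, a search for "salary" still returns that file, and even the .htaccess file itself ends up in the index.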
Does anyone know if the Google Search Appliance is affected by this?
- Cary
--Fairfax Underground: Where Fairfax County comes out to play
Search engines let you find stuff! This is precisely why Google, Yahoo, and all the rest obey robots.txt. Personally, I would be amazed if local search engines didn't have their own equivalent of robots.txt that limited the directories they are allowed to crawl.
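Whether a given local engine supports such an exclusion file is product-specific. As an illustration only, Python's standard urllib.robotparser can be reused to filter candidate paths before a local indexer touches them; the rules, the agent name "local-indexer", and the helper names below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules handed to a local indexer.
RULES = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""


def make_policy(rules_text):
    """Parse robots.txt-style rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def allowed_paths(policy, paths, agent="local-indexer"):
    """Keep only the URL paths the policy lets `agent` fetch."""
    return [p for p in paths if policy.can_fetch(agent, p)]
```

Note that robots.txt rules are advisory: they keep a well-behaved indexer out of /private/, but they do nothing to stop a direct request for those files.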
That's all nice and good. Personally, I think files that were never meant to be indexed make for the best reading by far!
MP3 Search Engine
The instances mentioned all seem to revolve around indexing files. Could the same be used against database-driven sites? You know, like the old "or 1=1" search trick?
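The "or 1=1" trick is SQL injection, a different bug from insecure indexing, but it lands in the same place: a search box that returns rows it was never supposed to show. A minimal sketch using Python's built-in sqlite3 module, with a hypothetical docs table whose public column marks what may be shown:

```python
import sqlite3


def make_demo_db():
    """In-memory table with one public and one private document (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, public INTEGER)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)",
                     [("Press release", 1), ("Salary list", 0)])
    return conn


def search_unsafe(conn, term):
    # Vulnerable: the user's text is pasted straight into the SQL string.
    sql = "SELECT title FROM docs WHERE public = 1 AND title = '" + term + "'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(conn, term):
    # Parameterized: sqlite3 binds `term` as a value, never as SQL syntax.
    sql = "SELECT title FROM docs WHERE public = 1 AND title = ?"
    return [row[0] for row in conn.execute(sql, (term,))]
```

With string concatenation, the input x' OR '1'='1 turns the WHERE clause into (public = 1 AND title = 'x') OR '1'='1', which matches every row, private ones included; the parameterized version just looks for a title with that literal text and finds nothing.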
Then again, it's about being organized, isn't it? What's needed is a check of what should and shouldn't be allowed to go public: some sort of flag so that, even if a record shows up in the results, it never makes its way into the HTML being sent back. (I figure that's more DB-centric, though.)
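A minimal sketch of that output-time check, assuming each search hit carries a hypothetical boolean public flag; the Hit class and render_results function are illustrative names, not part of any particular search product:

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Hit:
    title: str
    url: str
    public: bool  # hypothetical per-document "may be published" flag


def render_results(hits):
    """Render hits as an HTML list, dropping every hit not flagged public.

    The check happens at output time, so even if the index returns a private
    document, it never reaches the HTML sent back to the browser.
    """
    items = [
        f'<li><a href="{escape(h.url)}">{escape(h.title)}</a></li>'
        for h in hits
        if h.public
    ]
    return "<ul>" + "".join(items) + "</ul>"
```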
Last madman rant -- Don't put anything up there that shouldn't be for public consumption to begin with!!! If you're the kind to leave private XLS, DOC, MDB, and other sensitive data on a PUBLIC server thinking it's safe just because nobody can "see" it, to put it delicately, you're an idiot.
Entrepreneur : (noun), French for "unemployed"
I read the article, and it seems like a good chunk of today's security papers: 'here's a long, drawn-out explanation of the obvious.' I suppose it wasn't as long as it could have been, but really... using a search engine to find a list of files on a website? I suppose someone has to document it.
I mean, I understand it's a little more complex as described in the article, but I would hardly call this a 'new web application attack'. At best it's one of those humorous advisories where the author overstates things and makes much ado about nothing; or at least that's my take.
-1 not profound
Oh, we are terribly sorry for taking so long!
Don't worry, we will give you a full refund.
On a site with mixed security levels (i.e. some anonymous and some permission-based access), the "proper" thing to do is to check permissions on each result the search engine returns.
That way, an anonymous user would see only results for documents that anonymous users may read, while a logged-in user would see results for anything they have permission to read.
Of course, this idea works fine for a special-purpose, database-backed web site, but it takes a bit more work on your average web site.
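This per-result check is often called security trimming. A minimal sketch, assuming a hypothetical ACL model in which each document lists the user names allowed to read it and the name "anonymous" stands for unauthenticated visitors; a real site would consult its web server's or CMS's own permission store instead:

```python
from dataclasses import dataclass

ANONYMOUS = "anonymous"


@dataclass(frozen=True)
class Doc:
    path: str
    readers: frozenset  # hypothetical ACL: names allowed to read this document


def trim_results(hits, user=None):
    """Keep only hits the current user may read (user=None means anonymous)."""
    allowed = {ANONYMOUS} if user is None else {ANONYMOUS, user}
    # A hit survives if its ACL shares at least one name with `allowed`.
    return [doc for doc in hits if doc.readers & allowed]
```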
Crawling the site via localhost:80 is the most secure method for a normal site. It would index only documents already available to anonymous users, and it would ignore any unlinked documents as well.
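A minimal sketch of such a localhost crawler, using only Python's standard library (urllib.request, html.parser). It starts at a root URL, follows same-host links breadth-first, and keeps only pages answered with HTTP 200, so anything protected by a 401/403 stays out of the index and files nothing links to are never discovered. The fetch parameter and the limit on page count are assumptions added so the crawler can be exercised without a live server:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for name, v in attrs if name == "href" and v)


def http_fetch(url):
    """Fetch url as an anonymous client; return (status code, body text)."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # 401, 403, 404, ... arrive as exceptions
        return err.code, ""
    except URLError:  # connection refused, DNS failure, etc.
        return 0, ""


def crawl(start="http://localhost:80/", fetch=http_fetch, limit=1000):
    """Breadth-first crawl of start's host; return {url: body} for 200 pages."""
    host = urlparse(start).netloc
    seen, queue, pages = {start}, deque([start]), {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        status, body = fetch(url)
        if status != 200:
            continue  # protected or missing: never indexed
        pages[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The returned pages can then be fed to whatever indexer the site uses; the crawler itself only decides what is visible.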
Coding Blog
Why is this being labeled as something new? I remember this being a problem back in 1997 when I was still working as a webmaster.
Whoever posted this as a "new" item is behind the times.
OWASP covers it!
Let's not rehash old things!
Here's a solution that's been tried and seems to work. Create metadata for each page as an XML/RDF file (or a DB field). XPath can be used to scrape content from HTML and the like to automate the process, as can capture from a CMS or other document management solution. Create a manifest per site or subsite: an XML-RDF tree structure containing references to the metadata files and mirroring your site structure. Finally, assuming you have an API for your search solution (and don't b*gger around with ones that don't), code the indexing application to parse only the XML-RDF files, starting with the structural manifest and then working down into the metadata files. Your index will then contain the relevant data, the site structure, and, thanks to XPath, the hyperlinks for the web site. There's no need to traverse the HTML directly, and it's still standards-based. Security permissions only need to give the indexer access to the XML-RDF files, which means only process permissions are needed; user permissions are irrelevant.
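A minimal sketch of the manifest-driven indexer, using Python's xml.etree.ElementTree, whose limited XPath support is enough for these lookups. As a simplifying assumption it uses plain XML with hypothetical element names (site, section, page, meta, url, title, keywords) in place of full RDF/XML, and a load_meta callback in place of reading files from disk; the point is only that the indexer reads the manifest and metadata, never the HTML pages:

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest mirroring the site structure; each <page> points at
# the metadata file that describes it.
MANIFEST = """<site>
  <section name="root">
    <page meta="index.xml"/>
    <section name="products">
      <page meta="widget.xml"/>
    </section>
  </section>
</site>"""


def index_from_manifest(manifest_xml, load_meta):
    """Return {url: {"title", "section", "keywords"}} built from metadata only.

    load_meta(name) must return the XML text of one metadata file; the HTML
    pages themselves are never opened.
    """
    index = {}

    def walk(node, trail):
        for child in node:
            if child.tag == "section":
                walk(child, trail + [child.get("name")])
            elif child.tag == "page":
                meta = ET.fromstring(load_meta(child.get("meta")))
                index[meta.findtext("url")] = {
                    "title": meta.findtext("title"),
                    "section": "/".join(trail),  # site structure from manifest
                    "keywords": meta.findtext("keywords", default="").split(),
                }

    walk(ET.fromstring(manifest_xml), [])
    return index
```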
There are variations and contingencies, but the bottom line is that even if someone cracked into the location of an XML metadata file, it's not the data itself. While it may reveal a few things about the page or file it relates to, it is certainly much less of a risk than full access to other file types on the server.
Here's another tip for free: because you now have metadata in RDF, with a few more lines of code you can output it as RSS.
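A minimal sketch, assuming metadata records shaped like {"title", "url", "description"} (hypothetical field names). Note that it emits RSS 2.0 with xml.etree.ElementTree rather than the RDF-based RSS 1.0, a deliberate simplification; the site title and URL are placeholders:

```python
import xml.etree.ElementTree as ET


def to_rss(records, site_title, site_link):
    """Serialize page metadata records as an RSS 2.0 feed (returned as a str)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = f"Pages on {site_title}"
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "link").text = site_link.rstrip("/") + rec["url"]
        ET.SubElement(item, "description").text = rec.get("description", "")
    return ET.tostring(rss, encoding="unicode")
```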
Bleedingly obvious and written in a sufficiently pompous style that you feel obliged to read the whole thing just to verify that there really is nothing there that hasn't been common knowledge for the better part of the last decade.
Of course in those days people actually built their sites using static HTML...
Andy Armstrong
Anything I put on a publicly accessible web server, I want publicly accessible, and I want it to be as easily accessed as possible.
Anything else goes on a pocket network or not at all.
The only exception would be an order form, and that will be very narrowly designed to do exactly one thing securely.
Incidentally, it also breaks properly designed retrieval mechanisms.
If they break, how can they be properly designed?
There are places where the networks are not touching, and there are places where they are. -Boeing's Lori Gunter