New Web Application Attack - Insecure Indexing
An anonymous reader writes "Take a look at 'The Insecure Indexing Vulnerability - Attacks Against Local Search Engines'
by Amit Klein. This is a new article about 'insecure indexing.' It's a good read -- shows you how to find 'invisible files' on a web server and moreover, how to see contents of files you'd usually get a 401/403 response for, using a locally installed search engine that indexes files (not URLs)."
right?
time is a perception of a being's consciousness
time is your 6th sense, the weird ones are 7+
the department-of-the-bleedingly-obvious...
wasn't there already one?
to see if you can get the site's robots.txt as the files/directories in that file are sometimes full of goodies.
Even though this is about internal indexing, it reminded me of old-fashioned Google indexing: search Google for sensitive terms such as: 'index of /' *.pdf *.ps
Never give web-executable scripts more permissions than absolutely required. If the search engine has permission to read sensitive documents, and web users have access to this engine... well duh. It's just common sense.
Quid festinatio swallonis est aetherfuga inonusti?
Africus aut Europaeus?
Basically the article says that some site-installed search engines that simply index all the files in /var/www or whatever are insecure because they will index things that httpd would return a 401 or 403 for. Makes sense. A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files, that way .htaccess and the such would be preserved throughout.
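To make the "crawl it over HTTP" idea concrete, here is a minimal sketch in Python. It is not from the article: the names `crawl`, `http_fetch` and `LinkParser` are made up for illustration, and a real crawler would also need politeness limits, content-type checks and error handling for unreachable hosts. The point is only that the indexer sees exactly what the web server is willing to hand out, so a 401 or 403 never reaches the index, and unlinked files are never discovered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.error
import urllib.request


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Fetch a URL as an anonymous visitor would; return (status, body)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def crawl(start_url, fetch=http_fetch, max_pages=1000):
    """Return {url: body} for every linked page on the same host that answers 200."""
    host = urlparse(start_url).netloc
    index, queue, seen = {}, [start_url], {start_url}
    while queue and len(index) < max_pages:
        url = queue.pop(0)
        status, body = fetch(url)
        if status != 200:
            continue  # 401/403/404: the server refused, so nothing is indexed
        index[url] = body
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return index
```

Calling `crawl("http://localhost/")` on the web server itself would build the index from served pages rather than from the files under the document root. The `fetch` parameter exists only so the logic can be exercised without a live server.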
Does anyone know if the Google search appliance is affected by this?
- Cary
--Fairfax Underground: Where Fairfax County comes out to play
Search engines let you find stuff! This is precisely why Google, Yahoo, and all the rest obey robots.txt. Personally, I would be amazed if local search engines didn't have their own equivalent of robots.txt that limited the directories they are allowed to crawl.
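For what it's worth, a local indexer can honor robots.txt with very little code. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt contents, the paths and the `local-indexer` agent name are invented for the example, and it assumes the indexer is written to ask before it reads each path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being indexed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /old/
"""


def allowed_paths(robots_txt, paths, agent="local-indexer"):
    """Return only the paths that robots.txt lets this agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(agent, p)]


candidates = [
    "/index.html",
    "/private/salaries.xls",
    "/old/draft.html",
    "/docs/faq.html",
]
print(allowed_paths(ROBOTS_TXT, candidates))
# prints ['/index.html', '/docs/faq.html']
```

Keep in mind that robots.txt is advisory and world-readable, so it only restrains a cooperative indexer and, as noted elsewhere in this thread, it also tells a curious visitor where to look.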
The attacker first loops through all possible words in English...
I get the idea this might take a while.
The article says: "The attacker first loops through all possible words in English".
I mean, is this not a bit too ridiculous? (Especially if the inaccessible file is someone's personal outdated webpage.) If it is anything useful (to a hacker or other persons involved in illegitimate activity), then the technique will most probably fail.
I am not saying that there is no vulnerability (getting data from search snippets is a good idea), but the third option I just quoted above seems pretty lame.
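To put a number on "this might take a while", here is a toy model of the word-loop idea. It is only a sketch: the in-memory index, the file names and the six-word "dictionary" are all invented, and real search engines (and the article's actual procedure) are more involved. It does show the cost, which is one query per dictionary word, and what leaks, which is every word the engine admits to finding in a file the web server would refuse to serve.

```python
def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = {}
    for name, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


def search(index, word):
    """Return the names of documents that contain the word."""
    return sorted(index.get(word.lower(), set()))


def words_leaked(index, target, wordlist):
    """Issue one query per word and keep the words that hit the target file."""
    return [w for w in wordlist if target in search(index, w)]


# The indexer read both files from disk, although httpd would deny /private/.
documents = {
    "/public/index.html": "welcome to our public site",
    "/private/memo.txt": "merger budget approved",
}
index = build_index(documents)
wordlist = ["approved", "budget", "cat", "merger", "public", "zebra"]
print(words_leaked(index, "/private/memo.txt", wordlist))
# prints ['approved', 'budget', 'merger']
```

With a real dictionary that is a few hundred thousand queries per file, which is slow but entirely automatable, and it still yields only a bag of words, not the word order.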
The instances mentioned all seem to revolve around the idea of indexing files. Could the same be used for database driven sites? You know, like the old search for "or 1=1" trick?
Then again, it's about being organized, isn't it? A check of what should and shouldn't be allowed to go public, some sort of flag where even if it shows up in the result, it better not make its way onto the HTML being sent back. (I figure that's more DB-centric though)
Last madman rant -- Don't put anything up there that shouldn't be for public consumption to begin with!!! If you're the kind to leave private XLS, DOC, MDB, and other sensitive data on a PUBLIC server thinking it's safe just because nobody can "see" it, to put it delicately, you're an idiot.
Entrepreneur : (noun), French for "unemployed"
4750 ....
vodka, straight up, thank you!
Does anyone know if the Google search appliance is affected by this?
No. First of all, the Google Search Appliance crawls over http, and therefore obeys any .htaccess rules your server uses. Second, you can set it up so that users need to authenticate themselves. Third, there are many filters you can set up to prevent it from indexing sensitive content in the first place (except that since any sensitive content the Google appliance indexes must already be accessible via an external http connection, one hopes it's not too sensitive).
by design? Surely something with permission to index internal files (even those specified to give 403s etc) is inherently designed to make them available to view.
Either that, or it's a user error (configuration).
How many people can read hex if only you and dead people can read hex?
Is it possible, given the time and perseverance, to exploit a vulnerability in a search engine's parsing of a webpage that you, say, maliciously published somewhere? Obviously one would expect Google and the like to have good security (well, apart from the Gmail exploit and... well, let's not go there), so I was curious: has it ever been done? (ponders)
Summary: If you are going to use magic to index your web site, be smart about it. Don't just blindly use a tool that "does the job".
Nothing new here.
---
I read the article and it seems to be like a good chunk of today's security papers: "here's a long drawn-out explanation of the obvious". I suppose it wasn't as long as it could be, but really ... using a search engine to find a list of files on a website? I suppose someone has to document it.
I mean, I understand it's a little more complex as described in the article, but I would hardly call this a "new web application attack". At best it's perhaps one of those humorous advisories where the author overstates things and creates much ado about nothing, or at least that's my take;
-1 not profound
bastards always hiding their stash. this'll show 'em
Go to any P2P network and type @hotmail.com, @Gmail.com or @yahoo.com and see what documents turn up. I'm willing to put money on them all being e-mails saved on idiots' PCs, which will contain everything from stuff to sell to spammers (if you're so inclined) to sexual stuff and passwords/credit card info.
Nothing really new here..
I like muppets.
If you could give the crawler multiple starting points then you could simply have an unlinked page that links to all the old content, and give that page to the crawler as a second starting point.
If I have been able to see further than others, it is because I bought a pair of binoculars.
"Reconstructing" files by searching every word in the english language in different orders? I want the last 5 minutes of my life back...
My company specialises in search engine technology (for almost a decade now). I've worked quite in-depth with all the big boys (Verity, Autonomy, FAST, ...) and many of the smaller players too (Ultraseek, ISYS, Blue Angel, ...)
I can't recall the last time this kind of attack wasn't mentioned in the documentation for the product, along with instructions on how to disable it. If you choose to ignore the product documentation, you get what you deserve.
It's quite simple folks. Don't open the search engine. ACL query connections. Sanitize queries like you (should?) do other CGI applications. Authenticate queries and results. If you can't be bothered, hire someone who can.
Matt
...to find all the "free sample" pr0n hidden in the maze of otherwise unintelligible directories. In the end, isn't that what the Internet is all about -- finding more efficient ways to see boobies? Yes...yes I think so.
This is even more important when a search engine (appliance) is capable of crawling the file shares directly (not just over HTTP).
EnterFind appliance (which I participated in developing) has this (still unique) feature and their clients were amazed by what the crawler can dig out. Especially in those "hidden" fields in the Office documents.
All these "attacks" assume the indexing program will index and return results for files you don't have access to.
I'm pretty sure the indexing server on Windows won't return 'search results' for files you don't have permission to list. The same goes for any other sensible indexing scheme, except perhaps the newer silly 'desktop search' tools. Seems pretty obvious to me.
I.O.U One Sig.
my mind being the way it is, i can't help but think of an application for this in porn ;). a lot of porn sites have extensive free previews, but its hard for someone to find all the free preview pics for a certain site (useful especially for a single model's site) unless you can find a direct link to every single unique free preview gallery from somewhere, and you'll undoubtedly miss some good stuff. i want to see a firefox extension that gets me all the free pics from a given site damnit!
Oh, we are terribly sorry for taking so long!
Don't worry, we will give you a full refund.
I guess a lot of people have seen this site before, but http://johnny.ihackstuff.com/index.php?module=prodreviews has a lot of these Google exploits etc. He is posting them so people can check if their sites are secure. There are some interesting presentations by him on the main site about how search engines can be exploited.
Another exploit came out this weekend. The funny thing is that Microsoft AntiSpyware beta 1 detects the execution of the payload file and shows a prompt asking whether you want to continue or stop the execution.
Have you ever been to a turkish prison?
Please put this new undocumented tag in your robots.txt file: "hackthis=false" "xss=false" "scriptkiddies=log,drop" And all your problems will be solved.
http://www.michel.eti.br
On a site with mixed security levels (i.e. some anonymous and some permission-based access) the "proper" thing to do is to check security on the results the search engine is returning.
That way an anonymous user would see only results for documents that have read permissions for anonymous while a logged-in user would see results for anything they had permissions to.
Of course this idea works fine for a special purpose database-backed web site but takes a bit more work on just your average web site.
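A minimal sketch of that result-time check, in Python. Everything here is an assumption made for illustration: the `Document` record, the `readers` set stored alongside each indexed document, and the group names. A real site would pull these from whatever permission system it already has.

```python
from dataclasses import dataclass, field

ANONYMOUS = "anonymous"


@dataclass
class Document:
    url: str
    snippet: str
    # Groups allowed to read this document; public by default.
    readers: set = field(default_factory=lambda: {ANONYMOUS})


def visible_results(results, user_groups):
    """Drop every hit the current user could not open directly."""
    allowed = set(user_groups) | {ANONYMOUS}
    return [doc for doc in results if doc.readers & allowed]


hits = [
    Document("/index.html", "Welcome to the site"),
    Document("/hr/salaries.html", "Salary bands for 2005", readers={"hr"}),
    Document("/staff/handbook.html", "Staff handbook", readers={"staff", "hr"}),
]

print([d.url for d in visible_results(hits, [])])
# prints ['/index.html']
print([d.url for d in visible_results(hits, ["staff"])])
# prints ['/index.html', '/staff/handbook.html']
```

The filter has to run before snippets and hit counts are rendered; otherwise the result page still reveals that a restricted document matched.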
Crawling the site via localhost:80 is the most secure method for a normal site. This would index only documents available to the anonymous user already and would ignore any unlinked documents as well.
Coding Blog
Yuhn. we wannt the langwich opshun fer "idiot"
1) write your own web applications
2) Use lucene
3) only index what you want to index
4) ????
5) profit
The problem is these are perfectly legal search engine queries. No matter how you "sanitize" the queries, that won't help, because they contain valid requests. The vulnerability lies on the side of the indexing program, not the query/search/display one. The indexer indexes things it shouldn't. Files normally inaccessible through httpd are accessible in the search database.
A method I see for that would be running the indexing by piping it through httpd: make even local indexing go the same way remote indexing is done, indexing not /var/www/... but http://localhost/. This way the indexer won't be able to access anything a common user can't.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Why is this being labeled as something new? I remember this being a problem back in 1997 when I was still working as a webmaster.
Whoever posted this as a "new" item, is behind the times.
OWASP covers it!
Let's not rehash old things!
I find your ideas intriguing and I wish to subscribe to your newsletter.
Here's a solution that's been tried and seems to work: create metadata for each page as an XML/RDF file (or DB field). XPath can be used to scrape content from HTML et al. to automate the process, as can capture from a CMS or other doc management solutions. Create a manifest per site or sub-site that is an XML-RDF tree structure containing references to the metadata files and mirroring your site structure. Finally, assuming you have an API for your search solution (and don't b*gger around using ones that don't), code the indexing application to parse only the XML-RDF files, beginning with the structural manifest and then down into the metadata files. Your index will then contain relevant data, site structure, and, thanks to XPath, hyperlinks for the web site. No need to directly traverse the HTML. Still standards based. Security perms only need to allow the indexer access to the XML-RDF files, which means only process perms are needed; user perms are irrelevant.
There are variations and contingencies, but the bottom line is that even if someone cracked into the location of an XML metadata file, it's not the data itself. While it may reveal a few things about the page or file it relates to, it is certainly much less of a risk than full access to other file types on the server.
Here's another tip for free: because you now have metadata in RDF, with a few more lines of code you can output it as RSS.
...let j0hnny do all the work for me.
I mean with the 0 in his name and everything, I know he's good.
Crawling over http with a single privilege level would address this. Multiple privilege levels is exactly the problem at hand. Presumably the crawler has a tasty privilege level..
http://www.webappsec.org/pr_120304.html
Firefox having another exploit, and Microsoft's new beta software fixing it. You won't see it on Slashdot's front page.
Posting anon because this is both off-topic and against the majority mindset.
Anonymous? I sent that in and I demand recognition!
Anything I put on a publicly accessible web server, I want publicly accessible, and I want it to be as easily accessed as possible.
Anything else goes on a pocket network or not at all.
The only exception would be an order form, and that will be very narrowly designed to do exactly one thing securely.
What if the file system supported an index attribute that proper search programs (windows search, google desktop, UNIX locate, etc) could respect?
chmod -i file
With the search vendors racing to own desktop search and microsoft working on WinFS, is "indexability" now an important security attribute for a file?
... leave IT decisions to engineers, not the managers!
Once upon a time, intelligent people were responsible for computers and IT.
Now, it's either a manager, or a bunch of kids ("web developers") who don't know what they are playing with.
Of course there are plenty of exploits waiting to be discovered that WILL get those documents off your web server.. UNLESS you are smart enough to keep them elsewhere.
I realize this is a flamebait as good as they get - but please understand that I will just duck. It was not intended as such.
A smarter way to do such a thing would be to "crawl" the whole site on localhost:80 instead of just indexing files, that way .htaccess and the such would be preserved throughout.
That would not help much. Most sites have different content depending on the IP address accessing the content, i.e. internal IP:s get content that external IP:s cannot access. Crawling on localhost:80 would remove the non-linked files, but still gives the search engine access to a lot of content that should not be indexed.
The only safe crawler is one that is located outside your network.
What really scares me, though, is that this idea is somehow seen as new. It is blatantly obvious that one does not get good or proper results by indexing files locally. For example, you get an index of your PHP script's source code (including the database passwords they likely contain) instead of the output from them. And it doesn't follow any .shtml includes etc. either.
Even the fact that a search engine crawler running from an internal IP will be able to access and index content that shouldn't be externally available is very obvious.
What the article possibly adds, is a list of ideas about what to search for in the affected organization's index. But I wouldn't consider the idea new in any way.
This, or rather its sibling with internal IP:s, was something that we designed the robots.txt file for back in '97 when our university bought its first search engine. I refuse to believe that nobody has written an article about this idea until now.
But if this is the first article about this, and if people actually find it interesting and revealing, then it was really fortunate that it got written now rather than in ten more years.
You can get some idea of just how easy it is to commit "identity theft" by visiting this secret URL.
On that basis alone, I'm not massively bothered about putting intact gas bills &c. in the recycling. Other people's identities are easier to steal! {And beside which, there are CCTV cameras to make sure nobody is putting the wrong stuff in the wrong bin ..... so they probably would see someone taking stuff out .....}
The old break-out-of-quotes trick is IMHO a different kind of vulnerability, in that it's really a programming bug. There is no reason, other than a programmer being too stupid/ignorant to escape quotes (or for most burger-flippers-turned-programmers, to even know that it's possible to escape quotes or to use prepared statements), for that happening. For that matter, also too ignorant to know that the "LIKE" operator isn't really a full text search engine.
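For anyone who hasn't seen the difference spelled out, here is a small self-contained illustration using Python's built-in `sqlite3` module. The table, its contents and the function names are made up for the example. The first function pastes user input into the SQL string, the second passes it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT, public INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [("Home", 1), ("Payroll", 0)],
)


def find_public_unsafe(term):
    """Builds SQL by string concatenation: a quote in term rewrites the query."""
    sql = "SELECT title FROM pages WHERE public = 1 AND title = '" + term + "'"
    return [row[0] for row in conn.execute(sql)]


def find_public_safe(term):
    """Uses a placeholder, so term is only ever treated as a value."""
    sql = "SELECT title FROM pages WHERE public = 1 AND title = ?"
    return [row[0] for row in conn.execute(sql, (term,))]


attack = "x' OR '1'='1"
print(find_public_unsafe(attack))
# prints ['Home', 'Payroll']
print(find_public_safe(attack))
# prints []
```

The unsafe version returns the non-public row because the injected `OR '1'='1'` is true for every row. The parameterized version simply looks for a page whose title is the odd-looking string and finds none.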
The search index problem is similar, but not quite. The search machine works as intended, it just has access to more data than the site owners realize.
Now it _can_ also be a programmer error, but most often, it's a design error. People just haven't even given any thought to security there, and thus implemented a system that is broken as designed.
You'd be surprised how quickly people can skip over any security considerations. Especially when they can find half an excuse. Even a stupid one, like "but we don't link to that file, so it's safe." Or worse, "but we're using SSL and we're behind a firewall, so _of_ _course_ we're secure. No need to worry about security."
A polar bear is a cartesian bear after a coordinate transform.
Twitter, you're a petulant cock-gobbling sycophant to Linux Torvaldyos! Quit taking DP from ESR and RMS's feculent cocks and why don't you try to stop sucking quite so much? Get out of your parents' basement and see the real world - maybe then you'll see how pathetic you sound, with your neverending stream of bullshit about how Microsoft is stalking you. Wasn't it you who said that Microsoft believes your insane ranting is actually a threat to them, so they PAY PEOPLE to reply to you on Slashdot? No sir, I don't get any money. I do it for the love. Someone has to go up against your paranoid whining. So get back in your cage and shut the fuck up already.