Slashdot Mirror


The Problem of Search Engines and "Sekrit" Data

Nos. writes: "CNet is reporting that not only Google but other search engines are finding passwords and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time, with no easy solution in sight.

17 of 411 comments

  1. Oh Yeah? by Knunov · · Score: 4, Funny

    "...search engines are finding passwords and credit card numbers while doing their indexing."

    This is very serious. Could you please post the exact search engines and query strings so I can make sure my information isn't there?

    Knunov

    --
    Why do users with IDs under 100,000 or over 700,000 usually have the most worthwhile comments?
    1. Re:Oh Yeah? by Karma+50 · · Score: 5, Funny

      Just search for your credit card number.

      By the way, does google have that realtime display of what people are searching for?

      --
      http://www.thehungersite.com
    2. Re:Oh Yeah? by 4of12 · · Score: 2, Funny

      Yeah!

      I just typed in my credit card number and found 15 hits on web sites involving videos of hot young goats.

      --
      "Provided by the management for your protection."
  2. Google exploit patch for Apache by Anarchofascist · · Score: 4, Funny

    % cd /var/www
    % cat > robots.txt
    User-agent: *
    Disallow: /
    ^D
    %
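
For readers wondering what that file actually asks of a crawler, here is a minimal Python sketch, assuming a robot that honors the Robots Exclusion Protocol. It uses the standard library's urllib.robotparser; the bot name "ExampleBot" and the example.com URLs are made up for illustration. Keep in mind that robots.txt is only a request to well-behaved crawlers and does nothing to restrict access to the files themselves.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as the robots.txt typed in above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if a polite crawler would consider `url` fetchable."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are hypothetical.
    print(crawler_may_fetch(ROBOTS_TXT, "ExampleBot",
                            "http://example.com/orders/cards.txt"))
```

A crawler that ignores robots.txt, or a person with a browser, can still request every one of those URLs.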

    --
    Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
  3. The Problem of Search Engines and "Sekrit" Data by NTSwerver · · Score: 4, Funny

    Please change the title of this article to:

    The Problem of Incompetent System Administrators

    If data is 'sekrit'/sensitive/confidential - don't put it on the web. It's as simple as that. If that data is available on the web, search engines can't be blamed for finding it.

    --
    -----------------------
    Moderator's essentials
  4. I've got a solution! by CraigoFL · · Score: 5, Funny
    Every web server should have a file in their root directory called "secret.xml" or somesuch. This file could list all the publicly-accessible URLs that have all the "secret" data such as credit card numbers, root passwords, and private keys. Search engines could parse this file and then NOT include those URLs in their search results!

    Brilliant, huh? ;-)

    On second thought, maybe I shouldn't post this... some PHB might actually think it's a good idea.

  5. Google exploit patch 0.2 for Apache by Anarchofascist · · Score: 2, Funny
    Oops! Version 0.2 already:

    % cat > /var/www/html/robots.txt
    User-agent: *
    Disallow: /
    ^D
    %

    --
    Once more unto the breach, dear friends, once more, Or close the wall up with our American dead!
  6. Hell, No. by tomblackwell · · Score: 1, Funny

    You should be writing that type of data on the backs of envelopes and leaving them scattered around your living room...

  7. Re:Insert foot in mouth.... by simong · · Score: 2, Funny

    Not necessarily, they are chief executives after all.

  8. Oh, for regular expression searching in Google by EnglishTim · · Score: 5, Funny

    I could be a rich man...

    (Not, of course, that I'd ever do anything like that...)

    Searching with regular expressions would be cool, though...
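
The same idea is more useful pointed at your own files before a crawler gets to them. A rough Python sketch, assuming you only want to flag digit runs that look like card numbers: the pattern, the 13 to 19 digit range, and the function names are illustrative choices, and the Luhn checksum only filters out random digit strings, it does not prove a number is a real card.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_like_numbers(text):
    """Return the card-like digit strings in `text` that pass Luhn."""
    hits = []
    for match in CARD_LIKE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(find_card_like_numbers("card: 4111 1111 1111 1111 exp 01/05"))
```

Running something like this over a web root before publishing is cheap; anything it finds should not be under the document root at all.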

  9. Must... blame... someone.... by JMZero · · Score: 3, Funny

    INetPub means "INetPublic" not "INetPubrobably a great place to put my credit card numbers".

    Why are stupid people not to blame for anything anymore?

    --
    Let's not stir that bag of worms...
  10. Re:A symptom of poor programming... by Brainless · · Score: 4, Funny

    I manage a ColdFusion web server that we allow clients to post their own websites to. Recently, their programmer accidentally made a link to the admin section. Google found that link, proceeded into the admin section, and followed all the "delete item" links as it indexed them. The clients asked to see a copy of the logs, complaining that the website had been hacked. I found it quite amusing when I discovered that GoogleBot had deleted every single database entry for them.
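
The underlying mistake is a destructive action reachable by a plain GET link, which is all a crawler needs to trigger it. A minimal Python sketch of the usual remedy, assuming a toy request handler: the /admin/delete path, the handle_request function, and the in-memory store are all hypothetical, and a real admin section would also need authentication.

```python
from urllib.parse import urlsplit, parse_qs


def handle_request(method, url, store):
    """Toy dispatcher: return an HTTP status code, mutating `store` on delete."""
    parts = urlsplit(url)
    if parts.path == "/admin/delete":
        # Crawlers follow links with GET, so refuse to delete on anything but POST.
        if method != "POST":
            return 405  # Method Not Allowed
        item_id = int(parse_qs(parts.query)["id"][0])
        store.pop(item_id, None)
        return 200
    # Every other path in this sketch is read-only.
    return 200 if method in ("GET", "HEAD") else 405


if __name__ == "__main__":
    items = {1: "widget", 2: "gadget"}  # hypothetical database rows
    print(handle_request("GET", "/admin/delete?id=1", items), items)
    print(handle_request("POST", "/admin/delete?id=1", items), items)
```

With deletes behind POST, a robot walking the "delete item" links gets an error instead of emptying the database.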

  11. Business Model by Alomex · · Score: 5, Funny

    A while back there was a thread here about the weakness of the revenue model for search engines. Maybe we have found the answer, think about all the revenue that Google could generate with this data!

    Does anybody know when Google is going public?

  12. Blaming Google for this... by night_flyer · · Score: 3, Funny

    Is like blaming the Highway department for speeders...

    --


    Thanks to file sharing, I purchase more CDs
    Thanks to the RIAA, I buy them used...
  13. Re:Stopping Google won't stop the problem... by mobiGeek · · Score: 5, Funny
    but Google undoubtedly uses techniques beyond that of the casual browser

    Uhh...no.

    HTTP is an extremely basic protocol. Google's bots simply do a series of GET requests.

    It is possible that Google's bots have a database of usernames and passwords for given sites, but the more likely scenario is that they have stumbled across another way to get the "protected" information:

    • a link which contains a username and/or password
      /protected/show_article.pl?username=foo&password=bar&num=1
    • a link to the pages which by-passes the protection scheme
      /no_one_can_find_this_cause_Im_3l33t/article1.html
    • someone else posted the information elsewhere, and this is what is actually crawled
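
The first case in the list is easy to check for on your own pages. A small Python sketch, assuming you just want to spot links that carry credentials in the query string: the parameter names in SUSPECT_PARAMS and the function name are illustrative guesses, not a complete list.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter names that suggest a credential in the URL.
SUSPECT_PARAMS = {"username", "user", "password", "passwd", "pass"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, much as a simple robot would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_leaking_credentials(html):
    """Return the hrefs whose query string contains a suspect parameter."""
    collector = LinkCollector()
    collector.feed(html)
    leaking = []
    for href in collector.links:
        params = parse_qs(urlsplit(href).query)
        if SUSPECT_PARAMS & {key.lower() for key in params}:
            leaking.append(href)
    return leaking


if __name__ == "__main__":
    page = ('<a href="/protected/show_article.pl?username=foo&amp;'
            'password=bar&amp;num=1">article</a> <a href="/about.html">about</a>')
    print(links_leaking_credentials(page))
```

Any link this reports is one a robot can follow with an ordinary GET, no hacking required.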

    I ran robots for nearly 2 years and was harassed by many a Webmuster who could prove that my robots had hacked their site. They'd show me protected or secret data. It typically took 3 to 5 minutes to find the problem...usually the muster was the problem themself.

    HERE'S A NOTE OF WARNING TO WEBMASTERS:
    Black text links on black backgrounds in really small fonts are NOT secure.

    Maybe I should get this posted to BugTraq...or would MS come after me??

    --

    ...Beware the IDEs of Microsoft...

  14. Re:A symptom of poor programming... by Anonymous Coward · · Score: 1, Funny

    First thing I do when bored and surfing (porn sites) is whenever I see a new link with a non-standard index page (anything other than index.html) I chop it off and see if I can get a listing. Since most porn sites run on Apache, and Apache by default does not disable directory listings, and since most porn sites are designed on Windows (can you say index.htm?), and since the default Apache index page is index.html, this leads to a great deal of free fun.

  15. Re:Nice work, Legion303. by well_jung · · Score: 3, Funny
    "Trees cause more pollution than automobiles do." --Ronald Reagan '81

    --
    Carl G. Jung
    --
    "With one breath, with one flow, You will know Synchronicity" -La Policia