Slashdot Mirror


User: accidentalGeek

accidentalGeek's activity in the archive.

Stories
0
Comments
6

Comments · 6

  1. Non-Free = Less Portable on OpenOffice 2.0 Criticized on Use of Java · · Score: 2, Informative

    If you think that the Java license is not a problem, try running Java apps on a non-Intel Linux platform such as linux/ppc. Sun does not make a JRE for linux/ppc so the choices come down to IBM Java (which is also non-free, crashes frequently and does not support the 1.5 spec), Blackdown (which is non-free and seems to be stalled at 1.3), and the free JREs such as Jikes which will always be behind the curve as RMS points out.

    These problems are not incidental. They're a necessary consequence of the non-free license. Fewer developers are allowed to work with the code. This lack of resources directly translates to less portability. It also lengthens the bug fix cycle, slows the adoption of new features, and places supreme power in the hands of the copyright holder. If you require big changes to a free software product, you have the power to make those changes or hire someone else to make them for you. If you require big changes to a non-free product, you're at the mercy of the copyright holder.

    In the case of Java, the source is not as open as Sun would like you to believe. Parts of it are open. Other parts are locked away in binary files. You need an existing Sun bytecode compiler (on a platform supported by Sun) to build Java from source. This necessarily precludes porting it to other platforms without assistance from Sun. This is why the folks at Blackdown needed to sign special agreements with Sun before they were granted access.

    I love Java. It's quickly becoming my favorite programming language, but I also have to agree with RMS that the license is problematic. Great language. Dangerous platform.

  2. Unfair to Patterson on Johnny Can So Program · · Score: 1

    The Patterson interview is all about the coding competition and how wonderful it is that so many students from around the world take part. The headline "Can Johnny still program" was probably slapped on the write-up by an overzealous editor eager to capture eyeballs. Most likely, Patterson had nothing to do with it. Then Matloff comes along, grabs hold of the headline, ignores the content, and twists the interview to pound on what seems to be his favorite political issue. In the process, he labels Patterson as a shrill alarmist crying for more H-1B visas. Never mind that, in the interview, Patterson has nothing to say about visas and very little to say about why the US team finished where it did. He's focused entirely on how wonderful the competition is. As both a UC graduate and an ACM member, I don't know whether to be amused or chagrined.

  3. Re:Won't work: Robots don't send the referrer on Millions of Pages Google Hijacked using ODP Feed · · Score: 2, Insightful

    More precisely, googlebot always sends the same referrer. Here's a snippet from an apache access log.

    -----------------
    64.68.80.4 - - [01/Mar/2005:16:19:24 -0500] "GET /robots.txt HTTP/1.0" 200 770 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
    -----------------

    In practice, a static referrer and no referrer amount to the same thing so you're right from a practical standpoint. The referrer is not useful.
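    For anyone who wants to check their own logs, here is a minimal sketch that pulls the referrer and user agent out of a line like the one above. It assumes Apache's "combined" log format (Apache writes "-" for a header the client did not send); the names COMBINED and parse_line are just illustrative, and the sample line is the snippet above with its closing quote in place.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = ('64.68.80.4 - - [01/Mar/2005:16:19:24 -0500] "GET /robots.txt HTTP/1.0" '
          '200 770 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')
fields = parse_line(sample)
print(fields["referrer"], "|", fields["agent"])
# prints: - | Googlebot/2.1 (+http://www.googlebot.com/bot.html)
```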

    But that's OK because the system I described does not depend on the referrer header. If a referrer header is available, the filter will use it as a shortcut to determine that the client was referred by an internal link and potentially bypass the whole redirect process. This saves system and network use for the majority of cases when the client is an ordinary web browser, but it's not essential and clearly won't be useful when the client is googlebot (or some other robot that does not provide a referrer).

    If the client is googlebot, the filter will see that there's no referrer. It will then check its stateful cache to determine if it has seen this robot recently. If so, it will let the robot right through and the request will be processed normally. If not, it will issue the slightly obfuscated 301 redirect. When the robot follows this redirect, the filter will be invoked again. This time, it will recognize the robot from its previous visit and will let it through.

  4. Re:HTTP 301 filter on Millions of Pages Google Hijacked using ODP Feed · · Score: 2, Informative

    Ach! This leads to an endless loop. Please note my revised (and more complicated) version.

  5. Possible defense: HTTP 301 filter on Millions of Pages Google Hijacked using ODP Feed · · Score: 2, Interesting

    I haven't tried this. It's just an idea knocking around in my head.

    What would happen if I set up a stateful filter on my web server that did the following?

    1. If the HTTP client provided a referrer header and that header contains my own domain name, exit (and let the request be processed normally).

    3. Record the user agent header, client IP address, and current timestamp in some sort of temporary lookup table.

    4. Issue an HTTP 301 with an absolute URL that points to the current page but with some technically insignificant rewrite from the way that the client requested it. For example, if the request is a simple GET, append a "?" or "&".

    If the client was not referred by an internal link, this filter would instruct the client to reload the page in a way that ensures that it knows the correct, full URL.

    By itself, this would simply cause an infinite loop which a robot would probably detect. That's where the temporary lookup table and slightly modified URL come in. I left step two out of the list above because it does not apply until the second time the agent hits our page:

    2. Consult the lookup table. If this agent already hit this page within the last n seconds, exit and allow the request to be processed normally.
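    The four steps can be sketched roughly as follows. This is untested against any real robot and makes several assumptions of its own: the class name RedirectFilter, the 60-second default window, keying the lookup table on (user agent, IP, page), comparing the referrer's hostname rather than doing a substring match, and the example.org domain are all hypothetical choices, not part of any real server API.

```python
import time
from urllib.parse import urlsplit

class RedirectFilter:
    """Sketch of the stateful 301 filter; returns None to let a request through."""

    def __init__(self, my_domain, window_seconds=60, clock=time.time):
        self.my_domain = my_domain
        self.window = window_seconds   # the "n seconds" from step 2 (assumed value)
        self.clock = clock             # injectable so the window can be tested
        self.seen = {}                 # (agent, ip, page) -> time of last redirect

    def handle(self, path, user_agent, client_ip, referrer=None):
        # Step 1: an internal referrer means the client already has our URL.
        if referrer and urlsplit(referrer).hostname == self.my_domain:
            return None

        # Undo the insignificant "?" or "&" suffix so both visits share a key.
        base = path.rstrip("?&")
        key = (user_agent, client_ip, base)
        now = self.clock()

        # Step 2: this agent was redirected to this page recently, so let it in.
        last = self.seen.get(key)
        if last is not None and now - last <= self.window:
            return None

        # Step 3: remember the agent, address, and time of this visit.
        self.seen[key] = now

        # Step 4: redirect to the absolute URL with a harmless suffix.
        suffix = "&" if "?" in base else "?"
        return 301, f"http://{self.my_domain}{base}{suffix}"

f = RedirectFilter("example.org")
print(f.handle("/page", "Googlebot/2.1", "64.68.80.4"))   # (301, 'http://example.org/page?')
print(f.handle("/page?", "Googlebot/2.1", "64.68.80.4"))  # None
```

    Note that this sketch never purges old entries from the lookup table; a real filter would need to expire them.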

    I don't know much about how robots such as googlebot behave. I'd love to see a reply from someone who knows more than I do.

  6. HTTP 301 filter on Millions of Pages Google Hijacked using ODP Feed · · Score: 1

    I have a solution in mind that may or may not work depending on robot/indexer behavior.

    What would happen if I installed a filter on my Web server (in pseudocode):
    if (!(http.referrer matches(my.domain)))
    {
        send(301, target=full_url)
    }

    In English, if the client was referred by an external link, immediately issue a 301 redirect to the full URL of the current page. This should inform the robot that it is now looking at a new site and should start indexing content under a URL that I'm positive that I control.

    This solution will collapse if the robot refuses to follow the 301 or does something that I don't expect.
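    One way this could collapse can be shown with a small simulation. It assumes a client that sends the same referrer (or none) again when it follows the redirect; that behavior is an assumption, as are the names naive_filter and follow and the example domains. Under that assumption the filter redirects such a client on every request.

```python
from urllib.parse import urlsplit

def naive_filter(referrer, my_domain, path):
    """Redirect any request that was not referred by a page on my_domain."""
    if referrer and urlsplit(referrer).hostname == my_domain:
        return None                              # internal link: serve the page
    return 301, f"http://{my_domain}{path}"      # otherwise: 301 to the full URL

def follow(referrer, my_domain, path, max_hops=5):
    """Count redirects issued to a client that resends the same referrer each time."""
    hops = 0
    while hops < max_hops and naive_filter(referrer, my_domain, path) is not None:
        hops += 1
    return hops

print(follow(None, "example.org", "/page"))                             # 5
print(follow("http://other.example/", "example.org", "/page"))          # 5
print(follow("http://example.org/index.html", "example.org", "/page"))  # 0
```

    The first two clients hit the max_hops limit without ever receiving the page, so the filter needs some memory of who it has already redirected.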

    I'd be interested to see a response from someone who understands spiders (since I don't).