Slashdot Mirror


User: sxxw

sxxw's activity in the archive.

Stories: 0
Comments: 4

Comments · 4

  1. Mod-ssl and Apache-SSL on On the Commercial Use Of Apache and SSL · · Score: 4

    In general, I would say that it depends on exactly what you're looking for. They're both free, so why not evaluate them both and see how they work in your environment?

    I have used and installed both, in both commercial and academic environments. I started out using Apache-SSL, but have now moved over to using mod_ssl.

    Some background - Apache-SSL came first, and ships as a set of patches for the core Apache code. mod_ssl ships as patches plus an additional Apache module. When I last compared them, the fundamental difference was that Apache-SSL just patches itself into the Apache code, whereas mod_ssl extends the Apache module interface definition to allow the SSL functionality to be contained in a module. In general, I have found mod_ssl to be easier to use and debug. It also appears to have more features, although whether that's a good thing probably depends on how much use the features are to you!

    There's more background available from both of the websites.

    Finally, as others have pointed out, if you're wanting to use your server with a wider community, you'll need to obtain a certificate from a recognised CA (this isn't as expensive, or difficult, a process as many make out).

  2. Re:XML? on Is the Internet Becoming Unsearchable? · · Score: 2
    You're thinking of RDF, the W3C's language for embedding metadata into XML (and by extension XHTML) content. This is great for page-specific information (such as Dublin Core metadata), and can also be used to provide metadata about collections (such as a set of web pages, or an entire site).

    However (there's always a however) there's the metadata catch. If you divorce metadata from content, then it becomes easy for site admins to lie in their metadata in order to attract visitors. Remember the keyword spamming that used to occur? Now, imagine if that's extended to being able to lie completely about the content of an entire site. Unless you're in an environment where you can trust the providers of your metadata, by and large you're in trouble.

    Cheers,

    Simon.

  3. Re:Black holes on Is the Internet Becoming Unsearchable? · · Score: 1
    Search engine "black holes" are actually fairly common. They are created either deliberately, by someone who wants to trap spam-harvesting bots, or accidentally, through dynamically generated content or the like.

    However, using the URL is not necessarily the way to avoid this. There's no written rule on how the path section of a URL translates to a query, and it's possible to create dynamic content that never uses ? operators. Similarly, there's no requirement on servers to have any correlation between an extension on a URL (such as .html) and the MIME type that they return (which is what you're really interested in).
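
    As a rough illustration of deciding on the MIME type rather than the URL, here is a minimal Python sketch. The function name is_html and the set of accepted types are my own assumptions for the example, not part of any particular crawler.

```python
def is_html(content_type_header):
    """Decide whether a response is HTML from its Content-Type header value.

    The URL (and any extension on it) is deliberately ignored.
    is_html is a hypothetical helper name used only for this sketch.
    """
    # Drop parameters such as "; charset=ISO-8859-1" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

    A URL ending in .html that is served as image/png would be rejected, while an extensionless dynamic URL served as text/html would be accepted.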

    To deal with black holes, your best bet is to use some form of depth count on the site that you're indexing - once you've gone down past a certain depth, give up. Keeping MD5 hashes of the content you've already seen can also help prevent simple recursive trees from being indexed.
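
    Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fetch is a hypothetical callable that returns the page content and its outgoing links, and the default max_depth of 5 is an arbitrary choice.

```python
import hashlib
from collections import deque


def crawl(fetch, start_url, max_depth=5):
    """Breadth-first crawl with a depth limit and content de-duplication.

    fetch is a hypothetical callable: url -> (content_bytes, list_of_links).
    Returns the list of URLs that were indexed.
    """
    seen_urls = {start_url}
    seen_hashes = set()
    indexed = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        content, links = fetch(url)
        digest = hashlib.md5(content).hexdigest()
        if digest in seen_hashes:
            # Identical content already indexed: skip it and do not follow it.
            continue
        seen_hashes.add(digest)
        indexed.append(url)
        if depth >= max_depth:
            # Past the depth limit: give up on this branch.
            continue
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append((link, depth + 1))
    return indexed
```

    The depth limit bounds black holes that generate fresh content at every level, and the hash check stops trees where the same page keeps reappearing under new URLs.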

    Cheers

    Simon.

  4. Re:Distributed effort ? on Juggernaut GPLd Search Engine · · Score: 2
    Ummmm - no patent - it's already been done.

    The main problem in doing anything like this in a co-operative fashion, as other posters have commented, is the large commercial value of the results. It also requires those taking part to have a significant amount of bandwidth (to pull in all of the content and then to exchange indexes).

    The spidering part of the process is one of the least processor intensive - once you've completed it you're left with a large glob of data. You then need to convert that into an inverted index, which would still be large and would then need passing to a central server, which in turn would have to do further processing in order to actually merge it into the whole.
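
    To make the two steps concrete, here is a toy Python sketch of building an inverted index from crawled pages and merging it into a central one. The function names and the whitespace tokenisation are assumptions for the example; a real engine would do far more work at both stages.

```python
def build_inverted_index(pages):
    """Map each term to the set of URLs whose text contains it.

    pages is a dict of url -> page text, as a crawler might hand over.
    Tokenisation is a naive lowercase whitespace split (an assumption).
    """
    index = {}
    for url, text in pages.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index


def merge_indexes(central, partial):
    """Merge a crawler's partial index into the central index, in place."""
    for term, urls in partial.items():
        central.setdefault(term, set()).update(urls)
    return central
```

    Even in this toy form the index holds one entry per distinct term per page, which is why shipping it to a central server and merging it there is not cheap.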

    The Harvest Indexing system (http://www.tardis.ed.ac.uk/harvest) sought to develop a system like this. It separated the searching and crawling tasks, so it would be possible to have a large number of crawlers (probably topologically close to the sites they were indexing), which then gave their results to an indexing system which collated them and presented them to the world.

    The problem here is that you've still got one large, monolithic system at the indexing end. TERENA, as part of the TF-CHIC project, developed a referral system (based on WHOIS+) to allow there to be one central gateway which then passed search requests to a large number of individual engines, each of which could run different software. Kind of like a fancy metasearch engine.

    Originally the plan for devolving things locally was that if the indexes were generated by people who knew the pages, then you'd get a higher standard of index. Aliweb, for instance, had a file per server which contained an index of all of the objects available on that server.

    The problem with this is easily shown up by metatag abuse. If the person running the spider has a commercial interest in the sites they're indexing, they'll often go and fabricate the index so that their sites appear higher on searches.

    Cheers.

    Simon.