ger · Slashdot Mirror

Re:Delay on W3C Gets Excessive DTD Traffic · 2008-02-10 18:23 · Score: 1

I think this is an excellent idea, thanks.

We considered tarpitting before, but I think we were always scared off by the prospect of having to keep tens of thousands of connections open.

Does anyone have specific software to recommend that is able to keep that many connections open on a typical cheap Linux box? (Lighttpd? Nginx? Varnish? Yaws?)

The implementation I'm thinking might work well is:

Switch www.w3.org to use some lightweight server software that is able to keep lots of connections open, and configure it to serve DTD files with an artificial 5-second delay. Proxy all the other requests to our existing Apache server running elsewhere (possibly on another port on the same system).

Most people shouldn't notice or care about the delay for DTD files; only the apps that are requesting them hundreds or thousands of times in a row will notice.
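
For what it's worth, here is a minimal sketch of the delay idea in Python, assuming an asyncio-based server. The suffix list, the `delay_for`, `handle` and `main` names, the port, and the placeholder body are all made up for illustration; only the 5 second figure comes from the plan above. It is not a recommendation of specific software, just a picture of why an event-driven server can hold many sleeping connections cheaply.

```python
import asyncio

# Hypothetical settings for this sketch; only the 5-second delay is from the plan above.
SLOW_SUFFIXES = (".dtd", ".ent", ".mod", ".xsd")
DELAY_SECONDS = 5.0


def delay_for(path: str) -> float:
    """Return the artificial delay for a request path (0 for everything else)."""
    path = path.split("?", 1)[0].lower()
    return DELAY_SECONDS if path.endswith(SLOW_SUFFIXES) else 0.0


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Read the request line, wait without blocking other clients, then answer."""
    request_line = (await reader.readline()).decode("latin-1")
    parts = request_line.split()
    path = parts[1] if len(parts) >= 2 else "/"
    # asyncio.sleep only parks this coroutine, so a delayed connection
    # costs some memory but no thread or process of its own.
    await asyncio.sleep(delay_for(path))
    body = b"placeholder body\n"  # a real setup would serve the file or proxy to Apache
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Serve until cancelled; host and port are arbitrary choices for the sketch."""
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Calling `asyncio.run(main())` starts it locally. Proxying the remaining requests to the existing Apache server is left out of the sketch.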

W3C's current traffic is something like:

- 66% DTD/schema files (.dtd/ent/mod/xsd)
- 25% valid HTML/CSS/WAI icons
- 9% other

So we'd probably want to configure the lightweight server to serve those icons too (but then it would have to do content negotiation as well).

Re:Oy Vey... on W3C Gets Excessive DTD Traffic · 2008-02-09 08:24 · Score: 1

"Responsible developers use a URL that links to their OWN COPY of the DTD. ANYTHING else is just leeching from W3C. PERIOD."

No no no, that's not the intent at all; documents should continue to point to DTDs on W3C's site. In fact, the next version of W3C's markup validator will issue a warning if the FPI (formal public identifier) and system ID do not match.

People who are simply creating HTML documents generally don't need to worry about this issue at all, sorry if the article was unclear.
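
To make the FPI/system ID check mentioned above concrete, here is a rough Python sketch. It is not the validator's actual logic; the `KNOWN_DTDS` table (two well-known pairs only), the regular expression, and the `check_doctype` name are assumptions made for illustration.

```python
import re

# Two well-known public/system identifier pairs; a real checker would need a full table.
KNOWN_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
}

# Matches <!DOCTYPE name PUBLIC "fpi" "system-id">, with the system ID optional.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>', re.IGNORECASE
)


def check_doctype(document: str) -> str:
    """Report whether the doctype's system ID is the one expected for its FPI."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return "no PUBLIC doctype found"
    fpi, system_id = match.group(1), match.group(2)
    expected = KNOWN_DTDS.get(fpi)
    if expected is None:
        return "unknown FPI"
    if system_id is None:
        return "ok (no system ID given)"
    if system_id == expected:
        return "ok"
    return "warning: FPI and system ID do not match"
```

A document that declares the HTML 4.01 FPI but points at a private copy of the DTD would get the warning, which is the case the original complaint was recommending.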

Re:Submitted this to /.? on W3C Gets Excessive DTD Traffic · 2008-02-08 18:45 · Score: 4, Informative

650 times as many hits. (163 times as many bytes.) But that's just from a quick sample.

Re:Submitted this to /.? on W3C Gets Excessive DTD Traffic · 2008-02-08 14:47 · Score: 5, Informative

To try to help put these numbers into perspective, this blog post is currently #1 on slashdot, #7 on reddit, the top page of del.icio.us, etc; yet www.w3.org is still serving more than 650 times as many DTDs as this blog post, according to a 10-min sample of the logs I just checked.

dealing with http logs on busy sites on The Real Problem With Alexa · 2007-07-23 13:19 · Score: 3, Interesting

At W3C we log almost everything as well, and we end up with way too much data as a result.

But we use the logs to detect and prevent certain classes of abuse as well (e.g. too many requests in a short time interval or re-requesting the same resources over and over), and we also want to be able to track trends over time, so we have been reluctant to just throw that data away.

I have a plan that I have yet to implement, which is to log only 0.001% of the requests for certain very popular resources (e.g. HTML DTDs and valid-HTML icons), which would allow us to monitor trends without logging tens of gigs of data per day; we'd just need to compensate for it when calculating stats later.
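
A minimal Python sketch of the sampling and the later compensation, assuming each request is sampled independently. The names `should_log` and `estimate_total` are hypothetical, and 0.001% is expressed here as "1 in 100,000" to keep the arithmetic in integers.

```python
import random

ONE_IN = 100_000  # 0.001% of requests, i.e. 1 in 100,000


def should_log(rng: random.Random, one_in: int = ONE_IN) -> bool:
    """Decide independently for each request whether to write a log line."""
    return rng.randrange(one_in) == 0


def estimate_total(sampled_lines: int, one_in: int = ONE_IN) -> int:
    """Scale a sampled line count back up to an estimate of the real request count."""
    return sampled_lines * one_in
```

So if a day's sampled log holds 3 lines for a given DTD, the estimate is about 300,000 real requests. The estimate gets noisy for resources with few sampled lines, which is why this only makes sense for the very popular ones.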

Then I planned to monitor for abuse by also logging every request to a script that watches for abusive traffic patterns, an easy adaptation from the current script that wakes up and skims the logs every 10 mins.
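
One way such a watcher could flag "too many requests in a short time interval" is a sliding window per client. This is only a sketch in Python: the `RateWatcher` class, its limit and window parameters, and the example addresses are assumptions, not the existing script.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag a client once it exceeds `limit` requests within `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client -> timestamps of its recent requests

    def record(self, client: str, timestamp: float) -> bool:
        """Record one request; return True if the client is now over the limit."""
        recent = self.hits[client]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.limit
```

Feeding every request to `record` as it arrives replaces the periodic skim of the log files, and memory stays bounded by the number of requests inside the window.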

(In your journal entry, when you say you are MD5ing IP addresses for privacy reasons, are you adding a random bit of data to the IP address before calculating the MD5? If not, it's pretty easy to find out which IP address corresponds to a given MD5 sum.)
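
To illustrate why the unsalted version is weak, here is a small Python sketch that recovers an address by hashing every candidate in a network. The function names are hypothetical, the addresses are documentation examples, and the keyed variant uses HMAC-SHA256, which is a substitute for (not the same thing as) "MD5 plus a random bit of data".

```python
import hashlib
import hmac
import ipaddress
from typing import Optional


def md5_hex(ip: str) -> str:
    """Unsalted MD5 of the dotted-quad string, as a hex digest."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def recover_ip(digest: str, network: str) -> Optional[str]:
    """Brute-force an unsalted MD5 by hashing every address in `network`."""
    for addr in ipaddress.ip_network(network):
        if md5_hex(str(addr)) == digest:
            return str(addr)
    return None


def keyed_hex(ip: str, secret: bytes) -> str:
    """Keyed alternative: without `secret`, the enumeration above cannot be run."""
    return hmac.new(secret, ip.encode("ascii"), hashlib.sha256).hexdigest()
```

The example only walks a /24, but the whole IPv4 space is about 4.3 billion addresses, which is well within reach of an offline search. The extra data only helps if it is kept secret.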

Re:cool on Apache Rejects Sender ID · 2004-09-02 14:28 · Score: 1

I did:

grep 'Not authorized by SPF' /var/log/exim4/mainlog | wc -l

on our mail hubs.

a few apache subdomains have txt records on Apache Rejects Sender ID · 2004-09-02 08:21 · Score: 2, Informative

some apache.org subdomains have txt records:

$ host -t txt xml.apache.org
xml.apache.org TXT "v=spf1 mx -all"
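
For anyone unfamiliar with the record syntax, a rough Python sketch of how a record like the one above breaks down. The `parse_spf` function is hypothetical and handles only qualifiers and bare mechanisms; modifiers are skipped and no DNS lookups are done.

```python
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record: str) -> list:
    """Split an SPF TXT record into (result, mechanism) pairs, in evaluation order."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        if "=" in term and ":" not in term.split("=", 1)[0]:
            continue  # modifiers such as redirect= are skipped in this sketch
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        else:
            qualifier, mechanism = "+", term  # no qualifier means "+"
        parsed.append((QUALIFIERS[qualifier], mechanism))
    return parsed
```

For "v=spf1 mx -all" this gives pass for `mx` and fail for `all`: mail from the domain's MX hosts is accepted, and everything else claiming to be from that domain may be rejected.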

w3.org started rejecting forgeries based on SPF records about a week ago, and has been rejecting about 10000 forgeries/day since then, including:

52 jakarta.apache.org
18 xml.apache.org

a few other domains that have been forged and rejected according to their SPF records:

1628 amazon.com
222 gmail.com
175 redhat.com
129 lists.sourceforge.net
17 sourceforge.net

(numbers above are # of rejections in the first week)

Re:Would someone explain this to a simpleton? on AOL Now Publishing SPF Records · 2004-01-09 03:22 · Score: 1

No... if the MTA is SPF-enabled it can reject it immediately (while still talking to the sending relay), without causing a bounce to the forged address.

the WWW *is* content-addressable on P2P Content Delivery for Open Source · 2003-01-30 09:00 · Score: 1

Regarding:

This document specifies HTTP extensions that bridge the current location-based Web with the Content-Addressable Web. -- HTTP Extensions for a Content-Addressable Web

The World Wide Web is "the universe of network-accessible information", i.e. anything with a URI, including URIs that are not tied to a particular hostname.

The Web already includes non-location-based URIs like mid: (for referring to message-ids), and urn:sha1: for referring to a specific set of bits by their checksum.

This proposal seems like a decent way of bridging HTTP-space with URN-space, but please remember that the Web is more than just HTTP. (see also: URIs, URLs, and URNs)

Anyway, it seems to me that sites that tend to suffer from slashdotting are:

- those that use dynamically-generated pages for what is basically static content: this problem can be fixed by sites making sure their content is cacheable, and further deployment of HTTP caches. (I'm not convinced a p2p-style solution is the solution here.)
- those with large bandwidth needs (kernel images, linux distribution .iso's, multimedia): as p2p software becomes more mature and widely deployed, everyone will have a urn:sha1: resolver on their desktop (pointing to their p2p software of choice), then whenever a new kernel is announced, the announcement can say:

  Linux kernel version 2.4.20 has been released. It is available from:
  Patch: ftp://ftp.kernel.org/pub/linux/kernel/v2.4/patch-2.4.20.gz a.k.a. urn:sha1:OWXEOVAK2YJW3G6XSULXDWFCNWTX7B2K
  Full source: ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.20.tar.gz a.k.a. urn:sha1:PPWXYMA32YNDNO35UD3IQTCWBVBYK5DC

and people can just fetch the files using urn:sha1 URIs instead of everyone hitting the same set of mirrors. (gtk-gnutella already supports searching on urn:sha1: URIs)
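
For reference, a urn:sha1 name like the ones above is the SHA-1 digest of the file, base32-encoded. A minimal Python sketch (the `sha1_urn` and `matches` names are made up for illustration):

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Name a blob of bytes by its SHA-1 digest, base32-encoded."""
    digest = hashlib.sha1(data).digest()  # 20 raw bytes
    # 20 bytes is exactly 32 base32 characters, so no '=' padding appears.
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def matches(data: bytes, urn: str) -> bool:
    """Check that downloaded bytes are the ones the URN names."""
    return sha1_urn(data) == urn
```

Because the name is derived from the content, it doesn't matter which mirror or peer the bytes came from: the client can run `matches` on whatever it downloaded and discard anything that doesn't check out.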

Re:Prefetch paranoia on Mozilla 1.2 Unleashed · 2002-11-27 20:24 · Score: 1

Using "Click here to complete your purchase" as a regular hypertext link (i.e. href="foo") would be a violation of the HTTP protocol, so any sites that do that are broken, and should not be used. (see further reading, if you're interested.)

In general, it should be safe to prefetch any URLs using HTTP's GET method.

I am very happy to hear about prefetching in Mozilla -- I have been wanting this feature for years!

checksum-based URIs and P2P systems on Kernel.org Needs Some Help, Perl Foundation Got Some · 2002-01-20 19:13 · Score: 1

I think widespread deployment of checksum-based URIs like urn:sha1 could help solve this problem.

mini-canadian flag on What Does Your Command Prompt Look Like? · 2001-07-06 13:54 · Score: 1

I have a mini canadian flag in my prompt, using an asterisk for the maple leaf.

Here's a screenshot, and here's the .bashrc stuff used to do it.

dumb optimization on Secure Shell Will Remain 'SSH' · 2001-03-22 06:15 · Score: 1

Why would anyone type 'ln -s /usr/bin/secsh /usr/bin/ssh' when you could just type 'ln -s /usr/bin/s{ec,}sh'?

Re: But will it..... on The Most Powerful Mouse in the World · 2000-12-12 21:18 · Score: 1

I once saw a mouse that could change its own ball: see mpeg movie

tunnel IRC over ssh on Secure Instant Messaging Systems? · 2000-09-24 02:24 · Score: 1

Just tunnel IRC (or something) over ssh: works fine, is easy to set up, and you're not reinventing either wheel. (there are plenty of ssh and IRC clients available for most platforms)

Re:Waaaahhh on Microsoft's IE 5.5 Flouts Industry Standards · 2000-07-14 00:40 · Score: 1

Uh... the validator validates slashdot to HTML 3.2 because slashdot claims conformance with HTML 3.2. (in the doctype declaration, the first line of the file)

Given that HTML 3.2 is three and a half years old, what do you expect?

pair.com rocks! on On The Subject of Web Hosting · 2000-01-15 14:51 · Score: 1

I've hosted my site with pair Networks for the last few years, and have been extremely happy with them.

It's amazing what you get with a webmaster account for only $29/month -- 120M disk (and extra is cheap), 400M/day bandwidth, virtual FTP server, modern Apache http service, CGI scripts anywhere, shell access, unlimited email aliases, etc.

And they're extremely well-connected (redundant DS3s); cumulative downtime over the last few years has been maybe a few hours.

I don't get anything for plugging them, I'm just a happy customer. Oh, and unlike most sites, their own web site doesn't suck.

User: ger

Comments · 17