Quite simply, people should not be using URLs to authenticate a site. I filed Mozilla bug 184881 to try to address this by making the SSL certificate more obvious. Bug 228524 is one person's attempt at this, effectively removing the URL bar and replacing it with fields identifying the hostname and SSL/TLS identity.
I took a look at some of the spec and some of the samples, and you're totally right. There's a lot that can be done here to reduce the complexity of the markup. It seems like they did everything they could to specify every detail of every note every time that note was needed.
In addition, breaking some of the information out of *this* spec into another namespace (like all of the MIDI-related stuff), as well as using existing namespaces like RDF for meta-data, would go a long way toward simplifying some of this.
Maybe version 2 will address some of this complexity.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.
How do you read this in a way that allows binary redistribution without an obligation to redistribute source code? Am I missing something?
The FAQ talks about this in some depth, but I don't really buy its objections either. A web server front-end to a caching HTTP proxy would completely solve this problem, but there are misconceptions that nobody seems able to look past:
1. that it violates copyright of the content owner
If you're publishing something over HTTP, and create HTTP caching headers that permit caching, you are implicitly authorizing retransmission. A normal caching HTTP proxy certainly wouldn't be the target of copyright infringement suits even if these caching headers were absent; it's simply relaying content the way HTTP was designed. If the content author doesn't like the way that works, he should not publish his content over HTTP.
2. that it would screw up web page advertisements or hit tracking
Not if it was done properly. Every HTTP resource can have its own set of HTTP caching headers. Advertisements that are changing dynamically for each request should not express a willingness to be cached. That ad would be re-fetched for every visitor hitting the page. The content itself may not change, so it makes sense for that resource to be cacheable.
The point is, these HTTP-defined policies are defined on the origin server, where the content originates. If that content carries cache-friendly HTTP headers, why not use them?
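To make that concrete, here's a rough Python sketch of the decision a mirror like this would make for each resource. It's deliberately simplified and not a full implementation of the HTTP caching rules; the function name and the plain-dict header representation are my own assumptions for illustration.

```python
def shared_cache_may_store(headers: dict[str, str]) -> bool:
    """Return True if a shared cache may usefully store this response.

    Simplified on purpose: it only looks at Cache-Control and Expires,
    and expects header names spelled exactly as shown (an assumption).
    """
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"')

    # The origin explicitly opted out of shared caching.
    if "no-store" in directives or "private" in directives:
        return False
    # "no-cache" allows storing but forces revalidation on every request,
    # so a mirror would not take any load off the origin.
    if "no-cache" in directives:
        return False
    # s-maxage applies to shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            value = directives[name]
            return value.isdigit() and int(value) > 0
    return "public" in directives or "Expires" in headers
```

A per-request ad would be served with something like `Cache-Control: no-store` and always go back to the origin, while the article itself could carry `Cache-Control: public, max-age=3600` and be served from the mirror.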
Apache can act as a caching HTTP proxy. It can also rewrite incoming requests (and outbound HTTP headers) to masquerade one URL under its own. These systems can work together to present an HTTP cache under a different, Slashdot-controlled URL.
If the origin server doesn't want to be cache-friendly, this solution wouldn't work, since every request would have to be made back to the origin server anyway, and they get Slashdotted just as quickly. But that's also kind of dumb of them, and they deserve what they get as a result of that policy decision.
Or hell, if nothing else, why not post the Google cache? Google seems to deal with this issue somehow. IMO, the powers at Slashdot simply don't want to deal with this issue. Maybe it's laziness, or maybe it's corporate greed (their bandwidth at $0 or ours at $X), I don't know.
Of course, ISPs could also stand to improve their own HTTP caching policies (transparent proxying, for example, or at least offering proxies for users to voluntarily use). AOL seems to be able to do this without causing problems for users. Why can't other ISPs?
Because these are not devices that communicate with the Internet Protocol. Just because there are a lot of IP addresses in IPv6 doesn't mean we should start handing them out to everything that needs an ID number.
There still may be merit to considering the use of one common "ID space" for drawing these IDs from (perhaps allocating a prefix to each type of ID), but this doesn't really seem useful.
A lot of your summary seems already available in HTTP, if crudely. HTTP can:
- fetch metadata (like an ETag, an opaque validator, often a hash, that identifies a specific version of an HTTP resource) and fetch the body only if it has changed (via If-Modified-Since with a known Last-Modified date, or If-None-Match with known ETags)
- proxy requests, at the request of the origin server (i.e., "Thanks for your request, but please use this proxy server to get your response:...") via the 305 response code
- retrieve "shards" of a resource via partial GETs (using the Range header and 206 response codes)
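As a rough sketch of the revalidation part, here's how the conditional request headers and the origin's answer fit together. The function names are made up for illustration, and the ETag comparison is a simplified exact match that ignores weak validators.

```python
def conditional_get_headers(cached_etag=None, cached_last_modified=None):
    """Build the request headers a cache sends to revalidate its copy."""
    headers = {}
    if cached_etag is not None:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified is not None:
        headers["If-Modified-Since"] = cached_last_modified
    return headers


def origin_status(request_headers, current_etag):
    """Return 304 if the client's ETag still matches, otherwise 200.

    Simplified: exact string comparison only, no weak-validator handling.
    """
    sent = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in sent.split(",")]
    if "*" in candidates or current_etag in candidates:
        return 304  # Not Modified: the cached body can be reused
    return 200      # full response with the new body
```

A 304 carries no body, so checking whether a cached copy is still good costs the origin very little.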
A "web torrent" could be built with a quantity of participating caching HTTP proxies. You'd make a request against your local "web torrent" proxy, which, if it didn't have a recent copy of the resource, would make partial GETs of a set of nearby "web torrent" proxies until it retrieved everything.
Granted, this isn't nearly as efficient as a protocol more suited to the task (such as BitTorrent itself), but it seems like a workable solution without changing web browsers or servers.
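For what it's worth, the "shard" planning is not much code. Here's a minimal sketch, assuming the local proxy already knows the resource's total size and has a list of peers; the round-robin assignment and the names are my own invention, not part of any existing system.

```python
def plan_partial_gets(total_size, proxies, chunk_size):
    """Assign byte ranges of a resource to peer proxies, round-robin.

    Returns a list of (proxy, range_header_value) pairs. Each value is
    sent as the Range header of a partial GET, which a cooperating peer
    answers with a 206 response.
    """
    if total_size < 0 or chunk_size <= 0 or not proxies:
        raise ValueError("need a non-negative size, a positive chunk size and at least one proxy")
    plan = []
    for index, first in enumerate(range(0, total_size, chunk_size)):
        # Range offsets are inclusive, so the last byte is one before the next chunk.
        last = min(first + chunk_size, total_size) - 1
        proxy = proxies[index % len(proxies)]
        plan.append((proxy, f"bytes={first}-{last}"))
    return plan
```

With a 2500-byte resource, two peers and 1000-byte chunks, this yields three requests: two to the first peer and one to the second.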
I'm curious to know how often some other RF hops are used for typical network traffic. I hear of some sites with satellite or 802.11-based point-to-point network connections. How secure are those? It's very likely that some amount of Internet traffic you've created has passed over some form of RF link. You cannot guarantee that every hop your data travels over is free from snooping or logging of some kind. Sure, it's easier to do that with 802.11 but it's a bad assumption to say it won't happen without it.
All sensitive data should be encrypted end-to-end. This means secure end-to-end transports like VPN or SSL/TLS. You should never rely on encryption at the local link layer to protect your data all the way to its destination.
The original poster is advocating the use of encryption at the session or transport layers. He's not suggesting that you encrypt your files first, then send them using unencrypted transports, he's suggesting that you encrypt your transports. In other words, use SCP or SFTP instead of FTP, SSH instead of TELNET, HTTPS instead of HTTP, etc.
Or use VPN, which sets up an encrypted tunnel at the IP layer, which effectively encrypts all of your transport protocols from the perspective of someone outside of your tunnel.
As far as your legacy application example goes, just do a simple cost-benefit analysis. Is it worth upgrading or enhancing your legacy application to make it secure over insecure transports, or is it better to find ways of securing those transports? A VPN would likely be the best solution for this scenario because it's transparent to the application: it operates at the IP layer, below the transport and application layers.
We already see scams with domains like "ebayfake.com" (replace 'fake' with some other plausible term). People need to stop doing their authentication with a cursory visual check of the domain name and start using technologies designed for the purpose (e.g. TLS).
Precisely why DNS isn't appropriate to authenticate an organization. We need to push technologies like TLS and discourage users from giving so much weight to DNS hostnames (and URLs for that matter).
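As a small illustration of letting TLS do the authenticating instead of the eye, here's a sketch using Python's standard `ssl` module. The wrapper function name is mine, and the host name in the usage comment is a placeholder.

```python
import ssl


def strict_client_context():
    """Return a client-side TLS context that authenticates the server.

    ssl.create_default_context() already requires a certificate that
    chains to a trusted CA and matches the requested host name; the two
    assignments below just make that explicit.
    """
    context = ssl.create_default_context()
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


# Usage sketch (placeholder host, not executed here):
#   import socket
#   with socket.create_connection(("www.example.org", 443)) as raw:
#       with strict_client_context().wrap_socket(raw, server_hostname="www.example.org") as tls:
#           print(tls.getpeercert()["subject"])
```

The handshake fails if the certificate doesn't match the name that was asked for, which is exactly the check a human glancing at the URL bar tends to get wrong.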
This is actually a perfectly legitimate suggestion. The problems people come across are caused by:
1. sites that fail to declare a character set properly, where the browser fails to auto-detect it (e.g. slashdot, which has no declared character set and where users, as a result, end up pasting things in a multitude of character sets, causing the browser to auto-detect one of them and misrender the rest)
2. sites that declare a character set that's too constrictive for the characters someone is trying to paste
A variation of #1 involves sites that may appear on the face to be all UTF-8 and internationalized, but utilize databases that store text in, say, 7-bit ASCII, or otherwise fail to preserve their Unicode data on its way to/from any back-ends.
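Both failure modes are easy to reproduce. Here's a quick Python sketch: the first function shows what happens when UTF-8 bytes are read under the wrong character set, the second what a 7-bit ASCII back-end does to the text. The function names are just for illustration.

```python
def misread_as_latin1(text):
    """Encode as UTF-8, then decode as Latin-1, as a browser guessing wrong would."""
    return text.encode("utf-8").decode("latin-1")


def store_as_ascii(text):
    """Simulate a back-end that can only keep 7-bit ASCII.

    Characters outside ASCII are replaced with '?', so the original
    text cannot be recovered afterwards.
    """
    return text.encode("ascii", errors="replace").decode("ascii")


original = "caf\u00e9"  # "café", with the precomposed e-acute
garbled = misread_as_latin1(original)   # two junk characters where the accent was
flattened = store_as_ascii(original)    # the accented letter is gone for good
```

The first kind of damage is reversible if you know which character set was wrongly assumed; the second is not.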
And this is why DNS is not (and never was) appropriate for use as a directory or a Yellow Pages. It's simply there to create a naming hierarchy. I would have hoped that the creation of a bunch of new generic top-level domains would show everyone the futility of trying to control every possible form of their name in DNS, but I should never underestimate the will of a legal department to make work for themselves.
This just takes things to a new level entirely.
It is not appropriate to use DNS as a directory ("search term dot com"), and it's not appropriate to use it as a form of authentication ("if it has ebay in its name, it must be eBay, right?"). Better technology needs to be pushed for these needs.
P.S. Why can't that company have a Chinese domain name and a roman-character domain name? Is there a law I don't know about?
No law, but best DNS practice usually suggests selecting a single exclusive DNS domain for an organization. Units within that organization get subdomains within the primary DNS domain, not their own independent domains. Network administrators usually prefer to have all of their computers in a single DNS domain anyway; it's just the web and marketing guys that want to have a "presence" with all of those other names.
In practice, going with one exclusive DNS domain isn't possible, since DNS domains now carry so much intellectual property weight, and companies feel they have to snatch up every DNS domain that's remotely similar to any mark they have or will ever own. But that's not the way things were meant to be, and certainly not in the best interests of DNS.
But until we come up with anything better, and a company wants to market their URLs and DNS domains "out of band" (on printed media or broadcast) on a global scale, they'll want to have more than one domain in as many native scripts as they can.
I would personally prefer to see a directory atop DNS that would map "logical" real-world names (in as many scripts as someone wants) to an organization's single exclusive DNS domain. The sooner we stop (ab)using DNS as a content label or a Yellow Pages, the easier it will be to cope with domains in other scripts.
Bear in mind that hostnames and URLs were never truly meant to be consumer-friendly. The goal of things like HTML was to hide those as implementation details. You didn't need to know what HTTP was so long as you could click on a link. So long as applications try to keep that implementation detail away, things should still be OK in that regard, but everyone knows that's a dream that will never be fully realized. Eventually URLs with scripts not in your native language will make their way to other formats (print and broadcast media, for example), requiring the viewer to manually enter them. So long as we're relying on URLs and DNS to be our primary means of naming content for the general public, this will always be a problem. It really sounds like we need a better, logical directory to sit atop DNS that maps real-world names (in whatever languages and scripts) to the one, single, exclusive DNS domain for that entity. This, along with better use of search engines (e.g. a formal engine for searching trademarks), could allow us to search for things in our native scripts and follow links to URLs and DNS domains using a different script entirely.
I basically agree with your points, but I view the conflict a little differently. I think the fundamental disconnect is between those that want the Internet based on good technical decisions and those that want it based on good commercial ones. The former wants to ensure stability, promote new technologies, and design a robust, scalable system. The latter wants to exploit what we have to make money.
What's needed is a healthy balance: a solid, robust base that can meet the demands of businesses and users that use it.
If you go too much on one side, you end up with what appears to be a great technical achievement, but something that's difficult to use and difficult to commercialize, requiring extensive application-layer work to get around technical decisions that don't satisfy non-technical demands. If you go too much in the other direction, you end up with InterwebXP: an unscalable mess of proprietary "extensions", and ultimately, proprietary commercial replacements.
The problem with ICANN is that they don't seem to be moving forward. They seem preoccupied with making current technology do what Big Business wants it to do. Why the hell are we even using DNS anymore? It's clear that it's not designed to be used as a content label, or a trademark or some other label with intellectual property weight. So why do we continue to use it like that? Why is nobody looking for a replacement?
You can only hack on an implementation so far before you're just making a mess of its splattered remains and amputated limbs. Our technical bodies need to be spending more time coming up with solid technical solutions to requirements of all kinds (both technical and non-technical), and less time trying to hack things in ways that bother the fewest people.
People creating HTML in line with WAI should skirt this problem neatly. A good piece of documentation should make no assumptions about the capabilities of the device displaying it, but the writer should feel free to take advantage of things (like images) if it aids in the understanding of the material.
Yes, I did. It relies on the judgement of a few trusted people to say whether a piece of "spam" warrants a DoS-style response. It also suggests that maybe some "test" e-mails might be necessary, so I suppose you're thinking in the right direction, but this isn't going to be practical enough to be effective.
The bottom line is that it's going to be very difficult for your trusted moderators to tell if a piece of spam is "legitimately" advertising a product, or if it's a fake ad designed to attract retaliation (which is what your tool is doing). Once your moderators are tricked into thinking it's real spam, whether it warrants a response is a subjective call.
I hate spam as much as the next guy, and while I'm all for having everyone respond to these unsolicited ads on a direct basis, using some automated system to attempt the same thing is likely to be abused.
I would be wary of a system like this. What's to stop someone from crafting a bogus piece of spam purportedly "from" someone he doesn't like? Your system could be exploited to harass innocent people.
?
From http://www.gnu.org/licenses/gpl.html:
How do you read this in a way that allows binary redistribution without an obligation to redistribute source code? Am I missing something?
It doesn't really matter. If you redistribute a compiled version of the original code, you are still obligated to make the source available as well.
stupid speedbumps that most people stop for better than they stop for stop-signs
I wonder why no one has ever tried putting speed bumps at stop signs?
I long for the day when software-defined radio can allow me to use one single device and have it adapt through software to whatever task I want it to do.
Imagine having a PDA that can pick up HDTV signals, calculate GPS positions, make calls via GSM and monitor your local police frequencies. It could communicate via any of the 802.11 standards, participate in a Bluetooth network, unlock your car and open your garage door.
And at the bottom of it all is a software stack, able to monitor and analyze any kind of RF signal and dynamically demodulate, decode it all, and respond.
That's what I want for Christmas.
Some versions of ps do react differently to the flags
That's exactly the point of the question. Someone claiming proficiency in both should be able to identify which variant that command is designed for.
You don't need to know the fields. Anyone and everyone that can honestly claim proficiency with both Solaris and BSD architectures has needed to obtain a "full" list of all processes running on the system. This means they've done a "ps -ef" and, given that they're proficient in BSD also, know the difference between this command and the BSD equivalent.
For those people, it's quite clear which platform gives you more data with that command.
I'd give half credit for someone that responded saying they'd need to read the man page, because that shows me that they know where to look up information they don't know off-hand, but I agree with the original poster: anyone claiming that level of proficiency should be able to make this comparison without one.
understand the humor in the name of "C++"
That in itself would make a good interview question...
The bulk of the world does not run on the Latin alphabet. Either they go off and create their own Internets that follow their rules, at everyone's expense, or we resolve to use one single root and find ways to make it work for everyone's rules.
There are worse ways to approach this problem, and I don't see any better suggestions.
Your information is also somewhat dated or not completely accurate.
DNS, collectively, operates on a standardized set of Latin characters to identify country codes. This is the crux of the issue, obviously. I'll speak more to this later.
Web markup languages are currently moving to those based on XML. XML allows Unicode anywhere, including the use of Unicode characters in XML elements and attributes. It's pretty easy to create an XML schema that only uses characters from non-Latin scripts. HTML and its XML-based children continue to use tags with a clearly visible English background, but who cares? It's trivial to create an XML schema with a CSS style sheet that allows Chinese authors to create markup using elements in Chinese, and XML- and CSS-aware browsers will actually render this correctly today. The presence of English is no longer a design issue, it's just that our standards bodies speak English and by this point, it's easier for non-Latin developers to deal with Latin scripts than it is for Latin developers to deal with non-Latin scripts.
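To show that this isn't hypothetical, here's a small sketch that parses markup whose element names are entirely Chinese, using the XML parser in Python's standard library. The sample document and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document: "article" containing a "title" and a "paragraph",
# with every element name written in Chinese.
DOCUMENT = "<文章><标题>你好</标题><段落>世界</段落></文章>"


def element_names(xml_text):
    """Return the element names of a document, in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]
```

The parser doesn't care which script the names are in; a style sheet would then select on those same names to render the document.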
E-mail is, from the user's perspective, completely internationalized as well. Only the mail header names carry English-language words in them, but these shouldn't matter. The values of these headers can be internationalized, along with the content of the e-mail itself. The header names themselves are an implementation detail that can be completely isolated not only from the user, but from the developer as well, in the form of a library abstracting the implementation away.
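Python's standard `email` package is one example of such a library. In this sketch the addresses and function names are placeholders; the point is that the caller only ever handles the native-script subject, while the encoded form on the wire stays hidden.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def build_message(subject, body):
    """Build a message; the library picks the header encoding itself."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "recipient@example.org"     # placeholder address
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def roundtrip_subject(msg):
    """Serialize the message, parse it back, and return (wire_text, subject)."""
    wire = msg.as_string()
    parsed = message_from_string(wire, policy=policy.default)
    return wire, parsed["Subject"]
```

On the wire the Subject value travels as an ASCII-only encoded word, and the receiving side gets the original text back without the developer touching either step.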
You will quickly find that many other protocols share these properties. Things like header names can be treated simply as opaque tokens (and frequently are, in the case of programmers who don't speak English). Their values are usually more opaque tokens, or internationalized characters.
We run into issues when the stuff that isn't an implementation detail needs to be internationalized. DNS domains are the most critical. Given that our society places so much intellectual property and "first line of search" weight on DNS domains, it's only natural that a change like this is going to make things difficult for a lot of people. But remember that in the end, it's going to make things a lot less difficult for a lot more people.
There will absolutely be issues with people "spoofing" similar-looking domains, and you can bet that companies are probably doubling the size of their Internet legal departments for the next rounds of litigation. But seriously, if you're relying on the appearance of DNS domains as some form of authentication, your security model is badly broken to begin with.
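Here is a small sketch of why appearance is a poor basis for trust. It uses Python's built-in "idna" codec (which implements the older IDNA 2003 rules) and a made-up spoof name whose first letter is a Cyrillic "е" rather than a Latin "e"; the to_ascii helper is just for illustration.

```python
import unicodedata

# Hypothetical spoof: U+0435 (Cyrillic) followed by Latin "bay.com".
SPOOF = "\u0435bay.com"


def to_ascii(domain):
    """Return the ASCII form of a domain name as it would appear in DNS."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(SPOOF == "ebay.com")           # the strings differ despite looking alike
    print(unicodedata.name(SPOOF[0]))    # shows which script the first letter is from
    print(to_ascii("ebay.com"))          # plain ASCII names pass through unchanged
    print(to_ascii(SPOOF))               # the look-alike becomes an "xn--" name
```

The two names can be indistinguishable on screen, yet they are different names to DNS. Anything that decides "this is eBay" by looking at the text in the URL bar can't tell them apart, which is why identity has to come from somewhere else.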
It is my hope that this internationalization effort is a catalyst to:
a) make people realize that DNS domains do not make a good Yellow Pages
b) spur development on a better form of directory to supplement search engines to identify the Internet location of a real-world entity
c) promote the use of other technologies to better establish "identity" online (e.g. SSL/TLS or some public key infrastructure)
My two cents.
P.S. Why can't that company have a Chinese domain name and a Roman-character domain name? Is there a law I don't know about?
No law, but best DNS practice usually suggests selecting a single exclusive DNS domain for an organization. Units within that organization get subdomains within the organization's primary DNS domain, not their own independent domains. Network administrators usually prefer to have all of their computers on a single DNS domain anyway; it's just the web and marketing guys that want to have a "presence" with all of those other names.
In practice, going with one exclusive DNS domain isn't possible, since DNS domains now carry so much intellectual property weight, and companies feel they have to snatch up every DNS domain that's remotely similar to any mark they have or will ever own. But that's not the way things were meant to be, and certainly not in the best interests of DNS.
But until we come up with something better, any company that wants to market its URLs and DNS domains "out of band" (on printed media or broadcast) on a global scale will want more than one domain, in as many native scripts as it can get.
I would personally prefer to see a directory atop DNS that would map "logical" real-world names (in as many scripts as someone wants) to an organization's single exclusive DNS domain. The sooner we stop (ab)using DNS as a content label or a yellow pages, the easier it will be to cope with domains in other scripts.
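A toy sketch of what such a directory might look like, in Python. Everything here is hypothetical: the NameDirectory class, the organization names and the example.com domain are stand-ins, and a real directory would also need to settle who is allowed to register which name.

```python
import unicodedata


def _key(name):
    """Normalize a real-world name so trivial variants compare equal."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


class NameDirectory:
    """Maps real-world names, in any script, to one canonical DNS domain."""

    def __init__(self):
        self._entries = {}

    def register(self, name, domain):
        self._entries[_key(name)] = domain

    def resolve(self, name):
        # Returns None when the name is not registered.
        return self._entries.get(_key(name))


if __name__ == "__main__":
    directory = NameDirectory()
    directory.register("Example Corp", "example.com")
    directory.register("例示株式会社", "example.com")
    print(directory.resolve("EXAMPLE CORP"))
    print(directory.resolve("例示株式会社"))
    print(directory.resolve("No Such Company"))
```

The organization keeps one DNS domain, and the names people actually type or read, in whatever script, live in the layer above it.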
Bear in mind that hostnames and URLs were never truly meant to be consumer-friendly. The goal of things like HTML was to hide those as implementation details. You didn't need to know what HTTP was so long as you could click on a link. So long as applications try to keep that implementation detail away, things should still be OK in that regard, but everyone knows that's a dream that will never be fully realized. Eventually URLs with scripts not in your native language will make their way to other formats (print and broadcast media, for example), requiring the viewer to manually enter them. So long as we're relying on URLs and DNS to be our primary means of naming content for the general public, this will always be a problem. It really sounds like we need a better, logical directory to sit atop DNS that maps real-world names (in whatever languages and scripts) to the one, single, exclusive DNS domain for that entity. This, along with better use of search engines (e.g. a formal engine for searching trademarks), could allow us to search for things in our native scripts and follow links to URLs and DNS domains using a different script entirely.
I basically agree with your points, but I view the conflict a little differently. I think the fundamental disconnect is between those that want the Internet based on good technical decisions and those that want it based on good commercial ones. The former wants to ensure stability, promote new technologies, and design a robust, scalable system. The latter wants to exploit what we have to make money.
What's needed is a healthy balance: a solid, robust base that can meet the demands of businesses and users that use it.
If you go too much on one side, you end up with what appears to be a great technical achievement, but something that's difficult to use and difficult to commercialize, requiring extensive application-layer work to get around technical decisions that don't satisfy non-technical demands. If you go too much in the other direction, you end up with InterwebXP: an unscalable mess of proprietary "extensions", and ultimately, proprietary commercial replacements.
The problem with ICANN is that they don't seem to be moving forward. They seem preoccupied with making current technology do what Big Business wants it to do. Why the hell are we even using DNS anymore? It's clear that it's not designed to be used as a content label, or a trademark or some other label with intellectual property weight. So why do we continue to use it like that? Why is nobody looking for a replacement?
You can only hack on an implementation so far before you're just making a mess of its splattered remains and amputated limbs. Our technical bodies need to be spending more time coming up with solid technical solutions to requirements of all kinds (both technical and non-technical), and less time trying to hack things in ways that bother the fewest people.
People creating HTML in line with WAI should skirt this problem neatly. A good piece of documentation should make no assumptions about the capabilities of the device displaying it, but the writer should feel free to take advantage of things (like images) if it aids in the understanding of the material.
Yes, I did. It relies on the judgement of a few trusted people to say whether a piece of "spam" warrants a DoS-style response. It also suggests that maybe some "test" e-mails might be necessary, so I suppose you're thinking in the right direction, but this isn't going to be practical enough to be effective.
The bottom line is that it's going to be very difficult for your trusted moderators to tell if a piece of spam is "legitimately" advertising a product, or if it's a fake ad designed to attract retaliation (which is what your tool is doing). Once your moderators are tricked into thinking it's real spam, whether it warrants a response is a subjective call.
I hate spam as much as the next guy, and while I'm all for having everyone respond to these unsolicited ads on a direct basis, using some automated system to attempt the same thing is likely to be abused.
I would be wary of a system like this. What's to stop someone from crafting a bogus piece of spam purportedly "from" someone he doesn't like? Your system could be exploited to harass innocent people.