Seriously, it's almost trivial to completely avoid spam now. [...] They enjoy seeing the spam, because then they can get outraged and do stuff like this.
I wouldn't attribute that much malice to it.
Sure, the big players have great spam filtering, but the work it takes to get there isn't trivial. And there are a lot of us who don't use webmail. Having configured a few mail systems, it takes a lot of poking and prodding and fine tuning to get an anti-spam configuration that works really well. In the course of doing it, you see these strong spam signals, and get drawn into them. "Hey, what if I just turn up this setting here? That'd catch a ton of spam!" And upon doing it, you find you've walked right into one of the many pitfalls of spam filtering... You're silently rejecting legit mail, or running your false positives way up, or creating backscatter, or generating inappropriate reject codes, or in this case creating an exploitable avenue to harass innocent people...
But at first blush, when you're in there watching your logs and tweaking your configs, these ideas sound great. There's a reason the form letter exists. People get excited about their great new spam solution, and go to publish before they've thought it through, or realized that their idea's already been tried and failed. (I don't like the form because it's used to dismiss *any* new anti-spam idea, even the very few that are good and original, but that's beside the point.)
Anyway, I don't think it's out of a need for outrage. I think it's just people get caught up in what they're doing, and lose track of the implications.
They can call it easy, fun, and good netizenship... But I say they're just putting a friendly face on vigilantism.
From a technical perspective this isn't that different from other collaborative filtering systems (though since the listing criteria are based on secondary sources, it's going to be susceptible to confirmation bias and other sampling errors, so this isn't likely to be a good one). I take big issue with the naming, though: Other collaborative filters say that "This machine is listed because it met these criteria", which you then make your own decisions on.
It crosses a line when you're saying they should be "shamed", especially when you're not taking extensive precautions to make sure you're not listing innocents.
It might be nice to have some simple examples and numbers to back that up -- besides just factorial.
I did... I'm not kidding when I say my parser is ten times slower.
If you're comparing a ten-line Ruby script with a hundred-line Perl script, and the Perl script is ten times faster, that would pretty clearly show the advantages of each language.
That'd be a 100x speedup, per line. :)
It's not that extreme. In my use, Ruby's only 1.5x as dense as good clean Perl (use strict; not using default variables; not hacking and overloading data; etc). My Perl is strictly procedural with lots of very tight, fast loops, and shallow, unabstracted data structures.
My Ruby is very OO with a great many very short methods (and resultant deep call stacks), rich objects, and highly abstracted data. The ease of maintenance - not the LOC count - is what makes it faster to code... And I'm clearly handing a lot of extra work to the interpreter.
Clearly I could transliterate the algorithms between the two languages, and the speed difference would narrow. You can write BASIC in any language... I've written Perl in Ruby. But that doesn't make the basis of a good benchmark. I think it's more meaningful to compare code written like the language was meant to be used. Both of the coding styles above are what I find comes naturally to me when writing in each language.
Except that slideshow is comparing the performance of popular frameworks for different languages. So they're showing that Rails is an efficient framework. That's a perfectly valid point to make (the language makes it easy to write a better algorithm, or maybe the framework is just more efficient), but it doesn't say anything about how fast the code is executed.
I switched from Perl to Ruby as my everyday sysadmin and glue language, and I use it pretty extensively. I love Ruby, but I won't try to handwave away its faults. In my usage, it's undeniably, dramatically, slower than Perl. We're talking order of magnitude here, not marginal stuff that only shows up in benchmarking.
A script to parse a huge, complex data file sucks ten times as many CPU cycles to do the same work. For what I do, that's OK, because the ten minutes to run the job is completely dwarfed by the development time saved by using a sane language.
Not necessarily. If the bandwidth is *available*, you could teleconference all day and still go full speed. It doesn't slow you down until it has to, and then only as much as it has to to ensure fair sharing. If there's 45 MBit available, 200 customers web surfing, 5 using BT, and 2 teleconferencing, everything will work out fine: BT goes screaming fast in between HTTP requests, your videoconference gets the constant stream it needs, and the HTTP users get fast response times.
If there's not enough bandwidth to go around during peak hours, then yes, it'll slow you down. Their BT will slow down too. Why should you get priority? They paid the same fees as you. Someone has to take the hit if it's too oversubscribed. However, you'll be back to full speed as soon as bandwidth is available again.
The better answer is to have more backbone bandwidth, but given a constrained resource, CBQ makes the best use of it.
HTB is Hierarchical Token Bucket, a class-based queueing discipline for Linux (a simpler alternative to the older CBQ qdisc). It lets you create a hierarchy of queues for a network link. The "Token Bucket" part means each leaf and node in the tree has a "bucket" that constantly, slowly fills with tokens. Sending a byte removes a token. So, on average, you're only guaranteed the fill rate, but if you haven't used it for a bit, you can send a burst until your bucket is empty. Extra tokens can be borrowed between nodes if they're not used by the others, up to the max rate. Thus you get minimum guarantees, max limits, and bursts, such as being able to quickly fetch a web page even if the link is full from others' usage, if you haven't used up your tokens.
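To make the bucket idea concrete, here's a minimal sketch of a single token bucket in Python. It is not the kernel's code: the class name, the simulated clock, and the numbers are all made up for illustration, and it leaves out the borrowing between nodes entirely.

```python
class TokenBucket:
    """One bucket: refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate        # tokens (bytes) added per second: the guarantee
        self.burst = burst      # bucket capacity: the largest possible burst
        self.tokens = burst     # start full, as after an idle period

    def advance(self, seconds):
        """Let simulated time pass; refill, but never beyond capacity."""
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def try_send(self, nbytes):
        """Sending a byte costs a token; refuse if the bucket is short."""
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Hypothetical numbers: 1000 bytes/s guaranteed, 5000 byte bucket.
bucket = TokenBucket(rate=1000, burst=5000)
burst_ok = bucket.try_send(5000)       # idle user: the whole burst goes at once
blocked = bucket.try_send(1000)        # bucket is now empty, so this waits
bucket.advance(1.0)                    # one second of refill at the fill rate
after_refill = bucket.try_send(1000)   # back to the guaranteed rate
```

The idle user gets the burst immediately, and after that is held to the fill rate until the bucket has had time to refill.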
For instance, you could have Customer A, Customer B, and Customer C at the top level, and then they each have a second level of HTTP, BitTorrent, and SSH. Customer A and B get a rate of 128k, and C gets 512k since he pays extra as a business customer. They all have a max rate of 6M, since that's the speed of their DSL lines, and a burst size of 1MB. Then, they have SSH (with a small rate and a small burst), HTTP (with a high rate and a large burst), and BitTorrent (with a 1k rate, and a small burst).
As long as Customer C isn't using any bandwidth, A and B can use it all. As soon as C wants to use some, he first gets his guaranteed 512k - no matter what - and then they all split any leftovers in proportion to their committed rates (so A gets one share, B gets one share, and C gets four shares). If C only wants 512k, A and B split all the leftovers evenly.
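The arithmetic of that split can be sketched in a few lines of Python. This is an idealized model, not what the kernel actually runs: the function name is invented, the 1536k backbone is an assumed figure (the example above doesn't give one), and it assumes every rate is above zero and the guarantees fit within the link.

```python
def share_link(capacity, classes):
    """classes maps name -> (rate, ceil, demand), all in kbit/s.

    Everyone first gets min(rate, demand). Leftover capacity is then handed
    out in proportion to the committed rates, never past ceil or demand.
    """
    alloc = {n: min(rate, demand) for n, (rate, ceil, demand) in classes.items()}
    leftover = capacity - sum(alloc.values())
    while leftover > 1e-9:
        # Classes that still want more and are still below their ceiling.
        hungry = {n: classes[n][0] for n in classes
                  if alloc[n] < min(classes[n][1], classes[n][2]) - 1e-9}
        if not hungry:
            break
        total_weight = sum(hungry.values())
        handed_out = 0.0
        for name, weight in hungry.items():
            limit = min(classes[name][1], classes[name][2])
            grant = min(leftover * weight / total_weight, limit - alloc[name])
            alloc[name] += grant
            handed_out += grant
        leftover -= handed_out
    return alloc


# Everyone wants their full 6M line; the assumed backbone is 1536k.
busy = share_link(1536, {"A": (128, 6000, 6000),
                         "B": (128, 6000, 6000),
                         "C": (512, 6000, 6000)})

# C only wants his committed 512k, so A and B split the rest evenly.
quiet_c = share_link(1536, {"A": (128, 6000, 6000),
                            "B": (128, 6000, 6000),
                            "C": (512, 6000, 512)})
```

With these assumed numbers the guarantees add up to 768k, leaving 768k to share. In the first case it divides 1:1:4, and in the second A and B each take half of it.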
If A is using BT like mad, but then opens an HTTP connection, it'll be allowed most of his net connection (it has a high rate, but still lower than the full line speed). BT will automatically (and instantly) be throttled until HTTP is done. When he types on the SSH connection, it'll use little bits of its burst speed to refresh the window instantly, but its small rate won't let it consume the whole net if he accidentally cats /dev/urandom.
Sounds great, right? There are a few gotchas: You can only queue packets like this when *sending*. What're you going to do, receive a packet from the slow link and then delay it before sending it over the fast one that's not saturated? (Well, yes, you can, and it makes a limited amount of sense to fine-tune TCP's flow control, in addition to selectively dropping packets to make it back off, and other tricks.) It's good, but it doesn't necessarily make optimal tradeoffs between latency and bandwidth - HFSC is an attempt to address this. Also, this is a moderately heavyweight way to do things. It has to spend some CPU classifying packets, and memory to track the buckets' state, so other queueing disciplines and schedulers exist that work on other methods (such as statistical, instead of discrete tracking), that are more appropriate for very large ISPs. Also, as a large ISP, you're going to be using Cisco, not Linux, for routing. :) But Cisco has sophisticated QoS as well.
Despite how complex this sounds, even using the simplest case on your home router will make a huge improvement in the weak side of your DSL line, the uplink. Several of the open source Wi-Fi router firmwares support it out of the box for this reason. I have survived having my web site on my DSL linked to the front page of a popular site known to bring servers to their knees, without any lag in SSH or games, or interruption of mail or other services. We only noticed because our bulk transfers slowed to a crawl, as intended.
Learn more:
HTB: http://luxik.cdi.cz/~devik/qos/htb/ (the user guide has a good overview and pretty graphs)
HFSC: http://linux-ip.net/articles/hfsc.en/ (More pretty graphs and good explanation)
Linux Advanced Routing and Traffic Control list: http://lartc.org/ (The howto is out of date, but very enlightening)
Sure. The way I've implemented something like this in the past is you use this to priority-queue per customer - i.e., choose the order that the packets get queued to them. SSH interactive (low-latency) goes to the front of the queue, and gets lower latency and reduced loss. File transfers (bulk) go to the end, wait for idle periods, and suffer more loss. This goes at the leaves of the queueing hierarchy.
The rate limiting of those queues is based on the top level of the hierarchy, so if the backbone gets saturated, it's the customers who've used up their burst who start losing their bulk-tagged traffic first.
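A rough sketch of that kind of leaf in Python, assuming just two bands. The class, the band names, and the queue limit are hypothetical, and a real queueing discipline does a lot more than this.

```python
from collections import deque


class PriorityLeaf:
    """Leaf queue with two bands: interactive is sent first, bulk is dropped first."""

    def __init__(self, limit):
        self.limit = limit                    # max packets held across both bands
        self.bands = {"interactive": deque(), "bulk": deque()}

    def enqueue(self, band, packet):
        """Queue a packet; return False if the incoming packet was dropped."""
        if sum(len(q) for q in self.bands.values()) >= self.limit:
            if band == "bulk" or not self.bands["bulk"]:
                return False                  # no room and nothing cheaper to evict
            self.bands["bulk"].pop()          # evict the newest bulk packet instead
        self.bands[band].append(packet)
        return True

    def dequeue(self):
        """Interactive traffic always leaves before any bulk traffic."""
        for band in ("interactive", "bulk"):
            if self.bands[band]:
                return self.bands[band].popleft()
        return None


leaf = PriorityLeaf(limit=3)
for name in ("bulk-1", "bulk-2", "bulk-3"):
    leaf.enqueue("bulk", name)
ssh_ok = leaf.enqueue("interactive", "ssh-1")   # full queue: bulk-3 is evicted
bulk_ok = leaf.enqueue("bulk", "bulk-4")        # full queue: the bulk packet loses
order = [leaf.dequeue() for _ in range(3)]
```

The SSH packet arrives last but leaves first, and when the queue is full it's bulk traffic that takes the loss.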
Of course... You give consumers best effort bandwidth, and then if business customers want guaranteed bandwidth, they can pay extra for it.
I also don't find it unethical, as long as it's clearly advertised as "unlimited usage 6M burst / 128k committed + best effort".
In my opinion, the best solution is to strongly throttle large bandwidth usages (P2P, FTP and NNTP streams, etc) during the periods of near-capacity, and automatically relax the filtering during off hours.
That's one way... Here's another:
Instead of trying to choose which protocols are the heaviest usage, traffic shape people based on the actual criterion you care about: too much overall usage over long periods.
In Linux terms, set up an HTB with a queue for every customer. Set the base rate to each customer's share of your backbone speed (say, 1/70th of the customer's line rate), the ceil rate to their line rate, and give them a nice big bucket - say, 120 seconds times their line rate.
Then, people who are normal users - web surfing, downloading an occasional email attachment, etc - will go full bore, any time they want it. People who are bittorrenting will go full speed for a couple minutes, and then decrease down to whatever bandwidth is available. At night, if there's a lot of backbone free, it'll go fast. At 7 PM, they get best effort on whatever is available.
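To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The function name is made up, the 6M line is borrowed from the DSL example earlier in the thread, and it treats the bucket as one simple pool that drains at the line rate and refills at the base rate, which is simpler than how HTB really accounts for bursts.

```python
def full_speed_seconds(line_rate_bps, base_fraction=1 / 70, bucket_seconds=120):
    """Return (base rate, bucket size in bits, seconds a heavy user runs flat out)."""
    base_rate = line_rate_bps * base_fraction      # committed share of the backbone
    bucket_bits = line_rate_bps * bucket_seconds   # "120 seconds times line rate"
    drain = line_rate_bps - base_rate              # net drain while sending at ceil
    return base_rate, bucket_bits, bucket_bits / drain


# Assumed 6 Mbit/s line.
base, bucket, seconds = full_speed_seconds(6_000_000)
bucket_megabytes = bucket / 8 / 1_000_000          # the same bucket in megabytes
```

Under those assumptions the base rate comes to about 86 kbit/s and the bucket to 90 MB, so a web page or an email attachment never gets near the bottom of it, while a nonstop transfer runs at line speed for roughly two minutes before it falls back to best effort.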
This is a very simplified example. You could additionally shape them so that their web and email will take priority over bittorrent when they're at the bottom of their token bucket, or other fine tuning...
The basic message I'd like to get across is: you don't have to shape based on protocol, because you care about the usage, not the protocol. Just shape based on usage, and let them work out which protocols they want to use.
They are not calendaring and scheduling systems though.
Of course not. I don't think that adds enough novelty to deserve a patent, any more than adding "... on the internet" should make recycled business plans patentable.
I don't know of any existing products with this functionality. So they wrote it up first, and you're bitching because you lack the creativity or ambition to do so yourself.
For prior art, check out any MMORPG with a parental control feature, or firewalls with time lock options. Maybe there's a sliver of innovation in that it custom schedules it based on when your meetings are, but that's pretty thin.
Oh, you don't like software patents? So competitive corporations should just throw in the towel and abandon patents that are allowed in our current system?
No, my plan is to bitch about them to draw attention to how broken the system is until we have the support to legislate them away. Until then I support companies' rights to keep trying for these things, and the people's rights to mock them for it.
Or in the words of George Carlin:
Selling is legal. Fucking is legal. Why isn't selling fucking legal?
I appreciate the glimmer of hope. :)
I dearly hope some of them eventually find it unenjoyable in a criminal PMITA prison kind of way, rather than a merely expensive civil way.
My heart leaped when I first read that as "Judge Orders Record Company Execs To Death". I'm so disappointed.
Others have already covered the "1000 users isn't much" aspect. Benchmark, and verify what each server can handle of your anticipated load, but they're probably right.
Option 1: Don't do it yourself. Look into renting servers from a hosting company. They will often provide HA and load balancing for free if you get a couple servers. Also, having rented servers makes it much easier to scale. If you find that you have 100,000 uniques per day, you can order up a bunch more servers and meet the load within minutes to hours. If you overbought, you can scale back down just as fast.
Option 2: http://www.linuxvirtualserver.org/ plus http://www.linux-ha.org/ . You use LVS to load balance out to a cluster (including removing failed servers from the pool). You use HA so that two LVS machines can fail over to each other. Note that you can run LVS on the same machines as your load, for a small environment. This is much more DIY than the Windows setup, of course... But honestly, if the setup requirements of this scare you away, then you're not ready to run a fault-tolerant network, regardless of OS.
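The load balancing half of that boils down to something like the Python sketch below: round-robin over a pool, skipping anything marked as failed. This only illustrates the idea and is not how LVS is implemented or configured; the class and the server names are invented, and the health checks that would call mark() are left out.

```python
class Pool:
    """Round-robin over a list of servers, skipping ones marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0

    def mark(self, server, up):
        """Record the result of a (hypothetical) health check."""
        if up:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def pick(self):
        """Return the next healthy server, or raise if the whole pool is down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in the pool")


pool = Pool(["web1", "web2", "web3"])
first_round = [pool.pick() for _ in range(3)]
pool.mark("web2", up=False)                      # web2 fails its health check
after_failure = [pool.pick() for _ in range(3)]
```

The failover half is a separate problem: the second balancer has to notice that the first one died and take over its address, which this sketch doesn't attempt.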
Option 3: http://www.redhat.com/cluster_suite/ . Less DIY, more money. Perhaps that's better for you.
Option 4: Buy a commercial solution. Every major network vendor sells a HA/LB product. I've used them from most of the big players... I'm not going to write a review here, but it'll suffice to say that while they each have their good and bad points, any of them will get the job you've outlined done.
As for the network: The general rule is to reduce your single points of failure to the minimum you can afford. Common ones are: The ISP (BGP is a pain); the routers (Each ISP goes to its own router); the switches between (you need to full-mesh links from the two routers to two switches, down through the line as many layers as it goes; your switches need to run STP or be layer 3 switches running OSPF or another routing protocol; don't forget to plug the load balancers into different switches); the power (Servers, switches, and routers on separate UPSes such that losing one will leave a fully functioning path); and depending on how far you want to take this, the data center itself (in case of fire/meteor/EPO mishaps).
Note that all of this is required even for your Windows solution. Are you sure you don't want option 1? :)
The IT guy in me thinks that's a waste of FLOPS
It's not a waste of FLOPS. There are plenty of spare MIPS and FLOPS in the world - witness the amount that get donated to folding@home, seti@home, various cipher cracking contests, etc. While you too could donate to those causes, I'd suggest against it - it's one thing to donate niced cycles of a machine that otherwise has to be running, but it's a tremendous waste of power to spin up that many boxes just to hand out cycles.
Recognize those servers for what they are - a waste of *money*. You sank too much cash into a resource (and that's fine, no business has perfect foresight, and you had to anticipate potential needs). Now liquidate them and get your money out so you can spend it on something better than depreciation. If it turns out you need them in a year, I assure you that you can buy servers for less $/FLOP from the liquidators at that time.
Why would she ever run for governor, when her seniority pretty much guarantees a seat on the US Senate for life*?
* despite some of us who will vote for ANY turd sandwich who runs against her, even if they're worse, just to break up the seniority to give us a bare chance at someone good in the next cycle
That's not a good analogy to use. Knowing how an average cheap Master lock is made makes it *very* easy to hack, because the design is defective. I can pick the key locks in seconds, and the dial locks are similarly easy with a simple tool. Good locks confound me, but people with more skills can do it. And therein lies the rub: A well secured OS isn't a better designed lock. It's simply impervious regardless of the skill of the attacker.
The blueprints of a competently made vault door would be a better analogy, but it brings up too many memories of movie bad guys tunneling in... Which honestly is still an accurate analogy: If you can't break the security system by design, you circumvent it. But it doesn't make for a great argument.
Now you've pretty much addressed broadband [...] Last mile can be handled [...]
Except that the last mile is *the* limiting factor in broadband deployment, full stop.
There's plenty of long haul fiber all over the place to handle our needs for quite some time, and plenty of competing providers for the transit. It's the telco monopolies' hold on the last mile that kills us. In markets where there is municipal-run fiber, you can have the choice of literally dozens of ISPs, each of which is offering very reasonable, very fast access.
Actually, I shouldn't have used my past history as an example; I didn't mean to make this a criticism of current practice.
My point was that the debate - including the farthest reaches of both the inclusionist and exclusionist side of things - includes a large range of articles, more than the number that are actually deleted. It'd be interesting to data mine how many are discussed for deletion to quantify that.
I agree that's roughly where the line *should* be drawn, but there are deletionists who would remove large amounts of stuff under this line, and inclusionists who would greatly grow the article set.
I've personally had several well-researched, referenced articles that I regularly used for information removed due to people not finding enough hits on the subject on Google or similar sketchy criteria. I doubt my experience is unique.
I gather most of the fighting is over a relatively small number of entries that everybody knows to be controversial.
The deletionist / inclusionist argument affects huge swaths of content, most of it completely uncontroversial other than its noteworthiness.
You don't get spam because of a combination of anti-spam techniques similar to this one. We have to keep developing them, or else the spammers will get ahead.
YOU may not have much of a spam problem, but mail admins everywhere - including Google's - most certainly do.
Seriously, it's almost trivial to completely avoid spam now. [...] They enjoy seeing the spam, because then they can get outraged and do stuff like this.
I wouldn't attribute that much malice to it.
Sure, the big players have great spam filtering, but the work it takes to get there isn't trivial. And there are a lot of us who don't use webmail. Having configured a few mail systems, it takes a lot of poking and prodding and fine tuning to get an anti-spam configuration that works really well. In the course of doing it, you see these strong spam signals, and get drawn into them. "Hey, what if I just turn up this setting here? That'd catch a ton of spam!" And upon doing it, you find you've walked right into one of the many pitfalls of spam filtering... You're silently rejecting legit mail, or running your false positives way up, or creating backscatter, or generating inappropriate reject codes, or in this case creating an exploitable avenue to harass innocent people...
But at first blush, when you're in there watching your logs and tweaking your configs, these ideas sound great. There's a reason the form letter exists. People get excited about their great new spam solution, and go to publish before they've thought it through, or realized that their idea's already been tried and failed. (I don't like the form because it's used to dismiss *any* new anti-spam idea, even the very few that are good and original, but that's beside the point.)
Anyway, I don't think it's out of a need for outrage. I think it's just that people get caught up in what they're doing and lose track of the implications.
Yes, that creates unnecessary backscatter, and facilitates joe jobs.
They can call it easy, fun, and good netizenship... But I say they're just putting a friendly face on vigilantism.
From a technical perspective this isn't that different from other collaborative filtering systems (though since the listing criteria are based on secondary sources, it's going to be susceptible to confirmation bias and other sampling errors, so this isn't likely to be a good one). I take big issue with the naming, though: Other collaborative filters say that "This machine is listed because it met these criteria", which you then make your own decisions on.
It crosses a line when you're saying they should be "shamed", especially when you're not taking extensive precautions to make sure you're not listing innocents.
It might be nice to have some simple examples and numbers to back that up -- besides just factorial.
I did... I'm not kidding when I say my parser is ten times slower.
If you're comparing a ten-line Ruby script with a hundred-line Perl script, and the Perl script is ten times faster, that would pretty clearly show the advantages of each language.
That'd be a 100x speedup, per line. :)
It's not that extreme. In my use, Ruby's only 1.5x as dense as good clean Perl (use strict; not using default variables; not hacking and overloading data; etc). My Perl is strictly procedural with lots of very tight, fast loops, and shallow, unabstracted data structures.
My Ruby is very OO with a great many very short methods (and resultant deep call stacks), rich objects, and highly abstracted data. The ease of maintenance - not the LOC count - is what makes it faster to code... And I'm clearly handing a lot of extra work to the interpreter.
Clearly I could transliterate the algorithms between the two languages, and the speed difference would narrow. You can write BASIC in any language... I've written Perl in Ruby. But that doesn't make the basis of a good benchmark. I think it's more meaningful to compare code written like the language was meant to be used. Both of the coding styles above are what I find comes naturally to me when writing in each language.
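To make the style difference concrete, here's a rough sketch, not my real parser. It assumes a made-up "key=value" line format, and every name in it (LINES, Record, Tally, sum_procedural, sum_oo) is hypothetical and exists only for illustration. Both versions compute the same totals; the first is "Perl in Ruby", the second is the short-methods, rich-objects style.

```ruby
require 'benchmark'

# Hypothetical input: "key=value" lines. This format is an assumption
# made for the example, not the data file discussed above.
LINES = Array.new(50_000) { |i| "key#{i % 100}=#{i}" }

# "Perl in Ruby": one tight loop, one flat hash, no abstraction.
def sum_procedural(lines)
  totals = Hash.new(0)
  lines.each do |line|
    key, value = line.split('=', 2)
    totals[key] += value.to_i
  end
  totals
end

# OO style: every line becomes an object, and the work is spread
# across several very short methods.
class Record
  attr_reader :key, :value

  def self.parse(line)
    key, value = line.split('=', 2)
    new(key, value.to_i)
  end

  def initialize(key, value)
    @key = key
    @value = value
  end
end

class Tally
  def initialize
    @totals = Hash.new(0)
  end

  def add(record)
    @totals[record.key] += record.value
    self
  end

  def to_h
    @totals.dup
  end
end

def sum_oo(lines)
  lines.map { |line| Record.parse(line) }
       .each_with_object(Tally.new) { |record, tally| tally.add(record) }
       .to_h
end

# Time both styles on the same input.
Benchmark.bm(12) do |bm|
  bm.report('procedural') { sum_procedural(LINES) }
  bm.report('oo')         { sum_oo(LINES) }
end
```

I'd expect the OO version to come out slower because of the extra allocations and method calls, but the ratio will depend on the interpreter version and the data, so treat any numbers from this as a toy result, not a benchmark of either language.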
Except that slideshow is comparing the performance of popular frameworks for different languages. So they're showing that Rails is an efficient framework. That's a perfectly valid point to make (the language makes it easy to write a better algorithm, or maybe the framework is just more efficient), but it doesn't say anything about how fast the code is executed.
I switched from Perl to Ruby as my everyday sysadmin and glue language, and I use it pretty extensively. I love Ruby, but I won't try to handwave away its faults. In my usage, it's undeniably, dramatically slower than Perl. We're talking an order of magnitude here, not marginal stuff that only shows up in benchmarking.
A script to parse a huge, complex data file sucks ten times as many CPU cycles to do the same work. For what I do, that's OK, because the ten minutes to run the job is completely dwarfed by the development time saved by using a sane language.
This is about promoting independence, not saving money (though that's probably a secondary goal).