Lightning is the Thunderbird extension that provides the same capabilities as Sunbird. It supports iMIP and other email invitation formats. It also has a Provider for Google Calendar extension.
I assumed the qualifier was understood; I meant publicly disclosed, not just disclosed to the vendor. Also, I'm not sure if you're familiar with how disclosure works, but it's not in Moore's best interests to reveal that he's sitting on vulnerabilities unless he intends to disclose them soon. So he may be practicing responsible disclosure and allowing the vendor a reasonable amount of time to complete a patch. Or he may have other reasons for waiting.
Security disclosure in general is a pretty complicated game of posturing and politicking. At its best it can be a genuinely altruistic form of public service. At its worst it's extortion scams and weapons trafficking.
I think your definition of zero day is ops-centric, and not security-centric. In this post I give the generally accepted definition in the security community, which agrees with Moore's statement. To summarize, the security community only uses 0-day to refer to undisclosed vulnerabilities, and it does not address patch lag.
A 0-day refers to an undisclosed vulnerability; however, some people have stretched the definition to mean unpatched vulnerability. It's considered a stretch because an unpatched vulnerability is still known, so precautions can be taken. With a true 0-day vulnerability/exploit, you would have no knowledge of the issue and no way of protecting specifically against it.
The SHA and MD5 attacks are bait-and-switch, so the attacker must control both messages. The current research doesn't imply any weaknesses for HMAC applications, such as password storage. So yes, SHA and MD5 are probably still quite acceptable for these purposes if they're already in place.
Wow, talk about some FUD. Of the 14 vulns so far 10 are NULL pointer dereferences. HD must be really desperate for publicity if he's trying to pump these up as legitimate security vulns. I mean, you can argue that a server crash is a DoS, but crashing a browser? Get real.
Once again, I really don't think you took the article in the intended context. Here are the author's two major issues with both PHP and MySQL.
1. Bug ridden (by this I am including both misfeatures as well as actual bugs).
2. They encourage bad habits.
So, he was including issues beyond just the acknowledged bugs. MySQL in particular has a lot of non-standard SQL quirks that create real problems for any serious development work. This site lists the major ones prior to 5.0. And PHP has numerous issues of comparable severity, like magic quotes and its historic lack of a consistent interface for prepared queries.
I suppose the author devoted his attention to misfeatures, as opposed to more straightforward bugs. However, I think that he made his intended focus clear at the beginning. And I don't consider the second point to be hyperbole at all. I think he substantiated the first point enough to use it as a foundation for the second. Although, I will admit that my professional experience makes me significantly inclined to agree with both points. So perhaps I am taking certain background knowledge and technical details for granted.
I think you misunderstand the topic. Using an old version of MySQL or PHP has almost nothing to do with exploitable SQL injection vulnerabilities. The author's point is that PHP and MySQL encourage bad style that leads to poor quality code. Consider the recent issue with escaped multibyte characters; it further cements that concatenated queries are a bad practice. In the end, good coding practices would have significantly mitigated the impact of that particular vulnerability.
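To make the "concatenated queries" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than PHP and MySQL, purely so it is self-contained; the table, column names, and helper functions are made up for illustration, and it shows the general style problem rather than the multibyte escaping issue specifically.

```python
import sqlite3

def make_db():
    # Hypothetical two-row table, in memory only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn

def lookup_concatenated(conn, name):
    # Bad style: the input is spliced directly into the SQL text.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def lookup_parameterized(conn, name):
    # The value travels separately from the SQL text, so it is never parsed as SQL.
    return [row[0] for row in conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,))]

conn = make_db()
hostile = "nobody' OR '1'='1"
leaked = lookup_concatenated(conn, hostile)   # the WHERE clause is now always true
safe = lookup_parameterized(conn, hostile)    # no user has that literal name
```

With the concatenated version the hostile input rewrites the query and every row comes back; with the parameterized version the same input is just an unmatched name.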
Personally, I wholeheartedly agree with the author. I've spent the last few years performing security reviews on millions of lines of web application code. My experience has shown PHP and MySQL based web apps to be the worst by far. This is a combination of the level of developer they target and features that are absent or buggy in the platforms themselves. I have reached the point where I actively discourage anyone from considering MySQL or PHP for new development work. It's just too difficult to use them to produce secure, well-structured, and maintainable code. Besides, there are a number of open source and commercial alternatives that do not share the same weaknesses.
FF2 still needs to be recompiled for Python support. However, FF3 sits on top of XULRunner and uses a more generic XPCOM interface. This will let developers use any language with an XPCOM binding (Python and Java already exist). Of course, you will still have to install the runtime for your favorite language, but they are working on a way to streamline this process.
One other nice feature of hosting on XULRunner is that the runtime will be shared. This will reduce the disk and memory footprint between apps (e.g. FF and TB). It should also allow you to install an extension once for multiple apps.
Sorry to contradict, but you are incorrect. Extensions can be written in any language that supports XPCOM bindings, and many are not portable across platforms. Enigmail and the Calendar extension are two perfect examples of compiled C/C++ binary extensions that can't be written in JavaScript. However, most extensions are cross-platform and written in JavaScript. Also, a JavaScript extension doesn't need to do any XPCOM wizardry to cause memory leaks. It just needs to maintain references to unused objects or create cyclic references.
Extensions exist in a global context for the process. They can maintain a permanent reference to objects that are never used again, and should otherwise be freed. They may also create cyclic references, in which one or more objects contain references to themselves or each other. This creates a situation where the objects are not referenced by an accessible code path, yet the reference count can never drop to zero. The result is a leak, and it is an inherent weakness of simple reference-counting garbage collection.
Even web pages can create circular JavaScript references that result in leaks. FF isn't alone in this area either. IE has always been vulnerable to memory leaks via JavaScript; theirs are just confined to bad pages. However, FF 3 will have a cycle detector that identifies unused cyclic references and frees the objects. But that still won't fix sloppy extensions that hang on to large objects for no good reason.
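The cycle problem is easy to reproduce outside of Mozilla. The sketch below uses CPython, which also relies on reference counting backed by a separate cycle collector, so it is an analogy and not Mozilla's XPCOM code; the Node class is hypothetical. With the collector switched off, only reference counting is left and the cycle is never freed.

```python
import gc
import weakref

class Node:
    """Hypothetical object that may hold a reference to a peer."""
    def __init__(self):
        self.peer = None

gc.disable()  # leave only reference counting active

# Case 1: no cycle. Dropping the last reference frees the object at once.
single = Node()
single_probe = weakref.ref(single)
del single
freed_without_cycle = single_probe() is None

# Case 2: two objects that reference each other.
a, b = Node(), Node()
a.peer, b.peer = b, a
cycle_probe = weakref.ref(a)
del a, b  # no code path can reach them now, but each count is still 1
leaked_with_cycle = cycle_probe() is not None

# A cycle detector finds the unreachable pair and frees it.
gc.enable()
gc.collect()
freed_after_collect = cycle_probe() is None
```

Note that the detector only helps with unreachable cycles. An object that is still reachable from a long-lived global, which is the sloppy extension case, stays alive either way.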
In my experience, plugins are pretty bad too. They operate outside the scope of the garbage collection and often don't clean up after themselves. For instance, my installation of Acrobat eats up a large chunk of memory just for loading, and doesn't let it go after I navigate away from the page. The PDF Download extension helps, but it isn't perfect.
ESX is the host operating system. It's basically an embedded OS that is very tuned for its purpose, but it still must provide the core services of a host OS. The ESX host overhead is significantly less than with GSX or Workstation, but it is still there.
It appears that you've run down a rabbit hole and missed the main point, so I will restate it for clarity. VMWare is meant for x86 virtualization and would be an extremely poor choice as the basis for a distributed computing environment. I'm harping on this point because I keep running into people who try to use VMWare for problems that it can't really handle. Actually, even VMWare tried to do this when they marketed ACE as capable of protecting a guest OS from a malicious host system (which is a simple fallacy, btw).
So I'll respond briefly to your points, just to make sure that where I'm coming from is clear.
Ever heard of a SAN? It's not as if...
I very specifically stated "can incur" because a SAN can be used to address this issue. However, you have now added a very expensive hardware system into the mix as a counter to an environment that just wouldn't have the same base requirement. This is a very significant additional cost.
While the "virtual physical ram" in a guest is not resizable
Based on my read you did catch my point. You just chose to argue in another direction because you could not dispute it. Most of the guest hardware cannot be reconfigured while the guest is active. This significantly limits how well you can manage and distribute resources. It really doesn't matter when VMWare is used for server consolidation, but it would be a big issue if you were trying to use it for distributed computing purposes.
are you implying that JVMs introduce lower overheads?
I'm not implying this, I'm stating a simple fact. The overhead of a JVM is massively less than that of virtualizing an entire x86 system including the guest OS. A JVM still has the benefit of binaries being mapped and shared across multiple running instances. Guest OS images are significantly larger, and cannot be shared efficiently, thus they have much higher resource requirements. In VMWare you have an expensive world switch for IO and privileged instruction execution (in addition to guest and host OS context switching overhead). This is exactly what I was referring to by "partial emulation": you have to emulate certain privileged instructions to make the virtualization work. Java doesn't have that issue. After JITC, it just has the standard IO overhead of the host OS context switching. The basic point is that a single process is going to have significantly less implicit overhead than an entire virtualized system.
I completely agree that Java will introduce some overhead beyond native code, but it is just not that significant. With the exception of Swing and GUI code (which aren't a factor in server side processing), a good JITC will typically generate executables that run comparably to C/C++ in terms of speed and resource usage (after the base JVM cost). And garbage collection overhead can go either way depending on the type of code. Sometimes Java is ahead, sometimes it's behind, but generally the difference is small. So once again, a JVM simply will not introduce anywhere near as much overhead as a VMWare guest.
I work in VMware's virtual machine monitor group.
If you really work in the VMM group at VMWare, you should know enough to not be arguing this point. I'm not taking a shot at the company or product here. I honestly think that VMWare may be the most brilliant and technically advanced software hack that I've ever seen. But the fact remains that it is a hack that virtualizes a processor architecture that was not intended to be virtualized. As a result, some performance penalties are incurred.
I still think VMWare is the best currently available system for server consolidation in any moderate sized environment. I would also argue that it is almost essential in development and testing environments. And I have seen some really interesting applications of VMWare for partitioning security domains on a single system (assuming a trusted host OS; none of that ACE crap). But it simply has no place in distributed computing.
Actually, I had VMotion in mind when I explained my scenario. You have to remember that it doesn't provide fine grained or predictive load balancing. It also has to shovel whole guest images to transfer state between systems, which can incur significant bandwidth usage and latency issues. Plus you have to limit most of your guest requirements ahead of time (example: virtual memory cannot be dynamically resized in an active guest). So, ignoring the obvious overhead that VMWare's partial emulation technique incurs, you have serious resource distribution issues.
The simple fact is that VMWare has no place in distributed computing. Its sole purpose is x86 virtualization, which it has done exceptionally well on an architecture that lacks native support. However, even that niche is starting to erode with the next generation of processors that will support true native virtualization.
Wow, you got an insightful mod when you didn't even understand the problem. The irony is overwhelming. Anyway, this really addresses a completely different problem than VMWare. It fits much more into the realm of distributed computing than virtualization. However, the JVM provides a *virtualized* platform that makes it easy to *distribute* the processing efficiently.
So, back to the VMWare thing, yes I suppose you could hack a cluster of ESX servers up to do this. Of course you would have all of the overhead that VMWare needs to introduce. This includes the host OS, world switch and privileged instruction emulation overhead, guest OS, and application image. On top of that, you would have to shovel images around your cluster to make it work so bandwidth would be a nuisance. You would also be severely limited in how dynamically you could reassign resources, given the requirements of the guest OS. And you would of course be restricted to x86 architectures, which may or may not be an issue.
You do realize that EJB is a back-end technology, and is often deployed in conjunction with Struts? Also, the VB reference makes no sense since EJB is actually a wrapper around CORBA IIOP.
Thanks for pointing that out. It turns out that SwarmCast is still functional and now under the Onion Networks umbrella at http://swarmcast.net/. They also have a good Java FEC library available. I suppose I really should stop arguing on Slashdot and spend some free time putting together an Azureus plug-in that would add a similar capability to BitTorrent transfers. That's probably the best way to demonstrate my point.
I've been polite so far, and I specifically avoided citing mistakes point by point. But it's apparent that you are unwilling to accept something that should be a basic matter of reasoning. Simply put, adding a redundancy set to the BitTorrent protocol would improve the even distribution of torrent pieces and help address real world deficiencies in the "rarest first" approach. The net effect of this would be to increase the health and lifespan of many less active torrents. I can understand if you personally don't have a use for this, but you really can't logically argue the point. By adding a redundancy set, you would reduce the "rareness" of every piece in the data set and thus improve the chances of completing the torrent. It's just that simple.
Since you still seem to have some serious misconceptions, I'll make one last attempt to clarify things:
Indeed. You're suggesting turning a torrent file into what will mostly be a par/par2 file by placing a percentage of the torrent data as parity recovery blocks in there. Doing so will significantly increase the size of the torrent -- dwarfing the actual torrent by orders of magnitude. (I'll come back to that.)
Incorrect. Off the top of my head I was suggesting adding 10% redundancy to the torrent *content* which would also increase the torrent file by roughly 10%. Of course, what I am more thoroughly suggesting is to add a selectable degree of redundancy information. Your introduction of inapplicable extremes implies a weak arguing position.
Actually my understanding of FEC/ECC goes well beyond PAR. And technically, what you are describing is not "FEC"... it's parity data. You aren't "correcting errors"; you are "(re)constructing data you don't have." FEC is about being able to fix errors in the data as it's being received -- not so much about pulling data out of nothing. FEC comes before the data in a stream; ECC comes after... I'm just going to calling it "parity data" because that's what you're talking about.
The above is simply factually incorrect. Parity is a special case CRC that is only valid for handling single bit errors in a set. If you noticed, my previous example used 4 redundancy blocks, which is not possible with a parity scheme. Granted, parity sets can be grouped into very large blocks (e.g. RAID 4 and 5); however, parity is not guaranteed to detect and repair more than a single bit error, meaning you can only handle one bad block. That is why you can't, for instance, lose more than one drive in a RAID 4 or 5 array. Also, FEC (forward error correction) specifically refers to supplying redundancy information in the form of ECC (error correcting code) such that completion of a data set does not require retransmission. These codes include (but are not limited to) duplication, parity schemes, and Reed-Solomon encoding. It does not matter if the redundancy data is at the end or beginning of the transmission; the "forward" part actually refers to the redundancy data being present in the initial transmission. The same logic applies to media encoding (e.g. Reed-Solomon on CD tracks), and in that case transmission is effectively the act of writing to the media. Perhaps your confusion results from the poor naming choice for PAR sets, which actually use Reed-Solomon coding, and not a parity scheme.
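A minimal sketch of the single-parity limitation described above, assuming plain XOR parity over equal-sized blocks in the style of RAID 4/5. The block contents and the helper name are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Lose exactly one block (index 2): XOR of the survivors plus parity rebuilds it.
rebuilt = xor_blocks([data[0], data[1], data[3], parity])

# Lose two blocks (indexes 1 and 2): the same operation only yields the XOR
# of the two missing blocks, which is not enough to recover either one.
two_lost = xor_blocks([data[0], data[3], parity])
```

Recovering several lost blocks at once is exactly what a code such as Reed-Solomon adds over a single parity block.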
Comments like this are why I don't think you understand how BT works... the site, or more specific, it's tracker(s), know almost nothing about what's going on in the swarm. The tracker isn't a BT client. So it's not part of any of the swarms it's tracking. Therefore it knows nothing at all about the distribution of data within the swarm. So, what's going to decide who gets a huge parity torrent and who get a tiny standard torrent?
With any given torrent, you'll either have to download the entire set of data or the equiv in data + parity (which presumablly you have from the torrent.) If you have enough to recover the complete set, th
I have to admit that I was really tempted to get dragged into an argument on this one, but I'm still going to try to address this without resorting to the same level of vitriol that you employed. First I want to ensure that we're on the same ground with respect to FEC here. Based on the latter half of your post you seem to have some understanding of Reed-Solomon codes for forward error correction on arbitrary data packets of equivalent size. It should also be apparent that I am not suggesting a simple parity check or basic Hamming code on top of the TCP transfer. There's obviously no value in that and I think your attempt to cling to it for argument's sake is why you're failing to understand what I'm proposing. Also, I'm fairly sure I have a good grasp of the core BitTorrent protocol; however, I did assume the semantic equivalence of "chunk" and "piece". I would have guessed that it was obvious from context, but from now on I will use the term "piece" and you can assume that it is interchangeable with "chunk" in my earlier post.
If this has all been cleared up, I will move on to the real point. Based on your response it appears that your understanding of Reed-Solomon FEC is from PAR files. So I'll try to put it in the most directly relevant context for you. I am *absolutely* suggesting an extension to the BitTorrent file format that would include entries for redundancy pieces, similar to the way PAR files are used. For example, take a 10% redundancy on a single 10 MB file with 256 KB pieces. In addition to the 40 (10 MB / 256 KB) pieces that make up the content, I'm proposing an extended section that would include entries for 4 additional redundancy pieces in this case. Because this is implemented as an extension, earlier clients will simply ignore it. However, the additional piece entries will increase the size of the torrent file roughly in proportion to the additional redundancy (10% in this case).
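The arithmetic in that example can be checked with a few lines. This is only a sketch: it assumes binary units (1 KB = 1024 bytes), one 20-byte SHA-1 digest per piece entry, and that the redundancy piece count is rounded up to a whole piece. The function and key names are made up for illustration.

```python
SHA1_DIGEST_BYTES = 20  # assumption: each piece entry is one SHA-1 digest

def ceil_div(numerator, denominator):
    """Integer division that rounds up."""
    return -(-numerator // denominator)

def redundancy_plan(content_bytes, piece_bytes, redundancy_percent):
    content_pieces = ceil_div(content_bytes, piece_bytes)
    # Assumption: round the redundancy piece count up to a whole piece.
    redundancy_pieces = ceil_div(content_pieces * redundancy_percent, 100)
    return {
        "content_pieces": content_pieces,
        "redundancy_pieces": redundancy_pieces,
        "hash_bytes_before": content_pieces * SHA1_DIGEST_BYTES,
        "hash_bytes_after": (content_pieces + redundancy_pieces) * SHA1_DIGEST_BYTES,
    }

# 10 MB file, 256 KB pieces, 10% redundancy.
plan = redundancy_plan(10 * 1024 * 1024, 256 * 1024, 10)
```

For these inputs the plan comes out to 40 content pieces, 4 redundancy pieces, and a hash section that grows from 800 to 880 bytes, which is the 10% increase mentioned above.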
Now, you really don't seem to grasp why I consider this useful, so I'm going to try to explain the value to me personally. First, this is not an issue in a constantly seeded torrent, hence my suggestion to not distribute the FEC pieces when there are more than a certain number of distributed copies visible or when you know a seed will always be available. The problem that I've run into is when a torrent's activity hits a trough, and the seeds start to drop out due to lack of activity (this is further exacerbated by any leeching that may have occurred previously). If activity later starts to rise, you encounter a situation where the remaining seeds have to significantly exceed a 1.0 share ratio in order to get the torrent back to health. I've hit this type of scenario on a few occasions where all the seeds drop out and the remaining peers sit for weeks with 95%+ of the content completed because the rarest first algorithm didn't quite work and too many peers dropped. However, by distributing the redundancy pieces and content as a set, using the same rarest first algorithm, you should drastically reduce the chances of this happening.
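Here is a rough sketch of why the redundancy pieces rescue a stalled swarm. It assumes an ideal erasure code (Reed-Solomon behaves this way) where any 40 distinct pieces out of the 44 are enough to rebuild the content. The function name and the piece numbering are hypothetical.

```python
def can_complete(available_pieces, content_count, redundancy_count=0):
    """Return True if the swarm holds enough distinct pieces to rebuild the content.

    Assumes an ideal erasure code: any `content_count` distinct pieces out of
    content + redundancy are sufficient.
    """
    total = content_count + redundancy_count
    distinct = {p for p in available_pieces if 0 <= p < total}
    return len(distinct) >= content_count

# 40 content pieces (0-39); every remaining peer is missing pieces 7 and 23.
without_fec = set(range(40)) - {7, 23}

# Same loss, but redundancy pieces 40-43 were spread through the swarm as well.
with_fec = without_fec | {40, 41, 42, 43}

stuck = not can_complete(without_fec, 40)
rescued = can_complete(with_fec, 40, 4)
```

Without redundancy the peers sit at 95% forever; with it, the 38 surviving content pieces plus the 4 redundancy pieces clear the 40-piece bar.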
And if you look at this a little further, it should also be pretty beneficial for initial seeding. If both the content and redundancy pieces are distributed as a set, the initial seed can force a higher level of redundancy in the clients by applying the same rarest first algorithm to all peers. This means that the early sets of peers may have to stay in the channel a bit longer, but it significantly increases the redundancy of the swarm. And once the visible complete files pass a threshold, you can stop distributing the redundancy data because the swarm will have enough inherent redundancy to compensate for any issues. If it drops below the threshold, however, redundancy content should again be distributed. And now that I look around a bit, it appears that Bram Cohen has even mentioned the same basic approach. If you scroll down the page you should find the following:
Sorry, I guess I didn't really explain properly. The point is to address losses due to unequal distribution of content, not transmission failures. If you read my post below I think I explain it better.
It's funny that you brought up PARs, because the application to BitTorrent is essentially the same. The difference being that FEC embeds much more cleanly into the BitTorrent protocol than the typical PAR uses. So, just like with news groups, you're using FEC to compensate for the losses due to unequal chunk distribution, and not a lossy connection. I honestly assumed the concept was fairly straightforward, but I can tell from your response that I need to provide some further explanation. Here's the basic idea:
BitTorrent divides a file up into chunks (usually 256k) for distributed download. The content of the downloaded chunks is validated using the SHA1 hashes in the torrent file. My suggestion is to use the torrent chunks as packet units in the FEC. Some quick math shows that a 16-bit FEC will handle up to 17 GB of data plus FEC at a 256k chunk size, which should be more than acceptable for most applications. It should also be fairly easy to add this capability in a completely backwards compatible way. I only briefly looked at the torrent file format, but it appears that arbitrary metadata is adequately supported; the FEC content can just be a simple extension in the torrent file.
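The quick math behind the 17 GB figure, as a sketch. It assumes that a 16-bit code can address at most 2**16 chunks in total (content plus FEC) and that 256k means 256 * 1024 bytes; both are my reading of the numbers rather than anything taken from a spec.

```python
PIECE_BYTES = 256 * 1024   # assumption: 256k chunk = 262,144 bytes
MAX_PIECES = 2 ** 16       # assumption: a 16-bit code indexes at most 65,536 chunks

max_total_bytes = MAX_PIECES * PIECE_BYTES
max_total_gib = max_total_bytes / 2 ** 30   # binary gigabytes
max_total_gb = max_total_bytes / 10 ** 9    # decimal gigabytes
```

That works out to 17,179,869,184 bytes, which is 16 GiB or roughly 17.18 decimal GB for content and FEC chunks combined.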
Bandwidth overhead should be minimal in this situation. Assuming that the chunk hashes are the majority of the content in a torrent file, the base overhead is a percentage increase equal to the amount of FEC (assuming the FEC chunks are also hashed for validity). So a slightly increased torrent file is the only cost a non-FEC aware client would incur. The actual content overhead should also be fairly minor assuming the FEC chunks are prioritized roughly the same as standard torrent content during a download. The loss should just be any partial chunks remaining after you've downloaded enough chunks to rebuild the main content.
Now there's no question that there will be storage overhead when seeding starts, but this can be addressed. The initial seed would need to generate the FEC chunks for the purposes of including the hashes in the torrent file, and it only makes sense to retain them at that point. But once a reasonable number of distributed copies are visible the seeds don't need to provide FEC chunks. If the number of distributed copies in the swarm drops below a certain threshold, seeds can then start providing FEC chunks to improve the chances of download completion. So if the swarm is saturated enough, later seeds will never download or generate the FEC chunks. This also addresses any lingering bandwidth concerns by eliminating FEC from swarms that don't need it. And in a small swarm where the chances of completion are low, I'm sure people would sacrifice the local storage to improve the chances they get their files.
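A sketch of how that threshold rule could sit on top of rarest first selection. Everything here is hypothetical: the function name, the representation of peers as sets of piece indexes, and the definition of "distributed copies" as the count of the least replicated content piece. It is not how any existing client does it.

```python
from collections import Counter

def next_piece(peer_pieces, have, content_count, redundancy_count, copies_threshold):
    """Pick the rarest piece we still need, offering FEC pieces only to a thin swarm.

    peer_pieces: list of sets, one per visible peer, holding piece indexes.
    Content pieces are 0..content_count-1; FEC pieces follow them.
    """
    counts = Counter()
    for pieces in peer_pieces:
        counts.update(pieces)

    # Assumption: "distributed copies" = availability of the rarest content piece.
    copies = min(counts[p] for p in range(content_count))

    # Healthy swarm: content only. Thin swarm: content and FEC pieces as one set.
    limit = content_count
    if copies < copies_threshold:
        limit = content_count + redundancy_count

    wanted = [p for p in range(limit) if p not in have and counts[p] > 0]
    if not wanted:
        return None
    # Rarest first, with the piece index as a deterministic tie-breaker.
    return min(wanted, key=lambda p: (counts[p], p))

# Thin swarm: content piece 3 is gone, so FEC pieces 4 and 5 are eligible.
thin = next_piece([{0, 1, 2, 4}, {0, 1, 4}, {0, 2, 5}], {0}, 4, 2, 2)

# Healthy swarm: three full copies visible, so FEC piece 4 is never requested.
healthy = next_piece([{0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3, 4}], {0, 1}, 4, 2, 2)
```

In the thin swarm the client asks for FEC piece 5 first because only one peer has it; in the healthy swarm it just takes the next content piece and never touches the FEC data.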
So the only thing left is CPU overhead to generate the chunks. I realize that this could actually be a pretty significant burst but, when taken in context over the duration of the transfer, it will be fairly small. And given the power of CPUs these days, I'm really surprised that someone would argue that it's an issue. And once again we can fall back to making FEC optional. If it really bothers you, there's no need for you to use it on your client.
Assuming I get some time, I may actually try to throw together an Azureus plug-in using the FEC code from the Onion router project. It just seems like it would be a fun project to test a few ideas.
I have to state that I strongly disagree with one of the comments at the end from Bram Cohen. I mean, MS Avalanche is vaporware, but that doesn't mean that use of FEC (forward error correction) is a bad idea. Granted it would increase local storage requirements when seeding, but there would be almost no impact on network bandwidth and the CPU overhead is negligible. Personally, I'd be more than happy to sacrifice say a 10% increase in local size to ensure that I get a complete copy of the torrent. I've found numerous torrents that died out somewhere between 90 and 100%, and the worst is when you have a wasted download because you're missing only a fraction of a percent.
Personally, I would like to see a combination of the BitTorrent "send the least common block" approach and a selectable Reed-Solomon coding defaulting to around 10%. In my empirical experience that would clear up almost every failed torrent I've hit. Of course, it is an extendable protocol. Perhaps I should stop bitching and look into writing an Azureus plug-in to test this idea out.
The tab thing sounds interesting, so I'll give it a try and see what I think. I wouldn't use the IE anti-phishing system because it sends every URL to MS' servers for validation. I don't consider myself paranoid, but I'm not comfortable with handing over my entire browsing history to a third party.
In terms of cutting edge stuff I'd really like to see IE support SVG, XForms, more complete CSS, and other Web 2.0 features. I guess we just have different views and priorities on that one.
Lightning is the Thunderbird extension that provides the same capabilities as Sunbird. It supports iMIP and other email invitation formats. It also has a Provider for Google Calendar Extension .
I assumed the qualifier was understood; I meant publicly disclosed, not just disclosed to the vendor. Also, I'm not sure if you're familiar with how disclosure works, but it's not in Moore's best interests reveal that he's sitting on vulnerabilities unless he intends to disclose them soon. So he may be practicing responsible disclosure and allowing the vendor a reasonable amount of time to complete a patch. Or he may have other reasons for waiting.
Security disclosure in general is a pretty complicated game of posturing and politicking. At its best it can be a genuinely altruistic form of public service. At its worst it's extortion scams and weapons trafficking.
I think your definition of zero day is ops-centric, and not security-centric. In this post I give the generally accepted definition in the security community, which agrees with Moore's statement. To summarize, the security community only uses 0-day to refer to undisclosed vulnerabilities, and it does not address patch lag.
A 0-day refers to an undisclosed vulnerability; however, some people have stretched the definition to mean unpatched vulnerability. It's considered a stretch because an unpatched vulnerability is still known, so precautions can be taken. With a true 0-day vulnerability/exploit, you would have no knowledge of the issue and no way of protecting specifically against it.
The SHA and MD5 attacks are bait-and-switch, so the attacker must control both messages. The current research doesn't imply any weaknesses for HMAC applications, such as password storage. So yes, SHA and MD5 are probably still quite acceptable for these purposes if they're already in place.
Wow, talk about some FUD. Of the 14 vulns so far 10 are NULL pointer dereferences. HD must be really desperate for publicity if he's trying to pump these up as legitimate security vulns. I mean, you can argue that a server crash is a DoS, but crashing a browser? Get real.
Once again, I really don't think you took the article in the intended context. Here are the author's two major issues with both PHP and MySQL.
So, he was including issues beyond just the acknowledged bugs. MYSQL in particular has a lot of non-standard SQL quirks that create real problems for any serious development work. This site lists the major ones prior to 5.0. And PHP has numerous issues of comparable severity, like magic quotes and it's historic lack of a consistent interface for prepared queries.
I suppose the author devoted his attention to misfeatures, as opposed to more straight forward bugs. However, I think that he made his intended focus clear at the beginning. And I don't consider the second point to be hyperbole at all. I think he substantiated the first point enough to use it as a foundation for the second. Although, I will admit that my professional experience makes me significantly inclined to agree with both points. So perhaps I am taking certain background knowledge and technical details for granted.
I think you misunderstand the topic. Using an old version of MYSQL or PHP has almost nothing to do with exploitable SQL injection vulnerabilities. The author's point is that PHP and MYSQL encourage bad style that leads to poor quality code. Consider the recent issue with escaped multibyte characters; that further cements that concatenated queries are a bad practice. In the end, good coding practices would have significantly mitigated the impact of that particular vulnerability.
Personally, I wholehearteadly agree with the author. I've spent the last few years performing security reviews on millions of lines of web application code. My experience has shown PHP and MYSQL based web apps to be the worst by far. This is a combination of the level of developer they target and features absent or buggy in the platforms themself. I have reached the point where I actively discourage anyone from considering MYSQL or PHP for new development work. It's just too difficult to use them to produce secure, well structured, and maintainable code. Besides, there are a number of open source and commercial alternatives that do not share the same weaknesses.
FF2 still needs to be recompiled for Python support. However, FF3 sits on top of XULRunner and uses a more generic XPCOM interface. This will let developers use any language with an XPCOM binding (Python and Java already exist). Of course, you will still have to install the runtime for your favorite language, but they are working on a way to streamline this process.
One other nice feature of hosting on XULRunner is that the runtime will be shared. This will reduce the disk and memory footprint between apps (e.g. FF and TB). It should also allow you to install an extension once for multiple apps.
Sorry to contradict, but you are incorrect. Extensions can be written in any language that supports XPCOM bindings, and many are not portable across platforms. Enigmail and the Calendar extension are two perfect examples of compiled C/C++ binary extensions that can't be written in JavaScript. However, most extensions are cross-platform and written in JavaScript. Also, a JavaScript extension doesn't need to do any XPCOM wizardry to cause memory leaks. It just needs to maintain references to unused objects or create cyclic references.
Extensions exist in a global context for the process. They can maintain a permanent reference to objects that are never used again and should otherwise be freed. They may also create cyclic references, in which two or more objects contain references to each other. This creates a situation where the objects are not referenced by any accessible code path, yet their reference counts can never drop to zero. The result is a leak, and it is an inherent weakness of simple reference-counting garbage collection.
Even web pages can create circular JavaScript references that result in leaks. FF isn't alone in this area either. IE has always been vulnerable to memory leaks via JavaScript; theirs are just confined to bad pages. However, FF 3 will have a cycle detector that identifies unused cyclic references and frees the objects. But that still won't fix sloppy extensions that hang on to large objects for no good reason.
In my experience, plugins are pretty bad too. They operate outside the scope of the garbage collection and often don't clean up after themselves. For instance, my installation of Acrobat eats up a large chunk of memory just for loading, and doesn't let it go after I navigate away from the page. The PDF Download extension helps, but it isn't perfect.
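If you want to see the cycle problem for yourself, here's a rough sketch in Python rather than JavaScript. It assumes CPython, which happens to pair reference counting with a separate cycle collector, so switching the collector off leaves plain reference counting. The Node class and the function name are hypothetical.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can hold a reference to another Node."""
    def __init__(self):
        self.other = None

def cycle_survives_refcounting():
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting active
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a <-> b cycle
        probe = weakref.ref(a)       # lets us check liveness without owning a
        del a, b                     # no accessible code path remains
        leaked = probe() is not None # counts never reached zero
        gc.collect()                 # explicit cycle detection pass
        freed = probe() is None
        return leaked, freed
    finally:
        if was_enabled:
            gc.enable()
```

The first flag shows the pair still alive after the last accessible reference is gone; the second shows a cycle detector reclaiming it. Neither helps with the other kind of leak, where code simply keeps a live reference to something it no longer needs.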
Exceptionally funny coming from the man with the term "Nutscrape" in his nick.
ESX is the host operating system. It's basically an embedded OS that is very tuned for its purpose, but it still must provide the core services of a host OS. The ESX host overhead is significantly less than with GSX or Workstation, but it is still there.
It appears that you've run down a rabbit hole and missed the main point, so I will restate it for clarity. VMWare is meant for x86 virtualization and would be an extremely poor choice as the basis for a distributed computing environment. I'm harping on this point because I keep running into people who try to use VMWare for problems that it can't really handle. Actually, even VMWare tried to do this when they marketed ACE as capable of protecting a guest OS from a malicious host system (which is a simple fallacy, btw).
So I'll respond briefly to your points, just to make sure that where I'm coming from is clear.
I very specifically stated "can incur" because a SAN can be used to address this issue. However, you have now added a very expensive hardware system into the mix as a counter to an environment that just wouldn't have the same base requirement. This is a very significant additional cost.
Based on my read you did catch my point. You just chose to argue in another direction because you could not dispute it. Most of the guest hardware cannot be reconfigured while the guest is active. This significantly limits how well you can manage and distribute resources. It really doesn't matter when VMWare is used for server consolidation, but it would be a big issue if you were trying to use it for distributed computing purposes.
I'm not implying this, I'm stating a simple fact. The overhead of a JVM is massively less than that of virtualizing an entire x86 system including the guest OS. A JVM still has the benefit of binaries being mapped and shared across multiple running instances. Guest OS images are significantly larger, and cannot be shared efficiently, thus they have much higher resource requirements. In VMWare you have an expensive world switch for IO and privileged instruction execution (in addition to guest and host OS context switching overhead). This is exactly what I was referring to by "partial emulation": you have to emulate certain privileged instructions to make the virtualization work. Java doesn't have that issue. After JITC, it just has the standard IO overhead of the host OS context switching. The basic point is that a single process is going to have significantly less implicit overhead than an entire virtualized system.
I completely agree that Java will introduce some overhead beyond native code, but it is just not that significant. With the exception of SWING and GUI code (which aren't a factor in server side processing), a good JITC will typically generate executables that run comparably to C/C++ in terms of speed and resource usage (after the base JVM cost). And garbage collection overhead can go either way depending on the type of code. Sometimes Java is ahead, sometimes it's behind, but generally the difference is small. So once again, a JVM simply will not introduce anywhere near as much overhead as a VMWare guest.
If you really work in the VMM group at VMWare, you should know enough to not be arguing this point. I'm not taking a shot at the company or product here. I honestly think that VMWare may be the most brilliant and technically advanced software hack that I've ever seen. But the fact remains that it is a hack that virtualizes a processor architecture that was not intended to be virtualized. As a result, some performance penalties are incurred.
I still think VMWare is the best currently available system for server consolidation in any moderate sized environment. I would also argue that it is almost essential in development and testing environments. And I have seen some really interesting applications of VMWare for partitioning security domains on a single system (assuming a trusted host OS; none of that ACE crap). But it simply has no place in distributed computing.
Actually, I had VMotion in mind when I explained my scenario. You have to remember that it doesn't provide fine grained or predictive load balancing. It also has to shovel whole guest images to transfer state between systems, which can incur significant bandwidth usage and latency issues. Plus you have to limit most of your guest requirements ahead of time (example: virtual memory cannot be dynamically resized in an active guest). So, even ignoring the obvious overhead that VMWare's partial emulation technique incurs, you have serious resource distribution issues.
The simple fact is that VMWare has no place in distributed computing. Its sole purpose is x86 virtualization, which it has done exceptionally well on an architecture that lacks native support. However, even that niche is starting to erode with the next generation of processors that will support true native virtualization.
Wow, you got an insightful mod when you didn't even understand the problem. The irony is overwhelming. Anyway, this really addresses a completely different problem than VMWare. It fits much more into the realm of distributed computing than virtualization. However, the JVM provides a *virtualized* platform that makes it easy to *distribute* the processing efficiently.
So, back to the VMWare thing, yes I suppose you could hack a cluster of ESX servers up to do this. Of course you would have all of the overhead that VMWare needs to introduce. This includes the host OS, world switch and privileged instruction emulation overhead, guest OS, and application image. On top of that, you would have to shovel images around your cluster to make it work, so bandwidth would be a nuisance. You would also be severely limited in how dynamically you could reassign resources, given the requirements of the guest OS. And you would of course be restricted to x86 architectures, which may or may not be an issue.
So you could do it, but boy would it be dumb.
You do realize that EJB is a back-end technology, and is often deployed in conjunction with Struts? Also, the VB reference makes no sense since EJB is actually a wrapper around CORBA IIOP.
Thanks for pointing that out. It turns out that SwarmCast is still functional and now under the Onion Networks umbrella at http://swarmcast.net/. They also have a good Java FEC library available. I suppose I really should stop arguing on Slashdot and spend some free time putting together an Azureus plug-in that would add a similar capability to BitTorrent transfers. That's probably the best way to demonstrate my point.
I've been polite so far, and I specifically avoided citing mistakes point by point. But it's apparent that you are unwilling to accept something that should be a basic matter of reasoning. Simply put, adding a redundancy set to the BitTorrent protocol would improve the even distribution of torrent pieces and help address real world deficiencies in the "rarest first" approach. The net effect of this would be to increase the health and lifespan of many less active torrents. I can understand if you personally don't have a use for this, but you really can't logically argue the point. By adding a redundancy set, you would reduce the "rareness" of every piece in the data set and thus improve the chances of completing the torrent. It's just that simple.
Since you still seem to have some serious misconceptions, I'll make one last attempt to clarify things:
Incorrect. Off the top of my head I was suggesting adding 10% redundancy to the torrent *content* which would also increase the torrent file by roughly 10%. Of course, what I am more thoroughly suggesting is to add a selectable degree of redundancy information. Your introduction of inapplicable extremes implies a weak arguing position.
The above is simply factually incorrect. Parity is a special case CRC that is only valid for handling single bit errors in a set. If you noticed, my previous example used 4 redundancy blocks, which is not possible with a parity scheme. Granted, parity sets can be grouped into very large blocks (e.g. RAID 4 and 5); however, parity is not guaranteed to detect and repair more than a single bit error, meaning you can only handle one bad block. That is why you can't, for instance, lose more than one drive in a RAID 4 or 5 array. Also, FEC (forward error correction) specifically refers to supplying redundancy information in the form of ECC (error correcting code) such that completion of a data set does not require retransmission. These codes include (but are not limited to) duplication, parity schemes, and Reed-Solomon encoding. It does not matter if the redundancy data is at the end or beginning of the transmission; the "forward" part actually refers to the redundancy data being present in the initial transmission. The same logic applies to media encoding (e.g. Reed-Solomon on CD tracks), and in that case transmission is effectively the act of writing to the media. Perhaps your confusion results from the poor naming choice for PAR sets, which actually use Reed-Solomon coding, and not a parity scheme.
I have to admit that I was really tempted to get dragged into an argument on this one, but I'm still going to try to address this without resorting to the same level of vitriol that you employed. First I want to ensure that we're on the same ground with respect to FEC here. Based on the latter half of your post you seem to have some understanding of Reed-Solomon codes for forward error correction on arbitrary data packets of equivalent size. It should also be apparent that I am not suggesting a simple parity check or basic Hamming code on top of the TCP transfer. There's obviously no value in that, and I think your attempt to cling to it for argument's sake is why you're failing to understand what I'm proposing. Also, I'm fairly sure I have a good grasp of the core BitTorrent protocol; however, I did assume the semantic equivalence of "chunk" and "piece". I would have guessed that it was obvious from context, but from now on I will use the term "piece" and you can assume that it is interchangeable with "chunk" in my earlier post.
If this has all been cleared up, I will move on to the real point. Based on your response it appears that your understanding of Reed-Solomon FEC is from PAR files. So I'll try to put it in the most directly relevant context for you. I am *absolutely* suggesting an extension to the BitTorrent file format that would include entries for redundancy pieces, similar to the way PAR files are used. For example, take a 10% redundancy on a single 10 MB file with 256 KB pieces. In addition to the 40 (10 MB / 256 KB) pieces that make up the content, I'm proposing an extended section that would include entries for 4 additional redundancy pieces in this case. Because this is implemented as an extension, earlier clients will simply ignore it. However, the additional piece entries will increase the size of the torrent file by roughly the same proportion as the additional redundancy (10% in this case).
Now, you really don't seem to grasp why I consider this useful, so I'm going to try to explain the value to me personally. First, this is not an issue in a constantly seeded torrent, hence my suggestion to not distribute the FEC pieces when there are more than a certain number of distributed copies visible or when you know a seed will always be available. The problem that I've run into is when a torrent's activity hits a trough, and the seeds start to drop out due to lack of activity (this is further exacerbated by any leeching that may have occurred previously). If activity later starts to rise, you encounter a situation where the remaining seeds have to go significantly beyond a 1.0 share ratio in order to get the torrent back to health. I've hit this type of scenario on a few occasions where all the seeds drop out and the remaining peers sit for weeks with 95%+ of the content completed because the rarest first algorithm didn't quite work and too many peers dropped. However, by distributing the redundancy pieces and content as a set, using the same rarest first algorithm, you should drastically reduce the chances of this happening.
And if you look at this a little further, it should also be pretty beneficial for initial seeding. If both the content and redundancy pieces are distributed as a set, the initial seed can force a higher level of redundancy in the clients by applying the same rarest first algorithm to all peers. This means that the early sets of peers may have to stay in the channel a bit longer, but it significantly increases the redundancy of the swarm. And once the visible complete files pass a threshold, you can stop distributing the redundancy data because the swarm will have enough inherent redundancy to compensate for any issues. If it drops below the threshold, however, redundancy content should again be distributed. And now that I look around a bit, it appears that Bram Cohen has even mentioned the same basic approach. If you scroll down the page you should find the following:
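Here's a small sketch of the single-block limit of parity, in case it helps. It is a toy XOR parity over equal-length byte blocks, not how any particular RAID implementation or PAR tool works, and the function names are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def recover_missing(surviving_blocks, parity):
    """XOR the survivors with the parity block.

    With exactly one block missing this yields that block. With two missing
    it only yields the XOR of the two, which is not enough to rebuild either.
    """
    return xor_blocks(list(surviving_blocks) + [parity])
```

With data blocks A, B and C and parity P = A ^ B ^ C, losing B alone is fine because A ^ C ^ P gives B back. Lose both B and C and all you can compute is B ^ C, which is why handling several lost blocks takes something like Reed-Solomon.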
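The arithmetic for that example can be sketched as follows. The function name is hypothetical, sizes are taken as binary units (1 MB = 1024 KB), and the 20 bytes per entry assumes each redundancy piece gets a SHA-1 hash just like a content piece.

```python
SHA1_BYTES = 20  # size of one SHA-1 piece hash in a torrent file

def redundancy_plan(file_bytes, piece_bytes, redundancy_percent):
    """Return (content pieces, redundancy pieces, extra hash bytes)."""
    content_pieces = -(-file_bytes // piece_bytes)                   # ceiling division
    extra_pieces = -(-(content_pieces * redundancy_percent) // 100)  # round up
    return content_pieces, extra_pieces, extra_pieces * SHA1_BYTES

# The 10 MB file with 256 KB pieces and 10% redundancy:
print(redundancy_plan(10 * 1024 * 1024, 256 * 1024, 10))  # (40, 4, 80)
```

So the torrent file grows by 80 bytes of hashes on top of the 800 bytes already there for the 40 content pieces, which is the roughly 10% increase.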
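A rough way to picture the stalled swarm is below. It assumes the redundancy pieces come from an erasure code such as Reed-Solomon where any N of the N content plus M redundancy pieces are enough to rebuild the N content pieces. The function name and the piece numbering (content first, redundancy after) are made up for the sketch.

```python
def can_complete(available_pieces, content_count, redundancy_count=0):
    """available_pieces: set of piece indices present somewhere in the swarm.

    Indices 0..content_count-1 are content pieces; the next redundancy_count
    indices are redundancy pieces. Any content_count distinct pieces suffice.
    """
    total = content_count + redundancy_count
    present = {p for p in available_pieces if 0 <= p < total}
    return len(present) >= content_count

# 40 content pieces, but pieces 7 and 19 have vanished from the swarm.
content_only = set(range(40)) - {7, 19}
with_fec = content_only | {40, 41, 42, 43}  # 4 redundancy pieces still around
```

Here can_complete(content_only, 40) is False, which is the "stuck at 95%" case, while can_complete(with_fec, 40, 4) is True because 42 distinct pieces are still out there.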
Sorry, I guess I didn't really explain properly. The point is to address losses due to unequal distribution of content, not transmission failures. If you read my post below I think I explain it better.
It's funny that you brought up PARs, because the application to BitTorrent is essentially the same. The difference being that FEC embeds much more cleanly into the BitTorrent protocol than the typical PAR uses. So, just like with news groups, you're using FEC to compensate for the losses due to unequal chunk distribution, and not a lossy connection. I honestly assumed the concept was fairly straightforward, but I can tell from your response that I need to provide some further explanation. Here's the basic idea:
BitTorrent divides a file up into chunks (usually 256k) for distributed download. The content of the downloaded chunks is validated using the SHA1 hashes in the torrent file. My suggestion is to use the torrent chunks as packet units in the FEC. Some quick math shows that a 16 bit FEC will handle up to 17GB data plus FEC at a 256k chunk size, which should be more than acceptable for most applications. It should also be fairly easy to add this capability in a completely backwards compatible way. I only briefly looked at the torrent file format, but it appears that arbitrary metadata is adequately supported; the FEC content can just be a simple extension in the torrent file.
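The quick math behind the 17GB figure, as a sketch. I'm treating a 16 bit code as allowing about 2^16 chunks per set, which is an approximation; the exact limit depends on the code (a classic Reed-Solomon code over GF(2^16) tops out at 65535 symbols). The function name is made up.

```python
def max_fec_payload_bytes(symbol_bits=16, chunk_bytes=256 * 1024):
    """Upper bound on data plus FEC when each chunk is one code symbol position."""
    return (2 ** symbol_bits) * chunk_bytes

total = max_fec_payload_bytes()
print(total)                  # 17179869184 bytes
print(round(total / 1e9, 2))  # 17.18 (decimal GB)
print(total // 2 ** 30)       # 16 (GiB)
```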
Bandwidth overhead should be minimal in this situation. Assuming that the chunk hashes are the majority of the content in a torrent file, the base overhead is a percentage increase equal to the amount of FEC (assuming the FEC chunks are also hashed for validity). So a slightly larger torrent file is the only cost a non-FEC aware client would incur. The actual content overhead should also be fairly minor, assuming the FEC chunks are prioritized roughly the same as the content chunks in a standard download. The loss should just be any partial chunks remaining after you've downloaded enough chunks to rebuild the main content.
Now there's no question that there will be storage overhead when seeding starts, but this can be addressed. The initial seed would need to generate the FEC chunks for the purposes of including the hashes in the torrent file, and it only makes sense to retain them at that point. But once a reasonable number of distributed copies are visible the seeds don't need to provide FEC chunks. If the number of distributed copies in the swarm drops below a certain threshold, seeds can then start providing FEC chunks to improve the chances of download completion. So if the swarm is saturated enough, later seeds will never download or generate the FEC chunks. This also addresses any lingering bandwidth concerns by eliminating FEC from swarms that don't need it. And in a small swarm where the chances of completion are low, I'm sure people would sacrifice the local storage to improve the chances they get their files.
So the only thing left is CPU overhead to generate the chunks. I realize that this could actually be a pretty significant burst but, when taken in context over the duration of the transfer, it will be fairly small. And given the power of CPUs these days, I'm really surprised that someone would argue that it's an issue. And once again we can fall back to making FEC optional. If it really bothers you, there's no need for you to use it on your client.
Assuming I get some time, I may actually try to throw together an Azureus plug-in using the FEC code from the Onion router project. It just seems like it would be a fun project to test a few ideas.
I have to state that I strongly disagree with one of the comments at the end from Bram Cohen. I mean, MS Avalanche is vaporware, but that doesn't mean that use of FEC (forward error correction) is a bad idea. Granted it would increase local storage requirements when seeding, but there would be almost no impact on network bandwidth and the CPU overhead is negligible. Personally, I'd be more than happy to sacrifice say a 10% increase in local size to ensure that I get a complete copy of the torrent. I've found numerous torrents that died out somewhere between 90 and 100%, and the worst is when you have a wasted download because you're missing only a fraction of a percent.
Personally, I would like to see a combination of the BitTorrent "send the least common block" approach and a selectable Reed-Solomon coding defaulting to around 10%. In my empirical experience that would clear up almost every failed torrent I've hit. Of course, it is an extendable protocol. Perhaps I should stop bitching and look into writing an Azureus plug-in to test this idea out.
The tab thing sounds interesting, so I'll give it a try and see what I think. I wouldn't use the IE anti-phishing system because it sends every URL to MS' servers for validation. I don't consider myself paranoid, but I'm not comfortable with handing over my entire browsing history to a third party.
In terms of cutting edge stuff I'd really like to see IE support SVG, XForms, more complete CSS, and other Web 2.0 features. I guess we just have different views and priorities on that one.