ajs · Slashdot Mirror

Re:What .NET Remoting is on Advanced .NET Remoting · 2003-10-22 06:35 · Score: 1

More to the point, .NET remoting is much like RMI, which is much like CORBA RPCs, which are much like NIDL (Network Interface Definition Language) RPCs from Domain/OS circa 1985. In fact, CORBA was directly based on Apollo's work on NIDL, which was bought by HP. .NET and RMI may have developed independently, I'm not sure, but the groundwork for all of this was mid-80s technology.

Domain/OS was so far ahead of its time that even today most UNIX-like systems cannot provide some of Domain/OS's features (e.g. a truly seamless remote filesystem like AFS, but more integrated into everything the OS does; a central LDAP-like registry of users, permissions etc. much like Windows Active Directory, but which worked out of the box on a network of arbitrary size (and, again, was much more tightly integrated with the OS); a light-weight windowing toolkit powerful enough to embed X, and yet light-weight enough to be far faster). The funny thing was, they did all this in Pascal (with some assembly in the kernel, of course). I'm not sure, but I think it was the only commercial OS written in Pascal, and oddly, it was the most responsive I've used relative to the hardware it was running on (Motorola 680x0s and later, their own proprietary precursor to the HP RISC platform).

Back in the 80s there were a handful of companies doing really amazing, cutting-edge stuff. It's sad that companies like Sequent, Apollo, Stellar and those of their ilk were all bought up by larger, less interesting companies who failed to preserve their technical legacy of excellence....

Ahem... sorry, that... that was a tangent. You may now return to your .Netisms ;-)

Re:MS on Patching Paranoia - How Fast Do You Patch? · 2003-10-21 07:41 · Score: 1

You already got your answer, but I just want to go a step further and point out that this happens all the time. It is correct and necessary behavior for a good filesystem, or at least one of the very first steps toward one.

Windows has this capability to some extent, if you use NTFS, but the OS can only apply these capabilities in limited ways due to the need to remain compatible with FAT filesystems.

Linux will eventually have the same problem in the sense that new filesystems (e.g. Reiser) are implementing new features, but they can only be taken advantage of in the context of backward compatibility with older filesystems. The API between the kernel and the filesystem will get no more sophisticated (at least not for a long while) nor will the basic file operations (again, not until the need for backward compatibility is gone).

Re:Fun with numbers on Text Mining the Multiverse · 2003-10-20 02:00 · Score: 1

Yep, binaries are a good example. Basically, in any data files that represent large systems with many variables, you should find that the Perl regular expression

/\b(\d)\d*\b/g

should match a 1 most often. In some types of text (especially code), you will find things like "0" show up a lot. That's why in my example, I didn't allow for single-digit numbers, but if you want to, that's cool.

I find that a large pool of USENET posts works best.

Re:Fun with numbers on Text Mining the Multiverse · 2003-10-17 09:24 · Score: 1

Ooops. While you see a distribution there, that's not what I was trying to point out. The correct one-liner would be

perl -ne '$x{$1}++ while /\b([1-9])\d*\b/g;END{$all+=$_ foreach map {$x{$_}} keys %x;print map {sprintf "%d occurred %.1f%% (%d times)\n",$_,$x{$_}/$all*100,$x{$_}} sort {$a<=>$b} keys %x;print "$. lines read\n"}'

Benford's law is the name of this phenomenon. It's even more interesting because it is independent of base!

There are many ways that this is used, including detecting human tampering with complex systems (such as the accounting statements for a company, where the person modifying the numbers is likely to skew the results without realizing it).
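
For anyone who would rather poke at this outside of a one-liner, here is a rough Python sketch of the same leading-digit count, set next to the base-10 Benford expectation log10(1 + 1/d). The function names and the sample string are made up for illustration; only the regular expression is carried over from the one-liner above.

```python
import math
import re
from collections import Counter

# Same pattern as the Perl one-liner: a run of digits whose first digit is 1-9.
LEADING_DIGIT = re.compile(r"\b([1-9])\d*\b")


def leading_digit_counts(text):
    """Count how often each digit 1-9 starts a number in the text."""
    return Counter(int(d) for d in LEADING_DIGIT.findall(text))


def benford_expected(digit):
    """Base-10 Benford probability of a leading digit: log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)


def report(text):
    """Return (digit, observed share, expected share) for digits 1-9."""
    counts = leading_digit_counts(text)
    total = sum(counts.values())
    return [
        (d, counts[d] / total if total else 0.0, benford_expected(d))
        for d in range(1, 10)
    ]


if __name__ == "__main__":
    # Hypothetical sample; a real test wants a large, varied text file.
    sample = "Invoices: 120, 1450, 19, 230, 31, 1002, 17, 88, 2999, 140"
    for digit, observed, expected in report(sample):
        print(f"{digit}: observed {observed:.1%}, Benford {expected:.1%}")
```

A sample this small will not track the expected shares closely; the comparison only becomes meaningful on a large pool of text.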

Fun with numbers on Text Mining the Multiverse · 2003-10-17 08:50 · Score: 2, Interesting

Here's some fun you can have with numbers. Take this Perl one-liner:

perl -ne '$x{$1}++ while /(\d)/g;END{print map {"$_ occurred $x{$_} times\n"} sort {$a<=>$b} keys %x}' xxxx

and run it with "xxxx" replaced by the name of some large text file that you create by saving email messages, web pages, log files, what have you.

The scary part (that took mathematicians a long time to accept and longer to figure out) is that the distribution is the same for any sufficiently representative sample of text....

Re:Complete Privatization = Death of the Net on VeriSign CEO on Commercializing the Internet · 2003-10-17 08:40 · Score: 5, Interesting

And that will probably happen. We're at the point now where it's starting to get a little painful for people who step outside of the black-and-white vision of the Net that businesses tend to have. People like me, for example, who run our own mail server at home. AOL won't listen to my mail. Why? Because I'm residential. A residential user should be sending mail through a business, or so AOL thinks.

That hurts a bit, but my reaction is to say that AOL doesn't need my mail. But what happens when ISPs start to enforce no-server limitations? What happens when governments start to enforce them?!

The same thing goes for name service. There are already several alternate roots, and they will only become more popular as VeriSign pushes the "get the roots out of the hands of the academics" attitude.

Eventually, this will lead to healthy competition between the "subculture nets" and "The Internet" (we all know there's no such thing as The Internet, right? that it's just a generic term that we use to refer to consumers of IPv4 address space).

I'm hoping that wireless networks will eventually replace the default "Internet" that we've known with a decentralized cloud of mini-networks with physical routing information collected dynamically. That will require some major changes in the technology and pervasiveness of its use, but it could easily happen, and would be far more reliable and "ownership proof" than what we have today (lost all the nodes between you and your target? pause a second to re-calculate your routes and continue... self-healing network topologies are not new tech, and many useful designs exist).
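
To make the "pause a second to re-calculate your routes" idea concrete, here is a toy Python sketch: a breadth-first search over a small mesh that simply ignores any nodes currently known to be down. The mesh, the node names and the function are all hypothetical; a real self-healing network discovers its topology dynamically and uses a proper routing protocol rather than a global map like this one.

```python
from collections import deque


def shortest_route(links, source, target, down=frozenset()):
    """Breadth-first search for a fewest-hop route, skipping nodes in `down`."""
    if source in down or target in down:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in down and neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no surviving path


# Hypothetical five-node mesh; the names are placeholders.
MESH = {
    "you": ["a", "b"],
    "a": ["you", "target"],
    "b": ["you", "c"],
    "c": ["b", "target"],
    "target": ["a", "c"],
}

if __name__ == "__main__":
    print(shortest_route(MESH, "you", "target"))
    print(shortest_route(MESH, "you", "target", down={"a"}))
```

Losing node "a" just means the second call finds the longer path through "b" and "c"; only when every path is cut does the search come back empty.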

Let's take the root out of the hands of these corporate greed-mongers and give it back to the people who created the world's most powerful computing infrastructure in the first place: all of us!

Re:where I stopped reading on PHP Scales As Well As Java · 2003-10-17 07:31 · Score: 1

And here you see why language (toolkit, OS, etc) comparisons are usually useless.

It's not that tool A lacks the facilities of tool B, but that there are very few people well versed enough in tool A *and* tool B to give you any kind of reasonable comparison, and when they do, their voices are usually drowned out by the ignorant masses who are all too willing to compare their pet tool to a tool they've heard a bit about and/or used a few times.

Perl suffers from this a great deal, as does every language. For example, I've heard PHP users complain that Perl is terrible for constructing Web apps. I find this confusing until I hear that they think Perl Web apps should be written using mod_perl for server interaction and the CGI module for display... And they're right, or would have been long ago. Circa 1998 those were the only tools at your disposal. Now, of course, there are FAR higher-level tools (Mason, Bricolage, etc). I only use this as an example.

The point here is not a PHP vs Perl one or a PHP vs Java one. It's simply that you have to know your toolsets well in order to compare them meaningfully.

Re:Thoughts on XML on CNet on WinFS · 2003-10-17 04:04 · Score: 3, Interesting

This is not a new concept, and it does work out ok.

Your basic linear span of bytes file type becomes a subset of all possible data-structures. Structured file filesystems with various semantics have been around since the 70s (perhaps earlier). XML gives you a way that everyone can agree on to store the schema (of course you'd use a binary XML representation on disk, and have a few prefab schemas (like the DBM type key-value pair) hard-coded into libraries for speed).

This is a good idea, and perhaps one of the few places that I've heard people talk about using XML at a low level that I would agree with in principle.

Of course Microsoft will get it wrong. Because they're idiots? Not at all, there are a lot of bright people there, but Microsoft's priorities are set by their largest customers, and for those customers and for marketing reasons, they make some truly AWFUL decisions, like consolidating the Win32 API and throwing away the multi-tiered, user-space service approach that NT was originally supposed to have on top of HAL and the NT microkernel. That was done because MS saw a need to give their largest base of developers (corporate in-house mostly) as close to a seamless transition from Win3.1 as possible, and for no real technical reason that had to do with NT.

That ruined what was likely to have been one of the coolest pieces of software that Microsoft would have ever produced. NT (now the heart of XP and Longhorn) is still a cool OS at its core, but as I expect to happen with this filesystem, it was so hobbled by the needs of their business customers that it took a decade to extract any real value from it.

I'm not MS-bashing. I'm a Linux/UNIX/BSD user, but I'm willing to accept that good developers work on all sorts of software, open and proprietary. The problem is that the larger a software business is, the less voice the developers have.

Personally, I'm waiting to see where Reiser goes...

Slashdotted on Project Gutenberg Publishes 10,000th Free eBook · 2003-10-16 08:58 · Score: 1

The site is slashdotted, could someone post the books here, please? ;-)

Re:more please on Microsoft Patents Your Local Weather Report · 2003-10-15 05:15 · Score: 1

Actually, just the opposite is true. Bad patents are creating an "old boys club" of sorts among established companies, where mutual assured suing-into-the-ground prevents most patent suits, but when a young upstart moves in and starts taking market share, it can be shut down with a landslide of patent suits over technology that the company needed to get started.

This is preventing innovation and growth in the market!

Re:Mebbe learn to write a bayesian filter? on Another Whack at Spam · 2003-10-14 08:44 · Score: 1

I lost your email address BTW so send me some more mail

Will do. It was a good conversation.

Not to be picky, but unless this is a change in the new version of SA, most people have been reporting a 0.06% or worse FP rate

I'm not sure what the numbers are in development versions these days, but SA 2.60 was just released, and the tests that are used centrally show 0.02-0.06% FP rate depending on corpus... however, I see a much lower rate in practice. My explanation for that is that the database of spam and non-spam used by SA for evolving the scores is populated by a statistically improbable number of worst-case messages submitted specifically to point out false positive test cases.

So, while SA nominally shows a false positive rate of around 0.05% on average, it's really a much rosier picture, and here's the really nice part: SA gets this rate out of the box, before Bayes has a chance to really train up.

I still think that using Bayes instead of a genetic algorithm would be a better way of scoring messages. The genetic algorithm has some serious problems in terms of reacting to changes in score makeup (e.g. a test is removed or a whole class of tests cannot be run) which have been band-aided for now, but the long term solution is to have a more dynamic reaction to changes in rule behavior.

There are also a lot of tokens that are spotted via regular expression that could probably just be turned into message tokens to be handled by Bayes the way you suggest.

I think the perfect architecture for spam analysis lies somewhere between SpamAssassin and DSPAM, but I have to admit I've not yet looked into tangential things like inoculation.

I don't think the buzzword 'Bayesian' is going to solve anything (on a side-note we implemented three different algorithms into DSPAM lately including Chi-Square).

Agreed. I told a co-worker that SpamAssassin was using Bayesian analysis and he groaned, saying that naive Bayes was just the worst case scenario that you apply when you don't understand your data. I had to point out that a) SA uses a variant of Chi-Square not naive Bayes and b) SA uses a whole heck of a lot more analysis tools than just Bayes. Buzzwords don't solve anyone's problems, and I think spam filtering is probably the best place to get that point across to naive users.

Re:Mebbe learn to write a bayesian filter? on Another Whack at Spam · 2003-10-14 03:55 · Score: 1

So that's the first two... The others are equally (if not more) important. Your false positive rate is admirable. It took SA a long time to get down into that range, so any other tool that does it meets one of my benchmarks. Still, how long it takes you to get there, and how much pain new users suffer to achieve that, is important, especially when new users are probably the ones who need to get zero false positives the most (e.g. students who have just started at a college and are getting somewhat suspicious-looking course information via email, or new employees at a company). It's a limit thing: you never achieve 100% of your ideal, you just move toward it along some sort of function. You need to figure out the integral of that curve over t=0 to t=x, where x is some "reasonable" amount of time for a user to come up to speed. Then, for each of the first two values (false positive and false negative rate), you can compare start, limit and integral to determine how good a tool is.
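
As a rough illustration of that comparison, the Python sketch below takes a handful of (day, false positive rate) samples for a filter and reports the starting rate, the last observed rate as a stand-in for the limit, and the trapezoid-rule area under the curve from t=0 to t=x. The sample figures and filter labels are invented for the example and are not measurements of any real tool.

```python
def trapezoid_area(times, rates):
    """Approximate the integral of rate over time with the trapezoid rule."""
    if len(times) != len(rates) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (rates[i] + rates[i - 1]) / 2
    return area


def summarize(times, rates):
    """Start, (approximate) limit and accumulated error for one rate curve."""
    return {
        "start": rates[0],
        "limit": rates[-1],  # assumption: the last sample is close to the limit
        "area": trapezoid_area(times, rates),
    }


if __name__ == "__main__":
    days = [0, 7, 14, 30]  # x = 30 days as the "reasonable" training window
    # Hypothetical false positive rates, in percent.
    slow_learner = [2.0, 0.5, 0.1, 0.05]
    good_out_of_box = [0.1, 0.08, 0.06, 0.05]
    print("slow learner:   ", summarize(days, slow_learner))
    print("good out of box:", summarize(days, good_out_of_box))
```

Both made-up filters end at the same rate, but the area shows how much more pain the first one inflicts on a new user along the way.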

DSPAM's author (was that you? I don't know Slashdot IDs, sorry) and I have spoken about this a lot, and he actually added a number of "pre-processing features" to DSPAM that make it sort of an inside-out SpamAssassin. Where SA has an overall scoring system and uses Bayes as just one of the tests to feed that, DSPAM uses Bayes AS the scoring system, and has some rules (like header analysis, and he was going to add some blacklist support last I heard) as Bayes tokens. It was interesting that he chose this route, as I was trying to get him to consider integrating with SA as a replacement for the existing Bayes in SA (Bayes in SA is fairly reasonable speed-wise, but DSPAM is faster). He chose instead to add more "SA-like" features to DSPAM, which is cool too.

In the end, I'm glad there are at least two tools that are taking the "no one solution" approach. I don't buy the idea that any pure-word-analysis approach is going to work, but those who understand that Bayesian analysis is just a statistical tool can make it work far harder for them than that.

Re:Mebbe learn to write a bayesian filter? on Another Whack at Spam · 2003-10-14 03:01 · Score: 1

Keep in mind that 99.9% success on identifying spam is meaningless on its own.

When you talk about success in spam filtering, you need to talk about several statistics: 1) false positive rate 2) false negative rate 3) both initial and limit values for the above 4) function for rate of change.

That is, you have two (somewhat independent) parameters, and they are going to start at a "less acceptable" rate (e.g. perhaps 5-10% false negative and 1-2% false positive), and there will be some function that describes how fast each approaches its "optimal" performance (e.g. perhaps 0.1% false negative and 0.001% false positive).
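
A small Python sketch of the two headline numbers, to show why a single "99.9% of spam caught" figure says little by itself. It assumes the false positive rate is measured as a fraction of legitimate mail and the false negative rate as a fraction of spam; other definitions are in use, and the function name and sample data are made up.

```python
def error_rates(results):
    """Compute error rates from (is_spam, flagged_as_spam) pairs, one per message."""
    ham = spam = false_pos = false_neg = 0
    for is_spam, flagged in results:
        if is_spam:
            spam += 1
            if not flagged:
                false_neg += 1  # spam that slipped through
        else:
            ham += 1
            if flagged:
                false_pos += 1  # legitimate mail thrown away
    return {
        "false_positive_rate": false_pos / ham if ham else 0.0,
        "false_negative_rate": false_neg / spam if spam else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical mailbox: 1000 spam and 1000 legitimate messages.
    mailbox = [(True, None)] * 1000 + [(False, None)] * 1000
    # A "filter" that flags everything catches every spam message...
    flag_everything = [(is_spam, True) for is_spam, _ in mailbox]
    # ...and also discards every legitimate one.
    print(error_rates(flag_everything))
```

The flag-everything filter has a perfect false negative rate and is still useless, which is why both rates have to be quoted together.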

This is where something like SpamAssassin shines. It comes with a huge database of rules for identifying spam by looking at the text, blacklists, header formatting, received chains, etc. SA's Bayesian component is then trained automatically based on the results of the rest of the ruleset. In comparisons with other systems, it seems that this sort of automatic training is significantly more accurate than training a filter off of its own results (as most Bayesian systems do). Over time a user can tune up the Bayesian part so that its results are weighted higher than the rest of the rules, thus giving you a more "Bayesian" and less static system. Personally, I find that global checksumming (Razor2) combined with a large number of blacklists and the Bayesian part of SA will yield the best results after it has "gotten used to" a user's mail.

SA 2.60 has just come out. Give it a look, you may be impressed!

Re:Short summary on Linux File System Shootout · 2003-10-09 00:08 · Score: 1

Reiser seems to be caching a lot of flak

Ah, that explains the CPU overhead then.... ;-)

Re:SVG a Huge plus on GIMP goes SVG · 2003-10-08 05:31 · Score: 1

Heh.

No need to be rude. What you're getting at is right, but irrelevant. You can express the entire RGB color-space in a well defined way in CMYK.

What you're saying is that for any given non-linear conversion there are "compromises" made, and that's 100% true, but totally ignorable for the cinepaint folks. They don't care about that because they have a job to do: convert RGB images to 8-bit TIFF CMYK output.

That conversion is well defined, but given an 8-bit RGB image to start, there is some unfortunate slop that occurs in terms of bridging color-gaps. When you start with a higher-bit-depth RGB image (16 or 32 bits per channel) you have a much more realistic conversion result in 8-bit CMYK.

THAT is what the cinepaint folks want to take advantage of.

To quote them,

Because CinePaint is 32-bit, RGB to CMYK is a down-conversion and doesn't crush color like other programs. With 8-bit programs 3-channel RGB (24-bit) is up-converted to 4-channel CMYK (32-bit). No need to work natively in CMYK in CinePaint -- simply convert from 16-bit per channel RGB (48-bit).

Feel free to follow the link in my original post....
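
The bit-depth point can be seen even with the textbook RGB to CMYK formula. The Python sketch below is only that naive formula, not what CinePaint or any color-managed tool actually does, and the function names are invented. It converts a dark ramp to 8-bit CMYK and counts how many distinct cyan levels survive when the source is 8 bits per channel versus 16.

```python
def rgb_to_cmyk8(r, g, b, bits=8):
    """Naive RGB -> 8-bit CMYK; `bits` is the per-channel depth of the input."""
    top = (1 << bits) - 1
    rf, gf, bf = r / top, g / top, b / top
    k = 1 - max(rf, gf, bf)
    if k == 1:  # pure black: avoid dividing by zero
        return (0, 0, 0, 255)
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return tuple(round(v * 255) for v in (c, m, y, k))


def distinct_cyan_levels(ceiling, bits):
    """Sweep red from 0 to `ceiling` with green = blue = `ceiling`; count cyan values."""
    return len({rgb_to_cmyk8(r, ceiling, ceiling, bits)[0] for r in range(ceiling + 1)})


if __name__ == "__main__":
    # The same dark tone expressed at two source depths (10/255 == 2570/65535).
    print("8-bit source: ", distinct_cyan_levels(10, 8), "cyan levels")
    print("16-bit source:", distinct_cyan_levels(2570, 16), "cyan levels")
```

In the dark ramp the 8-bit source only has eleven input steps to spread across the cyan channel, while the 16-bit source has enough steps to fill the 8-bit output range much more evenly.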

Re:WBXML on Frontiers: A New Xlib Compatible Window System · 2003-10-07 16:51 · Score: 1

By "higher level in the protocol" I meant that existing features should have hooks to better support the use of hardware acceleration. To have an entirely separate hardware acceleration extension kind of misses the point.

Still, it's nice to see that this has been done. Adding extensions to X was originally intended as a way to test features that would later be integrated into the core protocol.

Re:SVG a Huge plus on GIMP goes SVG · 2003-10-07 16:47 · Score: 1

Heh. Try re-reading. They're not trying to map into the entire CMYK color-space, they just want to save RGB images.

RGB maps into CMYK just fine, but at the bit-depth that Gimp uses, it doesn't.

Re:SVG a Huge plus on GIMP goes SVG · 2003-10-07 04:26 · Score: 1

Hmmm, an interesting point to follow up my post: cinepaint does not have a strong push to do CMYK, according to their roadmap. Instead they cite their primary advantage over The Gimp (32-bit-per-channel color) as negating this need. They can output 8-bit-per-channel CMYK without "crushing" colorspace. Nifty.

Re:SVG a Huge plus on GIMP goes SVG · 2003-10-07 04:20 · Score: 1

I *think* you are incorrect (based on my reading of those docs, long-standing usage and development for The Gimp, and my recent usage of the 1.3.x series).

As I understand it there are several problems surrounding CMYK and you're blurring them a bit:

Output of RGB images to CMYK (e.g. for printing or saving to CMYK formats) -- Gimp has had this capability for a long time, but it's not very good, and continues to be "better but sub-optimal". Why? Because the best known ways of doing this are patented, mostly by Adobe.
Selecting colors via CMYK -- Gimp has long had this feature, but you're really selecting in the RGB colorspace via CMYK controls.
Decomposing an image into CMYK layers for CMYK-process printing -- long standing feature, as well implemented as possible given patents.
Native use of CMYK color space for images -- I'm not sure, but it's my impression that this is still lacking. At least that was the case circa 1.3.18, and I don't see anything in this page you reference (collected change logs) that indicates that it has changed. This would really be a 2.0 feature as it requires massive changes to the way image data is stored and managed, not to mention overhauling most of the plugins to deal with this data.

I think cinepaint (né "Film Gimp") is also working on these features, but it's not clear to me how far they've gotten. Most of the cinematic use of digital artwork is, AFAIK, still RGB, so it's not likely to be a huge priority.

Re:WBXML on Frontiers: A New Xlib Compatible Window System · 2003-10-07 03:23 · Score: 1

Good points.

Using WBXML is probably a bad idea, but I'll reserve judgement until I see it.

I would like to point out, however, that SVG (and any other vector graphics system) is NOT inherently incompatible with the X protocol. In fact, X supports several scalable, vector-based formats today including PostScript and SVG.

These formats are supported via standard (or in the case of SVG, proto-standard) extensions to the X protocol. To say "you can send your foobar-windowing-system client to any X server that supports X extension" is much better than not supporting X at all, and indicates to me a desire to produce a system that is usable, which is usually the first step in producing a system which is USED.

I'm not saying it's all easy, and I'm not sure that X needs to be replaced (so much as the X protocol should be revised to match modern hardware and usage). The XML buzzword-compliance worries me, but I'll try this software out when/if it's usable and judge for myself.

I'm convinced that the X protocol could be modified to suit everything that we need today, though. There are several things that we need: 1) more hardware acceleration features at a higher level in the protocol and better OpenGL support 2) a slightly higher level set of widgets beyond XWindow ... not GUI toolkit widgets, but the primitives on which they are most often built 3) much, much better text drawing 4) GC and XDrawing feature revamp to support antialiasing, vector graphic primitives (not all of a behemoth like PostScript or SVG, but again the most commonly used primitives) and perhaps some basic transformations 5) an overhaul to the XEvent structures, ICCCM, Atoms and D&D to give a unified event/data/management protocol.

That would clearly be incompatible, but X12 has been long anticipated, and certainly any server that spoke this protocol could also speak X11. Also, there's no reason that X11 apps could not compile against a compatibility layer.

You could probably even mock up a WBXML layer in front of X12 if you really wanted to program that way (or if you were writing your toolkit in a high-level language where XML DTD processing is native and much easier than talking to a binary protocol).

Best of all worlds!

Re:A little more detail please on SGI Compares Linux & System V Source Code · 2003-10-06 07:16 · Score: 1

Thanks for the link! And that does make it clear that they specifically cite all of the code that was the same. Read the section carefully.

They talk about two places in the code (by filename and function) and then say that those two comprised the only common code they could find, and that was 200 lines.

Re:A little more detail please on SGI Compares Linux & System V Source Code · 2003-10-06 07:14 · Score: 1

I did use Google, and didn't find anything (either I used search criteria that weren't as effective as yours or Google's results have changed since I tried). I also searched SGI's site extensively and didn't find it.

Thankfully someone provided the exact link (why didn't you?)

I don't ask questions on Slashdot without checking first. That would be foolish.

Re:A little more detail please on SGI Compares Linux & System V Source Code · 2003-10-06 06:11 · Score: 1, Interesting

I'm not sure to what extent they have or have not done this. I can't find SGI's original "open letter" anywhere, only this article that Slashdot references, which in turn talks about and quotes from the letter.

Anyone have the letter?

I could not find it on SGI's site.

Re:Didn't see it on MPAA Ruins Own Films As Anti-Piracy Measure · 2003-10-06 06:06 · Score: 1

Ah good point. I knew it was illegal for a TV show to contain text or images that were inserted for less than 1/10th of a second (I could be off on the exact number, but I know at least one show that put something in for 4 frames where they wanted it to be 1 because that made the lawyers feel safe, since 3 was technically legit...).

I did not know that it was limited to broadcast television though. Thanks for the correction!

Didn't see it on MPAA Ruins Own Films As Anti-Piracy Measure · 2003-10-06 04:35 · Score: 2, Informative

I've seen a couple of these films, and I did not see this. I'm wondering if it's just a single frame (BTW that makes it illegal in the US) or if it's only in certain theaters....

User: ajs

Comments · 4,773