I'm no cryptologist, but with one-way encryption (MD5 for example), my understanding is that you only have brute-force as an option. In that circumstance, how would having 570PB of information pre-stored be a faster solution than simply using 9 nested loops to generate them on the fly?
Well, that's what file systems do...
Let's say your password MD5 sum is "0123456789ABCDEFFEDCBA9876543210". I have my full set of MD5 rainbow tables on my giant SAN. I type "cat /rainbow/01/23/45/67/89/AB/CD/EF/FE/DC/BA/98/76/54/32/10.txt" and it prints your password. The system reads a total of 16 directory contents (a few milliseconds each) and gets to the file with your password.
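A minimal Python sketch of that lookup, assuming one file per digest, two hex digits per directory level and uppercase hex names as in the example path. The `/rainbow` root, the `.txt` suffix and the helper names `rainbow_path` and `lookup` are illustrative, not any real tool's layout.

```python
import hashlib
from pathlib import PurePosixPath


def rainbow_path(hex_digest, root="/rainbow"):
    """Map a 32-hex-digit MD5 digest to root/01/23/.../32/10.txt.

    The first 15 bytes become 15 nested directories and the last byte names
    the file, so resolving the path walks the root plus 15 directories.
    """
    digest = hex_digest.upper()
    if len(digest) != 32 or any(c not in "0123456789ABCDEF" for c in digest):
        raise ValueError("expected a 32-character hex MD5 digest")
    parts = [digest[i:i + 2] for i in range(0, 32, 2)]
    return PurePosixPath(root, *parts[:-1], parts[-1] + ".txt")


def lookup(hex_digest, root="/rainbow"):
    """Return the stored plaintext for a digest (raises if it isn't stored)."""
    with open(rainbow_path(hex_digest, root), encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print(rainbow_path("0123456789ABCDEFFEDCBA9876543210"))
    print(rainbow_path(hashlib.md5(b"hunter2").hexdigest()))
```

Because each level fans out to at most 256 entries, every directory stays small, and a lookup costs the same fixed number of directory reads no matter how many digests are stored.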
As for the storage, it's not just 94^9, as that only covers passwords that are exactly 9 characters long. You also need 94^8, 94^7, and so on down to 94^1, and it would be that many files, not that many bytes.
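The storage count is a geometric series, which is easy to check. A sketch in Python, assuming the 94 in the text means the printable ASCII characters other than space (0x21 to 0x7E):

```python
# Assumption: passwords drawn from the 94 printable ASCII characters
# other than space (0x21 to 0x7E), as implied by the 94 in the text.
ALPHABET = 94


def count_passwords(max_len, alphabet=ALPHABET):
    """Number of distinct passwords of length 1..max_len: sum of alphabet**k."""
    return sum(alphabet ** k for k in range(1, max_len + 1))


exactly_9 = ALPHABET ** 9
up_to_9 = count_passwords(9)

print(f"exactly 9 characters: {exactly_9:,}")
print(f"1 to 9 characters:    {up_to_9:,}")
# The shorter lengths add roughly 1/93 on top of the 9-character case.
print(f"ratio: {up_to_9 / exactly_9:.4f}")
```

Because every extra character multiplies the space by 94, the 9-character passwords dominate: adding all the shorter lengths only increases the total by about 1% (a factor of roughly 94/93).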
Most developers are, like you, writing bespoke software that never leaves the company it was written for. You are not releasing your proprietary software to the public in the hope of competing with open source software.
On the other hand, open source software can probably help you get your work done faster. With projects like GCC, GNU binutils, Eclipse, Apache and MySQL or PostgreSQL, you don't need to pay huge licensing fees to proprietary software developers in order to get basic tools for software development and creation.
Grepping anything out of even a modestly sophisticated XML document is pretty much unreliable and generally slow.
If by "grepping" you mean "searching for", then that's not the case. Just use an XPath query. For example, "xml sel -t -c //title *.rss" will print all the title tags in all the *.rss files. That's using the xmlstarlet package. If you don't like the syntax of the command, use another package.
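For comparison, roughly the same query using only Python's standard library instead of xmlstarlet. It's a sketch that assumes the feeds are well-formed and that the `title` elements aren't in a namespace; unlike `-c`, it returns each title's text rather than a copy of the element. The function name `titles` is made up for illustration.

```python
import glob
import xml.etree.ElementTree as ET


def titles(pattern="*.rss"):
    """Text of every <title> element in every file matching pattern."""
    found = []
    for path in sorted(glob.glob(pattern)):
        root = ET.parse(path).getroot()
        # ".//title" uses ElementTree's XPath subset: all descendants named title.
        found.extend(el.text or "" for el in root.iterfind(".//title"))
    return found


if __name__ == "__main__":
    for title in titles():
        print(title)
```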
If by "grepping" you mean "use the line-oriented grep(1) regexp matching tool", then yes you're going to run into difficulties working with data that isn't just line oriented. That's why UNIX admins contort everything into line-oriented data, so they can run their precious sed, grep, sort and uniq on it. They take 2D data, munge it into 1D line data, grep it, then unmunge it. They could just act on the data directly (XML or not), but then they'd be programmers.
You have to wade through the nested close and open tag, balance all nested quotations
Just use xpath. That's what it was invented for. You get the document parsed for free, and you can match against any element of its structure or content.
Let's take an example. Suppose I was looking for all the hash entries that have the key "highschool":
//hash[@key='highschool']
would match all such hash elements (and their contents), but not key="your highschool" or anything else. Isn't that easier than pissing about trying to come up with a regexp that parses the data format itself, instead of just searching the data and letting the tool do the work of parsing the format?
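Python's built-in ElementTree supports this kind of attribute predicate in its limited XPath subset. A sketch, with a small made-up document standing in for the real data (the `profile` root and the sample values are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the first hash should match.
DOC = """<profile>
  <hash key="highschool">Springfield High</hash>
  <hash key="your highschool">not this one</hash>
  <hash key="college">State U</hash>
</profile>"""


def hashes_with_key(xml_text, key):
    """Text of every <hash> element whose key attribute equals key exactly."""
    root = ET.fromstring(xml_text)
    # The key is interpolated unescaped, so it must not contain a single quote.
    return [el.text for el in root.findall(f".//hash[@key='{key}']")]


if __name__ == "__main__":
    print(hashes_with_key(DOC, "highschool"))
```

The parser handles quoting, entities and nesting; the predicate only has to say what to match.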
In yaml you would grep like this: grep "^\s*highschool\s*:"
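I don't think you would. Vanilla (POSIX) grep doesn't have Perl's "\s" whitespace class; GNU grep only accepts it as an extension. Perhaps you mean "[[:space:]]", since backslash escapes like "\n" and "\t" aren't interpreted inside a POSIX bracket expression.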
Define an XML format that allows any number of top-level elements, then just keep appending like you would with CSV. In most languages, you can generate XML to a file or into a buffer. Just write the XML to your buffer, and then append the buffer to your file.
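A minimal Python sketch of the append-only approach, assuming a hypothetical `<record>` element per entry. Each write appends one serialized element; because a run of top-level elements is an XML fragment rather than a well-formed document, the reader wraps the whole file in a synthetic root before parsing.

```python
import xml.etree.ElementTree as ET


def append_record(path, **fields):
    """Append one <record> element; the file is a run of sibling elements."""
    rec = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(rec, name).text = str(value)
    with open(path, "ab") as f:
        # tostring() with its default encoding emits no XML declaration,
        # so fragments can be concatenated safely.
        f.write(ET.tostring(rec) + b"\n")


def read_records(path):
    """Wrap the fragments in a synthetic root so a standard parser accepts them."""
    with open(path, "rb") as f:
        body = f.read()
    root = ET.fromstring(b"<records>" + body + b"</records>")
    return [{child.tag: child.text for child in rec} for rec in root]
```

Escaping of characters such as `&` and `<` is handled by the serializer on the way out and by the parser on the way back in.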
Of course, that's presuming you need/want XML output for whatever you're appending to. Do you? If it's just an internal format and it's not for interchange with any other applications, there's no real need for XML (or CSV, or any format).
That means: if you want to release a binary version of your software, you have to compile and package it for each and every distribution you wish to support.
Firstly, this is an irrelevant concern for the end-user. They don't even know what binary releases are, let alone need to create one. All they need to know is how to operate their particular distribution's "add new software" function.
Secondly, it's an irrelevant concern even for the developer. As a developer, I can say that I've only ever released one binary version of my software, and I didn't really need to do that. Each Linux distribution has its own team of packagers who package my software for their distributions. If their distribution standards demand a manual page entry, and I didn't write one, they write one. And so on. They also handle the support for their users and pass "upstream" to me the small fraction of the bugs and feature requests that would concern me directly.
There is very little difference between Linux distributions. They can all pretty much be encapsulated by the GNU build standards, which are automatically implemented in full when your software uses autoconf and automake.
This is how free software works. It's a complete waste of time for each software developer to handle their own support for users on different platforms, when those platforms are all more than happy to provide free support for your software.
All I need to do as a developer is follow compatibility standards, which is what I should be doing anyway. Following compatibility standards doesn't just mean Linux, it also means compatibility with FreeBSD, OpenBSD, NetBSD, Solaris, HP-UX, IRIX, Xenix, BeOS, Cygwin and other forms of UNIX.
Just as any Linux software vendor has to build and test packages for a dozen or more different distributions.
I'm a Linux software vendor. You're not talking about me, or the majority of Linux software vendors. You're talking about the absolute minority: proprietary software vendors, who refuse to take the free support offered to them by each Linux distribution. But if they absolutely insist on that, then all they have to do is say "supported only on Red Hat Linux and Ubuntu Linux on 80x86 based computers", which is pretty much what most proprietary developers say, although it's usually in the form of "supported on Windows XP only" or "supported on Mac OS X 10.3 or later only".
That's exactly what I thought as well.
Are hitting on women in bars, playing the bongos and searching for Tuva all indicators of scientific success?
What about having motor neurone disease or ditching your wife and marrying your nurse?
"The two enemies of the people are criminals and government, so let us tie the second down with the chains of the Constitution so the second will not become the legalized version of the first".
What is the difference between governments and organised crime?
One is organised.
It's an informational guideline on what MIME data of type text/csv should contain, and it's ignored by the majority of CSV implementations.
Well, sorry about the hydro example. You're right that the Sun is the original source, although it also needs gravity for the system to work. But I'm fairly sure about tidal power being gravity alone.
The Moon orbits the Earth due to gravity alone; it's not driven or replenished by the Sun. Anything we extract from tides ultimately comes out of the Earth-Moon system's rotational and orbital energy: tidal friction slows the Earth's spin by a tiny amount and nudges the Moon into a slightly higher orbit. So it's a stored energy source that we can extract power from. But then, that's exactly the same as the Sun! It too will eventually run out of hydrogen to fuse into helium.
In fact, several of those items have to be replaced in XML too.
Yes, and there is ONE SINGLE STANDARD THAT EVERYONE AGREES ON for how to quote special characters. You can actually be conforming or non-conforming to this standard. You can't just make it up on the fly and be right no matter what you do, like you can with CSV.
But that's not "CSV", that's "OpenOffice.org's CSV", and it's different from "Foo 2.0's CSV", "MegaApp's CSV" and many other differing, incompatible CSVs out there.
For example, how should I store the following three columns of data?
Column 1: Hello"there
Column 2: Hello there
Column 3: Hello,"'",\
"Hello\"there","Hello \013 there","Hello,\"'\",\\\013"
might be one answer. But it's only one answer, and I just invented it now. It is not the only answer ever implemented in CSV. It is not the canonical, centrally agreed standard answer like the one that exists for XML parsing. THERE IS NO CANONICAL CSV STANDARD. Nowhere can you tell someone writing "CSV" that they're writing it badly or parsing it badly, because there's NO STANDARD.
As another example, you could choose to quote doublequotes by doubling them:
"Hello""there","Hello \013 there","Hello,""'"",\\\013"
You might choose to quote doublequotes by using singlequotes for cells which have doublequotes, and simply break if the cell has both singlequotes and doublequotes. And then you could ship this software and make anyone who wants to interact with your "CSV" output have to write a special case for you.
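Python's standard `csv` module can be configured for either of the quote-escaping conventions above, which makes the incompatibility easy to demonstrate. A sketch assuming column 2 really contains a line break (written as `\013` in the examples); column 3 is left out to keep it short:

```python
import csv
import io

# Cells 1 and 2 from the example above; the line break inside cell 2 is an
# assumption about what the original example contained.
ROW = ['Hello"there', "Hello\nthere"]

DOUBLED = {}  # the csv module's default: a quote inside a cell becomes ""
BACKSLASH = {"doublequote": False, "escapechar": "\\"}  # a quote becomes \"


def encode(row, fmt):
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()


def decode(text, fmt):
    return next(csv.reader(io.StringIO(text), **fmt))


if __name__ == "__main__":
    for name, fmt in (("doubled", DOUBLED), ("backslash", BACKSLASH)):
        text = encode(ROW, fmt)
        print(name, repr(text))
        # Each convention round-trips when the reader knows which one was used...
        assert decode(text, fmt) == ROW
    # ...but a reader expecting doubled quotes garbles the backslash output.
    print(decode(encode(ROW, BACKSLASH), DOUBLED))
```

Each dialect reads its own output back correctly, but nothing in the file says which one was used, so the reader has to be told out of band.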
You might never consider that data could include newlines, so you just emit them in the clear, making the data indistinguishable from two separate rows. This can only be resolved by knowing how many columns should be present and reading on past newlines until you have enough columns.
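The "read on until you have enough columns" workaround can be sketched in a few lines of Python. It assumes a known column count and cells that never contain the delimiter; `rejoin_rows` is a hypothetical helper, not a library function:

```python
def rejoin_rows(lines, ncols, delimiter=","):
    """Glue physical lines back together until each logical row has ncols cells.

    Assumes every row has exactly ncols cells and that no cell contains the
    delimiter; a raw line break inside a cell is the only damage repaired.
    """
    rows, pending = [], None
    for line in lines:
        # Re-insert the line break that split this cell across physical lines.
        pending = line if pending is None else pending + "\n" + line
        cells = pending.split(delimiter)
        if len(cells) == ncols:
            rows.append(cells)
            pending = None
        elif len(cells) > ncols:
            raise ValueError(f"too many cells: {pending!r}")
    if pending is not None:
        raise ValueError(f"trailing partial row: {pending!r}")
    return rows
```

Even under those assumptions the trick is fragile: a line break in the last column already looks like a complete row, so `rejoin_rows(["a,b,Hello", "there"], 3)` emits the first line as a row and then fails on the orphaned `there`.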
Even "professional programmers" can't anticipate how a future program might implement CSV. There are hundreds of possible ways to escape data, and everybody has chosen a different one. No standards.
"Professional programmers" show their professionalism by demanding full and accurate specifications up-front, rather than spending time implementing something only to be told it's "wrong". You want CSV input or output? OK, whose CSV?
I have, and I can tell you that it's a waste of time.
It amazes me how something that looks so simple can have so many corner cases, and how they can be solved so differently by different implementations.
CSV is fine if you want to store data that has no quote marks, commas, carriage returns or linefeeds. For everything else, please use a better specified format, preferably one that has a formal definition. Like XML, for example.
You're kidding, right? There was a whole string of bombings committed by Suffragettes in the 1910s, particularly by the Women's Social and Political Union (WSPU).
Gravity is a force that we exploit as a source of potential and kinetic energy. How do you think weight-driven clocks, hydroelectricity, or tidal energy work?
I really wish that, instead of running "person claims they can do the impossible" stories, newspapers would run exciting stories explaining the basics of thermodynamics, what energy is, and nuclear physics and the background behind Boyle's gas law.
These days, we're wondering when the oil's going to run out, and we need to look to how we get the most energy out of our Sun and gravity - the only real sources of energy on this planet, all other sources being derivatives.
Why can't we have a more intelligent public? It wouldn't hurt for them to have an understanding of the world around them; maybe they'd be less likely to fall for scams based on breaking the observed laws of physics.
Yes, it is. It's predominantly admins that are running amok through the articles and setting them up for deletion. If it's not Dragonfiend purging comics articles, it's Improv deleting all the articles on brand names.
Being an administrator on Wikipedia is a serious position of responsibility, yet 12-year-olds are free to get themselves voted into the clique by ingratiating themselves with other admins and doing nothing but minor edits. If they actually knew the effort needed to research, source, verify and compose an article, perhaps they'd be less eager to delete it.
And when they run rampages on Wikipedia, abusing their position either to delete or force particular content into an article, they usually get away scot-free. If they're admonished, they're usually free to leave and come back under another name. Nobody knows who they really are. The people who do the same thing without becoming admins first are labeled "vandals" and indefinitely banned.
In a world without copyright, free software has no protections. Evilcorp can take your code, extend it, release it closed source and give you the finger because you have no claim of ownership over it.
In the current model, Evilcorp can abuse those who let you take without giving, but there's no such law forcing Evilcorp to let people take without giving. That's what a copyleft world would force upon them.
In a world without copyright, Evilcorp have no "ownership" of your code either. Or their code. Or anybody's code. Therefore, keeping source code hidden doesn't benefit anyone, because "shrinkwrapped software" is economically unviable. All they can do is sell their future labours, because anything they've already created now belongs to everybody. Seeing as everybody owns it, and they're not getting paid anything once it exists, they might as well release the source code; it's just a waste of money trying to hide it.
The cost of developing software will remain static, or will drop due to the pooling of resources now that everybody's in the same boat. It still costs X million to develop AutoCAD, no matter what the retail price is or how many copies you sell. The price per unit and the business model are what will change.
Coca-Cola's "secret recipe" is basically just to add massive amounts of sugar.
Nonsense! All fizzy drinks are primarily sugar water (or HFCS water in the USA). What distinguishes them, and gets people to buy one rather than the other, is marketing and flavour. Coke retails at over 7 times the cost of generic cola. How the fuck do you think they get anyone to buy that?
Firstly, extensive brand marketing. This is the main reason people buy Coke and not generic cola. Generic cola doesn't sponsor your favourite sports, associate itself with your favourite bands and saturate the media with feel-good advertising.
Secondly, consistent product. Look at the New Coke debacle if you somehow think this is unimportant.
Coke's "secret recipe" is the difference between generic cola and "tastes exactly like Coke" generic cola. You can trademark a "dynamic ribbon device", but you can't trademark white writing on red background, and you can't trademark a flavour. Now do you see why they protect it as a trade secret?
Personally, I think Rails has a way to go. I agree with you that it should auto-detect a lot more of the database schema, require much less manual configuration in the model layer, and parse database errors better. Rails' intention is Don't-Repeat-Yourself by means of making your database schema a first-class citizen in the Ruby language. So future Rails releases will be more along those lines.
I think the reason it doesn't right now is that it was designed to work with MySQL MyISAM tables, which are not really relational database tables.
I try to judge computer languages and frameworks by their authors' intents rather than by what their novice users think about them. So I approve of Rails, even if it could do more to teach people about database design, because it successfully teaches people about the MVC pattern and Don't-Repeat-Yourself.
I've now written two large business applications in Rails. I did the UML modelling first. Then I wrote a fully constrained RDBMS schema, normalised to 3NF, using the Rails naming conventions. Then I wrote the Rails app on top of the rigorously constrained database.
If you want multi-key uniqueness constraints, just define them in your database already! Why do you think Rails prevents you from configuring your database layer?
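The point is that the rule lives in the schema, so every client gets the same guarantee. A sketch using Python's built-in sqlite3 rather than Rails (in a Rails migration the same constraint would typically be a unique index via `add_index ..., unique: true`); the `enrolments` table and its columns are hypothetical:

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical table: at most one enrolment per (student_id, course_id).
    db.execute("""
        CREATE TABLE enrolments (
            id INTEGER PRIMARY KEY,
            student_id INTEGER NOT NULL,
            course_id INTEGER NOT NULL,
            UNIQUE (student_id, course_id)
        )""")
    return db


def enrol(db, student_id, course_id):
    """Insert a row; return False if the database rejects a duplicate pair."""
    try:
        with db:  # commits on success, rolls back if the INSERT raises
            db.execute(
                "INSERT INTO enrolments (student_id, course_id) VALUES (?, ?)",
                (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

The application code never checks for duplicates itself; it just reacts to the database refusing the row.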
When I need to do transactions, I use Rails' full support for transactions. There, that wasn't so difficult, was it?
I let Rails save me hours of backbreaking labour writing conventional SQL queries. Then I use the completed application, identify the query bottlenecks (thanks to Rails' built-in profiling) and re-write the slowest of the dumb auto-created SQL using hand-written SQL, which I can get to using find_by_sql, finder_sql, etc. Rails lets you put your own SQL into the application almost anywhere.
Jayjg would doubtless do anything SlimVirgin tells him to do. Jayjg has oversight capabilities. Haven't you seen them in action together? Slim could ask one of the other 26 oversight users, but as Jayjg's on that list, why even bother with them?
Jimbo's a fucking idiot. He thinks it's perfectly OK to employ people who lie about their credentials and use those lies as leverage. He lies about being the sole founder of Wikipedia and orders people on IRC to do his dirty work for him so it doesn't look like he requested it. Why do you trust anything this man says?
I noticed you're NOT an oversight user. Therefore, DUH, of course you're not going to see the Slimv edits deleted using oversight. Only the 27 oversight users get to see those, and they're not going to tell anyone else about them.
The oversight feature, or "the memory hole" to use Nineteen Eighty-Four parlance, is what really, really convinces me to stop contributing to Wikipedia. The cliques of abusive administrators, protecting one another from the consequences of their power trips; they think they're above the law, and on Wikipedia they ARE above the law. The reason criticism of Wikipedia appears mainly off-wiki is that off-wiki is a place where the ruling elite of Wikipedia have no jurisdiction or convenient super-secret delete facilities.
The basic premise is that a specific sequence of symbols, based on probability, boils down into a thin fractional range. The Wikipedia article on arithmetic coding explains it quite well.
When you add a symbol, sometimes the binary representation is not precise enough to represent the narrower range, so you add a bit (or several bits) to make it more precise. At other times, the binary fraction is already precise enough to represent the updated range after you've added a symbol, and in those cases you don't need to add any more bits to the output value.
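A non-streaming Python sketch of the idea, using exact fractions: it narrows the interval once per symbol, then finds the shortest binary fraction inside the final range. Real arithmetic coders emit bits incrementally and rescale to fixed-precision integers; the two-symbol model and the helper names here are assumptions for illustration.

```python
from fractions import Fraction


def narrow(message, probs):
    """Shrink [low, high) once per symbol, in proportion to its probability."""
    # Each symbol owns a slice of [0, 1) starting at its cumulative probability.
    starts, acc = {}, Fraction(0)
    for sym, p in probs.items():
        starts[sym] = acc
        acc += p
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * starts[sym]
        width *= probs[sym]
    return low, low + width


def shortest_bits(low, high):
    """Fewest bits b1..bn such that binary 0.b1...bn lies in [low, high)."""
    n = 0
    while True:
        n += 1
        scale = 2 ** n
        k = -((-low.numerator * scale) // low.denominator)  # ceil(low * 2**n)
        if Fraction(k, scale) < high:
            return format(k, "b").zfill(n)


if __name__ == "__main__":
    probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
    for i in range(1, 7):
        msg = "aabbab"[:i]
        print(msg, shortest_bits(*narrow(msg, probs)))
```

With these probabilities, "a" and "aa" both fit in a single bit (the extra symbol needs no new output), whereas "ab" needs three, roughly -log2 of the final interval width of 3/16.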
Yes, they convert JPEGs to .SITX files. But the data inside the .SITX file is basically that JPEG data unpacked and then repacked with an arithmetic coder. The actual method used is called "Arsenic" and uses arithmetic coding, RLE and a block sorting compressor based on the Burrows-Wheeler Transform.
Yes, it recodes it into Huffman after extraction. Here is Aladdin's white paper on how they do it.