This is what the LGPL is for. For anyone who's done C/C++ coding off-web, this is almost a no-brainer. The LGPL allows "linking". In WordPress's case, if the core were licensed under the LGPL (instead of the GPL), themes written from scratch, only linking against documented API calls, ought to be in the clear. (I'd want to go back and re-read the LGPL to be sure, though. For an interpreted language like PHP, there may be caveats based on whether require_once lines or the like need to be added to core code--but WordPress likes to put most of its configuration into a database, which might obviate that. Dunno; I never switched themes in WP.)
But, yeah, he's on crack if he thinks that written-from-scratch CSS files for a theme get sucked into the GPL. That's like saying an XSLT that works on a particular schema has to be GPL'd if it's used on a GPL'd XML file written to that schema spec. Another example might be requiring a STDIN->logic->STDOUT filter to be GPL'd if it operates on GPL'd data.
Of course, the age-old /. cry of IANAL applies very strongly here...
Ah, I see. My comment on using Markov models to refine matches (and get sensible resulting symbol sequences) still applies, I think.
My interest in IPA simply derives from its being an existing standard representation. Also, taking an approach like* Double Metaphone in converting written language to the same symbol set might go a good way toward getting source material for training the Markov models.
* "like", in that multiple potential pronunciations are considered for each character sequence
That's one thing Markov models are useful for; they help determine a symbol's probable meaning in a given context. Rather than randomly selecting a subsequent symbol based on the current symbol, you can estimate the current symbol's fit based on the last symbol seen.
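To make that concrete, here's a minimal sketch of a first-order model in Python. The training sequences and symbol names are invented for illustration, not real phonetic data, and a serious model would want smoothing for transitions it has never seen.

```python
from collections import defaultdict

def train_bigrams(sequences):
    """Count how often each symbol follows another in the training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts

def fit(counts, prev, cur):
    """Estimate P(cur | prev); returns 0.0 if prev was never seen in training."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return followers.get(cur, 0) / total if total else 0.0

# Hypothetical training data: each inner list is one symbol sequence.
TRAINING = [
    ["k", "ae", "t"],
    ["k", "ae", "b"],
    ["k", "ae", "t"],
    ["b", "ae", "t"],
]

model = train_bigrams(TRAINING)
likely = fit(model, "ae", "t")    # "t" often follows "ae" in the toy data
unlikely = fit(model, "ae", "b")  # "b" follows "ae" less often
```

The point is only that `fit` scores a symbol against the one seen before it, rather than generating the next symbol at random.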
And I intentionally used a phonetic hash I threw together in the key lookup. The script produced some cool output, but didn't quite do what I wanted.
Then I learned about Soundex. And then, even better, Metaphone. Better still, Double Metaphone. DM's benefit is that it returns multiple keys for a processed symbol, under the assumption that the symbol might be pronounced multiple ways. It was *almost* what I wanted, except it was still more or less limited to mostly-English words. I'd like to work with IPA, but whenever I've asked about a library that attempts to take text and convert it to IPA symbols, I'm reminded that different dialects will say the same words different ways (engaging the vocal cords or not, for example), and the same word may have a different meaning depending on how it's pronounced, which is also related to its context. A first-order Markov model is likely to grant some self-correcting accuracy. A second-order or third-order model should do a decent job, but they'd represent *huge* data sets. (When I was working with a 1st-order model, and considering moving to 2nd-order, I almost convinced myself to buy an SSD to dedicate to InnoDB.)
It seems obvious to me that you should be able to apply Metaphone's approach (a returned key for each possibility), and then use a Markov model to refine which key has the most likely meaning in context. (Feeding it a language's dictionary with word/part-of-speech/IPA tuples would be most excellent.)
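A rough sketch of that refinement step, in Python: each token comes with a list of candidate keys, and a first-order model picks the most probable sequence (a small Viterbi-style search). The keys below are made-up placeholders, not actual Double Metaphone or IPA output, and `best_keys` is a hypothetical helper name. It assumes every token has at least one candidate.

```python
from collections import defaultdict

def train(sequences):
    """Count symbol-to-symbol transitions in the training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts

def transition(counts, prev, cur, floor=1e-6):
    """P(cur | prev), with a small floor so unseen transitions aren't impossible."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    prob = followers.get(cur, 0) / total if total else 0.0
    return max(prob, floor)

def best_keys(counts, candidates):
    """Pick one key per token so the whole sequence is most probable.

    candidates is a list with one non-empty list of possible keys per token.
    """
    # best maps each possible last key to (score, path ending in that key).
    best = {key: (1.0, [key]) for key in candidates[0]}
    for options in candidates[1:]:
        step = {}
        for key in options:
            score, path = max(
                ((s * transition(counts, prev, key), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[key] = (score, path + [key])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical training sequences of pronunciation keys.
model = train([
    ["AY", "WIL", "RIYD"],
    ["AY", "HAEV", "REHD"],
    ["YUW", "WIL", "RIYD"],
])

# "read" is ambiguous on its own; the preceding key settles it.
past = best_keys(model, [["AY"], ["HAEV"], ["RIYD", "REHD"]])
future = best_keys(model, [["AY"], ["WIL"], ["RIYD", "REHD"]])
```

A second-order model would key the transition table on pairs of previous keys instead of one, which is where the data set size starts to balloon.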
As for speech recognition, aren't there any libraries or code bases out there that convert sound to IPA? It seems the most obvious solution. Heck, you could probably get away with some on-body sensors for more accurate detection of particular IPA symbols.
Incidentally, if you want the data and code I was playing around with, I put it here. Read the thirty or so lines of disclaiming comments before you complain about it being a 65MB Perl script. (I didn't want to bother packaging multiple files, among other concerns.) LZMA compressed, so install the lzma package or grab 7zip, depending on your OS. Compressed, it's 6.4MB.
The biggest problem with writing a pure-JavaScript spellchecking tool would be providing the dictionary for it to work with, and pulling in that dictionary would add significantly to the page load time.
Of course, there are uses of JavaScript that don't require the code and data to be pulled from a network connection--in those cases, it should be fairly easy to hack together one of your own by implementing a phonetic algorithm and/or using an edit distance calculation to identify plausible alternatives.
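As a minimal sketch of the edit distance half of that idea (in Python for brevity; it ports to JavaScript more or less line for line): compute the Levenshtein distance from the misspelled word to each dictionary entry and keep the close ones. The word list, the `suggest` name and the distance cutoff are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if they match
            ))
        prev = cur
    return prev[-1]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in dictionary)
    return [entry for distance, entry in scored if distance <= max_distance]

# Hypothetical tiny dictionary; a real one is the expensive part.
WORDS = ["receive", "recipe", "relieve", "banana"]
suggestions = suggest("recieve", WORDS)
```

Comparing against every entry is fine for a small word list; for a full dictionary you'd want to narrow the candidates first, which is where a phonetic key comes in handy.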
- Having to walk to places I would normally drive to
If you're the curious type, that can actually be a surprising amount of fun; when you're not rolling past at 35+mph, there's a lot of detail about your neighborhood that's harder to miss.
You're asking people to take a lifelong tax in order to start a business which may or may not last longer than a quarter, and that tax is cumulative with each business they attempt. Consider that the length of one's life is unknown (if it were known, insurance companies would be all over the guy handing out the "knowing"), and that seems like an awful lot to ask. Even a 30-year mortgage has an end date. Your suggested solution would be murderous on serial entrepreneurs.
Consider further that the email accounts and business records could live past the lifetime of the proprietor himself. That still leaves his former clients open to the same kind of fraud, particularly if the person/system rubber-stamping the payments isn't aware that the business that sent them the invoice no longer exists.
Also, if Bourdain's post wasn't directed at the OP, it should have started off with, "I don't know how to help you, but perhaps this should be a warning to..."
As it was, it read in a rather condescending tone.
Your use of the word "nominal" reminds me of this. The word "nominal" always left a foul taste in my mouth; it's like asking someone to give "only" some recurring amount. Aggregate that over a half-dozen someones, and that recurring amount stacks up.
Let's say that the OP tries his hand at a few dozen businesses during his life. For every one of those domains, he's stuck with another recurring fee to manage. Even if the individual fee is low, it adds up.
Actually, kinda reminds me of the crap I cleaned off my hard drive this afternoon; tiny files can still fill up a drive, if you have enough of them.
If it simply allows them to pack more pixels onto a sensor without being able to collect accurate color data with fewer photons, then quantum film is absolutely worthless.
Not true. Existing digital cameras have noise, particularly at the higher ISOs. The more readings you take from a "pixel" in the frame, the more you can negate this noise by averaging it out. One way to increase the number of samples is to stack several readings--increasing your ISO level, more or less.
Another way to increase the number of samples is to scale your resulting pixel array down, so that a pixel and its immediate neighbors get averaged into the same pixel, drowning out more of the noise. So if you can increase sensor pixel density without losing per-pixel quality compared to other technologies, then you can take those additional pixels, blend them, and come out with a better-quality apparent pixel.
So you're really asking for distinct improvements on two fronts, when one can be traded for the other.
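A quick simulation shows the averaging effect, under the simplifying assumption that every reading carries independent Gaussian noise (real sensor noise is messier than that). The signal level, noise level and the choice of blending four pixels into one are arbitrary numbers picked for illustration.

```python
import random
import statistics

def averaged_reading(true_value, noise_sigma, samples, rng):
    """Average several noisy readings of the same underlying value."""
    readings = [rng.gauss(true_value, noise_sigma) for _ in range(samples)]
    return statistics.fmean(readings)

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_VALUE = 100.0
NOISE_SIGMA = 10.0
TRIALS = 2000

# One reading per output pixel versus four neighbors blended into one.
single = [averaged_reading(TRUE_VALUE, NOISE_SIGMA, 1, rng) for _ in range(TRIALS)]
binned = [averaged_reading(TRUE_VALUE, NOISE_SIGMA, 4, rng) for _ in range(TRIALS)]

single_noise = statistics.stdev(single)
binned_noise = statistics.stdev(binned)
ratio = binned_noise / single_noise
```

With independent noise, averaging N readings cuts the noise by roughly the square root of N, so blending four pixels should leave about half the noise of a single one.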
So now the average Slashdotter might know what the "bloat" you're talking about is for: Making it so that the programmer doesn't require as much in-depth knowledge of his platform, or have to do as much of the mundane work himself.
(At least, the average article-reading Slashdotter. Needed to clarify that before someone else points out the joke...)
It's paranoia and naiveté like yours that led me to stop hanging around here so much.
Paranoia in that everything company X does is evil or has an inappropriate or immoral ulterior motive. Naiveté in that you don't stop to recognize that not all of the developers who work for an institution are going to output code of the caliber of its most senior, experienced and/or knowledgeable developers, nor can code review and automated tests catch all of the problems and gotchas known to computer science, academia and the body of professional programmers.
So can the "the devil is in the details" crap; you don't know what you're talking about. Building a complex software package that takes into account every possible detail in both process and implementation is impossible in any environment currently available for consumer software and general computing hardware. Just when you think you've got everything covered, a vendor builds a buggy component, security specialists discover a flaw in the way you learned to write your software, nature builds a better idiot, or a piece of a radioactive isotope in a memory module emits a beta particle, just to ruin your day.
The REAL solution to your problem is for everyone to abandon the dumb-as-shite "www" prefix.
Why bother with www.example.com and example.com? Get rid of it. Anyone who still puts "www." on their business cards is a dufus.
REAL solutions to immediate problems don't depend on the rest of the world changing to suit my needs. Also, the fact remains that there are links out there that point to "http://www.rosettacode.org/w/index.php?something_or_other". Not all of those links will (or can) change, and I would be an absolute fool to knowingly break them if I want people to visit RCo via referral traffic.
...and then sued if anything happened because of it.
Even that's a tricky path to cleanly draw. How can you know that that USB keyfob didn't have something on it that exploited a flaw in the FAT filesystem driver and left a clock-triggered piece of malware? Safest bet for a known incident is to wipe and reinstall. There are ways of doing such things automatically. :)
I think you're looking for the Slashdot Personals.
Apple wants Flash - and any other platform which can be used to create something resembling an application - to go away because those platforms allow others to target their precious without paying the ferryman.
So Apple is a Gollum/Hades geekship construct, their realm the world of the iPad/iPhone, and The One True Ring is on the other side of the river Styx?
Thanks! That makes it all so much clearer!
I have an irrational need for an accurate mathematical calculation model. What, precisely, can you do for me?
Sorry for the offense. I already mentioned that I misinterpreted your original comment. What I probably didn't mention was that the original post was in the middle of a 14-hour work day, and the one you just replied to came after six hours of sleep.
Weird things is what I do, though in this case, somehow, I thought you were talking about an English spellchecker, not an intuitive parser.
Embed something like this in your code editor, and have the editor display the lint output after N seconds of inactivity.
Hey, I don't like it any better than you do. Take it up with Taco.
However, I think we've drifted offtopic...
See a moderation called "Offtopic"
Only us loners enjoy making groaners. ;)
(most notably Josh Bloch, who really had more to do with the Java APIs in their current form than any one person)
You mean Josh Bloch is to Java APIs what Alan Smithee is to films?
(I'm sorry; I have nothing against Java, but your sentence was just too funny to pass up.)
No, no, no!
30 GOTO 10 ' FOR GREAT RANDOMNESS!!!1!!
(and apparently Slashdot thinks that pre-mixed-case architecture code is like yelling. i'm inclined to agree, but i've added this bit of completely-lower-case code to the end to be a little less lame.)
(yes, that's right. in order to be less lame, i need to avoid using caps where occasionally appropriate. whatsnxt? avd xcssv vwls?)
A quick guess? Identifying unique sites by domain name, rather than by IP address, and either the bot or server not respecting HTTP 301 redirects.
With Rosetta Code, I once had www.rosettacode.org serving up the same content as rosettacode.org. My server got pounded by two bots from Yahoo. I could set Crawl-Delay, but it was only partially effective: one bot had been assigned to www.rosettacode.org and another to rosettacode.org, and they were each keeping track of their request delay independently. I've since corrected things such that www.rosettacode.org returns an HTTP 301 redirect to rosettacode.org, and was eventually able to remove the Crawl-Delay entirely.
I've since worked towards only serving up content for any particular part of the site on a single domain name, and have subdomains such as "wiki.rosettacode.org" redirect to "rosettacode.org/wiki", and "blog.rosettacode.org" to "rosettacode.org/blog". Works rather nicely, though it does leave me a bit more open to cookie theft attacks.
YMMV; As I said, that was a quick guess.
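For illustration, the redirect mapping described above can be sketched in Python. This is not the actual server configuration (that would normally live in the web server's own config), just the logic: given a requested URL, work out where a 301 should point, if anywhere. `redirect_target` is a hypothetical helper, the subdomain table covers only the two examples mentioned, and ports are ignored.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "rosettacode.org"
# Subdomains that fold into a path on the canonical host.
SUBDOMAIN_PATHS = {"wiki": "/wiki", "blog": "/blog"}

def redirect_target(url):
    """Return the URL a 301 should point to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    suffix = "." + CANONICAL_HOST
    if host == CANONICAL_HOST or not host.endswith(suffix):
        return None  # already canonical, or not one of our hosts
    label = host[: -len(suffix)]
    if label == "www":
        prefix = ""  # same path, just drop the www
    elif label in SUBDOMAIN_PATHS:
        prefix = SUBDOMAIN_PATHS[label]
    else:
        return None  # unknown subdomain: leave it alone
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, prefix + parts.path, parts.query, parts.fragment)
    )
```

Because every non-canonical host answers with a redirect instead of content, a crawler that tracks sites by domain name ends up with one site to rate-limit rather than two.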
Bit me on Ubuntu 8.04. Which is still the most recent LTS release, and readily available at VPS providers like Slicehost and Linode.