mdmkolbe · Slashdot Mirror

Re:Where's the OCR? on Malicious QR Code Use On the Rise · 2011-12-30 08:15 · Score: 2

Yes! Please! So many QR codes are in-place-of rather than in-addition-to a human-readable URL. If I don't have my phone with me or don't want to bother digging it out of my pocket (or don't even have a QR-enabled phone), then the QR code is just obfuscation.

Smart people will always include a human-readable URL next to the QR code, but given that most QR designers evidently aren't smart enough for that, I'll settle for a human-readable QR.

Re:If there are two of them in the room, on Picture Blocking Beer Cooler Keeps Your Face Out of Embarrassing Photos · 2011-12-14 14:43 · Score: 1

I think you've hit upon how to defeat it. Time to start getting suspicious of anyone carrying two cameras.

Singularity or L4 as a browser? on Firefox Too Big To Link On 32-bit Windows · 2011-12-14 07:02 · Score: 1

I wonder if you could borrow ideas from operating systems like micro-kernels (e.g. L4) or software isolated processes (e.g. Singularity)? It's not a project that I have the time or experience for, but I'd be interested to see the results if someone else tried.

Re:Dec 2010 Slashdot Comments on IBM Makes First Racetrack Memory Chip · 2011-12-05 09:49 · Score: 1

Yeah, but then the second page says "the IBM work doesn't yet demonstrate all of the key components that make racetrack memory desirable", so I guess we technically have a flying car, but not the kind that people want.

Re:Something To Think About on Google Researchers Propose Plan To Fix CA System · 2011-11-30 04:42 · Score: 1

How does DNSSEC interact with the recent SOPA/PIPA/ICE-takedowns? Some have suggested alternatives to DNS to avoid government interference. Is DNSSEC subject to the same sorts of interference?

(It strikes me as comical that the pro-DNSSEC and the anti-SOPA/PIPA/ICE crowds might be working at cross purposes.)

Re:I'm confused on US Gov't Seizes 130+ More Domains In Crackdown · 2011-11-26 15:02 · Score: 1

When you post bail, you go free until they prove you guilty. When you object to a seizure, the domain stays seized until you prove your innocence.

Re:I'm confused on US Gov't Seizes 130+ More Domains In Crackdown · 2011-11-26 07:14 · Score: 1

With an arrest you can post bail. The point isn't to punish you until a court clears you, but to ensure that you don't run away (or go on a killing spree) while the trial is ongoing. This isn't anything like an arrest.

If it makes you feel better, according to DoJ/ICE, they have a seizure warrant issued by a federal judge. However, this still reeks of bad things to come (e.g., DoJ/ICE calling it "theft" doesn't inspire me with hope). I may not defend counterfeiting, but I will defend our Internets.

On the plus side, we've needed a DNS2 for a while and these actions might be enough to get it started.

Re:What's the difference? on Penguin Yanking Kindle Books From Libraries · 2011-11-22 07:43 · Score: 1

it's fairly obvious the GP was talking about tangible goods.

"Intellectual property" isn't a tangible good. The GGP talks about "intellectual property" being like any other property.

Re:What's the difference? on Penguin Yanking Kindle Books From Libraries · 2011-11-21 18:03 · Score: 3, Insightful

Things that you can buy/sell but that most people wouldn't call property:

Education
Legal advice
Delivery of a letter or package
Medical treatment
Advertising
A hair cut
Maid services
Someone choosing to settle out of court instead of suing you
A judge's favor (i.e. bribery)
etc.

Re:Is it just me... on Penguin Yanking Kindle Books From Libraries · 2011-11-21 13:53 · Score: 4, Insightful

And I'm sure those /.ers are just as frustrated when you act as if information is a form of property subject to the same rules as physical goods.

Re:It's a well studied problem on Pakistan Bans 1600 Words and Phrases For Texting · 2011-11-21 13:11 · Score: 1

True, but why did you launch directly into automata?

Because I overlooked the fact that the words are delimited by spaces. Let me reanalyze the problem. (Yes, I do this kind of thing for fun.)

Once you split messages into words, all matches are exact matches instead of searching for something in the middle of the string. In that case, as you say, hashes may do the job better since you don't need the extra power that automata give you.

On the other hand, depending on whether you are optimizing for CPU time vs memory accesses, tries might do better than a hash (and have no false positives that require a second level of scanning). For 1600 words, the space to store the trie should be quite small (
With a trie, you can also integrate the process of splitting up the message into the trie matching algorithm by making the 'space' edges point back to the root of the trie. If you do that, you end up with a deterministic automaton (which is probably also minimal) that takes less than 13KB to store.
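A minimal sketch of that idea in Python, assuming words are separated by single spaces and that a message is flagged as soon as one whole word matches. The names `build_word_trie`, `contains_banned_word`, and the `DEAD` sink state are made up for illustration; the size of the resulting table is not measured here.

```python
ROOT = 0
DEAD = -1  # hypothetical sink state: the current word can no longer match

def build_word_trie(words):
    """Build a trie as parallel lists: edges[state][char] and accepting[state]."""
    edges = [{}]
    accepting = [False]
    for word in words:
        state = ROOT
        for ch in word:
            if ch not in edges[state]:
                edges.append({})
                accepting.append(False)
                edges[state][ch] = len(edges) - 1
            state = edges[state][ch]
        accepting[state] = True
    return edges, accepting

def contains_banned_word(message, trie):
    """Scan the message once; a space acts as an edge back to the root."""
    edges, accepting = trie
    state = ROOT
    for ch in message:
        if ch == " ":
            # End of a word: report a match if we stopped on an accepting state.
            if state != DEAD and accepting[state]:
                return True
            state = ROOT
        elif state != DEAD:
            state = edges[state].get(ch, DEAD)
    # The last word has no trailing space, so check it separately.
    return state != DEAD and accepting[state]
```

For example, with `trie = build_word_trie(["foo", "bar"])`, the call `contains_banned_word("food bar", trie)` walks off the trie on the "d", waits for the space, restarts at the root, and then matches "bar".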

Costs of hash:
- n linear, cold memory reads (reading message);
- n character compares against a constant (space finding);
- k*n math ops (hash computation);
- ~n/8 (one per word) random, hot memory reads (bloom table lookup).
Cost of trie:
- n linear, cold memory reads (reading message);
- n random, hot memory reads (trie table lookup).

Which of those is better comes down to whether k*n math ops are cheaper or more expensive than n table lookups. On an embedded CPU that answer is going to be different than on a desktop GHz CPU. However, all of that having been said, the Bloom filter has an important advantage: it can be tuned to handle larger sets of words in exchange for a larger false positive rate. (The trie would be less susceptible to DoS, but I don't think that is likely to be a major consideration.)
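For comparison, here is a rough Python sketch of the hash side: a small Bloom filter as the first-level check, with an exact set lookup as the second level that weeds out false positives. The class and function names, the SHA-256 based hashing, and the table size in the usage note are all assumptions chosen for illustration, not tuned values.

```python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: bit_count bits, hash_count hash functions."""

    def __init__(self, bit_count, hash_count):
        self.bit_count = bit_count
        self.hash_count = hash_count
        self.bits = bytearray((bit_count + 7) // 8)

    def _positions(self, word):
        # Derive hash_count bit positions by salting one hash with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bit_count

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

def flag_message(message, bloom, banned_words):
    """Split on spaces, pre-filter with the Bloom table, confirm with the exact set."""
    return [word for word in message.split(" ")
            if bloom.might_contain(word) and word in banned_words]
```

Usage would look like `bloom = BloomFilter(16384, 3)`, then `bloom.add(word)` for each banned word. Shrinking `bit_count` or growing the word list raises the false positive rate, which only costs extra second-level lookups rather than wrong answers.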

Conclusion: It was stupid of me to overlook the space delimiters. Without knowing more about the performance characteristics of the target platform, it is difficult to determine for certain which one will perform better, but I agree that Bloom filters are a good idea and are likely to do a better job than automata.

It's a well studied problem on Pakistan Bans 1600 Words and Phrases For Texting · 2011-11-20 10:14 · Score: 5, Interesting

With a maximum character length of 140 characters, 1600 strings to match, and assuming 8 character long strings, it would take 140*8*1600=1,792,000 character matches per message if you do it naively. That is only a millisecond on modern GHz processors, but when processing large numbers of messages using embedded processors, that is probably a few more cycles than you want to spend on each message. You can do better by using Knuth-Morris-Pratt or Boyer-Moore. Since we can pre-process the strings to be matched, this means it takes only 140*1600*k=224,000*k (for some k determined by the algorithm). This is better, but not by much.
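The arithmetic above can be checked with a few lines of Python. The constants are the ones stated in the comment; the function names and the default k=1 are placeholders, since k depends on the algorithm.

```python
MAX_MESSAGE_LENGTH = 140     # characters per message
BANNED_STRING_COUNT = 1600   # strings to match
ASSUMED_STRING_LENGTH = 8    # assumed length of each string

def naive_comparisons(message_len, pattern_len, pattern_count):
    """Naive bound: every message position against every character of every pattern."""
    return message_len * pattern_len * pattern_count

def linear_scan_comparisons(message_len, pattern_count, k=1):
    """KMP/Boyer-Moore style bound: one linear pass per pattern, times a constant k."""
    return message_len * pattern_count * k

naive_total = naive_comparisons(MAX_MESSAGE_LENGTH, ASSUMED_STRING_LENGTH, BANNED_STRING_COUNT)
linear_total = linear_scan_comparisons(MAX_MESSAGE_LENGTH, BANNED_STRING_COUNT)
print(naive_total, linear_total)
```

Both totals still carry the factor of 1600, which is the point of the next paragraph.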

Notice that the dominant factor is the 1600 strings to be matched. If you really care about performance, then you want to get rid of that factor. The simplest way is to build a finite-state automaton. If it is encoded as an NFA, the performance won't be much better than before, but if you encode it as a DFA, then each message can be processed in only 140 table lookups. The downside of this is the size of the lookup tables. In the worst case, expect them to take terabytes of space depending on the particular 1600 strings being matched.

There are algorithms like Rabin-Karp and Aho-Corasick that might take less space while still taking only ~140 character operations. The practical answer is to try DFA, RK, and AC to see which, if any, don't require too much preprocessing space, and then use one of those. The space requirements will depend on the particular text involved, but there are good odds that the tables for DFA will be small, and even better odds that the tables for RK and AC will be small.
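To make the Aho-Corasick option concrete, here is a compact Python sketch of the textbook construction (trie edges plus failure links). The function names and the dict-based tables are choices made for this illustration; a production version on an embedded target would presumably use flat arrays.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick tables: goto edges, failure links, and outputs per state."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: depth-1 states fail to the root, deeper states
    # inherit from their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_matches(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches
```

Each input character advances the automaton once; the failure-link loop can take extra steps on a mismatch, but the total work stays linear in the message length plus the number of matches, independent of how many patterns were loaded.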

Searching and sorting are two of the most well studied algorithmic problems in computer science. If you ever find yourself wondering how to do them efficiently, there is a good chance that very smart people have already figured out how to do it.

Re:There are two aspect of the problem on Tipping Point For Open Access CS Research? · 2011-10-22 04:40 · Score: 4, Informative

Actually, the ACM recently refused to publish an author's paper because he had posted it on arXiv.

This was a copyright assignment issue, but it directly impacts the strategy you suggest. As an academic myself, I find copyright assignment to be as big an issue as open access. For example, ACM does not allow me to let others use any figures I publish with the ACM. Sorry, Wikipedia, I may have the perfect figure to illustrate one of your articles, but the ACM won't let me give it to you.

I'm not even allowed to use my own figures for my own uses unless I put an ACM copyright notice on every copy of the figure and every slide with such a figure. This is not consistent with academic practice and custom (almost all presentations at ACM conferences violate this rule).

What about IFRAMEs and linked images? on Canadian Supreme Court Rules Linking Is Not Defamation · 2011-10-19 05:03 · Score: 2, Interesting

Technically IFRAME and IMG are just links, but would they qualify as a more affirmative action and thus constitute defamation under this ruling?

Re:What if the defamation is in the link? on Canadian Supreme Court Rules Linking Is Not Defamation · 2011-10-19 04:59 · Score: 1

If we follow the footnote analogy, then this is equivalent to a footnote to a book with a defamatory title. Citing such a book is probably protected, but creating a made-up title for the purpose of defaming someone might not be.

Re:Problems with R7RS on R7RS Scheme Progress Report · 2011-10-06 09:16 · Score: 1

First, thank you for taking the time to address these concerns. If I am critical, it is only because I want to see R7RS succeed and be a good language spec.

1) I'm glad to hear about the use of "textual". Watching from the sidelines, it seems like the process has been prone to reinventing things in ways that are different from R6RS without being better. That makes it hard for implementors to provide compatibility and smooth upgrade paths.

2) I'll grant that Chicken is a major Scheme. Your point about phasing being unneeded in R7RS-small is well taken, provided R7RS-small libraries/modules can be scaled up to include phasing as any implementation with syntax-case will have to do. Still, once you eliminate the phasing issue, I guess I don't understand why R7RS-small modules are any better or easier to use or implement than R6RS libraries sans phasing.

3) I guess I have a different view of the world than you. In my neck of the woods, SRFIs don't get a lot of support and aren't actually de facto standards. Thus, I tend to think of R6RS trumping SRFI-9 rather than the other way around. It is unfortunate that R6RS collided with SRFI-9, but R7RS has the opportunity to reconcile the two, as SRFI-9 is a semantic subset of R6RS records. It would be nice either to use a syntactically restricted subset of R6RS records or, even if they use the same keyword, to make their syntaxes distinct. As it is, I believe there are invocations of define-record-type that are inherently ambiguous as to which syntax is being used.

4a) (parameterize ([param (param)]) body) is not equivalent to body if the converter isn't idempotent. This might not be an issue until you consider someone who wants to save and restore a param value from an outer scope in an inner scope. For example, (let ([old (param)]) (parameterize ([param new]) ... (parameterize ([param old]) ...) ...)). Throw call/cc's in there that jump across the dynamic-wind of the parameterize and implementations based on SRFI-39 will get quite problematic.
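To illustrate the converter issue in 4a without relying on any particular Scheme, here is a small Python model. It is only a stand-in: the `Parameter` class and `parameterize` context manager are hypothetical, and they assume the SRFI-39 style behavior in which the converter is applied to the new value on every rebinding.

```python
from contextlib import contextmanager

class Parameter:
    """Hypothetical Python stand-in for a Scheme parameter object with a converter."""

    def __init__(self, initial, converter=lambda value: value):
        self.converter = converter
        self.value = converter(initial)

    def __call__(self):
        # Calling the parameter returns the current, already converted value.
        return self.value

@contextmanager
def parameterize(param, new_value):
    """Rebind param for the dynamic extent of the block; the converter runs again."""
    saved = param.value
    param.value = param.converter(new_value)
    try:
        yield
    finally:
        param.value = saved

# A deliberately non-idempotent converter: it adds one on every conversion.
depth = Parameter(0, converter=lambda n: n + 1)

outer = depth()
with parameterize(depth, depth()):   # "rebind to the current value"
    inner = depth()                  # the value was converted a second time
restored = depth()
print(outer, inner, restored)
```

In this model, rebinding the parameter to its own current value changes what the body sees, which is the sense in which the wrapped form is not equivalent to the bare body.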

4b) The problem shows up in things like (define-syntax my-include (syntax-rules () [(_ file) (include file)])). On hygienic macro systems, this macro is not equivalent to the "include" macro. The "include" macro imports things into the scope where it is used, but "my-include" imports things into the scope where "my-include" is defined, not where it is used. Sometimes that is what you want, but often it is not. Any macro that uses "include" inside it will have this issue, and this makes it impossible for the user to write more sophisticated variants of "include" (e.g. that do instrumentation). In general, anything that converts from non-syntax to syntax has to deal with this problem. This is most clearly seen with "datum->syntax" where it is solved by passing an extra argument from which the marks are taken. (Also, R7RS isn't clear about whether things are placed into the scope of where "include" is called or where the "include" identifier originates; there is a difference even in syntax-rules.) I can elaborate on this further if it isn't clear what I'm getting at.

5) Given that on a system with Unicode, you almost never want to do a non-normalizing case-insensitive match and that it is hard for a user to efficiently implement their own normalizing case-insensitive match, it seems an odd corner of the rectangle to omit.

6) OK, I'll blame R6RS for that one.

7) But how many of the "big 5(+/-2)" have been positive? Silence from the implementors is a bad thing, because silence is non-participation. R6RS for all its faults at least had strong implementor representation. My understanding is that R7RS is not so strong in this regard, and that worries me. Some features may sound good on paper, but end up being impossible to efficiently and/or correctly implement. Note that I'm drawing a distinction between being possible to implement and being possible to implement well. Each Scheme implementation will have a different de

Problems with R7RS on R7RS Scheme Progress Report · 2011-10-05 04:33 · Score: 1

Pointless changes:

Renamed "textual" ports to "character" ports. Note that they kept "binary" ports which is properly parallel to "textual". The parallel of "character" (noun) is "byte" (noun), not "binary" (adjective).
Library system incompatible with R6RS. R6RS already had a least common denominator library system that all the major Schemes have already implemented. Good luck getting them to implement yet another library system. The summary is misleading when it claims R6RS libraries are complicated. R6RS libraries are about as simple as you can get and still handle the corner cases when macros are involved. In fact, from what I see, the R7RS-Small modules are a super-set of the R6RS features and are more complicated than R6RS libraries.
Record system incompatible with R6RS. R7RS-Small records are a proper subset of R6RS records, but rather than use a compatible syntax, they used a syntax that makes it ambiguous to use the R6RS and R7RS record syntaxes together. They could have either used a subset of the R6RS record syntax or chosen a different keyword or a syntax that distinguishes the two.

Comments · 1,038