There is no benefit to society to engage in costly, counterproductive and futile "wars" against P2P and other useful new technologies in the name of enforcing "intellectual property" laws created in a different era that now benefit only special interests and not the public interest.
I really hope you are right in this. But I fear
you are not. Remember that the world has been
fighting a rather pointless "war" on drugs
for the last 70 or 80 years with no end in
sight. The "war" on
copying might last just as long and claim just
as many innocent lives.
Since many of your "equivalences" are probably a single character, it would probably simplify the regexp if you used character classes with [] whenever possible, and only fall back to grouping and alternation (...|...) when absolutely necessary. This might still be too complex, but it might help...
I'm fairly sure that (a|b|c) will produce
exactly the same DFA as [abc]. In other words,
although the regexp might be a bit smaller, and
perhaps compile marginally quicker, it would
make no difference to the running time.
What I was interested in, however, was
getting around the limit of the GNU regexp
DFA. It seemed to use a 64K buffer, and
64K branch offsets, which means the maximum
size of a regexp you can compile is comparatively
small. (Much larger than anything you might
write by hand, of course, but quite limiting
when you start to write code which writes
regexps). There was no easy way to get
around these limitations, so I ended up using
various hacks to limit the size of the regexp
(splitting into multiple regexps), and also
tried using a flex-generated lexer instead.
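For what it's worth, the "splitting into multiple regexps" hack might look roughly like the sketch below. It is in Python rather than GNU regexp, so the size limit is simulated: max_chars is a made-up knob, and source length is only a crude stand-in for the size of the compiled pattern. The function names are hypothetical, not taken from the original code.

```python
import re


def compile_in_chunks(patterns, max_chars=60000):
    """Group alternatives so each combined pattern source stays within max_chars.

    max_chars is an assumed limit for illustration; Python's re module
    does not have the 64K restriction described above.
    """
    compiled, current, size = [], [], 0
    for pat in patterns:
        piece = "(?:" + pat + ")"
        # Start a new chunk if adding this alternative would exceed the limit.
        if current and size + len(piece) + 1 > max_chars:
            compiled.append(re.compile("|".join(current)))
            current, size = [], 0
        current.append(piece)
        size += len(piece) + 1  # +1 for the "|" separator
    if current:
        compiled.append(re.compile("|".join(current)))
    return compiled


def matches_any(text, compiled):
    """True if any of the chunked regexps matches somewhere in text."""
    return any(p.search(text) for p in compiled)
```

The price is that every message gets scanned once per chunk instead of once overall, which is presumably why a single flex-generated lexer was worth trying too.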
To be honest, it disturbed me a bit too. But then a
lot of the stuff I did at that company disturbed
me, which was why I eventually did the right thing
and left.
1. How do you know they were typing "sh;t" or "sh1t"? Were you monitoring them? Do you still monitor them?
I'm not quite sure what you mean by the first part
of this question. I'll send you the code if you
ask me. Yes, we did monitor and record everything.
Yes, I imagine they still do this. In the UK
internet chatrooms are associated (in the public's eye) with the
evils of predatory paedophiles. Mainly by the
right wing tabloid press who sell a lot of
newspapers this way. So monitored chatrooms
are a big selling point (or would be if the
company in question actually had any clue -
I doubt they could sell cheap water to a
thirsty man).
2. Was blocking certain words the only option that was discussed?
Well, the other option (which I favoured) was
not blocking anything at all. As you can see
I didn't get my way:-)
3. Do you personally feel that children should be free to talk about anything they like with their peers (or did you only do it for the cash)?
For what it matters, I personally think the
whole episode was a bit silly. I was proud
that I managed to make some software which
was clean and effective, but not particularly
proud about how it was being used for this
sort of silly censorship.
1) Combinations like fuch, fvck, focker, schit, schidt, "suck my dictionary"
no, yes, no, yes (I think), no, no.
2) Non-swearing such as cockpit, cocktail -- if you use a dictionary of acceptable words to sidestep the filter, would it still filter out non-English words such as soshite (a Japanese word)?
IIRC cock wasn't on the list of swearwords.
Swearwords in the middle of words not filtered
(see previous posting).
3) What about words that can either be used as swears or as non-swears? Dick is a man's name, and there's nothing offensive about cocking a gun or using ``ass'' to describe a donkey.
These were all not swearwords, so no issue.
The real issue is v1@gra spam, don't forget!
I think filtering out children's swearwords
in a chatroom is just silly. But I was getting
paid to do it, so...
But will it filter the town name Scunthorpe as being offensive? AOL had this problem where people living in Scunthorpe suddenly found they could no longer use their town name.
It handles this case correctly. There is actually
some extra code I added to handle cases like this
(specifically the word "scrape").
Basically the regexp is modified so it only
matches at either the beginning or the end
of a word, using word boundary matching. Not
completely ideal, but good enough.
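A rough Python sketch of that word-boundary trick, using the "scrape" case. This is not the original code: the function name is made up and "crap" simply stands in for whatever was on the real list.

```python
import re


def anchor_to_word_edges(core):
    """Match core only where it starts a word or ends a word."""
    return re.compile(r"\b(?:%s)|(?:%s)\b" % (core, core), re.IGNORECASE)


pattern = anchor_to_word_edges("crap")

# "crap" sits in the middle of "scrape", so neither branch matches.
middle = pattern.search("please scrape the page")

# It starts "crappy" and ends "scrap", so both of these are caught.
start = pattern.search("crappy")
end = pattern.search("scrap")
```

The match on "scrap" is exactly the "not completely ideal" part. Note also that \b only treats letters, digits and underscore as word characters, so this sketch says nothing about how the boundary check interacts with punctuation-based spellings.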
At my last job I wrote a chat server which was used by school age children.
One of the requirements (coming from "concerned
parents", of course) was to filter out swearing
in the chat rooms. So if someone typed in, say,
"you're a shit", what would actually appear for
everyone else would be "you're a $!%^" or something similar.
Eventually, of course, we got into an arms race with
the kids, who would write "sh1t", "s.h.i.t", "sh*t" and so
on.
However, I came up with a program which generated a regexp which matched pretty much all the variations, and - to date - none of the kids have
worked out a way around it.
This is how it worked.
(Actually, I can
send anyone the original regexp generator
code if they're interested - just mail me).
The basic concept was to use a table of "equivalences", for, eg. "a" => [ "@", "4",
"A",....], "f" => [ "ph",.... ]
For each swear word we generate a regexp
with (r1|r2|r3|...) for each letter in the bad
word, where r1, r2, r3,... are the list of
equivalences for that letter.
That produces a list of swear-word-matching
regexps which we then combined into a super
mega regexp which would match any of the 50 or
so banned words.
One interesting thing is that you can end up
with a regexp which is too big for GNU regexp
to handle... But there are ways to get round
that and you can code it up as a flex parser
too which doesn't have any limits as far as I
can tell.
The actual code is slightly more complex and
does a few more things than above (eg. it
works for "s.h.1.t" too, or even
"s---h--1----------t"). And it has a concept
of "obliterator characters", so "sh*t" can be
banned also.
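To make the above concrete, here is a minimal sketch of such a generator in Python. It is not the original code: the equivalence table, the separator set, the choice of obliterator characters and all the function names are assumptions for illustration only.

```python
import re

# Hypothetical equivalence table; the real one was much longer.
EQUIVALENCES = {
    "a": ["a", "@", "4"],
    "i": ["i", "1", "!"],
    "s": ["s", "$", "5"],
    "f": ["f", "ph"],
}

# Assumed filler allowed between letters, as in "s.h.1.t" or "s---h--1---t".
SEPARATORS = r"[.\-_]*"

# Assumed "obliterator characters" that may stand in for any letter.
OBLITERATORS = ["*", "#"]


def letter_pattern(ch):
    """Alternation (r1|r2|r3|...) of everything that counts as this letter."""
    options = EQUIVALENCES.get(ch, [ch]) + OBLITERATORS
    return "(?:" + "|".join(re.escape(o) for o in options) + ")"


def word_pattern(word):
    """One regexp per banned word: letter patterns joined by optional filler."""
    return SEPARATORS.join(letter_pattern(ch) for ch in word)


def build_filter(words):
    """Combine the per-word regexps into one big alternation."""
    combined = "|".join("(?:" + word_pattern(w) + ")" for w in words)
    return re.compile(combined, re.IGNORECASE)


def censor(text, pattern, mask="$!%^"):
    """Replace every match with the mask string."""
    return pattern.sub(mask, text)


FILTER = build_filter(["shit"])
```

With this, censor("you're a sh1t", FILTER) gives "you're a $!%^", and the same goes for "s.h.i.t" and "sh*t". Because an obliterator is accepted in every position, a run like "****" is masked as well, which may or may not be what you want.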
While I agree with your main point, this is
wrong:
And the terms of the GPL are all that prevents Microsoft from swiping the Linux source and creating an "MS Linux" loaded with trade-secret/closed-source "enhancements" (e.g. support for the full Windows API).
Microsoft could easily create an enhanced MS Linux
if they wanted to. All they would need to do
would be to write a Win32 library, similar to
wine / winelib.
I can't see them doing that unless things get
very desperate however...
The perception in the UK seems to be a bit different, which isn't surprising considering that we lose out by not being able to play R1 DVDs.
The speciality video store up the road from me (in the UK - region 2) rents out region 1 DVDs, and includes instructions on how to mod popular DVD players to play them.
So next time someone gets murdered, perhaps we should all just throw our hands up and say "it's too difficult - someone provide the proof please!"
This is what the police are for. They go out, find the evidence, present it to court, and the murderer gets convicted.
It's not so hard to find these spammers - they leave a trail of abuse a mile wide. Just go to spamhaus.org to download your initial list of suspects.
Put them in prison. They're abusing the resources of the world just as much as someone who goes and trashes your local park or sets fire to an old barn - except on a much larger scale with huge social and economic consequences.
Well yeah, Mars might be a nice base of operations for future space operations but we're talking about now. Until we have built a base of operations on Mars or the Moon then transiting equipment to LEO or a Lagrangian point prior to transiting it elsewhere would be more efficient than simply trying to blast it to the final destination directly.
Actually not true. It is comparatively easy to get to Mars, and once you're there you have a huge amount of raw materials you can dig out of the ground, and fuel and oxygen you can extract from the air. If you go to the Lagrange point, you have to haul EVERYTHING you need there. That's pointless and stupid.
It is absurd to sit on Mars when we could have perfectly fine space stations. Why be confined by Mars gravity? It is a must to build space stations first, then moon bases, and once a moon base proves self-sufficient, then Mars bases.
Because there's STUFF on Mars, for building things. Raw material in the ground you can dig out. Fuel and oxygen you can extract from the air. There's NOTHING at the Lagrange point now.
Also, Mars's gravity isn't nearly as strong as gravity on earth, and so it's not confining.
Thankfully someone else took the time to actually
look at the patents, and, yes, they all cover the
"VFAT" format for munging long filenames into the
existing FAT standard.
No real news here for camera manufacturers - they
only use short 8.3 names.
Nor, probably, for the rest of us. All the patents
were granted after April 1995, which means it's
highly likely that there's prior art, and the
technique is pretty obvious anyway.
I could be 100% wrong but I think the sticking point of the "High profile cases" that the /. post mentions is that people called and verified that it was indeed the owner of the phone who had answered it at a time when a crime occurred. But I dunno that for sure.
Actually no. In the case I heard about (a notorious case involving a black school child murdered in South London in an apparently racially motivated killing) the phones were not called. I understand that mobile phone positioning information is now used fairly routinely in police cases.
The entire point of a "trade secret" is that it is secret. Trade secrets do not enjoy copyright or patent protection: both of those require that you disclose that which you are attempting to protect.
Erm well no. Unless you mean that Microsoft is about to disclose the source code to Windows.
Rich.
Mining the moon would require placing the equivalent of heavy "earth" moving equipment on the surface. Doing that is expensive. So is getting the results back off the surface. He3 is only in the first few feet of moon surface because it comes from the sun. Go to the source.
A better design would be a sol-centric orbital platform, say in Mercury's L-5 point, collecting solar wind via magnetic trap (the "ram-scoop" idea) and using an on board mass spectrometer to separate the components, which are then bottled for use, storage or shipping. In that orbit, there'd be sufficient solar power to run all that.
Cool ... Solar Scoops, just like in Elite!
Rich.
Anyway, as reported by the BBC, American scientist Don Mitchell found the original Soviet Venera probe data from the surface of Venus and he applied modern image processing techniques to it to produce some stunning new pictures.
He also has a really fantastic site about the Soviet Venera probes.
Rich.
However, I was doing my job and getting paid for it ...
If it helps to make a small dent in the quantity of v1@gra spam, then so much the better though.
Rich.
(That's a famous trademark in the UK, though :-)
It does work on things like fu(k though.
Rich.
If anyone's interested I can send the code.
Rich.
Actually not true. It is comparatively easy to get to Mars, and once you're there you have a huge amount of raw materials you can dig out of the ground, and fuel and oxygen you can extract from the air. If you go to the Lagrange point, you have to haul EVERYTHING you need there. That's pointless and stupid.
Grab a copy of Zubrin's The Case for Mars.
Rich.
Because there's STUFF on Mars, for building things. Raw material in the ground you can dig out. Fuel and oxygen you can extract from the air. There's NOTHING at the Lagrange point now.
Also, Mars's gravity isn't nearly as strong as gravity on earth, and so it's not confining.
Good reading
Rich.
http://www.vcnet.com/bms/departments/innovation.html
Rich.
The son of a dead author who wrote a book some 60 years ago is blocking a film which likely as not won't contain a single line of dialog from the original, but contains some "similar" characterisation.
This is fair and right how exactly?
Rich.
For reference here's how I fixed this, on my Debian machine: I edited /etc/X11/Xwrapper.config and removed completely the line which sets nice_value.
If you don't want to restart your X server to make the change have effect, then you can instead do:
renice 0 PID
where PID is the process ID of the X server.
Rich.
s/now/not/
Rich.