An Algorithm To Stop Joke Plagiarists
The basic algorithm is very similar to the random-sample-voting algorithm that I've advocated as a way to stop vote manipulation on Digg, how to handle abuse reports in a scalable way on Twitter and on Facebook, and how to identify the best ideas submitted to the White House's "We The People" petition site. The algorithm can be used to rate the best jokes (at least according to the average rating of users, not according to some Platonic ideal), while still flagging plagiarized jokes and preventing anyone from building up a following by using them.
Under the algorithm, suppose a subset of users -- let's say, 1 million -- signs up to receive tweets in the general humor category. When a would-be amateur comedian comes up with a funny tweet, then in addition to tweeting it to their followers (if they have any), they can submit it to the humor category generally. The joke is first pushed to the feeds of, say, 1,000 randomly selected users, who have the option of rating it (independently of each other, without seeing the opinions of other raters). Once the joke has acquired enough ratings to constitute a statistically significant sample -- so that the average rating really does reflect the community's "opinion" of the joke -- then the joke gets released into the general pool of jokes available to all 1 million users subscribed to the "humor" category. Those users can decide what threshold of quality they want to set for the jokes that show up in their feed -- for example, if you only want to see jokes that got an average rating of 9 out of 10 or higher, you might only see 50 a day, but if you can lower your standards down to an 8, you might see 100 or 200. And if a user really likes a particular joke that they see in their "threshold feed," they can browse the other jokes in that author's Twitter feed and decide whether to follow them.
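As a rough illustration of the flow just described, here is a minimal sketch in Python. The function names, the minimum of 30 ratings, the 0.5-point margin, and the normal-approximation test for "statistically significant" are all my own assumptions for the example; the proposal itself does not fix any of them.

```python
import math
import random
import statistics


def pick_raters(subscribers, sample_size, rng):
    """Choose the initial 'focus group' uniformly at random, without replacement."""
    return rng.sample(subscribers, min(sample_size, len(subscribers)))


def rating_summary(ratings, z=1.96):
    """Return (mean, margin), where margin is an approximate 95% half-width."""
    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mean, math.inf
    margin = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, margin


def ready_for_release(ratings, min_ratings=30, max_margin=0.5):
    """Assumed release rule: enough ratings, and a tight enough average."""
    if len(ratings) < min_ratings:
        return False
    _, margin = rating_summary(ratings)
    return margin <= max_margin


def feed_for(user_threshold, released_jokes):
    """released_jokes is a list of (text, average_rating) pairs."""
    return [text for text, avg in released_jokes if avg >= user_threshold]
```

A subscriber who sets a threshold of 9 would then only receive jokes whose average rating is 9 or higher, while one who sets it to 8 would receive a superset of those.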
So if your joke sucks, it will only end up wasting the time of about 1,000 people, but if it gets a high rating, it will be available in the feeds of up to 1 million people. Thus from the user's point of view, only about 0.1% of the jokes that they see in their feed are sucky jokes that were pushed to them as part of an initial "focus group" to measure their quality; the other 99.9% is made up of jokes that met whatever threshold they set for the average rating.
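The arithmetic behind the 1,000-out-of-1-million figure can be sketched as follows. The daily submission count used in the usage note is a made-up number for illustration, not something the proposal specifies.

```python
def sample_fraction(sample_size, pool_size):
    """Share of subscribers who are shown any one unrated joke."""
    return sample_size / pool_size


def expected_unrated_per_user(daily_submissions, sample_size, pool_size):
    """Expected number of unrated jokes pushed to one subscriber per day."""
    return daily_submissions * sample_size / pool_size
```

With a sample of 1,000 out of 1 million subscribers, `sample_fraction` gives 0.001, or 0.1%. If we assume a hypothetical 50,000 submissions a day, each subscriber would expect to be asked to rate about 50 of them, so the per-user review load depends on submission volume as well as on the sample size.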
As I've stressed in the case of other applications of the random-sample-voting algorithm, this system is scalable, because the number of available reviewers grows as the community grows. It's also non-gameable -- because the raters are randomly selected, even if you create a large number of zombie accounts to try and upvote your own joke, the zombies won't constitute a significant portion of the raters, if the raters are selected from the entire pool of 1 million users.
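To make the non-gameability argument concrete, here is a small sketch that estimates how many zombie accounts land in a random sample. The figure of 10,000 zombies in the usage note is an assumption for illustration, and the simulation simply treats the lowest-numbered user IDs as the zombies.

```python
import random


def expected_zombies_in_sample(pool_size, zombie_count, sample_size):
    """Expected number of zombie accounts among randomly chosen raters."""
    return sample_size * zombie_count / pool_size


def simulate_zombie_share(pool_size, zombie_count, sample_size, rng):
    """Draw one random sample and return the fraction of raters that are zombies.

    User ids 0 .. zombie_count - 1 stand in for the zombie accounts.
    """
    sample = rng.sample(range(pool_size), sample_size)
    zombies = sum(1 for uid in sample if uid < zombie_count)
    return zombies / sample_size
```

Someone who manages to register 10,000 zombie accounts in a pool of 1 million would expect only about 10 of them in a 1,000-person sample, which is too few to move the average rating much.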
Still, even under this system, it would be possible to take a highly rated joke and re-word it slightly (to fool any text filters looking for blatant copy-and-paste jobs), and pass it off as your own, hoping that your re-worded version will also get pushed out to a wide audience and net you some extra followers. To prevent this, you can implement a "duplicate" flagging feature that also relies on the random-sample-voting system:
- If a user recognizes a joke as a re-worded version of someone else's tweet, they can flag it as a "duplicate", with a link to the earlier tweet that they think is similar. (Flagging it as intentional "plagiarism" would be a bit harsh, since it's quite common for multiple comedians to come up with the same joke.)
- The flagged joke, along with a copy of the earlier joke, would once again be sent out to a random sample of subscribers to the humor category, who are then asked to vote on whether the two jokes are substantially similar.
- If a statistically significant majority of those users vote that the two jokes are essentially duplicates, then the second tweet gets displayed with a flag icon (shorthand for "our users have identified this as a duplicate of an earlier joke") with a link back to the other tweet that was identified as an earlier version of essentially the same joke.
- If a majority votes that the two jokes are not similar, then nothing happens. Optionally, if an overwhelming majority of the users vote that the two jokes are not at all similar, then some kind of reputation point penalty could be applied to the user who flagged the second joke as a "duplicate". This discourages people from frivolously duplicate-flagging a joke.
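The duplicate-flag steps above could be sketched roughly like this. The z-test against a 50/50 split, the 1.96 cutoff, the 90% bar for an "overwhelming" majority, and the returned action names are all assumptions I am making for the example.

```python
import math


def majority_is_significant(yes, no, z_crit=1.96):
    """Normal-approximation test that the 'yes' share exceeds one half."""
    n = yes + no
    if n == 0:
        return False
    # (yes/n - 0.5) / sqrt(0.25/n) simplifies to (yes - no) / sqrt(n)
    z = (yes - no) / math.sqrt(n)
    return z >= z_crit


def resolve_duplicate_flag(dup_votes, not_dup_votes, overwhelming=0.9):
    """Decide what happens to a flagged joke once the sample has voted."""
    n = dup_votes + not_dup_votes
    if majority_is_significant(dup_votes, not_dup_votes):
        return "mark_duplicate"
    if (
        n > 0
        and not_dup_votes / n >= overwhelming
        and majority_is_significant(not_dup_votes, dup_votes)
    ):
        return "penalize_flagger"
    return "no_action"
```

Under these assumed thresholds, a 700-to-300 vote marks the joke as a duplicate, a 520-to-480 vote is too close to call and changes nothing, and a 30-to-970 vote costs the flagger some reputation.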
This does have the unfortunate result that if you unintentionally write a joke that duplicates someone else's, it will still end up with the "duplicate" flag after users recognize the similarity to the earlier version. This is, however, something that I don't think any algorithm can solve, because it's impossible to detect the difference between someone copying another person's joke and independently coming up with it on their own. A comedian whose joke ends up being labeled with the "duplicate flag", just because someone else came up with the same gag first, could leave the joke in their feed, but they might consider the duplicate flag to be a mild embarrassment.
On the other hand, if you're just a full-time plagiarist like the Fat Jew, and virtually all of your jokes end up being flagged as clones of other people's work, then your entire feed will be littered with "duplicate" flags that mark you as a hack. Depending on whether Twitter's terms of service prohibit serial plagiarism, your account could even get suspended.
Meanwhile, anybody could still set themselves up as a curator who re-tweets other people's jokes with the original attribution intact. Many users would find that they wouldn't need curators at all, when they can just subscribe to all jokes that get an average rating of, say, 8.5 or higher, but if your humor happens to align very closely with the kind of jokes picked out by a particular curator, you could subscribe to get jokes re-tweeted directly from them. And since the original attribution would be intact, any time you saw a joke that you really liked, you could subscribe to updates directly from that author. Curating can still serve a valuable function that plagiarism does not.
In addition to dealing with plagiarists, though, what I think is interesting about this system is how it would overturn everything we know about what it takes to build a reputation. In the current ecosystem, to build a following, it helps to have good content, but what really matters is hustle -- making friends in high places who might be able to give you a boost with a re-tweet or a shout-out, looking out for opportunities for free publicity, etc. Well, I admire the people who have the energy to keep that up. But from an economic standpoint, "hustling" is a non-productive activity, because it doesn't actually make your content better, it's just an attempt to crowd out someone else's content with your own, which may be better or worse, and it's a zero-sum game. The "hustling" ecosystem is also non-optimal from the user's point of view -- if Joe is better at writing jokes, but Bob is better at hustling, then you as the user are more likely to be exposed to Bob's sub-optimal content, and may never even hear about Joe.
The random-sample rating system, however, makes the entire notion of "hustling" obsolete. The only way to get your content in front of lots of people is to write content that gets a high average rating from the initial sample of people who see it.
If such a system ever gets implemented, by Twitter or any other company, maybe the Fat Jew can find out if any of his own original material meets the bar. But don't hold your breath -- the marquee joke currently displayed on his Twitter feed is "You can't get an STD if you never get tested."
Guess I was wrong.
Sad, really.
/r9k/?
Seriously? This sounds like an article from the Onion.
You are all cows. Cows say moo. MOOOOOOO!! MOOOOOOO!!! You joke plagiarizing cows!!!
I thought we were overdue for a Bennett post. Now my titillating curiosity has been titillated.
The comedy world crucified Josh "Fat Jew" Ostrovsky for building his career on re-tweeting other people's jokes without attribution.
Oh, look at Bennett trying to be fun-edgy.
...which could be used to prevent the same story from appearing on Slashdot FIVE FREAKING TIMES with a few word changes to get it past the moderators.
Seriously, it says it right there in the summary: This is "very similar to the random-sample-voting algorithm that I've advocated as a way to stop vote manipulation on Digg, how to handle abuse reports in a scalable way on Twitter and on Facebook, and how to identify the best ideas submitted to the White House's "We The People" petition site".
And by "very similar", he means "basically the same". Enough, already!
"Stuff that matters" indeed.
Just cruising through this digital world at 33 1/3 rpm...
Oh boy! Another chapter in Bennett's blog!
I can't wait to read it...
You're still posting stories from this loser? Speaking of people that could go away.
worth watching
https://www.youtube.com/watch?v=gdugSUFbzws
This, because, you know... nothing of importance is left to do in the world.
Somebody once said -- We ran out of real problems when we started buying a spray for 'static cling'
Waiting for the book release "Earth, Life, and Everything : Mission Accomplished"
Pretending this is my office full of bitter coworkers..
Are totally screwed.
Hilarity does NOT ensue.
Slashdot is filled with lousy posts nowadays, so I am used to crap, I just go quickly through them without even noticing the submitter. But, every time there is a Bennett one, I immediately look up to verify the submitter after just having read two sentences. How does he do it? How is his crap so distinctive as to be instantly recognized? What kind of algorithm could detect if a piece of text was written by Bennett to filter it out of our internet?
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
TL:DR
Your incoherent ramblings are an inspiration to us all.
we've all heard this joke before
Why give away a golden censorship idea like this when you can write it and be reviled the world over yourself!
Did you hear the one about the Bennett Haselton post that wasn't tedious bullshit?
Me neither.
How do you find Will Smith in the snow?
You look for the fresh prints.
Originally told by Ugg the caveman, with minor changes along the millennia.
Hasn't comedy *always* been the domain of people stealing jokes with the successful people being able to fit them into their own styles with their own humour?
It's not like stand up comedians like:
- Rodney Dangerfield
- Joan Rivers
- George Carlin
- Jerry Seinfeld
didn't have their jokes repeated endlessly as well as use other people's jokes as part of their career. I have seen all four listed above live and they all did jokes that I've heard from Groucho Marx, WC Fields, Abbott & Costello and others (who probably stole them originally).
It's a hard living and even if you are successful you have to deal with the likes of parasites like Josh Ostrovsky and Jackie Martling while finding other people's jokes and routines that fit into your persona and act.
It's a circle of life thing.
Mimetics Inc. Twitter
aardvarkjoe's Bennett blocking script
Seriously, Slashdot should look into the development of such a system...
...until Hitler hears about this algorithm...
If a user recognizes a joke as a re-worded version of someone else's tweet, they can flag it as a "duplicate", with a link to the earlier tweet that they think is similar. (Flagging it as intentional "plagiarism" would be a bit harsh, since it's quite common for multiple comedians to come up with the same joke.)
So we're expecting one sample of 1000 people to overlap with another sample of 1000 people AND that they will read and remember enough of the jokes to mark it as plagiarism? If that's not what is assumed then one could surely still game the system and harvest jokes that (effectively) nobody has seen by making multiple accounts and stealing all the best jokes that only 1000 people see....
Back in the '70s, it was commonly believed that the least funny person in the entire world (which admittedly only had about 3 billion ppl back then) was consumer advocate Ralph Nader.
(Saturday Night Live responded by inviting Ralph to guest host one of their shows).
After skimming TFS, I think we have a candidate for the world's least funny person in 2015.
Yet again, Bennett Haselton inspires us with a short-sighted solution, having never considered whether his system will actually work.
1. If a user recognizes a joke as a re-worded version of someone else's tweet, they can flag it as a "duplicate", with a link to the earlier tweet that they think is similar.
Right there in step 1 is the problem. By requiring a link to a sentence someone read months ago, the burden on the user is raised unacceptably. Users won't bother policing when it's difficult, unless the case is severe enough to stir up an outrage - which would already result in more damage than just flagging a user's tweets.
Of course, the potential for abuse is also high. Changing a single word can parody an original post, yet changing a different single word may not avoid plagiarizing. An automated algorithm won't likely be able to tell the difference, so it will fall to manual effort to identify which flagged duplicates are actually malicious. In context, even an identical phrase may be making a very different statement, so taking the tweet out of context for manual review makes false positives very likely.
Shakespeare plagiarized. Plato plagiarized. Tom Lehrer penned many verses praising plagiarism. The bottom line is that plagiarism goes hand-in-hand with creation, and it should always be evaluated only in the entire context of both works - the plagiarizing and the plagiarized. What is being said is often not what's being written.
You do not have a moral or legal right to do absolutely anything you want.
Rodney Dangerfield's ghost is now doing telemarketing.
if this is supposed to be a new economy, how come they still want my old fashioned money?
Use enough discretion to determine when a joke is original, and when it's not. Then ignore accordingly.
I have to assume this entire story occurred because Slashdot hasn't been sold from Dice.
I made an attempt to start a conversation about the community purchasing the site, but it languished in the Firehose (it got the right color, and fast, but not the posting):
http://slashdot.org/submission...
Either Dice is compensating Bennett for inane commentary, or Bennett is paying Dice to have a platform from which to speak, inanely.
BlameBillCosby.com
"Read on for Bennett's take on how such a system could work."
Yuck.
Doesn't all this effort to prevent people plagiarizing jokes set a legal precedent for people reposting images without full credit to the owner, as we see on sites like imgur, 9gag, lolcats, and "I can has cheezeburger" in general? Will all memes require a lengthy set of movie style credits? How would this affect "reposts"?
The real article starts something like,
"The Slashdot world crucified Bennett "Disconnected from Reality" Haselton for building his career on writing nonsense about other people's unimportant situations expecting attribution."
This would reward Joke thieves
Or at least the ones who stole their jokes from somewhere other than Twitter. Say I tell a joke in my comedy routine. Some other guy then steals my joke and puts it on twitter. Then later, I post it on twitter. Now my joke gets credited to him. Unless I make twitter my primary platform, this makes the situation worse.
Fails to account for the fact that no-one will "flag as duplicate" because it requires effort - pretty significant effort actually - and there's no reward.
The only "joke" here is that this total waste of time post made it onto Slashdot and wasted my time. I'm not laughing though.
Bennett.
Your commentary is not "stuff that matters". The community at large is not interested in the problems you investigate or the solutions you present. This is not opinion -- this is a fact supported by the vocal and snide comments your posts receive.
I am perplexed. I do not understand why your commentary is posted within a Slashdot "news article" itself and not posted on your own blog. Why are you given a position that allows your news to be posted here, instead of merely linked from a blog, like so much other commentary?
Are you employed by Slashdot? Are you employed by a company that has a parental or sibling relationship with Dice? Are you being compensated (or compensating someone yourself) to post your content here?
i only have a passing awareness of who Bennett is and that's enough to know he's an inane cockwomble.
go away Bennett. we don't care what you think.
Not really. Actually, I haven't ever heard about this guy before today.
I just wanted to feel special for a moment. Because there is not even a single comment (out of 58) showing any kind of appreciation for that guy!!
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Flagged as a duplicate of Bennett's thoughts on Digg, which is a duplicate of Bennett's thoughts on petitions.
Rated -1, lame and unoriginal.
I propose we call these categories "hashtags", and we can use a special character, like say the pound sign to designate that a tweet is intended for that category.
When the subset of users interested in moderating this category approve the worthiness of a tweet, they can signal their approval by forwarding it on to all of their followers. We could call this a "retweet". The cool thing about this idea is that not everyone's sense of humor matches up, so individual users can perhaps set themselves up somehow to only receive retweets from people who tend to retweet content they like. They could perhaps even then "retweet" that good content to their followers. Thus the best content would get spread much more widely, with perhaps the same exponential growth one sees in diseases.
A good way to "tag" this joke as a dup might be to use some kind of new feature to look for old tweets with identical content. Let's call that a "search". Then we can "flag" it by "retweeting" only the earliest version of the tweet to our users. Unless the new version has some new contributions that make it superior.
What an amazing insight! Well worth slogging through 16 paragraphs for.
Has B.H. ever responded to these allegations of being a boring choad or does he revel in his choadliness in complete silence?
Is this a joke? Can I repeat it?
Sure, if 1 crap joke is submitted, it wastes 1,000 people's time. But there are way more jokes on Twitter every day. Even having to review 100 tweets a day is going to be a hassle, especially if you have to go find the original source.
Who cares where jokes come from? Some time ago, I posted one on line. The response came back that I had 'stolen' that joke from some guy named Jackie the Jokeman. Except that the joke I had posted originated during WWII, which predates Jackie somewhat.
It's getting to be like patent law. If you can grab something that is in the public domain and label it as yours, there's money (or an on-line reputation) to be had. But who fucking cares? It's not like these jokes are going to enrich our culture or promote the progress of science and the arts. And if someone claims that I am wrong and that they indeed do, then why is this function relegated to the likes of Twitter? There should be a branch of the Library of Congress and funding for a research department to determine the provenance of jokes.
...an algorithm to stop Bennett drivel.
Try it! Library of Babel
I know slashdot views everything on social networks as trivial but social networks need content creators like comics to function. New content keeps people tuned in and checking for updates. Content creators post content for self promotion. Comics do it to raise their profile and get people to physical shows. If someone can just plagiarize it and not attribute it to the comic to the point where they drown you out this defeats the whole reason they post it. I'm a no-name comic (and a software developer) but even I had a joke stolen by some aggregator, He got like 10k+ likes on it (it was about the crash of a truck full of ramen noodles). Of course I had less than 100 on my post because he had a bigger following. But if he had just shared my post instead of pretending to take credit for it I could have used those likes and publicity to get a couple dozen more people at one of my shows and we'd all be happy. It makes me not want to bother posting my jokes. Now I'm not important but if more famous people feel the same way or it becomes a trend then it is a problem for a social network's business.
Windows 10's "telemetry" will send everything you type to Microsoft-controlled servers. Jim Stone speculates that "the tribe" is using this as a method to obtain everything everyone writes, to use for their purposes.
To now see a story about a "Fat Jew" who is stealing other people's jokes -- he could have saved a lot of his time and effort if he just subscribed to Microsoft's offering to the tribe.
I feel fantastic, and I'm still alive.
...One plagiarist orders a "Pink Mary". The second plagiarist says, "I too will have a Pink Mary". The first plagiarist then asks for Saltine crackers. "I too will have Saltine crackers", says the second plagiarist.
The first plagiarist turns to the second plagiarist in disgust and asks, "Why do you keep copying me?"
"Well, because I injured my foot, and [punchline censored by plagiarism software]
Table-ized A.I.
Useless phrases like "content creators" and "evil-doers" or a BH post?
He may very well be brilliant.
However, there is a large amount of animosity directed to him here on Slashdot. The main reason is because his posts are allowed to be posted INSIDE a Slashdot item itself, instead of on his own personal blog. This suggests that for some reason, the Slashdot editors value his commentary above most of the other stories posted here. The community cannot figure out why that is the case, because this commentary is perceived as neither "news for nerds" nor "stuff that matters".
Some community members also don't like his writing style (too wordy?). Some community members don't like the topics he decides to discuss (non-problems). Some community members don't like the solutions he describes (already solved in more simplistic ways or deeply flawed solutions). Some community members don't like his (perceived) arrogance or inability to parse constructive criticism of his commentary.
But overall, he is a target because he is being given special treatment, and I'm not aware of any clear statement from the Slashdot staff about why he is being given special treatment. (He won a court case that basically allows people to view porn in public libraries, but I don't see why that makes him special.) Personally I'd have no problem with Slashdot linking to his content through a frontpage item so long as it passed through the same editorial / upvoting process as any other story submitted through the firehose. ("same process as any other story" except Dice.com and itworld.com content, of course.)
It makes me not want to bother posting my jokes. Now I'm not important but if more famous people feel the same way or it becomes a trend then it is a problem for a social network's business.
If you haven't noticed, working comics generally don't post material they'd want to use because of this issue. Kelly Oxford used to post funny stuff, then she got a job writing comedy, and now her twitter feed is mostly boring.
Save your jokes for the crowd, you won't know if they work until you say them on stage anyways (speaking from experience).
"Who are you?" "No one of consequence." "I must know." "Get used to disappointment."
So will this be the end of family guy?
If you already have a following you don't have to self promote that hard. Still though plenty of comics post one liners on twitter and facebook and it gets them followers.
Please make it stop.
I do not want a random committee to select what jokes I see. I am following someone because I like the jokes he likes. This is how real life works. The problem is not retweeting jokes, the problem is attribution. If a friend sends me a joke I would be annoyed had he added a note about where he got it from. I do not care where it is from, I do not care who invented it.
if someone makes a living from tweeting jokes, then attribution can be a problem. Actually making a living from tweeting jokes is a problem.
Not really. Actually, I haven't ever heard about this guy before today. I just wanted to feel special for a moment. Because there is not even a single comment (out of 58) showing any kind of appreciation for that guy!!
Unfortunately that's "special" as in "special needs".
To have a right to do a thing is not at all the same as to be right in doing it
Well, you're new to Slashdot, then? Rest assured that very soon you'll have heard too much from him too.
There is no real expression of an idea to them at all. Perhaps a collection of jokes organized in a certain way, but individual one or two liners? that's utterly ridiculous!
Not giving attribution where known may not be highly regarded ethically, but that is ALL.
Thanks for sharing all this information
I have been participating (= reading/writing comments) in Slashdot during just a few months, but am starting to like the overall attitude here quite a lot. It was just an innocent joke highlighting something curious (first time I have seen a so unanimous behaviour against something/someone); although I never doubted that there might be quite good reasons for it.
After quickly skimming through this article and even before reading the comments, my opinion was: too lengthy, unnecessarily detailed and describing something which is not news (tons of filtering algorithms are built every second). Additionally, the target behaviour (plagiarism of jokes) didn't seem particularly interesting.
My opinion about this guy continues being the same: I don’t know him and that's why cannot have a valid opinion; although I have certainly got a pretty bad first impression.
It was a joke, apparently a bit too difficult for you. Sorry for not having written a simple enough set of ideas suitable for readers of any "background". Please, feel free to ask me anything you need to know; also I will try to avoid complex ideas/humour and use as simple words/concepts as I can.
Yes. As commented above, I am quite new in the comments section, but am liking what I have seen so far pretty much (there are always some exceptions; like the special guy above. I mean... I want to help everyone, but sometimes feel like getting a bit more relaxed and using humour without having to explain every single bit; currently living in a remote area where finding people properly understanding anything is quite difficult).
The AC two comments above has written a quite good summary about the Haselton issue; and all the ideas on this front are quite clear already. I certainly look forward to continue participating in this community.
Why not copyright original insults. They can be directed to a subset of the offended group for rating, and then follow Hasselton's scheme. Any AC insults from this site that are passed on without retribution would be sent to Hasselton directly. Then he can repost them as Hassetons, a class of stolen insults.
I hereby patent the joke, i.e. a textual utterance or written work which involves one or more persons or animals or other objects which are portrayed in certain circumstances as indulging in behavior and/or speech which is intended to evoke a response of humor in the listener or reader. I have my lawyers ready to police this aggressively.
Star Trek transporters are just 3d printers.
"The comedy world crucified Josh "Fat Jew" Ostrovsky for building his career on re-tweeting other people's jokes without attribution."
ironically, that's the actual true story of what happened to Jesus.