Ask Slashdot: Automated Tool To OCR CCGs Like Magic: the Gathering?
An anonymous reader writes I buy massive collections of trading card games: Magic: The Gathering, Yu-Gi-Oh!, Pokemon, Weiss Schwarz, Cardfight Vanguard, etc. I've gotten the process fairly streamlined as far as price checking, grading, sorting, etc. Part of my process involves using higher-quality webcams positioned over the top of the cards, which are in a stack. I keep a cam window on the screen to show a larger, brighter version of the card. What I'm wondering: Is there an OCR solution out there that will look at the same spot on the screen, capture, OCR, dump to clipboard, etc.? I've tried several open source solutions but none of them quite fit my needs. What I'd really like is to be able to hit a hotkey and have my clipboard populated with the textual data of the graphics in a pre-set x,y window range. All this should be done via a hotkey. I may be asking for a lot, but then again, I'm sure someone out there has had need of this type of set-up before. Anyone have any recommendations?
I bet wizards of the coast will be totally cool with that.
Mod me down, my New Earth Global Warmingist friends!
$25 seems like a good deal, or did you mean $25,000 rather than $25.000?
A different method would be to have frames from the webcam be compared to a database of images and tally the matches. Space bar could serve as the "capture and compare image" function. Similar to http://www.tineye.com but local and with a limited data set.
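The "tally the matches" step might look roughly like the sketch below. It assumes some image matcher (hypothetical, not shown) has already produced one best-guess card name per captured frame; the function name and the 60% agreement threshold are illustrative choices, not anything from an existing tool.

```python
from collections import Counter

def tally_matches(frame_guesses, min_share=0.6):
    """Return the card name that wins the vote across frames, or None.

    frame_guesses: best-match name per captured frame (None = no match).
    min_share: assumed threshold; fraction of all frames that must agree.
    """
    votes = Counter(g for g in frame_guesses if g is not None)
    if not votes:
        return None
    name, count = votes.most_common(1)[0]
    return name if count / len(frame_guesses) >= min_share else None
```

Voting over several frames is one way to ride out a single blurry or badly lit capture before committing to a card.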
But I just wanted to say that you are perhaps the biggest nerd I have ever been aware of. I mean that as a sign of respect.
I read the title and thought this article was going to be about DNA and the amino acid proline.
Grab an OCR system off of https://help.ubuntu.com/community/OCR. Get ImageMagick. Get streamer (package xawtv). Create a script on the order of:
now=$(date --iso-8601=ns)
file="$now.png"
outfile="$now-cropped.png"
streamer -c /dev/video0 -b 32 -o "$file"
convert "$file" -crop 40x80+150+120 "$outfile"
gocr "$outfile" > "$now.txt"
rm "$outfile"
Now create a keyboard shortcut with your window manager to run this script, or open a terminal and get used to pressing up and enter a lot.
If you're not on Linux, sorry.
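For anyone who would rather drive the same three steps from Python, here is a rough sketch. It only mirrors the commands given in the script (streamer, convert, gocr) and assumes those tools are installed; the function names and the timestamp format are made up for illustration, and whether streamer writes PNG directly on a given system is untested here.

```python
import subprocess
from datetime import datetime

def crop_geometry(width, height, x, y):
    """ImageMagick geometry string WIDTHxHEIGHT+X+Y for a fixed window."""
    return f"{width}x{height}+{x}+{y}"

def build_commands(stamp, region, device="/dev/video0"):
    """Return (capture, crop, ocr) argument lists for one card.

    region is (width, height, x, y); file names are derived from stamp.
    """
    frame = f"{stamp}.png"
    cropped = f"{stamp}-cropped.png"
    capture = ["streamer", "-c", device, "-b", "32", "-o", frame]
    crop = ["convert", frame, "-crop", crop_geometry(*region), cropped]
    ocr = ["gocr", cropped]
    return capture, crop, ocr

def scan_card(region):
    """Run the three steps and return the OCR text (needs the tools installed)."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    capture, crop, ocr = build_commands(stamp, region)
    subprocess.run(capture, check=True)
    subprocess.run(crop, check=True)
    result = subprocess.run(ocr, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling `scan_card((40, 80, 150, 120))` would use the same crop window as the script. Getting the returned text onto the clipboard is left to whatever clipboard tool your desktop provides.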
Some European countries swap the , and ..
True, but do any of those countries have English as an official language? I thought the choice of thousands or decimal separator depended on the language of the surrounding words: French uses one convention, German another, etc.
The submitter is actually handling meatspace items. RTFS: price checking, grading, sorting, etc.
I use it every day. The Android app is phenomenal at picking the right card from the database based on the picture. The only real problem is that it doesn't have all the alternate art versions of cards from older MTG sets. The interface is a bit sloppy on the desktop version, but the recognition is pretty good.
I can tell you that when I lived in Germany, even if I was writing in German, I got the decimal notation wrong every single time. I was just too used to my way of doing it.
It can be done by scraping the database of cards, creating a model out of them, then matching the new card to the database.
So how much is this worth to you?
OMG WTF TLA OCR CCGs?
A regular digital camera, on a tripod, 5 second timer
A Canon P&S, CHDK (intervalometer), swap the card before it clicks again.
Seto Kaiba has already done it, but added holograms.
Fight Spammers!
The submitter is collecting magic cards, and cataloging them for resale. Sounds "for profit" to me... and not like that fact is hidden.
In Emacs: Ctrl + M + T + G. Also runs a Monte Carlo on the last 3000 cards scanned and outputs the optimal 60 card deck and registers you in the nearest FNM.
/sarcasm ...
Gee, if only some one would invent a device to do repetitive work.
It would follow a set of what I'll call instructions.
And instead of hard-coding them, it would be programmable, so that it is more flexible.
I even have a name for it! A computer, because it "computes" the math along the way it needs.
Nah, that will never sell.
Hey, people can be convinced to buy any old ridiculous thing - just look at baseball cards. Or stamps. Or tulip bulbs. It seems a certain percentage of the population has an obsessive compulsion to hoard things and, all in all, playing cards beat old chicken bones or pizza boxes.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Hashes/checksums are unlikely to be of any use - all it takes is one pixel being a slightly different color and the hash will change completely, unless it's a fairly worthless hash to begin with.
There are various techniques by which you could "fingerprint" images in a more variation tolerant manner, but they have nothing to do with hashes/checksums, which are specifically designed to be able to detect even single-bit changes.
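One of the simpler variation-tolerant fingerprints is the so-called average hash; the sketch below is an illustration of that general idea, not something any poster here said they use. It assumes the image has already been shrunk to a tiny grayscale grid (real tools typically use something like 8x8), and the function names are made up.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale image (list of rows of 0-255 ints).

    Each bit is 1 where the pixel is brighter than the image mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy 2x2 "images": a card, a slightly noisy capture of it, and another card.
card = [[10, 200], [220, 30]]
noisy = [[12, 198], [221, 29]]
other = [[200, 10], [30, 220]]

print(hamming(average_hash(card), average_hash(noisy)))  # 0
print(hamming(average_hash(card), average_hash(other)))  # 4
```

Small pixel-level changes leave the fingerprint untouched, which is exactly what a checksum would not do; matching then becomes "find the database entry with the smallest Hamming distance".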
I just use spaces for the digit grouping and a period or comma for the decimal. That way seems the least ambiguous to me.
$ apt-cache search ocr | grep -v ^lib | grep -i ocr | grep -i -v language | grep -v motocross
fonts-ocr-a - ANSI font readable by the computers of the 1960s
fuzzyocr - spamassassin plugin to check image attachments
gimagereader - Graphical GTK+ front-end to tesseract-ocr
gocr - Command line OCR
gocr-tk - tcl/tk wrapper around gocr
python-gamera.toolkits.greekocr - toolkit for building OCR systems for polytonal Greek
hocr-gtk - GTK+ frontend for Hebrew OCR
python-gamera.toolkits.ocr - toolkit for building OCR systems
ocrad - optical character recognition program
ocrfeeder - Document layout analysis and optical character recognition system
ocrodjvu - tool to perform OCR on DjVu documents
r-cran-rocr - GNU R package to prepare and display ROC curves
tesseract-ocr - Command line OCR tool
tesseract-ocr-dev - transitional dummy package
Maybe if it's a price, but it wouldn't be valid as an amount. Or can you tell me whose head is on the one mil coin?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
The smallest coin in the USA is the 1 cent coin, yet all gas stations have their prices end with 9 tenths of a cent. (Obviously they automatically round it up and keep it. Essentially it's a real world version of the stealing a fraction of a cent scam that has been used in movies for a very long time.)
No hurry though, still waiting to get my holographic Charizard back from Mt Gox...
Most linux users don't know this, but the man pages were named after Chuck Norris. Chuck Norris fsck'ing hates noobs!
I think he meant if you are dumb enough to take this AC post seriously, I will happily take your $25.
ôó
I do not know all that much about MTG, but my SO's kid plays it. There seems to be a new release of cards every few months. Certain cards get re-released, some old ones you cannot use in tournament play, and if a card is damaged it can't be used. There are all sorts of rare cards on whatever scale they use. I really think it is what is keeping comic book shops going. People will spend hundreds if not thousands buying cards. I am sure you can imagine the person who has to win spending that much on just one card.
uh, you're thinking of cryptographic/non-invertible/fast-mixing/whatever hashes specifically. it's not exactly defined what a hash is, but generally it means a possibly many-to-one (i.e. lossy) function of data, usually with outputs of fixed (or parametrizable) size.
for example, an OCR is a hash; it (ideally) hashes images of arbitrary dimension into an output space of characters according to which one it most resembles; similarly for any other image recognizer.
"They were pure niggers." – Noam Chomsky
Odd that you should ask this question a few days after I started trying to create a solution for myself. This is strictly a for-profit venture for me. Apparently paying for my kid's college fund is naughty in some circles. Not sure how that works out for the world economy, but I digress. I've spent about six days on this and might be able to save you some blind alleys. Mostly I've found a lot of frustration.
My plan was to develop an app which could scan images of cards on a flatbed, nine at a time, crop those to single images, then extract the trading card title. It would then run the title against any number of online databases for the current value of the card. Going into it I did not expect major issues. I've done OCR on many types of trading cards using Microsoft OneNote, and text extraction is nearly 100% accurate. So I figured this was simple. Not so.
I decided to use Tesseract, which seems to be the open source gold standard for OCR. However, I discovered rather quickly that Tesseract does almost no preprocessing of the image and spits out text that is perhaps 5% accurate at best on these cards. So I went to ImageMagick and GraphicsMagick to see if I could use them to format my incoming scans in a way that Tesseract could use. The Tesseract and ImageMagick communities have been very helpful in trying to help me find a solution; however, the reality is that no simple, or even sort of simple, solution exists. I'm shocked and amazed, but it seems that no real-world, out-of-the-box solution exists for open source OCR. That is the stick.
At this point I am at a crossroads. I have neither the programming skills nor the time to devote to creating OCR for this. There is a good project for MTG cards using a webcam on GitHub. However, it is specific to Magic cards and, from what I can tell, actually does image matching more than OCR. I am either going to abandon the project or, and this is corny, write a script to drop the files into OneNote and use that clunky interface for OCR.
It's an awful, awful solution, but due to my limited programming skills and the lack of integration between image preprocessing and open source OCR, I think that is where I'm at. I may be missing something, but I think this is where I'm at.
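One thing that might take some pressure off the OCR step: since the title only has to be looked up in a list of known cards, imperfect text can sometimes be snapped to the nearest real title. A minimal sketch using Python's standard difflib, assuming you already have a list of card names from whatever database you use (the function name, sample titles and cutoff are illustrative):

```python
import difflib

def match_title(ocr_text, known_titles, cutoff=0.6):
    """Return the known title closest to noisy OCR output, or None."""
    # Compare case-insensitively but return the title as originally spelled.
    lookup = {t.lower(): t for t in known_titles}
    hits = difflib.get_close_matches(
        ocr_text.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None

titles = ["Shivan Dragon", "Serra Angel", "Black Lotus"]
print(match_title("Sh1van Drag0n", titles))  # Shivan Dragon
```

This will not rescue output that is only 5% accurate, but it can absorb the odd misread character once preprocessing gets the text most of the way there.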
Why is that a problem? There's nothing at all nefarious about buying and selling MTG or any other CCGs. In fact, the producers of the cards rely on it. It fuels a good portion of their sales.
As far as becoming a rival to a site, so what? If he can do it better, then more power to him. If he can get people to help him do it better for free, even more so. In the end, what do you care, unless you have some vested interest in one of the services that already does this kind of thing?
"But we have to pass the bill so that you can find out what is in it,..." - Nancy Pelosi
Card damage is a concern. However, this could be avoided by feeding cards loaded into protective sheets, provided the scanner was (a) robust enough to take that thickness and (b) gentle enough not to curl them. My issue is OCR. The preprocessing is complex and makes my poor head hurt.
Why is that a problem?
Because it's another example of turning Ask Slashdot into Slashdot, Please Do My Job.
Remind me to never ask you for directions. Sheesh.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
Search Ebay for sold items with "Box Only" in the title. You'd be amazed how much some OEM boxes sell for.
Indeed. I'm unconvinced though that a lot of those prices aren't due to careless idiots not paying close enough attention to what they're buying. Especially considering the deceptive images often included.
Oh, and then I suppose there's money laundering as well. It's not uncommon to see things selling on ebay, Amazon, etc. with prices that are hard to explain any other way.
Not a bad idea. Except that the card names in MTG are hardly normal English words. That might complicate matters for voice.
Currently I'm using OpenCV and a lot of glue code to scan real-time video and recognize cards for MtG. The database is easily extendable for Pokemon, Yugioh, L5R, and other card games.
I wrote it in Python on the PC, and recently ported it over to native Android. So far it works really well, and you can see a screenshot of it in action right here:
http://imgur.com/gallery/v44gIbB
Like others, I'm trying to put my kids through college, and am not quite willing to open-source my months of work just yet. However, I'm not looking to scalp anyone, and my rates are very reasonable. Feel free to PM me if you would like me to license this library to you -- it would be a fairly turn-key solution for you.
It gets even more interesting when the last digit is a 5. Accounting rules kick in and you round towards the nearest even number so $22.995 = $23, but $22.985 = $22.98.
This one threw me for a loop when I first hit it, as there are some programming languages whose default Math.Round function follows the RNE (round to nearest even) definition.
-Rick
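The two prices above can be checked with Python's decimal module, which lets you pick the rounding mode explicitly. This is just a small illustration; the helper name is made up.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def round_cents(amount, mode=ROUND_HALF_EVEN):
    """Round a decimal string to whole cents using the given rounding mode."""
    return Decimal(amount).quantize(CENT, rounding=mode)

print(round_cents("22.995"))                 # 23.00
print(round_cents("22.985"))                 # 22.98
print(round_cents("22.985", ROUND_HALF_UP))  # 22.99
```

Passing the amounts as strings matters: a binary float like 22.985 is not stored exactly, so the tie would not be a true tie.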
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs