MIT Offers Picture-Centric Programming To the Masses With Sikuli
coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it."
Here's a video demo of the technology, and a paper explaining the concept (PDF).
Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.
But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
This looks like a powerful tool for gold / ISK / whatever farming. I'm tempted to resurrect my EVE account and see if I can make an auto-miner script.
My patience is infinite, my time is not.
Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.
How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identical-looking (empty) text fields, the only distinguishing factor is the label beside each one, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as much as other GUI scripting tools that rely on pixel location).
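One possible way around the identical-fields case is to match the label and click at a fixed offset from it. The sketch below is plain Python on a toy grid of pixel values, not Sikuli's API; the function names, the grid, and the offset are all made up for illustration, and it uses exact matching rather than anything fuzzy.

```python
# Hypothetical sketch: locate a label by exact template match in a grid of
# pixel values, then aim a fixed offset to its right, where the field sits.

def find_template(screen, template):
    """Return (row, col) of the first exact match of template in screen, or None."""
    t_rows, t_cols = len(template), len(template[0])
    for r in range(len(screen) - t_rows + 1):
        for c in range(len(screen[0]) - t_cols + 1):
            if all(screen[r + i][c:c + t_cols] == template[i] for i in range(t_rows)):
                return (r, c)
    return None

def click_target(screen, label, offset):
    """Return the point to click: the label's top-left corner plus offset."""
    found = find_template(screen, label)
    if found is None:
        return None
    return (found[0] + offset[0], found[1] + offset[1])

# Toy screen: 0 = background, 7 = empty text field, 1..4 = label pixels.
screen = [
    [1, 2, 0, 7, 7, 7],   # "name" label, then an empty field
    [0, 0, 0, 0, 0, 0],
    [3, 4, 0, 7, 7, 7],   # "email" label, then an identical empty field
]
email_label = [[3, 4]]
target = click_target(screen, email_label, offset=(0, 3))
```

Here the two empty fields are indistinguishable on their own, but the target lands in the second one because the label is what gets matched. The offset still breaks if the layout changes, so this only narrows the fragility, it does not remove it.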
Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.
I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a webcam, this is basically how to do it with trivial coding.
I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection than try to figure out how the VPN API works.
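For what it's worth, the "wait for it to appear" half is just a polling loop once you have something that answers "is the image on screen right now?". A minimal sketch in plain Python, assuming a hypothetical is_visible callable standing in for the real image check; wait_for and its parameters are invented here and are not Sikuli's own commands.

```python
import time

def wait_for(is_visible, timeout=5.0, interval=0.1,
             sleep=time.sleep, clock=time.monotonic):
    """Poll is_visible() until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Simulated screen: the "bus" image shows up on the third capture.
captures = iter([False, False, True])
events = []

if wait_for(lambda: next(captures), timeout=1.0, interval=0.0):
    events.append("notify: bus is rounding the corner")
```

The bus-notification example from the summary has this shape: watch a region, and do something once the picture turns up. The hard part, deciding whether the image is visible, is exactly what the tool is supposed to supply.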
-t.
Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.
For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.
(BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)
I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise level automated GUI testing tools costing in the order of thousands of dollars per user licence.
Some of the comments below about maintainability problems seem pretty superficial. To ease maintainability, you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot to a variable and then refer to that variable throughout your test, so if a button happens to change dramatically, you make the change in potentially one place in your code instead of everywhere it is used in a click. The tool's apparent simplicity (not too many bells and whistles) and its Python basis seem to be major advantages for maintainability.
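A rough sketch of the screenshot-as-variable idea, in plain Python. The file names are hypothetical, and click is passed in as a stand-in for whatever click command the automation tool provides, so nothing here depends on Sikuli's actual API.

```python
# All screenshot references live in one place; file names are hypothetical.
SAVE_BUTTON = "images/save_button.png"
OK_BUTTON = "images/ok_button.png"

def run_save_test(click):
    """Regression test steps written against names, not raw screenshots."""
    click(SAVE_BUTTON)
    click(OK_BUTTON)
    click(SAVE_BUTTON)

# Stand-in for the tool's click command: it just records what was clicked.
clicked = []
run_save_test(clicked.append)
```

If the save button is redesigned, only the SAVE_BUTTON line changes and every test step that uses it follows along.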
Check out this interesting academic paper which specifically addresses using Sikuli for automated GUI testing: "GUI Testing Using Computer Vision, CHI 2010" at http://sikuli.csail.mit.edu/documentation.shtml
Has anybody actually used Sikuli? I'd be very curious if anybody has used this for automated GUI testing in a corporate environment...
The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.
But overall, it just seems like a Bad Idea. It will be only as reliable as screen-scraping in browsers, and it would be wise to avoid it for the same reasons.
Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the simplest tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.
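To make the theme worry concrete, here is a toy similarity check in plain Python. It is only an assumption about how a fuzzy matcher might score a candidate patch; the scoring rule, the tolerance, and the threshold are invented for illustration and are not taken from Sikuli.

```python
def similarity(patch, template, tolerance=10):
    """Fraction of pixels whose grey values differ by at most tolerance."""
    pairs = [(p, t)
             for p_row, t_row in zip(patch, template)
             for p, t in zip(p_row, t_row)]
    close = sum(1 for p, t in pairs if abs(p - t) <= tolerance)
    return close / len(pairs)

template = [[200, 200], [50, 50]]      # button as originally captured
same_theme = [[198, 203], [52, 49]]    # same button with slight rendering noise
dark_theme = [[40, 40], [220, 220]]    # same button after a theme change

THRESHOLD = 0.7  # hypothetical cut-off for calling something a match
matches_same = similarity(same_theme, template) >= THRESHOLD
matches_dark = similarity(dark_theme, template) >= THRESHOLD
```

Small rendering differences stay above the cut-off, while the restyled button falls well below it, so a script recorded under one theme would stop finding its targets under the other.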
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book