Slashdot Mirror


MIT Offers Picture-Centric Programming To the Masses With Sikuli

coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it." Here's a video demo of the technology, and a paper explaining the concept (PDF).

24 of 154 comments

  1. FrontPage? by Itninja · · Score: 2, Interesting

    Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.

    But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.

    --
    I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
    1. Re:FrontPage? by gad_zuki! · · Score: 4, Informative

      >And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.

      Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIYers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

      While I'm certainly no fan of FrontPage, I feel that it wasn't much worse than Mozilla Composer or other WYSIWYG HTML composers.

    2. Re:FrontPage? by BobMcD · · Score: 4, Insightful

      >Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).

      From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes, it did eventually need to be changed out, but in its time, for its purpose, you can't really fault it. It truly did work.

    3. Re:FrontPage? by AardvarkCelery · · Score: 3, Insightful

      Yeah, that's real easy for a programmer to say. Ever used a brownie mix? I'll bet a pastry chef would say, "I'd like to see people who wish to bake brownies actually learn how to bake brownies properly." Tools like Sikuli are the programming equivalent to brownie mix. It's easy gratification. (... or at least easier than learning to capture part of the screen and then do fuzzy image pattern matching on it.) If I were a very casual, light duty programmer, this would be pretty helpful sometimes.

    4. Re:FrontPage? by MarbleMunkey · · Score: 2, Insightful

      >Just because code is text (and can be readily generated) doesn't mean that anyone with notepad should be able to write it.

      Wrong. It means exactly that. BASIC was a language meant to make it easier for people to program, and it was THE introduction to programming for many people. Now, am I arguing that BASIC is the right language for any given task? No. I haven't coded in BASIC for 15 years. But lowering the barrier to entry can only be a good thing for attracting new blood.

  2. Potential by zero0ne · · Score: 2, Insightful

    Especially for testing your GUI.

    This seems like AutoIt but with image recognition (instead of having to input mouse coordinates).

    1. Re:Potential by jdimatteo · · Score: 2, Interesting

      I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise-level automated GUI testing tools costing on the order of thousands of dollars per user licence.

      Some of the comments below about maintainability problems seem pretty superficial. To ease maintainability, you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot to a variable and then refer to that variable throughout your test, so if a button happens to change dramatically, you make the change in one place in your code instead of every place it is used in a click. The fact that the tool appears simple (not too many bells and whistles) and is based on Python seems to be a major advantage for maintainability.
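The "screenshot as a variable" idea above could be sketched roughly like this. It is a minimal illustration in plain Python, not Sikuli's own API: the `Screen` class, the image file names, and the `click` stand-in are all hypothetical, and a real script would pass in whatever click-on-image call the tool provides.

```python
# Hypothetical sketch: every name and file path here is made up for illustration.
# `click` stands in for the automation tool's click-on-image call.

class Screen:
    """Keeps the mapping from logical GUI element names to screenshots in one place."""

    def __init__(self, click, images):
        self._click = click          # callable that takes an image path
        self._images = dict(images)  # logical name -> screenshot file

    def press(self, name):
        # Test scripts say "save_button"; only this table knows the file.
        self._click(self._images[name])


def save_and_close(screen):
    """A regression test step written purely in terms of logical names."""
    screen.press("save_button")
    screen.press("close_button")


# Stand-in click function that only records what it was asked to click.
clicked = []
screen = Screen(clicked.append, {
    "save_button": "images/save_v2.png",   # redesigned button: change it here only
    "close_button": "images/close.png",
})
save_and_close(screen)
print(clicked)
```

If the save button is redrawn, only the one table entry changes; `save_and_close` and every other test that presses it stay as they are.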

      Check out this interesting academic paper which specifically addresses using Sikuli for automated GUI testing: "GUI Testing Using Computer Vision, CHI 2010" at http://sikuli.csail.mit.edu/documentation.shtml

      Has anybody actually used Sikuli? I'd be very curious if anybody has used this for automated GUI testing in a corporate environment...

  3. MMO macro maker? by visgoth · · Score: 4, Interesting

    This looks like a powerful tool for gold / isk / whatever farming. I'm tempted to resurrect my EVE account and see if I can make an auto-miner script.

    --
    My patience is infinite, my time is not.
  4. Better by pavon · · Score: 2, Interesting

    Actually I think this is more interesting than either FrontPage or LabVIEW, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

    How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identical-looking (empty) text fields, with the only distinguishing factor being the label beside them, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as much as other GUI scripting tools that rely on pixel location).
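One common way around the identical-text-field corner case is to match the label and then click at a fixed offset from it. The sketch below shows the idea on a toy "screen" made of text rows instead of pixels; `find_exact` and `target_next_to` are hypothetical helper names, and the exact matching here is a simplification of whatever fuzzy matching a real tool performs.

```python
def find_exact(screen, template):
    """Return (row, col) of the first exact occurrence of template in screen, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c:c + tw] == template[i] for i in range(th)):
                return (r, c)
    return None


def target_next_to(screen, label, d_row, d_col):
    """Locate the label, then return a click point offset from its top-left corner."""
    hit = find_exact(screen, label)
    if hit is None:
        return None
    return (hit[0] + d_row, hit[1] + d_col)


# Two fields that look identical; only the labels differ.
screen = [
    "Name: [    ]",
    "Mail: [    ]",
]

# Aim 8 columns to the right of the "Mail:" label, i.e. inside its field.
target = target_next_to(screen, ["Mail:"], 0, 8)
print(target)
```

The offset is still tied to the layout, so this stays fragile to UI changes, but it no longer depends on the fields themselves being distinguishable.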

  5. My grandmother knows python by Anonymous Coward · · Score: 5, Insightful

    "Computer users with rudimentary skills"..... "with a basic understanding of Python"?

    1. Re:My grandmother knows python by Fred_A · · Score: 5, Funny

      "Computer users with rudimentary skills"..... "with a basic understanding of Python"?

      Computer users with a rudimentary skill who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.

      --

      May contain traces of nut.
      Made from the freshest electrons.
    2. Re:My grandmother knows python by AardvarkCelery · · Score: 3, Informative

      If a friend wanted to learn just enough programming to do a few light chores, what would you recommend? Python is arguably one of the easiest languages to learn. Randy Pausch used it for Alice, which has been successful for teaching middle school girls how to program. So if "computer users with rudimentary skills" means rudimentary programming, then that works for me.

  6. The Cow pat model by Anne Thwacks · · Score: 5, Funny

    Yeah - let's hear it for a new development model:

    For years I have been asking for a software development tool that allows me to write PHP code by throwing cow-pats at the screen with the Wiimote.

    And my colleagues want a tool that allows dispatching my bugs with the Wii gun attachment they use in "Quantum of Solace".

    --
    Sent from my ASR33 using ASCII
  7. Program, NOT code. Think MACRO by SmallFurryCreature · · Score: 3, Insightful

    From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.

    But it seems very interesting once you get past this difference. Macros are very handy for testing in my experience, but they often have a problem because a tiny misalignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well, that would certainly make automated tests a bit easier.

    Interesting stuff. Just don't think you will be writing software with this.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  8. Yes, but can Sikuli be used to write Sikuli? by hellop2 · · Score: 2, Funny

    Otherwise it's just not complete, IMHO.

    --
    How many more years will slashdot have an off-by-one error on your Score in your profile?
    1. Re:Yes, but can Sikuli be used to write Sikuli? by Seor Jojoba · · Score: 2, Interesting

      Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.

  9. lame by Charliemopps · · Score: 2, Insightful

    This is the same sort of scripting you can do with many already existing languages, AutoHotkey for example. The only new feature would be the ability to copy the screenshot directly into the program, as opposed to taking it outside the program and referencing the file directly. I'd say that this scripting language is actually weaker because of it.

    As far as using this inside a game... they are already hardened against this sort of thing. For example, next time you're in EVE, look at the buttons you use. They are semi-transparent. This is not just for aesthetics. If you take a screenshot of the button and then change your camera angle, the button looks different because what's behind it is different. That doesn't mean you can't script inside EVE; you just have to be a lot more clever than using a script to click on a static image of the GUI. This language would be almost completely useless in any GUI that has any transparency, which I'd think would include Vista, Win7 and even Macs with the right stuff turned on.
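The transparency problem can be illustrated with a toy similarity score. This is only a sketch on tiny grayscale patches, assuming a simple mean-absolute-difference measure and an 80% opaque button; it is not the matching algorithm Sikuli actually uses, and `similarity` and `blend` are hypothetical helper names.

```python
def similarity(a, b):
    """1.0 for identical grayscale patches, 0.0 for maximally different ones."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def blend(button, background, alpha):
    """Alpha-blend a button over a background (alpha=1.0 means fully opaque)."""
    return [[alpha * p + (1 - alpha) * q for p, q in zip(row_b, row_g)]
            for row_b, row_g in zip(button, background)]


button = [[200, 200], [200, 200]]
dark = [[0, 0], [0, 0]]            # e.g. empty space behind the button
bright = [[255, 255], [255, 255]]  # e.g. a bright scene behind the button

captured = blend(button, dark, 0.8)   # screenshot taken over the dark scene
later = blend(button, bright, 0.8)    # the same button after the camera moves

score = similarity(captured, later)   # about 0.8 for these values
print(score >= 0.9, score >= 0.7)
```

With a strict threshold the moved-camera button no longer matches its own screenshot, while a looser threshold still finds it at the cost of more false positives elsewhere. A fully opaque button (`alpha=1.0`) scores 1.0 regardless of what is behind it.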

  10. Re:How easy IS it? by Anonymous Coward · · Score: 2, Funny

    >Wow, no one has watched the movie Swordfish have they?

    We're trying to repress those memories, you insensitive clod!

  11. It's a brilliant idea. by Seor Jojoba · · Score: 2, Insightful

    Come on, let's cut through the default Slashdot snark. The image capture aspect of Sikuli is brilliant! I don't like the tagline "program anything with Sikuli" because 99% of software should be written in something else. But think of writing test scripts that can use the image matching features. If the software works as advertised, then you could throw together UI test cases way faster than anything else I've seen. System administration tasks should be a good match too. The resulting code would be brittle and hard to maintain, but for quick one-off scripts, sure... I can see it.

  12. The Sikuli School of Programming by presidenteloco · · Score: 2, Funny

    if NOT understand logic then
       loop
          talkTo (self, "Don't program!")
          Look (@ Pretty pictures)
       endloop
    endif

    --

    Where are we going and why are we in a handbasket?
  13. Think executable step-by-step tutorials by tucuxi · · Score: 4, Insightful

    Sikuli is certainly not commercial-grade UI testing software. It was never intended to be; this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters that compared it to video-programming are right: not all programs have to target complicated algorithms and data structures, and there is plenty of space for automating "simple stuff".

    As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.

    Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.

    1. Re:Think executable step-by-step tutorials by tristanreid · · Score: 3, Interesting

      I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a webcam, this is basically how to do it with trivial coding.

      I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection rather than trying to figure out how the VPN API works.
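The "wait for it to appear" interaction boils down to a polling loop with a timeout. A minimal sketch, assuming a hypothetical `is_visible` callable that stands in for the tool's image check (here it is simulated, so nothing is actually captured from the screen):

```python
import time


def wait_for(is_visible, timeout=5.0, interval=0.1):
    """Poll is_visible() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_visible():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated screen: the bus image shows up on the third look.
looks = iter([False, False, True])


def bus_on_screen():
    return next(looks)


if wait_for(bus_on_screen, timeout=1.0, interval=0.01):
    message = "Bus is rounding the corner"
else:
    message = "Gave up waiting"
print(message)
```

Swapping `bus_on_screen` for a real image check turns this into something like the bus-notification script mentioned in the summary; the timeout keeps the script from hanging forever if the image never shows up.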

      -t.

  14. What's so wrong with TurboTax? by AardvarkCelery · · Score: 2, Interesting

    Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

    For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.

    (BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)

  15. Cool, but it has severe downsides. by mrjb · · Score: 2, Interesting

    The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.

    But overall, it just seems like a Bad Idea. It will be as reliable as screen-scraping in browsers, and is therefore best avoided for the same reasons.

    Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the simplest tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.

    --
    Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book