MIT Offers Picture-Centric Programming To the Masses With Sikuli

FrontPage? by Itninja · 2010-01-21 09:25 · Score: 2, Interesting

Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development....that had to be fixed by a real web developer later.

But on the upside, dedicated FTE's for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.

Re:FrontPage? by Anonymous Coward · 2010-01-21 09:27 · Score: 0

Sounds like LabView - very useful for some things, painfully tedious for others.
Re:FrontPage? by gad_zuki! · 2010-01-21 09:53 · Score: 4, Informative

>And we all know FrontPage went on to become the de facto standard for web development....that had to be fixed by a real web developer later.
Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.
While I'm certainly no fan of FrontPage, I feel that it wasn't much worse than Mozilla Composer or other WYSIWYG HTML composers.
Re:FrontPage? by ArhcAngel · 2010-01-21 10:03 · Score: 1

That was my first thought as well. I programmed in HP VEE and Labview in the early nineties.

--
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
Re:FrontPage? by mustafap · 2010-01-21 10:18 · Score: 1, Insightful

>Do you want to democratize technology or just have it controlled by elites?
Neither. I'd like to see people who wish to program, learn how to.

--
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
Re:FrontPage? by Yvan256 · 2010-01-21 10:19 · Score: 1

The problem with FrontPage wasn't the users, it was the code that it produced.
Re:FrontPage? by Xiaran · 2010-01-21 10:19 · Score: 1

Elite or competent? I'm all for people tinkering with software in their spare time; the problem is that people who aren't qualified start thinking *everything* in software development is as simple as the tiny little things they are doing. Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).
Re:FrontPage? by idontgno · 2010-01-21 11:05 · Score: 1

Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.
Thousands of cars on cinderblocks and dozens of houses with flooded basements are testimony that sometimes, paying someone is the only thing that isn't ridiculous. There's DIY, and there's "OMG you are SO in over your head." Anyone whose software development abilities are so stunted that the "advancement" outlined in TFA would help them is absolutely in the latter category.

--
Welcome to the Panopticon. Used to be a prison, now it's your home.
Re:FrontPage? by BobMcD · 2010-01-21 11:09 · Score: 4, Insightful

Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).
From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes, it did eventually need to be changed out, but in its time, for its purpose, you can't really fault it. It truly did work.
Re:FrontPage? by Anonymous Coward · 2010-01-21 13:42 · Score: 0

Changed out? Then why is the local hospital here hiring VB developers?
Re:FrontPage? by ilsaloving · 2010-01-21 14:15 · Score: 1

There is a minimum level of skill and talent required to do anything. The only thing that happens when you make something "so simple anyone can do it" is a minefield of crap software. Instructing a computer to do something requires the ability to think abstractly, and to organize/plan with orders of magnitude more sophistication than "Do I want eggs or pancakes for breakfast?". Arguing that the 'elites' are pushing down the 'DIYers' is disingenuous. A real DIYer will overcome the learning curve of whatever they're trying to do, because they care enough to put the effort into it.
There's a big difference between that, and someone who just wants to slap a bunch of widgets together and expect it to work.
The end result is a bunch of people who don't know what the hell they're doing, but demand that they be called programmers. You also get other people who, when needing specialized software to run some key part of their business, look to these non-skilled 'programmers', and then turn to the skilled people and complain how unreasonable their higher rates are. It's downright insulting.
It's the same mindset (or lack thereof) that leads many people to think Y2K was a big waste of time and money because 'nothing happened'.
Hell, it's (relatively) easy to program an iPhone too. What do we have? Tens of thousands of apps that emit varying types of fart sounds.
Re:FrontPage? by AardvarkCelery · 2010-01-21 14:58 · Score: 3, Insightful

Yeah, that's real easy for a programmer to say. Ever used a brownie mix? I'll bet a pastry chef would say, "I'd like to see people who wish to bake brownies actually learn how to bake brownies properly." Tools like Sikuli are the programming equivalent to brownie mix. It's easy gratification. (... or at least easier than learning to capture part of the screen and then do fuzzy image pattern matching on it.) If I were a very casual, light duty programmer, this would be pretty helpful sometimes.
Re:FrontPage? by Anonymous Coward · 2010-01-21 17:47 · Score: 0

"the same way they want to fix things around the house or fix the car"
And that's why we have mechanics and builders. Why should coding be any different? Just because you went to W3C and did a tutorial does not make you a web developer. Shit, I just went to WebMD, I'd like to diagnose some illnesses now.
Sure, if someone wants to make their own homepage, fine, go ahead and tinker. But would you trust a financial management web app with all your personal details if it was written by your accountant?
Just because code is text (and can be readily generated) doesn't mean that anyone with notepad should be able to write it.
Re:FrontPage? by CodeBuster · 2010-01-21 18:50 · Score: 1

Do you want to democratize technology or just have it controlled by elites?
If I have to be in charge of cleaning up their mess then I would prefer that they leave development to the professionals. I think that is what the parent is getting at. We professionals are tired of rescuing dabblers who get in over their heads because their "easy to use" tools are just powerful enough to get them into trouble, but not powerful enough to get them out. If people agree to be responsible for their own results, good or bad, then I say let them do as they wish. Unfortunately, it never seems to work out that way in the real world. If their experimental project causes me to have an unplanned interruption in my own development work to help them out of a jam, I am probably going to be unhappy.

a lot of computer stuff is pretty easy and paying someone is ridiculous.
The famous last words of many who come to IT, hat in hand, and ask us to "fix it" or "make it work". It may be easy for the Slashdot crowd, but in my experience real computer knowledge is less common in the general population than you might otherwise believe.
Re:FrontPage? by blippo · 2010-01-22 01:38 · Score: 1

Hear hear. Wish I had modpoints.
It's not the writing that's hard, it's the thinking, and that is why drawing flowcharts that aren't expressive enough just makes simple problems simple, and real problems impossible to solve.
However, this is not really about programming using graphical tools; it's about finding out where to send keyboard and GUI events in an automation script.
There are other ways, but none that is 100% universal.
Not all applications use controls that you can access programmatically (some are just drawings on a bitmap - like Java, Flash etc).
So using image recognition is a nice touch, and better than guessing, which is the alternative.
Now, there is limited use for this, but some corporations are willing to spend some serious money on application integration no more sophisticated than sending keyboard events to old applications, so it might come in handy.
Re:FrontPage? by azmodean+1 · 2010-01-22 02:45 · Score: 1

Ever baked brownies without a mix? Surprise! it's not actually hard. Simple Brownie Recipe
I'm NOT a chef, and I think people SHOULD learn to cook, and do basic home repair, and understand how their car works, and if they have a task that remotely requires it, yes they should learn how to program.
Sure there are extremely difficult tasks in each of these areas that shouldn't be attempted by an amateur, but the basics of all of them are readily accessible, and beneficial to learn, if for no other reason than to occupy yourself with something other than mindless consumption of sitcoms. (or slashdot articles)
Back on topic, I AM a software engineer, and I am highly skeptical about the "natural language programming" concept, but that isn't what this article is about. This is just another system for scripting GUI events with a simple interface. If you RTFA, you would find out that the system uses Jython as its scripting language, so the actual code isn't any more "visual" than any other programming task; it's just that the (often very confusing, even for pros) GUI API has been replaced with a visual system that allows the script to act as if it were a user. I think a system like this has its place if you don't need lots of speed, which covers a large number of applications.
Re:FrontPage? by MarbleMunkey · 2010-01-22 02:50 · Score: 2, Insightful

Just because code is text (and can be readily generated) doesn't mean that anyone with notepad should be able to write it.
Wrong. It means exactly that. BASIC was a language meant to make it easier for people to program, and it was THE introduction to programming for many people. Now, am I arguing that BASIC is the right language for any given task? no. I haven't coded in BASIC for 15 years. But making the barrier to entry easier can only be a good thing for attracting new blood.
Re:FrontPage? by Anonymous Coward · 2010-01-22 03:20 · Score: 0

Why is it whenever there is talk of a poorly implemented piece of software created with "ease of use" as its central focus the topic of conversation immediately reverts to Microsoft?
Re:FrontPage? by BobMcD · 2010-01-22 03:34 · Score: 1

That's a false assumption. VB = Microsoft, and more importantly, FrontPage = Microsoft.
Re:FrontPage? by cyberthanasis12 · 2010-01-22 07:15 · Score: 1

Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting...
You make the implicit assumption that windows and icons and clicks are easier for non-techies. I am afraid it is not so. The non-techies are functional human beings and they are able to understand ordinary scripts. And it is actually easier to learn text than icons.
I am not making this up. Back in the early 90's, when Windows was novel, I often sold computers to complete novices. What I saw was that it was easier for them to understand commands (DOS style) than icons (Windows style). It took them a long time to get accustomed to icons, buttons and pull-down menus (for example, "in which menu is the replace command? why can't I type replace and do the job").
The same thing happened to me when a friend bought a Macintosh and told me how "easy" it was to operate the machine. I could not figure out how to remove a file. When my friend opened the file manager, clicked on the file (holding the mouse button pressed), and dragged it to an icon, which was the "garbage", I could not believe how time consuming, clumsy and unintuitive it was. Of course, after years of getting accustomed to Windows, it was straightforward to learn the Mac too.
Off topic, what happens if you try to run the Sikuli script on a computer with a different resolution, or a different font, or different (customized) icons?
Re:FrontPage? by thetoadwarrior · 2010-01-22 07:56 · Score: 1

HTML and CSS are pretty easy to learn, at least enough to produce the shit people produce with FrontPage.

Sikuli is good, though it would appear it takes control of your computer, so really it's pretty useless aside from doing batch repetitive jobs. You can probably get around that by learning more Python. The demo didn't interest me, but I'm sure loads of people will love it for automating tasks.

FrontPage on the other hand was awful and produced loads of awful sites that unfortunately affect more people. When it's a personal site that's fine, but when you used to get businesses knocking out shit websites in FrontPage, I don't think anyone can justify it by comparing it to DIY jobs or saying web developers are expensive. Companies should consider some level of accessibility and you'll never really get that from FrontPage. If you learn to do HTML, CSS and JavaScript then just do it in Notepad++.

And no one *needs* a master's in CS. A lot of professional developers don't have one.
Re:FrontPage? by thetoadwarrior · 2010-01-22 07:58 · Score: 1

Actually brownies (or a lot of food) isn't that hard to cook if you follow the instructions and people would probably be healthier if they would actually learn how to cook rather than rely on chemically pre-made foods. It's no surprise some of the fattest nations are full of people who can't cook.
Re:FrontPage? by thetoadwarrior · 2010-01-22 08:02 · Score: 1

Yes, if you ignore the fact that VB macros cause loads of security headaches and issues with compatibility.

Imo, VB wasn't actually that bad in the hands of people that took a real effort to learn how to program but in the hands of most it was a freaking nightmare and caused loads of problems.

DIY programming is fine for personal use but it should never be used for businesses.
Re:FrontPage? by BobMcD · 2010-01-22 08:10 · Score: 1

Security issue, yes. Loads of problems, yes.
Doesn't obviate what I said. In the moment lots of dollars changed hands because of it. That, among all else, is what a business would use to determine 'success'.
Re:FrontPage? by mustafap · 2010-01-22 10:38 · Score: 1

>Ever used a brownie mix
Yes, but I didn't consider myself a chef afterwards

--
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
Re:FrontPage? by wondershit · 2010-01-23 03:08 · Score: 1

Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car.
I really doubt that. And from what I hear you can't easily fix most issues with cars anymore.
Re:FrontPage? by Anonymous Coward · 2010-01-27 13:20 · Score: 0

This is only news. This is not new. From the overview I just saw (briefly), there is a program called AutoIt (http://www.autoitscript.com/autoit3/) that has been around for a while now. In the 1990s I think Norton had something similar. This just seems like an old idea coming around again with a new twist. Personally, from what I see in the demo, the images slow it down. The images make it a little easier, but do you want everyone programming a computer? Really? What will the broken cup holder types do damage wise?

Potential by zero0ne · 2010-01-21 09:30 · Score: 2, Insightful

Especially for Testing your GUI.

This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).

Re:Potential by Anonymous Coward · 2010-01-21 09:45 · Score: 0

The only problem is that it is non-deterministic. It works off of pattern matching. There are cases where you can have statistical matches that are wrong, such as having multiple "Add" or "Remove" buttons on your screen. Not to mention that upgrades with new icons/graphics/layout/text will add "noise" to the search domain.
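To make the multiple-button case concrete, here is a minimal sketch in plain Python. It is not Sikuli's matcher: the screen is a toy grid of characters, matching is exact rather than fuzzy, and `find_all` / `find_unique` are hypothetical helper names. It only shows why a script should probably refuse to click when a pattern matches more than once.

```python
# Toy model: the "screen" and the "button image" are grids of characters.
# Real tools match pixels with a similarity score; this sketch matches exactly.

def find_all(screen, template):
    """Return the top-left (row, col) of every occurrence of template in screen."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c:c + tw] == template[i] for i in range(th)):
                hits.append((r, c))
    return hits


def find_unique(screen, template):
    """Return the single match, or raise if the pattern is missing or ambiguous."""
    hits = find_all(screen, template)
    if len(hits) != 1:
        raise LookupError("expected exactly 1 match, found %d" % len(hits))
    return hits[0]


# Two identical "Add" buttons on one screen: the situation described above.
SCREEN = [
    "..........",
    ".[Add]....",
    "..........",
    "....[Add].",
    "..........",
]
ADD_BUTTON = ["[Add]"]

matches = find_all(SCREEN, ADD_BUTTON)
```

Here `matches` holds two positions, so `find_unique(SCREEN, ADD_BUTTON)` raises instead of silently clicking whichever button happened to be found first.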
Re:Potential by Jonah+Hex · 2010-01-21 09:54 · Score: 1

Watching the YouTube demo, I immediately thought of how basic this is compared to AutoIT's functions, and even the quick record function is faster to "program" with than this screenshot function.
It says it can tolerate some changes, but what if there is a completely different visual theme installed? What if a drop down is not on the same item it was when you made the script? AutoIT can take care of this by reading the underlying GUI code to allow for these kind of things. As someone who has been automating OS/Software installs since before Windows, I know you can not expect things to work the same way every time when doing so.
Jonah HEX

--
Horror & SciFi Erotic Nudes
Re:Potential by BitZtream · 2010-01-21 10:01 · Score: 1

The MS test crap in the latest versions of VisualStudio does it as well, and it'll be happy to find a button (if it's a standard control) to click on using other data rather than mouse coordinates.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Potential by gad_zuki! · 2010-01-21 10:01 · Score: 1

>This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).
Right, it's AutoHotKey/AutoIT with a nicer OCR library. Perhaps this will light a fire under the butts of the AutoHotKey devs and get them to add in some smarter screen reading and browser integration.
Re:Potential by Anonymous Coward · 2010-01-21 10:57 · Score: 1, Interesting

Eggplant says hi.
As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out. Web sites change way too often for something like this.
The key to writing maintainable scripts is finding and hooking into the property that is least likely to change. If you're automating Windows Forms .NET apps, you might be able to get the actual variable name. If you're automating web pages you could look at the id or name of the control. You can look at the text of a button or the label of a textbox. You find whatever you can that won't change.
On Windows, use AutoIT if you want something free. There are better commercial tools but they start in the hundreds of dollars and only go up from there.
For web automation, look at watir, WebDriver/Selenium, or WatiN.
On Macs you get these nice tools called AppleScript and Automator. These are made for end users. They don't use the UI, but instead use an interface made just for automation.
If you can at all avoid it, I recommend not using image recognition tools. They're extremely fragile. That said, sometimes it can't be avoided. I'll probably take a look at the source to see if there's anything I can use in those few cases where image recognition is unavoidable.
Re:Potential by jdimatteo · 2010-01-21 16:08 · Score: 2, Interesting

I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise level automated GUI testing tools costing on the order of thousands of dollars per user licence.
Some of the comments below on maintainability problems seem pretty superficial. To ease maintainability you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot to a variable and then refer to that variable throughout your test, so if a button happens to change dramatically, you make the change in potentially one place in your code instead of every time it is used in a click. The fact that the tool appears simple (not too many bells and whistles) and is based on Python seems to be a major advantage for maintainability.
Check out this interesting academic paper which specifically addresses using Sikuli for automated GUI testing: "GUI Testing Using Computer Vision, CHI 2010" at http://sikuli.csail.mit.edu/documentation.shtml
Has anybody actually used Sikuli? I'd be very curious if anybody has used this for automated GUI testing in a corporate environment...
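A rough sketch of that abstraction idea, in plain Python so it runs without Sikuli. Everything here is made up for illustration: the `ImageRegistry` class, the file names, and the `click` callable, which stands in for whatever click-on-image function the automation tool provides.

```python
class ImageRegistry:
    """Maps logical control names to screenshot files (hypothetical helper)."""

    def __init__(self):
        self._images = {}

    def register(self, name, path):
        self._images[name] = path

    def path(self, name):
        return self._images[name]


def run_login_test(images, click):
    """A test script that only ever refers to controls by logical name."""
    click(images.path("username_field"))
    click(images.path("submit_button"))
    click(images.path("submit_button"))  # used twice, still mapped in one place


# Stand-in for the tool's click: just record which image was asked for.
clicked = []
images = ImageRegistry()
images.register("username_field", "username_field.png")  # hypothetical files
images.register("submit_button", "submit_gray_square.png")
run_login_test(images, clicked.append)

# The button is redesigned: one line changes, the test script does not.
images.register("submit_button", "submit_blue_round.png")
clicked_after = []
run_login_test(images, clicked_after.append)
```

The screenshots still have to be retaken when the UI changes, so this does not answer the fragility objection; it only limits each retake to one edit.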
Re:Potential by ComaVN · 2010-01-22 01:25 · Score: 1

I just tried it out for an hour or so with our web application, and it seems to be doing its job. One thing that it didn't manage to do is click somewhere relative to the matched image. It always seems to click in the middle of the image, which is annoying when you want it to click one checkbox out of many based on its preceding label.
Perhaps it's possible to use some kind of nesting, so you could try to find the image of the checkbox inside a previous match that includes the label, but I didn't find out how, because the documentation is kind of sparse atm, apart from the tutorials.
I'd say this is very interesting software, but only alpha quality at this stage. (Opening and saving "projects" is quite cumbersome, for instance, and I've seen several stack traces in its debug pane.)

--
Be wary of any facts that confirm your opinion.
Re:Potential by ubersoldat2k7 · 2010-01-22 01:51 · Score: 1

AFAICSOTV (as far as I can see on the video) you can issue a tab keypress to jump to different controls. Maybe you can try that out.
Re:Potential by ComaVN · 2010-01-22 03:12 · Score: 1

Yes, but that kind of defeats the purpose of using image recognition so you don't have to care about the exact layout of your application. Inserting a new control on the page could break the test if you used tabs.

--
Be wary of any facts that confirm your opinion.
Re:Potential by Anonymous Coward · 2010-01-22 03:37 · Score: 0

Abstraction is definitely your friend in automated testing, but image recognition is still the least maintainable way of interfacing with a UI. Any change to the UI requires fixing up your framework. That takes time that a smarter framework would not need.
Imagine there's an input button on a webpage with an ID of submit and the text submit on it. Now, the text is changed from submit to OK. A smart framework that hooked on the id doesn't need to be updated at all. With an image recognition framework you need to go to that button in the UI and take your new screenshot. How long does it take to navigate to the button to begin with?
Now the next day the style changes. Your button is no longer square and gray, but round and blue. The smart framework still doesn't need an update. With the image recognition framework you need to update your screenshot again.
The next day the id changes. With the smart framework you change the mapping. With an image recognition framework you have to navigate and take your screenshot again.
GUIs change way too much to have to be continually updating your mapping. Don't use image recognition if you can avoid it.
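The three days above can be played out in a toy model. This is only a sketch under loud assumptions: the button is a plain dict, a "screenshot" is reduced to the tuple of things that affect pixels, and the new id "confirm" is invented because none is named above. In this model an id-only change leaves the pixels alone, so the image locator is counted as surviving day three.

```python
def by_id(button):
    """Key a locator on the control's id (what a DOM-aware tool would hook)."""
    return button["id"]


def by_image(button):
    """Key a locator on appearance only: a stand-in for a stored screenshot."""
    return (button["text"], button["shape"], button["color"])


def count_updates(make_key, button, changes):
    """Apply each UI change in turn and count how often the locator must be redone."""
    button = dict(button)        # don't mutate the caller's dict
    key = make_key(button)
    updates = 0
    for change in changes:
        button.update(change)
        if make_key(button) != key:
            key = make_key(button)   # re-map the id / retake the screenshot
            updates += 1
    return updates


BUTTON = {"id": "submit", "text": "submit", "shape": "square", "color": "gray"}
CHANGES = [
    {"text": "OK"},                        # day 1: label changes
    {"shape": "round", "color": "blue"},   # day 2: style changes
    {"id": "confirm"},                     # day 3: id changes (hypothetical new id)
]

id_updates = count_updates(by_id, BUTTON, CHANGES)
image_updates = count_updates(by_image, BUTTON, CHANGES)
```

With these inputs the id locator needs one fix and the image locator two. The real cost gap is likely larger than the counts suggest, since each image fix means navigating to the control and capturing it again.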
Re:Potential by ElizabethGreene · 2010-01-22 04:16 · Score: 1

I was thinking the same thing. Where this would be really handy would be in applications that paint their own windows and don't expose the gui handles for AutoIt to latch on to. Specifically, this would work great for Great Plains or online poker clients. :)
-ellie
Re:Potential by TheLink · 2010-01-22 04:24 · Score: 1

I wonder how Sikuli copes with "click page down till you find the icon you need to actually click on".

How about if the stuff you click on might look rather different each time? e.g. the IP address might not be 0.0.0.0 but something else the second or third time around.

And what if the stuff you need to click on can only be identified by text or an icon that you don't click on - you actually click on the stuff to the right (or left or whatever) of it. This one isn't a biggie - it shouldn't be too difficult to get Sikuli to search for the text, then search from an offset for the stuff to click on, or click on a relative offset point.
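The relative-offset idea is mostly arithmetic on the rectangle a match returns. A minimal sketch, assuming a match is just (x, y, width, height) in screen pixels; `Match`, `click_point` and `region_right_of` are hypothetical names, not Sikuli's API.

```python
from collections import namedtuple

# Where a pattern was found on screen: top-left corner plus size, in pixels.
Match = namedtuple("Match", "x y w h")


def center(match):
    """The point a center-clicking tool would hit."""
    return (match.x + match.w // 2, match.y + match.h // 2)


def click_point(match, dx=0, dy=0):
    """Center of the match shifted by a fixed offset, e.g. onto a nearby checkbox."""
    cx, cy = center(match)
    return (cx + dx, cy + dy)


def region_right_of(match, width):
    """A same-height strip just right of the match, to search for the real target."""
    return Match(match.x + match.w, match.y, width, match.h)


# Hypothetical numbers: a label found at (100, 200), 80 px wide and 20 px tall.
label = Match(100, 200, 80, 20)
target = click_point(label, dx=60)       # 60 px right of the label's center
strip = region_right_of(label, 150)      # or: search only this strip
```

A fixed offset breaks when spacing changes, while searching the strip next to the label tolerates that at the cost of a second match.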

But yeah it's interesting.
--
- Too many replies beneath your current threshold

MMO macro maker? by visgoth · 2010-01-21 09:32 · Score: 4, Interesting

This looks like a powerful tool for gold / isk / whatever farming. I'm tempted to resurrect my eve account and see if I can make an auto-miner script.

--
My patience is infinite, my time is not.

Re:MMO macro maker? by BoppreH · 2010-01-21 10:16 · Score: 1

Things to take into account:

- selecting and clicking on see-through buttons (the background will change too much)
- the program access to the actual game for seeing, clicking and typing
- the game's anti-hack detection / counter-measures
- macro playing lag (see video)

But it seems very promising nevertheless.
Re:MMO macro maker? by burkmat · 2010-01-21 10:36 · Score: 1, Offtopic

I don't know how much experience you have in EVE, but generally, if you're AFK you're dead meat. Suiciding miners even in hisec is quite fashionable these days.
Re:MMO macro maker? by Arimus · 2010-01-21 10:40 · Score: 1

Add in the number of pilots who even if they're anti-pirate operate a KOS policy when it comes to macro miners....

--
--- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
Re:MMO macro maker? by visgoth · 2010-01-21 11:01 · Score: 1, Offtopic

I've done a fair bit of mindless semi-afk mining during my time playing eve, and never had much trouble with suicide attackers, can flippers, or other such stuff. I'd imagine that taking the usual minimal precautions like parking in a dead end, low traffic system would work relatively well.

Depending on how robust sikuli is, it might be possible to make a mission running macro, which could be even safer than blasting rocks (with the right ship setup, and such). Barring that I'd likely use sikuli on a second account to automate monkey work. Things like post-mission looting/salvaging, hauling, etc. are wonderful candidates for macro abuse.

--
My patience is infinite, my time is not.
Re:MMO macro maker? by Anonymous Coward · 2010-01-21 17:00 · Score: 0

You don't have to have the background up in EVE; you can have it completely covered with UI windows. Warp in to a system, bring up your windows, scan for rocks and let the script do the rest. This is a definite win for farmers. Err, nvm, this is a win for farmers' managers, who can make one gold farmer do the work of 10 farmers.

Better by pavon · 2010-01-21 09:35 · Score: 2, Interesting

Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identical-looking (empty) text fields, with the only distinguishing factor being the label beside them, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as much as other GUI scripting tools that rely on pixel location).
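One way to handle the identical-text-fields case, sketched in plain Python: match the label (which is unique), collect every match of the empty field, then pick the field on the same row that sits closest to the label's right edge. The rectangles and the `field_for_label` helper are assumptions for illustration; a real tool would supply the match rectangles.

```python
from collections import namedtuple

Rect = namedtuple("Rect", "x y w h")  # top-left corner plus size, in pixels


def overlaps_vertically(a, b):
    """True if the two rectangles share at least one row of pixels."""
    return a.y < b.y + b.h and b.y < a.y + a.h


def field_for_label(label, fields):
    """Pick the field on the label's row that starts nearest to its right edge."""
    right_edge = label.x + label.w
    candidates = [f for f in fields
                  if f.x >= right_edge and overlaps_vertically(label, f)]
    if not candidates:
        raise LookupError("no field to the right of the label")
    return min(candidates, key=lambda f: f.x - right_edge)


# Hypothetical form: three identical empty fields, told apart only by labels.
fields = [Rect(120, 10, 200, 20), Rect(120, 40, 200, 20), Rect(120, 70, 200, 20)]
email_label = Rect(10, 42, 60, 16)   # sits on the second row
chosen = field_for_label(email_label, fields)
```

This assumes labels sit to the left of their fields; a form with labels above the fields would need the same idea rotated by ninety degrees.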

Re:Better by Anonymous Coward · 2010-01-21 09:42 · Score: 0

This isn't that new. What about Logo or Turtle or whatever it was called back in the '80s. Programming with pictures.
Re:Better by BitZtream · 2010-01-21 09:53 · Score: 1

I can think of at least 3 ways of doing (scripting gui apps that aren't scriptable) already that have been around for years.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Better by Anonymous Coward · 2010-01-21 11:37 · Score: 0

GUI automation has been around for quite some time.
I personally have written programs to automate GUIs both web pages and desktop applications.
What is new here is the unnecessary extra work of image recognition.
I hope it doesn't try to do recognition every time and instead stores the UI element and uses the element directly.
What happens if your background changes?
Does the script break?
Re:Better by Anonymous Coward · 2010-01-21 13:46 · Score: 0

Just remember to never change themes.
Re:Better by CronoCloud · 2010-01-21 18:40 · Score: 1

no, Logo was simple programming to make pictures. The turtle was a drawing point, whether on screen or paper.
Re:Better by Unequivocal · 2010-01-22 07:47 · Score: 1

Your post would be better if you named the three you're thinking of. I wonder if they're the same ones I'm thinking of..

Click Fraud Boosters Away!! by Anonymous Coward · 2010-01-21 09:36 · Score: 0

Sounds like it would be a great program to commit massive clickfraud. Just take a screenshot of a particular google ad-link in your browser and ask it to click it. Install script on hundreds of computers/ run it thousands of times and you have a great way to commit click fraud.

Re:Click Fraud Boosters Away!! by BitZtream · 2010-01-21 09:58 · Score: 1

There are far easier ways to commit click fraud than actually looking at the screen to do it. The ad companies tend to ignore the same request multiple times from the same IP so this changes nothing.
People who commit 'click fraud' aren't writing crappy little screen scrapers to do it; it's far easier and faster to write a plugin for Firefox to do what you're saying and just find the text of your ad on the page and trigger the link. No need to futz with what's displayed or 'moving the mouse' to the right spot, you just tell Firefox to find the link and trigger it.
A relatively simple WebKit wrapper would work equally well.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

My grandmother knows python by Anonymous Coward · 2010-01-21 09:38 · Score: 5, Insightful

"Computer users with rudimentary skills"..... "with a basic understanding of Python"?

Re:My grandmother knows python by BitZtream · 2010-01-21 09:59 · Score: 0

You're reading a story about MIT on slashdot.
Two groups that are so utterly disconnected from the real world that they both have no idea why their favorite toy hasn't taken over the world even though it's the simplest, most efficient, easiest to use, most feature rich (insert whatever here) on the planet.
Most of both groups probably think grandma knows assembly as well.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:My grandmother knows python by Fred_A · 2010-01-21 10:17 · Score: 5, Funny

"Computer users with rudimentary skills"..... "with a basic understanding of Python"?
Computer users with rudimentary skills who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.

--

May contain traces of nut.
Made from the freshest electrons.
Re:My grandmother knows python by Anonymous Coward · 2010-01-21 10:34 · Score: 0

I don't see why not.
Re:My grandmother knows python by Alex+Belits · 2010-01-21 14:36 · Score: 1

Moar liek BASIC understanding of a python.

--
Contrary to the popular belief, there indeed is no God.
Re:My grandmother knows python by AardvarkCelery · 2010-01-21 15:05 · Score: 3, Informative

If a friend wanted to learn just enough programming to do a few light chores, what would you recommend? Python is arguably one of the easiest languages to learn. Randy Pausch used it for Alice, which has been successful for teaching middle school girls how to program. So if "computer users with rudimentary skills" means rudimentary programming, then that works for me.
Re:My grandmother knows python by iluvcapra · 2010-01-21 17:18 · Score: 1

Python is arguably one of the easiest languages to learn.
I can't wait to explain to my mom the difference between four spaces and one tab, just to name one of Python's endless oddities.

--
Don't blame me, I voted for Baltar.
Re:My grandmother knows python by owndao · 2010-01-21 21:17 · Score: 1

I can't wait to explain to my mom the difference between four spaces and one tab, just to name one of Python's endless oddities.
There is, for OS X, QuicKeys, at http://startly.com/ and as I recall, at its basic level it requires logic decisions, and it scales up by allowing any scripting language to be called.

--
Be as you would have the world become.
Re:My grandmother knows python by Anonymous Coward · 2010-01-22 01:00 · Score: 0

That's right. Now your grandmother will be able to write a program to configure her IP address, instead of doing it manually.
Re:My grandmother knows python by ubersoldat2k7 · 2010-01-22 01:59 · Score: 1

That's right. Now your grandmother will be able to write a program to configure her IP address, instead of calling you to do it manually.
TFTFY
Re:My grandmother knows python by Anonymous Coward · 2010-01-22 02:41 · Score: 0

You missed the joke. See the image and example video on the Sikuli site.
Your grandmother will be able to write a program to configure her IP address, because she totally knows what an IP address is, and why she wants to configure it.

The Cow pat model by Anne+Thwacks · 2010-01-21 09:41 · Score: 5, Funny

Yeah - let's hear it for a new development model:

For years I have been asking for a software development tool that allows me to write PHP code by throwing cow-pats at the screen with the Wiimote.

And my colleagues want a tool that allows dispatching my bugs with the Wii gun attachment they use in "Quantum of Solace".

--
Sent from my ASR33 using ASCII

MIT can't afford real microphones by Hadlock · 2010-01-21 09:41 · Score: 1

The subtitles were a bit of a surprise. Can MIT not afford better than built-in microphones on cheap laptops? Between her vaguely Asian accent, the poor quality of the audio (seriously, you're TELLING people how to do something, the audio is important here - did they record this in a shower stall or something? my netbook's audio sounds 100x better than this), and what is apparently some sort of wacky audio encoding, she is basically impossible to understand. People who speak English as a second language aren't going to be able to understand this, thank god they did the subtitles.

Neat concept though.

--
moox. for a new generation.

Re:MIT can't afford real microphones by pclminion · 2010-01-21 09:58 · Score: 1

On the contrary, my experience has been that non-native speakers of English are actually better at understanding other non-native speakers. I don't know why that is, but intuitively it makes sense -- non-native speakers probably learned from a diversity of other non-native speakers.
I was at a WinHEC panel session in 2008 and the panel leader had absolutely horrible English (I'm sure he was intelligent, but he wasn't intelligible). Somebody else, clearly of another racial background (the specific ethnicities are unimportant) stood up and asked a question, also in completely unintelligible English. The questioner and speaker went back and forth for several minutes speaking. Other non-native speakers in the audience were nodding their heads emphatically, indicating they could understand as well. I looked around and every American in the room seemed completely baffled.
Re:MIT can't afford real microphones by Yvan256 · 2010-01-21 10:17 · Score: 1

That's because non-native speakers can't string the words together, they have to cut them up individually. If that makes any sense.
Re:MIT can't afford real microphones by mfnickster · 2010-01-21 18:19 · Score: 1

That gibberish they spoke was cityspeak; gutter talk. A mishmash of Japanese, Spanish, German, what have you.
I didn't really need a translator--I knew the lingo, every good engineer did...but I wasn't going to make it easier for them.

--
"Slow down, Cowboy! It has been 3 years, 7 months and 26 days since you last successfully posted a comment."
Re:MIT can't afford real microphones by Sulphur · 2010-01-22 02:59 · Score: 1

Can a cave man do it?
Re:MIT can't afford real microphones by isilrion · 2010-01-22 12:54 · Score: 1

As a non-native English speaker, I agree with you. I find it easier to understand another non-native speaker than a native speaker (with one notable exception: if the speaker is French, I can't understand him).
For me, Bulgarians are the easiest to understand. I guess that is because, thanks to their native tongue, their "r"s are very strong - just like in Spanish.

High? by instagib · 2010-01-21 09:42 · Score: 1

FTFA: "Sikuli -- which means God's eye in the language of the Huichol Indians in Mexico". Mexican Indians love their hallucinogenic Peyote. On the other hand, MIT researchers want the masses to program with the mouse. Well, I know about "correlation is not causation", but MIT sure is an interesting place to be.

Expect by Anonymous Coward · 2010-01-21 09:48 · Score: 0

This is a GUI version of Expect. Nothing really groundbreaking. It will also break as soon as the app changes how it looks, just like Expect. I hate expect passionately.

Re:Expect by Razalhague · 2010-01-21 10:28 · Score: 1

How would it not break? You don't expect your regular program to work if the API it's using changes, do you?

Right hands great- chances are more harm than good by Anonymous Coward · 2010-01-21 09:49 · Score: 1, Interesting

Yeah- this might work until the icons change. I don't see this working too well in practice. I don't know about Mac- but on my Ubuntu system the icons got updated last week. And it happens often enough that keeping these scripts updated would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in practice too because they "can" do it- it's just that the results of their work take more maintenance. It reminds me of this .bat file written for a video store that backs up a database to a flash drive. If it had only had a statement to check whether the flash drive was present and alert the user, they wouldn't have wasted $80 calling me to come and find out why the backup program wasn't working. Seriously dumb programmer. In the right hands this kind of thing is good. In the wrong hands it is bad.
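For what it's worth, the missing check in that backup script is only a few lines. A minimal sketch in Python rather than batch, assuming a hypothetical `backup_database` helper and made-up paths (none of this comes from the actual video store script):

```python
import shutil
from pathlib import Path


def backup_database(db_file, drive_root):
    """Copy db_file onto drive_root, or return a message explaining why not."""
    db_file = Path(db_file)
    drive_root = Path(drive_root)

    # The check the .bat file lacked: is the flash drive there at all?
    if not drive_root.is_dir():
        return f"Backup NOT done: {drive_root} is missing. Plug in the flash drive."
    if not db_file.is_file():
        return f"Backup NOT done: {db_file} does not exist."

    target = drive_root / db_file.name
    shutil.copy2(db_file, target)  # copy2 also keeps the file's timestamps
    return f"Backup written to {target}"
```

Whatever the function returns can be shown to the user, so a missing drive produces a readable message instead of a silent failure.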

Program, NOT code. Think MACRO by SmallFurryCreature · 2010-01-21 09:52 · Score: 3, Insightful

From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.

But it seems very intresting once you got past this difference. Macro's are very handy for testing in my experience but often have a problem because a tiny mis-alignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well that would certainly make automated tests a bit easier.

Interesting stuff. Just don't think you will be writing software with this.

--

MMO Quests are like orgasms:

You may solo them, I prefer them in a group.

Re:Program, NOT code. Think MACRO by eulernet · 2010-01-21 11:17 · Score: 1

Interesting stuff. Just don't think you will be writing software with this.
For a few years now, programming has been equivalent to placing Lego bricks in the correct order (I'm working with Microsoft .NET and tons of components).
So I'm not very surprised by the approach, as long as we can find all the possible varieties of pieces.
Re:Program, NOT code. Think MACRO by Anonymous Coward · 2010-01-21 11:52 · Score: 1, Interesting

Don't use a tool like this for testing. Start with AutoIt or nunit+white, and look at commercial tools if those don't do what you need.
Re:Program, NOT code. Think MACRO by Anonymous Coward · 2010-01-21 12:48 · Score: 1, Interesting

Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macroing system like AutoIt or AutoHotKey.
Re:Program, NOT code. Think MACRO by Anonymous Coward · 2010-01-21 14:44 · Score: 0

troll.
Re:Program, NOT code. Think MACRO by Anonymous Coward · 2010-01-21 15:48 · Score: 0

"Macro's" ?!? Don't leave us hanging. Macro's what? Macro's pants? Macro's speed? Oh wait, you meant "macros".
Re:Program, NOT code. Think MACRO by Anonymous Coward · 2010-01-21 16:33 · Score: 0

From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.
But it seems very intresting once you got past this difference. Macro's are very handy for testing in my experience but often have a problem because a tiny mis-alignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well that would certainly make automated tests a bit easier.
Interesting stuff. Just don't think you will be writing software with this.
Quit intreegheen. It's quite intriguing stuff.
Re:Program, NOT code. Think MACRO by listentoreason · 2010-01-22 13:47 · Score: 1

I could not get it to run on Vista64, but my impression is the same; it allows you to write macros that target GUI components by fuzzy graphical matching of little screen shot snippets to what is currently on the screen. They provide an (unfortunately somewhat tedious) video demo that illustrates what it does.
It would be a boon for working with Lightroom, which has horrific keybindings; no way to rebind keys, and many common functions that have no keys at all (like red-eye reduction, where you must interact with a graphical element). I have been able to get around some of this with AutoHotKey, but I'd love to be able to bind a hotkey to a mouse click on a specific GUI component, regardless of its current absolute x,y coordinate on my screen. That's exactly what Sikuli is supposed to do.
I really wish Adobe would co-opt the awesome dynamic key rebinding mechanism Gimp uses ...

bad VB flashbacks by mirix · 2010-01-21 09:59 · Score: 1

I'm suddenly reminded of horrible apps written in VB97, with no concern for the back end, horrible input kludge, etc.

--
Sent from my PDP-11

Re:bad VB flashbacks by YourExperiment · 2010-01-21 10:39 · Score: 1

I'm suddenly reminded of horrible apps written in VB97
You're 93 versions ahead of your time - VB6 was the last version of Visual Basic before .NET.
Perhaps more to the point, this not only targets a completely different purpose than Visual Basic, but also looks nothing like it whatsoever.
Re:bad VB flashbacks by Anonymous Coward · 2010-01-21 11:06 · Score: 0

Visual Basic 5 was released in 1997, as part of Visual Studio 5. It installed itself in a directory called VB97. VB6, incidentally, installed itself in a directory called VB98.
The smart-assery is weak with this one.
Re:bad VB flashbacks by ClosedSource · 2010-01-21 11:53 · Score: 1

That's OK. For most VB apps there wasn't any "back end".
Re:bad VB flashbacks by Anonymous Coward · 2010-01-21 14:17 · Score: 0

Not to mention that 97-6=91 not 93.
Re:bad VB flashbacks by YourExperiment · 2010-01-22 02:16 · Score: 1

There was still no VB97. Nice try though.
Re:bad VB flashbacks by YourExperiment · 2010-01-22 02:18 · Score: 1

Er, damn - you got me there.
I still don't have a clue how a scripting language with image recognition reminds you of VB though.

Sikuli by Anonymous Coward · 2010-01-21 10:02 · Score: 0

Sikuli velly nice. Near Itari. Parelmo, velly nice. Except warret got storen.

Re:How easy IS it? by 0100010001010011 · 2010-01-21 10:04 · Score: 1

Wow, no one has watched the movie Swordfish have they?

Re:How easy IS it? by Hognoxious · 2010-01-21 10:04 · Score: 1

Have you seen his wife recently?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

CLI by Anonymous Coward · 2010-01-21 10:06 · Score: 0

This is where we get when everything is a GUI. As long as I have a decent shell & environment, I think I prefer shell scripting.

Re:CLI by ubersoldat2k7 · 2010-01-22 02:06 · Score: 1

Does it run on Linux? Doesn't seem so, and that's sad.

This is where we get when everything is a GUI. As long as I have a decent shell & environment, I think I prefer shell scripting.
I was thinking about this same thing. I don't know about Mac or Windows, but I'm able to do anything from the CLI on Linux. This macros toy would be written faster using bash; just google for "bash change IP address". I believe my grandma would find it easier to just copy/paste the answer than knowing how to take a screenshot.

Yes, but can Sikuli be used to write Sikuli? by hellop2 · 2010-01-21 10:06 · Score: 2, Funny

Otherwise it's just not complete, IMHO.

--
How many more years will slashdot have an off-by-one error on your Score in your profile?

Re:Yes, but can Sikuli be used to write Sikuli? by Seor+Jojoba · 2010-01-21 10:43 · Score: 2, Interesting

Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.

Perfect Macro program... by BoppreH · 2010-01-21 10:09 · Score: 1

... but does anyone know if the program is always that slow?

I understand that it has to visually find the button and this is computationally expensive, but the 2~3 seconds lag didn't seem compatible with the task.

On a sidenote, the video states that there's no "internal API" dependence, but it clearly has to send "click" and "type" signals. Is that really OS independent or was it just an overstatement?

Re:Perfect Macro program... by babyrat · 2010-01-21 12:12 · Score: 1

the video states that there's no "internal API" dependence
I suspect they were referring to the internal API of the program being controlled, i.e. COM, CORBA, etc...

lame by Charliemopps · 2010-01-21 10:10 · Score: 2, Insightful

This is the same sort of scripting you can do with many already existing languages. AutoHotkey for example. The only new feature would be the ability to copy the screenshot directly into the program as opposed to taking it outside the program and referencing the file directly. I'd say that this scripting language is actually weaker because of it. As far as using this inside a game... they are already hardened against this sort of thing. For example, next time you're in EVE look at the buttons you use. They are semi-transparent. This is not just for aesthetics. If you take a screenshot of the button, and then change your camera angle the button looks different because what's behind it is different. That doesn't mean you can't script inside EVE, you just have to be a lot more clever than using a script to click on a static image of the GUI. This language would be almost completely useless in any GUI that has any transparency. Which I'd think would include Vista, Win7 and even Macs with the right stuff turned on.

Re:lame by misexistentialist · 2010-01-21 10:31 · Score: 1

Using screenshots seems more effective than instructing autohotkey to click on coordinates
Re:lame by HaeMaker · 2010-01-21 11:12 · Score: 1

So, you tried it and it didn't work?
Re:lame by sky289hawk1 · 2010-01-21 12:33 · Score: 1

The sikuli language supports fuzziness. You can actually have a "close match", and you can set the tolerance.
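To make "close match with a tolerance" concrete, here is a minimal sketch of the general idea on tiny grayscale grids. It is not Sikuli's actual matcher; the `similarity` and `find` functions and the 0..1 scoring are assumptions made up for illustration.

```python
def similarity(screen, template, top, left):
    """Return a 0..1 score for how closely template matches screen at (top, left)."""
    total_diff = 0
    count = 0
    for r, row in enumerate(template):
        for c, wanted in enumerate(row):
            total_diff += abs(screen[top + r][left + c] - wanted)
            count += 1
    # 1.0 means pixel-identical, 0.0 means every pixel is off by the full range.
    return 1.0 - total_diff / (255.0 * count)


def find(screen, template, min_similarity=0.9):
    """Return (top, left) of the best match at or above min_similarity, else None."""
    t_h, t_w = len(template), len(template[0])
    best_score, best_pos = 0.0, None
    for top in range(len(screen) - t_h + 1):
        for left in range(len(screen[0]) - t_w + 1):
            score = similarity(screen, template, top, left)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos if best_score >= min_similarity else None


# A 4x4 "screen" whose bright 2x2 button is slightly off from the saved snippet.
screen = [
    [0, 0, 0, 0],
    [0, 0, 200, 190],
    [0, 0, 210, 205],
    [0, 0, 0, 0],
]
button = [[200, 200], [200, 200]]

loose = find(screen, button, min_similarity=0.9)    # (1, 2)
strict = find(screen, button, min_similarity=0.99)  # None
```

The same snippet is found with a loose tolerance and rejected with a strict one, which is the trade-off between surviving small rendering changes and clicking the wrong thing.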
Re:lame by mattack2 · 2010-01-21 14:40 · Score: 1

I didn't RTFA, but basing this stuff on the *accessibility* view of the screen is/can be useful.
Re:lame by Anonymous Coward · 2010-01-21 17:07 · Score: 0

you can control the transparency in eve windows.

Re:How easy IS it? by Anonymous Coward · 2010-01-21 10:23 · Score: 2, Funny

Wow, no one has watched the movie Swordfish have they?

We're trying to repress those memories, you insensitive clod!

Re:Fr0s7 pist by pushing-robot · 2010-01-21 10:24 · Score: 1

Sorry, there are some things even Sikuli can't process.

--
How can I believe you when you tell me what I don't want to hear?

Applescript was invented a LONG time ago people... by RocketRabbit · 2010-01-21 10:40 · Score: 1

It can script GUI actions in much the same way. Granted it's not a very nice environment for more complicated work, but still.

Its a brilliant idea. by Seor+Jojoba · 2010-01-21 10:40 · Score: 2, Insightful

Come on, let's cut through the default Slashdot snark. The image capture aspect of Sikuli is brilliant! I don't like the tagline "program anything with Sikuli" because 99% of software should be written in something else. But think of writing test scripts that can use the image matching features. If the software works as advertised, then you could throw together UI test cases way faster than anything else I've seen. System administration tasks should be a good match too. The resulting code would be brittle and hard to maintain, but for quick one-off scripts, sure... I can see it.

Re:Its a brilliant idea. by rmcd · 2010-01-21 15:39 · Score: 1

Couldn't agree with you more. I'm surprised by all the negativity. And it seems to me this is innovative enough to have uses that no one here is thinking about right now.

Problems by master_p · 2010-01-21 10:52 · Score: 1

The script may not work if the UI style is different from the one recorded or if the UI language is different from the one recorded. Generally, any option that can change the UI from computer to computer will create a problem for Sikuli.

Re:Problems by VortexCortex · 2010-01-21 11:38 · Score: 1

It's even worse than that... Just change your icon or window border theme and watch every Sikuli script break.
The great thing about all other languages except Sikuli is: When you change your Icon or window border theme the programs still run.

Re:Fucking Communitst by AlexLibman · 2010-01-21 10:58 · Score: 0

Good libertarian / Objectivist / Anarcho-Capitalist trolls at least try to post on topic... Watch me and learn, grasshopper. ;-)

Anyway, did MIT just figure out a way to make computers slower and GUI script kiddies more arrogant?! Yuck! C, perl, and OpenBSD FTW!

fork bomb, or loop? by Anonymous Coward · 2010-01-21 11:00 · Score: 0

Has anyone tried writing a Sikuli script that finds the Sikuli IDE window and clicks the green run button?

Again!?! That trick never works. by Anonymous Coward · 2010-01-21 11:05 · Score: 1, Insightful

This time for sure!

The Sikuli School of Programming by presidenteloco · 2010-01-21 11:21 · Score: 2, Funny

if NOT understand logic then
    loop
        talkTo (self, "Don't program!")
        Look (@ Pretty pictures)
    endloop
endif

--

Where are we going and why are we in a handbasket?

Re:The Sikuli School of Programming by Anonymous Coward · 2010-01-21 22:04 · Score: 0

make that a while not, maybe people can learn programming by looking at pretty pictures

Google Video Search? by Anonymous Coward · 2010-01-21 11:51 · Score: 0

This might have potential, depending on how flexible the pattern match is when looking for thumbnails of, ahhh, things...

Cut up words? by Anonymous Coward · 2010-01-21 11:52 · Score: 0

Now why would you want to do that?

It's Not Going Anywhere by Clugy · 2010-01-21 12:08 · Score: 1

I'd be curious to see how they handle the back end, especially as some others pointed out it does make calls that seemingly require some hook into the OS. As for its usefulness, I doubt it will really take off beyond being a decent prototype. It relies on image matching so if you use and change a custom icon set all your scripts would be kinda worthless. Same goes if the programs you are "screenshot scripting" receive a major overhaul in the GUI department. Until it can address those issues, I doubt it will really take off.

Think executable step-by-step tutorials by tucuxi · 2010-01-21 12:12 · Score: 4, Insightful

Sikuli is certainly not commercial-grade UI testing software. It was never intended to be; this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters that compared it to video-programming are right: not all programs have to target complicated algorithms and data-structures, there is plenty of space for automating "simple stuff".

As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.

Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.
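The theme-mapping idea could be as simple as a lookup table. A rough sketch, assuming a hypothetical `THEME_MAP` and file names invented for illustration; nothing here is an existing Sikuli feature.

```python
# Hypothetical table: for each supported theme, which themed image file
# stands in for each image used in a tutorial written against the canonical theme.
THEME_MAP = {
    "canonical": {"ok_button.png": "ok_button.png"},
    "dark": {"ok_button.png": "dark/ok_button.png"},
}


def resolve_image(canonical_name, theme, theme_map=THEME_MAP):
    """Translate a canonical image name into the file to search for under theme."""
    themed = theme_map.get(theme)
    if themed is None:
        raise KeyError(f"unsupported theme: {theme}")
    # Images with no themed variant fall back to the canonical picture.
    return themed.get(canonical_name, canonical_name)
```

A published tutorial would then only ever mention canonical names, and the runner would call something like `resolve_image` before each search.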

Re:Think executable step-by-step tutorials by tristanreid · 2010-01-21 13:25 · Score: 3, Interesting

I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how with one command each you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this is basically how to do it with trivial coding.
I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection rather than trying to figure out how the VPN API works.
-t.

Re:Applescript was invented a LONG time ago people by babyrat · 2010-01-21 12:14 · Score: 1

The last time I tried to use Applescript on windows or linux, it wouldn't even start up.

You're doing it wrong. by Anonymous Coward · 2010-01-21 12:36 · Score: 0

If you have to write a script to automate GUI applications you're undermining the purpose of computers. I'm sitting here imagining people automating deletion.

Re:You're doing it wrong. by tomhath · 2010-01-21 13:10 · Score: 1

I mostly agree with you, it's always silly to automate a sequence of GUI actions.
However I can see where they're going here; the program examines your screen and finds the widget to click on or enter data into, much like a human looking at the screen and deciding what to do next. Extend that to the real world, a robot that looks around your room for the remote control and turns on the TV, then surfs through the channels until it recognizes something you like to watch. By then it will also be capable of understanding speech and making decisions autonomously. Computers will be thinking like humans within just a few years. Oh wait.
Re:You're doing it wrong. by Anonymous Coward · 2010-01-21 21:50 · Score: 0

Well I can get this. I've been saying to people a lot lately that unless you can automate your stuff on the computer you're not using it.
What I don't understand is WHY the hell it's so hard to get every frigging piece of software to be scriptable in an easy way.
I mean, fine, a GUI front end, but what I and users want is a whatever-you-like front end.
The biggest problem is that a lot of programs out there don't want to be scriptable since that would defeat their business model.

Use This for Software Testing, and Scripting? by LifesABeach · 2010-01-21 13:22 · Score: 1

I may be opening a can of worms here, but the first thing I thought of after seeing the demo was, "Can I push a button on a Flash page?"

Re:Use This for Software Testing, and Scripting? by phi2one · 2010-01-21 14:56 · Score: 1

I am wondering the same thing myself; if all it's doing is scraping the screen buffer somehow, I don't see why not.

What's so wrong with TurboTax? by AardvarkCelery · 2010-01-21 13:50 · Score: 2, Interesting

Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.

(BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)

SendKeys by codepunk · 2010-01-21 14:52 · Score: 1

Wow they just created the old VB SendKeys command. I was actually doing stuff like this 12-14 years ago with SendKeys command in VB. In "practical" use back then it sucked and I am certain that has not changed.

--

Got Code?

AutoIt by White+Flame · 2010-01-21 14:56 · Score: 1

I did this exact same thing in AutoIt, except that it needs exact matches of images instead of a fuzzy recognizer. (Plus, I also had rule triggers and state vs just a single list of imperative commands)

The fuzzy match is a nice addition, but this automation concept has been available for years.

Re:AutoIt by mrjb · 2010-01-21 22:02 · Score: 1

The fuzzy match is a nice addition
and probably an obligatory one as well. If the screenshot is a (lossy) jpeg, the image recognition simply won't work unless it is at least somewhat fault-tolerant.

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Re:AutoIt by White+Flame · 2010-01-22 11:45 · Score: 1

What AutoIt does is take a hash of the pixels in a rectangular area. If you interactively capture an area's hash when the screen is in the desired state, then that area can be scanned during the script run to see when/if it matches the desired hash again. The area's location can be relative to a window, control, screen, etc, and the software can scan around various locations in case it moved.
There's no lossiness in any of the image manipulation, but the same pixels need to show up.
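The hash-a-rectangle approach described above can be sketched in a few lines. This is an illustration of the idea only, not AutoIt's implementation: `region_hash` and `scan_for_hash` are hypothetical names, and SHA-256 is just a convenient stand-in for whatever checksum the real tool uses.

```python
import hashlib


def region_hash(screen, top, left, height, width):
    """Hash the pixel values inside one rectangle of a 2-D grid of ints (0..255)."""
    digest = hashlib.sha256()
    for row in screen[top:top + height]:
        digest.update(bytes(row[left:left + width]))
    return digest.hexdigest()


def scan_for_hash(screen, wanted, height, width):
    """Slide the rectangle over the screen; return the first (top, left) that matches."""
    for top in range(len(screen) - height + 1):
        for left in range(len(screen[0]) - width + 1):
            if region_hash(screen, top, left, height, width) == wanted:
                return (top, left)
    return None


# Capture the hash while the screen is in the desired state...
before = [[0, 0, 0, 0], [0, 0, 9, 8], [0, 0, 7, 6], [0, 0, 0, 0]]
wanted = region_hash(before, 1, 2, 2, 2)

# ...then look for it later, after the same pixels have moved elsewhere.
moved = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
found = scan_for_hash(moved, wanted, 2, 2)

# A single changed pixel is enough to make the hash miss.
altered = [[9, 8, 0, 0], [7, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
missed = scan_for_hash(altered, wanted, 2, 2)
```

That last case is the exact-match limitation: the scan copes with the region moving, but not with any pixel in it changing.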

Better Solution one line by codepunk · 2010-01-21 15:02 · Score: 1

man ifconfig

--

Got Code?

Spammers Rejoice! by VortexCortex · 2010-01-21 15:09 · Score: 1

Just Great... all the spammers need now is a few CAPTCHA deciphering Sikuli plug ins.

Once that's done we can all go back to manually removing spam from our web forums and in-boxes.

Bobby Tables by gmuslera · 2010-01-21 15:11 · Score: 1

How do you sanitize your inputs in a language that checks what is displayed on the screen? Instead of XSS or SQL injection, you could end up being hacked just by viewing an ordinary picture attached to an email, if that kind of programming becomes popular.

Reminded me of HyperCard by Anonymous Coward · 2010-01-21 15:56 · Score: 0

For some reason this suddenly reminded me of HyperCard. Anyway, I think there's definitely a desire for this sort of thing out there. From the Wikipedia article on HyperCard...

HyperCard has been described as a "software erector set." It integrates a software development environment with a run-time environment in a simple, easily accessible way. The tools required to write an application, principally the creation and configuration of screen objects like buttons, fields and menus, are part and parcel with the ability to add programmed functionality to those objects. ... "Empowerment" became a catchword as this possibility was embraced by the Macintosh community, as was the phrase "programming for the rest of us", that is, anyone, not just professional programmers.

I think I've seen this before... by kasparov · 2010-01-21 16:06 · Score: 1

It is basically expect script for GUIs.

--
There's no place I can be, since I found Serenity.

Is this really a new idea? by Anonymous Coward · 2010-01-21 17:18 · Score: 0

How is this any different than Automate? That has been around for many years and based on the MIT video it appears that Automate is much easier to use.

http://www.networkautomation.com/automate/7/

Re:Applescript was invented a LONG time ago people by RocketRabbit · 2010-01-21 17:27 · Score: 1

Yeah, and the last time I tried to run Logic Pro 8 on Windows or Linux, it wouldn't even start up.

Cool, but it has severe downsides. by mrjb · 2010-01-21 21:38 · Score: 2, Interesting

The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.

But overall, it just seems like a Bad Idea. It will be as reliable as screen-scraping in browsers, and it would be wise to avoid it for the same reasons.

Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the most simple tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book

Question by Anonymous Coward · 2010-01-21 22:49 · Score: 0

What happens when the luser changes his theme? Or when Apple updates the system software and controls change places/colors a little tiny bit? (is it still called "system software"?)

It certainly has its own niche by gugod · 2010-01-22 03:37 · Score: 1

I tried playing with it and found it really impressive. The implementation is still beta-ish, but good enough to give it a try.

The first thing I made it do for me was to launch my VNC viewer on my Mac laptop, connect to my local headless Windows 7 machine, and click the iTunes play button to play some music there. It just worked (amazingly), and I found it to be a pretty good use case for a tool like this. A task like that cannot be easily automated. At least, that has not been the case with a tool you have only been trying for 5 minutes.

I can imagine that, if the image pattern matching can be extended to do recognition, such as face recognition / text OCR, and passed the recognized info back, or it adds webcam as its input device (instead of keyboard / mouse IO) it'll be overwhelming.

Besides, it really is an inspiring way of coding.

Associative Arrays indexed by a Freakin' Image by awol · 2010-01-22 03:55 · Score: 1

I have to say I am impressed. I have had a play with some of the demos and I like what I see. Whilst I agree that there are limitations this project seems fantastic.

Having tried and failed to use "win runner" in the past due to the complexity of the GUI application I was testing, this scripting would get past the problems we were having.

I can envisage sending canned scripts to my folks for doing maintenance on their own machine, even just some diagnostics that I find hard to do over the phone.

I have a couple of itches of my own that I reckon I could scratch with this. For example, I have a MacBook that I sometimes attach to an external display. Sometimes the external is on the left of my laptop, sometimes the right, sometimes directly above. It would be cool to have a script that allowed me to just click an icon to arrange the displays appropriately. Sikuli is close. I am about to go off and see if that will work.

I mean they have associative arrays indexed by a freaking picture. That is simply, well, paradigm shifting. I am less concerned about the actual efficacy of Sikuli than I am about the ability to hook applications together through their GUI. I am thinking about something like "GUI pipes" which is something I have been thinking about for some time. Mark III of this stuff could be amazing.
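For readers wondering what an array "indexed by a picture" might look like, here is a minimal sketch in plain Python. It is not Sikuli's implementation: the `ImageDict` class, the `max_diff` threshold, and the tiny grayscale grids standing in for screenshots are all assumptions made for illustration. The idea shown is only that a lookup succeeds when the key image is close enough to a stored one, rather than identical to it.

```python
class ImageDict:
    """Toy dictionary keyed by small grayscale images (lists of rows of ints).

    Lookup is fuzzy: a key matches a stored image when the summed absolute
    pixel difference is at most max_diff (a hypothetical threshold).
    """

    def __init__(self, max_diff=2):
        self._entries = []  # list of (image, value) pairs
        self.max_diff = max_diff

    @staticmethod
    def _diff(a, b):
        # Images of different shapes are never considered a match.
        if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
            return None
        return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

    def __setitem__(self, image, value):
        self._entries.append((image, value))

    def __getitem__(self, image):
        best = None  # (difference, value) of the closest acceptable entry
        for stored, value in self._entries:
            d = self._diff(stored, image)
            if d is not None and d <= self.max_diff and (best is None or d < best[0]):
                best = (d, value)
        if best is None:
            raise KeyError("no stored image is close enough")
        return best[1]


PLAY = [[9, 0], [9, 9]]  # stand-in for a screenshot of a play button
STOP = [[5, 5], [5, 5]]  # stand-in for a screenshot of a stop button

actions = ImageDict(max_diff=2)
actions[PLAY] = "play"
actions[STOP] = "stop"

# A slightly different rendering of the play button still finds "play".
slightly_off = [[9, 1], [8, 9]]
print(actions[slightly_off])
```

A real tool would compare screen regions with a proper similarity measure; the summed pixel difference here is just the simplest thing that shows the fuzzy-key idea.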

I honestly think this project is potentially awesome. In the olden days, before the net was quite so pervasive, we used to talk about using the RussTerm, which basically meant getting our guy on the ground in a foreign country (Russell) to type stuff on the machine he was looking at whilst we talked him through it over the phone, mostly because we could not automate the stuff we wanted to do. This would address many of the use cases for which similar requirements might exist today. That's just one idea off the top of my head.

Many posters have noted that much of this functionality exists already in tools like AutoIt and AutoHotKey; some numpty even mentioned SendKeys in VB. But these people have missed the point: until now it's all been very "Goto X,Y -> Click", not "find(Thing).click()". Even things like WinRunner or RationalTest seem, in my experience, to be far too rigid to be useful. I can see how I would have used this tool to do much good work for our software back when I was demoing, developing and testing stuff.
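The difference between "Goto X,Y -> Click" and "find(Thing).click()" can be sketched in a few lines of plain Python. This is a toy model, not Sikuli or AutoIt code: the screen is a small grid of integers, the button is a 2x2 patch, and `find`, `make_screen`, and `hits_button` are hypothetical helpers that use exact matching where a real tool would use fuzzy image matching.

```python
def find(screen, pattern):
    """Return (row, col) of the top-left corner of the first exact match, or None."""
    ph, pw = len(pattern), len(pattern[0])
    for r in range(len(screen) - ph + 1):
        for c in range(len(screen[0]) - pw + 1):
            if all(screen[r + i][c:c + pw] == pattern[i] for i in range(ph)):
                return (r, c)
    return None


def make_screen(pattern, top, left, height=6, width=8):
    """Build a blank (all-zero) screen with the pattern drawn at (top, left)."""
    screen = [[0] * width for _ in range(height)]
    for i, row in enumerate(pattern):
        screen[top + i][left:left + len(row)] = row
    return screen


def hits_button(screen, pos):
    """A 'click' lands on the button if the pixel under it is not background."""
    row, col = pos
    return screen[row][col] != 0


PLAY_BUTTON = [[1, 2], [3, 4]]

old_layout = make_screen(PLAY_BUTTON, top=1, left=1)
new_layout = make_screen(PLAY_BUTTON, top=3, left=5)  # the button has moved

RECORDED_POS = (1, 1)  # coordinates recorded against the old layout

print(hits_button(old_layout, RECORDED_POS))                         # coordinate script, old layout
print(hits_button(new_layout, RECORDED_POS))                         # coordinate script, new layout
print(hits_button(new_layout, find(new_layout, PLAY_BUTTON)))        # find-then-click, new layout
```

The recorded coordinates only work while the layout stays put; the find-based click follows the button wherever it is drawn, which is the point being made above.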

That it is wrapped in a nice scripting language as well just makes it even better.

I'm off to see how good it actually is....

--
"The first thing to do when you find yourself in a hole is stop digging."

I'd suggest AutoIT3 for this kind of thing by Oneamp · 2010-01-22 06:53 · Score: 1

It's an interesting idea, but if you're serious about automating Windows, I heartily recommend AutoIT3. http://www.autoitscript.com/autoit3/

--
Increase my killing power, eh?

Immediately useful, valuable and fun by mattr · 2010-01-23 07:41 · Score: 1

Okay, I have done a fair amount of programming and yet with a new Mac I have not yet dived into the SDKs, etc. I once wanted to do some batch resizing of photos and yet couldn't get it done in Automator easily without being scared of losing the original photos, on my first dive into it. Yes, I actually wrote a great auto-compositing and resizing program once driving the Gimp on Linux. It was awesome. But that was years ago and now I have a nice new computer. And where did that code go? Yes, I'm sure Automator, Quartz Composer, my shiny new Xcode system and whatever else works on a Mac will be great. But I haven't had time to learn it.

Enter Sikuli. I wrote a hello world and it worked fast. I still don't know if I could use it for batch photo processing, but it just seems cool. I'd rather it was decoupled from a language and the editor was open sourced (maybe it is?) though, so others could build on that. For example, if you could use just the IDE, then maybe someone could add Perl bindings and someone else might add use of CPAN modules for downloading web pages, etc.

Also the vision algorithm looks a bit slow.

There was once an experimental system created that allowed you to program graphic drawings drawn as if on a napkin which would animate in 2D, which is how the program would run. A true graphic language. Maybe someone can find it probably in the ACM SIGGRAPH proceedings several years ago. Maybe "graphic shell" and "napkin drawings" would find it.

Also see VizDraw (pdf) where recognition is done on drawing with a pen tablet.
http://www.eng.uwaterloo.ca/~akmishra/VizDraw2.pdf

Anyway, Sikuli is spectacular for using computer vision techniques to allow for slight changes, and for being immediately useful. I'd like to see it linked to Xcode for RAD of Objective-C apps; Apple should definitely license it or hire the developers for research on it. There is a vast field opened up by this. It is finally an aha experience, and not just Apple but many developers should now consider how to get the computer to be smart and find out what you want to do.

This ought to make it possible to do easy mechanized data extraction from the web, analysis of webcam feeds, acting on audio and other types of sensor cues, accessing data and devices over networks, and taking action based on feeds from other devices that are minimally enhanced, like my cellphone telling my Mac and maybe my mail server when its battery is about to die. It could forward mail to another device, etc. This kind of thing could even work in video cameras and household devices. Even if you just consider it a way to turn people on to programming, it is invaluable and fun. I'd like to see Sikuli's functional pieces broken off into standalone services that can be used by other things. As for the comment about window manager themes or operating system versions changing and breaking the script due to icon changes, I think the vision detection of a GUI button is actually finding the button and window IDs in Sikuli and ought to be able to hand those back.

The editor should also be broken off. Of course it needs to be able to launch a screenshot-capturing action, but that does not mean it must be the sole application allowed to do this. And you could write (snap?) a Sikuli script to run a screenshot capture. Finally, I think Sikuli scripts ought to allow being compiled or otherwise optimized, since it seems that once a script has run, Sikuli knows the ID of the graphic element it found and thereafter need not do vision recognition.
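The "recognize once, then skip the vision step" idea can be sketched without assuming anything about element IDs. Whether Sikuli can hand back a widget ID is not established in this thread, so the toy below caches the last known screen position instead, re-checks it cheaply, and only falls back to a full scan when the check fails. `CachedFinder` and its exact-match test are hypothetical, written in plain Python rather than against Sikuli's API.

```python
class CachedFinder:
    """Remember where a pattern was last seen and verify that spot before rescanning."""

    def __init__(self):
        self.cache = {}      # name -> (row, col) of the last match
        self.full_scans = 0  # how many times the expensive search ran

    @staticmethod
    def _matches_at(screen, pattern, r, c):
        ph, pw = len(pattern), len(pattern[0])
        if r + ph > len(screen) or c + pw > len(screen[0]):
            return False
        return all(screen[r + i][c:c + pw] == pattern[i] for i in range(ph))

    def find(self, name, screen, pattern):
        # Cheap path: is the pattern still where we last saw it?
        hit = self.cache.get(name)
        if hit is not None and self._matches_at(screen, pattern, *hit):
            return hit
        # Expensive path: scan the whole screen and refresh the cache.
        self.full_scans += 1
        for r in range(len(screen) - len(pattern) + 1):
            for c in range(len(screen[0]) - len(pattern[0]) + 1):
                if self._matches_at(screen, pattern, r, c):
                    self.cache[name] = (r, c)
                    return (r, c)
        self.cache.pop(name, None)  # not on screen any more
        return None


def screen_with(pattern, top, left, height=5, width=6):
    """Blank (all-zero) screen with the pattern drawn at (top, left)."""
    screen = [[0] * width for _ in range(height)]
    for i, row in enumerate(pattern):
        screen[top + i][left:left + len(row)] = row
    return screen


BUTTON = [[7, 7], [7, 7]]
finder = CachedFinder()
first = finder.find("ok", screen_with(BUTTON, 2, 3), BUTTON)   # full scan
second = finder.find("ok", screen_with(BUTTON, 2, 3), BUTTON)  # served from cache
moved = finder.find("ok", screen_with(BUTTON, 0, 1), BUTTON)   # cache miss, rescan
print(first, second, moved, finder.full_scans)
```

The verification step is what keeps the shortcut safe: if a theme change or window move invalidates the cached spot, the script quietly pays for one more full search instead of clicking the wrong place.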

Slashdot Mirror

154 comments