Has Flow-Based Programming's Time Arrived?
An anonymous reader writes "Flow-based programming keeps resurfacing lately. FBP claims to make it easier for non-programmers to build applications by stringing together transformations built by expert programmers. Many projects have already been using similar approaches for a long time, with less (or different?) hype. Is it time to take a closer look at flow-based programming? 'Clean functions – functions without side effects – are effectively pure transformations. Something comes in, something goes out, and the results should be predictable. Functions that create side effects or rely on additional inputs (say, from a database) are more complicated to model, but it’s easier to train programmers to notice that complexity when it’s considered unusual. The difficulty, of course, is that decomposing programs into genuinely independent components in a fine-grained way is difficult. Many programmers have to re-orient themselves from orthodox object-oriented development, and shift to a world in which data structures are transparent but the behavior – the transformation – is not.'"
old is new is old is new is old is...new?
Gee, never heard that one before.
What the people pushing these ideas don't seem to know is that it's not the tools, it's the way of thinking about a problem. I once worked at a place where we made a manager a tool that would let him create his own reports, and he immediately started adding up all flavors of apples and oranges (e.g., dollars of this and pounds of that). Then he wanted the small IT staff to help him make sense out of his reports.
Sheesh, evil *and* a jerk. -- Jade
it wasn't a good idea to start with and it hasn't gotten any better since then.
Anons need not reply. Questions end with a question mark.
NI LabView has had flow-based visual programming for more than a decade. The nice thing is that you can make parallel flows for your data, and it's multi-threaded. That makes it easy to visualize that something is running in parallel.
The basic problem is that, while it sounds great in theory, in practice the transformations you want don't exist. If they did, you'd have software doing the job already and you wouldn't need to create it. Your business isn't going to go very far just doing the same thing everybody else is doing, is it? You need to be doing something they aren't. Which means, in this context, you need transformations that don't already exist (either because they haven't been written yet (ideally) or because the people who wrote them are keeping them confidential to keep you from copying what their business is doing (less than ideal)). So on top of your FBP team stringing components together, you're still going to need that expensive conventional development team to write the components for the FBP team to string together. You haven't saved much, in fact it's probably costing you more than just having the conventional dev team.
Plus, stringing components together isn't quite as simple as it sounds. Real-world systems usually depend on interaction and complex relationships between otherwise simple components. Keeping track of the synchronization between all those parts and keeping everything coordinated is usually the hard part. For instance, when creating an application to set up appointments, the part where you take a set of existing appointment slots (some occupied, some free) and a set of one or more slots the customer would like, ordered by preference, and determine the best requested slot to put them in is easy. Picking up the current set of slots, putting them up on a Web page in a sensible way, letting the user select which ones work for them and set the order of preference, sending that information back to the server and, across this whole process, making sure that nobody could alter the assigned appointments between the time you picked them up to display and the time the customer hit Submit and you started to sort out the assignment: that's nowhere near as simple. Doing this in a modern system with multiple servers where the submission may not go back to the server that handled the initial request, when you've got thousands of customers and hundreds of your users making, changing and cancelling appointments at the same time, ... can we say "highly non-trivial"? And it really doesn't fit the FBP model at all.
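To make the "easy" half concrete, here is a minimal sketch of the slot-picking step as a pure transformation. The function name, the dict layout and the slot ids are all made up for illustration; nothing here comes from a real scheduling system, and it deliberately does nothing about the hard half (other people changing the slots while the customer is still looking at the page).

```python
def best_slot(slots, preferences):
    """Return the first preferred slot that is still free, or None.

    slots: dict mapping slot id -> True if occupied, False if free
           (hypothetical layout, chosen only for this sketch).
    preferences: slot ids in the customer's order of preference.
    """
    for slot_id in preferences:
        # Unknown slot ids are skipped rather than treated as free.
        if slot_id in slots and not slots[slot_id]:
            return slot_id
    return None


slots = {"09:00": True, "10:00": False, "11:00": False}
print(best_slot(slots, ["09:00", "11:00", "10:00"]))  # prints 11:00
```

Something comes in, something goes out, no side effects. The trouble is that the answer is only valid for the snapshot of `slots` it was given, which is exactly the coordination problem described above.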
Even where things fit the model, it's rarely as simple as "just string them together". I work with software that fits that model. Well, it did. Once. Long ago. And then the business people said "Oh, but customer A wants it to do some other thing if it's a transaction for them." Followed by "Well, we want to do X, unless C applies, in which case we want to do Y." "Oh, unless M was specified for the account, in which case do X even when C applies, except where X set V to 7 we need it set to 14." Lather, rinse, repeat for 10 years and the quick one-line summary version ran to 5 pages single-spaced. Until, that is, we tried to print it out and found things were nested so deep some lines were starting past the right-hand edge of standard letter paper, so it's more like 10 pages, legal-sized, in landscape mode.
I think where this would be interesting is using Behavior Engineering (not UML!) to debug the design (and requirements) and then have an automatically generated skeleton loaded into some sort of flow-based programming system. If you're unfamiliar with BE (it's not really taught outside Griffith University) then you can have a look at http://en.wikipedia.org/wiki/Behavior_Trees
The two main problems of software engineering are getting data and control from where it is, to where it needs to be. Flow based programming focuses on those two aspects.
When you look at a C or Java program, when you're looking through the source code, the main thing that is presented to your eyes is the actual algorithm, the code that is doing the work. Sometimes you have to put in a lot of effort to even figure out where a program starts. A good class diagram can capture a lot of that, but usually there is still a lot missing in class diagrams. So you might say those languages are algorithm-based or something.
With flow based programming, when you look at a program, the first thing, and most obvious thing you see is the connections between modules; the way control and data are passed around the system is obvious. But you have to do extra work if you want to look inside the 'black box' modules that are actually doing the work.
With apologies to Dijkstra, one might say that Flow Based programming is an exceptionally bad idea which could have only originated at IBM (in my mind the division of labor between those who are making the 'black boxes' and those who are connecting them together is extremely difficult to get right), but the idea of making the connections between modules more obvious is definitely a good one. I hate looking at new codebases sometimes for exactly the reason that it's hard to see how the modules are put together.
"First they came for the slanderers and i said nothing."
It may or may not be a good idea, but it's not new. The article is not very good, either. It's all over the place, from punched cards to XML.
Data flow programming has been tried many times. It's used heavily in GPUs, where pipelines are the only way to get work done. So there's reasonably good understanding today of what can and can't be done by data flow methods.
If you like "flow-based programming", one easy way to do it is to write programs in single-assignment style. Assign to each variable only once, and make it "const" or read-only if your language supports it. This makes your program a directed acyclic graph. Single assignment is a lot like functional programming, but values have names and the nesting depth of calls doesn't get so deep. It's also possible to use the same result as input to more than one function, which is hard to do in pure functional programming. (That's the difference between a directed acyclic graph and a tree.)
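A small sketch of what single-assignment style looks like in practice. The function and its statistics are just an illustration I picked; Python has no "const" for local variables, so the once-only rule here is a convention, not something the language enforces.

```python
def summarize(readings):
    # Every name below is bound exactly once and never mutated afterwards.
    count = len(readings)
    total = sum(readings)
    mean = total / count
    # 'mean' is an input to 'deviations' and also to the returned result:
    # one value fanning out to several consumers makes this a DAG, not a tree.
    deviations = [r - mean for r in readings]
    variance = sum(d * d for d in deviations) / count
    spread = max(readings) - min(readings)
    return {"mean": mean, "variance": variance, "spread": spread}


print(summarize([1.0, 2.0, 3.0, 4.0]))
# prints {'mean': 2.5, 'variance': 1.25, 'spread': 3.0}
```

Each line is a node, each name is an edge, and you can read the data flow straight off the page without drawing any boxes.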
The use of pure functions makes for cleaner programs, but more data copying. Data copying isn't necessarily bad today. It's cheap to copy data you just used or created, because it's in the cache. Modern CPUs have wide buses, are good at copying, and can probably do the copy in parallel with something else. Don't avoid copying data to "improve performance" unless the data item is large.
The place where this all comes unglued is when the program's goal is to affect something, not grind data down to a "result". Trying to write a GUI in a functional style is tough. (I once wrote a GUI in pure Prolog. Very bad idea.)
grep EVENT log*.txt | sort | uniq | awk '{print $2}' | ssh reportserver "gzip > results.txt.gz"
Flow based, side effect free, distributed computing on one line. There is a reason shell scripts refuse to die in the face of python, perl or anything else.
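For comparison, a rough sketch of the same flow written as chained Python generators. The function names are my own, the match is a plain substring test rather than a grep regex, and the ssh/gzip stage is left out, so treat it as an illustration of the shape and not a drop-in replacement for the one-liner.

```python
def grep(pattern, lines):
    # Keep only lines containing the pattern (substring match, not a regex).
    return (line for line in lines if pattern in line)


def sort_unique(lines):
    # Rough equivalent of `sort | uniq`: sorted, with duplicates removed.
    return iter(sorted(set(lines)))


def field(index, lines):
    # Like awk '{print $N}': emit the N-th whitespace-separated field (1-based),
    # or an empty string when the line has fewer fields.
    for line in lines:
        parts = line.split()
        yield parts[index - 1] if len(parts) >= index else ""


def pipeline(lines):
    # Reads inside-out, where the shell version reads left to right.
    return field(2, sort_unique(grep("EVENT", lines)))


log = ["EVENT login alice", "NOTE idle", "EVENT login alice", "EVENT logout bob"]
print(list(pipeline(log)))  # prints ['login', 'logout']
```

It works, but notice how much more typing it takes to say the same thing, and that the stages nest instead of chaining.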
Talend.com offers Talend Open Studio - a great free software product for flow-based visual programming, based on Eclipse.
I used it on projects for 5 years and was amazed at how easy it is to build complex data transformations by dragging and dropping components and links.
The result is a Java (or Perl) program, which can run standalone.
The program is a visual data flow, easy to modify and understand even for non-programmers.
I really appreciate the Talend team's effort and recommend this product to anyone who needs data transformations.
That sounds like a Structured Query Language that was developed some years ago to allow business users to query databases. It would free programmers to focus on programming instead of running reports. Looks like it worked pretty well.
-- Reality checks don't bounce.
"Everything Flows"
-Cyberneticists
As some other people already remarked, on the face of it this looks like the venerable Unix approach of small tools in a script. My point is that the real world outside, which you are trying to capture in a programming language, can be very complicated. For some domains, e.g. logic or arithmetic, the language can be pretty complicated too - see APL, LISP or Prolog.
But in thirty years of programming (computational linguistics), I have found that Unix scripts, awk and plain C covered pretty much 90% of (my) programming needs. If and when necessary I tacked on a larger database system. Of course I tried the new (well, in the nineties they were new) OO systems, but I rapidly got lost in a jungle of libraries and methods and even more documentation. Compare that to the almost ascetic (and aesthetic) clarity of the Unix environment.
Yes, I feel that Unix still has a lot of mileage in it and, intentionally or not, this item and the reactions to it confirm me in this view.
Paai
Welcome to the late 90s. Out here 15 years later, nobody uses UML. The idea of UML as code is completely dead, the corpse was staked, burned, and scattered to the four winds. It never worked. The only thing that's left of it is class diagrams, which people were doing 20 years before UML existed (if not longer).
I still have more fans than freaks. WTF is wrong with you people?
This has been tried over and over again for the last 30 years.
It just plain-old doesn't work. You get inefficient, bloated code at the best of times, and 99% of the time you need some sort of custom function that still requires conventional software development.
So, no, flow-based programming's time has _NOT_ arrived.
I was thinking of that too. LabView looks utterly cool on some sort of brochure or in a movie scene but was a slow way to put together scripts (just like "it's a Unix system, I know this" - with the similarly cool looking but ultimately useless file viewer).
This idea was obsolete when I did a subject in doing this sort of programming for analog computers (patch cables to amplifiers) in the 1980s. Oddly enough a good way to design analog computers was to optimise the model which started with discrete components into something like a script and then convert that back into a simpler system of discrete components.
Done that too - and also programmed an analog machine with patch cables (the inspiration for labview IMHO). That taught me that it's a bad idea to go anywhere near the patch cables without the equivalent of a normal script in the first place. You just end up with a tangle that may be able to be done in a simpler or better way but you just cannot see how in that representation.
My last labview program had lots of lines going underneath other ones which really defeated the entire purpose of such a simple representation. If it's going to work well it has to be modular enough that it's going to look simple no matter how you represent it.
A complex problem won't be simpler just because the tool isn't as powerful. All these ideas come from a fundamental misunderstanding about what it is that actually makes system development hard.
So you mean like Unix pipes, where you have those wonderful transformations like sed, grep, cut, sort, etc.? Youngsters.
If programs would be read like poetry, most programmers would be Vogons.
If you're wondering what flow-based programming feels like, I work on an open-source framework called FlowVR.
It is advertised towards high-performance computing but suits small applications just fine.
Arguably, 'Flow Based Programming' (at least back when it was called 'Functional Programming'), arises out of an elegant impulse, just look at what the concept of a 'Function' can do for you in so very many fields of mathematics; but seems painfully likely to fall over in a screaming heap in the face of a real-world programming team, under real-world pressure.
Lambda calculus predates the invention of the compiler. It's always funny when programmers rediscover stuff and think they've invented something new. :P
Indeed, Unix pipes, especially if you use named FIFOs as well, are a subset of the flow-based programming paradigm and fulfill quite a bit of its specification. A full FBP implementation adds a few things like option ports (could be done in shell with parameters though), custom information packet design (only text strings in Unix), and so-called "initial information packets", and surely a few more.
There is actually an attempt at implementing a full FBP system in shell scripts only, which is kind of cool. It was discussed in the flow-based programming forum, here: https://groups.google.com/forum/#!searchin/flow-based-programming/shell/flow-based-programming/PC96WYOAwAU/ICRZg_5K1XMJ
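To give a feel for the extras, here is a toy sketch using threads and bounded queues from the Python standard library. Everything in it is an assumption made for illustration: the component names, the `END` marker, and the way the option port is modelled are mine and do not follow the API of any real FBP implementation.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of any FBP spec


def reader(lines, out):
    for line in lines:
        out.put(line)  # each line travels as one information packet
    out.put(END)


def select(option, inp, out):
    pattern = option.get()  # read the initial packet once from the option port
    while (packet := inp.get()) is not END:
        if pattern in packet:
            out.put(packet)
    out.put(END)


def collect(inp, results):
    while (packet := inp.get()) is not END:
        results.append(packet)


def run(lines, pattern):
    # Bounded queues play the role of pipes: a full one blocks the writer.
    a, b, opt = queue.Queue(maxsize=2), queue.Queue(maxsize=2), queue.Queue()
    opt.put(pattern)  # the initial packet is in place before the network starts
    results = []
    threads = [
        threading.Thread(target=reader, args=(lines, a)),
        threading.Thread(target=select, args=(opt, a, b)),
        threading.Thread(target=collect, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run(["EVENT a", "NOTE b", "EVENT c"], "EVENT"))  # prints ['EVENT a', 'EVENT c']
```

The difference from a shell pipeline is that packets can be any object instead of a line of text, and the configuration arrives through a port like any other data instead of through argv.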
This notion of allowing non-programmers to do programming is flawed on its face. The challenges of programming can not be overcome by drawing some cute pictures. And if you have expert programmers making the building blocks it will only take them a few minutes more to connect them which eliminates the need for the "non-programmer" entirely. Then there is this notion of "pure functions". Dude, functional programming is a fad. Real programs manipulate data.
I've been "programming" with Simulink for 5 years now and it's great for control systems but shit for most other kinds of programming. So let your controls engineer use it to design and test and even generate C code. Then drop that code into your app and call it as a function. Never let that guy convince your organization that this is the way to go and ALL of the software should be created this way.
Both. For different reasons. In this context, specifically, I've not run into anything quite as useful as UNIX pipes anywhere else. They are somewhat limited to manipulating a stream of data, but they are eminently versatile with minimal typing, and minimal code confusion. This is a language feature.
Sometimes libraries attempt cool things like this by using operator overloading, etc, but it's not the same. People writing UNIX cli programs write to the paradigm, and join an environment wide community. If I download a well written bash script, I can reasonably anticipate that it can be incorporated into more complex bash commands, with little effort. Libraries that do stuff like this wind up as one-offs that must be mastered for narrow uses.
Languages shouldn't try to do everything themselves, but they do need to present a paradigm that libraries will adhere to. The better the paradigm, the more seamlessly the libraries written for it will work together. Most languages don't take library cross-integration as a design goal.
Here's my biggest pet-peeve when it comes to libraries. Each one is designed to be used in a particular way, according to a particular paradigm... and they rarely describe these patterns of thought for you at the outset. You've got to read through half the object/function descriptions, and then google code samples to figure out what the library author was thinking. This is a documentation problem. (end rant)
That's a given.
This is also somewhere modern languages fall down. I'm not sure what the solution will be, but there's always that pesky step between building an elegant theoretical model and trying to express it in code. Once done, the code never expresses the model, and very few programmers go back and leave documentation. They'll include a few comments, but nothing meaningful in the big picture.
I hold that we write bad code because our languages and tools encourage us to write bad code.
(and bad programmers will never go away)
I understand your frustration here, but this can be a very handy feature. I primarily write prototype code and one-offs. Starting to code before I fully understand what I'm doing saves a good deal of time and effort. And if I need to start over from scratch and really design something, I'll know what I'm up against. It's a royal pain to redesign a whole project from scratch 4 or 5 times because you didn't understand what you were doing.
(This isn't to say that I take no thought, nor that I expect code written this way to be any good. I start with a plan, even if it has great big holes in it. I write nothing this way unless I'm prepared to rewrite it from scratch.)
But for large projects, business critical projects, "production" code, or anything remotely long term (more than a day or two), I agree with your sentiment completely.
Fortune at the bottom of the page: "Any program which runs right is obsolete."
I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.