Pentaho 3.2 Data Integration
diddy81 writes "A book about the open source ETL (extract, transform, load) tool Kettle (Pentaho Data Integration, or PDI) is finally available. Pentaho 3.2 Data Integration: Beginner's Guide by María Carina Roldán is for everybody who is new to Kettle. In a nutshell, this book will give you all the information that you need to get started with Kettle quickly and efficiently, even if you have never used it before. The book offers loads of illustrations and easy-to-follow examples. The code can be downloaded from the publisher website and Kettle is available for free from the SourceForge website. In sum, the book is the best way to get to know the power of the open source ETL tool Kettle, which is part of the Pentaho BI suite." Read on for the rest of diddy81's review.
Pentaho 3.2 Data Integration: Beginner's Guide
author: Maria Carina Roldan
pages: 492
publisher: Packt Publishing
rating: 9/10
reviewer: diddy81
ISBN: 1847199542
summary: If you have never used PDI before, this will be a perfect book to start with.
The first chapter describes the purpose of PDI, its components, the UI and how to install it, and walks you through a very simple transformation. Moreover, the last part tells you step by step how to install MySQL on Windows and Ubuntu.
It's just what you want to know when you touch PDI for the first time. The instructions are easy to follow and understand and should help you to get started in no time. I honestly quite like the structure of the book: Whenever you learn something new, it is followed by a section that recaps everything, which helps you to remember it all much more easily.
Maria focuses on using PDI with files instead of the repository, but she offers a description of how to work with the repository in the appendix of the book.
Chapter 2: You will learn how to read data from a text file and how to handle header and footer lines. Next up is a description of the "Select values ..." step, which allows you to apply special formatting to the input fields and to select the fields that you want to keep or remove. You will create a transformation that reads multiple text files at once by using regular expressions in the text input step. This is followed by a troubleshooting section that describes all kinds of problems that might happen in the setup and how to solve them. The last step of the sample transformation is the text file output step.
Then you improve this transformation by adding the "Get system info" step, which will allow you to pass parameters to this transformation on execution. This is followed by a detailed description of the data types (I wish I had had all this formatting info so easily at hand when I started). And then it gets even more exciting: Maria talks you through the setup of a batch process (scheduling a Kettle transformation).
The last part of this chapter describes how to read XML files with the XML file input step. There is a short description of XPath which should help you to get going with this particular step easily.
Chapter 3 walks you through the basic data manipulation steps. You set up a transformation that makes use of the calculator step (loads of fancy calculation examples here). For more complicated formulas Maria also introduces the formula step. Next in line are the Sort By and Group By steps to create some summaries. In the next transformation you import a text file and use the Split field to rows step. You then apply the filter step on the output to get a subset of the data. Maria demonstrates various examples of how to use the filter step effectively. At the end of the chapter you learn how to look up data by using the "Stream Lookup" step. Maria describes very well how this step works (even visualizing the concept), so it should be really easy for everybody to understand.
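For readers who have not met the idea before, a stream lookup can be pictured as enriching each row of a main stream with a value found in a second, smaller stream. The following is only a rough Python sketch of that concept, not code from the book or from Kettle; the function name `stream_lookup` and the sample fields are made up for illustration.

```python
def stream_lookup(main_rows, lookup_rows, key, field, default=None):
    """Add `field` to every main row by matching `key` against the lookup rows."""
    # Build an in-memory table from the (small) lookup stream once.
    table = {row[key]: row[field] for row in lookup_rows}
    # Enrich each main row; rows without a match get the default value.
    return [{**row, field: table.get(row[key], default)} for row in main_rows]


# Hypothetical sample data: orders enriched with a country name.
orders = [{"order": 1, "code": "AR"}, {"order": 2, "code": "XX"}]
countries = [{"code": "AR", "name": "Argentina"}, {"code": "BE", "name": "Belgium"}]

enriched = stream_lookup(orders, countries, key="code", field="name", default="unknown")
print(enriched)
```

The design point is simply that the lookup stream is read completely first, so the main stream can then be processed row by row.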
Chapter 4 is all about controlling the flow of data: You learn how to split the data stream by distributing or copying the data to two or more steps (this is based on a good example: You start with a task list that contains records for various people. You then distribute the tasks to different output fields for each of these people). Maria explains properly how "distribute" and "copy" work, and the concept is very easy to understand following her examples. In another example Maria demonstrates how you can use the filter step to send the data to different steps based on a condition. In some cases the filter step will not be enough, hence Maria also introduces the "Switch/Case" step that you can use to create more complex conditions for your data flow. Finally Maria tells you all about merging streams and which approach/step is best to use in which scenario.
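The difference between "distribute" and "copy" is easy to show in a few lines. This is a minimal Python sketch of the concept only, assuming that distributing means handing rows out round-robin; the function names are hypothetical and nothing here is Kettle's own code.

```python
from itertools import cycle


def distribute(rows, n_targets):
    """Round-robin: every row ends up in exactly one target."""
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def copy(rows, n_targets):
    """Every target receives its own full copy of the rows."""
    return [list(rows) for _ in range(n_targets)]


tasks = ["task A", "task B", "task C", "task D"]
print(distribute(tasks, 2))  # the rows are split between the two targets
print(copy(tasks, 2))        # both targets see all four rows
```

With two targets, distributing halves the work per target, while copying lets two different branches each process the complete data set.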
In Chapter 5 it gets really interesting: Maria walks you through the JavaScript step. In the first example you use the JavaScript step for complex calculations. Maria provides an overview of the available functions (String, Numeric, Date, Logic and Special functions) that you can use to quickly create your scripts by dragging and dropping them onto the canvas. In the following example you use the JavaScript step to modify existing data and add new fields. You also learn how to test your code from within this step. Next up (and very interesting) Maria tells you how to create special start and end scripts (which are executed only once, as opposed to the normal script, which is executed for every input row). We then learn how to use the transformation constants (SKIP_TRANSFORMATION, CONTINUE_TRANSFORMATION, etc.) to control what happens to the rows (very impressive!). In the last example of the chapter you use the JavaScript step to transform an unstructured text file. This chapter offered quite a lot of in-depth information and I have to say that there were actually some things that I didn't know.
In the real world you will not always get the dataset structure in the way that you need it for processing. Hence, chapter 6 tells you how you can normalize and denormalize data sets. I have to say that Maria made a really huge effort in visualizing how these processes work, which really helps to understand the theory behind them. Maria also provides two good examples that you work through. In the last example of this chapter you create a date dimension (very useful, as every one of us will have to create one at some point).
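To give an idea of what normalizing and denormalizing rows means here, the sketch below turns "wide" rows (one column per quarter) into "long" rows (one row per quarter) and back again. It is an illustration in plain Python under made-up field names (`product`, `q1`, `q2`, `type`, `value`), not the book's example and not how Kettle implements its steps.

```python
def normalize(rows, key, value_fields):
    """Wide to long: emit one output row per (input row, value field)."""
    out = []
    for row in rows:
        for field in value_fields:
            out.append({key: row[key], "type": field, "value": row[field]})
    return out


def denormalize(rows, key):
    """Long to wide: collect all type/value pairs that share the same key."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[key], {key: row[key]})
        target[row["type"]] = row["value"]
    return list(wide.values())


sales = [
    {"product": "kettle", "q1": 10, "q2": 12},
    {"product": "spoon", "q1": 3, "q2": 5},
]
long_rows = normalize(sales, "product", ["q1", "q2"])
print(long_rows)
print(denormalize(long_rows, "product"))
```

Running the two functions back to back returns the original rows, which is a handy sanity check when you build this kind of round trip.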
Validating data and handling errors is the focus of chapter 7. This is quite an important topic, as when you automate transformations, you will have to find a way to deal with errors (so that they don't crash the transformation). Writing errors to the log, aborting a transformation, fixing captured errors and validating data are some of the steps you go through.
Chapter 8 focuses on importing data from databases. Readers with no SQL experience will find a section covering the basics of SQL. You will work with both the Hypersonic database and MySQL. Moreover, Maria introduces you to the Pentaho sample database called "Steel Wheels", which you use for the first example. You learn how to set up a connection to the database and how to explore it. You will use the "Table Input" step to read from the database as well as the "Table Output" step to export the data to a database. Maria also describes how to parameterize SQL queries, which you will definitely need to do at some point in real world scenarios. In the next tutorials you use the Insert/Update step as well as the Delete step to work with tables on the database.
In chapter 9 you learn about more advanced database topics: Maria gives an introduction to data modelling, so you will soon know what fact tables, dimensions and star schemas are. You use various steps to look up data from the database (e.g. Database lookup step, Combination lookup/update, etc.). You learn how to load slowly changing dimensions Type 1, 2 and 3. All these topics are excellently illustrated, so it's really easy to follow, even for a person who has never heard about these topics before.
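If slowly changing dimensions are new to you, the core difference between Type 1 and Type 2 fits in a short sketch: Type 1 overwrites the old value, Type 2 closes the current version of the row and adds a new one. The Python below is only a simplified illustration with hypothetical column names (`key`, `valid_from`, `valid_to`); it leaves out surrogate keys and Type 3, and it is not the book's or Kettle's implementation.

```python
from datetime import date


def apply_type1(dimension, key, new_attrs):
    """Type 1: overwrite the attributes in place; history is lost."""
    for row in dimension:
        if row["key"] == key:
            row.update(new_attrs)


def apply_type2(dimension, key, new_attrs, today):
    """Type 2: close the current version and append a new current version."""
    current = next(
        row for row in dimension if row["key"] == key and row["valid_to"] is None
    )
    current["valid_to"] = today
    dimension.append({**current, **new_attrs, "valid_from": today, "valid_to": None})


# Hypothetical customer dimension with one current row.
customers = [
    {"key": 1, "city": "Rosario", "valid_from": date(2009, 1, 1), "valid_to": None}
]
apply_type2(customers, 1, {"city": "Buenos Aires"}, date(2010, 6, 1))
for row in customers:
    print(row)
```

After the Type 2 update the dimension holds two rows for the same customer, so reports can still show where the customer lived at the time of an older sale.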
Chapter 10 is all about creating jobs. You start off by creating a simple job and later learn more about how to use parameters and arguments in a job, how to run jobs from the terminal window and how to run job entries under conditions.
In chapter 11 you learn how to improve your processes by using variables, subtransformations (very interesting topic!), transferring data between transformations, nesting jobs and creating a loop process. These are all more complex topics which Maria managed to illustrate excellently.
Chapter 12 is the last practical chapter: You develop and load a datamart. I would consider this a very essential chapter if you want to learn something about data warehousing. The last chapter 13 gives you some ideas on how to take it even further (Plugins, Carte, PDI as process action, etc) with Kettle/PDI.
In the appendix you also find a section that tells you all about working with repositories, pan and kitchen, a quick reference guide to steps and job entries and the new features in Kettle 4.
This book certainly fills a gap: It is the first book on the market that focuses solely on PDI. From my point of view, Maria's book is excellent for anyone who wants to start working with Kettle and even for those who are on an intermediate level. This book takes a very practical approach: The book is full of interesting tutorials/examples (you can download the data/code from the Packt website), which is probably the best way to learn about something new. Maria also made a huge effort to illustrate the more complex topics, which helps the reader to understand the step/process easily.
All in all, I can only recommend this book. It is the easiest way to start with PDI/Kettle and you will be able to create complex transformations/jobs in no time!
You can purchase Pentaho 3.2 Data Integration: Beginner's Guide from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
My goodness, would it kill you to state what an acronym stands for the first time you use it?
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
PCMCIA - People Can't Memorize Computer Industry Acronyms.
It made things three times more complicated than they needed to be. We needed to integrate one of our products with another and the other product's developer recommended Talend and Pentaho for the job. After two days of looking through the documentation, it was clear that it was complete overkill for what we needed. So we said screw it and directly mapped to their database using JDBC, with Plain Old XML as our transport layer. That only took a day to build.
"The problem with socialism is eventually you run out of other people's money" - Thatcher.
So is PDI something like a database agnostic version of MSSQL DTS packages?
Sig it.
Seriously, I can't imagine how dumb some people are... complaining about acronyms that can easily be looked up on Wikipedia!
I mean, a quick search obviously reveals that ETL stands for Express Toll Lanes. Any slashdotter should know that these lanes are used by the many cars generated by the numerous analogies dotting slashdot "discussions".
And as for Pentaho... let's just break this word down into parts shall we? Penta is the root word for the number 5... duh! Of course, Ho is an accurate description of the only type of woman who will talk to the average slashdotter... assuming the slashdotter has a sufficient Benjamin supply.
So let's put all of this together shall we? This book is obviously about how you can pick up 5 hoes on a highway quickly and efficiently. This is a life skill that I'm sure many slashdotters are keenly interested in acquiring. How the hell anyone could possibly complain that the reviewer didn't expressly spell out these stupidly obvious terms is frankly beyond me.
AntiFA: An abbreviation for Anti First Amendment.
add PDI and ETL to my Resume. I wonder what they mean?
Pent (house) with ho's.
In my experience, ETL guys are the most obnoxious, self-important douches ever to walk the corridors of the building. Everything is "datamart this" and "database that", when all I can see is a handful of SQL hackers with a big budget and a loud boss.
I want to delete my account but Slashdot doesn't allow it.
Awesome review! Truly enlightening. Before I saw this article, I had absolutely no idea what Pentaho was, or why I would want it. Now, I know exactly what I'm getting both my friends for Christmas this year. I can't wait to discuss all 492 pages of this treasure with them in the new year.
A republic cannot succeed till it contains a certain body of men imbued with the principles of justice and honour.
When it takes a good 10 minutes of trawling TFA and Wikipedia just to find out what ETL and PDI stand for and what a datamart is, you know that the product is hyped up just enough to be worthless.
Sure, any or all of this stuff can be Google'd/Wikipedia'ed/etc., but does one want to go through that for an article summary? Especially when it would have been soooo easy to just expand the acronym...
Especially when it's standard journalism (and general writing) practice to expand acronyms the first time they're used, particularly when they are obscure.
To expect every reader to either know the definition of the acronym, or to search Google for it is the height of arrogance. It's also a good way to turn off readers.
Putting moderation advice in your
Now I am just left with the thought, is this "Intelligence" effort trying to market to me? If so they are doing a pretty lousy job of it, seeing that after reading the article and Googling I am still at a loss to explain what I just read. I regularly read about advanced mathematics in Relativity and Quantum Physics for fun, but I am obviously too stupid to understand marketing.
It is not nice to call Maria a ho, much less one of the penta variety. That's not just calling her a ho, but calling her a ho for 5 distinct reasons.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Yes, actually, I would like my memory refreshed on the exact expansion of SQL and XML, especially in a review of a tutorial product such as a book that is supposed to help people solidify their knowledge about a topic. I have had a university class on databases. I have written a PostgreSQL-based order tracking system. I have maintained builds of PostgreSQL and other database engines, but, going from memory, I don't recall offhand the meaning of the "S" in SQL and would have to guess that it stands for Simple Query Language. I have occasionally but rarely used XML tools, and I am also a bit vague on the meaning of the X in XML (Extended? Markup Language). My time is valuable, and I don't want to have to look up an acronym for casual reading. So, yes, I would love to see a policy that all acronyms should be expanded the first time they're used in an article. That might also help encourage people to use single words when they are more appropriate, such as times when statements about a "Central Processing Unit" actually apply to all processors in a computer, not just the central ones (e.g., numerous micro-controllers), or remind us of how much has changed since some acronyms were coined, such as PCMCIA cards ("Personal Computer *MEMORY* Card International Association").
I'm going to expand on this one a bit. When it said data integration, I immediately found out that ETL might be Extract, transform, load. The only reason I know this is because I work for a TLA type company. Kettle seems to be the name of something that already has a name, "Pentaho Data Integration". I'm not sure why it has two names. It is also part of the Pentaho BI suite.
A good review would give us a link to this tool, so we can figure out if the book is even relevant. Otherwise the assumption is that everyone knows what it is and everyone is using it. http://kettle.pentaho.org/ There's a FAQ which deals with usage, not what it's about, and no overview. So despite finding the website myself I still have no idea what this thing does. Does it solve the problem of exporting data from MS SQL Server and re-loading it somewhere else? Cos that's what I need.
A good review would also indicate if it's a free and/or open source tool, so we can decide if we're even interested in the tool, let alone the book. The source is available and hosted on sourceforge, so that answers that. But there is a separate link under Products for PDI, with links to Buy. Is this a poor attempt at a slashvertisement? Why would I use kettle instead of PDI? Is there a difference? http://www.pentaho.com/products/data_integration/
A good review would also identify the audience of the book, letting people know who might use it. It's a database tool - if I'm a Microsoft shop would I have any interest in reading about this?
I'm a bit mystified about chapter 8, which sounds a whole heck of a lot like "apt-get install mysql-server" for those who can't apt-get.
From what little info I have, this software seems to boil down to a super complicated way to push data in and out of databases. It's the kind of thing for which normal people would whip up write-once-read-never-again perl scripts full of obscene regexes and mysterious one liners, but if you'd rather do it differently, here's this giant complicated system written in Java and XML with the verbosity of COBOL that'll do more or less the same thing, but more slowly and complicatedly, for people who don't know what SQL is or even how to install mysql.
Somebody please P.R. me and explain what this thing is, or why I'd want it, or what in the world I'd do with it.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
"A book about the open source ${ACRONYM} tool ${DICTIONARYWORD} (${NONSENSEWORD} Data Integration) is finally available. ${NONSENSEWORD} ${VERSION} Data Integration: Beginner's Guide by ${AUTHORNAME} is for everybody who is new to ${DICTIONARYWORD}. In a nutshell, this book will give you all the information that you need to get started with ${DICTIONARYWORD} quickly and efficiently, even if you have never used it before. The book offers loads of illustrations and easy-to-follow examples. The code can be downloaded from the publisher website and ${DICTIONARYWORD} is available for free from the SourceForge website. In sum, the book is the best way to get to know the power of the open source ${ACRONYM} tool ${DICTIONARYWORD}, which is part of the ${NONSENSEWORD} ${DIFFERENTACRONYM} suite."
The simple answer is that Kettle is a generic name that is very hard to copyright. Pentaho Data Integration and Kettle are synonyms, although Kettle is used a bit more often to identify the open source project.
As for the pentaho.com website... you would think that the webcasts, papers, etc would be hard to miss but hey I guess if you don't need a data integration tool you probably don't know what it's for.
After I did a Kettle lightning talk at FOSDEM a few years ago I met a student who was working on a thesis. He had been gathering data in a database, originating from some electron microscope (or something like that) for the past 6 months. He said if he had known about Kettle he could have done it in a few weeks at most. The problem is that reaching certain non-technical audiences is a very tough call. Heck, it's even hard to convince those people that claim it's faster to code it all in Java/C/C++/Perl/Ruby or even bf. (see other threads below)
News about the Kettle Open Source project: on my blog
I am glad to see someone has got a book out about this package. If you need something like Pentaho, then writing simple translation scripts is probably not where you want to be. Kettle has a steep learning curve, but has proven to be reasonably reliable, and very flexible.
ETL stands for Extract, Transform, Load. Basically you want to extract data out of your very normalized application database, transform it into something that makes a little more sense for historical reporting and trending, and then load it into your data warehouse.
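For anyone still wondering what those three stages look like in practice, here is a toy version in plain Python using two in-memory SQLite databases. It is only a sketch of the general idea, not how Kettle works internally, and every table and column name in it (`customers`, `orders`, `sales_by_country`) is invented for the example.

```python
import sqlite3

# A tiny, normalized "application database" (made-up schema).
app = sqlite3.connect(":memory:")
app.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'AR'), (2, 'BE');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# A separate "data warehouse" with a reporting-friendly table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_by_country (country TEXT, total REAL)")

# Extract: pull the raw rows out of the application database.
rows = app.execute(
    "SELECT c.country, o.amount "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# Transform: aggregate the order amounts per country.
totals = {}
for country, amount in rows:
    totals[country] = totals.get(country, 0.0) + amount

# Load: write the summarized rows into the warehouse.
warehouse.executemany(
    "INSERT INTO sales_by_country VALUES (?, ?)", sorted(totals.items())
)
warehouse.commit()

result = warehouse.execute(
    "SELECT country, total FROM sales_by_country ORDER BY country"
).fetchall()
print(result)
```

A dedicated tool earns its keep when there are dozens of sources, error handling, scheduling and history to manage; for a one-off like this, a script is arguably enough.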
Considering that the author, María Carina Roldán, is Argentinian, it's obvious that "pentaho" is a misspelling for "pendejo". This book is about a latino asshole who drives an old truck very slowly in the express lane, ignoring all the honking cars behind him. The truck is slow because the radiator is boiling, its nickname is the "Kettle".
"What I'd do if I had 2.5 million dollars"
-Lawrence, Office Space
include $sig;
1;
Seen lots of negative Pentaho experiences here, and I'd generally agree. It's one of those "Open Source" projects which forces you into buying their commercial version because they've made it way too complex.
Luckily the Pentaho project is an umbrella which contains a number of separate products, most of which were developed independently, which results in there being a big difference in the quality of each component.
From my experience, Kettle is a really nice tool for ETL. It is, IMHO, easier to use than Microsoft's Integration Services (its closest competitor). It's straightforward, performs well and, importantly, can be used without using the rest of Pentaho.
It's nice to have real, tangible documentation for this beast. It looked like it had a lot of promise and was powerful out of the box, without having to spend tons of $ on a commercial product, but the documentation was dismal (at least all that I have found).
ETL is not cheap, and if you have a small project, pretty much unattainable.
---- Booth was a patriot ----
Undefined TLA near every single fucking line. Bailing out, giving up and going home...
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
PCMCIA = People Can't Memorize Computer Industry Acronyms.
-- "This world is a comedy to those who think, a tragedy to those who feel."
...and yet, you still fail to define exactly wth kettle is.
I'm not interested in webcasts and whitepapers yet. I'm still looking for a two paragraph summary that tells me what this thing makes easier and HOW it makes it easier. Said summary should be followed by a feature highlight list.
Does it use a special syntax to define a formatting schema to translate data from one thing to another?
The information on the page itself reads like marketing speak.
"but hey I guess if you don't need a data integration tool you probably don't know what it's for."
I wouldn't know if this tool is potentially useful even if I did need a data integration tool.
PICNIC - Problem In Chair Not In Computer
Kettle is a visual programming tool to do data integration. Again, if you have no need for data integration, you won't be looking for it.
Here is a link to the FOSDEM presentation itself:
https://docs.google.com/viewer?url=http://archive.fosdem.org/2008/slides/lightningtalks/fosdem08_ltalk_kettle.pdf
News about the Kettle Open Source project: on my blog
What did they just call me?
"As for the pentaho.com website... you would think that the webcasts, papers, etc would be hard to miss but hey I guess if you don't need a data integration tool you probably don't know what it's for."
"My Humble Blog"? You sound like an arrogant prick. Not to mention your third paragraph explains why your second paragraph is wrong: if the student knew what Kettle was in the first place, he could've saved a lot of time.
And finally, it is standard practice to tailor any piece of communication to the audience to which it is being communicated. It's likely we /. readers know what SQL is, but ETL or whatever is less widely known and asking for its definition is not out of line. If you have an interest in the success of Pentaho/Kettle, and it appears you do, then tell people what it can do for them and even help them to find ways to use it to make their lives easier. Even in your response to the comment below mine, you say 'if you have no need for data integration, you won't be looking for it'. The student you spoke of had a need for data integration, but didn't know what to look for.
You're absolutely right, I'm totally arrogant. Try answering over seven thousand posts on the Kettle forums. My apologies in any case.
As for typing ETL or Data Integration in Google: you should try it.
You also seem to miss the point that I did in fact reach the student in question by spending my spare time speaking at the open source conference, for free. The presentation I gave there was in fact tailored to people that don't know any data integration tools. The point I was trying to make was that efforts where you reach 50 or even a few hundred people at a time don't even make a dent in the huge crowd that doesn't know or doesn't *want* to know about even the possibility of using a data integration tool to get a job done, let alone an open source data integration tool. Without multi-million dollar marketing campaigns I wouldn't know what to do about it.
News about the Kettle Open Source project: on my blog