How Open Sourcing Made Apache Kafka A Dominant Streaming Platform (techrepublic.com)
Open sourced in 2010, the Apache Kafka distributed streaming platform is now used at more than a third of Fortune 500 companies (as well as seven of the world's top 10 banks). An anonymous reader writes:
Co-creator Neha Narkhede says "We saw the need for a distributed architecture with microservices that we could scale quickly and robustly. The legacy systems couldn't help us anymore." In a new interview with TechRepublic, Narkhede explains that while working at LinkedIn, "We had the vision of building the entire company's business logic as stream processors that express transformations on streams of data... [T]hough Kafka started off as a very scalable messaging system, it grew to complete our vision of being a distributed streaming platform."
Narkhede became the CTO and co-founder of Confluent, which supports enterprise installations of Kafka, and now says that being open source "helps you build a pipeline for your product and reduce the cost of sales... [T]he developer is the new decision maker. If the product experience is tailored to ensure that the developers are successful and the technology plays a critical role in your business, you have the foundational pieces of building a growing and profitable business around an open-source technology... Kafka is used as the source-of-truth pipeline carrying critical data that businesses rely on for real-time decision-making."
The amount of corporate bullshit in TFS makes my head hurt and spin... at the same time.
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
The majority sports culture in the United States suffocates the non-adherents. Super Bowl is not inclusive of the full spectrum of society. Its existence only serves to show how little progress our society has made in the past decades and how much work remains to be done. Stop the oppression of America's football-rejecting subcultures now by making this year the last year of Super Bowl!!
More at 11
Hey, cock sucking fagget, are you in canada yet?
Have pity on a fake journalist!
The experience I've had testing Kafka with large amounts of data led me to a couple of conclusions.
Kafka adds a lot of overhead to control streams, and it doesn't solve the problems you actually have when you need a distributed streaming solution: primarily bottlenecks, write speeds, read speeds, and irregular processing performance (including debugging).
The idea that Kafka helps you with stream processing in a way that more traditional methods (load balancing, splitting on load, processing in parallel) can't or don't or that it's easier, is false.
If you're on an EC2 instance, open a socket to S3 and write, then have something process it. You'll save a lot of cycles (in every way).
This article is some slick IBM-style marketing which is not very helpful to people with existing technical challenges.
That was Kafkaesque.
we are building a massive igloo over all canada to keep.....everyone out.....we even have 30 foot mutated polar bears imported from ...alaska to guard us....oh and a giant 50 foot beaver is said to be wandering about eating any americans it finds....
because getting through those docs and getting the whole d*** thing to work is creating more jobs than it proclaims to save.
Yikes, reality surpasses Dilbert.
Suppose you have some service that produces data. This service might be on one server, or a group of servers.
Some other service receives this data. Perhaps the receiving service transforms the data in some way before passing it along to some other system.
Kafka helps with that. It avoids some simple problems. For example, I once worked on a system in which a cron job transferred the data at midnight each day. Each day, it sent over that day's data. Records created right at midnight might get skipped, or might get sent twice. In case of a network glitch, you'd have to manually retry in the morning. Kafka avoids those kinds of problems.
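To make the midnight problem concrete, here is a minimal sketch of position-based delivery. It is a toy in-memory model, not Kafka's actual API: the `Log` and `Consumer` classes and their methods are hypothetical names invented for illustration. The point is only that the reader tracks "next offset to read" rather than "everything from today", so there is no time boundary at which a record can fall through or be picked up twice.

```python
class Log:
    """Append-only list of records; a toy stand-in for one partition."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset, max_records=100):
        return self.records[offset:offset + max_records]


class Consumer:
    """Reads from a Log and remembers the next offset it has not yet handled."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # next offset to read

    def poll(self, handler, max_records=100):
        batch = self.log.read_from(self.committed, max_records)
        for record in batch:
            handler(record)      # may raise; the offset is then left unchanged
            self.committed += 1  # advance only after the record was handled
        return len(batch)
```

If the handler fails partway through (the "network glitch" case), the next `poll` simply resumes at the record that failed, with no manual retry and no guessing about which day the record belonged to. A record whose handler failed midway may be partially processed twice, so this sketch gives at-least-once delivery rather than exactly-once.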
Kafka is built on the idea that both producers and consumers may be groups of partially redundant servers, with the data split up between different servers. Kafka has features to enable load balancing.
So it's appropriate when you want to get data from one group of servers to another, possibly through a middle group, and you want it reliable, load balanced, etc., without inventing and later correcting your own protocols.
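A rough sketch of what "data split up between different servers" can look like: records are routed to a partition by hashing a key, and partitions are dealt out among the consumers in a group. Everything here is illustrative. The function names `partition_for` and `assign` are made up, `zlib.crc32` is a stand-in hash rather than the one Kafka's clients use, and round-robin is just one possible assignment strategy.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; the same key always lands on the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out round-robin to the consumers in a group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

For example, `assign(6, ["a", "b", "c"])` gives each consumer two partitions. If consumer `"c"` dies, calling `assign(6, ["a", "b"])` spreads the same six partitions over the survivors, which is the load balancing and redundancy idea in miniature.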
> The idea that Kafka helps you with stream processing in a way that more traditional methods (load balancing, splitting on load, processing in parallel) can't or don't or that it's easier, is false.
My read was not that Kafka is supposed to *replace* "load balancing, splitting on load, processing in parallel", but that it's intended to *enable* "load balancing, splitting on load, processing in parallel". Not that it does something that load balancing doesn't do, but that it provides a proven load balancing solution, or at least some key parts.
"An anonymous reader" ... named Neha Narkhede, by any chance?
I've got no idea what Kafka does, and the summary really doesn't tell you much at all. I was about to put in a helpful post saying what it is, but even after visiting their home page I've still got no idea.
Apparently Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
How about the Intro
We think of a streaming platform as having three key capabilities:
It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
It lets you store streams of records in a fault-tolerant way.
It lets you process streams of records as they occur.
What is Kafka good for?
It gets used for two broad classes of application:
Building real-time streaming data pipelines that reliably get data between systems or applications
Building real-time streaming applications that transform or react to the streams of data
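In case the three capabilities above are still abstract, here is a toy in-memory sketch of them. It is not Kafka's API: `Broker`, `publish`, `poll`, and `uppercase_processor` are hypothetical names, and a real system would persist and replicate the records rather than hold them in a Python list.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for a streaming platform."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> stored records
        self.offsets = defaultdict(int)  # (subscriber, topic) -> next offset

    def publish(self, topic, record):
        # Store: records stay in the topic after being read.
        self.topics[topic].append(record)

    def poll(self, subscriber, topic):
        # Subscribe: each subscriber keeps its own position in the stream.
        start = self.offsets[(subscriber, topic)]
        records = self.topics[topic][start:]
        self.offsets[(subscriber, topic)] = start + len(records)
        return records


def uppercase_processor(broker, source, sink):
    """Process: read new records from one stream and write a derived stream."""
    for record in broker.poll("uppercase-processor", source):
        broker.publish(sink, record.upper())
```

Because reading does not remove anything, two subscribers (say, `"audit"` and `"billing"`) each see every record published to a topic, which is the main difference from a classic queue where a message is gone once one reader takes it.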
OK, I still am not really sure what it does.
Specialist Mac support for creative pros, Melbourne
Frameworks like this are invariably heavyweight and have a real appreciable cost to use. Be sure you actually have enough work to do with one to make the investment (and continuing care-and-feeding headache) worth the time and money. Otherwise something built directly on your business process will be a much better solution, though you may have to periodically defend it against the framework salesmen.
WTF was all that gibberish? Can someone tell me what this thing actually does?
Thank you very much for the clarification.
I am a Business Intelligence Analyst and to my shame I had never heard of this solution, or maybe I had but it was so riddled with buzzwords and corporate bullshit that it became unintelligible to plebs like me.
Yes, I can see quite a few use cases for it. If they only used your words to describe it :)
I had no idea what that Kafka was about before this, but it sure does sound like a BizTalk clone. The only difference seems to be that they gave some new material for the buzzword bingos.
If you think Kafka is like BizTalk, I suggest you look at the documentation or download it; it's nothing like BizTalk. It's a highly scalable, ultra-high-throughput messaging system. The stream processing API is just a bolt-on at the side, and again it is nothing like BizTalk. Kafka has been proven in production to handle 1.2 trillion messages per day; there's no way BizTalk can do that.
Dum spiro spero
You mean like BizTalk?
It's not like you couldn't have JFGI'ed when you encountered something you had never heard of.
From what I can dig up about it, Kafka is just the message queue. BizTalk uses a message queue along with all of the routing, transformation, and customization stuff, all in one package.
Basically, if you're running a Microsoft platform, you don't need Kafka because WCF or MSMQ or some other suitable alternative is already built into the OS and dev toolchain. BizTalk is built on top of those things and goes way above and beyond Kafka. Also, BizTalk is built for heavier message loads than Kafka is, with maybe (maybe!) not as much throughput... unless you need it, in which case you can cluster it (or rent it on Azure, because, as the GP said, your groin will suffer).
More like MQ with more optional plumbing?
I have never, ever, heard those words used to describe BizTalk
Ho ho ho!