How Open Sourcing Made Apache Kafka A Dominant Streaming Platform (techrepublic.com)

Posted by EditorDavid on Sunday February 5, 2017 @11:39AM from the world-domination dept.

Open sourced in 2010, the Apache Kafka distributed streaming platform is now used at more than a third of Fortune 500 companies (as well as seven of the world's top 10 banks). An anonymous reader writes: Co-creator Neha Narkhede says "We saw the need for a distributed architecture with microservices that we could scale quickly and robustly. The legacy systems couldn't help us anymore." In a new interview with TechRepublic, Narkhede explains that while working at LinkedIn, "We had the vision of building the entire company's business logic as stream processors that express transformations on streams of data... [T]hough Kafka started off as a very scalable messaging system, it grew to complete our vision of being a distributed streaming platform."

Narkhede became the CTO and co-founder of Confluent, which supports enterprise installations of Kafka, and now says that being open source "helps you build a pipeline for your product and reduce the cost of sales... [T]he developer is the new decision maker. If the product experience is tailored to ensure that the developers are successful and the technology plays a critical role in your business, you have the foundational pieces of building a growing and profitable business around an open-source technology... Kafka is used as the source-of-truth pipeline carrying critical data that businesses rely on for real-time decision-making."

helps you what in the what? by war4peace · 2017-02-05 11:52 · Score: 5, Interesting

The amount of corporate bullshit in TFS makes my head hurt and spin... at the same time.

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
Moves data from this cluster to that cluster relia by raymorris · 2017-02-05 14:35 · Score: 5, Informative

Suppose you have some service that produces data. This service might be on one server, or a group of servers.
Some other service receives this data. Perhaps the receiving service transforms the data in some way before passing it along to some other system.
Kafka helps with that. It avoids some simple problems. For example, I once worked on a system in which a cron job transferred the data at midnight each day, sending over that day's data. Records created right at midnight might get skipped, or might get sent twice. In case of a network glitch, you'd have to manually retry in the morning. Kafka avoids those kinds of problems.
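To make the midnight problem concrete, here is a minimal sketch of the idea that replaces the "send today's data" window: an append-only log that the reader walks by position (offset), advancing only after a record has been handed off successfully. This is a toy model, not Kafka's client API; the `Log`, `Consumer`, and `flaky_sender` names are made up for illustration.

```python
# Toy model only: an append-only log read by offset. Not Kafka's client API.

class Log:
    """Append-only list of records; a record's index is its offset."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        return self.records[offset:]


class Consumer:
    """Remembers the next offset to read, so a retry resumes where it stopped."""
    def __init__(self, log):
        self.log = log
        self.committed = 0    # next offset to process
        self.delivered = []   # what actually reached the downstream system

    def poll(self, send):
        for record in self.log.read_from(self.committed):
            send(record)                  # may raise on a network glitch
            self.delivered.append(record)
            self.committed += 1           # advance only after a successful send


def flaky_sender(fail_on):
    """Hypothetical sender that fails exactly once, on the chosen record."""
    failed = set()
    def send(record):
        if record == fail_on and record not in failed:
            failed.add(record)
            raise ConnectionError("network glitch")
    return send


log = Log()
for timestamp in ["23:59:58", "23:59:59", "00:00:00", "00:00:01"]:
    log.append(timestamp)

consumer = Consumer(log)
send = flaky_sender(fail_on="00:00:00")
try:
    consumer.poll(send)
except ConnectionError:
    pass              # nothing to clean up by hand; just poll again
consumer.poll(send)   # resumes at the record that failed
```

Because there is no date window, the record stamped exactly at midnight is neither skipped nor sent twice in this sketch, and the retry picks up at the failed record. In a real deployment, whether duplicates can still occur depends on how and when offsets are committed.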
Kafka is built on the idea that both producers and consumers may be groups of partially redundant servers, with the data split up between different servers. Kafka has features to enable load balancing.
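A rough sketch of that splitting, under stated assumptions: records are routed to a partition by hashing a key, and partitions are dealt out to whichever consumers are currently alive. The `crc32` hash and the round-robin assignment here are stand-ins chosen for the example, not necessarily what Kafka itself uses, and `partition_for` and `assign` are hypothetical helper names.

```python
import zlib

def partition_for(key, num_partitions):
    """Route a key to a partition. crc32 is a stand-in hash for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign(num_partitions, consumers):
    """Deal partitions out to consumers round-robin (illustrative strategy)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

NUM_PARTITIONS = 6

# Producer side: the same key always lands in the same partition.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key in ["user-1", "user-2", "user-3", "user-1"]:
    partitions[partition_for(key, NUM_PARTITIONS)].append(key)

# Consumer side: three servers share the six partitions...
before = assign(NUM_PARTITIONS, ["c1", "c2", "c3"])
# before == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}

# ...and when c3 goes away, its partitions are spread over the survivors.
after = assign(NUM_PARTITIONS, ["c1", "c2"])
# after == {"c1": [0, 2, 4], "c2": [1, 3, 5]}
```

The point of the sketch is only that load balancing falls out of the partition-to-consumer mapping: adding or losing a server changes the mapping, not the producers.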
So it's appropriate when you want to get data from one group of servers to another, possibly through a middle group, and you want it reliable, load balanced, etc., without inventing and later correcting your own protocols.