Spark Advances From Apache Incubator To Top-Level Project

Posted by timothy on Saturday March 1, 2014 @01:50AM from the distribution-solution dept.

rjmarvin writes "The Apache Software Foundation announced that Spark, the open-source cluster-computing framework for Big Data analysis, has graduated from the Apache Incubator to a top-level project. A project management committee will guide the project's day-to-day operations, and Databricks cofounder Matei Zaharia will be appointed VP of Apache Spark. Spark runs programs 100x faster than Apache Hadoop MapReduce in memory, and it provides APIs that enable developers to rapidly develop applications in Java, Python or Scala, according to the ASF."

10 of 24 comments

Not good by Anonymous Coward · 2014-03-01 02:00 · Score: 1

Generally when Spark advances you get engine knock.
1. Re:Not good by Hognoxious · 2014-03-01 03:54 · Score: 1
  
  I not only get your joke, but I've adjusted the spring-and-weights thingy that controls it. I can't remember its name, mind.
  Shit, I'm getting old.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
2. Re:Not good by TheRealHocusLocus · 2014-03-02 06:18 · Score: 1
  
  spring-and-weights thingy
  
  I think you mean a governor, guv'ner. The origin of the Motor-Operated Pushover is aptly described here, "To think, all I had to do was put the balls on the other side! Aren't they beautiful?"
  I like the Future, I'm in it.
  
  --
  <blink>down the rabbit hole</blink>
3. Re:Not good by NelsChristian · 2014-03-02 07:00 · Score: 1
  
  Governor? How about adjusting the contact points which you'd find in the distributor?
I'm gonna tinker with it by Mister+Liberty · 2014-03-01 02:15 · Score: 1

Only thing -- where do I get my big data?
And Tachyon boosts Spark another 2-8x by michaelmalak · 2014-03-01 03:15 · Score: 2

Spark runs programs 100x faster than Apache Hadoop MapReduce in memory

And Tachyon, another component of Matei's Berkeley Data Analytics Stack, boosts Spark another factor of 2-8x by sidestepping JVM garbage collection issues.
Re:I hope this is far better than Apache Solr by iggymanz · 2014-03-01 03:54 · Score: 1

the target market for Solr is the "enterprise". big corporations who have developers and operations people on staff with heavy duty skills.
don't cry because you can't handle it
Re:I hope this is far better than Apache Solr by jockm · 2014-03-01 03:57 · Score: 1

So do you judge every Apache project this way? Are Apache, Tomcat, Commons, Batik, CouchDB, etc etc etc all crap until proven otherwise because of Solr? Apache is a collection of projects, maintained by different people.
And not to trash your friend's company, but he picked a technology without trying it out first? Then that company had bigger problems than Solr. Nor would I judge Solr by that story (I have never used Solr, nor am I involved with it in any way).

--

What do you know I wrote a novel
Spark rarely performs as well as advertised by Anonymous Coward · 2014-03-01 06:04 · Score: 1

On one carefully selected benchmark, discounting a lot of things that matter (like data movement), Spark performs better than Hadoop. Tech reports generated by the authors suggest that this is a corner case and that Spark's performance varies wildly. Don't believe the hype.
1. Re:Spark rarely performs as well as advertised by techhead79 · 2014-03-01 07:49 · Score: 1
  
  I think the major advantage of using Spark isn't just the performance but the libraries such as MLbase/MLlib. Is this not correct? While I realize R is what most of the industry has adopted, MLlib seems to be catching up very fast.