Here's one area I have to give Apple some props in: their OSX interface puts some damn pretty and friendly makeup on the pig that was the old FreeBSD interface
Small correction:
s/Apple/NeXT/
s/FreeBSD/BSD/
What Apple's acquisition did was give the NeXT team money to update OpenStep for a next generation of hardware and throw marketing dollars at it to put it in front of people. Don't get me wrong: Apple's work post-acquisition on updating the interface was fantastic, but let's give credit where credit is due.
I really hate the reporting around Hadoop. Most of these people have absolutely no clue what they are talking about, and this article is just another example of that. Any bit of simple research would have revealed that the actual open source community of developers around Hadoop, Hive, Solr, etc, can be found at ApacheCon. Of course Strata is amazingly commercial: O'Reilly, being a corporate entity, is trying to make cash around the latest craze. If they weren't, they'd make sure the ASF and the other OSS organizations that help make the software had some space and would actually attend.
The easiest way to find a company that hires for open source work is to look at who is actually submitting patches back, participating on mailing lists, filing bugs, etc. From my own experience, it seems as though almost every Bay Area startup or former startup from the past 10 years (but clearly not all of them) is doing work in open source either out in the open or behind closed doors. Many positions don't have open source in big bright letters, so you might need to just flat out ask. If you are outside of the Bay Area, those companies exist but will require more legwork.
I don't see why anyone would not want to use the GPL if they want their software to be free and open. Why create something, give it out for free, and then allow businesses to take your work, profit from it, and give nothing back? Maybe these developers are hoping to get bought out by a large company someday?
There are many businesses that want to profit from their own open source projects by including them or parts of them in other, proprietary works. The GPL essentially makes that impossible.
but I see no reason that it couldn't serve you well as a large personal file service.
HDFS is not POSIX or mountable. So actually using the data from something that is expecting POSIX is going to be painful. "But there is a FUSE plug-in!" Yes, there is, but you'll take a 60% perf hit using it, assuming that it still works in newer versions of Hadoop. See, none of the hardcore devs actually use it, so there is a very good chance it is completely busted.
In any case, there are still other problems: losing the fsimage, having no real HA for the NN, and needing quite a bit of RAM for any significant number of files. And don't forget that 8TB now turns into at least 24TB counting the 3x replication factor, etc, etc, etc.
So no, really this isn't a solution for this particular problem.
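For anyone who wants to put numbers on the storage and RAM points above, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and the 150 bytes per namespace object figure is only a commonly quoted rule of thumb that I'm assuming here, not something measured on a particular Hadoop version.

```python
def raw_capacity_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk needed to store `logical_tb` of data at a given replication factor."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return float(logical_tb) * replication


def namenode_heap_gb(num_files: int,
                     blocks_per_file: int = 1,
                     bytes_per_object: int = 150) -> float:
    """Very rough NameNode heap estimate.

    Assumes each file and each block is one in-memory object costing
    `bytes_per_object` bytes (an assumed rule of thumb, not a guarantee).
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1024 ** 3


if __name__ == "__main__":
    # 8TB of personal files at the default 3x replication.
    print(raw_capacity_tb(8))
    # Ten million small files, one block each.
    print(round(namenode_heap_gb(10_000_000), 2))
```

Lots of small files is the bad case for the NN: the heap estimate scales with the number of files and blocks, not with how many bytes they hold.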
We've been using the Buffalo modified version of DD-WRT for a few months now. It replaced a Linksys E3k that was continually dropping connections. Overall, we're pretty happy with it (QoS, DHCP, etc). I'll definitely check on the link speed, although it is connected to a DSL modem that can't do gigabit anyway. :)
This isn't about Microsoft getting involved with open source. This is about Microsoft not getting left out. Beyond the countless startups, Apache Hadoop already has major players like Amazon, Dell, EMC, HP, IBM, NetApp, Oracle, VMware, ... trying to make a dent in the community in some form or another. Hell, I have a SuperMicro catalog on my desk emblazoned with the Apache Hadoop logo all over it. Like Oracle, they are coming in very late to the party and now need to play catch-up. Buying off Hortonworks is a very fast way to do that.
Actually, there is an ever increasing amount of JNI (read: C) code in Hadoop that is in the critical path for security and performance features. Most of that code is not very portable. So either MS is going to pay for a major overhaul of that code, pay for completely new code (or a branch) to replicate that functionality, or MS Hadoop is going to be severely lacking in features/performance.
In the case of a Hadoop task failure, the errant monkey was genetically cloned but put into a different environment. So it also served as a nature vs. nurture experiment. :)
Yup, I realize that going to Atom or ARM for a CPU bound process is suicide, but so is only using the tiny amount of money to try and solve the problem. :)
This is absolutely correct and if I had mod points, I'd spend them here.
If your budget is only £4000, you don't have the funding to build a real, actual grid for something that is CPU bound. If you are lucky, you have enough to get one or two boxes and some network gear to put on the top of someone's desk.... at least if you are doing AMD or Intel higher end procs.
Here are two ideas worth exploring...
1) Look at boxes like SeaMicro and other Atom-based mini-grids-in-a-box.
2) Look at building your own with Atom- and ARM-based machines.
I wonder if PHP has the same problem we do in Hadoop-land... the lack of enough qualified security people interested enough in a project to actually review code. For example, I'd love for someone with a clue to review Alfredo ( http://cloudera.github.com/alfredo/docs/latest/index.html ) before we build a dependency on it ( https://issues.apache.org/jira/browse/HADOOP-7119 ). But it seems as though getting the right people involved is extremely difficult. :(
"Harmony is an effort that was begun and shepherded by Amanda Brock"
To some of us, Harmony is the name of Apache's Java implementation. Sort of surprising that this naming clash wasn't considered given the context. Heck, TFA even mentions Apache HTTPD.
Someone is watching all of the complete episodes in order. As someone who has watched almost all of the remaining episodes (in fact, I started watching Troughton's War Games episode for the 3rd time last night), it is fascinating hearing an outsider's perspective of some of the episodes.
I tend to agree with most of what has been said here. For Classic, start with Baker and work your way up. Even though I still think most of Pertwee, Troughton and Hartnell is great to watch, there are a lot of slow episodes in those first three Doctors that you likely won't survive.
For modern, start with Eccleston (although he wasn't my favorite) if only because there is a lot of background provided that you'll need for the rest of the (new) series.
... and to make matters worse, top500 is based primarily on LINPACK. So top500 is really a measure of how fast something can do floating point with a distributed shared memory model and not much else. Most of the systems listed in the top500 would fail miserably at heavy IO loads, which is what most of the increasingly common Big Data problems need. It concerns me that manufacturers are building systems for one type of heavy duty computing based on top500 while ignoring the others.
To me, this is just more evidence that they will be dropping OS X and moving to iOS for all devices over the next five years. If they were to introduce a new Xserve now, I suspect that the support date is past whatever EOL date they have in mind for OS X. What is essentially an appliance OS won't work for what are technically meant to be back end servers except for very limited applications. The people who buy the most Xserves (HPC, etc) do not fall into that category.
They likely don't have security personnel. I wouldn't be surprised if there were some "just get in the way" or "security can be added later" or "who cares" or any number of other statements made that clueless developers later on regret.
ON-TAP lacks a decent Kerberos stack. It is embarrassingly ancient in its support of encryption types, which makes it pretty much a non-starter for secure NFS.
I wish I had mod points. I really really do.
... except 100 gigabytes is not 1 terabyte.
The FUSE support has likely gotten worse since no one on the core dev team really spends any time with it. I'd be surprised if it still compiles.
+1 on this one.
It isn't. There is an incredible overuse of glibc/Linux-isms to the point that even porting it to another UNIX is difficult.
Yes, but I don't think Yahoo! is adding any more universities to it. In fact, I don't think it ever expanded beyond CMU.
ObDisclosure: I was on the ops-side of that project for Yahoo!.
For the record, we actually used ApplixWare prior to StarOffice.
Even if the schedule is "Tuesday-9am: Give trach to Mrs. Lattimer"?