The headline is misleading. It sounds like Facebook actually wants to leverage blockchain technologies to accomplish tasks, and provide value/security for their services. That is distinct from a "cryptocurrency." The latter makes use of a distributed ledger (typically a blockchain, more in a second) to create a medium of exchange. This medium of exchange is essentially a commodity, like gold (difficult to acquire, with a limited supply), which is traded to settle debts (i.e. buy stuff), a function typically performed using money. In other words, a cryptocurrency is a commodity being used as money. For whippersnappers this is a foreign concept. Old farts like myself still recall the days when gold was used as a common medium of exchange in the barter system. "Money" was invented to make payments easier, because large amounts of gold are bulky and heavy. Instead, paper was traded. Initially the paper notes were issued by "banks" (I use that term loosely), and that paper could be redeemed from said issuer for gold. That function was eventually taken over by governments, first state, then federal. The US dollar was backed by gold from 1879 to 1933. A cryptocurrency is the electronic equivalent of this system.
What makes cryptocurrencies possible is the invention of "blockchain" technology. A blockchain is essentially a public ledger that has been distributed. What's unique is how the blockchain uses cryptography to ensure a consensus among untrusted parties over what transactions are on the blockchain, and thus valid. That is the "proof of work" idea, whereby the network only accepts blocks which include the requisite proof of work. Cryptographic signatures are used to create transactions, where a signed transaction transfers a given unit from one private key to another. The holder of the associated private key is in essence the holder of the units, as anyone with access to the private key can create valid transactions which transfer (or spend) that unit. The blocks themselves are linked together cryptographically, using hashes, such that a modification to a given block would invalidate all of the blocks that follow it. That is why we talk about confidence in transactions, with that confidence governed by the number of blocks which follow a given transaction.
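To make the hash-linking point concrete, here is a minimal sketch in Python. It only shows how each block commits to the one before it; it leaves out proof of work and signatures entirely. The block layout and the function names (block_hash, build_chain, first_invalid_block) are my own assumptions for illustration, not how bitcoin or any real chain serializes its blocks.

```python
import hashlib
import json

GENESIS_PARENT = "0" * 64  # placeholder parent hash for the first block (assumption)


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each one committing to its predecessor."""
    chain = []
    prev_hash = GENESIS_PARENT
    for index, transactions in enumerate(batches):
        digest = block_hash(index, prev_hash, transactions)
        chain.append({"index": index, "prev_hash": prev_hash,
                      "transactions": transactions, "hash": digest})
        prev_hash = digest
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_PARENT
    for block in chain:
        expected = block_hash(block["index"], prev_hash, block["transactions"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None
```

If you alter the transactions in block 1, block 1 no longer matches its own hash. If you then recompute block 1's hash to cover your tracks, block 2 stops verifying, because it still points at the old hash. You would have to redo every block after the one you changed, which is what proof of work makes expensive.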
It's worth noting that many cryptocurrencies don't use true blockchains to record transactions. They only use cryptographically signed transactions to validate transfer. These are essentially gift cards, as the ledger can be altered, modified, etc, if the group in charge wants it.
Like I said above, with bitcoin, and most cryptocurrencies, you can see every transaction that has ever been made, and how much was transferred. If you determine who controlled the addresses (aka private keys), you know who was behind a transaction. This is incredibly easy. That is why I exclusively hold and use Monero, which is an improvement on the original blockchain concept (aka bitcoin) that uses cryptography not only to validate transactions, but also to mask them. This provides significantly more privacy.
I use the Kinesis Advantage (with the foot pedals for the modifiers). I have three of them already, but all of them are the standard model (1 in storage). This article convinced me to buy their new Low-Force version, which uses the Cherry MX Red switches. I'm hoping it helps with my arthritis. It's a recent addition to the Kinesis Advantage family, and one that wasn't available when I purchased my current HID. L~
I am one of the admins for the free email service Lavabit. We have a graph on the net showing adoption, built from about 150k messages a day. (We don't include messages for users who have disabled this inbound check, or for messages which are blocked for some reason other than SPF.)
My problem is the thought of an external audit. I have sensitive user information on my network. I am charged with keeping that information secure. You don't keep that type of data secure by opening it up to outsiders and letting them run scans/traces/sniffs, or whatever else it is they do on your network while it's transmitting sensitive data.
In general, enterprise has come to mean overpriced and underperforming. By making something "enterprise" you're saying you designed it such that you can throw money at the problem. By breaking things into multiple "tiers" you're saying that if any one tier gets overloaded, you can fix the problem by throwing money/hardware at that tier.
From my perspective, the best way to solve the performance/reliability problems is with sound design, good programming (algorithms), and careful tool selection. That means architecting your app so that there is no single point of failure. If one node goes down, can the other nodes recover and continue functioning? In the same vein, can you add nodes to the cluster and scale increased loads across more machines without encountering bottlenecks? It's been documented elsewhere, but you want capacity that grows as O(n) with the number of machines, not as O(log n). All too often the answer to scaling "enterprise" software is to buy a bigger box. That can get expensive very fast. The better, albeit more difficult, solution is to write the app so that multiple machines can work in concert. And finally, make sure that the tools you use will be able to scale. In general this relates to what database system and libraries you use (and not so much the language).
I'll address the language issue too. A lot of people have mentioned Erlang. While I think it's a great language for server applications, there just isn't the community support to make it a practical choice. (Exception: if you don't need libraries, or you plan to write _everything_ yourself, as is often the case for embedded systems, then maybe Erlang is a good choice. Hence why you find Erlang in routers.) Erlang is also problematic because of the small number of people skilled in its use. For me it really comes down to choosing C on Linux, or C# on Windows. (I've written scalable apps, supporting several hundred thousand users, using both.) It's a simple fact that it takes less time to write production code in C# (or Java) than in C (or C++). So what you need to ask yourself is whether the efficiency savings of C outweigh its added development cost.
One of the apps I wrote is the SMTP/POP/IMAP server used to support my free e-mail service (http://lavabit.com/). For that project, hardware was comparatively expensive (I paid for everything myself), and my time was relatively cheap. (I started by only working on the code in between consulting gigs.) So it made sense to write the app in C, and use Linux. Over time, the efficiency savings have made the decision, while painful at times, the correct one. I'm able to support 70K users very cheaply. If I had chosen C#/Windows, I might have gotten the project done faster, but I'd need more expensive hardware. (I'm using Dell 1650's at the application tier, with beefier machines at the database/storage tier. Note, I have a two-tier architecture.) I would also have had to shell out lots of dollars for Windows licenses. It just didn't make sense. For more on my mail server, read this other post: http://slashdot.org/comments.pl?sid=191034&cid=15711157
Another large project I worked on was a social networking site sponsored by a large carbonated beverage firm. In this case the pockets were deeper but the timeline was shorter. So it made sense to write the app in C#. It reminds me of the saying that in software development you have three factors: cost, quality and time. You get to pick two of those, but not three.
I'll close by saying again, the best way to solve the performance/reliability problems is with sound design, good programming (algorithms), and careful tool selection.
If you want to break into the gaming industry, sign up for the Guildhall at SMU. If memory serves, it's an _intense_ 18-month program. I believe somewhere around 96% of the graduates end up with jobs in the gaming industry.
I should probably mention that I've chosen to use CentOS 4.3, running on dual processor Dell 1650 servers. All of my code is written in C, and compiled with GCC.
This is important because it means I use the 2.6 version of the Linux kernel, and the CentOS/RHEL version of the kernel uses the Native POSIX Thread Library (NPTL) by default.
This debate hits home with me. I wrote a server daemon to handle the SMTP and POP protocols, and when I first started out I had to make a choice. The choice I made back then was to use a threaded model. The way it works is I spawn X threads which collectively use blocking calls to the accept() function. Each thread will only return from accept() once it has been assigned a new connection by the kernel. For performance I spawn the threads ahead of time. This architecture was a mistake. The issue is that I have to spawn a separate pool of threads to listen on port 25 (SMTP), port 110 (POP), port 465 (SMTP over SSL) and port 995 (POP over SSL). With this model I could end up with extra threads listening on port 25 when I need more threads listening and processing connections on port 465. This problem leads me to overcompensate by spawning _extra_ threads just in case. Of course this strategy wastes resources, as now extra threads eat memory without benefit.
To address the SEGFAULT issue, i.e. one rogue thread taking the whole system down, I also fork multiple processes. In my case I fork 12 processes with 128 threads each. If one process gets killed by a SEGFAULT, the remaining processes continue to work. When I first launched the system, and it faced a torrent of email... 100K+ messages a day, I would have about one process die every 24 hours. With careful debugging work, I've gotten the code stable enough now that I haven't lost a process in about 9 months.
My theory when I first wrote this code was to leave scheduling to the kernel. I figured that if a thread was blocked waiting for IO data the kernel wouldn't schedule time slices for it. This meant those extra threads sat in the background waiting, but not using CPU time. I am starting to wonder whether this is a good theory. I am considering switching to a different model (more on that in a second), but am not sure which one is best. By the way, the reason each process has so many threads is for DB connection pooling. Each process gets 8 DB connections which are shared between the 128 threads. Each process also has its own copy of the antivirus database. I know it's possible, but trying to share DB connections and data between processes is much more difficult.
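For anyone who hasn't seen connection pooling inside a single process, here is a rough sketch of the idea in Python (the real daemon is C, so this is not the actual code). The ConnectionPool class and run_workers function are made-up names, and plain strings stand in for real database connections. A thread that wants a connection blocks until one of the fixed set is free, so 128 threads never hold more than 8 connections between them.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hand out a fixed set of connections to any number of threads."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)

    @contextmanager
    def borrow(self):
        conn = self._idle.get()      # blocks while every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)     # always hand it back, even after an error


def run_workers(pool, worker_count):
    """Start worker_count threads that each borrow a connection once.

    Returns the largest number of connections that were in use at one time.
    """
    in_use = 0
    peak = 0
    lock = threading.Lock()

    def worker():
        nonlocal in_use, peak
        with pool.borrow():
            with lock:
                in_use += 1
                peak = max(peak, in_use)
            # a real worker would run its query here
            with lock:
                in_use -= 1

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Calling run_workers(ConnectionPool(["db-%d" % i for i in range(8)]), 128) mirrors the 8 connections and 128 threads described above. The returned peak can never exceed 8, because a ninth thread simply waits in borrow() until somebody returns a connection.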
I plan to refactor this code soon, and have been struggling with what to do. I am curious to hear the thoughts of others.
The current plan is to move to a model where I spawn a single thread for each port. When these listening threads have a new connection, they dump the socket handle and the protocol into a buffer. I would then also spawn a pool of worker threads which read the incoming connections out of the buffer. Using semaphores and reflection these worker threads would pick up incoming connections and feed them to the right function depending on the protocol. I think this model would work much better than what I have now, but is this the best option?
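A rough sketch of that listener/worker split, written in Python rather than C to keep it short. Everything here is an assumption for illustration: the handler functions just send a made-up greeting, a thread-safe queue stands in for the buffer and its semaphores, and a lookup table keyed by protocol name stands in for the "reflection" step.

```python
import queue
import threading


# Hypothetical handlers; a real daemon would run the full protocol here.
def handle_smtp(conn):
    conn.sendall(b"220 smtp ready\r\n")
    conn.close()


def handle_pop(conn):
    conn.sendall(b"+OK pop ready\r\n")
    conn.close()


HANDLERS = {"smtp": handle_smtp, "pop": handle_pop}


def listener(server_sock, protocol, pending):
    """One thread per port: accept connections and queue them with their protocol."""
    while True:
        conn, _addr = server_sock.accept()
        pending.put((conn, protocol))


def worker(pending, handlers):
    """Any worker can serve any port; None is the shutdown signal."""
    while True:
        item = pending.get()
        if item is None:
            break
        conn, protocol = item
        handlers[protocol](conn)


def start_workers(pending, handlers, count):
    """Spawn the shared worker pool and return the thread objects."""
    threads = [threading.Thread(target=worker, args=(pending, handlers))
               for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

The point of the design is that the workers are no longer tied to a port. A burst on port 465 can use every idle worker, instead of only the threads that happened to be parked in accept() on that port.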
The other option is to create a system where I spawn only 8 worker threads (or some similar number). This pool of 8 threads then uses epoll() to find out which sockets need attention and address them accordingly. The problem with this model is that if an incomplete message is received, the thread couldn't process all the way into the output stage. Instead the data would need to be stored until the message sending was complete. Let me give an example: the thread might get "RCPT TO: " the first time it checked a socket. The thread stores this incomplete message. Then the second time around another thread picks up "example@example.com". The thread assembles the message into "RCPT TO: example@example.com" and then processes the entire command accordingly.
Does this model work better? Keep in mind that when DB calls need to be made, the MySQL library won't work the same way. A slow database server could hang all 8 worker threads, effectively killing the model. There are also SPF, SPAM and Virus libraries. Any one of them could tie up a thread for an extended period, thereby killing this model. What does everyone else think? Am I not thinking about an event processing model correctly? Or is it that this type of daemon is better served using the one thread per connection model?
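The partial-message bookkeeping in the epoll() option is less scary than it sounds. Here is a minimal sketch in Python of the per-socket state, assuming commands end in CRLF. The CommandBuffer class, the buffers table and the on_readable function are hypothetical names of mine. The event loop itself is left out; on_readable is what it would call with whatever bytes a read returned.

```python
class CommandBuffer:
    """Accumulates bytes for one socket until a full CRLF-terminated command arrives."""

    def __init__(self):
        self._pending = b""

    def feed(self, data):
        """Add newly read bytes; return the commands completed by this read."""
        self._pending += data
        commands = []
        while b"\r\n" in self._pending:
            line, self._pending = self._pending.split(b"\r\n", 1)
            commands.append(line.decode("ascii", errors="replace"))
        return commands


# One buffer per open socket, keyed by file descriptor (hypothetical layout).
buffers = {}


def on_readable(fd, data):
    """Called by the event loop with the bytes just read from fd."""
    return buffers.setdefault(fd, CommandBuffer()).feed(data)
```

With this, on_readable(7, b"RCPT TO: ") returns an empty list, and a later on_readable(7, b"example@example.com\r\n") returns ["RCPT TO: example@example.com"]. Because the buffer belongs to the socket and not to the thread, it doesn't matter which worker picks up each fragment, as long as two workers never service the same socket at the same moment. It does nothing for the blocking library calls mentioned above, which is the harder half of the problem.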
I've used Kasamba.com to hire programmers for small projects I didn't have time to complete myself. There is a Kasamba competitor, but the name doesn't come to mind.
Just a quick note, I run an email service, and I've had the most problems getting past blocks from:
AOL, Excite, Comcast
The easiest was AOL, they have a number you can call 24 hours a day to get removed (but it takes 48 hours for the removal to take effect). The other two have been blocking mail from my servers for two weeks. I have filled out contact forms, and left voicemails to no avail.
I haven't received a complaint about Verizon yet, but that could be because I have SPF records.
I started a free email service to compete with Gmail about 2 months after Gmail launched. For those interested, the name is Nerdshack.com.
At first I used Postfix and Cyrus, but I found it to be a nightmare when you're talking about more than 50k accounts.
What I wanted was an email platform that integrated with ClamAV, DSPAM, supported SPF, Greylisting/Blacklisting/Whitelisting, and was all controlled from a MySQL database. I also wanted it to support SSL, and clustering.
Frankly I didn't find anything. So I wrote my own. This may not be your cup of tea, so if not I recommend looking at DB Mail (www.dbmail.org) and Cyrus (asg.web.cmu.edu/cyrus). Both are competent mail servers, and can be built to support a large user base. The problem I had was expanding their feature sets.
As has been mentioned numerous times above, getting stock open source software to support a large user base is a huge pain. Combine that with trying to add in things like DSPAM, SPF and ClamAV, and you're going to be faced with a nightmare. The system you end up with will be a kluge of hacks, custom scripts, and chewing gum. To me that seemed too much like a house of cards. On top of which, most open source systems do not handle large quota accounts very well. Run benchmarks against your favorite mail server using a 10 to 20 gig mail store for a single user. You will quickly find that even maildir struggles with that many files. (Hint, make sure you use ReiserFS at least.)
So I went the route of Gmail, Yahoo, and Hotmail. I wrote my own, and after a few early bumps in the road, it's pretty solid. I had 100% uptime for over 300 days, until last month when I moved datacenters.
Basically how I set things up is I have an Alteon AD4 load balancer that balances traffic amongst my application cluster. This app cluster runs my custom code which speaks SMTP and POP (IMAP is about 75% done), and interfaces with DSPAM, ClamAV, libspf, OpenSSL, LZW (for compression), and supports a host of other features. I even support using public key encryption (ECC) to store messages on disk with your public key, and then encrypt your private key with AES256 using your password as the key. It's seamless to the end user, but guarantees privacy while the mail is on my servers. I even created a point system to allow me to automatically block IP addresses that attempt dictionary attacks, etc (though it's disabled at the moment). Each server caches everything it can to reduce database load, and uses a connection pool for retrieving messages and running queries. I wrote my server in C, so it's very, very fast.
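To give a feel for what a point system like that might look like, here is a small sketch in Python. To be clear, this is not the code from my server. The event names, point values and threshold below are all invented for the example, and a real version would also need scores to expire over time.

```python
from collections import defaultdict

# Hypothetical scoring: none of these values come from the real system.
POINTS = {"unknown_recipient": 1, "failed_login": 5}
BLOCK_THRESHOLD = 20


class AbuseTracker:
    """Keep a running score per IP address and block once it crosses a threshold."""

    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        self.scores = defaultdict(int)

    def record(self, ip, event):
        """Add the points for an event and report whether the IP is now blocked."""
        self.scores[ip] += POINTS[event]
        return self.is_blocked(ip)

    def is_blocked(self, ip):
        return self.scores.get(ip, 0) >= self.threshold
```

With these made-up numbers, an address gets blocked on its fourth failed login, while a legitimate sender who mistypes one recipient barely registers.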
These app servers store user information, preferences, etc, in a MySQL database. The actual messages are stored on message storage servers using a custom algorithm and protocol for speed. Every message is stored on two servers, with both locations stored in the database. Needless to say my system rocks. Each Dell 1650 server can support close to 1000 simultaneous connections while using less than 10% of the CPU.
What I'm working on now is the IMAP server, integration with Memcached, and moving configuration settings into an XML file. Right now config settings are DEFINE parameters, which means changing anything requires a recompile. I've also found that my database is the bottleneck, so I want to offload as much as I can to Memcached. Checking whether an email address exists, or using my custom point system with the database is too inefficient, so I hope Memcached will help.
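The pattern I have in mind for the address check is the usual look-in-the-cache-first approach. Here is a sketch in Python, with a plain dict standing in for the Memcached client and a callable standing in for the database query. The AddressLookup class and the key format are assumptions for the example, not anything that exists in my code yet.

```python
class AddressLookup:
    """Check whether a mailbox exists, consulting a cache before the database."""

    def __init__(self, cache, query_database):
        self.cache = cache                    # stand-in for a Memcached client
        self.query_database = query_database  # callable: address -> bool
        self.database_hits = 0

    def exists(self, address):
        normalized = address.lower()
        key = "exists:" + normalized
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        self.database_hits += 1
        found = self.query_database(normalized)
        self.cache[key] = found               # negative answers are cached too
        return found
```

Caching the negative answers matters for a mail server, since a dictionary attack is mostly lookups for addresses that don't exist. The catch is that a cached "no" has to be expired or invalidated when a new account is created, which this sketch doesn't handle.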
I've thought about releasing the source under the GPL, but I don't think it's quite ready. I want to at least get config settings into an XML file first. I'd also like to find a company to sponsor my development, but that hasn't happened yet. (I still have a day job.)
Executive summary. It's always more fun to write your own, and then post to /. about it.
Before you get carried away though, I'll say that I have a Dell 2650 which I use for development and testing, and it runs FC3. The original FC3 SMP kernel ran fine, but all of the upgrades have caused kernel panics on boot. My guess would be that the newer kernels don't handle the ReiserFS partition properly, and that causes the crash, but I haven't had time to dig. The result is that I'm stuck on 2.6.9. I plan to try upgrading to FC4 soon.
"Which is not bloody likely. People would use the service because they don't want to buy DVDs."
I'm sorry, but I rent DVDs all the time and occasionally I find one that I like enough to buy. The movie "Brazil" was probably the most recent example.
As for 'early adopters', I'm sorry but 85%* of Americans have never heard of Netflix. Whereas 85%* of Americans have heard of Blockbuster, and probably 50%* have heard of Amazon. That's what I meant by early adopters.
Now if you sample the /. crowd, I'm sure those numbers are more like 95%/95%/95%*.
*Yes I pulled these numbers out of my posterior. The fact is these numbers are close enough to the truth that they illustrate my point.
They have two advantages over Netflix. The first is that Amazon has a more mainstream audience. Netflix clearly dominates the market amongst early adopters, but that leaves things open for Blockbuster and Amazon, both of which have a more mainstream customer base to draw from.
The second is that Amazon can run this program at a net loss, or at the breakeven point, while it builds the economies of scale needed for profit. They can do this because they have money in the bank from their other lines of business, and because they can view what money they lose on this operation as an 'advertising' expense. This service will no doubt drive clicks to amazon.com, which will result in more sales from their other product offerings. Not to mention how likely it is that someone will purchase a DVD they rented using the Amazon service.
They already have a massive distribution network, with warehouses all over the country, equipped with the latest in inventory management hardware/software. They have agreements in place with all of the major shipping companies, and DVD distributors.
All they lack is a website. I'm not quite sure if they can figure out how to create a website though. Might be why they are seeking to hire some engineers.
In my experience, the beauty of Linux is how easy it is to deploy a custom application atop it. I say this because all of the major components are open source, so it is much easier to interface directly with the operating system. In addition, because there is so much open source code available for the system, it is easy to find examples.
In the Windows world, everything is a black box. Your primary reference material comes from Barnes and Noble, and trying to find a piece of OSS that will run atop Windows and accomplishes what you want is next to impossible. Whereas this is a disadvantage for custom software, it's an advantage for off-the-shelf software. So if I'm Joe Bob Hilfiger, I know that the online shopping cart software I just downloaded will involve double clicking an icon, and going through an install shield. I then KNOW the thing will work with my installation of IIS/ASP.NET. You just can't say things are this easy with tarballs, and having to compile the application yourself.
I believe this is why large enterprises are adopting Linux, and smaller enterprises are shying away. If I am Amazon, I can afford the programmers needed to create a custom solution for my website. If I'm a small guy, I need something that is off-the-shelf.
On a final note, I will say the knowledge level needed to write software on Linux is much higher. Windows has the advantage of VS.NET and the .NET platform. They call these Rapid Application Development (RAD) tools for a reason. They are easy to use. Take a Windows developer, and sit him down at a Linux box with a copy of vi, gcc, and man pages, and see how long it takes him to write a simple application (let alone a GUI application). True, the gap is closing with things like mono and sharpdevelop, but the gap is still there, and it will take time to close it.
I've used Kasamba to hire inexpensive freelance programmers for big and small projects. Like eBay, you need to be aware of who you are hiring, and what you are receiving back, but with proper guidance, you can get a good value on the site.
I work in a world where I am responsible for about 100 servers, most of which run Windows 2000/2003, but a handful of which run CentOS 4 (RHEL4).
I have to say that either operating system is secure in the hands of a knowledgeable administrator. The key difference is simply that Linux can be made more secure by someone with ample experience, whereas Windows can be made moderately secure much more easily.
Let me explain. In the Linux world, because everything is open source, a very knowledgeable person can strip away `features` from the operating system, leaving fewer areas which could possibly contain security holes. It doesn't matter whether the NFS server has a security hole if the NFS server isn't running, or even installed. To be more specific, a very knowledgeable person could even recompile their kernel, etc, such that the only things that will run on the box are those which are intended. A box configured for single use is easy to secure because then there are only a handful of areas which can be exploited. Because of this limited number, there are then only a handful of lists/newsgroups that need to be monitored for security updates.
Windows, on the other hand, possesses the advantage that Microsoft stands behind their product, and says apply these patches, and you're secure. Therefore, making a `relatively` secure machine is very easy. Just run auto-update regularly, and you're secure. On the other hand, taking security to the next level, the level described above, is almost impossible. You can't eliminate features from the Windows kernel by recompiling. Nor is it easy to pick and choose which DLLs get installed with the operating system. The result is a bigger window of opportunity for an exploit to be discovered which can then be used on your system. Now it is still possible to disable services, etc, but that is a more difficult task in Windows because of the interconnectivity. In the Linux world, because most components are developed by different people, they have few dependencies. This isn't true in the Windows world, and that makes it more difficult to lock down.
My point is that if there are three security levels (secure, very secure, and airtight), it is easier to get to the first level with Windows, but easier to get past the first level, to the second and third levels, with Linux. Granted, large corporations can afford to modify Windows to get the other levels of security, but it's more difficult because Windows is such a closed environment.
I've rambled enough. A good article on locking down a Linux box can be found here:
Three wireless satellite links. Three underground fiber links.
Methinks someone will be downloading quite the porn collection in Athens.
Of course, all that bandwidth, all that redundancy won't give you any kind of reliability if the person (people) responsible for it spill their coffee on the keyboard, or fat finger a routing table.
You can plan for hardware failures by building in redundancy. However, no amount of redundancy will prevent human error.
http://www.kinesis-ergo.com/sh...
http://lauren.lavabit.com/export/graph_162.html
This "enterprise" post reminds me of several WTF's. But I link to one in particular:
http://worsethanfailure.com/Articles/Bitten_by_the_Enterprise_Bug.aspx
In general, enterprise has come to mean overpriced and underperforming. By making something "enterprise" your saying you designed it such that you can throw money at the problem. By breaking things into multiple "tiers" your saying that if any one tier gets overloaded, you can fix the problem by throwing money/hardware at that teir.
From my perspective, the best way to solve the performance/reliability problems are with sound design, good programming (algorithims), and careful tool selection. That means architecting your app so that there is no single point of failure. If one node goes down, can the other nodes recover and continue functioning? In the same vein, can you add nodes to the cluster and scale increased loads across more machines without encountering bottlenecks? Its been documented elsewhere, but you want a(n) algorithims, not a(log n) algorithims. All to often the answer to scaling "enterprise" software is buy a bigger box. That can get expensive very fast. The better, albeit more difficult solution is to write the app so that multiple machines can work in concert. And finally, making sure that the tools you use will be able to scale. In general this relates to what database system, and libraries you use (and not so much the language).
I'll address the language issue too. A lot of people have mentioned Erlang. While I think its a great language for server applications, there just isn't the community support to make it a pracitical choice. (Exception, if you don't need libraries or you plan to write _everything_ yourself, as is often the case for embedded systems, then maybe Erlang is a good choice. Hence, why you find Erlang is routers.) Erlang also is problematic because of the small number of people skilled in its use. For me it really comes down to choosing C on Linux, or C# on Windows. (I've written scalable apps, supporting several hundred thousand users using both.) Its a simple fact that it takes less time to write production code in C# (or Java) than in C (or C++). So what you need to ask yourself is whether the efficiency savings of the former outweigh the added development cost of the latter.
One of the apps I wrote is the SMTP/POP/IMAP server used to support my free e-mail service (http://lavabit.com/). For that project, hardware was comparatively expensive (I paid for everything myself), and my time was relatively cheap. (I started by only working on the code in between consulting gigs.) So it made sense to write the app in C, and use Linux. Over time, the efficiency savings have made the decision, while painful at times, the correct one. I'm able to support 70K users very cheaply. If I had chosen C#/Windows, I might have gotten the project done faster, but I'd need more expensive hardware. (I'm using Dell 1650's at the application tier, with beefier machines at the database/storage tier. Note, I have a two tier architecture.) I would also have had to shell out lots of dollars for Windows licenses. It just didn't make sense. For more on my mail server, read this other post http://slashdot.org/comments.pl?sid=191034&cid=15
Another large project I worked on was a social networking site sponsored by a large carbonated beverage firm. In this case the pockets were deeper but the timeline was shorter. So it made sense to write the app in C#. It reminds me of the saying that in software development you have three factors: cost, quality and time. You get to pick two of those, but not all three.
I'll close by saying again, the best way to solve performance/reliability problems is with sound design, good programming (algorithms), and careful tool selection.
If you want to break into the gaming industry, sign up for the Guildhall at SMU. If memory serves, it's an _intense_ 18-month program. I believe somewhere around 96% of the graduates end up with jobs in the gaming industry.
I should probably mention that I've chosen to use CentOS 4.3, running on dual processor Dell 1650 servers. All of my code is written in C, and compiled with GCC.
This is important because it means I use the 2.6 version of the Linux kernel, and the CentOS/RHEL version of the kernel uses the Native POSIX Thread Library (NPTL) by default.
What should I do?
This debate hits home with me. I wrote a server daemon to handle the SMTP and POP protocols, and when I first started out I had to make a choice. The choice I made back then was to use a threaded model. The way it works is I spawn X threads which collectively use blocking calls to the accept() function. Each thread will only return from accept() once it has been assigned a new connection by the kernel. For performance I spawn the threads ahead of time. This architecture was a mistake. The issue is that I have to spawn a separate pool of threads to listen on port 25 (SMTP), port 110 (POP), port 465 (SMTP over SSL) and port 995 (POP over SSL). With this model I could end up with extra threads listening on port 25 when I need more threads listening and processing connections on port 465. This problem leads me to overcompensate by spawning _extra_ threads just in case. Of course this strategy wastes resources, as the extra threads eat memory without benefit.
To address the SEGFAULT issue, i.e. one rogue thread taking the whole system down, I also fork multiple processes. In my case I fork 12 processes with 128 threads each. If one process gets killed by a SEGFAULT, the remaining processes continue to work. When I first launched the system, and it faced a torrent of email... 100K+ messages a day, I would have about one process die every 24 hours. With careful debugging work, I've gotten the code stable enough now that I haven't lost a process in about 9 months.
My theory when I first wrote this code was to leave scheduling to the kernel. I figured that if a thread was blocked waiting for IO data the kernel wouldn't schedule time slices for it. This meant those extra threads sat in the background waiting, but not using CPU time. I am starting to wonder whether this is a good theory. I am considering switching to a different model (more on that in a second), but am not sure which one is best. By the way, the reason each process has so many threads is for DB connection pooling. Each process gets 8 DB connections which are shared between the 128 threads. Each process also has its own copy of the antivirus database. I know it's possible, but trying to share DB connections and data between processes is much more difficult.
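As an illustration of the pooling arrangement described above (8 connections shared between 128 threads), here is a minimal Python sketch. It is not the daemon's actual C code: the `ConnectionPool` class, the `run_demo` helper, and the dictionary stand-ins for database connections are all hypothetical names invented for this example.

```python
import queue
import threading


class ConnectionPool:
    """Fixed-size pool: a thread borrows a connection and blocks if none is free."""

    def __init__(self, connect, size=8):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(connect())

    def acquire(self, timeout=None):
        # Blocks until some other thread releases a connection.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)


def run_demo(workers=128, size=8):
    """Share `size` stand-in connections among `workers` threads.

    Returns (requests served, peak number of connections in use at once).
    """
    ids = iter(range(size))
    pool = ConnectionPool(lambda: {"id": next(ids)}, size=size)
    lock = threading.Lock()
    stats = {"in_use": 0, "peak": 0, "served": 0}

    def worker():
        conn = pool.acquire()
        try:
            with lock:
                stats["in_use"] += 1
                stats["peak"] = max(stats["peak"], stats["in_use"])
            # A real worker would run its query on `conn` here.
            with lock:
                stats["in_use"] -= 1
                stats["served"] += 1
        finally:
            pool.release(conn)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["served"], stats["peak"]
```

Because the pool lives in one process's memory, every thread in that process can reach it through an ordinary queue. Sharing it across processes would need some form of inter-process handoff, which is the extra difficulty mentioned above.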
I plan to refactor this code soon, and have been struggling with what to do. I am curious to hear the thoughts of others.
The current plan is to move to a model where I spawn a single thread for each port. When these listening threads have a new connection, they dump the socket handle and the protocol into a buffer. I would then also spawn a pool of worker threads which read the incoming connections out of the buffer. Using semaphores and reflection these worker threads would pick up incoming connections and feed them to the right function depending on the protocol. I think this model would work much better than what I have now, but is this the best option?
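The listener-plus-worker-pool plan can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the planned C implementation: a thread-safe queue stands in for the buffer and its semaphores, a dictionary lookup stands in for the protocol dispatch, and the connections are plain strings rather than socket handles. The names `HANDLERS`, `worker`, and `serve` are hypothetical.

```python
import queue
import threading


def handle_smtp(conn):
    return "smtp handled " + conn


def handle_pop(conn):
    return "pop handled " + conn


# Protocol tag -> handler function. Each listener tags the connections it accepts.
HANDLERS = {"smtp": handle_smtp, "pop": handle_pop}
STOP = object()  # sentinel telling a worker to exit


def worker(incoming, results, lock):
    """Pull (connection, protocol) pairs off the shared buffer and dispatch them."""
    while True:
        item = incoming.get()
        if item is STOP:
            return
        conn, protocol = item
        outcome = HANDLERS[protocol](conn)
        with lock:
            results.append(outcome)


def serve(connections, workers=4):
    """Run a worker pool over the given (connection, protocol) pairs."""
    incoming = queue.Queue()
    results, lock = [], threading.Lock()
    pool = [
        threading.Thread(target=worker, args=(incoming, results, lock))
        for _ in range(workers)
    ]
    for t in pool:
        t.start()
    for item in connections:
        # A per-port listener thread would do this after each accept().
        incoming.put(item)
    for _ in pool:
        incoming.put(STOP)
    for t in pool:
        t.join()
    return sorted(results)
```

The point of the design is that any worker can serve any port, so no thread sits idle on port 25 while port 465 is starved.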
The other option is to create a system where I spawn only 8 worker threads (or some similar number). This pool of 8 threads then uses epoll() to find out which sockets need attention and address them accordingly. The problem with this model is that if an incomplete message is received, the thread couldn't process all the way into the output stage. Instead the data would need to be stored until the message sending was complete. Let me give an example: a thread might get "RCPT TO: " the first time it checks a socket. The thread stores this incomplete message. Then the second time around another thread picks up "example@example.com". That thread assembles the message into "RCPT TO: example@example.com" and then processes the entire command accordingly.
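The partial-message bookkeeping in that example might look something like the sketch below. It is written in Python for brevity and assumes commands are terminated by CRLF; the `LineAssembler` class and its `feed` method are hypothetical names, and a multi-threaded version would also need a lock around the pending-data table.

```python
class LineAssembler:
    """Keeps unfinished input per connection and returns only complete lines."""

    def __init__(self):
        self._pending = {}  # connection id -> bytes received so far

    def feed(self, conn_id, data):
        """Add newly read bytes; return any commands that are now complete."""
        buf = self._pending.get(conn_id, b"") + data
        # Everything before the last CRLF is complete; the tail is not.
        *lines, rest = buf.split(b"\r\n")
        if rest:
            self._pending[conn_id] = rest
        else:
            self._pending.pop(conn_id, None)
        return [line.decode("utf-8", errors="replace") for line in lines]
```

Usage, following the example above:

```python
assembler = LineAssembler()
assembler.feed(7, b"RCPT TO: ")                # nothing complete yet
assembler.feed(7, b"example@example.com\r\n")  # one complete command
```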
Does this model work better? Keep in mind that when DB calls need to be made, the MySQL library won't work the same way. A slow database server could hang all 8 worker threads, effectively killing the model. There are also SPF, SPAM and Virus libraries. Any one of them could tie up a thread for an extended period, thereby killing this model. What does everyone else think? Am I not thinking about an event processing model correctly? Or is it that this type of daemon is better served by the one thread per connection model?
I've used Kasamba.com to hire programmers for small projects I didn't have time to complete myself. There is a Kasamba competitor, but the name doesn't come to mind.
Just a quick note, I run an email service, and I've had the most problems getting past blocks from:
AOL
Excite
Comcast
The easiest was AOL, they have a number you can call 24 hours a day to get removed (but it takes 48 hours for the removal to take effect). The other two have been blocking mail from my servers for two weeks. I have filled out contact forms, and left voicemails to no avail.
I haven't received a complaint about Verizon yet, but that could be because I have SPF records.
I started a free email service to compete with Gmail about 2 months after Gmail launched. For those interested, the name is Nerdshack.com.
At first I used Postfix and Cyrus, but I found it to be a nightmare when you're talking about more than 50k accounts.
What I wanted was an email platform that integrated with ClamAV, DSPAM, supported SPF, Greylisting/Blacklisting/Whitelisting, and was all controlled from a MySQL database. I also wanted it to support SSL, and clustering.
Frankly I didn't find anything. So I wrote my own. This may not be your cup of tea, so if not I recommend looking at DB Mail (www.dbmail.org) and Cyrus (asg.web.cmu.edu/cyrus). Both are competent mail servers, and can be built to support a large user base. The problem I had was expanding their feature sets.
As has been mentioned numerous times above, getting stock open source software to support a large user base is a huge pain. Combine that with trying to add in things like DSPAM, SPF and ClamAV, and you're going to be faced with a nightmare. The system you end up with will be a kluge of hacks, custom scripts, and chewing gum. To me that seemed too much like a house of cards. On top of which, most open source systems do not handle large quota accounts very well. Run benchmarks against your favorite mail server using a 10 to 20 gig mail store for a single user. You will quickly find that even maildir struggles with that many files. (Hint, make sure you use ReiserFS at least.)
So I went the route of Gmail, Yahoo, and Hotmail. I wrote my own, and after a few early bumps in the road, it's pretty solid. I had 100% uptime for over 300 days, until last month when I moved datacenters.
Basically how I set things up is I have an Alteon AD4 load balancer that balances traffic amongst my application cluster. This app cluster runs my custom code which speaks SMTP and POP (IMAP is about 75% done), and interfaces with DSPAM, ClamAV, libspf, OpenSSL, LZW (for compression), and supports a host of other features. I even support using public key encryption (ECC) to store messages on disk with your public key, and then encrypt your private key with AES256 using your password as the key. It's seamless to the end user, but guarantees privacy while the mail is on my servers. I even created a point system to allow me to automatically block IP addresses that attempt dictionary attacks, etc (though it's disabled at the moment). Each server caches everything it can to reduce database load, and uses a connection pool for retrieving messages and running queries. I wrote my server in C, so it's very, very fast.
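The post doesn't say how the point system works, so the following Python sketch is only one plausible shape for it. The `PointTracker` class, the per-offense point values, and the threshold are all assumptions made for illustration; a production version would presumably also expire points over time.

```python
from collections import defaultdict


class PointTracker:
    """Accumulates penalty points per IP address and blocks at a threshold."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self._points = defaultdict(int)

    def penalize(self, ip, points=1):
        """Record an offense (e.g. a failed login); return True if now blocked."""
        self._points[ip] += points
        return self.is_blocked(ip)

    def is_blocked(self, ip):
        return self._points.get(ip, 0) >= self.threshold
```

A caller would check `is_blocked()` when a connection arrives and call `penalize()` on each failed authentication or unknown-recipient probe.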
These app servers store user information, preferences, etc, in a MySQL database. The actual messages are stored on message storage servers using a custom algorithm and protocol for speed. Every message is stored on two servers, with both locations stored in the database. Needless to say my system rocks. Each Dell 1650 server can support close to 1000 simultaneous connections while using less than 10% of the CPU.
What I'm working on now is the IMAP server, integration with Memcached, and moving configuration settings into an XML file. Right now config settings are DEFINE parameters, which means changing anything requires a recompile. I've also found that my database is the bottleneck, so I want to offload as much as I can to Memcached. Checking whether an email address exists, or using my custom point system with the database is too inefficient, so I hope Memcached will help.
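The idea of offloading the "does this address exist" check can be sketched as a look-aside cache. The Python below uses a plain dictionary as a stand-in for Memcached and a caller-supplied function as a stand-in for the MySQL query, so none of the names here (`CachedLookup`, `address_exists`, `db_hits`) come from a real client library.

```python
class CachedLookup:
    """Look-aside cache: ask the cache first, fall back to the database on a miss."""

    def __init__(self, query_db, cache=None):
        self._query_db = query_db                       # stand-in for the MySQL query
        self._cache = {} if cache is None else cache    # stand-in for Memcached
        self.db_hits = 0                                # how often the database was asked

    def address_exists(self, address):
        key = "exists:" + address
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        found = self._query_db(address)
        self._cache[key] = found
        return found
```

One design caveat: this sketch caches negative answers too, so a newly created account would need its cache entry invalidated (or entries given a short expiry) before mail to it is accepted.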
I've thought about releasing the source under the GPL, but I don't think it's quite ready. I want to at least get config settings into an XML file first. I'd also like to find a company to sponsor my development, but that hasn't happened yet. (I still have a day job.)
Executive summary. It's always more fun to write your own, and then post to /. about it.
There are new pictures available. I will do my best to keep this mirror up to date...
http://www.nerdshack.com/katrina/
Several hundred megabytes of pictures from sigmund.biz, taken in the disaster zone by Mike and his team at DirectNIC have been mirrored to:
/. away. Sits atop four GigE, and a load balanced www cluster. If anyone else needs a mirror of Katrina content, let us know.
http://www.nerdshack.com/katrina/
Several hundred megabytes of pictures from the disaster zone have been mirrored to:
/. away. Sits atop four GigE, and a load balanced www cluster.
http://www.nerdshack.com/katrina/
I can't speak much on vanilla kernels, but I do use CentOS 4 with the recompiled RHEL 2.6 version of the kernel, and it has had no problems.
[user@megan ~]# uname -a
Linux megan.nerdshack.com 2.6.9-5.0.3.ELsmp #1 SMP Sat Feb 19 15:45:14 CST 2005 x86_64 x86_64 x86_64 GNU/Linux
[user@megan ~]# uptime
15:31:30 up 156 days, 13:48, 1 user, load average: 0.30, 0.21, 0.23
My WBEL 3 box still uses the 2.4 kernel.
[root@cp root]# uname -a
Linux cp.razorsites.com 2.4.21-20.ELsmp #1 SMP Thu Sep 16 14:07:31 EDT 2004 i686 i686 i386 GNU/Linux
[root@cp root]# uptime
16:21:44 up 294 days, 5:15, 1 user, load average: 0.16, 0.28, 0.46
Before you get carried away though, I'll say that I have a Dell 2650 which I use for development and testing, and it runs FC3. The original FC3 SMP kernel ran fine, but all of the upgrades have caused kernel panics on boot. My guess would be that the newer kernels don't handle the ReiserFS partition properly, and that causes the crash, but I haven't had time to dig. The result is that I'm stuck on 2.6.9. I plan to try upgrading to FC4 soon.
"Which is not bloody likely. People would use the service because they don't want to buy DVDs."
I'm sorry, but I rent DVDs all the time and occasionally I find one that I like enough to buy. The movie "Brazil" was probably the most recent example.
As for 'early adopters', I'm sorry but 85%* of Americans have never heard of Netflix. Whereas 85%* of Americans have heard of Blockbuster, and probably 50%* have heard of Amazon. That's what I meant by early adopters.
Now if you sample the
*Yes I pulled these numbers out of my posterior. The fact is these numbers are close enough to the truth that they illustrate my point.
I don't know about you, but personally I liked Doom, Doom 2 and Doom 3.
Now as for the Madden series of games...
They have two advantages over Netflix. The first is that Amazon has a more mainstream audience. Netflix clearly dominates the market amongst early-adopters, but that leaves things open for Blockbuster and Amazon both of which have a more mainstream customer base to draw from.
The second is that Amazon can run this program at a net loss, or at the break-even point, while it builds the economies of scale needed for profit. They can do this because they have money in the bank from their other lines of business, and because they can view what money they lose on this operation as an 'advertising' expense. This service will no doubt drive clicks to amazon.com, which will result in more sales from their other product offerings. Not to mention how likely it is that someone will purchase a DVD they rented using the Amazon service.
Simple answer, yes they can.
They already have a massive distribution network, with warehouses all over the country, equipped with the latest in inventory management hardware/software. They have agreements in place with all of the major shipping companies, and DVD distributors.
All they lack is a website. I'm not quite sure if they can figure out how to create a website though. Might be why they are seeking to hire some engineers.
In my experience, the beauty of Linux is how easy it is to deploy a custom application atop it. Because all of the major components are open source, it is much easier to interface directly with the operating system. In addition, because there is so much open source code available for the system, it is easy to find examples.
In the Windows world, everything is a black box. Your primary reference material comes from Barnes and Noble, and trying to find a piece of OSS that will run atop Windows, and accomplishes what you want, is next to impossible. Whereas this is a disadvantage for custom software, it's an advantage for off-the-shelf software. So if I'm Joe Bob Hilfiger, I know that the online shopping cart software I just downloaded will involve double clicking an icon, and going through an InstallShield wizard. I then KNOW the thing will work with my installation of IIS/ASP.NET. You just can't say things are this easy with tarballs, and having to compile the application yourself.
I believe this is why large enterprises are adopting Linux, and smaller enterprises are shying away. If I am Amazon, I can afford the programmers needed to create a custom solution for my website. If I'm a small guy, I need something that is off-the-shelf.
On a final note, I will say the knowledge level needed to write software on Linux is much higher. Windows has the advantage of VS.NET and the .NET platform. They call these Rapid Application Development (RAD) tools for a reason. They are easy to use. Take a Windows developer, and sit him down at a Linux box with a copy of vi, gcc, and man pages, and see how long it takes him to write a simple application (let alone a GUI application). True, the gap is closing with things like Mono and SharpDevelop, but the gap is still there, and it will take time to close it.
I've used Kasamba to hire inexpensive freelance programmers for big and small projects. Like eBay, you need to be aware of who you are hiring, and what you are receiving back, but with proper guidance, you can get a good value on the site.
www.kasamba.com
I work in a world where I am responsible for about 100 servers, most of which run Windows 2000/2003, but a handful of which run CentOS 4 (RHEL4).
I have to say that either operating system is secure in the hands of a knowledgeable administrator. The key difference is simply that Linux can be made more secure by someone with ample experience, whereas Windows can be made moderately secure much more easily.
Let me explain. In the Linux world, because everything is open source, a very knowledgeable person can strip away `features` from the operating system, leaving fewer areas which could possibly contain security holes. It doesn't matter whether the NFS server has a security hole, if the NFS server isn't running, or even installed. To be more specific, a very knowledgeable person could even recompile their kernel, etc, such that the only things that will run on the box are those which are intended. A box configured for a single use is easy to secure because then there are only a handful of areas which can be exploited. Because of this limited number, there are then only a handful of lists/newsgroups that need to be monitored for security updates.
Windows on the other hand possesses the advantage that Microsoft stands behind their product, and says apply these patches, and you're secure. Therefore, to make a `relatively` secure machine is very easy. Just run auto-update regularly, and you're secure. On the other hand, taking security to the next level, the level described above, is almost impossible. You can't eliminate features from the Windows kernel by recompiling. Nor is it easy to pick and choose which DLLs get installed with the operating system. The result is a bigger window of opportunity for an exploit to be discovered which can then be used on your system. Now it is still possible to disable services, etc, but that is a more difficult task in Windows because of the interconnectivity. In the Linux world, because most components are developed by different people, they have few dependencies. This isn't true in the Windows world, and that makes it more difficult to lock down.
My point is that if there are three security levels, secure, very secure, and air tight, it is easier to get to the first level with Windows, but easier to get past the first level, to the second and third levels, with Linux. Granted, large corporations can afford to modify Windows to get the other levels of security, but it's more difficult because Windows is such a closed environment.
I've rambled enough. A good article on locking down a Linux box can be found here:
http://www.puschitz.com/SecuringLinux.shtml
Three wireless satellite links. Three underground fiber links.
Methinks someone will be downloading quite the porn collection in Athens.
Of course, all that bandwidth, all that redundancy won't give you any kind of reliability if the person (people) responsible for it spill their coffee on the keyboard, or fat finger a routing table.
You can plan for hardware failures by building in redundancy. However no amount of redundancy will prevent human error.
>Bonch, dude, you just spent 3 hours writing a
>review of a 2 hour film....
>Kinda makes you wonder doesn't it?
Must be a nerd. And on /., imagine that.
Have you been to the Nerdshack lately?
I wish some of these many hot women that you speak of would sleep with me! Or any nerd for that matter.
Have you been to the Nerdshack yet?