Slashdot Mirror


Linux Needs Resource Management For Complex Workloads

storagedude writes: Resource management and allocation for complex workloads has been a need for some time in open systems, but no one has ever followed through on making open systems look and behave like an IBM mainframe, writes Henry Newman at Enterprise Storage Forum. Throwing more hardware at the problem is a costly solution that won't work forever, he notes.

Newman writes: "With next-generation technology like non-volatile memories and PCIe SSDs, there are going to be more resources in addition to the CPU that need to be scheduled to make sure everything fits in memory and does not overflow. I think the time has come for Linux – and likely other operating systems – to develop a more robust framework that can address the needs of future hardware and meet the requirements for scheduling resources. This framework is not going to be easy to develop, but it is needed by everything from databases and MapReduce to simple web queries."

161 comments

  1. This obsession with everything in RAM needs to end by Anonymous Coward · · Score: 0

    I know you're afraid of the garbage collector, but it won't bite. I promise.

  2. Re: This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    So then what should we be obsessed with? Light weight, shiny screens, and rounded corners?

  3. Re: This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    12345678910+×÷=%_@$!#/\&*()

  4. Oblig XKCD by Anonymous Coward · · Score: 0

    http://xkcd.com/619/

    1. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      That's so painfully true because Linux still has choppy playback of Flash/HTML5 video on low-performance hardware. It still is mostly a server OS (a very good one though).

    2. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      Firefux still has choppy playback of HTML5 video on Windows. Give me Flash or go to hell.

    3. Re:Oblig XKCD by Stumbles · · Score: 1

      I still get that problem with firefox + flash. They all suck.

      --
      My karma is not a Chameleon.
    4. Re:Oblig XKCD by Zero__Kelvin · · Score: 1

      "It still is mostly a server OS ..."

      Yes. I just answered a call on my Samsung S3 server a little while ago in fact. I also watched some TV on my Comcast Server Set-top box. I'm thinking you either don't know very much about Linux, or what a server is.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    5. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      Maybe your computer sucks. I have no issues.

    6. Re:Oblig XKCD by jedidiah · · Score: 1

      > That's so painfully true because Linux still has choppy playback of Flash/HTML5 video on low-performance hardware. It still is mostly a server OS (a very good one though).

      It's bullshit because EVERY platform has choppy playback of Flash video on low-performance hardware. It's a feature of how lame Flash is. It has nothing to do with Linux.

      Low performance hardware will happily decode much more interesting video so long as the coders in question have bothered to hook into relevant "shortcuts".

      Adobe can't be trusted to do that (on any platform).

      Adobe likes making excuses, instead of just taking care of business like all the "hobbyists" have done.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    7. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      Adobe likes making excuses, instead of just taking care of business like all the "hobbyists" have done.

      Given that the flash spec has been out for years why haven't the "hobbyists" taken care of business with a superior flash player yet then?

    8. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      They are busy developing something called Flashblock instead.

  5. next-generation by Anonymous Coward · · Score: 0

    next-generation is a word that should be forbidden

  6. Re:This obsession with everything in RAM needs to by Lisias · · Score: 5, Insightful

    I know you're afraid of the garbage collector, but it won't bite. I promise.

    Yes, it will. It's not common, but it happens - and when it happens, it's nasty. Pretty nasty.

    But not so nasty as micromanaging the memory by myself, so I keep licking my wounds and moving on with it.

    (but sometimes it would be nice to have fine control over it)

    --
    Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
  7. From the "is it 2005? department" by dbIII · · Score: 2

    "next-generation technology like non-volatile memories and PCIe SSDs"

    That generation has been going on for a while, storagedude. People have been scaling according to load to deal with it.

    1. Re:From the "is it 2005? department" by Anonymous Coward · · Score: 1

      "next-generation technology like non-volatile memories and PCIe SSDs"

      That generation has been going on for a while, storagedude. People have been scaling according to load to deal with it.

      He just woke up from a coma you insensitive clod.

    2. Re:From the "is it 2005? department" by K.+S.+Kyosuke · · Score: 1

      Uh, no. PCIe SSDs are just coming into regular use in many places, and I haven't even heard of non-volatile memories being on the market (GB-sized, mind you - not tiny FRAMs for embedded applications).

      --
      Ezekiel 23:20
    3. Re:From the "is it 2005? department" by viperidaenz · · Score: 1

      Fusion-io's ioDrive has been around since 2007. It's been in regular use for those who need it - like 4k video editing.
      The original 7 year old drive is still faster than any SATA SSD you can find today.

    4. Re:From the "is it 2005? department" by K.+S.+Kyosuke · · Score: 2

      That's the former, not the latter, but OK. (I also said "in many places", one would have thought it obvious that these things sort of trickle down from the top over time, especially given the initial limitations on the technology.)

      --
      Ezekiel 23:20
    5. Re:From the "is it 2005? department" by swb · · Score: 1

      Yeah, but how many people were editing 4k video in 2007? I'm sure the 3 people at the time weren't worrying about scheduling their Fusion ioDrives across workloads, either, just pounding them into submission. Wider adoption usually means mixed workloads where scheduling scarce resources matters more and is more complicated.

      FWIW I don't know if I agree with the article premise -- it seems like most of these resource scheduling decisions/monitoring/adjustments are being made in hypervisors now (think VMware DRS, as only one example). And a lot of storage resource allocation isn't even done at the hypervisor level; it's done in the SAN, which simply allocates maximum storage bandwidth to the host and figures out on its own which storage to use.

    6. Re:From the "is it 2005? department" by dbIII · · Score: 1

      Uh, no. PCIe SSDs are just coming into regular use in many places

      OCZ seem to have been selling them via retail outlets for three years or more - let alone high end use.
      There were various PCI things before the PCIe interface came into use.

    7. Re:From the "is it 2005? department" by EETech1 · · Score: 1

      IBM has DIMMs with flash memory already.

      www-03.ibm.com/systems/x/options/storage/solidstate/exflashdimm/

    8. Re:From the "is it 2005? department" by K.+S.+Kyosuke · · Score: 1

      That's all fine and dandy, but the technological limitations of Flash memories put this in the "not quite there yet" territory when it comes to non-volatile RAMs. You wouldn't want to put your plain old in-memory data structures into that thing, so we're not quite there yet when it comes to unified memory architectures.

      --
      Ezekiel 23:20
  8. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 1

    Why not map everything in RAM? These days even Windows gives every process 128 terabytes of address space. TERA BYTES.
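    To make "map everything" concrete: on a 64-bit system a process can map a file into its address space and index it like an array, leaving the kernel to page data in and out on demand. A minimal sketch using Python's standard `mmap` module; the helper name `read_via_mmap` and the temporary file are illustrative only.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, start: int, length: int) -> bytes:
    """Map the whole file read-only and return a slice of it.

    Nothing is read up front; the kernel faults pages in as they are touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:start + length]


# Usage: create a small temporary file and read part of it through the mapping.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, mapped world")
print(read_via_mmap(path, 7, 6))  # b'mapped'
os.remove(path)
```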

  9. Re: This obsession with everything in RAM needs to by JMJimmy · · Score: 5, Funny

    Boobs.

  10. This belongs in the cluster manager by Animats · · Score: 4, Informative

    That level of control probably belongs at the cluster management level. We need to do less in the OS, not more. For big data centers, images are loaded into virtual machines, network switches are configured to create a software defined network, connections are made between storage servers and compute nodes, and then the job runs. None of this is managed at the single-machine OS level.

    With some VM system like Xen managing the hardware on each machine, the client OS can be minimal. It doesn't need drivers, users, accounts, file systems, etc. If you're running in an Amazon AWS instance, at least 90% of Linux is just dead weight. Job management runs on some other machine that's managing the server farm.

    1. Re:This belongs in the cluster manager by Anonymous Coward · · Score: 0

      This exactly ... this stuff does not belong in the OS itself ... the OS needs to have the appropriate hooks to support this kind of external configuration/administration ...

    2. Re:This belongs in the cluster manager by K.+S.+Kyosuke · · Score: 2

      Honestly, in MVS (z/OS), it probably makes perfect sense to have this in an OS, especially if you're paying through the nose for the hardware already. But solving it on the VM level surely makes it a huge win for everyone.

      --
      Ezekiel 23:20
    3. Re:This belongs in the cluster manager by Tough+Love · · Score: 2

      If you're running in an Amazon AWS instance, at least 90% of Linux is just dead weight

      Which 90% would that be, and in what way would it be dead weight? If you don't mind my asking.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    4. Re:This belongs in the cluster manager by Lennie · · Score: 4, Interesting

      Yes and no.

      No, large (Linux using) companies like Google, Facebook, Twitter have always used some kind of Linux container solution, not virtualization.

      Yes, policy is controlled by the cluster manager.

      But, for example, Google uses nested cgroups for implementing those policies for controlling resources/priorities on their hosts.

      Virtualization is very inefficient, and Docker/Linux containers are a perfect example of how people are starting to see that again:
      https://www.youtube.com/watch?... / https://www.youtube.com/watch?...

      Supposedly, CPU utilization on AWS is very low, maybe even only 7%:
      http://huanliu.wordpress.com/2...

      The reason is that VMs get allocated resources they never end up using, because the host kernel/hypervisor doesn't know what the VM (kernel) is going to do/need.

      For their own services Google doesn't use VMs, but Google does offer VMs to customers and to control the resources used by VM they run the VM inside a container.

      Here are some talks Google did at DockerCon that mentions some of the details of how they work:
      https://www.youtube.com/watch?...
      https://www.youtube.com/watch?...
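      For illustration only, a simplified model of how nested groups with relative weights (like the `cpu.weight` files in cgroup v2) split CPU: under full contention, siblings divide their parent's share in proportion to their weights, so a leaf's share is the product of the fractions along its path. The `Group` class and the group names are hypothetical, and the real scheduler also hands the unused share of idle groups to busy ones, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical node in a cgroup-like tree with a relative CPU weight."""
    name: str
    weight: int = 100  # same default as cgroup v2 cpu.weight
    children: list[Group] = field(default_factory=list)


def effective_shares(root: Group) -> dict[str, float]:
    """Fraction of total CPU each leaf receives when every group is busy."""
    shares: dict[str, float] = {}

    def walk(node: Group, share: float) -> None:
        if not node.children:
            shares[node.name] = share
            return
        total = sum(child.weight for child in node.children)
        for child in node.children:
            # Siblings split the parent's share in proportion to their weights.
            walk(child, share * child.weight / total)

    walk(root, 1.0)
    return shares


tree = Group("root", children=[
    Group("prod", 300, [Group("web"), Group("batch")]),
    Group("dev", 100),
])
print(effective_shares(tree))  # {'web': 0.375, 'batch': 0.375, 'dev': 0.25}
```

      Here "prod" gets 300/400 of the machine and splits it evenly between its two children, while "dev" keeps 100/400.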

      --
      New things are always on the horizon
    5. Re:This belongs in the cluster manager by DuckDodgers · · Score: 1

      If I understand the situation correctly - and it may be that I don't - this is what projects like Docker and chroot jails (?) were created to handle. You get most of the benefits of virtualization without most of the overhead. In a lot of cases you don't need the features that full virtualization provides over them.

    6. Re:This belongs in the cluster manager by bytestorm · · Score: 1

      Or more established/full featured, openvz, xen pv, lxc, cgroups/namespaces, and friends. I think linux (the kernel) already has the tools necessary to do task prioritization like the article requests.

    7. Re:This belongs in the cluster manager by DuckDodgers · · Score: 1

      I am familiar with cgroups, not the others. Thanks for letting me know where to continue my research.

  11. Linux Cgroups by corychristison · · Score: 3, Informative

    Is this not what Linux Cgroups is for?

    From wikipedia (http://en.m.wikipedia.org/wiki/Cgroups):
    cgroups (abbreviated from control groups) is a Linux kernel feature to limit, account, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.

    From what I understand, LXC is built on top of Cgroups.

    I understand the article is talking about "mainframe" or "cloud" like build-outs but for the most part, what he is talking about is already coming together with Cgroups.
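    A minimal sketch of what limiting a process group looks like through the cgroup v2 interface files, assuming the unified hierarchy is mounted at /sys/fs/cgroup, the script runs as root, and the cpu, memory, and io controllers are enabled in the parent's cgroup.subtree_control. The helper functions, the group name, and the device number 8:0 are illustrative assumptions; the file names (cpu.weight, memory.max, io.max, cgroup.procs) are the standard cgroup v2 ones.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: cgroup v2 unified hierarchy mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_limits(cpu_weight: int, mem_max_bytes: int, dev: str, read_bps: int) -> dict[str, str]:
    """Build cgroup v2 interface-file contents for CPU, memory, and disk I/O."""
    if not 1 <= cpu_weight <= 10000:
        raise ValueError("cpu.weight must be between 1 and 10000")
    return {
        "cpu.weight": str(cpu_weight),       # relative CPU share among siblings (default 100)
        "memory.max": str(mem_max_bytes),    # hard memory limit in bytes
        "io.max": f"{dev} rbps={read_bps}",  # read bandwidth cap for device MAJ:MIN
    }


def apply_limits(group: str, limits: dict[str, str]) -> None:
    """Create the group directory and write each setting (requires root)."""
    path = CGROUP_ROOT / group
    path.mkdir(exist_ok=True)
    for name, value in limits.items():
        (path / name).write_text(value)


def add_process(group: str, pid: int) -> None:
    """Move a process into the group by writing its PID to cgroup.procs."""
    (CGROUP_ROOT / group / "cgroup.procs").write_text(str(pid))


limits = cgroup_limits(cpu_weight=200, mem_max_bytes=512 * 1024 * 1024, dev="8:0", read_bps=10_000_000)
print(limits)
```

    Building the settings is kept separate from writing them so the values can be checked without root; calling `apply_limits("batch", limits)` and then `add_process("batch", pid)` would perform the actual writes.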

    1. Re:Linux Cgroups by Anonymous Coward · · Score: 2, Informative

      the article is not about "mainframe" or "cloud"... it is "advertising" for IBM... a company in the middle of multi-billion dollar deals with Apple, all the while fighting to remain even slightly relevant.

      IBM has the magic solution to finally allow the world to run simple web queries.

      FUCK OFF

    2. Re: Linux Cgroups by Anonymous Coward · · Score: 0

      Not an expert on cgroups, but yes, some of what he wants certainly seems to be covered, like I/O bandwidth and per-group memory resources.

  12. What's He trying to sell ? by Anonymous Coward · · Score: 0

    Load balancing clustering, JIT storage, cloud services, mainframe offloading, dedicated database servers, high avail redundant networking, etc....

    The whole world is a nail to the man with a hammer....

    So who is paying his salary (or for this trip)?

  13. Re: This obsession with everything in RAM needs to by loufoque · · Score: 1

    Garbage collection necessarily wastes memory by a factor of 1.5 to 2.
    The collection itself also slows down the program, and in some languages cannot even happen asynchronously.

    Finally, the most important aspect for program performance is locality and memory layout, something you cannot even optimize for in a language where every object is a pointer to some memory on a garbage-collected heap.

  14. Look better it's already there by dutchwhizzman · · Score: 1

    KVM, Xen and other hypervisors make Linux systems look like IBM mainframes. The whole "Virtual Machine" hype where we have guest operating systems running on hypervisors is just like IBM's Z series.

    --
    I was promised a flying car. Where is my flying car?
    1. Re:Look better it's already there by Anonymous Coward · · Score: 1

      KVM, Xen and other hypervisors make Linux systems look like IBM mainframes. The whole "Virtual Machine" hype where we have guest operating systems running on hypervisors is just like IBM's Z series.

      IBM had the System Resource Manager back in the 1980s, when "zOS" was still OS/MVS.

      More recently, Solaris had resource tuning features, although in my experience, people preferred throwing cheap hardware at resource consumption over having tuning specialists or tuning-aware system operations.

      The recent addition of cgroups to Linux means that it also has the potential to become tunable in terms of business goals, but again the question is, are people going to pay for the required expertise or are they going to persist in brute-force resource management?

      Offhand, I'd say that the major shops will find it cheaper in the long run to run resource management, whereas SOHO users won't. And so the big shops will once again likely consist of lots of expensive equipment and specialists to keep it happy. Just like in the old mainframe days.

  15. Vista got this by eyjeryjertj · · Score: 1

    This feature was introduced in Windows Vista, and as we all know, this is the best OS ever because of that. Can't wait until Linux becomes more like Vista.

  16. Is this real or fantasy? by m00sh · · Score: 3, Interesting

    I read the article and I can't tell if this is a real problem that is really affecting thousands of users and companies, or a fantasy that the author wrote up in 30 minutes after having a discussion with an old IBM engineer.

    Sure, IBM has all this resource prioritization in mainframes because mainframes cost a lot of money. Nowadays, hardware is so cheap you don't have to do all that stuff.

    If some young programmer undertook the challenge and created the framework, would anyone use it and test it? Will there be an actual need for something like this?

    My point is: is this insider information about what is really going on at the cutting edge of Linux usage, or just some smoke being blown around for an obligatory write-up?

    1. Re:Is this real or fantasy? by Kjella · · Score: 1

      These resources are all being managed today, there already are priorities for CPU, QoS for network bandwidth, ionice and quotas for storage and so on with a lot of specialization in each. He wants to build some kind of comprehensive resource management framework where everything from CPU time, memory, storage, network bandwidth etc. is being prioritized. It sounds extremely academic to me, particularly when I read the line:

      I will make the assumption that everything at every level is monitored and tracked (...)

      Besides, resource management isn't something that happens only at this level. For example, if I have an SQL server, then clearly who gets priority there matters: these are order transactions that should have millisecond latency, and here's the consolidated monthly report we need by noon tomorrow. Load balancers, cache servers, read-only slaves, thread pools, TCP congestion logic: it's like you took something you could write a whole library about and said "we need a framework for it". Good luck writing a framework that can balance anything in any situation. Yes, I suppose that from a galaxy away it might look like everything is a resource and we have consumers who need prioritization, but the specifics of the situation matter a lot. That is why there are many, many specialized systems that each do their own specialized kind of resource management.

      --
      Live today, because you never know what tomorrow brings
    2. Re:Is this real or fantasy? by Anonymous Coward · · Score: 1

      Nowadays, hardware is so cheap you don't have to do all that stuff.

      Instead of spending a bit of those resources to allocate the rest with good efficiency, the standing assumption is that resources are effectively free anyway and so wasting them with gay abandon is worth it. This is the assumption, but it's not really true.

      At sufficient scale even the smallest cost becomes non-negligible. This isn't just for the few of us who write "truly web-scale" or whatever the term is today. Even in something as simple as an end-user application like, oh, a video player, "saving" programmer time effectively moves the cost onto the end user. That cost should be multiplied by the number of users as well as by the frequency with which each user gets hit with it; the per-user multiplication in particular is easy to forget. As an example, VLC has an estimated 30-odd million users, so shaving, say, one second off the start-up time saves almost a year (30 million seconds) of collective waiting every time each of those users starts it once. It's not just start-up; it's hiding in just about everything computers do.
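      The arithmetic behind "almost a year", as a quick check (30 million users is the commenter's estimate; the one-second saving is the hypothetical from the text):

```python
USERS = 30_000_000                     # commenter's estimate of VLC's user base
SECONDS_SAVED_PER_START = 1            # hypothetical start-up saving from the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s in a non-leap year

total_seconds = USERS * SECONDS_SAVED_PER_START
years = total_seconds / SECONDS_PER_YEAR
print(f"{total_seconds:,} s saved if every user starts it once: about {years:.2f} years")
```

      Every additional launch per user adds the same amount again.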

      While it's true that some optimisations simply aren't worth it, what I'm on about is the reverse: deliberately not taking even reasonable care to avoid wasting resources wantonly. Consequently, that waste does happen a lot.

      If some young programmer undertook the challenge and created the framework, would anyone use it and test it? Will there be an actual need for something like this?

      Personally I don't believe in fabricating frameworks. It's mainly self-serving make-work so the programmer can kid himself he's being useful to the world. More often than not it results in slow gloopy bloat that needs to be carried around for its own sake possibly much more so than because it's useful.

      There are ways to avoid this, but it's basically never by setting out to write "a framework" before you've written a few applications that could use it.

      My point is: is this insider information about what is really going on at the cutting edge of Linux usage, or just some smoke being blown around for an obligatory write-up?

      No idea. But the state of computing is such that there's a lot to be improved yet.

    3. Re:Is this real or fantasy? by Anonymous Coward · · Score: 0

      Actually, you have to use WLM whether you want to or not because DB2 stored procedures run in WLM address spaces.

    4. Re:Is this real or fantasy? by Anonymous Coward · · Score: 1

      Ha, your SQL server scenario is similar to one I've heard from IBM engineers (and IBM fellows) but with a priority inversion twist that requires SLAs and monitoring. That periodic consolidated report can become a nightmare when it finally grows to take longer than one period to complete! Enterprises come crashing down when these overlooked/implied invariants get violated. Eventually, increasing the job priority won't even work because it will squeeze out all the line of business workload, and what you really need is a monitoring trap to alert admins and engineers so they can deploy more resources...

      This is the single biggest difference between automation and full-blown dev-ops (IBM even called it autonomics years ago, not sure what they say now) and the conventional PC or Unix approach. You don't have monitoring or analysis feeding into a cloud of people who then make opaque decisions to modify the system configuration. Instead, you have integration of the monitoring, analysis, planning, and reconfiguration tasks. That's called Resource Management. The higher stakes environments even use stronger guarantees like reservations in combination with advance planning, i.e. schedulers and optimizers that can prove that the workload fits in the available or planned capacity. When it does not fit, the amended workload is rejected, rather than throwing it all into a pool and doing best effort scheduling until the whole mountain tips over due to capacity overcommitment.
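      A toy version of the reservation idea described above: capacity is checked at submission time, and a workload that does not fit is rejected rather than admitted and left to best-effort scheduling. This is a minimal sketch under simplifying assumptions (only CPU cores and memory, a single machine, no advance planning over time); `Job` and `AdmissionController` are hypothetical names, not an IBM API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    cpu: float     # reserved cores
    mem_gb: float  # reserved memory in GB


class AdmissionController:
    """Accept a job only if its reservation fits in the remaining capacity."""

    def __init__(self, cpu: float, mem_gb: float) -> None:
        self.free_cpu = cpu
        self.free_mem = mem_gb
        self.admitted: list[Job] = []

    def submit(self, job: Job) -> bool:
        if job.cpu > self.free_cpu or job.mem_gb > self.free_mem:
            return False  # reject up front instead of overcommitting
        self.free_cpu -= job.cpu
        self.free_mem -= job.mem_gb
        self.admitted.append(job)
        return True

    def release(self, job: Job) -> None:
        """Return a finished job's reservation to the pool."""
        self.admitted.remove(job)
        self.free_cpu += job.cpu
        self.free_mem += job.mem_gb


ac = AdmissionController(cpu=8, mem_gb=32)
print(ac.submit(Job("oltp", 4, 16)), ac.submit(Job("monthly-report", 4, 8)), ac.submit(Job("extra", 1, 1)))
```

      The third submission is refused because the first two have already reserved all eight cores; a real resource manager would also plan reservations over time and raise an alert instead of silently refusing.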

    5. Re:Is this real or fantasy? by Anonymous Coward · · Score: 0

      His use case *is* limited. PC users don't need this (nor laptop, tablet or phone). Enterprise users will need it, and some of it is already partially addressed. Very large hardware is used less than small blades or servers. The unique hardware requires unique software, and it probably deserves some attention. One of the attributes of Linux is its ability to scale (both up and down). Yes it runs on a Strawberry PI. Yes it runs in my router. Yes it runs on a 500,000 core supercomputer (actually, a very large number of supercomputers). His specific needs will be assessed, and yes, I can see software coming out of his request.

    6. Re:Is this real or fantasy? by Anonymous Coward · · Score: 0

      It's amazing how many UNIX (initially UNICS, or jokingly, Unics) folk have absolutely zero idea of anything beyond the "everything is a file" mentality.

      It's quite retarding that they have no knowledge of MULTICS, from which their favorite OS's nose-thumbing name and many of its shell commands originate.

      Educate yourself before you wind up permanently retarded.

      http://en.wikipedia.org/wiki/Multics

      Now ask yourself: If personal computers have had memory virtualization for decades, then why the fuck are we still pretending that they don't in userland? That's simple: because "everything is a file". Ugh. As if information is not self descriptive via hash code (thus actually needs no "filename" for machines to identify it) or as if name collisions aren't plaguing your ridiculous name-hive-minded approach.

      Indeed, I'm positive few are truly aware of just how inside-the-box most POSIX admins are.

      Those who don't understand MULTICS are doomed to reimplement it, poorly.

  17. Have your cake, and eat it too by Anonymous Coward · · Score: 0

    Mainframes have always looked massively expensive, so we made do with cheap commodity crap. And crappy it was. You can see it everywhere, from (lack of, or bolted on as an afterthought) management features, to single points of failure everywhere, to being cheaply made and so prone to breakage and very hard to diagnose. Most of us have never worked with anything else so have no idea that things could be massively better. Resource management in the OS is but a small thing lacking in comparison.

    What's most amazing is that this status quo is gospel, that nobody saw fit to sit back and really think about the whole thing and perhaps start a project or two to try and do something about it. Instead we see marginal fiddling that really isn't innovating at all. From the poetteringware that's deliberately but unnecessarily breaking compatibility in the name of progress but hardly progressing at all, to a bright new "standard" in rack sizes, right smack dab between the previous two(!) existing standards in size while still managing to fail to seize the chance to go metric, with a lot of cheap more-of-the-same software and hardware in between. The larger theme in computing is that it's not progressing much at all. It's not even baby steps, it's fiddling, doodling, not going anywhere at all.

    1. Re:Have your cake, and eat it too by K.+S.+Kyosuke · · Score: 1

      right smack dab between the previous two(!) existing standards in size

      That reminds me of the (rejected) compromise that suggested that we index arrays starting with 0.5. :)

      --
      Ezekiel 23:20
    2. Re:Have your cake, and eat it too by Anonymous Coward · · Score: 0

      What do you mean "rejected"?? That's basically exactly what you do in OpenGL when you (mis-)use a texture as an array! ;)
      How does that saying go, "no idea is too stupid not to be implemented"?

  18. This isn't just "Taken care of" by a hypervisor by Anonymous Coward · · Score: 0

    This really can be a user-visible problem.
    For example, the scheduling of things like SSD trims really needs to be stepped up.

    Right now you can get unexpected blocking behaviour, for up to a whole second.
    And there's no way for user-land to see it's going to happen, or even really to know what level of storage it is going to be using.

    Maybe this stuff wants to be done as cluster management, rather than as part of the core kernel; but from a user's point of view - it just needs to be done.

    1. Re:This isn't just "Taken care of" by a hypervisor by jones_supa · · Score: 1

      Why is the I/O layer team of Linux not taking responsibility to make TRIM work properly? Linux still sends individual TRIM sector commands instead of TRIM ranges. This creates unnecessary traffic in the bus and is especially nasty for everything before SATA 3.1, because then the TRIM command has to be executed synchronously, meaning that the device command queue has to be completely flushed first.
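      The coalescing step being discussed, turning a list of individual sectors into (start, count) ranges before issuing TRIM, can be sketched as below. This is an illustration, not the Linux block layer's code; the default cap of 65535 sectors per range is an assumption based on the 16-bit length field of an ATA DATA SET MANAGEMENT range entry.

```python
from __future__ import annotations

from collections.abc import Iterable


def coalesce(sectors: Iterable[int], max_count: int = 65535) -> list[tuple[int, int]]:
    """Merge individual sector numbers into sorted (start, count) TRIM ranges.

    Duplicates are dropped, and a range is split once it reaches max_count sectors.
    """
    ranges: list[tuple[int, int]] = []
    for s in sorted(set(sectors)):
        if ranges:
            start, count = ranges[-1]
            if s == start + count and count < max_count:
                ranges[-1] = (start, count + 1)  # extend the current run
                continue
        ranges.append((s, 1))  # gap (or full range): start a new run
    return ranges


print(coalesce([10, 11, 12, 40, 13, 41, 100]))  # [(10, 4), (40, 2), (100, 1)]
```

      Seven single-sector commands become three range entries, which is the reduction in bus traffic the comment is asking for.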

    2. Re:This isn't just "Taken care of" by a hypervisor by Anonymous Coward · · Score: 1

      Because maintaining lists of blocks and having algorithms to coalesce them and flush to disk from time to time sounds simple, but is actually very complicated, almost as complicated as the rest of the driver. It is basically implementing garbage collection in a disk driver, which introduces all sorts of asynchrony and plays havoc with latencies. Love doing that sort of thing in kernel space, no? The spec is fine, but doing that sort of thing in a driver is asking too much. It should be done in user space.

    3. Re:This isn't just "Taken care of" by a hypervisor by jones_supa · · Score: 1

      That is true.

    4. Re:This isn't just "Taken care of" by a hypervisor by Anonymous Coward · · Score: 0

      FreeBSD does this just fine, with similar or better performance than Linux. GEOM sits as an abstraction layer that can make 512-byte sectors look like 4KB sectors, so when you change HDs and your sector size changes, GEOM makes the drive look the same. It also allows for transparent use of TRIM: if your device supports TRIM, it'll send the commands; if it does not, it won't. But to the higher layers, all devices support TRIM.

      This also means the kernel does not need to worry about TRIM itself; only GEOM does. This puts all TRIM handling in a single place in the entire OS. It also meant that many older file systems that did not support TRIM automagically got it when installed over GEOM.
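      A hypothetical sketch of the two behaviours described above: translating 512-byte logical sector numbers onto 4 KiB physical sectors, and accepting TRIM from upper layers unconditionally while only forwarding it when the device supports it. `SectorTranslator` is an illustrative class, not GEOM's actual interface, and a real layer would also have to read-modify-write partial 4 KiB sectors on writes.

```python
from __future__ import annotations

LOGICAL_SECTOR = 512    # size the upper layers see
PHYSICAL_SECTOR = 4096  # size the drive actually uses


class SectorTranslator:
    """Hypothetical shim giving upper layers one interface over any backing device."""

    def __init__(self, device_supports_trim: bool) -> None:
        self.device_supports_trim = device_supports_trim
        self.forwarded_trims: list[tuple[int, int]] = []

    @staticmethod
    def to_physical(lba: int) -> tuple[int, int]:
        """Map a 512-byte logical sector to (4 KiB physical sector, byte offset within it)."""
        return divmod(lba * LOGICAL_SECTOR, PHYSICAL_SECTOR)

    def trim(self, lba: int, count: int) -> None:
        """Always accepted; forwarded to the device only if it understands TRIM."""
        if self.device_supports_trim:
            self.forwarded_trims.append((lba, count))
        # Otherwise the request is dropped: TRIM is only a hint, so this is safe.


ssd = SectorTranslator(device_supports_trim=True)
hdd = SectorTranslator(device_supports_trim=False)
ssd.trim(0, 8)
hdd.trim(0, 8)
print(SectorTranslator.to_physical(9), ssd.forwarded_trims, hdd.forwarded_trims)
```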

  19. Re:mainframe is old crap for geezers by Anonymous Coward · · Score: 1

    Cloud???
    Isn't that a mainframe connected over the internet with dumbed down terminals which require little complexity because the real complexity is located at a central point.

    To clarify, cloud services act as the modern equivalent of the classic mainframe, and the communication channels between the core system and the terminals have changed.

  20. Lotta work for an OS nobody uses by Anonymous Coward · · Score: 0

    What is it, like 2% share? I mean, it was cool when 1% used it, but now it's just an old, desperate OS looking for something, ANYTHING, to keep it from dying completely.

    1. Re:Lotta work for an OS nobody uses by Z00L00K · · Score: 2

      2% may be the desktop share for Linux, but when it comes to servers and handheld devices like Android it's a different story.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  21. So ... by Anonymous Coward · · Score: 0

    Linux grew because when people wanted/needed something, they wrote it themselves. Companies helped with money/manpower because they got some benefits.
    So, if there's something missing, then it's probably not needed, or the other solutions cover it well enough.

  22. Re:This obsession with everything in RAM needs to by K.+S.+Kyosuke · · Score: 1

    I guess that's why Azul hired all those smart people, to make that go away for good.

    --
    Ezekiel 23:20
  23. Re: This obsession with everything in RAM needs to by K.+S.+Kyosuke · · Score: 1, Insightful

    Garbage collection necessarily wastes memory by factor of 1.5 to 2.

    And manual memory management on a similar scale wastes CPU time. And the techniques that alleviate one also tend to help the other, or not?

    Finally, the most important aspect for program performance is locality and memory layout, something you cannot even optimize for in a language where every object is a pointer to some memory on a garbage-collected heap.

    There's not a dichotomy here. Oberon and Go are garbage collected without everything being a heap pointer.

    --
    Ezekiel 23:20
  24. Straw Proposals? by Anonymous Coward · · Score: 1

    I thought the title wanted to talk about something revolutionary, so I read through the details.

    What I discovered was that the title was bullshit, and so were the concerns surrounding Linux's capabilities. Some of them make sense for general all-purpose computation, some of them don't. I don't see why anybody should take these proposals too seriously for kernel inclusion.

    The portion on primary memory management is perfect. Hadoop does suffer from a lack of cache-aware code; so far, only modified kernels have been in use with systems such as Azul's C4-based virtual machine.

    The portion on user-driven resource management (CPU/disk) is a very thorny issue. Most people don't use big monolithic computers, but provisioned, distributed systems. This leads to better separation of concerns and better diagnostics. For most people it may be better to treat this as a non-issue than to create complicated, entangled scheduler code.
    The portion on User Accounting generally does not make sense for most Linux machines in production today. Most people who favor lower latencies do not want context switches.

    Linux is not a product, but a meta-product. It is up to the implementor to take a variety of components, put them together in a logical way, and configure the modules/userland to work with them correctly. If IBM feels that they want to bring in that prickly complexity to run Linux on a boilerplate expensive mainframe computer, it's their headache.

  25. Re:mainframe is old crap for geezers by K. S. Kyosuke · · Score: 1

    When you go cloudy, you can do the same things on a somewhat higher level. As in, when you go Google-sized, the allocation and management of resources with a granularity of a computing node probably doesn't bother you much, because you have tens or hundreds of thousands of them. Trying to solve these problems on the single system level might be a waste of time for many applications. This is more of a problem for on-site big iron. It's an interesting problem, and if solved, could be of use to many people, but it would be much less useful for cloud providers.

    --
    Ezekiel 23:20
  26. 64-bit address space.. by Anonymous Coward · · Score: 0

    ..ought to be enough for everyone. I mean, 2^64 could address all atoms in the solar system. How much porn do you expect to be able to store anyway?

    1. Re:64-bit address space.. by Anonymous Coward · · Score: 0

      The amount of available information at any point in time is only a fraction of the possible information. Because time keeps passing, the amount of information there is to document grows with the product of space and time. Since time is infinite unless YOU prove otherwise, you need a lot of space to store all of that.

    2. Re:64-bit address space.. by Anonymous Coward · · Score: 0

      It is "only" 2^48 currently though.

    3. Re:64-bit address space.. by Anonymous Coward · · Score: 1

      I mean, 2^64 could address all atoms in the solar system.

      False. It could almost address all atoms in a milligram of matter, though.
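      The numbers in this subthread can be checked quickly. This is a sketch: 2^48 bytes comes to 256 TiB. For the milligram comparison, it assumes the matter is carbon (12.011 g/mol); lighter elements pack more atoms into a milligram and heavier ones fewer.

```python
# Address-space arithmetic for the subthread above.
# Assumption: "a milligram of matter" is taken to be carbon (12.011 g/mol).

AVOGADRO = 6.02214076e23      # atoms per mole (exact SI value)
CARBON_MOLAR_MASS = 12.011    # grams per mole, standard atomic weight
TIB = 2 ** 40                 # bytes in one tebibyte

full_64 = 2 ** 64             # addresses in a full 64-bit space
current_48 = 2 ** 48          # addresses with 48-bit virtual addressing

# 1 mg = 1e-3 g; moles = grams / molar mass; atoms = moles * Avogadro
atoms_per_mg_carbon = 1e-3 / CARBON_MOLAR_MASS * AVOGADRO
coverage = full_64 / atoms_per_mg_carbon

print(f"2^48 bytes = {current_48 // TIB} TiB")
print(f"2^64 = {full_64:.3e} addresses")
print(f"1 mg of carbon = {atoms_per_mg_carbon:.3e} atoms")
print(f"2^64 covers about {coverage:.0%} of them")
```

      For carbon, 2^64 comes to roughly a third of the atoms in a milligram. The order-of-magnitude point stands, even if "almost" is generous.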

  27. complex application example by lkcl · · Score: 4, Insightful

    i am running into exactly this problem on my current contract. here is the scenario:

    * UDP traffic (an external requirement that cannot be influenced) comes in
    * the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing
    * each "job" must be farmed out to *multiple* scripts (for example, 15 is not unreasonable)
    * the responses from each job running on each script must be collated then post-processed.

    so there is a huge fan-out where jobs (approximately 60 bytes) are coming in at a rate of 1,000 to 2,000 per second; those are being multiplied up by a factor of 15 (to 15,000 to 30,000 per second, each taking very little time in and of themselves), and the responses - all 15 to 30 thousand - must be in order before being post-processed.
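    Here is a back-of-the-envelope time budget for this fan-out. It assumes the upper rate of 2,000 jobs per second, 15 scripts per job, and zero dispatch overhead. For the multi-core line, it also assumes work spread perfectly evenly across the 4 cores this post mentions later.

```latex
\begin{aligned}
r_{\text{scripts}} &= k \, r_{\text{jobs}} = 15 \times 2000\ \text{s}^{-1} = 30\,000\ \text{s}^{-1} \\
t_{\text{1 core}} &= \frac{1}{r_{\text{scripts}}} \approx 33\ \mu\text{s per invocation} \\
t_{\text{4 cores}} &= \frac{4}{r_{\text{scripts}}} \approx 133\ \mu\text{s per invocation}
\end{aligned}
```

    Every per-job cost outside the scripts themselves comes out of that same budget. That includes pre-processing, a context switch, and a pipe or socket round trip, which fits with the dispatcher overload this post goes on to describe.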

    so, the first implementation is in a single process, and we just about achieve the target of 1,000 jobs per second, but only with about 10 scripts per job.

    anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.

    the second implementation uses a parallel dispatcher. i went through half a dozen different implementations.

    the first ones used threads, semaphores through python's multiprocessing.Pipe implementation. the performance was beyond dreadful, it was deeply alarming. after a few seconds performance would drop to zero. strace investigations showed that at heavy load the OS call futex was maxed out near 100%.

    next came replacement of multiprocessing.Pipe with unix socket pairs and threads with processes, so as to regain proper control over signals, sending of data and so on. early variants of that would run absolutely fine up to some arbitrary limit, then performance would plummet to around 1% or less, sometimes remaining there and sometimes recovering.

    next came replacement of select with epoll, and the addition of edge-triggered events. after considerable bug-fixing a reliable implementation was created. testing began, and the CPU load slowly cranked up towards the maximum possible across all 4 cores.
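    For readers unfamiliar with edge-triggered epoll: with `EPOLLET`, the kernel signals readiness only when it changes. The handler therefore has to keep reading until the socket reports that it would block. Below is a minimal Python sketch of that drain loop. It is Linux-only, and the function names are hypothetical, not taken from the poster's code.

```python
import select
import socket


def drain(sock, handle):
    """Read every datagram currently queued on a non-blocking socket.

    With EPOLLET, epoll reports a readiness *change* once. If we stop
    reading before the kernel queue is empty, we may never be woken again
    for the leftover data, so keep calling recv() until it would block.
    """
    count = 0
    while True:
        try:
            data = sock.recv(65535)
        except BlockingIOError:
            return count              # queue empty: safe to go back to poll()
        handle(data)
        count += 1


def run_once(sock, timeout=1.0):
    """Register `sock` edge-triggered, wait for one wakeup, drain it."""
    received = []
    sock.setblocking(False)
    ep = select.epoll()
    try:
        ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLET)
        for fd, events in ep.poll(timeout):
            if fd == sock.fileno() and events & select.EPOLLIN:
                drain(sock, received.append)
    finally:
        ep.close()
    return received
```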

    the performance metrics came out *WORSE* than the single-process variant. investigations began and showed a number of things:

    1) even though it is 60 bytes per job, the pre-processing required to decide which process to send the job to was so great that the dispatcher process was becoming severely overloaded

    2) each process was spending approximately 5 to 10% of its time doing actual work and NINETY PERCENT of its time waiting in epoll for incoming work.

    this is unlike any other "normal" client-server architecture i've ever seen before. it is much more like the mainframe "job processing" that the article describes, and the linux OS simply cannot cope.

    i would have used POSIX shared memory Queues but the implementation sucks: it is not possible to identify the shared memory blocks after they have been created so that they may be deleted. i checked the linux kernel source: there is no "directory listing" function supplied and i have no idea how you would even mount the IPC subsystem in order to list what's been created, anyway.
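    On typical Linux systems, POSIX shared memory objects are files on a tmpfs mounted at /dev/shm. That means they can be listed and cleaned up with ordinary directory operations. POSIX message queues become visible the same way once the mqueue filesystem is mounted, e.g. at /dev/mqueue. The sketch below uses Python 3.8+'s `multiprocessing.shared_memory`. The `hyp_demo_` name prefix is a hypothetical convention for tagging one application's segments.

```python
import os
from multiprocessing import shared_memory

SHM_DIR = "/dev/shm"   # tmpfs where Linux/glibc keeps POSIX shm objects


def list_posix_shm(prefix=""):
    """Names of POSIX shared memory objects whose names start with `prefix`."""
    return sorted(n for n in os.listdir(SHM_DIR) if n.startswith(prefix))


def remove_stale_shm(prefix):
    """Unlink every shared memory object whose name starts with `prefix`."""
    removed = []
    for name in list_posix_shm(prefix):
        seg = shared_memory.SharedMemory(name=name)  # attach, don't create
        seg.close()                                  # drop our mapping
        seg.unlink()                                 # shm_unlink() underneath
        removed.append(name)
    return removed
```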

    i gave serious consideration to using the python LMDB bindings because they provide an easy API on top of memory-mapped shared memory with copy-on-write semantics. early attempts at that gave dreadful performance: i have not investigated fully why that is: it _should_ work extremely well because of the copy-on-write semantics.

    we also gave serious consideration to just taking a file, memory-mapping it and then appending job data to it, then using the mmap'd file for spin-locking to indicate when the job is being processed.

    with all of these crazy implementations, i basically have absolutely no confidence in the linux kernel or the GNU/Linux POSIX-compliant implementation of the OS on top: i have no confidence that it can handle the load.

    so i would be very interested to hear from anyone who has had to design similar architectures, and how they dealt with it.

    1. Re:complex application example by Anonymous Coward · · Score: 0

      It's an interesting combination when you are clearly a very knowledgeable guy but don't use capital letters to begin sentences. :)

    2. Re:complex application example by Anonymous Coward · · Score: 0

      Here? Try stack overflow.

    3. Re:complex application example by sonamchauhan · · Score: 1

      Try putting a load balancer (Cisco ACE, Citrix NetScaler) on a virtual IP and load balancing the UDP packets across several nodes behind the balancer.

    4. Re:complex application example by Anonymous Coward · · Score: 0

      If you've got no confidence in the Linux kernel then why don't you port your code to some alternative OSes (Solaris, FreeBSD, OS X, Windows etc.) to compare performance?
      Reading your post, a few oddities occur to me, though obviously I'm probably missing a lot of the relevant information.
      If you're trying to achieve maximum performance I'm wondering why you're coding with python.
      Why are all of your processes waiting for epoll? Surely you've got one process reading the network data and spawning the required threads?
      You might find you don't need much resource locking at all with the right design.
      Have you worked out the theoretical maximum performance you could achieve with the hardware configuration you've chosen? How close to this are you getting with your current implementation? Maybe it would be more practical to scale your system horizontally rather than spending more time and money trying to squeeze more performance out of your current architecture.

    5. Re:complex application example by Anonymous Coward · · Score: 0

      You might benefit from looking at zeromq, which can simplify this type of coordinated processing, both in single- and multi-node systems. There's a Python binding, so you should be able to give it a go quite quickly. Not guaranteed that this is the right approach for your particular requirements, but it does sound similar to stuff I've worked on in the past, and in my opinion zmq does simplify away a lot of the complexity in a reliable way. Performance is pretty amazing too! See the zeromq guide for details.

    6. Re:complex application example by Mr Thinly Sliced · · Score: 5, Insightful

      > the first ones used threads, semaphores through python's multiprocessing.Pipe implementation.

      I stopped reading when I came across this.

      Honestly - why are people trying to do things that need guarantees with python?

      The fact you have strict timing guarantees means you should be using a realtime kernel and realtime threads with a dedicated network card and dedicated processes on IRQs for that card.

      Taking the incoming messages from UDP and posting them on a message bus should be step one, so that you don't lose them.
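      Below is a minimal sketch of "step one": a thread that does nothing but pull datagrams off the socket and hand them on. A `queue.Queue` stands in for the message bus here, purely as an assumption for illustration; zeromq or an AMQP broker would sit in its place. The function name is hypothetical.

```python
import queue
import socket
import threading


def start_receiver(sock, out_q, stop):
    """Start a thread whose only job is moving datagrams from `sock` to `out_q`.

    Keeping this loop tiny (no decoding, no dispatch decisions) is what
    keeps the kernel's UDP receive buffer from filling up.
    """
    sock.settimeout(0.1)              # wake periodically to check `stop`

    def loop():
        while not stop.is_set():
            try:
                data, _addr = sock.recvfrom(65535)
            except socket.timeout:
                continue
            out_q.put(data)           # unbounded queue: never blocks here

    t = threading.Thread(target=loop, name="udp-receiver", daemon=True)
    t.start()
    return t
```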

    7. Re:complex application example by lkcl · · Score: 4, Informative

      > the first ones used threads, semaphores through python's multiprocessing.Pipe implementation.

      I stopped reading when I came across this.

      Honestly - why are people trying to do things that need guarantees with python?

      because we have an extremely limited amount of time as an additional requirement, and we can always rewrite critical portions or later the entire application in c once we have delivered a working system that means that the client can get some money in and can therefore stay in business.

      also i worked with david and we benchmarked python-lmdb after adding in support for looped sequential "append" mode and got a staggering performance metric of 900,000 100-byte key/value pairs, and a sequential read performance of 2.5 MILLION records. the equivalent c benchmark is only around double those numbers. we don't *need* the dramatic performance increase that c would bring if right now, at this exact phase of the project, we are targeting something that is 1/10th to 1/5th the performance of c.

      so if we want to provide the client with a product *at all*, we go with python.

      but one thing that i haven't pointed out is that i am an experienced linux python and c programmer, having been the lead developer of samba tng back from 1997 to 2000. i simply transferred all of the tricks that i know involving while-loops around non-blocking sockets and so on over to python. ... and none of them helped. if you get 0.5% of the required performance in python, it's so far off the mark that you know something is drastically wrong. converting the exact same program to c is not going to help.

      The fact you have strict timing guarantees means you should be using a realtime kernel and realtime threads with a dedicated network card and dedicated processes on IRQs for that card.

      we don't have anything like that [strict timing guarantees] - not for the data itself. the data comes in on a 15 second delay (from the external source that we do not have control over) so a few extra seconds delay is not going to hurt.

      so although we need the real-time response to handle the incoming data, we _don't_ need the real-time capability beyond that point.

      Taking the incoming messages from UDP and posting them on a message bus should be step one, so that you don't lose them.

      .... you know, i think this is extremely sensible advice (which i have heard from other sources) so it is good to have that confirmed... my concerns are as follows:

      questions:

      * how do you then ensure that the process receiving the incoming UDP messages is high enough priority to make sure that the packets are definitely, definitely received?

      * what support from the linux kernel is there to ensure that this happens?

      * is there a system call which makes sure that data received on a UDP socket *guarantees* that the process receiving it is woken up as an absolute priority over and above all else?

      * the message queue destination has to have locking otherwise it will be corrupted. what happens if the message queue that you wish to send the UDP packet to is locked by a *lower* priority process?

      * what support in the linux kernel is there to get the lower priority process to have its priority temporarily increased until it lets go of the message queue on which the higher-priority task is critically dependent?

      this is exactly the kind of thing that is entirely missing from the linux kernel. temporary automatic re-prioritisation was something that was added to solaris by sun microsystems quite some time ago.

      to the best of my knowledge the linux kernel has absolutely no support for these kinds of very important re-prioritisation requirements.

    8. Re:complex application example by Mr Thinly Sliced · · Score: 4, Informative

      First - the problem with python is that because it's a VM you've got a whole lot of baggage in that process out of your control (mutexes, mallocs, stalls for housekeeping).

      Basically you've got a strict timing guarantee dictated by the fact that you have incoming UDP packets you can't afford to drop.

      As such, you need a process sat on that incoming socket that doesn't block and can't be interrupted.

      The way you do that is to use a realtime kernel and dedicate a CPU using process affinity to a realtime receiver thread. Make sure that the only IRQ interrupt mapped to that CPU is the dedicated network card. (Note: I say realtime receiver thread, but in fact it's just a high priority callback down stack from the IRQ interrupt).
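      The CPU-pinning and realtime-priority parts of this advice map onto two Linux system calls. Python exposes them as `os.sched_setaffinity` and `os.sched_setscheduler`. The sketch below is hedged: the CPU number and priority 50 are illustrative, and `SCHED_FIFO` normally requires root or `CAP_SYS_NICE`. Steering the NIC's interrupt to the same CPU is a separate step, done as root by writing a hex CPU mask to /proc/irq/<N>/smp_affinity, where N is system-specific.

```python
import os


def pin_receiver(cpu, rt_priority=50):
    """Pin the calling thread to one CPU and try to switch it to SCHED_FIFO.

    On Linux, pid 0 means "the calling thread" for both calls. Returns
    True if the realtime policy was applied, False if the kernel refused
    (typically no root / CAP_SYS_NICE, or an RLIMIT_RTPRIO of 0).
    `cpu` and `rt_priority` are illustrative values.
    """
    os.sched_setaffinity(0, {cpu})
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
    except PermissionError:
        return False
    return True
```

      A busy `SCHED_FIFO` thread can starve everything else on its CPU. The pinned thread should therefore do as little as possible beyond receiving and handing off.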

      This realtime receiver thread should be a "complete" realtime thread - no malloc, no mutexes. Passing messages out of these realtime threads should be done via non-blocking ring buffers to high (regular) priority threads who are in charge of posting to something like zeromq.

      Depending on your deadlines, you can make it fully non-blocking but you'll need to dedicate a CPU to spin lock checking that ring buffer for new messages. Second option is that you calculate your upper bound on ring buffer fill and poll it every now and then. You can use semaphores to signal between the threads but you'll need to make that other thread realtime too to avoid a possible priority inversion situation.
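      The ring-buffer hand-off described here relies on ownership: only the producer moves the tail index, and only the consumer moves the head. Below is a Python sketch of that single-producer/single-consumer index logic; the class name is hypothetical. A real lock-free version would be C or C++ with atomic loads and stores. In CPython, the GIL is what keeps these plain attribute updates safe.

```python
class SPSCRing:
    """Single-producer/single-consumer ring buffer (illustrative).

    Items must not be None, since None signals "empty" from try_pop().
    """

    def __init__(self, capacity):
        # one slot stays empty so that "full" and "empty" are distinguishable
        self._buf = [None] * (capacity + 1)
        self._head = 0   # next slot to read; written only by the consumer
        self._tail = 0   # next slot to write; written only by the producer

    def try_push(self, item):
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False               # full: producer decides (drop, retry)
        self._buf[self._tail] = item   # fill the slot first ...
        self._tail = nxt               # ... then publish it to the consumer
        return True

    def try_pop(self):
        if self._head == self._tail:
            return None                # empty
        item = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        return item
```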

      > how do you then ensure that the process receiving the incoming UDP messages is high enough priority to make sure that the packets are definitely, definitely received

      As mentioned, dedicate a CPU, mask everything else off from it, and make the IRQ point to it.

      > what support from the linux kernel is there to ensure that this happens

      With a realtime thread the only other thing that could interrupt it would be another realtime priority thread - but you should make sure that situation doesn't occur.

      > is there a system call which makes sure that data received on a UDP socket *guarantees* that the process receiving it is woken up as an absolute priority over and above all else

      Yes, IRQ mapping to the dedicated CPU with a realtime receiver thread.

      > the message queue destination has to have locking otherwise it will be corrupted. what happens if the message queue that you wish to send the UDP packet to is locked by a *lower* priority process

      You might get away with having the realtime receiver thread do the zeromq message push (for example) but the "real" way to do this would be lock-free ring buffers and another thread being the consumer of that.

      > what support in the linux kernel is there to get the lower priority process to have its priority temporarily increased until it lets go of the message queue on which the higher-priority task is critically dependent

      You want to avoid this. Use lockfree structures for correctness - or you may discover that having the realtime receiver thread do the post is "good enough" for your message volumes.

      > to the best of my knowledge the linux kernel has absolutely no support for these kinds of very important re-prioritisation requirements

      No offense, but Linux has support for this kind of scenario, you're just a little confused about how you go about it. Priority inversion means you don't want to do it this way on _any_ operating system, not just Linux.

    9. Re:complex application example by Alef · · Score: 1

      Honestly - why are people trying to do things that need guarantees with python?

      Oh, you got that far at least? What I wonder is, why are people trying to do things that need guarantees using UDP with no back-communication, no redundancy built in to the protocol, and not even detection of lost packets? External requirement my ass, why do you accept a contract under those conditions? The correct thing to say is "this is broken, and it's not going to work". If they still want the turd polished, it should be under very clear conditions of not accepting responsibility for the end result, and they should be known and understood by all decision makers at the customer. And even so I would be wary.

      Otherwise, you're in a prime position for getting hit by the blame when shit hits the fan, either because it doesn't work, or because you didn't tell them that in the first place, since you are supposed to be the expert.

    10. Re: complex application example by rkit · · Score: 1

      You should look up mutex attributes, in particular priority inheritance. Also, I think you are experiencing the "thundering herd" effect. Maybe the leader/follower pattern could be effective here.

      --
      sig intentionally left blank
    11. Re:complex application example by Mr Thinly Sliced · · Score: 1

      FWIW I agree vis-a-vis using UDP for a business-critical thing. I'd want exemption from responsibility for any missed packets purely due to the infrastructure in between.

    12. Re:complex application example by Gothmolly · · Score: 1

      a) Your UDP buffers probably suck. Out of the box, Red Hat gives you 128K, and each packet takes up 2304 bytes of buffer space. Try 100MB, or whatever YOUR_RATE/2304 works out to.
      b) Pull off the queue and buffer in RAM as fast as you can
      c) Have a second thread read from RAM
      d) Don't invoke scripts to process each packet, you're spinning all your time in process creation. In fact, don't use interpreted scripts at all.
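      Sizing the receive buffer is a single `setsockopt` call. This sketch computes the request as packets per second, times the per-packet buffer cost, times the seconds of headroom wanted. The 2304-byte figure from the comment is used only as an example input; the real per-packet cost depends on kernel and driver. On Linux, the kernel caps the request at `net.core.rmem_max` and doubles it for bookkeeping, so that sysctl usually has to be raised as well.

```python
import socket


def size_udp_rcvbuf(sock, packets_per_sec, bytes_per_packet, headroom_sec):
    """Request a receive buffer that can absorb `headroom_sec` of traffic.

    `bytes_per_packet` should be the kernel's per-packet accounting cost,
    not just the payload size; it is a parameter because that cost varies
    by kernel and driver. Returns the size the kernel actually granted.
    """
    wanted = int(packets_per_sec * bytes_per_packet * headroom_sec)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted)
    # Linux caps the request at net.core.rmem_max, then doubles it for
    # bookkeeping overhead; getsockopt() reports the doubled value.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

      For example, 2,000 packets per second at 2,304 bytes each, with one second of headroom, asks for about 4.6 MB.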

      --
      I want to delete my account but Slashdot doesn't allow it.
    13. Re:complex application example by anon mouse-cow-aard · · Score: 1

      Given this problem, there are several options for fanout... I'm assuming that hardware can be added, so adding a load balancer and then three or four machines to cope with the load behind the load balancer might be the quickest (least code change) way to address the issue. Especially if there is no global state needed, this is likely the most expedient.

      An option that might be a bit more flexible on a single box, while still scalable, would be to have a task that parses each incoming job and posts it to a rabbitmq instance (an AMQP bus). rabbitmq works very well out of the box, with little tweaking. You then have the fifteen scripts called in subscriber instances as separate processes. You are essentially farming out all the IPC to the broker, and the broker does this sort of thing very well. The scripts are now isolated processes, and their memory management etc. now become separate issues (if one misbehaves, you can always have the subscription management wrapper around it restart it from time to time).

      Pika would be the preferred python bindings appropriate for speaking with the broker. You might still be beyond what can be done with a single node, but growing things with AMQP/rabbit is straight-forward.

    14. Re:complex application example by Anonymous Coward · · Score: 1

      ...If you're trying to achieve maximum performance I'm wondering why you're coding with python...

      That was my Daily WTF too

      1) even though it is 60 bytes per job, the pre-processing required to decide which process to send the job to was so great that the dispatcher process was becoming severely overloaded

      So the OP is using 1 thread, even though each incoming UDP packet can be "pre-processed" in an embarrassingly parallel fashion? The main issue I see with the OP's design is that each UDP packet is being worked on by at least 17 threads/processes:
            1) the dispatcher (pre-processing) thread
            2) all the ## "scripts" (OP said 15)
            3) then the "post-processing" thread

      That is a hell of a lot of inter-process (or inter-thread) communication for EACH UDP packet. How about this design:

            1) thread to handle incoming UDP packets; i.e. just put them into a queue that worker threads pull from -- if the queue is full, start dropping packets
            2) a thread-pool of 'workers', who:
                    a) "pre-processes" the UDP packet
                    b) does the 'work' of the 'scripts' single-threadedly (each worker thread can handle ANY UDP packet)
                    c) post-processes and returns results to wherever they go

      Now each UDP packet is touched by 2 threads, not 17.

      captcha: calmness
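      The two-hop design above might look something like this in Python; all names are hypothetical. The list of packets stands in for the UDP receiver. Each worker does the pre-processing, runs every script, and post-processes, so a packet crosses exactly one queue. When the queue is full, the packet is dropped and counted rather than blocking the receiver. Because of CPython's global interpreter lock, these threads won't run Python code in parallel. A production version would likely use processes or native code, but the structure is the same.

```python
import queue
import threading


def run_pool(packets, pre, scripts, post, n_workers=4, max_queue=1024):
    """Receiver -> bounded queue -> worker pool; returns (results, dropped).

    Results come back in arrival order, keyed by a sequence number that
    the receiver assigns as packets arrive.
    """
    q = queue.Queue(maxsize=max_queue)
    results, dropped = {}, 0
    lock = threading.Lock()              # protects `results`

    def worker():
        while True:
            item = q.get()
            if item is None:             # sentinel: shut down
                return
            seq, pkt = item
            job = pre(pkt)
            outputs = [script(job) for script in scripts]
            with lock:
                results[seq] = post(job, outputs)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for seq, pkt in enumerate(packets):  # stands in for the UDP receiver
        try:
            q.put_nowait((seq, pkt))
        except queue.Full:
            dropped += 1                 # drop rather than block the receiver
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return [results[s] for s in sorted(results)], dropped
```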

    15. Re:complex application example by hyc · · Score: 1

      Totally agreed. The lack of guarantees re: UDP is built into the UDP spec, it's not a failing of the Linux kernel (nor any other OS) that it won't tell you about dropped packets. Luke, you should know better than this.

      --
      -- *My* journal is more interesting than *yours*...
    16. Re:complex application example by raxx7 · · Score: 1

      Interesting. It sounds a bit like an application I have.
      Like yours, it involves UDP and Python.
      I have 150,000 "jobs" per second arriving in UDP packets. "Job" data can be between 10 and 1400 bytes and as many "jobs" are packed into each UDP packet as possible.

      I use Python because, intermixed with the high performance job processing, I also mix slow but complex control sequences (and I'd rather cut my wrists than move all that to C/C++).
      But to achieve good performance, I had to reduce Python's contribution to the critical path as much as possible and offload to C++.

      My architecture has 3 processes, which communicate through shared memory and FIFOs.
      The shared memory is divided into fixed-size blocks, each big enough to contain the information for a maximum-size job.

      Process A is C++ and has two threads.
      Thread A1 receives the UDP packets, decodes the contents, writes the decoded job into a shared memory block and stores the block index number into a queue.
      Thread A2 handles communication with process B. This communication consists mainly of sending process B block index numbers (telling B where to get job data) and receiving block index numbers back from process B (telling A that the block can be re-used).

      Process B is a single threaded Python.
      When in the critical loop, its main job is to forward block index numbers from process A to process C and from process C back to process A.
      (It also does some status checks and control functions, which is why it's in the middle).
      In order to keep the overhead low, the block index numbers are passed in batches of 128 to 1024 (each block index number corresponding to a job).
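      Passing block indices in batches keeps the per-job IPC cost down to a fraction of a system call. Below is a sketch of one possible wire format over a pipe: a 4-byte count followed by packed unsigned ints. The framing and function names are assumptions for illustration, not raxx7's actual protocol.

```python
import array
import os


def _write_all(fd, data):
    view = memoryview(data)
    while view:                       # os.write() may write less than asked
        view = view[os.write(fd, view):]


def _read_exact(fd, n):
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed mid-batch")
        buf += chunk
    return buf


def send_batch(fd, indices):
    """Send a batch of block indices as a 4-byte count plus packed uints."""
    payload = array.array("I", indices).tobytes()
    _write_all(fd, len(indices).to_bytes(4, "little") + payload)


def recv_batch(fd):
    """Receive one batch written by send_batch() and return it as a list."""
    count = int.from_bytes(_read_exact(fd, 4), "little")
    out = array.array("I")
    out.frombytes(_read_exact(fd, count * out.itemsize))
    return out.tolist()
```

      With one write() per batch, the syscall and wakeup costs are paid once per 128 to 1024 jobs rather than once per job.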

      Process C is, again, multi-threaded C++.
      The main thread takes the data from the shared memory, returns the block index numbers to process B and pushes the jobs through a sequence of processing modules, in batches of many jobs.
      Within each processing module, the module hands out the batch of jobs to a thread pool and back, while preserving the order.

    17. Re:complex application example by Greyfox · · Score: 1

      Could you put multiple network cards on your scheduler machine, put the workers on different subnets and randomly dole out the jobs between those subnets? Seems like you'd be less likely to drop UDP packets that way. I'm pretty sure I ran across a utility (lsipc or something) that would list IPC resources, including shared memory. I seem to recall that the segments also show up in /proc somewhere. It's been a while since I've looked at it.

      Not being able to ack important message packets seems like a design flaw.

      Even though we have a LOT more hardware now than we did back in the day, you still can't BFI your way through a lot of the big data applications that companies are starting to try to get into. In the past, the company would just throw more hardware at a poorly designed application and that would "solve" the problem. I once saw a team throw 48 gigabytes of RAM at a leaky Java program, and schedule weekly restarts for the goddamn thing. But it's a lot easier to hit hard walls with big data, to the point where you absolutely can't throw more hardware at the problem.

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    18. Re:complex application example by awol · · Score: 1

      Absolutely. Soooo doomed. You cannot guarantee that the UDP packets even get across the wire to your NIC, so what difference does it make whether your software gets them all out of the NIC?

      --
      "The first thing to do when you find yourself in a hole is stop digging."
    19. Re:complex application example by Anonymous Coward · · Score: 0

      Let me guess: the user has to be able to add new scripts to the system as the needs change? And you therefore have a user limitation which tools to use? Like the others have suggested, you might need a real time part in the system, and a non-real time part for the processing. Sounds like you need some kind of sequence numbering in the job processing step ;).

    20. Re:complex application example by BitZtream · · Score: 1

      Honestly - why are people trying to do things that need guarantees with python?

      Because they don't actually know how to do what they are claiming the requirements are and they refuse to turn it over to someone who does.

      I'd have thought that was pretty clear. Trying to do real time work in python made it clear to me.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    21. Re:complex application example by Anonymous Coward · · Score: 0

      * the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing

      anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.

      How are you planning on handling UDP checksum errors without a backchannel or EC? The physical ethernet layer is lossy, so you're screwed even before the packet hits the NIC.

    22. Re:complex application example by Anonymous Coward · · Score: 0

      What you describe operates exactly like a modern GPU. Lots of little draw commands mixed into a packet. You know how they handle dispatching this stuff? They use FIFO ring buffers with a hardware dispatcher that sends the crap out to one of the many processing units. If each processing unit can *mostly* work in a small amount of memory (think the size of processor L1 cache), then stuff flies pretty well.

      Speaking of the receiving end on a processing unit... a nice way to receive requests from the dispatcher might be: Another ring buffer.

      Ring buffers are freaking great for this because the transmitter never ever needs to wait for the receiver unless it's full.

      Now here's where things get application specific... are these packets working on a shared state where everything was serialized (through some UDP miracle?) Then you're going to have a bottleneck on the work output side trying to order the resulting actions.

      A way around it though? Another fucking ring buffer. Just, it will have fixed slot sizes and be a good bit larger and all the work requests will be tagged from the dispatcher with a sequence number, so that when the window opens up, the work units can write their results into the correct slot and the output serializer can fly through them with only a "is work unit #57925 done yet?" kind of bit check in its buffer.
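      The sequence-numbered output ring can be sketched as a reorder buffer with fixed slots indexed by `seq % size`. Below is a Python illustration under the comment's assumptions; the class name is hypothetical. Workers whose sequence number falls outside the current window must hold their result until `can_accept` says yes.

```python
class ReorderBuffer:
    """Fixed-slot output window indexed by sequence number (illustrative).

    The dispatcher tags each job with an increasing sequence number.
    Workers finish in any order and drop results into slot seq % size;
    the serializer only ever asks "is the next sequence number done yet?".
    """

    def __init__(self, size):
        self._slots = [None] * size
        self._next = 0                       # next sequence number to emit

    def can_accept(self, seq):
        # a result fits only if it lies inside the current window
        return self._next <= seq < self._next + len(self._slots)

    def put(self, seq, result):
        if not self.can_accept(seq):
            raise ValueError(f"seq {seq} outside window")
        self._slots[seq % len(self._slots)] = (seq, result)

    def drain(self):
        """Yield results in sequence order for as long as the next one is ready."""
        while True:
            i = self._next % len(self._slots)
            slot = self._slots[i]
            if slot is None or slot[0] != self._next:
                return
            self._slots[i] = None
            self._next += 1
            yield slot[1]
```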

      I'll just put it out there too: Don't use python for this. Static allocated buffers = good. Dynamic shitheap with unpredictable delays when you're trying to be realtime = bad.

    23. Re:complex application example by sjames · · Score: 1

      You'll need a bit of C, but consider using sched_setscheduler on the receiver process to make sure you get the packets before the buffer fills. That process can have a big buffer and keep a queue stuffed for the actual handling. Probably one thread to receive and one to stuff the queue will work.

      The worker processes can remain as python processes at that point. As long as your queue is lossless and the workers are on average fast enough AND their jitter is smaller than your buffer in the high performance C code, it should work.

      By using the pipe, and a thread, you avoid the problems from the worker process priority not being boosted.

      Given that the work packets are small, consider reading and writing more than one in a single call to reduce context switch overhead.

    24. Re:complex application example by RelliK · · Score: 1

      > the first ones used threads, semaphores through python's multiprocessing.Pipe implementation. the performance was beyond dreadful, it was deeply alarming. after a few seconds performance would drop to zero. strace investigations showed that at heavy load the OS call futex was maxed out near 100%.

      uhhm... wait what?

      You are aware that python has global interpreter lock, right? And because of that multi-threaded performance in python is actually *worse* than single-threaded? But this is an inherent flaw in python interpreter and has nothing to do with Linux. It also has nothing to do with the topic of this article.

      --
      ___
      If you think big enough, you'll never have to do it.
    25. Re:complex application example by Anonymous Coward · · Score: 0

      This isn't a Linux resource management problem, this is a poor architecture problem where you have chosen the wrong tools for the job.

    26. Re:complex application example by Bengie · · Score: 1

      He said the CPU is mostly idle. He's trying to set up his system to handle lots of tiny tasks and Linux isn't playing well with the regular tools.

    27. Re:complex application example by Bengie · · Score: 1

      When you're handling lots of little messages/jobs/tasks that are coming in quickly, passing data between processes is a horrible idea. Between context switching and system calls, you're destroying your performance.

      You need to make larger batches.

      1) UDP/Job comes in: write it, along with its order number (maybe a 64-bit incrementing integer), to a single-writer, many-reader queue (large circular queues can be good for this). If the run time per job is fairly constant, you could instead use several single-reader/single-writer queues and just round-robin them. This would reduce potential lock contention, but at the cost that variable workloads could bias the load toward a single worker.

      1.a) You're not receiving packets fast enough to worry about threading the reads from the NIC. If you had to make this part faster, say millions of packets per second, the first thing I would find out is whether these packets are coming from multiple data sources, and whether jobs need to be processed in order relative to all sources or only to their own source. If the latter, you could have a load balancer that round-robins while staying sticky by source IP.

      2) A worker sees jobs in the queue (since this is a speed-sensitive dedicated machine, polling could work, but you may want it event-based) and grabs N jobs, where N is as many jobs as can reliably be completed in a timely fashion; this may be 1 or 100, and you won't know until you test. Note the order numbers of your jobs. You don't really need to grab N jobs if you use a single-reader/writer queue, since there is no real contention, but reading in batches is good for high-contention queues such as multi-reader ones.

      3) Your worker now loops through each job, running each script, hopefully all on the same worker/thread.

      4) Write the completed jobs out to a single-reader/single-writer queue. If you don't use a single-reader/writer queue and instead have a multi-writer queue, you may want to commit finished jobs in batches to reduce contention.

      5) Have another worker poll (or wait for events on) each worker's output queue. This worker makes sure the jobs are put back in order. I assume this process is relatively light, so a single worker can probably handle all of the worker queues, but it could also be threaded. You just need to manage the ordering somehow.
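      The reordering in step 5 can be sketched with a min-heap keyed on the order number from step 1. This is a minimal illustration, assuming every order number is unique and none is ever skipped (a missing number would stall output until it shows up); the class name Reorderer is hypothetical.

```python
import heapq
from typing import Any, List, Tuple


class Reorderer:
    """Buffers out-of-order (seq, result) pairs and releases them in sequence order.

    Assumes sequence numbers are unique and contiguous; a gap stalls output
    until the missing number arrives.
    """

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq                  # next sequence number we may emit
        self.pending: List[Tuple[int, Any]] = []   # min-heap ordered by sequence number

    def push(self, seq: int, result: Any) -> List[Any]:
        """Accept one finished job; return every result that is now in order."""
        heapq.heappush(self.pending, (seq, result))
        ready: List[Any] = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, item = heapq.heappop(self.pending)
            ready.append(item)
            self.next_seq += 1
        return ready
```

      For example, pushing job 1 before job 0 returns an empty list, and pushing job 0 afterwards returns both results in order. A heap keeps each push at O(log n) even when one slow worker holds up many finished jobs.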

      You should have no more than N workers per core, where N is probably a small number, like 2. Lots of threads are bad.

      I love single-reader/single-writer queues; they can be lock-free.
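      The reason a single-reader/single-writer queue can be lock-free is that each index has exactly one writer: the producer only advances tail, the consumer only advances head. The sketch below shows just that bookkeeping, plus the batch read from step 2. It is written in Python for readability, which means it is really leaning on the GIL; in C, C++ or Java you would make head and tail atomics, with the producer storing tail with release ordering after filling the slot and the consumer loading it with acquire ordering (and the reverse for head). The class and method names are hypothetical.

```python
from typing import Any, List


class SpscRing:
    """Bounded single-producer/single-consumer ring buffer (index logic only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.buf: List[Any] = [None] * capacity
        self.mask = capacity - 1  # cheap modulo for power-of-two sizes
        self.head = 0  # total items consumed; written only by the consumer
        self.tail = 0  # total items produced; written only by the producer

    def try_push(self, item: Any) -> bool:
        """Producer side: returns False instead of blocking when the ring is full."""
        if self.tail - self.head == len(self.buf):
            return False
        self.buf[self.tail & self.mask] = item
        self.tail += 1  # publish only after the slot has been filled
        return True

    def pop_batch(self, max_items: int) -> List[Any]:
        """Consumer side: take up to max_items at once to amortize per-read overhead."""
        n = min(self.tail - self.head, max_items)
        out = [self.buf[(self.head + i) & self.mask] for i in range(n)]
        self.head += n  # hand the slots back to the producer
        return out
```

      The producer would push (order number, job) tuples; when try_push returns False the ring is full, which is the moment to count a drop rather than silently lose data.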

      Your problem sounds close to what the Disruptor handles (Google: disruptor ring buffer) (fun read: http://mechanitis.blogspot.com...). You may also want to look into that kind of design. It's an interesting project that runs on Java and .NET, and I think C or something, but I can't remember. Still a good read.

    28. Re:complex application example by Bengie · · Score: 1

      But if done correctly, you can do line rate UDP with 0% loss. Routers can do line rate without loss all the time. He's talking about thousands of packets per second, not the millions to tens of millions a modern NIC can handle.

    29. Re:complex application example by Bengie · · Score: 1

      What kind of crappy network equipment does your job use that has packet loss at anything less than line rate? He's talking about nearly 1 Mbit/s of UDP. I can get 0% packet loss around the world at only 1 Mbit/s.

    30. Re:complex application example by Bengie · · Score: 1

      * the UDP traffic contains multiple data packets (call them "jobs") each of which requires minimal decoding and processing

      anything _above_ that rate and the UDP buffers overflow and there is no way to know if the data has been dropped. the data is *not* repeated, and there is no back-communication channel.

      How are you planning on handling UDP checksum errors without a backchannel or EC? The physical ethernet layer is lossy, so you're screwed even before the packet hits the NIC.

      Lossy?

      I just logged into my switch at home, and it has 146 days of uptime with 20,154,030,043 frames processed and 0 frame errors. I can even run iperf at 1 Gb/s in each direction, for a total of just under 2 Gb/s at once, and have 0 packets dropped.

      Let the network group worry about QoS. But yes, errors will eventually happen; they're just very rare. And when they do happen, it's probably pathological and you'll get a lot of them. But I wouldn't go so far as to say "the physical ethernet layer is lossy" as a general statement.

    31. Re:complex application example by Anonymous Coward · · Score: 0

      Are you familiar with DPAA? I'm using the Freescale P4080 for a project at work, and it sounds like a good fit for your task. Think of DPAA as a network accelerator, similar to a graphics accelerator. The Parse/Classify/Distribute engine can easily solve your #1 problem before the packet reaches the CPU by sorting your incoming traffic into separate work queues.

      I don't think a POSIX based target is what you need for this task, either. You might want to look into an RTOS such as Integrity or QNX - both thrive on small message passing between tasks.

      On my P4080 project, I ended up removing the TCP/IP stack and working on raw Ethernet frames, because I was seeing the majority of CPU time being spent on TCP/IP processing. There's way too much going on there for this activity, and honestly, you don't need it.

      What city are you in?

    32. Re:complex application example by Anonymous Coward · · Score: 0

      If you had ever done something in assembler, you would be whacking your head against an anvil.
      If it is critical, you need real hardware IRQs and a real-time kernel. If you have the time, you can go read up on QNX.

    33. Re:complex application example by Alef · · Score: 1

      Of course it's technically possible to transmit packets with essentially 0% loss, and I'm sure there are set-ups that would work under the right circumstances. That's not the point. The point is that each and every component involved, from hardware through firmware to software, is designed under the premise that it is okay to drop a packet at any time for any reason, or to duplicate or reorder packets. Even if you get it to work, the replacement of any single component, or the triggering of some corner case you haven't tested for (some hardware counter wrapping around, or whatever else you can imagine), might suddenly blow everything up. It's just an insanely fragile system, and you need complete and total control of the implementation of every involved component, not just their specifications, to ensure that your system meets your spec.

      Either switch protocol, or implement something on top of UDP that adds the reliability. There is no other sane way.
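      Even without a back channel, the receiver can at least know when data has been dropped if each datagram carries a sequence number. This is a minimal sketch of that detection half only (it recovers nothing), assuming the sender can be changed to prefix every datagram with a 64-bit big-endian counter; that framing and the names frame and GapDetector are hypothetical. Actual recovery without a back channel would need something like forward error correction on top.

```python
import struct
from typing import Optional

# Hypothetical framing: each datagram starts with a 64-bit big-endian sequence number.
SEQ_HEADER = struct.Struct("!Q")


def frame(seq: int, payload: bytes) -> bytes:
    """Sender side: prefix the payload with its sequence number."""
    return SEQ_HEADER.pack(seq) + payload


class GapDetector:
    """Receiver side: counts skipped sequence numbers and late/duplicate datagrams."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # next sequence number we expect
        self.lost = 0   # numbers skipped over (dropped, or possibly just delayed)
        self.stale = 0  # datagrams older than `expected`: duplicates or late arrivals

    def on_datagram(self, datagram: bytes) -> Optional[bytes]:
        """Return the payload for in-order or newer datagrams, None for stale ones."""
        (seq,) = SEQ_HEADER.unpack_from(datagram)
        if self.expected is None:
            self.expected = seq  # the first datagram seen defines the starting point
        if seq < self.expected:
            self.stale += 1
            return None
        self.lost += seq - self.expected
        self.expected = seq + 1
        return datagram[SEQ_HEADER.size:]
```

      Note that a reordered datagram is first counted as lost and then as stale, so on a link that reorders, lost is an upper bound on real drops.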

    34. Re:complex application example by Anonymous Coward · · Score: 0

      i would have used POSIX shared memory Queues but the implementation sucks: it is not possible to identify the shared memory blocks after they have been created so that they may be deleted. i checked the linux kernel source: there is no "directory listing" function supplied and i have no idea how you would even mount the IPC subsystem in order to list what's been created, anyway.

      ipcs (for System V IPC objects; on Linux, POSIX shared memory objects appear as files under /dev/shm)

      But don't use IPC shared memory. Use mmap.
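      A minimal sketch of the "use mmap" advice, assuming a file-backed MAP_SHARED mapping on a Unix system. Two independent mappings of the same file see each other's writes, which is exactly what two cooperating processes would rely on, and because the region is an ordinary file it can be listed and deleted by name. Putting the file under /dev/shm on Linux would keep it in RAM; the example uses a temporary file instead, and the function names are hypothetical.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096


def create_region(path: str, size: int = REGION_SIZE) -> None:
    """Create (or resize) the backing file; it can later be removed with os.unlink."""
    with open(path, "wb") as f:
        f.truncate(size)


def open_region(path: str, size: int = REGION_SIZE) -> mmap.mmap:
    """Map the file shared and read/write; writes are visible to every other mapping."""
    with open(path, "r+b") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        create_region(path)
        writer = open_region(path)  # stands in for the producer process
        reader = open_region(path)  # stands in for the consumer process
        writer[0:5] = b"hello"
        data = reader[0:5]
        writer.close()
        reader.close()
        return data
    finally:
        os.unlink(path)  # cleanup by name, which is what ipcs/ipcrm would otherwise be for
```

      In a real setup each process would call open_region on the same agreed path, and you would still need some synchronization (for example the index scheme of a single-reader/single-writer ring) inside the mapped bytes.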

    35. Re:complex application example by Anonymous Coward · · Score: 0

      so i would be very interested to hear from anyone who has had to design similar architectures, and how they dealt with it.

      You're trying to polish a turd. I wish I knew who you worked for, so I could avoid buying their products.

    36. Re:complex application example by Bengie · · Score: 1

      The point is that each and every component involved, from hardware through firmware to software, is designed under the premise that it is okay to drop a packet at any time for any reason, or to duplicate or reorder packets.

      That entire sentence is damn near a lie. Those issues can happen, but they shouldn't happen. You almost have to go out of your way to make those situations happen. Dropping a packet should NEVER happen except when going past line rate. Packets should NEVER be duplicated or reordered except in the case of a misconfigured network. Networks are FIFO, and they don't just duplicate packets for the fun of it.

      As for error rates, many high-end network devices can get bit error rates as low as 10^-18, which works out to about one error every 111 pebibytes (roughly 125 PB). I assume you'd have to divide that figure by the number of hops.
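      The arithmetic behind that figure, as a small sketch: at a bit error rate p you expect one error per 1/p bits, so divide by 8 for bytes. The per-hop part assumes every hop has the same, independent error rate, so the path error rate is roughly p times the number of hops (a fine approximation while that product is tiny); the function name is hypothetical.

```python
PIB = 2 ** 50  # bytes in a pebibyte
PB = 10 ** 15  # bytes in a decimal petabyte


def bytes_between_errors(bit_error_rate: float, hops: int = 1) -> float:
    """Expected bytes transferred per bit error across `hops` links in series."""
    if bit_error_rate <= 0 or hops < 1:
        raise ValueError("need a positive error rate and at least one hop")
    bits_per_error = 1.0 / (bit_error_rate * hops)
    return bits_per_error / 8
```

      For 10^-18 this gives about 111 PiB (125 PB) per error on a single link, and about 37 PiB over three hops.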

      I've seen enough system designs where they send data as UDP packets and require incredibly low packet-loss rates, bordering on never. It can be done, but you're not going to be using D-Link switches. You can purchase L4 switches now with multi-gigabyte buffers. They're meant to handle potentially massive throughput spikes without dropping packets.

      I assume this is all intra-datacenter traffic or at least an entirely reserved network.

    37. Re:complex application example by lkcl · · Score: 1

      hi mr thinly-sliced, thank you this is awesome advice, really really appreciated.

    38. Re:complex application example by Mr+Thinly+Sliced · · Score: 1

      You're welcome - I hope you get it sorted out.

      The only other thing I'd mention: you perhaps noticed I kept saying "threads like.." and "with regular threads"; that's because the design basically introduces a number of single points of failure. Due to the lack of a back channel or retransmission, things can go silently wrong without notice (network cable failure etc.). In an ideal world you'd double up on some of that infrastructure and networking.

      I know you need to get something up and running, but it's perhaps something to bear in mind for a later iteration.

    39. Re:complex application example by Anonymous Coward · · Score: 0

      Did you have a look at message queues, especially ZeroMQ (http://zeromq.org/)?

  28. "open systems" vs closed systems by Anonymous Coward · · Score: 0

    "Resource management and allocation for complex workloads has been a need for some time in open systems"

    Not that mainstream closed systems like microsoft corporation's so-called "windows" product and apple's "macos" system had anything like this.
    And in microsoft corporation's "windows" operating system it is even impossible to implement this, due to buggy system design and tons of existing resource management issues that would require rewriting windows from scratch. Since microsoft corporation's "windows" is just a toy for (rather stupid) children, this will never happen.

    Just a note for people misunderstanding open and closed systems.

  29. Re:This obsession with everything in RAM needs to by Tough+Love · · Score: 3, Insightful

    Garbage collector with no overhead, hmm? Easy peasy with no satanic complexity I suppose. And of course no obnoxious corner cases. Equivalently in engineering, when your bridge won't stay up you just add a sky hook. Easy.

    --
    When all you have is a hammer, every problem starts to look like a thumb.
  30. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    No need to be facetious. If computer system engineering had no obnoxious corner cases, half of the problems not associated with GC would disappear as well. It's not like you can magically solve everything anyway.

    And yes, a garbage collector with zero overhead. Who would have thought? Well, pretty much anyone in the know, I guess.

  31. Re:mainframe is old crap for geezers by Anonymous Coward · · Score: 2, Informative

    Yeah - the sky is the limit!!!
    Use your Microsoft cloud capabilities without hesitation....

    This message was brought to you by your friendly NSA.

  32. complex application example by Anonymous Coward · · Score: 0

    Switch to Go. 15 scripts to rewrite is not the end of the world. Use goroutines. Profit?

  33. Re:mainframe is old crap for geezers by Stumbles · · Score: 1

    Why yes, yes it is nothing more than a rehash of the old days where dumb terminals connected to a mainframe. Sometimes those dumb terminals were connected via terrestrial microwaves or phone lines. Now where did I put my 3270 and where did I put my modified termcap file for a vt220.

    --
    My karma is not a Chameleon.
  34. Re:mainframe is old crap for geezers by viperidaenz · · Score: 3, Informative

    On the contrary, if you have 100,000 nodes and can double the performance of each one, you've just saved 50,000 of them.

    That's a pretty big cost saving.

    The larger the installation, the more important resource management is. If you need to add more nodes, not only do you need to buy them, increase network capacity and power them, you also need to increase your cooling capacity and floor space. Your failure rate goes up too. The higher the failure rate, the more staff you need to replace things.

  35. disk, memory access and cpu usage by Mister+Liberty · · Score: 1

    Weren't they added in Linux 0.01 around 1991?

  36. Re:mainframe is old crap for geezers by K.+S.+Kyosuke · · Score: 2

    I don't dispute the possible savings and their value on a large scale, but in general, it seemed to me that these proposals (what TFA describes) covered inter-application interactions, and not intra-application performance management. That's what I had in mind. With application-dedicated nodes (in cloud systems), improving performance is still of paramount importance, but you do that with better data structures, careful application design, basically using internal domain knowledge etc., not with some sort of generic app/OS resource allocation protocol. Or did I miss something?

    --
    Ezekiel 23:20
  37. Re: This obsession with everything in RAM needs to by Anonymous Coward · · Score: 1

    So, rounded corners then.

  38. Parallel Processing Super Computers by Anonymous Coward · · Score: 0

    Parallel processing super computers are a cost effective way of managing complex resources. The new technologies mentioned in Mr. Newman's article will make these super computers all the more efficient.

    1. Re:Parallel Processing Super Computers by anon+mouse-cow-aard · · Score: 1

      Uh, no, just the opposite. Many supercomputing applications are about getting access to compute/memory/IO bandwidth with as little intermediation as possible. Job allocation methods on supercomputers typically allocate entire nodes, so the sort of fine-grained prioritization being prescribed is rather irrelevant. The whole article looks a bit anachronistic; maybe it is sensible to people from a mainframe background, where reliability/predictability trumps other requirements, but when performance is an important concern, the kind of intrusive over-monitoring described would not be wanted.

  39. Where is the market demand? by Stonefish · · Score: 1

    There is a solution that does this: it's called a mainframe. They're hideously expensive; we cooked a motherboard recently, $1.2 million. Want a 10G network card? $20,000. Now, you can buy an awful lot of commodity hardware for much less, so that you have excess resources. Need a dedicated system for a database? Buy one, and run the other applications on a shared resource; you'll still end up with spare change if you dump a mainframe contract. You can replace a mainframe with commodity items; you just need to plan for it. This kind of scheduling costs more than deploying a couple of dedicated components.
    The last time I looked, the number of cycles being performed on mainframes had been decreasing for over 25 years, i.e. there's not a great deal of market demand in this area, and most of this market is legacy systems.
    The other litmus test is to look at how many successful IT companies that have developed in the last 20 years use a mainframe. I suspect that it is zero. Do Google, Facebook, Amazon etc. use mainframes?
    Scheduling and resource control on systems is a bit like QoS: if you can buy fat pipes, just buy fat pipes. It's a better solution and it makes all of the problems go away. Introduce scheduling and you'll be employing goons from now to eternity trying to sort out which application is king, and performance will still suck.

    1. Re:Where is the market demand? by Bengie · · Score: 1

      The whole point of QoS is to not have to add more hardware, but to make better use of your current hardware while not having large amounts of jitter. Mainframes don't need to worry about interactive processes, but many modern-day workloads do. What they want is good average throughput with a bounded maximum latency.

  40. Linux ALREADY has it! by Cyberax · · Score: 2

    Really. Author is an idiot. He should actually read something that is not a documentation volume for his beloved IBM mainframe.

    Linux has cgroups support, which allows you to partition a machine into multiple hierarchical containers. Memory and CPU partitioning works well, so it's easy to give only a certain percentage of CPU, RAM and/or swap to a specific set of tasks. Direct disk IO is getting in shape.

    Lots of people are using cgroups in production on very large scales. There are still some gaps and inconsistencies around the edges (for example, buffered IO bandwidth can't be metered), but kernel developers are working on fixing them.
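    To make "give only a certain percentage of CPU and RAM" concrete, here is a sketch assuming the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup and root privileges (cgroup v1 used different files, such as cpu.cfs_quota_us). In v2, cpu.max takes "<quota_us> <period_us>" and memory.max takes a byte count; writing a PID to cgroup.procs moves that process into the group. The helper names and the group name are hypothetical.

```python
import os
from typing import Dict

# Assumption: cgroup v2 (unified hierarchy) mounted at the usual location.
CGROUP_ROOT = "/sys/fs/cgroup"


def limit_settings(cpu_percent: float, memory_bytes: int,
                   period_us: int = 100_000) -> Dict[str, str]:
    """Build cgroup v2 control-file contents for a CPU cap and a memory cap.

    cpu_percent is relative to one CPU, so 150 means one and a half CPUs.
    """
    if cpu_percent <= 0 or memory_bytes <= 0:
        raise ValueError("limits must be positive")
    quota_us = int(period_us * cpu_percent / 100)
    return {"cpu.max": f"{quota_us} {period_us}", "memory.max": str(memory_bytes)}


def apply_limits(group: str, settings: Dict[str, str], pid: int) -> None:
    """Create the group, write the limits, then move `pid` into it (needs root).

    The parent's cgroup.subtree_control must already enable "+cpu +memory",
    otherwise cpu.max and memory.max will not exist in the child group.
    """
    path = os.path.join(CGROUP_ROOT, group)
    os.makedirs(path, exist_ok=True)
    for filename, value in settings.items():
        with open(os.path.join(path, filename), "w") as f:
            f.write(value)
    with open(os.path.join(path, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

    For example, apply_limits("batch", limit_settings(50, 512 * 1024 * 1024), os.getpid()) would cap the current process at half a CPU and 512 MiB of memory. Note that cpu.max is a ceiling; proportional sharing between groups is what cpu.weight is for.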

    1. Re: Linux ALREADY has it! by Anonymous Coward · · Score: 0

      Mod the parent up. The author is lazy (didn't bother to look up the state of the art in Linux) or intentionally spewing made-up garbage for some ulterior motive.

    2. Re: Linux ALREADY has it! by Anonymous Coward · · Score: 0

      It's especially funny that when you punch "linux resource management" into Google, the very first result is a nice cgroups guide written by Red Hat.

  41. Re:This obsession with everything in RAM needs to by jones_supa · · Score: 1

    Microsoft's Virtual Address Space (Windows) page claims that it is 8 terabytes (with a special feature to allocate just a full 2 gigabyte chunk).

  42. Re:mainframe is old crap for geezers by blackjackshellac · · Score: 1

    Are you being intentionally obtuse? It would seem so, but sometimes it's hard to tell on /.

    --
    Salut,

    Jacques

  43. Re: This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    > And manual memory management on a similar scale wastes CPU time.
    No, it doesn't. Manual memory management could involve no heap allocation at all, but let's suppose you mean RAII style. GC necessarily wastes more than this, and defers that extra work to some undefined point in the future. Why? Because if you have perfectly matched creation/deletion, it uses only what it needs. If you have a pooled RAII allocator for use in a specific call tree, it can benefit from the huge economy of scale of discarding an entire pool. Unlike GC, though, RAII memory management does not need to throw away the information it has at the point of discarding a reference, only to redo the work later.

  44. _why_ can't we keep throwing hardware at it? by fygment · · Score: 1

    Moore's Law speaks to computational horsepower per unit per cost. But even if the computational abilities do not continue to increase, the costs will keep coming down.

    Hardware is cheap. It's not an elegant solution, but it's cheap. And getting cheaper.

    Focus on the UX, because without that, who cares what your kernel can do? Machines are plenty powerful enough, what you want to do is get your OS in to the hands of the most users possible .... right?

    --
    "Consensus" in science is _always_ a political construct.
    1. Re:_why_ can't we keep throwing hardware at it? by Jeremi · · Score: 1

      Hardware is cheap. It's not an elegant solution, but it's cheap. And getting cheaper.

      Right, but if your company comes up with an elegant solution that gets 10x better performance out of a given piece of hardware, and your competitors cannot (or do not) do the same, then you've got a cost advantage over your competitors and can use that to get customers to choose to buy your product rather than theirs.

      That will always be true, no matter how fast and cheap the hardware gets. Either your customers will be able to do 10 times more work with your product, or (if there isn't 10 times more work to actually do), they can get the job done with 10 times less hardware (and thus 10 times less expense).

      Focus on the UX, because without that, who cares what your kernel can do?

      There is a whole world of software out there that runs in the background and doesn't require much (if any) UX. Think of the software that generates your credit card statement every month.

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    2. Re:_why_ can't we keep throwing hardware at it? by Anonymous Coward · · Score: 0

      Microsoft software in the 90s was very fast, and that gave them decades of advantage. But at some point Moore's law will not care about slow hardware.

  45. Re:This obsession with everything in RAM needs to by Lisias · · Score: 2

    And yes, a garbage collector with zero overhead. Who would have thought? Well, pretty much anyone in the know, I guess.

    MARK / RELEASE from the Pascal days used to work pretty well; this is the lowest-overhead "garbage collector" possible.

    It's impossible to have a Garbage Collector without some kind of overhead - all you can do is try to move the overhead to a place where it's not noticed.

    There's no such thing as a free lunch.

    --
    Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org
  46. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    That page has a comment on the bottom indicating that the information is out-of-date, with a pointer to more recent information: "Windows 8.1 and Windows Server 2012 R2: 128 TB"

    dom

  47. tuned by bill_mcgonigle · · Score: 1

    I don't have hard data yet, but I'm finding that EL7 is much much faster than EL6 on the same hardware for the workloads I've tried so far.

    I don't know that tuned is most responsible, but I can see that it's running and that's what it's supposed to do.

    I realize that the kernel is better and perhaps XFS helps, but those alone seem insufficient to explain the difference.

    Anyway, it's somewhat along the direction people are talking about, even if only minimally.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  48. Re:This obsession with everything in RAM needs to by jones_supa · · Score: 1

    Ah yes, I missed that. Apparently my microsecond-long attention span wasn't good enough to read the full page properly.

  49. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    And I promise to pull out in time, honey.

    Just lay back and think of England.....

  50. Re:mainframe is old crap for geezers by Anonymous Coward · · Score: 0

    And when those clouds swell and rise to the heights too fast, there will be a corporate hail storm and thunder. Possibly even a business tornado.

  51. Re:This obsession with everything in RAM needs to by IamTheRealMike · · Score: 2

    Not sure what you're getting at, but the Azul collector is well known for pulling off apparently magical GC performance. They do it with a lot of very clever computer science that involves, amongst other things, modifications to the kernel. I believe they also used to use custom chips with extended instruction sets designed to interop well with their custom JVM. Not sure if they still do that. The result is that they can do things like GC a 20 gigabyte heap in a handful of milliseconds. GC doesn't have to suck.

  52. I'll volunteer... by fahrbot-bot · · Score: 1

    ... but no one has ever followed through on making open systems look and behave like an IBM mainframe, ...

    But I'll need a punch-card station and reader, build out my server room with a glass service window, hire a disinterested, snarky guy to retrieve printouts ... Or have IBM mainframes changed since my college days back in the late '80s?

    --
    It must have been something you assimilated. . . .
  53. You mean like wpars? by Anonymous Coward · · Score: 0

    back to the future.

  54. Re:Linux Cgroups are a good subset of this by davecb · · Score: 3, Informative

    The only thing mainframes have that Unix/Linux Resource Managers lack is "goal mode". I can't set a TPS target and have resources automatically allocated to stay at or above the target. I *can* create minimum guarantees for CPU, memory and I/O bandwidth on Linux, BSD and the Unixes. I just have to manage the performance myself, by changing the minimums.
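    A rough sketch of what a home-grown "goal mode" loop could look like on top of Linux: measure TPS, then nudge the CPU allocation up when under target and down when over. This assumes you can measure TPS and that each result is written into whatever CPU control your resource manager exposes; the proportional gain, floor and ceiling are made-up tuning values, the function name is hypothetical, and real mainframe workload managers are far more sophisticated than a single proportional step.

```python
def next_cpu_percent(current_percent: float, measured_tps: float, target_tps: float,
                     gain: float = 0.5, floor: float = 5.0,
                     ceiling: float = 100.0) -> float:
    """One step of a proportional controller steering CPU allocation toward a TPS goal.

    Under target -> allocation grows; over target -> it shrinks; always clamped
    to [floor, ceiling]. gain, floor and ceiling are illustrative values only.
    """
    if target_tps <= 0:
        raise ValueError("target_tps must be positive")
    shortfall = (target_tps - measured_tps) / target_tps  # > 0 when under target
    proposed = current_percent * (1.0 + gain * shortfall)
    return max(floor, min(ceiling, proposed))
```

    Called once per measurement interval, a workload at 800 TPS against a 1000 TPS goal with 40% of a CPU would be bumped to 44%. A pure proportional step like this can oscillate or settle short of the target, which is part of why goal mode is hard to get right.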

    --
    davecb@spamcop.net
  55. This is a job for QNX by Animats · · Score: 1

    Consider trying QNX, the message-passing real time OS, for this. This is a message passing problem, and Linux doesn't do message passing well. QNX has a scheduler optimized for message passing. You should be able to handle the UDP front end and fan-out without any problems. You can give the front-end process a higher priority than the other processes, which should let you get all the UDP packets into the fan-out program without losing any. That's what real-time OSs are for.

    Trying to do anything high-performance with CPython's threads is hopeless. Watch this presentation on performance issues with Python's Global Interpreter Lock. Python has an internal scheduler, and it behaves very badly under load.

    So each Python process should be single-thread. Have as many as you need, set up to get work via MsgReceive and reply by MsgReply. Don't set them up as "resource managers".

    Python under QNX is being used by the robotics community, where real-time matters for some things, but not others.

    QNX - great technology, marketing operation from hell.

  56. Re:mainframe is old crap for geezers by viperidaenz · · Score: 1

    or you do both.

  57. No, it was a different problem, IBM, DEC Vax and N by Anonymous Coward · · Score: 0

    All had hugely complex, sophisticated and mind-bendingly expensive hardware with complex built-in diagnostics ... 4/5 of the machine could die and the system would continue.

    There are now much cheaper ways of doing this, no 1965 360/30 are not calling

  58. Re:This obsession with everything in RAM needs to by Wootery · · Score: 2

    I believe they also used to use custom chips with extended instruction sets designed to interop well with their custom JVM. Not sure if they still do that.

    I could've sworn I'd read that they'd stopped with their hardware work, but I think I was wrong: Appendix A of this page gives the impression (though I can't see it explicitly stated) that they're still doing custom hardware, but their software will work on ordinary Intel/AMD chips as well.

    GC doesn't have to suck.

    Indeed. It's Sturgeon's Law, but I think the '90%' part might be too low in this case. Major interpreters/'VMs' - even the ones with optimised native-code compilation - have awful GCs. Up until quite recently, Mono was using the Boehm GC. The GCs in OCaml and D show no signs of improving any time soon.

  59. Re:This obsession with everything in RAM needs to by Bengie · · Score: 1

    Earlier 64-bit AMD CPUs did not have a 128-bit (16-byte) atomic compare-and-swap instruction (CMPXCHG16B), so Microsoft limited their OSes at those times to 8TB. If only Microsoft supported compiling for your arch. Stupid closed source OS.

  60. Re:mainframe is old crap for geezers by Bengie · · Score: 1

    If you got a 2x increase in single threaded performance on a 100k node cluster, you could probably get rid of quite a bit more than 50k nodes because of scaling issues.

  61. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    If you're going to MARK/RELEASE why not malloc/free? Same goes for languages like Java - if you have to null a reference for it to get collected, how is that different from free() or delete? It's still a line of code you have to remember to put in your program at the right place.

  62. Unnecessary micromanagement. by Teunis · · Score: 1

    I think this person is still mad that Linux hasn't reported accurate memory usage ever since COW pages were introduced, let alone the multiple efficiency steps since then.

    Not going to say that task management over a greater picture's a bad idea, but it has to be made more coarse (per server, approximations) rather than fine if one is still to make effective use of Linux's many performance improvements over IBM mainframe approaches. Mind, I've built a couple of systems like that for proprietary infrastructure.

  63. Please no by Tablizer · · Score: 1

    Don't include JCL, for heaven's sake.

  64. control costs performance. by Anonymous Coward · · Score: 0

    The problem is that the author is advocating adding a significant amount of logic (and cache footprint) to hot paths. This might have made a lot of sense if we were seeing a trend towards bigger single-image boxes (ie, mainframes). But the fact is that everything points to proliferation of microservices, sharded-distributed implementations, even just bigger boxes divided into VMs. Yes, big boxes still exist, but they're relatively special-purpose, and certainly shouldn't be dictating the direction of kernel/OS development.

  65. Re:This obsession with everything in RAM needs to by Anonymous Coward · · Score: 0

    If only Microsoft supported compiling for your arch.

    ...then what? You're using old Microsoft systems in areas where you need more than 8TB per process?

  66. Re:mainframe is old crap for geezers by viperidaenz · · Score: 1

    a) when was the last time you saw a single threaded node?
    b) it was obviously an illustrative example. don't be a dick.

  67. Re: This obsession with everything in RAM needs to by JMJimmy · · Score: 1

    wow, who knew boobs could be so controversial

      Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Insightful (+1).

    It is currently scored Normal (2).

    Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Informative (+1).

    It is currently scored Insightful (3).

      Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Interesting (+1).

    It is currently scored Insightful (4).

    Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Overrated (-1).

    It is currently scored Insightful (3).

    Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Funny (+1).

    It is currently scored Insightful (4).

    Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Overrated (-1).

    It is currently scored Insightful (3).

    Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Offtopic (-1).

    It is currently scored Insightful (2).

      Re: This obsession with everything in RAM needs to, posted to Linux Needs Resource Management For Complex Workloads, has been moderated Funny (+1).

    It is currently scored Funny (3).

  68. I thought this had been solved quite a while back by mikein08 · · Score: 1

    in VMS (you know, that semi-mainframe OS invented by DEC and now owned by HP).

  69. DevOps understanding of resource mgmt. is lagging by Anonymous Coward · · Score: 0

    While Linux has many ways to manage resources, like cgroups, understanding of how to use resource management lags among developers and operations people. Understanding how an application or VM impacts the performance of the host, and how one application or VM interacts with the performance of another, is a very complex subject.

    A lot of good work is being done in this area, but general understanding in the industry is lagging. Most of the time the solution is to throw more hardware or upgraded hardware at the problem. More memory, more cores, upgrade to SSDs, more NICs.

    To add my 2 cents, I think resource management has three layers in a virtualized cluster environment:
    Cluster management: Deciding which hosts have the resources to execute the job.
    Host management: Managing the resources on the actual hardware host
    Instance management: Instrumenting VM or container based applications to accurately forecast and report their resource requirements.
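    The cluster management layer can be sketched as a placement function: given a job's declared CPU and memory request, pick a host that still has room and reserve it there. This is a minimal first-fit sketch under assumed, hypothetical types; real cluster schedulers use better bin-packing heuristics, and the whole thing is only as good as the requests from the instance management layer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpus: float
    free_mem_gb: float


@dataclass
class Job:
    name: str
    cpus: float    # requested, not measured: over-asking wastes capacity here
    mem_gb: float


def place_first_fit(job: Job, hosts: List[Host]) -> Optional[str]:
    """Reserve resources on the first host that fits; return its name, or None."""
    for host in hosts:
        if host.free_cpus >= job.cpus and host.free_mem_gb >= job.mem_gb:
            host.free_cpus -= job.cpus
            host.free_mem_gb -= job.mem_gb
            return host.name
    return None
```

    A job that asks for its worst-case peak reserves that peak on the host even when it usually needs far less, which is the over-provisioning problem described next.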

    The last one is the hard one. Nobody wants to run short of resources so they always ask for what they think are their maximum needs. Understanding the actual resource requirements of an application is very difficult. Writing an application to work within those resource allocations is also very difficult. Coordinating all of this on a cluster wide basis is even harder.

    rlh100

    Sorry for posting this as "Anonymous Coward" but the new Slashdot always drops my login information when I go to a specific article.

  70. Re:This obsession with everything in RAM needs to by Lisias · · Score: 1

    If you're going to MARK/RELEASE why not malloc/free? Same goes for languages like Java - if you have to null a reference for it to get collected, how is that different from free() or delete? It's still a line of code you have to remember to put in your program at the right place.

    For two reasons:

    1) It's easier to MARK the heap at the beginning of the task, use it as if there's no tomorrow, and then just RELEASE everything at once at the end (nothing prevents you from deleting some pointers during the job to save memory).

    2) You avoid heap fragmentation, easing the memory manager's life.
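    A sketch of the MARK/RELEASE idea as a bump ("arena") allocator over one buffer, assuming fixed-size byte allocations; the class name is hypothetical. Allocation just advances an offset, MARK saves the offset, and RELEASE rewinds to it, freeing everything allocated since in O(1) with no fragmentation. As in Pascal, anything allocated after the mark must not be used after the release; in this Python version the old views still point into the buffer and would simply see reused memory.

```python
class MarkReleaseArena:
    """Bump allocator over a single buffer, with Pascal-style MARK/RELEASE."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.top = 0  # offset of the first free byte

    def alloc(self, nbytes: int) -> memoryview:
        """Hand out the next nbytes of the buffer; never frees individually."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        if self.top + nbytes > len(self.buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self.buf)[self.top:self.top + nbytes]
        self.top += nbytes
        return view

    def mark(self) -> int:
        """Remember the current top, like Pascal's MARK."""
        return self.top

    def release(self, mark: int) -> None:
        """Free everything allocated since `mark` in one step, like RELEASE."""
        if not 0 <= mark <= self.top:
            raise ValueError("mark is not inside the allocated region")
        self.top = mark
```

    Typical use is one mark at the start of a task, many allocs during it, and a single release at the end, which is the pattern described in point 1.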

    Anyway, it appears to me that you missed the point. I was criticizing the pretense of a "no overhead garbage collector" from Azul.

    --
    Lisias@Earth.SolarSystem.OrionArm.MilkyWay.Local.Virgo.Universe.org