Physical separation of modules is a very good idea. It helps to contain damage when one part fails, makes the app easier to upgrade piecewise, and forces you to think hard about interfaces.
Your first attempts at ultra-reliability will fail. But if you encapsulate well, with clean interfaces, you can make the individual modules ever more reliable over time.
Peers of a failing module should detect the failure without collapsing, of course. But consider centralizing the start/stop/restart of all modules in a process manager (PM). Peers that detect a failure report it to the PM, but do not take action themselves.
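To make the process manager idea concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not a reference to any existing tool: the `ProcessManager` class, its method names, and the module name "worker" are hypothetical. It only shows the division of labor, where peers call `report_failure` and the manager alone decides to stop and restart.

```python
import subprocess
import sys


class ProcessManager:
    """Owns start/stop/restart for every module; peers only report failures."""

    def __init__(self):
        self._commands = {}  # module name -> argv list used to launch it
        self._procs = {}     # module name -> running subprocess.Popen

    def register(self, name, argv):
        self._commands[name] = list(argv)

    def start(self, name):
        self._procs[name] = subprocess.Popen(self._commands[name])
        return self._procs[name].pid

    def stop(self, name, timeout=5.0):
        proc = self._procs.pop(name, None)
        if proc is None:
            return
        if proc.poll() is None:          # still alive: ask politely first
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()              # then insist
                proc.wait()

    def is_running(self, name):
        proc = self._procs.get(name)
        return proc is not None and proc.poll() is None

    def report_failure(self, name, reporter):
        """Called by a peer that noticed `name` misbehaving.

        The peer does nothing else; the restart policy lives here.
        """
        print(f"{reporter} reported failure of {name}; restarting")
        self.stop(name)
        return self.start(name)
```

A real PM would also want restart limits and logging, and peers would reach it over whatever inter-process interface you choose. The point of the sketch is only that the restart decision lives in one place.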
I think you have an implicit assumption in "Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries". It's the word "the" - why only one language? Ask yourself whether only certain parts of your app are subject to the constraints you cite. Maybe some parts are better suited to a scripting language. I don't mean to preach language, but I like Python so I'll use that as an example: it can interact with C++ through network protocols between separate processes, or within the same process through available APIs; it's portable across operating systems, unless you intentionally use OS-specific libraries; and you could code some parts much faster, leaving you more time to think hard about your interfaces.
Use message queues only if you need the asynchronous behavior. If synchronous request/reply is enough, skip the added subsystem.
For any inter-process interfaces where efficiency is not a dominant concern, consider text protocols. Your human intelligence is good at detecting errors in text, so this makes the interactions between modules more transparent. It's also handy to test an interface by typing at it.
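As an illustration of a text protocol, here is a small line-based sketch in Python. The verbs (`PING`, `GET`, `SET`), the `OK`/`ERR` reply format, and port 9000 are all made up for this example. The parsing lives in a plain function so it can be tested without a socket; the `serve` function wraps it with the standard `socketserver` module.

```python
import socketserver


def handle_line(store, line):
    """Turn one request line into one reply line (without the newline)."""
    parts = line.strip().split(" ", 2)
    verb = parts[0].upper()
    if verb == "PING" and len(parts) == 1:
        return "OK pong"
    if verb == "GET" and len(parts) == 2:
        key = parts[1]
        return f"OK {store[key]}" if key in store else "ERR no such key"
    if verb == "SET" and len(parts) == 3:
        store[parts[1]] = parts[2]   # the value may itself contain spaces
        return "OK"
    return "ERR expected: PING | GET <key> | SET <key> <value>"


class LineHandler(socketserver.StreamRequestHandler):
    store = {}  # shared by all connections; good enough for a sketch

    def handle(self):
        # One request per line, one reply per line, all human-readable.
        for raw in self.rfile:
            reply = handle_line(self.store, raw.decode("utf-8", "replace"))
            self.wfile.write((reply + "\n").encode("utf-8"))


def serve(host="127.0.0.1", port=9000):
    """Run the server until interrupted (not called automatically)."""
    with socketserver.TCPServer((host, port), LineHandler) as server:
        server.serve_forever()
```

With `serve()` running, you can connect with telnet or netcat and type `SET color blue` followed by `GET color` to exercise the interface by hand.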
If you go with remote calls between modules, consider whether they need to be object-oriented. Old-fashioned Sun RPCs still work fine, and they're simpler. Object-oriented design is great within a process; but stateless protocols are often best between processes.
Treat killing a module as a primary use case. It's important for isolating failures, of course, and also for partial upgrades to a running system.
Finally, have a single point of truth for everything the system must know. It's OK to distribute copies of data when you must, but be clear on what module is authoritative for every piece of data.
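One way to keep copies honest about who is authoritative is to version every value at its owner. The sketch below is only an illustration under my own assumptions: the `Authority` and `Replica` classes are hypothetical, and in a real system the replica would talk to the authority over one of your inter-process interfaces rather than by direct method call.

```python
class Authority:
    """The single module allowed to change this data."""

    def __init__(self):
        self._data = {}     # key -> (value, version)
        self._version = 0

    def write(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)
        return self._version

    def read(self, key):
        return self._data[key]  # (value, version)


class Replica:
    """Holds copies for convenience; never the final word."""

    def __init__(self, authority):
        self._authority = authority
        self._cache = {}    # key -> (value, version) as last fetched

    def refresh(self, key):
        self._cache[key] = self._authority.read(key)

    def get(self, key):
        # May return a stale value; callers that care should check is_stale.
        if key not in self._cache:
            self.refresh(key)
        return self._cache[key][0]

    def is_stale(self, key):
        return self._cache[key][1] != self._authority.read(key)[1]
```

For example, after `authority.write("limit", 10)` a replica's `get("limit")` returns 10; if the authority then writes 20, the replica keeps returning 10 until `refresh("limit")` is called, and `is_stale("limit")` says so in the meantime.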