I wonder whether you could do this using a blind source separation algorithm (independent component analysis, for instance). The statistics of the background noise should be quite different from those of the speaker's voice. Once phones are powerful enough to do some serious computing, the receiving phone could separate the incoming signal into its different sources. The user at the other end could then scan through those sources to find the person they're talking to, rather than the train noise. It sounds easy, but I imagine the technical difficulties would be pretty serious.
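
To make the idea a bit more concrete, here is a rough sketch of a two-channel ICA in plain Python. It is a toy under several assumptions of mine, not something a phone could run as-is: it assumes two microphones (standard ICA needs at least as many mixtures as sources), instantaneous mixing with no echoes or delays, and synthetic stand-ins for the signals (Laplace-distributed samples for "speech", uniform samples for "noise"). The mixing weights and all function names are made up for illustration. It whitens the two channels, then searches for the rotation whose outputs are least Gaussian, measured by excess kurtosis.

```python
import math
import random


def center(x):
    """Subtract the mean from a list of samples."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def rotate(x1, x2, angle):
    """Rotate the pair of channels by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * u + s * v for u, v in zip(x1, x2)],
            [-s * u + c * v for u, v in zip(x1, x2)])


def whiten(x1, x2):
    """Decorrelate two channels and scale each to unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    d = sum(v * v for v in x2) / n
    # Angle of the principal axes of the 2x2 covariance [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2 * b, a - d)
    p1, p2 = rotate(x1, x2, theta)
    s1 = math.sqrt(sum(v * v for v in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / s1 for v in p1], [v / s2 for v in p2]


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(v ** 4 for v in y) / len(y) - 3.0


def separate(x1, x2, steps=90):
    """Grid-search the rotation that makes both outputs least Gaussian."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        angle = (math.pi / 2) * k / steps
        y1, y2 = rotate(z1, z2, angle)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def correlation(a, b):
    """Pearson correlation between two equal-length signals."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))


def demo(seed=0, n=4000):
    """Mix two synthetic sources, unmix them, and report how well each is recovered."""
    rng = random.Random(seed)
    # Stand-ins only: heavy-tailed samples for speech, flat samples for noise.
    speech = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(n)]
    noise = [rng.uniform(-1, 1) for _ in range(n)]
    # Hypothetical mixing weights for the two microphones.
    mic1 = [1.0 * s + 0.6 * t for s, t in zip(speech, noise)]
    mic2 = [0.4 * s + 1.0 * t for s, t in zip(speech, noise)]
    y1, y2 = separate(mic1, mic2)
    # ICA recovers sources only up to order and sign, so take the best match.
    speech_match = max(abs(correlation(speech, y1)), abs(correlation(speech, y2)))
    noise_match = max(abs(correlation(noise, y1)), abs(correlation(noise, y2)))
    return speech_match, noise_match


if __name__ == "__main__":
    print(demo())
```

Note that the outputs come back in arbitrary order and with arbitrary sign and scale, which is why the listener would still have to "scan through" them to find the voice. Real recordings involve echoes and delays (convolutive mixing), which is where I'd expect most of the serious difficulty to be.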