I regularly use a Tilera chip which has 64 cores on a single die, each running Linux, with a common (distributed) L2 cache.
For most tasks I don't see any particular scalability issues, but that is because my tasks spend a lot of their time in user-space code.
I think the paper concentrates on tasks that are dominated by kernel code, and for those cases the authors seem to have made a useful contribution.
Previous versions of the chip have had 36 and 64 cores, arranged in square grids (6x6 and 8x8).
The next power of 2 that is also a perfect square would be 256 cores (16x16), but that is probably getting a bit big.
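For anyone who wants to check the arithmetic, here is a quick sketch. The function name `next_square_power_of_two` is just something I made up for illustration; it walks up the powers of 2 and stops at the first one past the given core count that is also a perfect square.

```python
import math


def next_square_power_of_two(after: int) -> int:
    """Smallest power of 2 greater than `after` that is also a perfect square."""
    n = 1
    while True:
        # A value is a perfect square if its integer square root squares back to it.
        if n > after and math.isqrt(n) ** 2 == n:
            return n
        n *= 2


cores = next_square_power_of_two(64)
side = math.isqrt(cores)
print(cores, side)  # 256 16
```

128 is skipped because it is not a perfect square, so a square grid has to jump straight from 8x8 to 16x16.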