NetHack Development Team Polls Community For Advice On Unicode

Posted by timothy on Sunday January 11, 2015 @04:10AM from the pressing-issues dept.

An anonymous reader writes: After years of relative silence, the development team behind the classic roguelike game NetHack has posted a question: going forward, what internal representation should the NetHack core use for Unicode characters? UTF8? UTF32? Something else? (See also: NH4 blog, reddit. Also, yes, I have verified that the question authentically comes from the NetHack dev team.)

The answer is... by Anonymous Coward · 2015-01-11 04:13 · Score: 5, Insightful

utf-8

UTF-8 by Ark42 · 2015-01-11 04:39 · Score: 4, Insightful

The answer is UTF-8. It's pretty much the de facto character encoding now. It has backwards compatibility with ASCII, and can easily be extended in the future to support possible U+200000 - U+7FFFFFFF codepoints, as the original UTF-8 specification used to include those anyway.
An important point is to not mess things up and end up with CESU-8 like MySQL did. There are completely valid 4-byte UTF-8 characters, so don't artificially cap UTF-8 at a max of 3 bytes per character and then treat the 4-byte form as some special alternate UTF-8.
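To make the 4-byte point concrete, here is a minimal sketch of a UTF-8 encoder in C. The function name `utf8_encode` is hypothetical and is not taken from the NetHack source; it assumes the current Unicode range (U+0000 to U+10FFFF, surrogates excluded) rather than the older 31-bit range mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an illustration, not NetHack code): encode one
 * Unicode code point as UTF-8. Writes 1 to 4 bytes into out and returns
 * the byte count, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* ASCII is encoded as itself: this is the backwards compatibility. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* Code points beyond the BMP need the full 4-byte form. */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, U+0041 comes out as the single byte 0x41, while U+1F600 comes out as the four bytes F0 9F 98 80. A storage layer that stops at 3 bytes per character cannot hold the latter.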

--
Morphing Software

Go with the majority by namgge · 2015-01-11 04:54 · Score: 4, Insightful

In my experience, if you are upgrading legacy code that assumed straightforward ASCII then utf8 is the way to go. It was invented for the purpose by someone very smart (Ken Thompson). If there were a 'Neatest Hacks of All Time' competition utf8 would be my nomination.
The only real issues I've encountered are the usual ones of comparisons between equivalent characters and defining collating order. These stop being a problem (or more precisely 'your' problem) once you abandon the idea of rolling your own and use a decent utf8 string library.
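As a rough illustration of the equivalent-characters issue, the sketch below stores "é" in two canonically equivalent forms and compares them byte by byte. The names `E_ACUTE_PRECOMPOSED`, `E_ACUTE_DECOMPOSED`, and `bytes_equal` are made up for this example, and the comparison deliberately does no normalization.

```c
#include <string.h>

/* U+00E9 LATIN SMALL LETTER E WITH ACUTE, as UTF-8 bytes. */
const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* "e" followed by U+0301 COMBINING ACUTE ACCENT, as UTF-8 bytes. */
const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/* Hypothetical helper: naive byte-wise equality. It knows nothing about
 * Unicode normalization, so equivalent spellings compare as different. */
int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Both strings render as the same character, yet `bytes_equal` reports them as different because one is 2 bytes long and the other 3. A string library that normalizes both sides (for example to NFC) before comparing is what takes this problem off your hands.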