mughi · Slashdot Mirror


User: mughi

mughi's activity in the archive.

Stories: 0
Comments: 178
First seen: 1999-10-14
Last seen: 2006-06-24

Comments · 178

Re:Use UTF-8 on Li18nux Effort Announced · 1999-10-15 07:50 · Score: 1

Not trolling, but discussing. That's always good. Plus you seem well informed, which is always nice.

Basically, if you go wchar_t or 16-bit (not always the same), the algorithms are simpler and the code itself is smaller. The only things that should be any larger are static strings and memory use. Then again, you shouldn't have a lot of fixed strings in your program to begin with. :-)

If you went to purely UTF-8 internally instead, you'd end up with the problem of needing more complex handling code, and thus a larger and slower program. Also, if you use wchar_t then you can easily go to and from whatever the local encoding is. With UTF-8 internally you'd have to either write code for all that yourself, or do a conversion from local encoding to wchar_t and then from that to UTF-8.

strlen() is quite often used to count characters. When counting characters really is the intent, wcslen() is used instead. It is just as fast, or maybe even faster, since on Linux wchar_t is int, which by the C language specs is the type most efficient for the processor.
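To illustrate the strlen()/wcslen() point, here is a minimal sketch. The helper names byte_length and unit_length and the sample text are made up for illustration; the UTF-8 bytes are spelled out by hand so the result does not depend on the locale or the source file encoding.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* The same three-character Japanese word, once as UTF-8 bytes and once
   as a wide string. Both arrays are illustrative sample data. */
static const char utf8_text[] = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";
static const wchar_t wide_text[] = L"\u65e5\u672c\u8a9e";

/* Hypothetical helper: strlen() counts bytes, not characters. */
size_t byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: wcslen() counts wchar_t units, which matches the
   character count here because each character fits in one wchar_t. */
size_t unit_length(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws);
}
```

With this sample, byte_length(utf8_text) gives 9 while unit_length(wide_text) gives 3. Combining characters would still make the wcslen() result differ from what a user sees as one character.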

Your caution about combining characters is a good point. There still are issues, but in practice your code should encounter those fairly rarely, and thus should still execute quickly. But still, this needs to be accounted for. (And often just converting those to the pre-composed form is a viable option.)

From working with stuff over the last 7 years including Chinese, Japanese and Korean, I can say in the stuff I've done and seen, Unicode makes it a whole lot easier. No, it doesn't solve everything, but it does make a lot of things much easier.

As always, the proper thing to do is carefully analyze your needs and then implement what makes sense for the specific application. I just recommend trying to stick with more straightforward code and only 'optimize' when needed as determined by actual performance measurements.

Re:i18n == international?! Please! on Li18nux Effort Announced · 1999-10-15 03:34 · Score: 1

In other words, I don't think i18n or L10N are well-established, but that's just me.

Well, for those doing any work at all in the field, those terms are very well established. Since the project is initially to coordinate all those working in the field to be unified on Linux, it seems like a very good choice of names. See their charter for why I have that impression.

For the Java stuff, try the Java documentation itself: http://java.sun.com/products/jdk/1.1/docs/guide/intl/index.html

Re:Use UTF-8 on Li18nux Effort Announced · 1999-10-14 23:50 · Score: 1

Step 1: good

Steps 2-4: Oh my. Big problems there. Major over-simplification.

Step 5: Replace compile-time static sizeof() with run-time dynamic GetLengthOfNextUtf8CharInThisString() and get faster and smaller??? Easier to debug??? Replace pString++ and pString-- with pString = GetToNextUtf8CharInThisString( pString ) and pString = GetToPreviousUtf8String( pString ) and get smaller and faster programs that are easier to debug???
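For a rough idea of what that run-time stepping involves, here is a sketch of the kind of helpers a UTF-8-internal program would need. The names utf8_next, utf8_prev and utf8_count are hypothetical, and the code assumes well-formed UTF-8 with no validation at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: step forward over one UTF-8 encoded character.
   Assumes p points at a lead byte of well-formed UTF-8, not at the
   terminating NUL. Continuation bytes have the bit pattern 10xxxxxx. */
const char *utf8_next(const char *p)
{
    assert(p != NULL && *p != '\0');
    p++;
    while (((unsigned char)*p & 0xC0) == 0x80)
        p++;
    return p;
}

/* Hypothetical helper: step backward to the lead byte of the previous
   character, never moving before the start of the buffer. */
const char *utf8_prev(const char *start, const char *p)
{
    assert(start != NULL && p != NULL && p > start);
    p--;
    while (p > start && ((unsigned char)*p & 0xC0) == 0x80)
        p--;
    return p;
}

/* Counting characters becomes a loop over the whole string. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {
        s = utf8_next(s);
        n++;
    }
    return n;
}
```

Compare that with a wchar_t buffer, where moving one character in either direction is just ws++ or ws--. A production version would also have to reject malformed sequences, which adds more code still.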

Just do proper translation to and from Unicode (UTF-16) on entry and exit from your main program and then you can suddenly work in multiple locales via mbsrtowcs() and the like.
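As a sketch of that entry-point conversion, something like the following could work. The function name to_wide is hypothetical, and the code assumes the program has already called setlocale(LC_ALL, "") so that mbsrtowcs() decodes the user's local encoding. Note that what wchar_t actually holds is implementation-defined, so it is not necessarily UTF-16.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string in the current locale's
   encoding into a newly allocated wide string. Returns NULL if the input
   is not valid in that encoding or if allocation fails. The caller frees
   the result with free(). */
wchar_t *to_wide(const char *mb)
{
    mbstate_t state;
    const char *src = mb;
    size_t len;
    wchar_t *out;

    assert(mb != NULL);

    /* First pass: a NULL destination only counts the wide characters. */
    memset(&state, 0, sizeof state);
    len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1)
        return NULL;

    out = malloc((len + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    /* Second pass: convert for real, including the terminator. */
    src = mb;
    memset(&state, 0, sizeof state);
    if (mbsrtowcs(out, &src, len + 1, &state) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

The reverse direction on exit would use wcsrtombs() in the same two-pass way. Everything between entry and exit then works on plain wchar_t arrays.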

Yes, use UTF-8, but use it wisely.