When I started out programming I went to the example programs, and ever since then when I want to learn something new I find example programs. Have your programmers write example programs that demonstrate what they expect their code to do. This can also prove useful for unit testing.
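A minimal sketch of an example program that doubles as a unit test, in Python for brevity. The `clamp` routine and `run_examples` helper are hypothetical names invented for illustration; the only real API used is the standard `doctest` module, which replays the examples written in the docstring.

```python
import doctest


def clamp(value, low, high):
    """Limit value to the range [low, high] (hypothetical example routine).

    The examples below are the "example program": they show what the
    author expects the code to do, and doctest can replay them as tests.

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


def run_examples():
    """Replay the docstring examples and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(clamp.__doc__, {"clamp": clamp}, "clamp", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted
```

Calling `run_examples()` reports how many of the documented examples failed, so the same text serves as both documentation and a regression check.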
Yeah, I always love it when a person comes to a public forum, being purposely vague for whatever reason and asks a question that really depends on the unmentioned.
For example, if your embedded device runs embedded Windows, I don't really see the problem. On the other hand, a Windows GUI app really can't be ported to the vast majority of embedded devices out there.
Speaking of embedded Windows, the subsystem is going to affect whatever it is you write.
Given that you are talking about a "next set of managers", I figure it isn't really a flying leap to consider the possibility that you don't really have a specific embedded device in mind, or perhaps your people have looked at a WinCE device and said "Hey these things pretty much come with 16MB standard nowadays, wouldn't it be cool if we could get our application onto one of these babies!"
The fact is that your app isn't going to be occupying that device alone, so look at how the rest of the programming treats the device.
Clusters and sectors are not details of the file system, and I don't see how they could be read into what was meant when "details of the file system" was first said in the article.
What's with this "enjoy your broken functionality" stuff. It's not my functionality, it's the functionality the article asked for that supposedly distinguished a file system from a type system.
Besides I've found that all the stupid "Type managers" that were supposedly unique from file managers were just as broken.
The problem is that the article makes a distinction between a file manager and a type manager but doesn't actually draw a line between the two.
Type Managers need to remove the user from the details of the file system. They need to present the files in a hierarchy that best suits the files
"A hierarchy that best suits the files"... sounds like a file system to me. What "details of the file system" is the user removed from again?
Type Managers must provide the means for viewing and editing that file meta information
File managers do the same thing. Right click on a music file, click properties and then click Summary. The functionality varies a bit and is broken, but it's there.
You are the first non Anonymous Coward to post this, so you were the first I saw as I hide all AC's.
Now back to my point. You forgot to mention that he's clearly the antichrist as the antichrist is the only one levelheaded enough to be saying what he is saying.
In 32-bit mode you can still access the upper and lower halves of the lower 16 bits as [abcd]h and [abcd]l. There is also an instruction to swap the lower 16 bits with the upper 16 bits: a rotate by 16 (ROL or ROR) does it.
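To make the register layout concrete, here is a rough sketch that models a 32-bit register as a plain integer. The function names are made up for illustration, and treating the swap as a rotate by 16 is an assumption about which instruction is meant.

```python
MASK32 = 0xFFFFFFFF


def low_word(reg32):
    """Lower 16 bits of the register, e.g. AX within EAX."""
    return reg32 & 0xFFFF


def high_byte(reg32):
    """Upper half of the lower 16 bits, e.g. AH."""
    return (reg32 >> 8) & 0xFF


def low_byte(reg32):
    """Lower half of the lower 16 bits, e.g. AL."""
    return reg32 & 0xFF


def swap_halves(reg32):
    """Exchange the upper and lower 16 bits, as a rotate by 16 would."""
    return ((reg32 << 16) | (reg32 >> 16)) & MASK32
```

With the register holding 0x12345678, the low word is 0x5678, the two byte halves are 0x56 and 0x78, and swapping the halves gives 0x56781234, which puts the old upper 16 bits where the byte-addressable names can reach them.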
Have you ever heard the saying, "premature optimization is the root of all evil?"
Yes, and the first hit for "premature optimization is the root of all evil" demonstrates my point exactly. To paraphrase, a good software developer will have developed a feel for where performance issues will cause problems. Making it easy to hand optimize can only help one to develop the feel.
You say, "The point of C is to let the compiler do the stupid little architecture optimizations for you." and you also say "Quite often this sort of hand optimization makes code ugly, and why make your code ugly for no reason?"
C has a conflict of interest. If it has structures to allow you to write beautiful hand-optimization it loses its reason for existing, so guess what, it doesn't. Ugly hand-optimization is a fault of C, not of hand-optimization.
The statement, "There's no implied packing in C" is a bit inaccurate. It's more accurate to say that there's implied non-packing in C, which gets in the way of beginners trying to write to a file data format.
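A small illustration of the implied non-packing, using Python's standard `struct` module rather than C itself. The record (one byte followed by a 32-bit integer) is an invented example; the `@` format follows the platform C compiler's native alignment, while `<` lays the fields out back to back the way a file format usually expects.

```python
import struct

# One byte followed by a 32-bit unsigned integer.
NATIVE = "@BI"   # native alignment rules of the platform's C compiler
PACKED = "<BI"   # little-endian, no padding between fields


def layout_sizes():
    """Return (native_size, packed_size) for the two layouts."""
    return struct.calcsize(NATIVE), struct.calcsize(PACKED)


def packed_record(tag, value):
    """Serialize the record exactly as a packed file format would store it."""
    return struct.pack(PACKED, tag, value)
```

The packed layout is always 5 bytes. The native size depends on the platform, and on common ones it comes out larger because padding is inserted after the first byte, which is the surprise a beginner hits when dumping a C struct straight to a file.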
My main point is more like, "The C compiler is not an agent of my will. It has a mind of its own and isn't interested in telling me what's on its mind."
When I programmed in QuickBASIC, I could depend on the fact that a string was coded as a pointer and a length and a memory area described by the two. I could depend on the fact that arrays were coded as all of the dimension entries and a pointer to a memory area. I knew exactly how long a structure was, and if I wanted to write a routine that accepted a variable of a different structure than the one passed to it, I could do that simply by putting a different declaration of the routine in the calling file.
I would know, for example, that:
TYPE PalDef1
    a AS STRING * 1
    r AS STRING * 1
    g AS STRING * 1
    b AS STRING * 1
END TYPE
and
TYPE PalDef2
    PalEnt AS LONG
END TYPE
were identical in size and I could call the same routine, passing one or the other with
DECLARE SUB RoutDef1 ALIAS "MyRout" (A as PalDef1)
and
DECLARE SUB RoutDef1 ALIAS "MyRout" (A as PalDef2)
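The same idea, sketched in Python instead of QuickBASIC: four one-byte fields and one 32-bit LONG occupy the same four bytes, so a routine can be handed either view. The helper names are hypothetical, and little-endian byte order is assumed because that is what the PC used.

```python
import struct


def as_components(raw):
    """View 4 bytes as the a, r, g, b one-byte fields (PalDef1 layout)."""
    return struct.unpack("<4B", raw)


def as_long(raw):
    """View the same 4 bytes as one signed 32-bit LONG (PalDef2 layout)."""
    return struct.unpack("<l", raw)[0]


def same_size():
    """Both layouts describe exactly the same number of bytes."""
    return struct.calcsize("<4B") == struct.calcsize("<l")
```

For the bytes 00 10 20 30, the first view yields the four components 0, 16, 32 and 48, and the second yields the single value 0x30201000.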
Your string optimization routine
Because QB always maintained the length of a string, I knew that the fastest way to find an unsorted string was to:
A = LEN(SearchString)
FOR I = 1 TO NumOfStrings
    IF LEN(StringList(I)) = A THEN EXIT FOR
NEXT
Interesting how that doesn't come up as a potential solution for you in your string performance scenario.
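A rough Python version of the length-first search, assuming the length check is meant as a cheap filter before a full comparison. The function name and the comparison counter are additions for illustration; Python's own string equality already checks lengths internally, so the counter is only there to make the saving visible.

```python
def find_string(strings, target):
    """Return (index, full_compares); index is -1 when target is absent.

    A candidate is only compared character by character when its stored
    length already matches the length of the target.
    """
    target_len = len(target)
    full_compares = 0
    for index, candidate in enumerate(strings):
        if len(candidate) != target_len:
            continue  # cheap rejection, no character data touched
        full_compares += 1
        if candidate == target:
            return index, full_compares
    return -1, full_compares
```

Searching `["apple", "pear", "plum", "fig"]` for `"plum"` needs only two full comparisons, because the other two entries are rejected on length alone.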
Let's say you have a 16*32 element array. With a 512 element cache, you don't need the overhead of logic to group instructions on the data. With a 256 element cache, to execute efficiently you would have to employ logic to break your instructions into three pools: one for instruction groups that can work the first half of the data and then the second, one for the reverse, and one for instruction groups that thrash the cache, unless of course you can demonstrate that thrashing is always avoidable without adding extra logic. A 128 element cache incurs even more logic. I would really like to see a demonstration that I am wrong, but this is the point where the other person either doesn't reply, tells me to read the documentation without saying which documentation, or points me at something highly technical that assumes prior reading, and I can't figure out what those books are.
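The effect can be sketched with a toy model. What follows is not how any real CPU cache works: it assumes a fully associative cache with least-recently-used eviction that tracks individual elements, and all names are invented. It compares two passes over the 16*32 array done whole against the same two passes grouped by half.

```python
from collections import OrderedDict


def count_misses(accesses, cache_elements):
    """Count misses for a toy LRU cache holding `cache_elements` elements."""
    cache = OrderedDict()
    misses = 0
    for index in accesses:
        if index in cache:
            cache.move_to_end(index)       # mark as most recently used
        else:
            misses += 1
            cache[index] = True
            if len(cache) > cache_elements:
                cache.popitem(last=False)  # evict the least recently used
    return misses


def two_passes_whole(n):
    """Operation 1 over the whole array, then operation 2 over the whole array."""
    return list(range(n)) + list(range(n))


def two_passes_blocked(n, block):
    """Both operations applied to one block before moving to the next."""
    order = []
    for start in range(0, n, block):
        chunk = list(range(start, min(start + block, n)))
        order += chunk + chunk
    return order
```

In this model a 512 element cache needs no grouping at all, a 256 element cache misses on every access unless the work is grouped by half, and grouping brings the miss count back down to one per element. The grouping logic is exactly the extra overhead being argued about.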
Done properly they can achieve the same goals. Think of the annotations as ways to improve your algorithm, and you might begin to see what I'm getting at here.
I don't see any reason why the CPU can't see the register as both 1 32-bit register and 2 16-bit registers. After all, MMX reused the floating point registers.
The problem with writing portable code as things now stand is that it is oblivious to fitting things into cache, as it must remain cache-size independent. Since current tools are built with that sort of attitude about portable code, the designers refuse to implement features to allow the coder to code to cache sizes.
You shouldn't be hand optimizing at all unless you've determined that something is too large or too slow.
That's not true at all. There is nothing inherently wrong with hand-optimizing just because you feel like it.
You also say that size and speed are mutually exclusive. While that is generally the case on current x86 architectures, it doesn't always have to be. I don't know what causes the penalty for unaligned reads, but Intel could redo its architecture to grab 32 or 64 bits at a time from any base byte. Current tools blithely accept the limitation and don't let coders explore how their code might differ if the barrier were removed, so Intel has no incentive to do so, and that's one of my points.
...so you'd have to know your architecture well...
That's another one of my points. The current focus of portability results in programmers not knowing their hardware well. There's plenty of room for compilers to explain to the coder why the compiler thinks that a given optimization is best suited for the machine, but the current focus has the coder blindly accept whatever the compiler thinks is best.
The way it is implemented currently, code in no way reflects the computing architecture. It's like having the abstraction of functional languages without the benefits of functional languages. One portability implementation can result in code that is equally suitable for callee-popped arguments and caller-popped arguments, but if the algorithm favors leaving parameters on the stack for several procedures to access, well sorry, that functionality is not generic enough, so you can't specify that in your solution.
Also, this code is currently impossible:
routine32bit{
    do 32-bit stuff
    16bit segment border start
    32-bit land call to 16 bit code
    Jump overroutine16code1
    routine16code1{
        16-bit routines
        16-bit return
    }
    overroutine16code1:
    More32bitstuff
    ret32
}
I want to be able to tell the compiler:
preserveargs funct1(arg1, arg2,arg3)
preserveargs funct2(arg1, arg2,arg3)
preserveargs funct3(arg1, arg2,arg3)
flushargs funct4(arg1, arg2,arg3)
and be able to call any combination of funct1,2,3 in any order and finalize with 4 instead of depending on whether or not the compiler will figure out that doing this will result in faster code. It doesn't hurt for the compiler to pass speculations up to me, or even to generate potentially more efficient sample source code, but I want to have the final decision on the result of my code, and to have optimizations reflected in the code. That way, no matter what compiler I use, I can be sure to get the same optimizations even if one compiler guesses better than another. This also enables me to pass the code through different compilers and adopt the best optimization results from both into my code. Got a new platform? Run it through a compiler for that platform and have it explain to you why optimizations that were better on another platform are now not so good on the new platform. This helps you be knowledgeable about the different systems you work on which can be used to write better code.
Current ideas that fall into the current portability mindset have more to do with making the program know as little as possible about its environment. The result is a compiler munging your code and data structures into what it perceives the processor is happiest with while getting the same apparent behavior across machines, instead of switching the processor into different modes to deal with code that is more efficient one way or another.
16-bit code is code written with 16-bit addressing. 16-bit code is slow on processors designed to perform reads on 32-bit or 64-bit alignment boundaries. 32-bit code has 32-bit addressing. The Intel processors that do 32-bit addressing are designed to read memory 32 bits at a time on 32-bit alignments. For some reason, reading 32 bits from the second, third or fourth byte position costs them extra. I haven't progressed my understanding beyond this, but there are probably other mechanisms in play. 16-bit addresses mean smaller code. Smaller code means fewer flushes to disk, more calcs per read, and fewer calcs per instruction.
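One simplified way to picture the alignment cost, assuming memory is fetched in aligned 4-byte slots (real processors work in terms of wider buses and cache lines, so this is only a model, and the function name is made up): count how many aligned slots a read overlaps.

```python
def aligned_words_touched(address, size=4, word=4):
    """Count the aligned word-sized slots that a read of `size` bytes overlaps."""
    first = address // word               # slot holding the first byte
    last = (address + size - 1) // word   # slot holding the last byte
    return last - first + 1
```

A 4-byte read at address 0 or 4 stays inside one slot, while the same read starting at the second, third or fourth byte position straddles two, which is where the extra work comes from in this model. A 2-byte read only straddles a boundary when it starts on the last byte of a slot.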
A 16-bit memory access instruction can only access 16 bits of memory, period. It can't trash more than that. That's a rather trivial benefit, but it exists, and if it exists there might still be others which would require experimentation. Here's a better one: the instruction is smaller, so you can fit more instructions in RAM, which means fewer flushes to disk. Attacking problems from an "every byte counts" perspective can help you decide what you want to do when every byte doesn't count. Besides, all things being equal, why not go for the smaller code size?
I used to code for QuickBasic. It didn't have routine pointers, so a friend wrote a routine that checked the return address on the stack, scanned for the next CALL assembly instruction, put the pointer for the routine into DX:AX, popped the return address and jumped to the instruction after the call to the next routine. You could declare two names for a routine, one with no parameters and one with, and set the pretend call to the parameters name after the address finding routines. It seems that the tools today are set up to make such poking around impossible.
Oh, and there's also this code: SuperPUT replaces the innards of QB's PUT
a combination of 16/32 bit is amazingly rare.
That's my point. They're rare only because the tools to make code are designed to make them rare.
Converting a 32-bit application to 64-bit will mean nothing, unless it's a special purpose program that can take advantage of the expanded address space.
Accesses to hard drives make 64-bit addressing more useful. It's too early for exploration of 64-bit architecture to have yielded applications that run best in 64-bit mode.
In 32-bit protected mode, there are 32-bit segments and 16-bit segments. The determination of which is which is stored in a flag in a descriptor stored in a descriptor table. In 32-bit segments, 16-bit instructions require a prefix and in 16-bit segments, 32-bit instructions require a prefix. However, both segments can and do exist side-by-side.
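A small sketch of that rule. The flag in question is the D/B bit, bit 22 of the upper 32 bits of a segment descriptor, and the operand-size override prefix is the byte 0x66 (a separate prefix, 0x67, overrides the address size and is left out here). The function names and the two sample descriptor values are chosen for illustration only.

```python
D_B_FLAG = 1 << 22          # D/B bit in the upper 32 bits of a descriptor
OPERAND_SIZE_PREFIX = 0x66  # operand-size override prefix byte


def default_operand_bits(descriptor_high):
    """Default operand size of a code segment, read from its descriptor flag."""
    return 32 if descriptor_high & D_B_FLAG else 16


def prefix_bytes(descriptor_high, operand_bits):
    """Prefix needed to use `operand_bits` operands inside this segment."""
    if operand_bits == default_operand_bits(descriptor_high):
        return b""                         # matches the segment default
    return bytes([OPERAND_SIZE_PREFIX])    # the other size costs one byte
```

With a descriptor whose upper half is 0x00CF9A00 the default is 32 bits and a 16-bit operation needs the prefix; with 0x00009A00 the default is 16 bits and it is the 32-bit operation that pays the extra byte.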
You can fit two 16 bit integers in the space of a 32-bit register or any other memory device. Existing 16 bit code shows that you can code useful routines that fit in 64k. Also, it's not like 16-bit code and 32-bit code can't communicate with each other. 32-bit code can have several 16-bit routines within its space.
No. The Hitchhiker's Guide To The Galaxy posits the possibility of a finite probability drive generating an infinite probability drive, but the math seems to indicate that the probability of a probability drive of greater range than a given probability drive is just out of range of a given drive.
I'm bored now, so bye!
Real company with real potential to provide a real service... so come again?
Though if you bring up vaporware, you might have a point on that one.
Microsoft is clearly the Pusher robot here.
http://www.kilna.com/music/terrible_stairs
http://www.kilna.com/music/terrible_protected
I mean, come on... what rock have you been hiding under? These aren't even illegal free downloads!
Hmm... Maybe that's the problem.
Yeah, fashion these days... I mean, can you believe somebody buying a black hat for $10m to celebrate a season wrap?
The study wasn't designed to measure the things that Microsoft is talking about. Microsoft is just using the Chewbacca defense.
Dude! You were so not paying attention!
Wait... You mean this isn't a Senate sub-committee? What other kind of subcommittee is there?