Compiler technology has improved over the years, so much so that the "wisdom on the street" is that a compiled language, such as C, is the norm for the overwhelming majority of embedded code placed into production systems these days. I have little doubt that this sentiment is largely true, but I suspect the "last mile" challenge for compilers is far from solved, and that challenge keeps compiled languages from eliminating the need for developers who are experts in assembly language programming.
In this case, I think the largest "last mile" candidate for compilers is managing and allocating memory outside the processor's register space. This is a critical distinction because most processors, except the smallest and slowest, do not provide a flat memory space in which every possible memory access completes in a single clock cycle. The register file, level 1 cache, and tightly coupled memories are the fastest memory on most processors, and they make up the smallest portion of the memory subsystem. The majority of a system's memory is implemented in slower, less expensive circuits, which, when used indiscriminately, introduce latency that stalls the executing program.
The largest reason for using a cache is to hide as much memory-access latency as possible so that the processor core does not stall. If there were no time cost for accessing any location in memory, there would be no need for a cache.
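The effect is easy to demonstrate even from C. The following is a minimal, illustrative sketch (the matrix size is arbitrary); both loops sum the same array, and the only difference between them is the order in which memory is touched:

/* Both loops compute the same sum over the same data. The row-major
 * walk touches memory sequentially, so each cache-line fill services
 * several consecutive accesses; the column-major walk strides
 * N * sizeof(double) bytes between accesses, defeating the cache on
 * most targets and exposing memory latency on nearly every read. */
#include <stdio.h>

#define N 2048  /* arbitrary, but large enough to overflow most caches */

static double matrix[N][N];

double sum_row_major(void)
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += matrix[i][j];    /* sequential, cache-friendly */
    return sum;
}

double sum_col_major(void)
{
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += matrix[i][j];    /* strided, cache-hostile */
    return sum;
}

int main(void)
{
    printf("row-major: %f\n", sum_row_major());
    printf("col-major: %f\n", sum_col_major());
    return 0;
}

On a processor with a flat, single-cycle memory the two functions would take the same time; on anything with a cache hierarchy, the second one is typically several times slower.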
I have not seen any standard mechanism in compiled languages for laying out and allocating an application's storage elements across a memory hierarchy. One problem is that such a mechanism would make the code less portable, but perhaps compiler technology has reached the point where that type of portability should be segmented away from code portability. A program could consist of a portable code portion and a target-specific portion that lets a developer tell the compiler and linker how to organize the entire memory subsystem.
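To be clear, non-standard mechanisms do exist today. As a sketch of what the target-specific portion currently looks like with the GNU toolchain (the section names here are hypothetical and only mean something if the project's linker script maps them onto the corresponding physical memories):

/* Hot data the developer wants placed in tightly coupled memory.
 * The section name ".tcm_data" is made up for this example; the
 * linker script must define where that section actually lives. */
__attribute__((section(".tcm_data")))
static short fir_coefficients[64];

/* A large, rarely touched buffer relegated to slower external RAM,
 * again via a hypothetical section name. */
__attribute__((section(".sram2")))
static unsigned char log_buffer[16 * 1024];

The point is that none of this is standard C: the attribute syntax is compiler-specific and the section-to-memory mapping lives in a toolchain-specific linker script, which is exactly the portability problem described above.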
A possible result of this type of separation is the appearance of many more tools that help developers focus on the memory architecture and find the optimum way to organize it for a specific application. Additional tools might arise that let developers define application-specific policies for managing the memory subsystem in the presence of other applications.
The production alternative at this time seems to be systems that either accept the consequences of sub-optimal automated memory allocation or impose policies that prevent loading applications onto the system unless they have passed a certification process verifying that each program follows a set of memory-usage rules. Think of running Flash programs on the iPhone (I suspect the issue of Flash on these devices is driven more by memory issues, which affect system reliability, than by dislike of another company).
Assembly language programming seems to continue to reign supreme for time-sensitive portions of code that rely on using a processor's specialized circuits in an esoteric fashion and/or on an intimate knowledge of how to organize the storage of data within the target's memory architecture to extract the optimum performance, in time and/or energy, from the system (a small sketch of the first kind of usage appears below). Is this an accurate assessment? Is assembly language programming a dying skillset? Are you still using assembly language programming in your production systems? If so, in what capacity?
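For readers wondering what the first category looks like in practice, here is a minimal sketch, assuming an ARM target with the DSP extensions (ARMv5TE or later) and the GNU toolchain. QADD is a single-instruction saturating add; portable C can only approximate it with explicit overflow checks that the compiler will not always collapse back into one instruction:

/* Saturating 32-bit add using the ARM QADD instruction via GCC
 * inline assembly. On overflow the result clamps to INT_MAX or
 * INT_MIN instead of wrapping, which is the behavior DSP-style
 * code usually wants. */
static inline int saturating_add(int a, int b)
{
    int result;
    __asm__("qadd %0, %1, %2"
            : "=r"(result)
            : "r"(a), "r"(b));
    return result;
}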