C-language programs are usually developed by teams of engineers who are often geographically dispersed, leading to redundant code and inconsistent variable declarations between modules. Traditional compilation technology compiles each module separately with no regard for what goes on in the other modules and with no information about how pointers, stacks, variables and functions are used throughout the whole program. Traditional compilers tend to over-allocate memory space for pointers, stacks and registers. They are unable to fully exploit the wide variety of memory maps, and register and stack configurations available in today’s micro-controllers. All too often, the programmer must resort to manual optimizations that compromise portability (Figure 1).
The engineer, too, must often specify an MCU with more resources and higher cost than are really required. In a worst-case situation, the design may have to be migrated from an 8-bit MCU to a more complex, expensive, and power-hungry 32-bit device because an otherwise ideal micro-controller does not have enough memory resources to accommodate the final program. Increasing on-chip flash from 8 KBytes to 16 KBytes adds 20 cents to 50 cents to the BOM.
“Omniscient Code Generation”
A new compiler technology, called Omniscient Code Generation (OCG), cuts the code footprint by as much as half and frees up 10% to 15% of SRAM resources.
Rather than relying on the linker to uncover errors between independently compiled modules, an OCG compiler completes the initial stages of compilation for each module separately. Then it compiles each module to an intermediate code file that represents a more abstract view of each module before producing any machine instructions or allocating any registers (Figure 2).
The intermediate code files generated by the OCG compiler are loaded into a call graph structure. Library functions referenced by the program are also located and extracted (Figure 3).
Traditional compilers assume any function being called will use all possible registers according to set calling conventions, resulting in sub-optimal use of registers that wastes stack memory and increases latency. The OCG code generator uses the Call Graph to look at all the program modules and identify any functions that are called recursively or re-entrantly from main-line code and interrupt functions. It allocates dynamic stack space for storage of local variables to prevent reentrant function calls from overwriting existing data and searches the call graph for any functions that are never called and removes them.
Omniscient Code Generation technology supports reentrant function calls in MCUs that do not have hardware stacks by building separate call graphs for both main-line and interrupt code. Any functions that appear in more than one call graph can be replicated, each with its own local variable area. Using this technique, reentrancy can be implemented without a conventional stack.
For non-reentrant, non-recursive functions, the OCG compiler generates optimally sized function stacks with exactly enough memory to accommodate each function’s maximum depth. Because the call graph for all functions has already been determined, functions which are executed at different times can share the same SRAM space for their static compiled stack. This feature reduces stack space to the absolute minimum required, frees up more SRAM for data, and reduces the likelihood of stack overflows that occur when a dynamic stack expands into the data variables’ space in the SRAM.
At the end of this optimization, compiled stack space is allocated before any machine code has been generated. OCG knows exactly how big the stack needs to be and where it is located before it generates any code.
Pointer Reference Graph Once the stacks have been optimized, an OCG algorithm uses each instance of a variable having its address taken, plus each instance of an assignment of a pointer value to a pointer (either directly, via function return, function parameter passing, or indirectly via another pointer) and builds a “pointer reference graph” that identifies all objects that can possibly be referenced by each pointer. This information is used to determine which memory space each pointer will be required to access.
The compiler identifies conflicting declarations of the same object from different modules and issues an error message. Variables that are never referenced are deleted.
Once the set of used variables and pointers is complete, OCG allocates memory of the compiled stack based on the call graph, and allocates global and static variables based on the pointer reference graph. In architectures with banked RAM, the variables accessed most often in the program are allocated to the memory spaces that are cheapest to access (e.g. internal, directly addressable RAM in an 8051, rather than external RAM which requires pointer).
The OCG code generator also sizes and encodes each pointer in a way that is optimally efficient for the target architecture and automatically assigns them an optimized set of address spaces (Figure 4).
Non-contiguous Memory Addresses
Standard C assumes a single linear address space. Many embedded processors have complex, non-linear memory spaces, often with different address widths. Conventional compilers frequently assign variables to pages that require cycle-intensive access bank selection instructions, particularly during interrupts. Since the OCG compiler knows which variables will be used in interrupt routines, it assigns them to the appropriate memory space, relieving the programmer of assembly programming often required to accommodate non-linear memory architectures (Figure 5).
Bottom-up Code Generation
Once the pointers and variables are assigned memory space, machine code generation begins at the bottom of the call graph, starting with those functions that do not call any other functions, which may be automatically in-lined. Code generation then proceeds up the call graph so that for each function the code generator knows exactly which functions are called by the current function, without rigid calling conventions. The code generator knows exactly which registers and other resources are available at each point. Calling conventions can be tailored to the register usage and argument type of a function, instead of following a blind set pattern.
Customized Library Functions
The OCG generator can analyze and optimize predefined C-code libraries used for routine functions, such as sprintf() and printf() functions, used to format text strings or output. These functions can take up 5 KBytes of memory, even though most programs need only a fraction of the available options. The OCG code generator analyzes all the format strings in the program, determines the exact subset of format specifiers and modifiers required, and creates a customized version that can be as small as 20 or 30 Bytes for simple string copying.
Newer embedded compilers provide canned startup code that clears uninitialized static and global variables to zero. This code is often much larger than necessary. For example, if the program has no uninitialized global variables, there is no need to include code to clear them. The omniscient code generator knows the state of all global variables and can create the smallest possible runtime startup code. In a minimal case, the startup code may be completely empty.
No Hand Crafting
OCG compilers perform an analysis of the whole program every time the program is recompiled and makes optimal decisions about memory placement, pointer scoping, and stack allocation, without any special directives or language extensions from the programmer. OCG allows highly optimized embedded C programs to be written without resorting to hand-crafted assembly code.
OCG compilers are currently available for Microchip’s PIC10/12/16, PIC-18 and Cypress’ PSoC Mixed Signal Array, and additional OCG-enabled compilers are planned for other 8-, 16- and 32-bit MCUs and DSCs.
When compared to conventional compilers, OCG code size is frequently 40% to 50% percent smaller. By combining all program modules into one large program, the compiler can identify and consolidate re-entrant code, identify inconsistent variable declarations between modules, eliminate redundant code that may exist in more than one module and optimize all the stacks, registers, pointers and memories to exploit the unique advantages of the MCU architecture.
Denser code results in faster execution and provides a cost advantage because it allows programmers to use devices with smaller flash and SRAM memories. It also extends the life of existing 8- and 16-bit architectures by allowing them to accommodate up to twice as much code and by freeing up SRAM resources. Equally important, it eliminates the need for hand-crafted assembly code.