Copyright 1999 by Daniel Webb (_email)
AnyC comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it under certain conditions; see gnu.txt for details. There is no warranty, expressed or implied, with this software. This program in its current state is intended for hobbyists. It is not to be used in any situation where property could be damaged by an error in this software. It is definitely not to be used in any situation where safety depends on this program.
AnyC is a C compiler that generates assembler code for 8-bit microcontrollers.
Any system with the GNU C compiler should be able to compile AnyC.
This early version of AnyC is intended for programmers only, since it doesn't generate code that is very useful yet. Please email me (_email) and let me know what should be added to the documentation next.
This C compiler was developed with the following priorities, ranked in order:
1. Portability. This means portability of the compiler to any computer that can run the GNU C compiler, and easy retargeting of the assembler output for any microcontroller (especially 8-bit RISC microcontrollers).
2. Small code size. AnyC relies heavily on "internal library" routines, nested up to 6 levels deep, to decrease code size.
3. Simple source code. I want to keep the source simple so that other people can change the compiler if they need to without too much hassle. If you have ever looked at the GNU gcc source, you know what I am talking about.
4. Speed of the generated code. Since this is the lowest priority, the code output by this compiler will be slower than that of commercial C compilers.
All my development tools are from GNU (GNU make, GNU gcc) or have a GNU-type license (cextract). The program cextract is needed if you want to change the program and re-make it. I have included it in the directory cextract.
If you want to change the parser (gra1.y), you will need GNU bison (a yacc clone).
If you want to change the lexer (lexer.l), you will need GNU flex (a lex clone).
If you are using Windows, you can get the GNU tools (gcc, bison, flex) from www.delorie.com.
These are the versions of the GNU tools I am using:
Before compiling code with AnyC, make sure it compiles with gcc with no errors! There is almost no error checking in AnyC right now.
There are no command-line options. Run AnyC with the input filename and the output filename as its only two arguments:
anyc myfile.c output.asm
Try the sample .c file included with the package: input.c
anyc input.c output.asm
will produce an output file output.asm and a library file, lib.asm.
By the way, this program is slow as molasses! Guess my coding isn't very sophisticated.
Also, if you want to run the program using stdin and stdout, use:
More documentation for general users will come later.
There are many layers of abstraction in AnyC. From highest to lowest they are:
The grammar is the highest level of abstraction. It describes the logical structure of a C file and how to respond to each part. The C grammar is implemented by the bison parser, whose input file is gra1.y. The documentation for bison is at http://www.gnu.org/manual/manual.html.
Each time the parser needs the next token in the C input file, it calls the lexer, which returns that token. The lexer is generated by flex. The documentation for flex is at http://www.gnu.org/manual/manual.html.
These routines are mostly devoted to taking care of the expression stack and performing operations on the expression stack. The expression stack routines found here call the assembler libraries in the next section to perform memory moves and math operations. Type conversion routines are also here.
The assembler libraries provide a level of abstraction between raw assembler instructions and the operations available in C. This is needed because 8-bit microcontrollers can't natively do even a fraction of the things required for a full ANSI C implementation. To get around this, the compiler calls these assembler "internal library" functions to get things done. For example, adding two double-precision numbers together is obviously not something an 8-bit microcontroller can do in a single instruction. There are two ways to accomplish the addition:
The lowest layer is the hardware layer, implemented in the hardware.c file. This file contains all the functions which generate assembler code. These functions are the ONLY place in the entire package that assembler code is generated! To retarget the compiler, just switch the assembler statements in each of the functions in hardware.c. Please send me any other ports you create, so I can include them in the collection.
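Each function in hardware.c can be quite small. The sketch below shows what one might look like. The function name hw_load_constant, its signature, and the PIC16-style mnemonics (movlw/movwf) are assumptions for illustration and are not taken from AnyC's source. The point it makes is that the assembler text lives only in the format string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hardware-layer function (not AnyC's actual API).
 * Writes the assembler needed to store an 8-bit constant into a RAM
 * location, using PIC16-style mnemonics. Retargeting means changing only
 * the format string below.
 * Returns the number of characters written, or -1 if out is too small. */
int hw_load_constant(char *out, size_t size, unsigned char value,
                     const char *dest)
{
    int n = snprintf(out, size, "\tmovlw\t0x%02X\n\tmovwf\t%s\n",
                     (unsigned)value, dest);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output error or truncated */
    return n;
}
```

To port this sketch to an AVR part, for example, you would replace the format string with an `ldi`/`sts` pair. No caller of hw_load_constant would need to change.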
This is the memory that is accounted for at compile time:
The heap is where dynamic memory allocation comes from. Not implemented yet.
AnyC uses three "registers" which are large enough for the largest data type used in a given program. I call them registers because the internal library functions operate on them as if they were registers. All math and comparison operations occur in these registers.
The following are stored on the stack:
One thing that is blatantly obvious from the first time you look at 8-bit microcontrollers is this: they can't do much! The least expensive ones can basically only add two 8-bit numbers or do extremely simple comparisons. However, a useful implementation of ANSI C has much more complicated requirements, such as double-precision multiplication and division. Luckily, any mathematical operation can be reduced to operations an 8-bit controller can handle. Because this often requires many simple 8-bit instructions, most operations that are not native to the microcontroller are separated into subroutines. These subroutines are contained in the files lib_mem.c and lib_math.c. Often, these subroutines are nested. An example is integer multiplication: the multiplication subroutine in lib_math.c calls other subroutines, such as the integer addition subroutine, to accomplish the multiplication. I call these routines "internal library" routines to differentiate them from the standard C library. These internal libraries are hidden behind the scenes in the compiler and are not accessible to user programs.
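The following sketch is written in C rather than assembler. It shows how an internal library routine can build multi-byte addition out of 8-bit additions with carry, and how a multiplication routine can be layered on top by calling that addition routine. The names lib_add and lib_mul, the 4-byte register size, and the little-endian byte order are assumptions for illustration. AnyC's lib_math.c may do this differently.

```c
#include <stdint.h>
#include <string.h>

#define REG_SIZE 4  /* assumed size of the largest data type, in bytes */

/* dst += src, one byte at a time, propagating the carry the way an 8-bit
 * CPU does with an add-with-carry instruction. Bytes are little-endian. */
void lib_add(uint8_t dst[REG_SIZE], const uint8_t src[REG_SIZE])
{
    unsigned carry = 0;
    for (int i = 0; i < REG_SIZE; i++) {
        unsigned sum = dst[i] + src[i] + carry;
        dst[i] = (uint8_t)sum;   /* keep the low 8 bits */
        carry = sum >> 8;        /* 0 or 1 */
    }
}

/* Shift a register left by one bit (multiply by 2). */
static void lib_shift_left(uint8_t r[REG_SIZE])
{
    unsigned carry = 0;
    for (int i = 0; i < REG_SIZE; i++) {
        unsigned v = ((unsigned)r[i] << 1) | carry;
        r[i] = (uint8_t)v;
        carry = v >> 8;
    }
}

/* Shift-and-add multiplication: result = a * b, modulo 2^(8*REG_SIZE).
 * result must not alias a or b. lib_add is called once for every set bit
 * of b, which mirrors how internal library routines nest. */
void lib_mul(uint8_t result[REG_SIZE], const uint8_t a[REG_SIZE],
             const uint8_t b[REG_SIZE])
{
    uint8_t addend[REG_SIZE];
    memcpy(addend, a, REG_SIZE);
    memset(result, 0, REG_SIZE);
    for (int byte = 0; byte < REG_SIZE; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            if (b[byte] & (1u << bit))
                lib_add(result, addend);
            lib_shift_left(addend);   /* next bit is worth twice as much */
        }
    }
}
```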
However, since microcontroller memory is so small, you want to generate only the subroutines that a given program actually uses, instead of a fixed library that contains all possible operations. Double-precision division is a lot of code, and you don't want to include it if you don't use it. My solution is to call all internal library operations through the function caller(). caller() is given the internal library function and its arguments. It keeps track of which internal library functions have been called. Then, during the second pass of compilation, it generates the library so that only the needed functions are in it.
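The sketch below shows the bookkeeping described above, under stated assumptions. Each internal library routine is recorded as used when it is requested. Because routines are nested, marking one also marks every routine it depends on, so the second pass emits exactly the needed set. The routine table, the dependency lists, and the helpers mark_used and emit_library are hypothetical. In AnyC this work is done by caller() and the other functions in caller.c.

```c
#include <stdio.h>

/* Hypothetical table of internal library routines. */
enum { LIB_ADD, LIB_SHIFT, LIB_MUL, LIB_DIV, LIB_COUNT };

struct lib_entry {
    const char *name;
    int deps[3];   /* indices of routines this one calls, ended by -1 */
    int used;      /* set during the first pass */
};

static struct lib_entry lib_table[LIB_COUNT] = {
    [LIB_ADD]   = { "lib_add",   { -1 },                     0 },
    [LIB_SHIFT] = { "lib_shift", { -1 },                     0 },
    [LIB_MUL]   = { "lib_mul",   { LIB_ADD, LIB_SHIFT, -1 }, 0 },
    [LIB_DIV]   = { "lib_div",   { LIB_SHIFT, -1 },          0 },
};

/* First pass: record that a routine, and everything it calls, is needed. */
void mark_used(int id)
{
    if (lib_table[id].used)
        return;                 /* already recorded; also stops cycles */
    lib_table[id].used = 1;
    for (int i = 0; lib_table[id].deps[i] != -1; i++)
        mark_used(lib_table[id].deps[i]);
}

/* Second pass: emit only the routines that were used.
 * Returns how many were emitted. */
int emit_library(FILE *out)
{
    int count = 0;
    for (int id = 0; id < LIB_COUNT; id++) {
        if (lib_table[id].used) {
            fprintf(out, "; --- %s ---\n", lib_table[id].name);
            count++;
        }
    }
    return count;
}
```

A program that only multiplies gets lib_mul plus the two routines it calls. It never gets lib_div.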
Internal library functions can be called in the following contexts:
ss_expression_stack is a structure that holds information about pending expression evaluations. It is a list of symbols, each stored with its location.
In the parser, each symbol is pushed onto the expression stack as it is encountered in an expression. Operators are applied by calling evaluate1() and evaluate2() for one- and two-argument operators, respectively. The parser essentially converts a standard expression into a Reverse Polish expression. Reverse Polish notation is easier to work with when trying to optimize expressions and manage memory movement. If you haven't used it before, the idea is simply that you load the arguments to an operator *before* the operator. For example, to add "1+2", the Reverse Polish expression is "1 2 +". A Reverse Polish expression is read left to right, and the operators are performed as they are encountered. This is unlike conventional expressions, where "1+2*3" means do the 2*3 first; in Reverse Polish that expression is written "1 2 3 * +".
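Here is a small stack-based evaluator for integer Reverse Polish expressions. It only illustrates the notation. AnyC does not compute constant values this way: it builds ss_expression_stack and calls evaluate1()/evaluate2(), which generate code rather than results. The function name rpn_eval is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

#define RPN_STACK_MAX 64

/* Evaluate a space-separated Reverse Polish expression of non-negative
 * integers and the binary operators + - * /.
 * Operands are pushed; each operator pops two values and pushes the result.
 * Returns 0 and stores the value in *result, or -1 on a malformed input. */
int rpn_eval(const char *expr, long *result)
{
    long stack[RPN_STACK_MAX];
    int top = 0;                 /* number of values on the stack */
    const char *p = expr;

    while (*p) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            if (top == RPN_STACK_MAX)
                return -1;
            stack[top++] = strtol(p, &end, 10);
            p = end;
        } else {
            if (top < 2)
                return -1;       /* every operator needs two operands */
            long b = stack[--top];   /* right operand was pushed last */
            long a = stack[--top];
            switch (*p) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/':
                if (b == 0)
                    return -1;
                stack[top++] = a / b;
                break;
            default:
                return -1;       /* unknown character */
            }
            p++;
        }
    }
    if (top != 1)
        return -1;               /* leftover operands */
    *result = stack[0];
    return 0;
}
```

With this evaluator, "1 2 3 * +" multiplies 2 and 3 as soon as the `*` is read and then adds 1, giving 7.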
All type conversions are done in MATHB:
- Conversions from one whole-number type to a larger whole-number type: take the most significant bit of the variable and copy it left until the value is big enough.
- Conversions from a whole-number type to a smaller whole-number type: throw away the extra bytes.
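The sketch below implements the two whole-number conversions above on little-endian byte arrays, the way an 8-bit routine would. The function names and the byte order are assumptions. Copying the most significant bit leftward (sign extension) is correct for signed types. For unsigned types C requires zero extension instead, so the sketch takes a flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen a value from src_len bytes to dst_len bytes
 * (dst_len >= src_len >= 1). Signed values are sign-extended: the most
 * significant bit of the source fills every new byte. Unsigned values are
 * zero-extended. */
void widen(uint8_t *dst, size_t dst_len,
           const uint8_t *src, size_t src_len, int is_signed)
{
    uint8_t fill = 0x00;
    if (is_signed && (src[src_len - 1] & 0x80))
        fill = 0xFF;
    for (size_t i = 0; i < dst_len; i++)
        dst[i] = (i < src_len) ? src[i] : fill;
}

/* Narrow a value to dst_len bytes (dst_len <= source length) by throwing
 * away the extra high-order bytes. */
void narrow(uint8_t *dst, size_t dst_len, const uint8_t *src)
{
    for (size_t i = 0; i < dst_len; i++)
        dst[i] = src[i];
}
```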
First I should mention the details about my makefile:
I put all interface declarations in the .c files themselves. I do this by extracting the symbol interface and the function interface separately. The symbol interface is handled by putting all symbols that need to be shared within an "interface" block, and the rest of the program outside it:
#ifdef __INTERFACE__
extern int global1;
extern char global2;
#else /* __INTERFACE__ */
/* local variables and functions here */
#endif /* __INTERFACE__ */
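The text does not say exactly how the makefile pulls the interface block out of each .c file. One possible approach is sketched below: copy every line between `#ifdef __INTERFACE__` and the next `#else` into a header. This extractor is hypothetical and is not part of AnyC. It assumes the block opens with exactly `#ifdef __INTERFACE__`, closes with `#else`, and is not nested.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical interface extractor: copies the lines lying between a line
 * that starts with "#ifdef __INTERFACE__" and the next line that starts
 * with "#else" from in to out. Returns the number of lines copied. */
int extract_interface(FILE *in, FILE *out)
{
    char line[512];
    int inside = 0, copied = 0;

    while (fgets(line, sizeof line, in)) {
        if (!inside && strncmp(line, "#ifdef __INTERFACE__", 20) == 0) {
            inside = 1;                 /* start of the interface block */
        } else if (inside && strncmp(line, "#else", 5) == 0) {
            inside = 0;                 /* end of the interface block */
        } else if (inside) {
            fputs(line, out);           /* an extern declaration */
            copied++;
        }
    }
    return copied;
}
```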
Then, to get the function interface out of the .c file, I use a program called cextract, by Adam Bryant. This program extracts function prototypes from your .c code and puts them in a file. To prevent it from making prototypes for all the functions which are #included in my .c program, I put the following around my #includes:
The cextract program is included in the AnyC package, exactly as I found it, in the directory /cextract. The configuration files needed to use cextract with AnyC are included in the AnyC distribution as cextract.cfg and cextrac2.cfg, in the main directory. The makefile settings for the location of cextract must be changed depending on where you put cextract. I couldn't get it to work on Windows without putting the cextract executable and configuration files all in the AnyC directory. Also, the installation files included with cextract didn't work at all for me. However, I compiled the program with 'gcc -o cextract.exe main.c io.c parse.c' and it runs fine.
The compiler is divided into several layers:
universal data system libraries
(u_data.c, u_list.c, u_stack.c, u_string.c, u_mem.c)
This is a library I wrote to take care of basic data structures I needed, like stacks, lists, and strings.
See u_data.doc for info on this part of the system.
A two-dimensional linked list holding all the symbols in the program. Global variables, function symbols, and struct symbols are appended FORWARD from the origin of the list. Function parameters and local variables are appended DOWN from the function symbol to which they belong.
When parsing the program, the parser pushes declarations onto the ss_pending stack. At some point before the symbols are needed, a routine in gra_util.c pops the symbols off the ss_pending stack and loads them into ll_symbol_table in the correct places.
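Here is a minimal sketch of a two-dimensional symbol list, with a pending stack that is drained into it. The struct sym, its next/down fields, and the helper functions are hypothetical simplifications. AnyC's real symbol_type (symbols.c) and its u_list/u_stack libraries carry much more information.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol node. */
struct sym {
    char name[32];
    struct sym *next;   /* FORWARD: next global, function, or struct symbol */
    struct sym *down;   /* DOWN: next parameter/local in a function's column */
};

struct sym *new_sym(const char *name)
{
    struct sym *s = calloc(1, sizeof *s);   /* zeroes name, next, down */
    if (s)
        strncpy(s->name, name, sizeof s->name - 1);
    return s;
}

/* Append a global, function, or struct symbol FORWARD from the origin. */
void append_forward(struct sym **origin, struct sym *s)
{
    while (*origin)
        origin = &(*origin)->next;
    *origin = s;
}

/* Hypothetical stand-in for the ss_pending stack. */
#define PENDING_MAX 64
static struct sym *pending[PENDING_MAX];
static int pending_top;

int push_pending(struct sym *s)
{
    if (pending_top == PENDING_MAX)
        return -1;
    pending[pending_top++] = s;
    return 0;
}

/* Pop every pending declaration and hang it DOWN from func. Popping yields
 * the last declaration first, so each symbol is inserted at the top of the
 * column; the column therefore ends up in declaration order. */
void flush_pending_into(struct sym *func)
{
    while (pending_top > 0) {
        struct sym *s = pending[--pending_top];
        s->down = func->down;
        func->down = s;
    }
}
```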
When parsing an expression, symbols are pushed to ss_expression_stack as the parser reads them. When the parser decides to reduce and perform a math operation, ss_expression_stack is used to see what symbols to operate on.
symbol_type* - defined in symbols.c
This is a struct which holds all the info on each symbol in the program. I use this for almost everything.
library_type* - defined in caller.c
A struct which holds info on a library routine that has been used.
Communication between lexer and parser:
The lexer is called by the parser and returns a token. In the case of a symbol, the lexer also sets gg_lexer_symbol. The parser can then run the routine recognize_symbol(), which looks for the symbol in the symbol table.
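The hand-written stand-in for the flex lexer below shows the interface described above. Each call returns one token. When the token is a symbol, the lexer also copies its text into gg_lexer_symbol so the parser can look it up. The token names match the lexer example in the file list. The functions lexer_init and next_token, the assumption that gg_lexer_symbol is a character buffer, and the single keyword `int` are hypothetical simplifications. The real lexer is generated by flex from lexer.l.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; bison normally generates these. */
enum token { TOK_EOF, INT, SYMBOL, CONSTANT, EQUALS, SEMICOLON, TOK_ERROR };

/* Assumed here to be a character buffer; AnyC's real type may differ. */
char gg_lexer_symbol[64];

static const char *cursor;   /* current position in the C source text */

void lexer_init(const char *source)
{
    cursor = source;
}

/* Return the next token. For identifiers, also copy the text into
 * gg_lexer_symbol for the parser. */
enum token next_token(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0')
        return TOK_EOF;

    if (isalpha((unsigned char)*cursor) || *cursor == '_') {
        char text[sizeof gg_lexer_symbol];
        size_t n = 0;
        while (isalnum((unsigned char)*cursor) || *cursor == '_') {
            if (n < sizeof text - 1)
                text[n++] = *cursor;    /* overly long names are truncated */
            cursor++;
        }
        text[n] = '\0';
        if (strcmp(text, "int") == 0)
            return INT;                 /* keyword, not a symbol */
        strcpy(gg_lexer_symbol, text);  /* tell the parser which symbol */
        return SYMBOL;
    }

    if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            cursor++;
        return CONSTANT;
    }

    switch (*cursor++) {
    case '=': return EQUALS;
    case ';': return SEMICOLON;
    default:  return TOK_ERROR;
    }
}
```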
Alphabetical list of .c files
The function caller() is very important in AnyC. All library functions in lib_mem.c and lib_math.c are called indirectly, using caller(). The parameters to caller() are the same as the parameters to the math or memory routine, except that the name of the library routine is inserted at the beginning of the arguments list, and the last argument is NULL.
The other functions in caller.c keep track of the library routines which have been called, and generate an assembler library for your program.
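Here is a minimal sketch of that calling convention: the library routine comes first, its arguments follow, and a null pointer ends the list. For illustration it assumes that every library routine takes an array of void * arguments (in AnyC these are presumably symbol pointers) and that use is recorded in a small table. The names caller_sketch, lib_fn, used_routines, and demo_routine are hypothetical. In portable C the terminator should be passed as (void *)NULL, because a bare NULL may be an integer 0 when read back through va_arg.

```c
#include <stdarg.h>
#include <stddef.h>

#define MAX_ARGS 4
#define MAX_ROUTINES 32

/* Hypothetical signature shared by all internal library routines. */
typedef void (*lib_fn)(void *args[], int nargs);

/* Hypothetical record of which routines have been requested. */
static lib_fn used_routines[MAX_ROUTINES];
static int used_count;

static void record_use(lib_fn fn)
{
    for (int i = 0; i < used_count; i++)
        if (used_routines[i] == fn)
            return;                    /* already recorded */
    if (used_count < MAX_ROUTINES)
        used_routines[used_count++] = fn;
}

/* caller_sketch(fn, arg1, ..., (void *)NULL)
 * Collects the arguments up to the null terminator, records that fn was
 * used, and forwards the call. Returns the argument count, or -1 if there
 * were too many. */
int caller_sketch(lib_fn fn, ...)
{
    void *args[MAX_ARGS];
    int nargs = 0;
    va_list ap;

    va_start(ap, fn);
    for (void *a = va_arg(ap, void *); a != NULL; a = va_arg(ap, void *)) {
        if (nargs == MAX_ARGS) {
            va_end(ap);
            return -1;
        }
        args[nargs++] = a;
    }
    va_end(ap);

    record_use(fn);
    fn(args, nargs);
    return nargs;
}

/* Example routine for demonstration: remembers how many args it got. */
static int last_nargs;
static void demo_routine(void *args[], int nargs)
{
    (void)args;
    last_nargs = nargs;
}
```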
This is the grammar file. It is used by the parser-generator program bison, which is a standard GNU tool.
The following command is used to make the parser program from gra1.y:
bison -d gra1.y
All the utility functions used by the gra1.y parser: symbol table generation, expression evaluation, etc.
Keeps track of assembler labels.
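A minimal sketch of label bookkeeping follows, in which each call hands out a new, unique assembler label. The function name new_label, the "_L" prefix, and the counter-based scheme are assumptions. The document says only that this file keeps track of assembler labels.

```c
#include <stdio.h>
#include <string.h>

static unsigned label_counter;   /* number of labels handed out so far */

/* Write a fresh label such as "_L0001" into buf. Returns 0 on success, or
 * -1 if buf is too small (in which case no label number is consumed). */
int new_label(char *buf, size_t size)
{
    int n = snprintf(buf, size, "_L%04u", label_counter + 1);
    if (n < 0 || (size_t)n >= size)
        return -1;
    label_counter++;
    return 0;
}
```

A loop, for example, would request two labels: one for the top of the loop body and one for the exit that a failed condition branches to.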
This is the lexer file. It is used by the lexer-generator program flex, which is a GNU tool. A lexer is a tool that performs pattern matching on text. It converts patterns in the C code into "tokens", which it then passes on to the parser. An example:
// C code:
int test;
test = 0;
// What the lexer sees and sends to the parser:
INT SYMBOL SEMICOLON SYMBOL EQUALS CONSTANT SEMICOLON
This is just an example for illustration. The parser then takes this information and figures out how it fits into the C grammar. The following command is used to make the lexer program from lexer.l:
Memory library routines
Math library routines
Contains main() and a few #defines. Mostly just the framework for calling the parser.
All the global variables are declared and defined here. Several routines to show register and type sizes. Also, the C kernel and all the C system variables are initialized here in initialize_kernel().
The declaration of symbol_type and the support routines to use it with the data structure libraries.
u_data.c, u_list.c, u_stack.c, u_string.c, u_mem.c
Data collection routines. See u_data.doc for more info.
This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)
The translation was initiated by Daniel Webb on 2002-02-08