AnyC
A compiler for all 8-bit microprocessors

AnyC comes with ABSOLUTELY NO WARRANTY!

This is free software, and you are welcome to redistribute it under certain conditions; see gnu.txt for details. There is no warranty, expressed or implied, with this software. This program in its current state is intended for hobbyists. It is not to be used in any situation where property could be damaged by an error in this software. It is definitely not to be used in any situation where safety depends on this program.

1 User Documentation

1 Quick-Start

AnyC is a C compiler that generates assembler code for 8-bit microcontrollers.

1.0.0.1 System Requirements

Any system with the GNU C compiler should be able to compile AnyC.

2 Introduction

This early version of AnyC is intended for programmers only, since it doesn't generate code that is very useful yet. Please email me (_email) and let me know what should be added to the documentation next.

2.1 Project Values

This C compiler was developed with the following priorities, ranked in order:

Portability
This means portability of the compiler to any computer which can run the GNU C compiler, and easy retargeting of the assembler output for any microcontroller (especially 8-bit RISC microcontrollers).
Code Size
AnyC relies heavily on "internal library" routines nested up to 6 levels deep to decrease code size.
Simplicity
I want to keep the source simple so that other people can change the compiler if they need to without too much hassle. If you have ever looked at the GNU gcc source, you know what I am talking about.
Execution speed
Since this is the lowest priority, the code output by this compiler will be slower than that of commercial C compilers.

3 Building AnyC

All my development tools are from GNU (GNU make, GNU gcc) or have a GNU-type license (cextract). The program cextract is needed if you want to change the program and re-make. I have included it in the directory cextract.

If you want to change the parser (gra1.y), you will need GNU bison (a yacc clone).

If you want to change the lexer (lexer.l), you will need GNU flex (a lex clone).

If you are using Windows, you can get the GNU tools (gcc, bison, flex) from www.delorie.com.

These are the versions of the GNU tools I am using:

make: version 3.77
gcc: version 2.80
bison: version 1.25
flex: version 2.5-4

Hopefully AnyC will compile with any version of any of the above.

4 Running AnyC

Before compiling code with AnyC, make sure it compiles with gcc with no errors! There is almost no error checking in AnyC right now.

Normally you run AnyC with the input filename and the output filename as its only two arguments:

anyc myfile.c output.asm

Try the sample .c file included with the package: input.c

The command:

anyc input.c output.asm

will produce an output file output.asm and a library file, lib.asm.

By the way, this program is slow as molasses! Guess my coding isn't very sophisticated.

Also, if you want to run the program using stdin and stdout, use:

anyc debug

More documentation for general users will come later.

2 Technical Documentation

5 The Big Picture

5.1 Overall Organization

There are many layers of abstraction in AnyC. From highest to lowest they are:

5.1.1 Grammar (or parser)

The highest level of abstraction, the grammar describes the logical structure of a C file, and how to respond to each part. The C grammar is implemented by the bison¹ parser, whose input file is gra1.y. The documentation for bison is at http://www.gnu.org/manual/manual.html.

5.1.2 Lexer

The parser calls the lexer, and the lexer then returns the next token in the C input file. Each time the parser needs the next token, it calls the lexer. The lexer is implemented by the flex² lexer. The documentation for flex is at http://www.gnu.org/manual/manual.html.

5.1.3 Parser utility functions

These routines are mostly devoted to taking care of the expression stack and performing operations on the expression stack. The expression stack routines found here call the assembler libraries in the next section to perform memory moves and math operations. Type conversion routines are also here.

5.1.4 Assembler Libraries (lib_asm.c, lib_mem.c, lib_math.c)

The assembler libraries provide a level of abstraction between raw assembler instructions and the operations available in C. This is needed because 8-bit microcontrollers can't natively do even a fraction of the things required for a full ANSI C implementation. To get around this, the compiler calls these assembler "internal library" functions to get things done. For example, an 8-bit microcontroller obviously cannot add two double precision numbers together in a single instruction. There are two ways to accomplish the addition:

create the code to accomplish the addition inline
create a function in assembler that can accomplish a generic double precision addition and call this function

The advantage of the second case is that if you need to do this kind of addition more than once, you create the assembler library function once, then call it each time it is needed. This does have some drawbacks, though:

it takes overhead to push things to the stack and call a function
because of the architecture of 8-bit microcontrollers, it is much easier to write math and memory routines that work on specific memory locations, as opposed to passing the routine a pointer. This is the reason for the dedicated math registers. The downside is that the contents of the math registers must be pushed to the stack when an interrupt occurs, to make sure the program is reentrant.
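To make the trade-off concrete, here is a minimal sketch of what a generic multi-byte addition routine does, written in C rather than assembler. It is not AnyC's library code: the register names `math_a` and `math_b`, the routine name `lib_add`, the 4-byte register width and the least-significant-byte-first layout are all assumptions for illustration. A 32-bit integer add stands in for the double precision add, which follows the same pattern but is much longer. Note that the routine takes no pointer arguments; it always works on the same two fixed "registers".

```c
#include <assert.h>   /* assert() is used in the usage checks */
#include <stdint.h>

/* Hypothetical register width: big enough for a 32-bit long. */
#define REG_BYTES 4

/* Hypothetical dedicated math registers at fixed locations,
   stored least significant byte first. */
uint8_t math_a[REG_BYTES];
uint8_t math_b[REG_BYTES];

/* Generic multi-byte addition, MATHA += MATHB, built only from
   8-bit additions plus a carry, as an 8-bit core would do it. */
void lib_add(void)
{
    unsigned carry = 0;
    for (int i = 0; i < REG_BYTES; i++) {
        unsigned sum = (unsigned)math_a[i] + math_b[i] + carry;
        math_a[i] = (uint8_t)(sum & 0xFFu); /* keep the low 8 bits */
        carry = sum >> 8;                   /* 0 or 1 into the next byte */
    }
}

/* Helpers to move a 32-bit value into and out of a register. */
void load_reg(uint8_t reg[REG_BYTES], uint32_t value)
{
    for (int i = 0; i < REG_BYTES; i++)
        reg[i] = (uint8_t)(value >> (8 * i));
}

uint32_t read_reg(const uint8_t reg[REG_BYTES])
{
    uint32_t value = 0;
    for (int i = 0; i < REG_BYTES; i++)
        value |= (uint32_t)reg[i] << (8 * i);
    return value;
}
```

Every call site only has to load the two registers and call `lib_add()`, so the loop body exists once in the program no matter how many additions the source contains.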

5.1.5 Hardware functions

The lowest layer is the hardware layer, implemented in the hardware.c file. This file contains all the functions which generate assembler code. These functions are the ONLY place in the entire package where assembler code is generated! To retarget the compiler, just replace the assembler statements in each of the functions in hardware.c. Please send me any other ports you create, so I can include them in the collection.
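The following sketch illustrates the idea of a single emitter function that owns all knowledge of the target's syntax. The function name `hw_add_byte`, the `enum target` switch and both sets of mnemonics are hypothetical and do not describe any real instruction set. As described above, AnyC is retargeted by editing the assembler text inside each hardware.c function; the runtime switch here only exists to show two "ports" side by side.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two illustrative targets; the mnemonics are simplified stand-ins. */
enum target { TARGET_A, TARGET_B };

enum target current_target = TARGET_A;

/* Emit "add the byte at src into the byte at dst" into `out`.
   Callers never write assembler themselves; they call this function.
   Returns the snprintf() result, or -1 for an unknown target. */
int hw_add_byte(char *out, size_t size, const char *dst, const char *src)
{
    switch (current_target) {
    case TARGET_A: /* accumulator style: load, add, store */
        return snprintf(out, size, "\tLDA %s\n\tADD %s\n\tSTA %s\n",
                        dst, src, dst);
    case TARGET_B: /* two-operand style */
        return snprintf(out, size, "\tadd %s, %s\n", dst, src);
    }
    return -1;
}
```

Porting to a new chip means rewriting only the string templates in functions like this one; nothing above the hardware layer changes.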

6 AnyC in Detail

6.1 Data Memory Organization (for the target program, not the compiler)

Fixed Global Memory
This is the memory that is accounted for at compile time:
1. Literals in the code (like printf("test");, where "test" is the literal)
2. Global variables (declared outside any function)
Heap
The heap is where dynamic memory allocation comes from. Not implemented yet.
Virtual Registers (MATHA, MATHB, MATHC)
AnyC uses three "registers" which are large enough for the largest data type used in a given program. I call them registers because the internal library functions operate on them as if they were registers. All math and comparison operations occur in these registers.
Stack
The following are stored on the stack:
1. Function return addresses
2. Function parameters
3. Function local variables
4. Expressions
The only way I know of to create code that is reentrant is to put all these things on the stack. Any time an interrupt occurs, all the interrupting function has to do is push the virtual registers to the stack along with the return address and the complete state of the machine is stored. It can then do anything it needs to do, including call any math routine, without disturbing the function that was interrupted.
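The save/restore discipline described above can be sketched in C as follows. This is only a model: the register names `matha`, `mathb`, `mathc`, the 4-byte width, the 64-byte software stack and the function names are assumptions, and saving the hardware return address (which the CPU normally does itself) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_BYTES   4   /* hypothetical register width */
#define STACK_BYTES 64  /* hypothetical data stack size */

/* The three virtual math registers. */
uint8_t matha[REG_BYTES], mathb[REG_BYTES], mathc[REG_BYTES];

static uint8_t stack[STACK_BYTES];
static size_t sp = 0; /* index of the next free byte */

static void push_reg(const uint8_t reg[REG_BYTES])
{
    assert(sp + REG_BYTES <= STACK_BYTES);
    memcpy(&stack[sp], reg, REG_BYTES);
    sp += REG_BYTES;
}

static void pop_reg(uint8_t reg[REG_BYTES])
{
    assert(sp >= REG_BYTES);
    sp -= REG_BYTES;
    memcpy(reg, &stack[sp], REG_BYTES);
}

/* Stand-in for the interrupt's real work: it may freely use
   (and overwrite) the math registers. */
static void isr_body(void)
{
    memset(matha, 0xAA, REG_BYTES);
    memset(mathb, 0xBB, REG_BYTES);
    memset(mathc, 0xCC, REG_BYTES);
}

/* Interrupt entry: save the virtual registers, do the work, then
   restore them in reverse order so the interrupted code sees no change. */
void interrupt_handler(void)
{
    push_reg(matha);
    push_reg(mathb);
    push_reg(mathc);
    isr_body();
    pop_reg(mathc);
    pop_reg(mathb);
    pop_reg(matha);
}
```

Because locals, parameters and partial expressions already live on the stack, these three registers are the only extra state the handler has to preserve.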

6.2 Dynamic Internal Library Scheme

One thing that is blatantly obvious from the first time you look at 8-bit microcontrollers is this: they can't do much! The least expensive ones can basically only add two 8-bit numbers or do extremely simple comparisons. However, a useful implementation of ANSI C has much more complicated requirements, such as double-precision multiplication and division. Luckily, any mathematical operation can be reduced to operations an 8-bit controller can handle. Because this often requires many simple 8-bit instructions to accomplish, most operations that are not native to the microcontroller are separated into subroutines. These subroutines are contained in the files lib_mem.c and lib_math.c. Often, these subroutines are nested. An example is integer multiplication: the multiplication subroutine in lib_math.c calls other subroutines, such as the integer addition subroutine, to accomplish the multiplication. I call these routines "internal library" routines to differentiate them from the standard C library. These internal libraries are hidden behind the scenes in the compiler and are not accessible to user programs.

However, since microcontroller memory is so small, you would prefer to only generate a library of subroutines that will actually be used in a given program, instead of creating a fixed library that contains all possible operations. Double precision division is a lot of code, and you don't want to include it if you don't use it. My solution is to call all internal library operations through the function caller(). caller() is given the internal library function and its arguments. It keeps track of which internal library functions have been called, and then during the second pass of the compile, generates the library so that only the needed functions are in it.
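The bookkeeping behind this two-pass scheme can be sketched with a table of "used" flags. The routine names, `note_call()` and `generate_library()` below are hypothetical, the emitted text is placeholder assembler, and the real caller() also dispatches to the library routine; only the "record in pass 1, emit what was recorded in pass 2" idea is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catalogue of internal library routines. */
static const char *const routines[] = {
    "lib_add16", "lib_mul16", "lib_div32", "lib_fadd"
};
#define N_ROUTINES (sizeof routines / sizeof routines[0])

static int used[N_ROUTINES]; /* flags set during pass 1 */

/* Pass 1: record that a routine is needed.
   Returns the routine's index, or -1 if the name is unknown. */
int note_call(const char *name)
{
    for (size_t i = 0; i < N_ROUTINES; i++) {
        if (strcmp(routines[i], name) == 0) {
            used[i] = 1;
            return (int)i;
        }
    }
    return -1;
}

/* Pass 2: emit only the routines marked in pass 1 into `out`
   (size must be > 0). Returns how many routines were emitted. */
size_t generate_library(char *out, size_t size)
{
    size_t count = 0;
    out[0] = '\0';
    for (size_t i = 0; i < N_ROUTINES; i++) {
        if (used[i]) {
            size_t len = strlen(out);
            snprintf(out + len, size - len, "%s:\n\t; body of %s\n\tRET\n",
                     routines[i], routines[i]);
            count++;
        }
    }
    return count;
}
```

A routine requested many times is still emitted once, and a routine never requested (here `lib_div32`) costs no program memory at all.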

Internal library functions can be called in the following contexts:

INLINE mode expands the library subroutine inline at the current point in the code
CALL mode generates assembler instructions for calling the subroutine in the library
LIBRARY mode generates the library assembler routine. This mode is used when generating the library during the second pass of the compile.
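Here is a sketch of how one library routine might honor the three modes. The enum constants reuse the mode names listed above, but the identifiers AnyC actually uses may differ; the routine `lib_inc16`, its label and the mnemonics (including the made-up `SKIPNZ`, "skip next instruction if the result is not zero") are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum lib_mode { INLINE, CALL, LIBRARY };

/* Hypothetical emitter for a 16-bit increment of MATHA. */
void lib_inc16(enum lib_mode mode, char *out, size_t size)
{
    static const char body[] =
        "\tINC MATHA\n"     /* increment the low byte */
        "\tSKIPNZ\n"        /* low byte did not wrap: skip the next line */
        "\tINC MATHA+1\n";  /* carry into the high byte */

    switch (mode) {
    case INLINE:  /* paste the body at the current point in the code */
        snprintf(out, size, "%s", body);
        break;
    case CALL:    /* jump to the single shared copy */
        snprintf(out, size, "\tCALL lib_inc16\n");
        break;
    case LIBRARY: /* emit the shared copy itself, during pass 2 */
        snprintf(out, size, "lib_inc16:\n%s\tRET\n", body);
        break;
    }
}
```

Keeping the body in one place means INLINE and LIBRARY output can never drift apart; the choice between them is purely a size/speed decision made by the caller.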

6.3 The Expression Stack

ss_expression_stack is a structure that holds information about pending expression evaluations: a list of symbols along with their locations.

In the parser, each symbol is pushed to the expression stack as it is encountered in an expression. Operators are applied by calling evaluate1() and evaluate2() for one- and two-argument operators, respectively. In effect, the parser converts a standard (infix) expression to Reverse Polish notation. Reverse Polish is an easier system to work with when trying to optimize expressions and manage memory movement. If you haven't used it before, the idea is simply that you load the arguments to an operator *before* the operator. For example, to add "1+2", the Reverse Polish expression is "1 2 +". A Reverse Polish expression is read left to right, and the operators are performed as they are encountered (as opposed to conventional expressions, where "1+2*3" means do the 2*3 first).
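A small Reverse Polish evaluator shows the stack discipline described above: operands are pushed, and each two-argument operator pops two operands and pushes its result, which is roughly the role evaluate2() plays in the parser. This sketch computes numbers directly, whereas AnyC generates code; the function name `eval_rpn` and the fixed 32-entry stack are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Evaluate a space-separated Reverse Polish expression of
   non-negative integers and the operators + - * /. */
long eval_rpn(const char *expr)
{
    long stack[32];
    int top = 0; /* number of items on the stack */
    const char *p = expr;

    while (*p) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            long value = strtol(p, &end, 10);
            assert(top < 32);
            stack[top++] = value; /* operand: push it */
            p = end;
        } else {
            assert(top >= 2);
            long b = stack[--top]; /* right operand was pushed last */
            long a = stack[--top];
            long r;
            switch (*p) {
            case '+': r = a + b; break;
            case '-': r = a - b; break;
            case '*': r = a * b; break;
            case '/': assert(b != 0); r = a / b; break;
            default:  assert(!"unknown operator"); r = 0; break;
            }
            stack[top++] = r; /* operator: push its result */
            p++;
        }
    }
    assert(top == 1);
    return stack[0];
}
```

For example, the conventional "1+2*3" is written "1 2 3 * +": the `*` is reached first and combines 2 and 3, then `+` combines 1 with that result, giving 7.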

6.4 Type conversion

All type conversions are done in MATHB:

- Conversions from one whole number type to a larger whole number type: take the most significant bit of the variable and copy it leftward until the value is big enough.
- Conversions from a whole number type to a smaller whole number type: throw away the extra high-order bytes.
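Both rules can be sketched on byte arrays. Copying the most significant bit leftward is sign extension; unsigned types would normally be zero-extended instead, which this sketch does not cover. The function names and the least-significant-byte-first layout are assumptions, not AnyC's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values are stored least significant byte first (assumed here). */

/* Widen a signed value in place from `from` bytes to `to` bytes by
   copying its most significant bit into every new byte.
   The buffer must hold at least `to` bytes. */
void widen_signed(uint8_t *value, size_t from, size_t to)
{
    assert(from > 0 && from <= to);
    uint8_t fill = (value[from - 1] & 0x80u) ? 0xFF : 0x00;
    memset(value + from, fill, to - from);
}

/* Narrow a value by throwing away the extra high-order bytes:
   only the low `to` bytes of `src` are copied to `dst`. */
void narrow(uint8_t *dst, const uint8_t *src, size_t to)
{
    memcpy(dst, src, to);
}
```

With this layout, narrowing needs no arithmetic at all: the low bytes are already in the right place.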

7 Compiling AnyC

First I should mention the details about my makefile:

I keep all interface declarations in the .c files. I do this by extracting the symbol interface and the function interface separately. The symbol interface is handled by putting all symbols which need to be exported within an "interface" block, and the rest of the program outside it:

#ifdef __INTERFACE__
extern int global1;
extern char global2;
#else /* __INTERFACE__ */
int global1;
char global2;
/* local variables and functions here */
#endif /* __INTERFACE__ */

I then use the C preprocessor to define __INTERFACE__ and generate a file ending with .hs. Only the stuff in the __INTERFACE__ block goes in the .hs file.

Then, to get the function interface out of the .c file, I use a program called cextract, by Adam Bryant. This program extracts function prototypes from your .c code and puts them in a file. To prevent it from making prototypes for all the functions which are #included in my .c program, I put the following around my #includes:

#ifndef __CEXTRACT__
#include <stdio.h>
#include "whatever.h"
#endif

The function prototypes are put in a file ending with .hf. The .hs and .hf files are then concatenated to make a .h file. If you are not using Windows, you will need to change the line in the makefile that concatenates the .hs and .hf files! Now you have an automatically generated interface file that can be included in your other files. The nice thing about this is that you can completely ignore everything except the .c files. I also have the makefile set up to generate simple documentation for each .c file: cextract has a documentation mode which extracts the first comments in a file and the first comments before each function, and I make a .txt file with this output for each .c file.

The cextract program is included in the AnyC package, in the directory /cextract, exactly as I found it. The configuration files needed to use cextract with AnyC are included in the AnyC distribution as cextract.cfg and cextrac2.cfg, in the main directory. The makefile settings for the location of cextract must be changed depending on where you put cextract. I couldn't get it to work on Windows without putting the cextract executable and configuration files all in the AnyC directory. Also, the installation files included with cextract didn't work at all for me. However, I compiled the program with 'gcc -o cextract.exe main.c io.c parse.c' and it runs fine.

The compiler is divided into several layers:

7.0.1 hardware

universal data system libraries (u_data.c, u_list.c, u_stack.c, u_string.c, u_mem.c)

This is a library I wrote to take care of basic data structures I needed, like stacks, lists, and strings.

See u_data.doc for info on this part of the system.

--------

Data structures:

--------

ll_symbol_table -

A two-dimensional linked list holding all the symbols in the program. Global variables, function symbols, and struct symbols are appended FORWARD from the origin of the list. Function parameters and local variables are appended DOWN from the function symbol to which they belong.

ss_pending -

When parsing the program, the parser pushes declarations onto this stack. At some point (before the symbols are needed), a routine in gra_util.c pops the symbols off ss_pending stack and loads them onto ll_symbol_table in the correct places.

ss_expression_stack -

When parsing an expression, symbols are pushed to ss_expression_stack as the parser reads them. When the parser decides to reduce and perform a math operation, ss_expression_stack is used to see what symbols to operate on.
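The FORWARD/DOWN shape of ll_symbol_table can be sketched with two link pointers per node. This `symbol` struct is a hypothetical cut-down stand-in for symbol_type, AnyC's own list library (u_list.c) is not used, and ss_pending is not modeled; the lookup order (current function's locals first, then globals) is an assumption about how such a table is searched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for symbol_type. */
typedef struct symbol {
    char name[32];
    struct symbol *forward; /* next global, function or struct symbol */
    struct symbol *down;    /* next parameter/local below a function */
} symbol;

static symbol *new_symbol(const char *name)
{
    symbol *s = calloc(1, sizeof *s);
    assert(s != NULL);
    strncpy(s->name, name, sizeof s->name - 1); /* stays NUL-terminated */
    return s;
}

/* Append a global, function or struct symbol FORWARD from the origin. */
symbol *append_forward(symbol **origin, const char *name)
{
    symbol **link = origin;
    while (*link)
        link = &(*link)->forward;
    *link = new_symbol(name);
    return *link;
}

/* Append a parameter or local variable DOWN from its function symbol. */
symbol *append_down(symbol *function, const char *name)
{
    symbol **link = &function->down;
    while (*link)
        link = &(*link)->down;
    *link = new_symbol(name);
    return *link;
}

/* Look a name up: the current function's symbols first, then globals. */
symbol *find_symbol(symbol *origin, symbol *function, const char *name)
{
    if (function) {
        for (symbol *s = function->down; s; s = s->down)
            if (strcmp(s->name, name) == 0)
                return s;
    }
    for (symbol *s = origin; s; s = s->forward)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}
```

Searching locals before globals gives C's usual shadowing behaviour, and a function's locals simply disappear from view when another function becomes current.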

------

Data types:

------

symbol_type* - defined in symbols.c

This is a struct which holds all the info on each symbol in the program. I use this for almost everything.

library_type* - defined in caller.c

A struct which holds info on a library routine that has been used.

Communication between lexer and parser:

--------------------

The lexer is called by the parser and returns a token. In the case of a symbol, the lexer also sets gg_lexer_symbol. The parser can then run the routine recognize_symbol(), which looks for the symbol in the symbol table.

*****************************

Alphabetical list of .c files

*****************************

caller.c

------

The function caller() is very important in AnyC. All library functions in lib_mem.c and lib_math.c are called indirectly, using caller(). The parameters to caller() are the same as the parameters to the math or memory routine, except that the name of the library routine is inserted at the beginning of the arguments list, and the last argument is NULL.

The other functions in caller.c keep track of the library routines which have been called, and generate an assembler library for your program.
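The argument convention (routine name first, then the routine's own arguments, then a terminating NULL) is the standard C pattern for a variadic function with a sentinel. The sketch below shows only that convention: the signature, the use of strings for every argument and the `last_call` record are assumptions, and the routine tracking and dispatch that the real caller() performs are left out. The terminating NULL is cast to `(char *)` because a bare NULL passed through `...` is not guaranteed to have pointer type.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Text of the most recent call, kept only for this demonstration. */
static char last_call[128];

/* Hypothetical: caller(routine, arg1, arg2, ..., (char *)NULL).
   Walks the arguments up to the NULL sentinel and returns how many
   arguments were passed to the routine. */
int caller(const char *routine, ...)
{
    va_list ap;
    const char *arg;
    int nargs = 0;
    int used = snprintf(last_call, sizeof last_call, "%s", routine);

    va_start(ap, routine);
    while ((arg = va_arg(ap, char *)) != NULL) {
        if (used >= 0 && (size_t)used < sizeof last_call)
            used += snprintf(last_call + used, sizeof last_call - used,
                             nargs == 0 ? " %s" : ", %s", arg);
        nargs++;
    }
    va_end(ap);
    return nargs;
}
```

A call such as `caller("lib_add16", "MATHA", "MATHB", (char *)NULL)` reports two arguments; forgetting the sentinel makes `va_arg` read past the real arguments, so every call site must end with it.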

gra1.y

------

This is the grammar file. It is used by the parser-generator program bison, which is a standard GNU tool.

The following command is used to make the parser program from gra1.y:

bison -d gra1.y

gra_util.c

-----

All the utility functions used by the gra1.y parser: symbol table generation, expression evaluation, etc.

labels.c

------

Keeps track of assembler labels.

lexer.l

------

This is the lexer file. It is used by the lexer-generator program flex, which is a GNU tool. A lexer is a tool that performs pattern matching on text. It converts patterns in the C code into "tokens", which it then passes on to the parser. An example:

// C code:
int test;
test = 0;

// What the lexer sees and sends to the parser:
INT SYMBOL SEMICOLON SYMBOL EQUALS CONSTANT SEMICOLON

This is just an example for illustration. The parser then takes this information and figures out how it fits into the C grammar. The following command is used to make the lexer program from lexer.l:

flex lexer.l
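To show what "converting patterns into tokens" means, here is a hand-written stand-in for the flex-generated lexer that handles just enough C to reproduce the example above. It is not generated from lexer.l; the token names follow the illustration rather than the real token names in gra1.y, and `next_token` and `tokenize` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return the next token name and advance *p past it,
   or NULL at end of input. */
const char *next_token(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
    if (**p == '\0')
        return NULL;
    if (isalpha((unsigned char)**p) || **p == '_') {
        const char *start = *p;
        while (isalnum((unsigned char)**p) || **p == '_')
            (*p)++;
        if (*p - start == 3 && strncmp(start, "int", 3) == 0)
            return "INT";    /* keyword */
        return "SYMBOL";     /* any other identifier */
    }
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p))
            (*p)++;
        return "CONSTANT";
    }
    switch (*(*p)++) {       /* single-character tokens */
    case ';': return "SEMICOLON";
    case '=': return "EQUALS";
    default:  return "UNKNOWN";
    }
}

/* Write the token names of `src` into `out`, separated by spaces. */
void tokenize(const char *src, char *out, size_t size)
{
    const char *tok;
    size_t len = 0;
    out[0] = '\0';
    while ((tok = next_token(&src)) != NULL && len < size) {
        int n = snprintf(out + len, size - len, len ? " %s" : "%s", tok);
        if (n < 0)
            break;
        len += (size_t)n;
    }
}
```

In AnyC the parser pulls one token at a time (the role `next_token` plays here) instead of receiving the whole list at once.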

lib_mem.c

------

Memory library routines

lib_math.c

------

Math library routines

main.c

------

Contains main() and a few #defines. Mostly just the framework for calling the parser.

settings.c

-----

All the global variables are declared and defined here. Several routines to show register and type sizes. Also, the C kernel and all the C system variables are initialized here in initialize_kernel().

symbols.c

------

The declaration of symbol_type and the support routines to use it with the data structure libraries.

u_data.c, u_list.c, u_stack.c, u_string.c, u_mem.c

-------------------------

Data collection routines. See u_data.doc for more info.

About this document ...

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)

The translation was initiated by Daniel Webb on 2002-02-08

Footnotes

¹ Bison is a free clone of the parser generator program yacc.
² Flex is a free clone of the lexer generator program lex.
