Proposal: combine qemu's tcg with tcc to produce a new embedded compiler (qcc).


Rob Landley
 

The QEMU project has a fairly general purpose "Tiny Code Generator" which is
capable of producing machine code for every target QEMU supports. This code
generator is well maintained (by the qemu development community), operates
extremely rapidly (producing code "on the fly"), and supports a large and
increasing number of platforms, even distinguishing many specific variants
within each platform (the qemu -cpu options).


Before QEMU, Fabrice Bellard's previous open source project was the Tiny C
Compiler (tcc), which was notable for its small size (approximately 100k for a
combined compiler/assembler/linker), its self-contained nature (not requiring
external packages such as binutils), its speed of compilation (millions of
lines of source code per second even on low-end hardware), and its "-run" mode
(allowing use of C as a scripting language by starting a source file with
"#!/usr/bin/tcc -run" and setting the executable bit on the source file).

Tinycc provides almost full C99 support (most notably missing complex number
support and variable-length arrays). In 2004, tinycc became the only open
source compiler other than gcc to compile a working Linux kernel (albeit in
limited circumstances). Fabrice Bellard created "tccboot", a proof-of-concept
project in which tcc was used to boot a Linux kernel directly from source
code. The tccboot ISO image booted directly into a modified tcc binary bundled
with a modified subset of the 2.4 Linux kernel source. It compiled this source
to create a vmlinux, which it then executed.

QEMU actually started as an offshoot of tcc. When Fabrice looked into
providing multiple output formats for tcc (to support targets other than
32-bit x86), he started playing with multiple input formats as well, such as
pages of existing machine code. The result was qemu, which is actually a
"dynamic recompiler" rather than an emulator.

The TCC project stalled when QEMU expanded to take up all Fabrice's time, and
the project remained moribund for several years. (Recently the original tcc
has been relaunched as a Windows-centric project, but its current maintainer
has shown little to no interest in Linux or non-x86 targets.)

The tcc codebase as Fabrice left it provides an almost complete C99 compiler.
Combined with qemu's code generator, this could provide a small fast compiler
capable of running on and producing output for a wide range of embedded
hardware.

Creating a "qcc" from tcc and tcg would involve:

1) Turn tcc into a "swiss army knife" executable (like busybox) so its
individual functions could be called as cc, ld, as, strip, cpp, and so on.

1A) optional - use the Firmware Linux ccwrap.c code to increase understanding
of gcc command line options.

1B) optional - add missing utilities such as readelf, objdump, objcopy...

2) Refactor the code to untangle preprocessor, compiler, assembler, and linker
functions.

3) Replace existing target code generation with qemu's tcg.

4) Add support infrastructure for targets supported by tcg (assembly code
parser, ELF header information)

5) Add missing functionality needed to build unmodified Linux 2.6, BusyBox, and
uClibc. (The Linux kernel needs variable-length arrays, simple dead code
elimination, and assembly output (at least via objdump) to generate
asm-offsets.h...)
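
To make step 1 concrete, here is a minimal sketch of busybox-style dispatch,
where one binary picks its behavior from the name it was invoked under. All
function names below (cc_main, ld_main, qcc_dispatch, and so on) are
hypothetical placeholders for illustration, not existing tcc or qemu code, and
the stub bodies only report which tool was selected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Every tool shares the usual main()-style signature. */
typedef int (*tool_fn)(int argc, char **argv);

/* Placeholder entry points (hypothetical names). A real multi-call binary
 * would run the matching stage here; these stubs only print a label. */
int cc_main(int argc, char **argv)    { (void)argc; (void)argv; puts("cc");    return 0; }
int ld_main(int argc, char **argv)    { (void)argc; (void)argv; puts("ld");    return 0; }
int as_main(int argc, char **argv)    { (void)argc; (void)argv; puts("as");    return 0; }
int strip_main(int argc, char **argv) { (void)argc; (void)argv; puts("strip"); return 0; }
int cpp_main(int argc, char **argv)   { (void)argc; (void)argv; puts("cpp");   return 0; }

struct tool {
    const char *name;
    tool_fn fn;
};

/* The tool names listed in step 1 of the proposal. */
static const struct tool tools[] = {
    { "cc", cc_main },
    { "ld", ld_main },
    { "as", as_main },
    { "strip", strip_main },
    { "cpp", cpp_main },
};

/* Return the part of argv[0] after the last '/', or all of it if none. */
const char *tool_basename(const char *argv0)
{
    assert(argv0 != NULL);
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Look up the entry point for the invoked name; NULL if it is unknown. */
tool_fn find_tool(const char *argv0)
{
    const char *name = tool_basename(argv0);
    for (size_t i = 0; i < sizeof tools / sizeof tools[0]; i++) {
        if (strcmp(name, tools[i].name) == 0)
            return tools[i].fn;
    }
    return NULL;
}

/* Run the tool selected by argv[0]; 127 signals an unknown tool name. */
int qcc_dispatch(int argc, char **argv)
{
    tool_fn fn = (argc > 0) ? find_tool(argv[0]) : NULL;
    if (fn == NULL) {
        fprintf(stderr, "qcc: unknown tool name\n");
        return 127;
    }
    return fn(argc, argv);
}
```

In this sketch the program's main() would simply return qcc_dispatch(argc,
argv), and cc, ld, as, strip, and cpp would be symlinks (or hard links) to the
one binary. Adding the utilities from 1B would then be a matter of adding
table entries.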

Rob

P.S. See attached for some design work I did earlier this year.
--
Latency is more important than throughput. It's that simple. - Linus Torvalds


Tim Bird <tim.bird@...>
 

Rob Landley wrote:
...
Creating a "qcc" from tcc and tcg would involve:
Thanks. Proposal page is at:
http://elinux.org/CELF_Project_Proposal/Combine_tcg_with_tcc
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================