Compiler Principles
Undergraduate course, Ben-Gurion University
Lecturer 2018-2021; TA 2017-2018; Grader 2015-2016.
Course Objectives
- Gain additional insight into programming languages.
- Understand the major components of the compiler: Syntactic analysis, semantic analysis, code generation, and the run-time environment. Gain hands-on experience in crafting these components.
- Learn about compiler optimizations: What compilers do to generate code that is faster, shorter, and performs better. Implement many of these optimizations, and see how they improve the code.
- Be able to apply information & skills learned in the compilers course to other areas in computer science where syntactic and semantic analysis, code generation, and translation are needed.
Detailed Syllabus
Introduction to Compiler Construction
- The algebraic relationship between compilation & interpretation.
- Cross-compilation, bootstrapping a compiler, de-compilation.
- The stages of the compiler: What work is done in each, what kinds of errors can and cannot be detected at each, the basic algorithms that are implemented at each stage.
- Dynamically-typed vs statically-typed languages. Early binding vs late binding. The information available to the compiler for translation, error detection, and optimizations.
Scanning & Parsing Theory
- Scanner: DFA, NDFA, NDFA with ϵ-transitions
- Parsing: Top-down, recursive descent parsers, parsing combinators, bottom-up parsers
- Hand-coding various parsers (a hand-coded sketch follows this list)
- Using parser-generation tools in C & Java
- Macro expansion: Syntactic transformations, reduction to core forms in the language, variables, meta-variables and syntactic hygiene.
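To make the scanning & parsing topics above concrete, here is a minimal sketch in Python (the course itself does not use Python; the grammar, token shapes, and function names are illustrative assumptions, not course code): a regular-expression scanner feeding a hand-coded recursive-descent parser for arithmetic expressions.

```python
# A minimal sketch of a hand-coded scanner and recursive-descent parser
# for arithmetic expressions. Grammar:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(source):
    """The 'scanner': turn the input string into a list of (kind, value) tokens."""
    tokens = []
    for number, symbol in TOKEN_RE.findall(source):
        tokens.append(("NUM", int(number)) if number else ("SYM", symbol))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def eat(self, expected):
        kind, value = self.peek()
        if value != expected:
            raise SyntaxError(f"expected {expected!r}, found {value!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.peek() == ("SYM", "+"):
            self.eat("+")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("SYM", "*"):
            self.eat("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        kind, value = self.peek()
        if kind == "NUM":
            self.pos += 1
            return value
        self.eat("(")
        node = self.expr()
        self.eat(")")
        return node

# Builds the AST ('+', 1, ('*', 2, ('+', 3, 4))) — one procedure per nonterminal.
print(Parser(tokenize("1 + 2 * (3 + 4)")).expr())
```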
Programming Languages
- Functional vs Imperative programming: How change & side-effects are understood & modeled in the functional view of programming.
- Parameter-passing mechanisms: Call-by-value, call-by-reference, call-by-sharing/call-by-object, call-by-name, call-by-need. Features of the various parameter-passing mechanisms, their motivation, history & implementation.
- Scope & its implementation: Dynamic scope (deep binding, shallow binding), lexical scope. Dynamic scope and the implementation of exception handling.
- The structure of the lexical environment, and the implications for data sharing & side effects.
- Object-oriented vs functional programming languages. The structure of the closure compared to that of the object. Mapping of lambda-expressions to objects. The virtual method table. (A small closure-vs-object sketch follows this list.)
- Monads & monadic programming.
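The closure/object correspondence listed above can be illustrated with a small Python sketch (names and the counter example are illustrative assumptions, not course material): the same stateful counter written once as a closure over a captured variable and once as an object whose field plays the role of the captured environment.

```python
# A closure and an object that behave identically: both pair behaviour
# with a captured piece of state.

def make_counter_closure(start):
    """Closure: 'count' lives in the lexical environment captured by bump."""
    count = start
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

class CounterObject:
    """Object: the same state lives in a field; the method plays the closure's role."""
    def __init__(self, start):
        self.count = start
    def bump(self):
        self.count += 1
        return self.count

closure = make_counter_closure(0)
obj = CounterObject(0)
assert closure() == obj.bump() == 1
assert closure() == obj.bump() == 2
```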
Continuation-Passing Style (CPS)
- CPS as a programming technique (multiple return values, multiple continuations, co-routines, implementation of threads); a short CPS sketch follows this list.
- CPS as an approach to writing a compiler: CPS, defunctionalization of the continuation, stack machine.
- CPS as an intermediate language for the compiler: Optimizations that are simpler in CPS.
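As a minimal illustration of the CPS technique (a Python sketch under illustrative assumptions, not the course's Scheme material): every function receives an explicit continuation `k` and "returns" by calling it, which turns every call into a tail call.

```python
# Direct-style factorial vs its CPS counterpart.

def fact_direct(n):
    return 1 if n == 0 else n * fact_direct(n - 1)

def fact_cps(n, k):
    # The multiplication that "waits" for the recursive result is pushed
    # into a new continuation, so the recursive call is a tail call.
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda result: k(n * result))

# The identity continuation recovers the ordinary result.
assert fact_cps(10, lambda x: x) == fact_direct(10) == 3628800
```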
Semantic Analysis
- Lexical addressing, de Bruijn numbering (sketched after this list)
- Identification of tail calls
- Boxing, data indirection, and motion from the stack to the heap: A comparison between quasi-functional programming languages (Scheme, LISP) and object-oriented programming languages (Java).
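The lexical-addressing item above can be made concrete with a small Python sketch (the AST encoding and names are illustrative assumptions): each variable occurrence in a tiny lambda-calculus term is replaced by its de Bruijn index, i.e. its distance from the binder that introduced it.

```python
# de Bruijn numbering for a toy lambda-calculus AST:
# ("lam", name, body), ("app", fun, arg), or a variable name (a string).

def to_debruijn(expr, env=()):
    if isinstance(expr, str):                      # variable reference
        return ("var", env.index(expr))            # its lexical address
    tag = expr[0]
    if tag == "lam":
        _, name, body = expr
        return ("lam", to_debruijn(body, (name,) + env))
    if tag == "app":
        _, fun, arg = expr
        return ("app", to_debruijn(fun, env), to_debruijn(arg, env))
    raise ValueError(f"unknown expression: {expr!r}")

# (lambda x. (lambda y. x y))  ->  (lam (lam (app (var 1) (var 0))))
print(to_debruijn(("lam", "x", ("lam", "y", ("app", "x", "y")))))
```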
Code Generation
- Layout of Scheme objects in memory. Run-time type information. Comparison with the situation in object-oriented programming languages.
- An overview of the proof of correctness of the compiler, and how it is constructed along with the code generator.
- Optimization of tail calls
- Code generation to native x86 instructions for the various expressions in our language (a simplified sketch follows this list)
- The primitive procedures & support code that are provided with the compiler
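As a highly simplified sketch of code generation (not the course's actual code generator, and covering only arithmetic expressions): the Python snippet below walks the AST tuples produced by the earlier parser sketch and emits x86-64-style instructions, leaving each result in rax and spilling the left operand to the stack.

```python
# Emit x86-64-style instructions for ASTs such as ('+', 1, ('*', 2, 3)).

def emit(node, out):
    if isinstance(node, int):
        out.append(f"    mov rax, {node}")
        return
    op, left, right = node
    emit(left, out)
    out.append("    push rax")          # save the left operand on the stack
    emit(right, out)
    out.append("    pop rbx")           # left operand into rbx, right stays in rax
    if op == "+":
        out.append("    add rax, rbx")
    elif op == "*":
        out.append("    imul rax, rbx")
    else:
        raise ValueError(f"unknown operator: {op}")

code = []
emit(("+", 1, ("*", 2, 3)), code)
print("\n".join(code))
```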
The Run-Time Environment
- The top level in LISP: value cells, function cells, property cells, etc.
- Dynamic memory management:
  - Reference counting
  - Garbage collection: mark & sweep, stop & copy, generational garbage collection (a mark & sweep sketch follows this list)
- Namespaces, modules, and their implementation
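The mark & sweep item above can be illustrated with a toy Python sketch (the object graph and names are illustrative assumptions; this is not a real allocator): the mark phase flags everything reachable from the roots, and the sweep phase reclaims the rest.

```python
# Mark & sweep over a toy heap of objects holding references to one another.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing pointers to other Objs
        self.marked = False

def mark(roots):
    """Mark phase: everything reachable from the roots stays alive."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap):
    """Sweep phase: reclaim unmarked objects, clear marks for the next cycle."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            print(f"collecting {obj.name}")
    return survivors

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)              # a -> b is reachable from the root; c is garbage
heap = [a, b, c]
mark(roots=[a])
heap = sweep(heap)            # prints "collecting c"
```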
Compiler Optimizations
References: 1, 2, 3, and notes
- The tail-recursion & tail-call optimizations
- Loop optimizations & transformations
- Array optimizations
- Strength reduction optimizations (a small sketch appears after this list)
- Dead-code removal, write-after-write optimizations
- Common sub-expression elimination, both as a high-level and a low-level optimization
- Optimizations for super-pipelined and parallel architectures
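As a small illustration of strength reduction (a Python sketch under illustrative assumptions: AST tuples of the same shape as the earlier sketches, extended with string names for variables; real compilers apply such rewrites with per-target cost models): multiplications by a power of two are rewritten into cheaper shift operations.

```python
# Strength reduction: rewrite x * 2^k into x << k in an expression AST.

def is_power_of_two(n):
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0

def strength_reduce(node):
    if isinstance(node, (int, str)):               # constant or variable
        return node
    op, left, right = node
    left, right = strength_reduce(left), strength_reduce(right)
    if op == "*" and is_power_of_two(right):
        return ("<<", left, right.bit_length() - 1)   # x * 2^k  ->  x << k
    if op == "*" and is_power_of_two(left):
        return ("<<", right, left.bit_length() - 1)
    return (op, left, right)

# (x * 8) + (3 * y)  ->  ('+', ('<<', 'x', 3), ('*', 3, 'y'))
print(strength_reduce(("+", ("*", "x", 8), ("*", 3, "y"))))
```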