User Tools

Site Tools


cc18:assignment_4

Name Analysis

Introduction

This assignment is a part of the semantic analysis phase of the compiler that you construct in this course. You will add name analysis to your Extended MiniJava compiler in this assignment. Name analyzer can significantly simplify implementing the type checker. Part of your tasks for this assignment is to fix any problems that you still have in your previous phases (if any). The goal of name analysis is twofold: we want to reject programs which contain certain types of errors, and we want to associate symbols to all identifiers. If you have any questions about this assignment please do not hesitate to ask the instructor.

Symbols

Symbols are values which uniquely identify all class, method and variable names in a program, by mapping their (multiple) occurrences to their (unique) definition. Identifiers are already present in the AST and contain names as well, but these are not sufficient to distinguish e.g. between a class member and a local variable with the same name. This week you will add this missing information to the ASTs.

Rejecting erroneous programs

In the process of mapping occurrences to definitions, the compiler will be able to detect the following kinds of errors in faulty programs:

  • a class is defined more than once
  • two class members in a class have the same name
  • a class member has the same name as one in an inherited class (field overriding is forbidden in Extended MiniJava)
  • a method is overloaded (forbidden in Extended MiniJava)
  • a method overrides another with a different number of parameters
  • in a method, two parameters/local variables have the same name
  • a class name is used (as parent class or type, for instance) but is not declared
  • an identifier is used as a variable but not declared
  • the inheritance graph has a cycle (eg. “class A extends ; B {} class B extends A {}”)
  • this is referenced from the main function (this is disallowed in a static method)

Note: We cannot check for the following errors yet, because we need type checking for them:

  • method calls correspond to methods defined in the proper class
  • each overriding method has compatible argument and return types with the method it is overriding

Symbols as scopes

Scopes in Extended MiniJava are only of three kinds:

  1. the global scope (the set of declared classes)
  2. the class scopes (all members and methods within a class, plus the global scope)
  3. the method scopes (the parameters and local variables, plus the corresponding class scope)

This in fact defines a hierarchy among symbols:

  • all class symbols are defined in the global scope
  • all methods are defined in a class scope
  • variables are defined either in a method scope, as arguments or locals, or in a class scope

Pretty-Printing

As your compiler grows, testing your compiler can become a challenge. Sometimes you may encounter a bug in your compiler implementation that you cannot immediately understand from which phase the incorrect behavior comes from. A good way to debug your compiler is to write a pretty-printer for you AST. This can be done in a similar way as the pretty printing worked for the while language in the first assignment. After your compiler constructs an AST for a given input and attaches symbols to its identifiers, a call to the pretty pretty-printer will print the AST as a eMiniJava program. As we discussed in Lecture 19, in static scoping we can rename variables to avoid any shadowing (make all vars unique). So when pretty printing the current program:

class Example {
  public static void main(String [] args) {
    System.out.println(new B().foo());
  }
}
 
class B extends A {
  public int foo() {
    val = 42;
    return val;
  }
}
 
class A {
  int val;
  public int foo() {
    boolean val;
    val = false;
    return 41;
  }
}

The analyzer should print an output like this:

class Example_0_ {
  public static void main(String[] args_1_) {
    System.out.println((new B_2_().foo_#error#_()));
  }
}
 
class B_2_ extends A_3_ {
  public int foo_7_() {
    val_4_ = 42;
    return val_4_;
  }
}
 
class A_3_ {
  int val_4_;
  public int foo_5_() {
    boolean val_6_;
    val_6_ = false;
    return 41;
  }
}

Note that:

  • Overriding methods have a different symbol than their overridden counterparts.
  • Method names in method calls are unresolved symbols.

Your task

Here is how we recommend you proceed for the implementation:

  1. Collect all symbols: create the symbol class instances, and attach them to definitions of the main object, classes, methods, variables and formal parameters.
  2. Attach the proper symbol to the occurrences of the identifiers and this. Make sure to attach symbols to all identifiers, whether they are in expressions, statements, or type trees.
  3. Make sure that you emit errors and warnings when appropriate. Unlike the parser, where it is hard to continue parsing when there is an error in the program, name analyzer can almost always continue when it encounters a user error, and possibly report more errors later in the program (much like during lexing). So do not exit with fatal error in this phase; the name analyzer should report as many number of errors as it can.

Constraints

Here are all the constraints that your analyzer should enforce:1)

Variable declarations
  • No two variables can have the same name in the same scope, unless one of the two cases of shadowing occurs.
  • All variables used must be declared.
Shadowing

Shadowing can occur in two different situations:

  1. a local variable in a method can shadow a class member
  2. a method parameter can shadow a class member

All other types of shadowing are not allowed in Extended MiniJava.

Classes
  • Classes must be defined only once.
  • When a class is declared as extending another one, the other class must be declared and cannot be the main class.
  • The transitive closure of the “extends” relation must be irreflexive (no cycles in the inheritance graph).
  • When a class name is used as a type, the class must be declared. The main class cannot be used as a type.
Overloading
  • Overloading is not permitted:
    • In a given class, no two methods can have the same name.
    • In a given class, no method can have the same name as another method defined in a super class, unless overriding applies.
Overriding
  • A method in a given class overrides another one in a super class if they have the same name and the same number of arguments.2)
  • Fields can not be overridden.
Reference to This
  • this is only referred to from a class method (as opposed to the main method)

Interface

Since we are not imposing any code structure for your programs, it is extremely important for your compiler to exactly follow a fixed interface as described here. This allows us to uniformly run and test all the projects from different groups of students. Command-line is the primary interface for your users to interact with your compiler. A general form for the command-line interface is as follows:

emjc [options] <source file>

For this phase of assignment, there are five possible options. If the user does not provide a correct option, the compiler treats it the same as the –help option.

  1. ––help: Prints a synopsis of options
  2. ––pp: Pretty-prints the input file to the standard output
  3. ––lex: Generates output from lexical analysis as described in Assignment 2.
  4. ––ast: Generates output from syntactic analysis as described in Assignment 3.
  5. ––name: Generates output from name analysis

After executing the compiler with the option ––name for the source file filename.emj the compiler either accepts the program as a good eMiniJava program with printing out the following line:

Valid eMiniJava Program

Or, it prints out a set of errors (such as unknown identifier). Like the previous phases the format of an error message is the following:

<line>:<column> error:<description>

where <line> and <column> indicate the beginning position of the error, and <description> details the error.

Benchmarks Corpus

Please ensure to test your code extensively before submitting your code. You can compare the outputs of your submission with other groups, but sharing code is forbidden among groups. The benchmarks are available in the following folder which is accessible from the CS Department Linux systems (e.g., queeg.cs.rit.edu and ICLs 1,2, and 3).

/usr/local/pub/hh/cc/benchmarks

Deadline and Deliverables

The deadline of the this assignment is March 21st at 11:59pm (https://mycourses.rit.edu). Please upload all your files as a single zip file. You should include a readme.txt file in your package to describe how your source code can be compiled and built. Ideally your project should include a user-friendly compiling technique with Makefile, Ant or similar tools.

When grading, we are serious about simple programming errors. You are writing a compiler to check the programs of other people, so your compiler should not have simple errors itself! Your compiler should never crash for an incorrect input. We expect your compiler to give a comprehensive error when the user does not provide a valid input to it.

1)
Note that this is simply a reformulation of the types of errors we want to catch.
2)
Of course this constraint will be tightened once we start checking types.
cc18/assignment_4.txt · Last modified: 2018/02/28 21:22 by hossein