# Lexical Analysis

## Introduction

This assignment is the first phase of the compiler that you will construct in this course. The language you will compile is Extended MiniJava. Make sure you understand the language and its constructs before you start implementing the project. If you have any questions about the language, please do not hesitate to ask the instructor.

### Benchmarks Corpus

All benchmarks will be shared in the following folder, which is accessible from the CS Department Linux systems (e.g., queeg.cs.rit.edu and ICLs 1, 2, and 3). The folder also contains benchmarks written by previous students of the compiler construction course; you can look at those programs to get an idea of how Extended MiniJava (eMiniJava) programs are written.

/usr/local/pub/hh/cc/benchmarks

Please submit your benchmark files (eMiniJava programs) with the extension .emj. Use meaningful names for your test files, not test.emj or mybenchmark.emj. For example, if you implement bubble sort, name the file bubblesort.emj or bubble-sort.emj. Please put a comment on the first line of each test file with the names of the people in your group. This will help us contact you if there is a problem with any of your programs.

//author: Jack Sparrow & Alice Wonderland

Write the lexer for eMiniJava. As we discussed in Lecture 3, the main way to describe the tokens of a language is with regular expressions. After writing the regular expression description of the tokens, you can construct the lexer manually (Lecture 4) or automatically (Lecture 5). In this phase you may choose either of the following techniques.

• Manual: convert your regular expressions to code directly. Use the $\text{FIRST}$ sets of the regular expressions to decide between alternatives in a token description.
• Automatic: give your regular expressions to JFlex and let the tool automatically generate a lexer for you.
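To make the manual option more concrete, here is a minimal sketch of a hand-written lexer that picks a branch by looking at the first character of the remaining input, which is the role the $\text{FIRST}$ sets play. It covers only a tiny, illustrative subset. The class name, the `EQUALS` token name, and the identifier rule (ASCII letter followed by letters, digits, or underscores) are assumptions made for this example and are not taken from the eMiniJava specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the class name, the EQUALS token, and the identifier
// rule are assumptions for illustration, not taken from the language spec.
final class FirstSetLexerSketch {

    // Lexes a tiny subset: identifiers, integer literals, '=' and '=='.
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace between tokens
            } else if (isAsciiLetter(c)) {
                // A letter can only start an identifier in this subset.
                int start = i;
                while (i < src.length() && isIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add("ID(" + src.substring(start, i) + ")");
            } else if (isAsciiDigit(c)) {
                // A digit can only start an integer literal.
                int start = i;
                while (i < src.length() && isAsciiDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("INTLIT(" + src.substring(start, i) + ")");
            } else if (c == '=') {
                // '=' and '==' share a first character, so peek one ahead.
                if (i + 1 < src.length() && src.charAt(i + 1) == '=') {
                    tokens.add("EQUALS()");
                    i += 2;
                } else {
                    tokens.add("EQSIGN()");
                    i++;
                }
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isIdentifierPart(char c) {
        return isAsciiLetter(c) || isAsciiDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        System.out.println(lex("x1 = 42")); // prints [ID(x1), EQSIGN(), INTLIT(42)]
    }
}
```

This sketch does not distinguish keywords from identifiers and does not track positions; a real lexer for this assignment would need both.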

Although we highly recommend using the official language of the course (Java) to implement this phase, you may pick another language if you are more productive with it.

### Interface

Since we are not imposing any code structure on your programs, it is extremely important that your compiler follows exactly the fixed interface described here. This allows us to run and test the projects from all groups uniformly. The command line is the primary interface through which users interact with your compiler. As your compiler matures, its command-line interface will support more options. The general form of the command-line interface is as follows:

emjc [options] <source file>

For this phase of the assignment, the only possible option is --lex. For example,

emjc --lex filename.emj

After the command above is executed, an output file named filename.lexed is generated containing the result of lexing the source file. Each line of the output file corresponds to one token in the source file and has the following format: <line>:<column> <token-type> where <line> and <column> indicate the position at which the token begins, and <token-type> is one of the token types of eMiniJava. For example, if the input file contains only the following line:

x = this.func(x - 1);

The content of the generated lexed file is the following:

1:1 ID(x)
1:3 EQSIGN()
1:5 THIS()
1:9 DOT()
1:10 ID(func)
1:14 LPAREN()
1:15 ID(x)
1:17 MINUS()
1:19 INTLIT(1)
1:20 RPAREN()
1:21 SEMICOLON()
1:22 EOF()
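The output conventions illustrated above can be sketched as two small helpers: one derives the name of the .lexed file from the source file name, and one formats a single output line. The class and method names are hypothetical, and appending .lexed to a name that does not end in .emj is an assumption, since the assignment does not say what should happen in that case.

```java
// Hypothetical helpers: names and the no-extension behavior are assumptions.
final class LexedOutputSketch {

    // "filename.emj" -> "filename.lexed"; other names just get ".lexed" appended.
    static String outputNameFor(String sourceName) {
        String suffix = ".emj";
        String base = sourceName.endsWith(suffix)
                ? sourceName.substring(0, sourceName.length() - suffix.length())
                : sourceName;
        return base + ".lexed";
    }

    // Formats one output line as <line>:<column> <token-type>,
    // where the token type carries its text in parentheses, e.g. ID(x).
    static String formatToken(int line, int column, String type, String text) {
        return line + ":" + column + " " + type + "(" + text + ")";
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("filename.emj"));   // prints filename.lexed
        System.out.println(formatToken(1, 10, "ID", "func")); // prints 1:10 ID(func)
    }
}
```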

When reporting the positions of tokens, consider tab (\t) as 4 spaces.
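One way to track positions under this rule is sketched below. It assumes that a tab always advances the column by exactly 4 (rather than to the next tab stop) and that only \n ends a line; both are interpretations, so it is worth comparing against the reference compiler. The class and method names are hypothetical.

```java
// Hypothetical sketch of 1-based line/column tracking.
// Assumption: a tab advances the column by exactly 4, not to the next tab stop.
final class PositionTrackerSketch {
    static final int TAB_WIDTH = 4; // from the assignment: a tab counts as 4 spaces

    private int line = 1;
    private int column = 1;

    // Updates the position after consuming one character of input.
    void advance(char c) {
        if (c == '\n') {
            line++;
            column = 1;
        } else if (c == '\t') {
            column += TAB_WIDTH;
        } else {
            column++;
        }
    }

    // Current position in the <line>:<column> form used by the output file.
    String position() {
        return line + ":" + column;
    }

    // Position of the character at the given offset in src.
    static String positionOf(String src, int offset) {
        PositionTrackerSketch tracker = new PositionTrackerSketch();
        for (int i = 0; i < offset; i++) {
            tracker.advance(src.charAt(i));
        }
        return tracker.position();
    }

    public static void main(String[] args) {
        System.out.println(positionOf("\tx", 1)); // prints 1:5
    }
}
```

Under these assumptions, an identifier that follows a single leading tab is reported at column 5, the same column it would have after four leading spaces.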

### Reference Compiler

There is a reference compiler (compiler.jar), written by the grader of the course (Bhavin Navin Shah, bns8487@rit.edu), in the following folder.

/usr/local/pub/hh/cc/

If you find behavior in the reference compiler that you think does not match the description of Extended MiniJava (eMiniJava), please feel free to send Bhavin an email. To check the validity of your benchmarks for the first task, you can run this compiler as follows, for example:

java -jar compiler.jar --type test.emj