Assignment 2: Pascal

Background

The symbol table is a central data structure in compilers. It associates names (arbitrary strings) with values. In a compiler, a name's value records whether the name denotes a type, a reserved word, a constant, a variable, or a procedure, along with other significant information. The standard organization of a symbol table is a hash table, and the standard organization of that hash table is external chaining: the table is an array of linked lists, and all names that hash to the same value are placed on the same linked list.
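A minimal sketch of external chaining in classic Pascal follows. It assumes names are blank-padded to a fixed width of 8 characters purely to keep the example short (the assignment itself asks for a string space instead), and it stores only a count as each name's value. The names ChainDemo, Hash, Lookup, Add and TableSize, the multiplier 31, and the sample words are all made up for illustration.

```pascal
program ChainDemo(output);
{ External chaining: table[h] heads a linked list of every name hashing to h.
  Assumption: names are blank-padded to NameLen characters (demo only). }
const
  TableSize = 13;                 { hypothetical size; a prime is customary }
  NameLen = 8;
type
  Name = packed array[1..NameLen] of char;
  NodePtr = ^Node;
  Node = record
    key: Name;
    count: integer;               { stand-in for the entry's value }
    next: NodePtr                 { next entry on the same chain }
  end;
var
  table: array[0..TableSize - 1] of NodePtr;
  b: integer;
  p: NodePtr;

{ Polynomial hash of the characters, reduced modulo TableSize at each step. }
function Hash(s: Name): integer;
var
  h, k: integer;
begin
  h := 0;
  for k := 1 to NameLen do
    h := (h * 31 + ord(s[k])) mod TableSize;
  Hash := h
end;

{ Return the entry for s, creating it at the head of its chain if absent. }
function Lookup(s: Name): NodePtr;
var
  h: integer;
  q: NodePtr;
  found: boolean;
begin
  h := Hash(s);
  q := table[h];
  found := false;
  { Classic Pascal may evaluate both operands of "and", so a flag is used
    rather than testing q^.key in the loop condition. }
  while (q <> nil) and not found do
    if q^.key = s then
      found := true
    else
      q := q^.next;
  if not found then
  begin
    new(q);
    q^.key := s;
    q^.count := 0;
    q^.next := table[h];
    table[h] := q
  end;
  Lookup := q
end;

{ Record one occurrence of s. }
procedure Add(s: Name);
var
  q: NodePtr;
begin
  q := Lookup(s);
  q^.count := q^.count + 1
end;

begin
  for b := 0 to TableSize - 1 do
    table[b] := nil;
  Add('apple   ');
  Add('pear    ');
  Add('apple   ');
  p := Lookup('apple   ');
  writeln('apple seen ', p^.count:1, ' times')
end.
```

Inserting new entries at the head of a chain makes insertion constant-time once the chain has been searched; the expected chain length is roughly the number of distinct names divided by TableSize.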

The assignment

Write a Pascal program that reads an integer n followed by n space-delimited strings. Store these strings in a symbol table. The information to store with each name (that is, its value) is:

  1. The serial number of the string. The first string you read has serial number 1. If you see the same string more than once, its stored serial number does not change. However, every string you read, including a duplicate, advances the serial number for strings that appear later in the input; in effect, a string's serial number is the position of its first occurrence in the input.
  2. The number of times the string has appeared.
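The update rule for these two values can be sketched as follows. This is only an illustration under a simplifying assumption: names are single lowercase letters, so a plain array indexed by the character stands in for the hashed symbol table. The program name, the Info record and the input x y x z x are hypothetical.

```pascal
program SerialDemo(output);
{ Per-name value: serial number of first occurrence, and occurrence count.
  Assumption: one-letter names, so an array indexed by char replaces the
  real symbol table. }
type
  Info = record
    serial: integer;   { serial number of the first occurrence }
    count: integer     { occurrences so far; 0 means not yet seen }
  end;
var
  names: packed array[1..5] of char;
  table: array['a'..'z'] of Info;
  c: char;
  position, k: integer;
begin
  names := 'xyxzx';    { hypothetical input: x y x z x }
  for c := 'a' to 'z' do
  begin
    table[c].serial := 0;
    table[c].count := 0
  end;
  position := 0;
  for k := 1 to 5 do
  begin
    position := position + 1;        { every string, even a duplicate, advances this }
    c := names[k];
    if table[c].count = 0 then
      table[c].serial := position;   { only the first occurrence records it }
    table[c].count := table[c].count + 1
  end;
  for c := 'a' to 'z' do
    if table[c].count > 0 then
      writeln(c, ' serial ', table[c].serial:1, ' count ', table[c].count:1)
end.
```

Note that z receives serial number 4, not 3, because the duplicate x in position 3 still advanced the serial number.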

Strings can be of arbitrary length, but the total size of all unique strings will not exceed 10000 characters. Your program should be space-efficient; I suggest you set aside a character array of length 10000 as a "string space" and point symbol-table entries to that space instead of storing strings directly in the symbol-table entries.
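One way to realize the suggested string space is sketched below: each unique string is copied once into a single large character array, and an entry keeps only a (start, length) pair into it. This is a sketch, not a required design; it assumes words arrive in a small fixed-size buffer (8 characters here, for the demo only), and the names Store, SameAs, Show, SetWord, Ref and WordBuf are hypothetical.

```pascal
program StringSpaceDemo(output);
{ Shared string space: each unique string is stored once, back to back,
  and an entry keeps only (start, len) into it.
  Assumption: a word is first read into the buffer w (at most MaxWord chars). }
const
  SpaceSize = 10000;            { total size of all unique strings, per the spec }
  MaxWord = 8;                  { demo-only limit on a single word }
type
  Padded = packed array[1..MaxWord] of char;
  WordBuf = record
    len: integer;               { characters of ch actually in use }
    ch: Padded
  end;
  Ref = record
    start, len: integer         { location of a stored string in space }
  end;
var
  space: packed array[1..SpaceSize] of char;
  used: integer;                { characters of space filled so far }
  w: WordBuf;
  a, b: Ref;

{ Demo helper: load the first n characters of s into the buffer w. }
procedure SetWord(s: Padded; n: integer);
begin
  w.ch := s;
  w.len := n
end;

{ Append the buffered word to the string space and return its location.
  The spec guarantees the total never exceeds SpaceSize, so no overflow check. }
procedure Store(var r: Ref);
var
  k: integer;
begin
  r.start := used + 1;
  r.len := w.len;
  for k := 1 to w.len do
    space[used + k] := w.ch[k];
  used := used + w.len
end;

{ True if the buffered word equals the stored string r. }
function SameAs(r: Ref): boolean;
var
  k: integer;
  same: boolean;
begin
  same := w.len = r.len;
  k := 1;
  while same and (k <= w.len) do
  begin
    same := w.ch[k] = space[r.start + k - 1];
    k := k + 1
  end;
  SameAs := same
end;

{ Print a stored string and where it lives in the string space. }
procedure Show(r: Ref);
var
  k: integer;
begin
  for k := r.start to r.start + r.len - 1 do
    write(space[k]);
  writeln(' at ', r.start:1, ', length ', r.len:1)
end;

begin
  used := 0;
  SetWord('begin   ', 5);
  Store(a);
  SetWord('end     ', 3);
  Store(b);
  Show(a);
  Show(b);
  SetWord('end     ', 3);
  if SameAs(b) and not SameAs(a) then
    writeln('end already stored; used = ', used:1)
end.
```

In the full program, a symbol-table entry would hold a Ref in place of the string itself, and SameAs would serve as the key comparison while walking a hash chain.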

After reading in the data, your program should print this information:

  1. How many distinct entries there are in the symbol table.
  2. A list of all strings you have seen, along with the serial number and the count of times you have seen each string. The list should be sorted primarily in decreasing order of count and secondarily (to break ties) in increasing order of serial number.
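The required ordering can be expressed as a single comparison, shown here driving an insertion sort. This is a sketch with made-up entries and one-character names; Before, SetEntry and the data are hypothetical.

```pascal
program SortDemo(output);
{ Sort by decreasing count, breaking ties by increasing serial number.
  Assumption: hypothetical entries with one-character names. }
const
  N = 4;
type
  Entry = record
    name: char;
    serial, count: integer
  end;
var
  e: array[1..N] of Entry;
  i, j: integer;
  t: Entry;
  done: boolean;

{ True if x belongs before y: higher count first, then lower serial. }
function Before(x, y: Entry): boolean;
begin
  if x.count <> y.count then
    Before := x.count > y.count
  else
    Before := x.serial < y.serial
end;

procedure SetEntry(k: integer; nm: char; s, c: integer);
begin
  e[k].name := nm;
  e[k].serial := s;
  e[k].count := c
end;

begin
  SetEntry(1, 'x', 1, 3);
  SetEntry(2, 'y', 2, 1);
  SetEntry(3, 'z', 4, 1);
  SetEntry(4, 'w', 5, 2);
  { Insertion sort; the done flag avoids touching e[0], since classic
    Pascal need not short-circuit "and". }
  for i := 2 to N do
  begin
    t := e[i];
    j := i;
    done := false;
    while (j > 1) and not done do
      if Before(t, e[j - 1]) then
      begin
        e[j] := e[j - 1];
        j := j - 1
      end
      else
        done := true;
    e[j] := t
  end;
  for i := 1 to N do
    writeln(e[i].name, ' ', e[i].serial:1, ' ', e[i].count:1)
end.
```

Because serial numbers are unique, Before is a strict total order, so any correct sort produces the same list. Insertion sort is quadratic; for large inputs a merge sort or heapsort using the same comparison would scale better.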

Test your program both on your own data and on the data in http://www.cs.uky.edu/~raphael/courses/CS450/asg.pascal.data.

Logistics

The Pascal implementation you will use is called p2c. It is available in the Multilab and the CSLab. It is a preprocessor that translates Pascal programs into C, which you then compile with a C compiler. Assuming you have written a program in prog2.p, here are commands that will compile and run it:

    p2c prog2.p
    gcc prog2.c -o prog2 -lp2c
    ./prog2

The -o flag tells the compiler where to put its output. The -lp2c flag asks the compiler to tell the linker to include the library that p2c needs. You should embed such commands in your Makefile.

Restrict your code to classic Pascal. If you have access to the gpc compiler, use it with these flags:

-Wall -g --no-extended-syntax --classic-pascal

Extra credit

  1. Devise an efficient way to find all strings that have been seen exactly j times, for arbitrary j.
  2. Remove the restriction that the total string space is limited to 10000 characters.
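For the first extra-credit item, one possible approach (not the only one) is to keep every entry on a doubly linked list for its current count, so that the strings seen exactly j times are simply the list bucket[j]. The sketch below assumes one-character names and that no count exceeds a fixed MaxCount; all names in it are hypothetical.

```pascal
program CountBuckets(output);
{ Each entry sits on a doubly linked list for its current count, so the
  strings seen exactly j times are the list bucket[j].
  Assumptions: one-character names, and no count ever exceeds MaxCount. }
const
  MaxCount = 100;
type
  NodePtr = ^Node;
  Node = record
    name: char;
    count: integer;
    prev, next: NodePtr         { neighbours on bucket[count] }
  end;
var
  bucket: array[1..MaxCount] of NodePtr;
  x, y, z: NodePtr;
  j: integer;

{ Put p at the head of the list for its current count. }
procedure Push(p: NodePtr);
begin
  p^.prev := nil;
  p^.next := bucket[p^.count];
  if p^.next <> nil then
    p^.next^.prev := p;
  bucket[p^.count] := p
end;

{ Remove p from the list for its current count. }
procedure Unlink(p: NodePtr);
begin
  if p^.prev = nil then
    bucket[p^.count] := p^.next
  else
    p^.prev^.next := p^.next;
  if p^.next <> nil then
    p^.next^.prev := p^.prev
end;

{ Create an entry that has been seen once. }
procedure Make(var p: NodePtr; c: char);
begin
  new(p);
  p^.name := c;
  p^.count := 1;
  Push(p)
end;

{ One more sighting: move p from bucket[count] to bucket[count + 1]. }
procedure Bump(p: NodePtr);
begin
  Unlink(p);
  p^.count := p^.count + 1;
  Push(p)
end;

{ Print every name seen exactly k times. }
procedure ListSeen(k: integer);
var
  p: NodePtr;
begin
  write('count ', k:1, ':');
  p := bucket[k];
  while p <> nil do
  begin
    write(' ', p^.name);
    p := p^.next
  end;
  writeln
end;

begin
  for j := 1 to MaxCount do
    bucket[j] := nil;
  { the made-up input x y x z x }
  Make(x, 'x');
  Make(y, 'y');
  Bump(x);
  Make(z, 'z');
  Bump(x);
  for j := 1 to 3 do
    ListSeen(j)
end.
```

Each Bump takes constant time, and listing count j takes time proportional to the number of strings printed. Each list is in most-recently-changed order, not serial order.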

Due date

This assignment is due at the start of class on the day indicated in the syllabus. See the syllabus for the late policy. Submit the assignment by email to raphael@cs.uky.edu.