Projects are designed to test your mastery of course material as well as your programming skills; think of them as “take-home exams” and don’t communicate with anyone about possible solutions. This project focuses on development tools for the SCRAM architecture.
gcov, etc. as part of your development process.
Your first job is to write a SCRAM disassembler called
dis that can be used to comfortably examine a SCRAM object file.
You can see what this means by looking at the example files in the archive
posted on Piazza.
The loop.scram object file contains a sequence of “raw” SCRAM instructions,
the very bit patterns that would be in the memory (and thus the instruction
register) of a SCRAM machine.
So that’s a mess of zeros and ones.
Without a specialized tool such as dis, the only sensible way to look at this
file is using a hexdump tool like xxd:

$ ls -la loop.scram
-rw-rw-r-- 1 phf phf 6 May 26 2015 loop.scram
$ xxd -g 1 loop.scram
0000000: 14 55 34 70 00 01  .U4p..
$ xxd -b loop.scram
0000000: 00010100 01010101 00110100 01110000 00000000 00000001  .U4p..

As you can see, the object file contains only 6 bytes.
The first xxd command displays those 6 bytes in hexadecimal notation, the
second in binary notation.
Now look at that first byte carefully:
It’s actually an LDA instruction that loads from address 4!
(Recall how the instruction encoding of the SCRAM works: upper 4 bits for the
opcode, lower 4 bits for the address.)
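
The split into opcode and address can be sketched in C with a shift and a mask. The function names below are illustrative only; they are not taken from the sources in the archive.

```c
#include <stdint.h>

/* Upper 4 bits of an instruction byte: the opcode. */
static unsigned scram_opcode(uint8_t byte)
{
    return (byte >> 4) & 0x0Fu;
}

/* Lower 4 bits of an instruction byte: the address. */
static unsigned scram_address(uint8_t byte)
{
    return byte & 0x0Fu;
}
```

For the first byte of loop.scram, 0x14, this yields opcode 1 and address 4.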
The SCRAM disassembler translates those bytes back into SCRAM assembly
language, at least approximately:

$ ./dis <loop.scram
0: LDA 4
1: ADD 5
2: STA 4
3: JMP 0
4: HLT 0
5: HLT 1

Don’t get confused by the HLT lines at addresses 4 and 5.
The disassembler is allowed to be a bit simplistic:
It doesn’t have to figure out that addresses 4 and 5 are not actually used as
instructions in this program, they only hold data.
(Note that the presence of indirect addressing on the SCRAM would indeed make
that very hard to figure out in general.)
Instead it always assumes an instruction even if the SCRAM never fetches
that location for execution.
As you saw above, the disassembler reads SCRAM object code from standard input and writes the disassembly to standard output. Of course it has no way of knowing whether it is really looking at SCRAM object code or not:

$ ./dis
Hi!
0: STI 8
1: SUB 9
2: LDI 1
3: HLT a

Here the user typed Hi! followed by the RETURN key and CTRL-D to end the
input.
The disassembler reads byte after byte and dutifully prints the equivalent
instruction.
It’s all just zeros and ones after all!
Of course there might be inputs that don’t actually correspond to any valid
SCRAM instruction, but
dis simply prints
??? for an unknown opcode:

$ ./dis
Hö!
0: STI 8
1: ??? 3
2: ??? 6
3: LDI 1
4: HLT a

(If you want to understand why
ö results in two bytes, you’ll need to learn
about the UTF-8 encoding of
Unicode, something that has nothing
to do with this project.)
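
Printing one line of disassembly could look roughly like the sketch below. The mnemonic table is an assumption: it is inferred only from the sample outputs in this handout and covers just the opcodes that appear there, so treat the table in opc.c as authoritative. The name format_line and the exact spacing are hypothetical as well.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical table inferred from the sample outputs in this handout.
   Opcodes that do not appear there are left NULL here; the real table
   is the one in opc.c. */
static const char *const mnemonics[16] = {
    "HLT", "LDA", "LDI", "STA", "STI", "ADD", "SUB", "JMP"
};

/* Format one line such as "0: LDA 4" into out (addresses in hex). */
static void format_line(char *out, size_t size, unsigned addr, uint8_t byte)
{
    const char *name = mnemonics[(byte >> 4) & 0x0F];
    snprintf(out, size, "%x: %s %x",
             addr & 0x0Fu, name != NULL ? name : "???", byte & 0x0Fu);
}
```

With this table, the byte 0xc3 (the first byte of ö in UTF-8) at address 1 is formatted as "1: ??? 3".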
For the empty input
dis should print nothing at all.
There is only one “error condition” for
dis namely that the input is longer
than 16 bytes, the capacity of the SCRAM:

$ ./dis
This will be too long for dis!
0: ADD 4
1: SUB 8
2: SUB 9
3: JMP 3
4: LDI 0
5: JMP 7
6: SUB 9
7: SUB c
8: SUB c
9: LDI 0
a: SUB 2
b: SUB 5
c: LDI 0
d: JMP 4
e: SUB f
f: SUB f
dis: Program too long, truncated to 16 bytes.

That’s all you need to know to write your
dis.c implementation, always
assuming you also read the source code we gave you.
Just be careful not to simply clone
sim.c because it does way more than
what is needed here.
If you hand in a program that’s overly complicated we might take points off.

Nobody wants to write SCRAM programs by hand-crafting byte sequences like the
ones above.
Instead, we’d like to at least be able to write something like this (see
loop.z in the archive):

LDA 4
ADD 5
STA 4
JMP 0
DAT 0
DAT 1

The assembler is a program that translates a textual description like the above into an “equivalent” 6-byte SCRAM object file. It does so by
Note the pseudo-instruction DAT.
This is not a SCRAM instruction, rather it’s a way for the programmer to
tell the assembler that a given byte is supposed to hold a certain data value.
While the addresses after opcodes such as JMP can only be 4 bits long, the
value after DAT can be up to 8 bits long since that value is written directly
into the object file without a 4 bit opcode before it!
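
The difference between an instruction byte and a DAT byte can be sketched as follows. The function names are illustrative, and both functions assume their arguments have already been range-checked by the caller.

```c
#include <stdint.h>

/* Pack a 4-bit opcode and a 4-bit address into one instruction byte.
   Assumes opcode and address are both already known to be in 0..15. */
static uint8_t scram_encode(unsigned opcode, unsigned address)
{
    return (uint8_t)(((opcode & 0x0Fu) << 4) | (address & 0x0Fu));
}

/* A DAT value is written as-is, so all 8 bits are available.
   Assumes value is already known to be in 0..255. */
static uint8_t scram_data(unsigned value)
{
    return (uint8_t)(value & 0xFFu);
}
```

Going by the hexdump of loop.scram, LDA has opcode 1, so "LDA 4" becomes 0x14, while "DAT 1" is simply the byte 0x01.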
However, even that notation is not really comfortable because the
programmer still has to manually track the addresses of all the various
instructions and data bytes.
What we would really like the assembler to take as input is a file like
this (see loop.s in the archive):

# Simple counter program in SCRAM assembly.
start:  LDA count
        ADD one
        STA count
        JMP start
count:  DAT 0    # counter variable
one:    DAT 1    # constant 1

There are two innovations here: comments that can be used to explain pieces of a program, and labels that can be used to automate address computations. Most assembly languages, including this one, are line-oriented, which means that the assembler processes them line by line. In general, the structure of a line is as follows:
label: instruction # comment
All of the components are optional: we can have lines that are empty, lines that are only comments, lines that only define a label, and lines that only contain an instruction; or any combination thereof.
Comments start with the character “#” and continue to the end of the current
line; anything between “#” and the end of the line is ignored by the
assembler.
Instructions consist of OPCODE address pairs separated by whitespace; to the
list of actual SCRAM opcodes we add the DAT pseudo-opcode as described above;
opcodes are always fully capitalized; addresses can be non-negative integers or
labels.
(Note that for the HLT instruction we could leave out the address part
(why?); however, in the interest of making the assembler a little easier to
write, we still require an address even for HLT.)
Label definitions are sequences of letters (upper or lower case, but case
matters!) that end with a colon; if a label definition is present, it has to
come before the instruction in the same line (if any).
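
Taking a line apart in this order (comment first, then label, then instruction) might be sketched as below. Both helpers are hypothetical and modify the line buffer in place. They do not check that a label consists only of letters, and they do not trim whitespace between a label and its colon; a real assembler would have to handle both.

```c
#include <string.h>

/* Cut the line off at the first '#', so the comment is ignored. */
static void strip_comment(char *line)
{
    char *hash = strchr(line, '#');
    if (hash != NULL) {
        *hash = '\0';
    }
}

/* If the line defines a label, terminate the label text at the ':' and
   store a pointer to it in *label; otherwise store NULL in *label.
   Returns a pointer to the rest of the line (the instruction, if any). */
static char *split_label(char *line, char **label)
{
    char *colon = strchr(line, ':');
    if (colon == NULL) {
        *label = NULL;
        return line;
    }
    *colon = '\0';
    *label = line + strspn(line, " \t");   /* skip leading whitespace */
    return colon + 1;
}
```

Calling strip_comment before split_label matters: a ':' inside a comment must not be mistaken for a label definition.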
The way addresses are assigned to labels is straightforward:
We start at address 0 for the first instruction; each (pseudo-)instruction will
advance the current address by 1 since each of them occupies 1 byte of memory.
So in the program above, the start label is 0 because there are no
instructions preceding it.
The instructions following (LDA, ADD, STA, JMP) are each 1 byte long,
meaning that the count label is address 4 whereas the one label is
address 5.
The only problem with all of this is that a label may be referenced before it
has been defined as is the case in the example program:
When we process “LDA count” for the first time, we don’t know yet that
count is actually 4.
Most assemblers use a two-pass process to get around this:
In the first pass they only process the addresses and not the instructions
themselves, meaning the first pass determines what all the labels will be but
doesn’t actually generate the finished object file yet.
The second pass then uses this information to fill in the correct bit patterns
for all instructions and write the final object file.
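
The information handed from the first pass to the second is essentially a table mapping label names to addresses. A minimal sketch of such a table follows; the names and the fixed capacity MAX_LABELS are assumptions made for illustration, not requirements of the assignment.

```c
#include <string.h>

#define MAX_LABELS 64   /* arbitrary capacity, an assumption of this sketch */
#define LABEL_LEN  32   /* maximum label length allowed by the assignment */

struct label {
    char name[LABEL_LEN + 1];
    int address;
};

struct label_table {
    struct label entries[MAX_LABELS];
    int count;
};

/* Second pass: look up a label. Returns its address, or -1 if undefined. */
static int label_find(const struct label_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            return t->entries[i].address;
        }
    }
    return -1;
}

/* First pass: record a label at the current address. Returns 0 on success,
   -1 if the label is too long, already defined, or the table is full. */
static int label_define(struct label_table *t, const char *name, int address)
{
    if (strlen(name) > LABEL_LEN || t->count >= MAX_LABELS) {
        return -1;
    }
    if (label_find(t, name) >= 0) {
        return -1;
    }
    strcpy(t->entries[t->count].name, name);
    t->entries[t->count].address = address;
    t->count++;
    return 0;
}
```

The first pass would call label_define for every label definition while counting addresses; the second pass would call label_find for every label used as an address and report an error when it returns -1.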
Your job is to write the SCRAM assembler.
So you need to write a program that given something like
loop.s on the
standard input will produce a SCRAM object file like
loop.scram on the
standard output.
Nothing more and nothing less.
We suggest that you first write a version of the assembler that works only for
inputs like loop.z and then extend it with comments and labels (and empty
lines!) so it can also process inputs like loop.s.
(If you’re feeling particularly mighty, you can of course also try to
immediately hack the full version; but that’s not recommended.)
Please call your executable assembler
sas (short for “Scram ASsembler”).
Of course you actually submit the complete source code, not the executable.
Note on Error Messages: Your assembler should do error checking for the input program. So if a label is used that has never been defined, there should be an error. If a label is defined twice, there should be an error. If a label is out of range, or if the program is too long for the SCRAM, or if a number is too large (4-bit unsigned for addresses, 8-bit unsigned for data) there should be an error. If an unknown opcode is used or if an address is missing, there should be an error. And so on, and so forth! Please include a line number (starting at 1) with your error message to help the programmer correct their code, and please make sure you print error messages to standard error and not to standard output! No input, however cleverly crafted, should make your assembler crash!
Note on Line/String Lengths:
As a special concession to using C as the implementation language for this
assignment, you may assume that a line of input has at most 128 characters,
excluding the final LF (line feed) character.
Similarly, you may assume that a label has at most 32 characters, excluding
the : (colon) in case of a definition.
However, you still need to stop with an error message if there is a longer
line or label; your assembler must not crash!
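
One way to enforce the line limit without risking a buffer overflow is to read with fgets into a buffer that is slightly larger than the limit and check whether the line feed arrived. The function read_line and its return codes are hypothetical; this is a sketch, not the required design.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 128   /* limit from the assignment, excluding the final LF */

/* Read one line into buf, which must hold MAX_LINE + 2 bytes
   (text, LF, terminating NUL). Returns 1 if a line was read (the LF is
   stripped), 0 at end of input, and -1 if the line exceeds MAX_LINE. */
static int read_line(FILE *in, char buf[MAX_LINE + 2])
{
    if (fgets(buf, MAX_LINE + 2, in) == NULL) {
        return 0;
    }
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No LF seen: either the buffer filled up (line too long) or this
       is a final line without a trailing LF. */
    if (len > MAX_LINE) {
        return -1;
    }
    return 1;
}
```

On a return value of -1 the caller can print an error message, including the current line number, to standard error and stop.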
bash scripts and tools such as
diff. If you’re not able to run a lot of test cases automatically your code will almost certainly be full of bugs.
dis is mostly a warmup problem; you should be able to do this in an hour or so after reading over
opc.c from the archive on Piazza. Yes, you should use (but not modify!) the
Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no object files, no executable files, etc.), but allows building all required derived files. Make sure to include a Makefile that sets the appropriate compiler flags as detailed on Piazza and builds all programs by default.
Include a plain text README file (not a
README.docx or whatnot) that briefly explains what
your programs do and contains any other notes you want us to check out before
grading.
Your answers to written problems should be in your
README file as well!
Make sure to include explanatory notes and detailed derivations that
tell us how you solved the problem in question (and convince us that you really
did the work).
Finally, make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!
For reference, here is a short explanation of the grading criteria; some of the criteria don’t apply to all problems, and not all of the criteria are used on all projects.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for projects on Piazza.
Style refers to both programming and presentation style.
Programming style includes things like consistent indentation, appropriate
identifier names, useful comments, suitable documentation, etc.
Style also includes proper modularization of your code (into functions,
modules, etc.), proper use of
Simple, clean, readable code is what you should be aiming for.
For C (and, if allowed, C++) programs, make sure you follow the style guide
posted on Piazza!
Presentation style refers to your
README file and (possibly!) your PDF files.
Your presentation should be clear, structured problem-by-problem, broken into
sections (and paragraphs!) as appropriate.
Lines should be at most 80 characters in length, broken by UNIX linefeeds.
(You may use Markdown format if you so choose, but everything must still be
perfectly readable without rendering Markdown to another format.)
Diagrams should be clearly labeled, cleanly laid out, and generally a
pleasure to look at.
Performance refers to how fast/with how little memory your programs or circuits can produce the required results compared to other submissions.
Functionality refers to your programs or circuits being able to do what they should according to the specification given above; if the specification is ambiguous, ask for clarification! (It also refers to you simply doing the required work, beyond programming or circuit design!)
If your programs cannot be built you will get no points whatsoever.
If your programs cannot be built without warnings using the required compiler
options given on Piazza we will take off 10% (except if you
document a very good reason).
If your programs cannot be built using
make we will take off 10%.
If valgrind detects memory errors in your programs, we will take off 10%.
If your programs fail miserably even once, i.e. terminate with an exception of
any kind or dump core, we will take off 10% (for each such case).