The IJVM tools consist of an assembler and an interpreter for the subset of Java bytecode introduced in Structured Computer Organization (Tanenbaum, 1998), section 4. The assembler, ijvm-asm, translates symbolic IJVM instructions into IJVM bytecode. The bytecode produced, in turn, serves as input to the interpreter, ijvm, which executes the bytecode and gives a detailed execution trace.
Consider the following assembly language IJVM program:

    .method main            // int main
    .args 3                 // ( int a, int b )
    .define a = 1
    .define b = 2           // {
        bipush 88           // Push object reference.
        iload a
        iload b
        invokevirtual min
        ireturn             // return min ( a, b );
                            // }

    .method min             // int min
    .args 3                 // ( int a, int b ){
    .define a = 1
    .define b = 2
    .locals 1               // int r;
    .define r = 3
        iload a             // if ( a >= b )
        iload b
        isub                // stack = a - b, ... ; a - b < 0 => a < b
        iflt else
        iload b             // r = b;
        istore r
        goto end_if
    else:
        iload a             // r = a;
        istore r
    end_if:
        iload r             // return r;
        ireturn             // }

It consists of two methods, main and min. Both take two arguments of type integer. This is specified by the directive .args 3: two integers plus the implicit object reference (SCO, p. 223). The method min has a single local variable, specified by the directive .locals 1. Symbolic constants are introduced in both methods to make the index-addressed access to the arguments (a and b) and the local variable (r) easier to read.
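The index values chosen for a, b and r follow from the order in which the frame is laid out. The Python sketch below illustrates that numbering; the function name frame_indices is hypothetical, and the placement of the object reference at index 0 is an assumption based on the description in SCO rather than something the tools expose.

```python
def frame_indices(arg_names, local_names):
    """Map names to local-variable indices.

    Assumption: index 0 holds the implicit object reference, the declared
    arguments follow, and the locals come after the arguments.
    """
    names = ["objref"] + list(arg_names) + list(local_names)
    return {name: index for index, name in enumerate(names)}


# The method min from the example: .args 3 (objref, a, b) and .locals 1 (r).
layout = frame_indices(["a", "b"], ["r"])
print(layout)
```

With two arguments and one local variable this gives a = 1, b = 2 and r = 3, matching the .define directives in the method min.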
Suppose the file test.j contains this program. To translate test.j into bytecode and save the bytecode in the file test.bc, use the assembler ijvm-asm as follows:

    ijvm-asm test.j test.bc

The result is a file containing the bytecode represented as:

    main index: 0
    method area: 40 bytes
    00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03
    00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00
    07 15 01 36 03 15 03 ac
    constant pool: 2 words
    00000000
    0000000e

The bytecode contains three regions: the main index, the method area and the constant pool. The main index specifies the index in the constant pool of the address of the main method, which is the method initially invoked by the interpreter. The method area holds the bytecode generated for the methods in the program. The constant pool contains the constants used in the program and, for each method, an entry with the start address of the method in the method area.
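As a rough illustration of these three regions, the following Python sketch splits the textual bytecode into its parts. The function name parse_bytecode is hypothetical, and the sketch assumes the file is plain text with the section labels shown in the listing; it is not part of the IJVM tools.

```python
def parse_bytecode(text):
    """Split assembler output into (main index, method area, constant pool)."""
    main_index = None
    method_area = bytearray()
    constant_pool = []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("main index:"):
            main_index = int(line.split(":", 1)[1])
        elif line.startswith("method area:"):
            section = "method"
        elif line.startswith("constant pool:"):
            section = "pool"
        elif section == "method":
            # Each token is one byte in hexadecimal.
            method_area.extend(int(token, 16) for token in line.split())
        elif section == "pool":
            # Each token is one 32 bit word in hexadecimal.
            constant_pool.extend(int(token, 16) for token in line.split())
    return main_index, bytes(method_area), constant_pool


SAMPLE = """
main index: 0
method area: 40 bytes
00 03 00 00 10 58 15 01 15 02 b6 00 01 ac 00 03
00 01 15 01 15 02 64 9b 00 0a 15 02 36 03 a7 00
07 15 01 36 03 15 03 ac
constant pool: 2 words
00000000
0000000e
"""

main_index, method_area, constant_pool = parse_bytecode(SAMPLE)
main_address = constant_pool[main_index]  # start of main in the method area
min_address = constant_pool[1]            # start of min in the method area
print(main_address, min_address, len(method_area))
```

For the example program the constant pool entry for main is 0 and the entry for min is 0x0e, which is where the second method begins in the method area.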
The bytecode program can be executed by the IJVM interpreter ijvm e.g. as follows:

    ijvm test.bc 77 43

In this case the two arguments 77 and 43 are passed as actual parameters to the initial invocation of main. The result is a detailed execution trace on standard output:

    IJVM Trace of foo Mon Sep 18 18:29:26 2000

                                    stack = 0, 1, 43, 77, 15
    bipush 88           [10 58]     stack = 88, 0, 1, 43, 77, 15
    iload 1             [15 01]     stack = 77, 88, 0, 1, 43, 77, 15
    iload 2             [15 02]     stack = 43, 77, 88, 0, 1, 43, 77, 15
    invokevirtual 1     [b6 00 01]  stack = 12, 13, 0, 43, 77, 21, 0, 1
    iload 1             [15 01]     stack = 77, 12, 13, 0, 43, 77, 21, 0
    iload 2             [15 02]     stack = 43, 77, 12, 13, 0, 43, 77, 21
    isub                [64]        stack = 34, 12, 13, 0, 43, 77, 21, 0
    iflt 10             [9b 00 0a]  stack = 12, 13, 0, 43, 77, 21, 0, 1
    iload 2             [15 02]     stack = 43, 12, 13, 0, 43, 77, 21, 0
    istore 3            [36 03]     stack = 12, 13, 43, 43, 77, 21, 0, 1
    goto 7              [a7 00 07]  stack = 12, 13, 43, 43, 77, 21, 0, 1
    iload 3             [15 03]     stack = 43, 12, 13, 43, 43, 77, 21, 0
    ireturn             [ac]        stack = 43, 0, 1, 43, 77, 15
    ireturn             [ac]        stack = 43
    return value: 43

The execution trace shows the disassembled bytecode instructions in the left column and the raw bytecodes in the middle column. The right column displays the top words of the stack (at most the eight top words are displayed). The first line shows the initial stack content after main has been invoked by the interpreter with the actual arguments taken from the command line i.e. 43 and 77. When the execution terminates the value returned from main is printed.
This section describes the syntax of the IJVM assembly language in a modified Backus-Naur form. In particular, the notation method+ means one or more occurrences of method, and directive* means zero or more occurrences of directive. Certain restrictions and features that are not directly visible from the syntax are summarised here:
- All literals are case-sensitive.
- An arbitrary number of "white-space" characters (i.e. space, tab and newline) are allowed between terminals and non-terminals on the right side of productions.
- A symbol can be arbitrarily long, but must start with a letter, possibly followed by more letters, digits or "_". These are examples of valid symbols: fibonacci, then23 and MyMult_32.
- An integer is specified either in decimal notation (e.g. 143, 45 or -31) or in hexadecimal notation (e.g. 0xf000, 0xbeef or -0x34).
- Comments start with "//" and extend to the end of the line.
- A method with the name main has to be one of the methods in the program.
- Arguments to invokevirtual must be names of methods declared using the .method directive.
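The lexical rules in this list can be sketched as a pair of regular expressions. The Python below is an illustration only: the helper names are hypothetical, "letter" is assumed to mean an ASCII letter, and accepting upper-case hexadecimal digits is an assumption about the assembler rather than documented behaviour.

```python
import re

# A symbol starts with a letter, followed by letters, digits or "_".
SYMBOL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# An integer is decimal or hexadecimal, optionally negative.
INTEGER = re.compile(r"-?(?:0x[0-9a-fA-F]+|[0-9]+)")


def strip_comment(line):
    """Drop a trailing // comment and any whitespace before it."""
    return line.split("//", 1)[0].rstrip()


def is_symbol(token):
    return SYMBOL.fullmatch(token) is not None


def parse_integer(token):
    if INTEGER.fullmatch(token) is None:
        raise ValueError(f"not an integer literal: {token!r}")
    return int(token, 16 if "0x" in token else 10)


print(is_symbol("MyMult_32"), parse_integer("-0x34"))
print(strip_comment("iload a   // if ( a >= b )"))
```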
In the following, literals are written in double quotes:

    program   : method+

    method    : ".method" symbol directive* insn+

    directive : ".args" expr
              | ".locals" expr
              | ".define" symbol "=" expr

    insn      : "bipush" expr
              | "dup"
              | "goto" symbol
              | "iadd"
              | "iand"
              | "ifeq" symbol
              | "iflt" symbol
              | "if_icmpeq" symbol
              | "iinc" expr "," expr
              | "iload" expr
              | "invokevirtual" symbol
              | "ior"
              | "ireturn"
              | "istore" expr
              | "isub"
              | "ldc_w" expr
              | "nop"
              | "pop"
              | "swap"
              | symbol ":"

    expr      : integer
              | symbol
              | expr "+" expr
              | expr "-" expr
              | "(" expr ")"

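To make the expr production concrete, here is a small recursive-descent evaluator in Python. It is a sketch under stated assumptions, not the assembler's own parser: the names tokenize and evaluate are hypothetical, + and - are assumed to associate left to right with equal precedence (the grammar does not say), and a leading minus is treated as a unary operator so that literals such as -31 work.

```python
import re

TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|[0-9]+|[A-Za-z][A-Za-z0-9_]*|[-+()])")


def tokenize(text):
    """Split an expression into integer, symbol and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, defines):
    """Evaluate an expr; defines maps .define symbols to their values."""
    tokens = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if token == "-":  # assumption: unary minus covers negative literals
            return -primary()
        if token in ("+", ")"):
            raise ValueError(f"unexpected token {token!r}")
        if token[0].isdigit():
            return int(token, 16 if token.startswith("0x") else 10)
        return defines[token]  # KeyError for an undefined symbol

    def expr():
        nonlocal pos
        value = primary()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            operator = tokens[pos]
            pos += 1
            operand = primary()
            value = value + operand if operator == "+" else value - operand
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


print(evaluate("r - (a + b)", {"a": 1, "b": 2, "r": 3}))
```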
The contents of an IJVM assembler language program is a set of methods, each declared using the .method directive. For each method the number of arguments the method takes can be specified (.args) and the number of local variables to be allocated upon invocation (.locals). If nothing is specified the number of arguments will default to 1, since an object reference must always be passed. The default number of local variables is 0.
Symbolic constants can be introduced using the .define directive. The scope of these definitions is limited to the current method. A label is declared by writing its name followed by a colon in front of the instruction it refers to, e.g.:

    while:  iload i
            bipush 1
            ...

As with symbolic constants, the scope of labels is limited to the current method. That is, only goto, ifeq, iflt and if_icmpeq instructions within the same method can use the label as jump target.
A simple program copy.j that copies an input stream of characters to an output stream:

    // An input stream of characters is copied from standard input
    // to a stream of characters on standard output until the character
    // 'f' is encountered.

    .method newline             // int newline()
    .define nl = 10             // {
    .define OBJREF = 44
        bipush OBJREF
        bipush nl
        invokevirtual putchar   // putchar(nl);
        ireturn                 // return nl;
                                // }

    .method main                // int main(){
    .locals 1                   // int c;
    .define c = 1
    .define asciif = 102
    .define OBJREF = 44
        bipush OBJREF
        invokevirtual getchar
        istore c                // c = getchar();
    while:
        iload c                 // while ( c!='f') {
        bipush asciif
        isub
        ifeq end_while
        bipush OBJREF
        iload c
        invokevirtual putchar   // putchar(c);
        pop                     // discard return value
        bipush OBJREF
        invokevirtual getchar
        istore c                // c = getchar();
        goto while              // }
    end_while:
        bipush OBJREF
        invokevirtual newline   // newline();
        pop
        iload c
        ireturn                 // return c;
                                // }

When an IJVM program produces output on the screen through the method putchar, this output is interleaved with the execution trace. To avoid the execution trace, a silent activation of the interpreter can be performed as follows:

    ijvm -s copy.bc
    ggtsj678
    return value: 102

The output of the program copy.bc, i.e. ggtsj678, is then the only output that will appear on the screen, together with the returned value 102.
The 16 bit unsigned integer indices used for the predefined methods getchar and putchar in invokevirtual are 32768 (0x8000) and 32769 (0x8001), respectively.
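As a small illustration, the Python sketch below shows the bytes that would encode an invokevirtual of one of these predefined methods. The function name and the PREDEFINED table are hypothetical; the opcode 0xB6 comes from the instruction table, and the high-byte-first order is taken from the example bytecode above, where invokevirtual min is encoded as b6 00 01.

```python
PREDEFINED = {"getchar": 0x8000, "putchar": 0x8001}
INVOKEVIRTUAL = 0xB6


def encode_invokevirtual(index):
    """Opcode followed by the 16 bit unsigned index, high byte first."""
    return bytes([INVOKEVIRTUAL]) + index.to_bytes(2, "big")


print(encode_invokevirtual(PREDEFINED["putchar"]).hex(" "))
```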
Opcode | Mnemonic | Description |
---|---|---|
0x10 | BIPUSH byte_exp | Push a byte onto stack |
0x59 | DUP | Copy top word on stack and push onto stack |
0xA7 | GOTO label | Unconditional jump |
0x60 | IADD | Pop two words from stack; push their sum |
0x7E | IAND | Pop two words from stack; push Boolean AND |
0x99 | IFEQ label | Pop word from stack and branch if it is zero |
0x9B | IFLT label | Pop word from stack and branch if it is less than zero |
0x9F | IF_ICMPEQ label | Pop two words from stack and branch if they are equal |
0x84 | IINC varnum_exp, byte_exp | Add a constant value to a local variable |
0x15 | ILOAD varnum_exp | Push local variable onto stack |
0xB6 | INVOKEVIRTUAL method | Invoke a method |
0x80 | IOR | Pop two words from stack; push Boolean OR |
0xAC | IRETURN | Return from method with integer value |
0x36 | ISTORE varnum_exp | Pop word from stack and store in local variable |
0x64 | ISUB | Pop two words from stack; push their difference |
0x13 | LDC_W constant_exp | Push constant from constant pool onto stack |
0x00 | NOP | Do nothing |
0x57 | POP | Delete word from top of stack |
0x5F | SWAP | Swap the two top words on the stack |
0xC4 | WIDE | Prefix instruction; next instruction has a 16-bit index |
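The table can be turned into a simple disassembler. The following Python sketch is illustrative and is not the code used by ijvm: the OPCODES dictionary and the disassemble function are hypothetical names, the operand widths follow the operand types of the default specification described later in this document, and the WIDE prefix is deliberately left out to keep the example short.

```python
# mnemonic and operand kinds: s8/u8 are one byte, s16/u16 are two bytes,
# "s" means signed and "u" means unsigned.
OPCODES = {
    0x10: ("bipush", ["s8"]),
    0x59: ("dup", []),
    0xA7: ("goto", ["s16"]),
    0x60: ("iadd", []),
    0x7E: ("iand", []),
    0x99: ("ifeq", ["s16"]),
    0x9B: ("iflt", ["s16"]),
    0x9F: ("if_icmpeq", ["s16"]),
    0x84: ("iinc", ["u8", "s8"]),
    0x15: ("iload", ["u8"]),
    0xB6: ("invokevirtual", ["u16"]),
    0x80: ("ior", []),
    0xAC: ("ireturn", []),
    0x36: ("istore", ["u8"]),
    0x64: ("isub", []),
    0x13: ("ldc_w", ["u16"]),
    0x00: ("nop", []),
    0x57: ("pop", []),
    0x5F: ("swap", []),
}


def disassemble(code):
    """Return one text line per instruction in a run of bytecode."""
    lines = []
    pc = 0
    while pc < len(code):
        mnemonic, kinds = OPCODES[code[pc]]  # KeyError on unknown opcodes
        pc += 1
        operands = []
        for kind in kinds:
            size = 1 if kind.endswith("8") else 2
            raw = code[pc:pc + size]
            operands.append(str(int.from_bytes(raw, "big", signed=kind.startswith("s"))))
            pc += size
        text = mnemonic
        if operands:
            text += " " + ", ".join(operands)
        lines.append(text)
    return lines


# Body of main from the first example, without its four header bytes.
for line in disassemble(bytes.fromhex("10 58 15 01 15 02 b6 00 01 ac")):
    print(line)
```

The instruction names and operands produced for this input correspond to the left column of the execution trace of main shown earlier.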
The IJVM tools are supported on the Irix, Solaris and Linux platforms but should work on most other UNIX derivatives also. To install the tools a working C compiler must be available on the system.
Note: on the DAIMI machines, the IJVM tools have already been installed. To use them, type at the command prompt:

    daimi-setup -a /users/kursus/dArk/dArk-tools

For the changes to take effect, log out and log in again.
First, download the distribution file: ijvm-tools-0.7.tar.gz. Extract the files contained in the compressed archive, using the command:

    gzip -cd ijvm-tools-0.7.tar.gz | tar x

This creates a directory called ijvm-tools-0.7 where all the source files reside.
Next, configure the package for the system in question. This is done by changing to the ijvm-tools-0.7 directory and running the configure script, by issuing the command:

    ./configure

The script will check that a C compiler and other tools are available and then create makefiles. It will also determine installation directories. Binary files are installed in /usr/local/bin by default, but this can be changed using the --prefix option; see the file INSTALL in the distribution for details.
If the configure script ran successfully, then build the tools, using the command:

    make

If the build succeeded, the final installation of the tools is accomplished by:

    make install

This will copy the binaries, ijvm-asm and ijvm, to /usr/local/bin (unless something else is specified when configuring the tools).
At this point the tools should be installed and ready to use.
The IJVM tools are instantiated by means of a specification obtained from a file ijvm.spec. This file defines all the instructions that the IJVM tools implement: the assembler uses the specification to get a definition of the instructions, i.e. their names, opcodes and operand types; the simulator uses the specification to be able to disassemble the instructions.
The specification file defines the instructions using a fairly simple line-based format. Each line contains an instruction definition made up of three elements:

    opcode mnemonic operand-list

where opcode is the opcode in hexadecimal notation; mnemonic is the mnemonic code of the instruction (i.e. the name of the instruction, e.g. iadd); and the operand list specifies what types of operands the instruction expects. The following table explains the different types available and the resulting bytecode values that the assembler generates:
Type | Description |
---|---|
byte | An operand that evaluates to an 8 bit signed integer value; the 8 bit signed value is the bytecode value generated. |
label | A label defined within the current method; the 16 bit signed offset from the current opcode to the address of the label is the bytecode value generated. |
method | The name of a method; the 16 bit unsigned index of the method in the constant pool is generated. |
varnum | An 8 bit unsigned index, handled as the operand type byte. |
varnum-wide | An 8 or 16 bit unsigned index; depending on the size of the index, either an 8 or 16 bit index is generated. Furthermore, if the index is greater than 255 a wide prefix opcode is generated before the current opcode. |
constant | A 32 bit signed constant; the value is added to the constant pool, and the 16 bit unsigned index of the value in the constant pool is generated. |
Thus, when bytecode for an instruction is generated, the assembler first inspects the actual operands to see if a wide prefix is needed. Then it generates the instruction opcode and then for each operand type it generates bytecode values as described above.
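The encoding rules above can be sketched in a few lines of Python. This is an illustration under assumptions and not the assembler's implementation: the function names are hypothetical, and multi-byte values are written high byte first, as in the example bytecode earlier in this document.

```python
WIDE = 0xC4  # prefix opcode from the instruction table


def encode_byte(value):
    """Operand type byte: an 8 bit signed value."""
    return value.to_bytes(1, "big", signed=True)


def encode_branch(opcode, opcode_address, label_address):
    """Operand type label: 16 bit signed offset from the opcode to the label."""
    offset = label_address - opcode_address
    return bytes([opcode]) + offset.to_bytes(2, "big", signed=True)


def encode_varnum_wide(opcode, index):
    """Operand type varnum-wide: wide prefix and 16 bit index above 255."""
    if index > 255:
        return bytes([WIDE, opcode]) + index.to_bytes(2, "big")
    return bytes([opcode, index])


# In the method min of the first example, iflt sits at method area address 23
# and the label else at address 33, so the offset is 10.
print(encode_branch(0x9B, 23, 33).hex(" "))
print(encode_varnum_wide(0x15, 1).hex(" "))
print(encode_varnum_wide(0x15, 300).hex(" "))
```

The first two printed encodings can be checked against the example bytecode, where iflt else appears as 9b 00 0a and iload a as 15 01; the third shows the wide prefix that an index above 255 would require.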
As an example the following is used as the default specification by the tools:

    0x10 bipush byte
    0x59 dup
    0xA7 goto label
    0x60 iadd
    0x7E iand
    0x99 ifeq label
    0x9B iflt label
    0x9F if_icmpeq label
    0x84 iinc varnum, byte
    0x15 iload varnum-wide
    0xB6 invokevirtual method
    0x80 ior
    0xAC ireturn
    0x36 istore varnum-wide
    0x64 isub
    0x13 ldc_w constant
    0x00 nop
    0x57 pop
    0x5F swap
    0xC4 wide

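Reading such a specification is straightforward. The Python sketch below parses lines of this form into a dictionary; parse_spec is a hypothetical name, and the sketch assumes one definition per line with comma-separated operand types, skipping blank lines. Whether the real tools accept comments or other syntax in the file is not covered here.

```python
def parse_spec(text):
    """Map opcode -> (mnemonic, list of operand types)."""
    spec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(None, 2)  # opcode, mnemonic, optional operand list
        opcode = int(fields[0], 16)
        mnemonic = fields[1]
        operands = []
        if len(fields) > 2:
            operands = [item.strip() for item in fields[2].split(",")]
        spec[opcode] = (mnemonic, operands)
    return spec


EXAMPLE = """
0x10 bipush byte
0x59 dup
0x84 iinc varnum, byte
0x15 iload varnum-wide
"""

for opcode, (mnemonic, operands) in parse_spec(EXAMPLE).items():
    print(hex(opcode), mnemonic, operands)
```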
By default the tools will read a specification from the file ijvm.spec installed in /usr/local/share (this depends of course on the installation). Alternatively, by setting the environment variable IJVM_SPEC_FILE it is possible to make the tools read a different specification. If the shell is tcsh this is accomplished by:

    setenv IJVM_SPEC_FILE my-ijvm.spec

If the shell is bash the following will do:

    export IJVM_SPEC_FILE=my-ijvm.spec

Alternatively you can use the command line option -f to supply another specification file, e.g.:

    ijvm-asm -f my-ijvm.spec test-program.j

and likewise for the simulator.
The specification file mechanism described above makes it easy to extend the IJVM instruction set. First, the specification is extended with definitions of the new instructions. Then, the simulator is extended to be able to execute the new instructions. This is done by adding a case-branch for each new instruction to the switch statement in the file ijvm.c and then rebuilding the simulator. To keep the new case-branch readable, define a name for each opcode (by means of #define in the file ijvm-util.h) so that the opcode can be referred to by a symbolic name instead of a hex code.
To rebuild the simulator you will need either the full distribution or mini-ijvm.tar.gz, which is the minimum set of files required. To unpack the archive do:

    tar xfz mini-ijvm.tar.gz

This will create a directory called mini-ijvm which contains the files:
Makefile ijvm-spec.c ijvm-spec.h ijvm-util.c ijvm-util.h ijvm.c ijvm.spec types.h
The file ijvm.c implements the fetch-decode-execute cycle of the simulator; ijvm-util.c, ijvm-util.h, ijvm-spec.c and ijvm-spec.h are auxiliary files that implement disassembling, output and specification file handling. The file Makefile controls how the simulator is built (see e.g. the man page for make) and ijvm.spec is a copy of the default specification file, provided for convenience.
When the specification file, ijvm.c and ijvm-util.h have been changed, type make to rebuild the simulator. As a result, the new simulator is available as ijvm. To try the simulator it may be necessary to invoke it as ./ijvm, otherwise the shell will run the version installed on the system.
Here is a small example. Suppose we want to add the instruction iconst_0, which pushes the constant 0 onto the stack, i.e. the equivalent of bipush 0. As opcode, 0x03 would be a reasonable choice, since this is the opcode in the JVM (see the JVM Reference).
First, we add the line:

    0x03 iconst_0

to the file ijvm.spec so that ijvm-asm will be able to handle iconst_0. Then we define a symbolic name for the opcode by adding the following line to ijvm-util.h:

    #define IJVM_OPCODE_ICONST_0 0x03

Finally, the actual implementation of iconst_0 is done by adding the following case-branch to ijvm.c:

    case IJVM_OPCODE_ICONST_0:
      ijvm_push (i, 0);
      break;

To test our implementation, we use this test program:

    .method main
        iconst_0
        bipush 5
        iadd
        ireturn

Having set the environment variable to ./ijvm.spec as described above, we assemble the test program and get the following bytecode:

    main index: 0
    method area: 9 bytes
    00 01 00 00 03 10 05 60 ac
    constant pool: 1 words
    00000000

When we run this with the new simulator we get the result expected:

    IJVM Trace of - Thu Sep 30 22:12:01 1999

                                    stack = 0, 0, 1
    iconst_0            [03]        stack = 0, 0, 0, 1
    bipush 5            [10 05]     stack = 5, 0, 0, 0, 1
    iadd                [60]        stack = 5, 0, 0, 1
    ireturn             [ac]        stack = 5
    return value: 5

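To show how the new case-branch fits into a fetch-decode-execute cycle, here is a deliberately tiny Python model of the four instructions used by the test program. It is not a port of ijvm.c: the function run is hypothetical, the stack holds only operands (no frame data such as the object reference or return address), 32 bit wrap-around is ignored, and the four bytes before the first instruction are assumed to be the method header seen in the bytecode listings.

```python
IJVM_OPCODE_ICONST_0 = 0x03
IJVM_OPCODE_BIPUSH = 0x10
IJVM_OPCODE_IADD = 0x60
IJVM_OPCODE_IRETURN = 0xAC


def run(method_area, start=0):
    """Execute a single method and return the value passed to ireturn."""
    pc = start + 4  # assumption: skip the four header bytes of the method
    stack = []
    while True:
        opcode = method_area[pc]  # fetch
        pc += 1
        if opcode == IJVM_OPCODE_ICONST_0:  # the newly added instruction
            stack.append(0)
        elif opcode == IJVM_OPCODE_BIPUSH:
            operand = int.from_bytes(method_area[pc:pc + 1], "big", signed=True)
            stack.append(operand)
            pc += 1
        elif opcode == IJVM_OPCODE_IADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == IJVM_OPCODE_IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")


result = run(bytes.fromhex("00 01 00 00 03 10 05 60 ac"))
print("return value:", result)
```

Each branch of the if chain plays the role of one case in the simulator's switch statement; adding iconst_0 amounts to one more branch that pushes the constant 0.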