                     UTS Assembler Reference Manual

                            TABLE OF CONTENTS


Abstract

1.    Usage

2.    Lexical Conventions

3.    Segments

4.    Location Counter

5.    Statements

6.    Expressions

7.    Pseudo-ops

8.    Machine Instructions

9.    Differences from 370 Assembler

References

ABSTRACT

This document describes the UTS assembler.  The language accepted is a
hybrid of 370 assembler language and the UNIX assembler language for the
PDP-11.  Considerable knowledge of UTS and some familiarity with
assembly language in general are assumed.  The assembler is invoked
under UTS by the command as, and it is an ordinary two-pass assembler
without macro capabilities.




1.    USAGE

The assembler, as, is invoked by

     as [-u] [-o file] file1 ...

The optional -u directs all undefined symbols to be treated as  external;
-o specifies a file  for the assembler  output.  The remaining  arguments
are the files to be  assembled.  They are  concatenated and assembled  as
one logical entity.  If -o is not specified, the output  is left in a.out
in the current directory.




2.    LEXICAL CONVENTIONS

Assembler tokens include:

Identifiers
     The usual sequence of  alphanumeric characters plus  period (.),  at
     sign (@), pound sign (#), dollar sign ($), and underscore  (_).  The
     first character may not be numeric and only the first eight  charac-
     ters are significant.
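
     The identifier rule above can be modelled with a short Python
     sketch.  This is an illustration only, not the assembler's own
     code: the names is_identifier and significant are hypothetical,
     and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
# The first character may be a letter or one of . @ # $ _ but not a digit.
_IDENT = re.compile(r"[A-Za-z.@#$_][A-Za-z0-9.@#$_]*")


def is_identifier(token):
    """Return True if token is spelled like an identifier."""
    return _IDENT.fullmatch(token) is not None


def significant(token):
    """Return the part of an identifier that the assembler distinguishes."""
    return token[:8]
```

     Under this model buffer_length and buffer_lower name the same
     symbol, because both reduce to buffer_l.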

Temporary symbols
     A digit followed by f or b.  These are discussed under Labels in
     section 5.

Constants and Literals
     Constants are specified with the following syntax:

          [mult factor] type [L length modifier] data
_______________
  UNIX is a trademark of Bell Laboratories.

     The mult factor, if non-numeric, must be parenthesized.  The types
     are:

          f    full word numeric
          h    half word numeric
          b    binary
          x    hexadecimal
          c    character
          a    full word address
          y    half word address
          v    external address
          s    explicit base-displacement

     For the first four types (f, h, b, and x), the data is a number of
     the appropriate base enclosed in single quotes.  For c type
     constants, the desired characters are enclosed in single quotes.
     The character pairs '\n', '\t', '\r', '\b', and '\0' are translated
     to newline, tab, carriage return, backspace, and null respectively.
     For a, y, and v type constants, the data is an expression in
     parentheses.  For s type constants, the data is either an
     expression in parentheses or of the form (displacement(base)).

     Literals are specified by an equals sign (=) followed by a constant.
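
     The escape translation for c type constants can be sketched in
     Python as follows.  The function name translate is hypothetical,
     and the sketch assumes that a backslash followed by any other
     character is left unchanged, which the manual does not specify.

```python
# The five character pairs listed above and their translations.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "0": "\0"}


def translate(text):
    """Translate the escape pairs inside the data of a c type constant."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including an unlisted escape,
            # is copied through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

     For example, the data of c'hi\n' would assemble to three
     characters: h, i, and newline.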

Operators
     All machine operations (even diagnose) are recognized in either
     upper or lower case, but they are not reserved words.

Comments
     A / character begins a comment, which continues until the end of the
     line.

Known symbols
     If not otherwise defined the symbols r0, r1, ..., r15 are defined as
     absolute quantities with values  0, 1, ..., 15 respectively.   Also,
     sp is absolute 13, and .text, .data, and .bss refer to the beginning
     of the text, data, and bss segments respectively.

Other
     Blanks and tabs can appear anywhere except inside tokens; they are
     required to separate adjacent identifiers.  A backslash (\) passes
     the following character to the assembler without its usual token
     meaning.  Because / begins a comment, division must be written \/.


3.    SEGMENTS

The assembler generates code in  3 segments:  text,  data, and bss.   The
text segment  is the  one in  which the  assembler begins,  and is  where
instructions are generally placed.  On request, UTS traps write
operations into the text segment.  This protection is requested through
the link-editor ld with the -n flag.  A single copy of the text segment
is shared by all processes executing such a program.

The data segment is for initialized data, or for data or text which may
be modified.  If the text segment is not protected, the data segment
begins after the text when loaded; otherwise it begins at the first 64K
boundary after the text segment.

The bss segment is for uninitialized data.   It may not contain any  code
or initialized data.  It begins  immediately after the data segment  when
the program is loaded, but  occupies no space in  the output file.   When
loaded, all of the bss segment is zeroed.
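
The load-time layout described in this section can be modelled in
Python.  This is a sketch under stated assumptions, not the loader's
code: the function name layout is hypothetical, offsets are measured
from the start of the text segment, and a text segment that ends exactly
on a 64K boundary is assumed to be followed directly by the data
segment.

```python
BOUNDARY = 64 * 1024  # the 64K boundary used for protected text


def layout(text_size, data_size, bss_size, protected):
    """Return the start offset of each segment, relative to the text."""
    text_end = text_size
    if protected:
        # Round up to the next multiple of 64K (assumption: an exact
        # multiple is not pushed to the following boundary).
        data_start = -(-text_end // BOUNDARY) * BOUNDARY
    else:
        data_start = text_end
    # The bss segment begins immediately after the data segment.
    bss_start = data_start + data_size
    return {
        "text": 0,
        "data": data_start,
        "bss": bss_start,
        "end": bss_start + bss_size,
    }
```

With 100 bytes of text and 20 bytes of data, the data segment starts at
offset 100 when the text is unprotected and at offset 65536 when it is
protected.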




4.    LOCATION COUNTER

The symbol . is the location counter.  Its value at any time is the
offset, within the appropriate segment, of the start of the statement in
which it appears.  The location counter may be assigned a new value, but
the value assigned must lie within the current segment.




5.    STATEMENTS

A source program is composed of a sequence of statements.  Statements are
separated either by newlines or  by semicolons.  There are four kinds  of
statements:  null, define constants or storage, assignment, and operation
statements.  Any statement may be preceded by one or more labels.
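
The way a source line divides into statements can be sketched in
Python.  The function name split_statements is hypothetical.  The sketch
combines the separator rule above with the comment and backslash rules
of section 2, and it deliberately ignores single-quoted constant data,
since the manual does not say how / or ; behave inside quotes.

```python
def split_statements(line):
    """Split one source line into statements, dropping any comment."""
    code = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # A backslash protects the next character, so \/ is kept
            # as division rather than starting a comment.
            code.append(line[i:i + 2])
            i += 2
            continue
        if ch == "/":
            break  # an unprotected / begins a comment
        code.append(ch)
        i += 1
    # Statements on one line are separated by semicolons.  An empty
    # string in the result stands for a null statement.
    return [part.strip() for part in "".join(code).split(";")]
```

A line holding only a comment yields a single null statement, which is
legal according to the list of statement kinds above.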

Labels
     Labels are:

          identifier:
          n:

     where n is a digit.  A digit label defines a temporary symbol, which
     is referenced by nf or nb.  A label is assigned the current value
     and type of the location counter.  Several numeric labels with the
     same digit may be used within the same assembly.  A reference of
     the form nf refers to the first label named n forward from the
     reference, while nb refers to the first label named n backward from
     the reference.  Temporary symbols are not entered in the output
     symbol table.  For both kinds of label, the value is aligned
     appropriately with respect to the next non-null statement.
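
     The resolution of nf and nb references can be modelled in Python.
     This is a sketch only: the function name resolve_temporary is
     hypothetical, positions are statement indexes rather than
     addresses, and a label on the referencing statement itself is
     assumed to count as backward, which the manual does not state.

```python
def resolve_temporary(ref, at, labels):
    """Find the statement index that a reference such as 1f or 1b names.

    ref    -- a digit followed by f or b
    at     -- index of the statement containing the reference
    labels -- list of (statement index, digit) pairs, one per label
    """
    digit, direction = ref[0], ref[1]
    positions = [pos for pos, name in labels if name == digit]
    if direction == "f":
        # First matching label after the reference.
        ahead = [pos for pos in positions if pos > at]
        if ahead:
            return min(ahead)
    elif direction == "b":
        # Nearest matching label at or before the reference
        # (assumption: a label on the same statement is "backward").
        behind = [pos for pos in positions if pos <= at]
        if behind:
            return max(behind)
    raise ValueError("no label matches " + ref)
```

     With labels 1: at statements 0, 4, and 9, a reference in statement
     5 resolves 1b to statement 4 and 1f to statement 9.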

Null statements
     An empty statement with or without labels.

Define constants or storage
     The dc pseudo-op followed by one or more constants separated by com-
     mas places the assembled constants (properly aligned) into the  out-
     put.  The ds pseudo-op followed by a constant specification  without
     the data portion  places the  correct number of  zeroed bytes  (also
     properly aligned) into the output.

Assignment
     An identifier, an equals sign (=), and an expression.  The value and
     type of the expression are assigned to the identifier.

Operation
     A machine or pseudo opcode followed by the appropriate arguments.




6.    EXPRESSIONS

Binary operators (usual precedence)
     +    addition
     -    subtraction
     *    multiplication
     \/   division

Unary operators
     +    plus
     -    minus
     l'   length of

Types
     The assembler sees the following types in expressions:

     undefined
          not specified in the program, usually an error

     undefined external
          declared external with entry, extrn or v type constant

     absolute
          a constant or a symbol ultimately assigned to a constant

     text
          a text segment symbol

     data
          a data segment symbol

     bss
          a bss segment symbol

     external text, data, bss
          symbols declared as entries or in v type constants

Rules of combination
     Below are the legal combinations and the type they yield.  All other
     combinations generate an error.

          absolute [+-*/] absolute    --> absolute
          absolute [+-*/] anything    --> anything
          anything - same anything    --> absolute
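
     The three rules can be written as a small Python function.  This
     is a sketch, not the assembler's implementation: the name combine
     is hypothetical, types are plain strings, and the second rule is
     assumed to hold with the operands in either order, which the table
     above does not say.

```python
OPERATORS = ("+", "-", "*", "/")


def combine(left, op, right):
    """Return the type of 'left op right', or raise ValueError."""
    if op not in OPERATORS:
        raise ValueError("unknown operator " + op)
    if left == "absolute" and right == "absolute":
        return "absolute"
    if left == "absolute":
        return right
    if right == "absolute":
        # Assumption: the absolute operand may also appear on the right.
        return left
    if op == "-" and left == right:
        return "absolute"
    raise ValueError("illegal combination: %s %s %s" % (left, op, right))
```

     Under this model the difference of two text symbols is absolute,
     while the sum of a text symbol and a data symbol is an error.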




7.    PSEUDO-OPS

cnop n1,n2
     Forces alignment equivalent to .=(.+n2-1)/n2*n2+n1 where n2 must  be
     4 or 8, and n1 must be 0, 2, 4, or 6.
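
     The alignment formula can be evaluated with a short Python sketch.
     The function name cnop is reused here only for illustration, and
     the division is assumed to be integer division that discards the
     remainder.

```python
def cnop(dot, n1, n2):
    """Return the location counter after cnop n1,n2, per the formula."""
    if n2 not in (4, 8):
        raise ValueError("n2 must be 4 or 8")
    if n1 not in (0, 2, 4, 6):
        raise ValueError("n1 must be 0, 2, 4, or 6")
    # .=(.+n2-1)/n2*n2+n1 : round up to a multiple of n2, then add n1.
    return (dot + n2 - 1) // n2 * n2 + n1
```

     Starting from location 10, cnop 0,4 gives 12 and cnop 2,4 gives
     14; a location already on the boundary, such as 8 with cnop 0,8,
     is left unchanged.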

using exp0, exp1[, exp2]*
     Exp0 becomes a usable base address; exp1 is an absolute quantity in
     the range 0-15 which specifies the register to be used.  Similarly,
     exp2 specifies a register for exp0+4096, exp3 for exp0+8192, etc.
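
     How a using statement lets an address be expressed as a
     displacement and base register can be modelled in Python.  This is
     a sketch: the names using_table and base_displacement are
     hypothetical, addresses are plain integers, and each register is
     taken to cover 4096 bytes, as the 4096-byte steps above imply.

```python
def using_table(exp0, *registers):
    """Return (register, start address) pairs for one using statement."""
    for reg in registers:
        if not 0 <= reg <= 15:
            raise ValueError("register must be in the range 0-15")
    # The k-th register listed covers addresses from exp0 + 4096*k.
    return [(reg, exp0 + 4096 * k) for k, reg in enumerate(registers)]


def base_displacement(address, table):
    """Return (displacement, base register) for an address."""
    for reg, start in table:
        displacement = address - start
        if 0 <= displacement < 4096:
            return displacement, reg
    raise ValueError("address is not covered by any base register")
```

     After the equivalent of using 0x1000, r12, r11, address 0x1004
     resolves to displacement 4 from register 12, and address 0x2000 to
     displacement 0 from register 11.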

drop [exp [, exp] ]
     Undoes a previous  using for  the registers denoted  by the  expres-
     sions.  If no registers are given, all registers are so dropped.

ltorg
     Forces a dump of the literal pool at the current location.

end
     Does a ltorg and changes to the text segment.

entry, extrn name [, name]*
     Sets the external bit of the specified name's type.

.text, .data, .bss
     These three cause assembly to resume in the specified segment at
     the location where it previously left that segment.  Initially the
     assembler assumes it is in the text segment.

.comm name, exp
     Name is given type undefined external, and exp as its value.  The
     link-editor takes undefined externals that are not defined
     elsewhere and have a non-zero value, and defines them at the
     beginning of the bss segment of the output file, taking their value
     as a length.

ccw exp1, exp2, exp3, exp4
     Generates an aligned double word with one byte exp1, three bytes for
     exp2, one byte for exp3, one byte null, and two bytes for exp4.  All
     expressions except exp2 must be absolute.
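
     The double word produced by ccw can be laid out in Python as
     follows.  This is a sketch: the function name ccw is reused only
     for illustration, all four values are taken as already-evaluated
     integers, and the bytes of each field are stored most significant
     first, as on the 370.

```python
def ccw(exp1, exp2, exp3, exp4):
    """Return the eight bytes generated for ccw exp1, exp2, exp3, exp4."""
    return (
        exp1.to_bytes(1, "big")    # one byte for exp1
        + exp2.to_bytes(3, "big")  # three bytes for exp2
        + exp3.to_bytes(1, "big")  # one byte for exp3
        + b"\x00"                  # one null byte
        + exp4.to_bytes(2, "big")  # two bytes for exp4
    )
```

     For example, ccw(0x02, 0x001000, 0x40, 80) yields the bytes 02 00
     10 00 40 00 00 50.  The alignment to a double word boundary is not
     modelled here.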




8.    MACHINE INSTRUCTIONS

There are 12 instruction types.  Each is listed below with its operand
syntax and a list of the instructions of that type.  In the operand
syntax, r is a constant ranging from 0-15, c1 from 1-16, c2 from 1-256,
and c3 from 0-9; e is any expression; and a is an address, short for
either e, e(r), e(r,r), or e(,r).
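
The operand ranges above can be captured in a small Python table.  This
is an illustrative sketch; the names RANGES and in_range are
hypothetical and do not come from the assembler.

```python
# Inclusive ranges for the constant operands named in the syntax lines.
RANGES = {
    "r": (0, 15),
    "c1": (1, 16),
    "c2": (1, 256),
    "c3": (0, 9),
}


def in_range(kind, value):
    """Return True if value is legal for the given operand kind."""
    low, high = RANGES[kind]
    return low <= value <= high
```

A length of 256 is therefore acceptable where c2 is called for, but not
where c1 is.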

RR      r
     spm     svc

RR      r,r
     adr     bctr    hdr     lner    lter    or
     aer     cdr     her     lnr     ltr     sdr
     alr     cer     isk     lpdr    mdr     ser
     ar      clcl    lcdr    lper    mer     slr
     aur     clr     lcer    lpr     mr      sr
     awr     cr      lcr     lr      mvcl    ssk
     axr     ddr     ldr     lrdr    mxdr    sur
     balr    der     ler     lrer    mxr     swr
     bcr     dr      lndr    ltdr    nr      sxr

RX      r,a
     a       bc      cvd     ld      mxd     st
     ad      bct     d       le      mxr     stc
     ae      c       dd      lh      n       std
     ah      cd      de      lra     o       ste
     al      ce      ex      m       s       sth
     au      ch      ic      md      sd      su
     aw      cl      l       me      se      sw
     bal     cvb     la      mh      sl      x

RS1     r,r,a
     bxh     cs      lm      stm
     bxle    diag    sigp
     cds     icm     stcm
     clm     lctl    stctl

RS2     r,a
     sla     sra
     slda    srda
     sldl    srdl
     sll     srl

SI1     a,c2
     cli     oi      tm
     mc      rdd     wrd
     mvi     stnsm   xi
     ni      stosm

SI2     a
     lpsw    ts
     ssm

S       [a]
     clrio   rrb     spka    stckc   tch
     hdv     sck     spt     stidc   tio
     hio     sckc    spx     stidp
     ipk     sio     stap    stpt
     ptlb    siof    stck    stpx

SS1     e(c1,r),e(c1,r)
        e(c1),e(c1)
     ap      mp      sp
     cp      mvo     unpk
     dp      pack    zap

SS2     e(c2,r),a
        e(c2),a
     clc     mvn     tr
     ed      mvz     trt
     edmk    nc      xc
     mvc     oc

SS3     e(c1,r),a,c3
        e(c1),a,c3
        a,a,c3
     srp

BRANCH  a
     b       bm      bnm     bo
     be      bne     bno     bp
     bh      bnh     bnp     bz
     bl      bnl     bnz     nop

BRANCHR r
     ber     bner    bnor    bpr
     bhr     bnhr    bnpr    br
     blr     bnlr    bnzr    bzr
     bmr     bnmr    bor     nopr




9.    DIFFERENCES FROM 370 ASSEMBLER

Below are listed the  characteristics of the  new assembler which  differ
from the  standard 370  assembler language.   No mention is  made of  new
features which have no counterpart in the old.

  *  input can be in either upper or lower case

  *  labels must be identifiers followed by colons and need not begin  in
     column 1

  *  statements may begin in column 1

  *  comments always begin with a / and continue to the end of the line

  *  no conditional assembly or macro processing is done

  *  the assignment operator = replaces equ

  *  the location  counter is  .  rather than  * and  can  be  explicitly
     assigned to, thereby performing the function of an org

  *  the pseudo-ops dxd, cxd,  org, eject, copy,  com, equ,  opsyn,  pop,
     print, punch, push, repro, start, and title do not exist.

  *  q, e, d, l, p, and z type constants do not exist

  *  name csect is written .csect name; similarly for dsect

  *  an end statement is not required; when an EOF is found, the
     assembler organizes the outstanding literals at the end of the text
     segment.

  *  implicit lengths (as in mvc) do not work; they assemble to 0.




REFERENCES

 [1]  D. M. Ritchie, UNIX Assembler Reference Manual.
 [2]  as(1), cc(1), ld(1), a.out(3f), in UTS Programmer's Manual.
 [3]  IBM 370 Principles of Operation.
