FPU assembler programming





By Erik H. Bakke




Written  13/10-93






--- I ------------------------- INTRODUCTION ---------------------- I ---




1.1 Introduction




Many people have asked me to explain how to program the 68881/68882/040


floating point coprocessors, and here it is: a guide to the "magic art".


I have tried to keep this text as system neutral as possible, but it


may, like the other articles in this series, be influenced by the fact


that I usually program on Amiga computers.


If you need more information about the topics discussed herein, please


contact the author.




1.2 Index




  Chapter I--------Introduction--------------------


  1.1  Introduction


  1.2  Index


  Chapter II-----The Coprocessor interface---------


  2.1  The Interface mechanics


  2.2  The Floating Point Coprocessor


  Chapter III----Floating Point Programming--------


  3.1  Floating Point Data Formats


  3.2  Floating Point Constant ROM


  3.3  Floating Point Instructions


       1...Data transfer instructions


       2...Dyadic operations


       3...Monadic operations


       4...Program control instructions


       5...System control instructions


  Chapter IV-------The 68040 FPU-------------------


  4.1  Differences


  4.2  Instruction set


  Chapter V------------Sources---------------------


  5.1  Sourcecodes








--- II -------------------THE COPROCESSOR INTERFACE -------------- II ---




2.1  The Interface Mechanics




A coprocessor may be thought of as an extension to the main CPU,


extending its register set and instructions.


Different coprocessors that can be interfaced to the 68020+ CPUs


are the 68881/2 FPU and 68851 MMU.


Coprocessor instructions are placed inline with ordinary CPU code,


and are all recognized by being LINE-F instructions (having the op-code


format $Fxxx).  In assembler, they are generally written as cpXXXX


instructions.




The coprocessors require a communication protocol with the CPU for


various reasons:




-----------------------------------------------------------


1.  The CPU must recognize that a coprocessor is to receive


    the LINE-F op-code, and establish contact with that


    coprocessor.




2.  The coprocessor may need to signal its internal status


    to the CPU.




3.  The coprocessor may need to read/write data to/from


    system memory or CPU registers.




4.  The coprocessor may have to inform the CPU of error


    conditions, such as an illegal instruction or divide by


    zero.  The CPU will have to process the corresponding


    exceptions.


-----------------------------------------------------------




This protocol is called the MC68000 coprocessor interface.  Knowledge


of this interface is not required of a programmer who wishes to


utilize a coprocessor, so I will not go into specific detail


about the interface, but briefly sum up the main mechanisms.




  The coprocessor instructions are F-line instructions that have


  all bits in the upper nibble set to generate $Fxxx op-codes.


  Up to 8 coprocessors may reside on the bus (The Amiga coprocessors


  are NOT part of this system, and should not be counted in).


  Each of these coprocessors has its own 3-bit address.


  Two such addresses are reserved by Motorola:


                         %000   MC68851 PMMU


                         %001   MC68881/2 FPU


  


  It is perfectly possible to install 6 FPUs in the same


  system.


  


  The general format of a coprocessor op-code is shown below:


  


  15    1211  9 8   6 5         0


  ================================


  1 1 1 1 Cp-ID Type  Instruction dependent


  


  Followed by a number of coprocessor defined extension words and


  effective address extension words.


  


  If the instruction is not accepted by the coprocessor it is


  addressed to (for example, if the CP is not present), the CPU will take


  an F-line exception.
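
  The op-code layout above can be sketched in a few lines of Python.
  This is only an illustration of the bit fields, not assembler code:
  the helper names are made up for this example, and the field
  positions are taken from the diagram above.

```python
# Hypothetical helper names; field positions follow the op-code diagram.
FPU_CP_ID = 0b001   # MC68881/2 address from the table above
PMMU_CP_ID = 0b000  # MC68851 address from the table above


def coprocessor_opword(cp_id, op_type, low_bits=0):
    """Build the first word of a coprocessor instruction."""
    if not 0 <= cp_id <= 7:
        raise ValueError("Cp-ID is a 3-bit field")
    if not 0 <= op_type <= 7:
        raise ValueError("Type is a 3-bit field")
    if not 0 <= low_bits <= 0x3F:
        raise ValueError("instruction dependent field is 6 bits")
    # Bits 15-12 are always 1111, which makes this a LINE-F op-code.
    return 0xF000 | (cp_id << 9) | (op_type << 6) | low_bits


def is_line_f(opword):
    """True if the upper nibble of the op-code word is $F."""
    return (opword & 0xF000) == 0xF000


print(f"${coprocessor_opword(FPU_CP_ID, 0):04X}")  # prints $F200
```

  With Cp-ID %001 and type 0 the word becomes $F2xx, which is why
  disassembled FPU code is full of words starting with $F2.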




2.2 The Floating Point Coprocessor




  The Motorola floating point coprocessor has the number 68881 or


  68882.  The 68882 is considerably faster than the 881, due


  to optimized internal design.  In addition, the 68040 CPU has an


  internal FPU that is even faster than the 882.  There are other


  differences between the 881/2 and the 040, but I'll return to


  those later.  The 68881 and 68882 are pin compatible, and available


  in speeds up to 20MHz and 50MHz respectively.


  The FPU implements IEEE compatible floating point formats, and


  implements instructions to perform arithmetic on these formats,


  as well as several transcendental functions, such as SIN(x), E^x


  and so on.  In addition the FPU has an on-chip constant ROM where


  different mathematical constants are stored.




2.2.1  The floating point registers




  The FPU has 8 floating point registers, each 80 bits wide.


  These are named FP0-FP7, just as D0-D7 in the CPU.  In addition


  the FPU has 3 32-bit registers:


  


  Control Register       FPCR


  


  31.................15..........7..........0


                       Exception     Mode


                        Enable      Control


  


  Status Register       FPSR


  


  31.......23........15..........7..........0


  Condition  Quotient   Accrued   Exception


    Codes              Exception   Status


  


  Instruction Address Register    FPIAR




  31.......23........15..........7..........0




2.2.1.1  Floating point data registers




  The data registers always contain an 80 bit wide extended precision


  floating point number.  Before any floating point data is used in


  calculation, it is converted to extended-precision.


  For example, the instruction   FMOVE.L #10,FP3  converts the


  longword #10 to extended precision before transferring it to register


  FP3.  All calculations with the FPU use the internal registers


  as either source or destination, or both.


  


2.2.1.2  Floating Point Control Register




  This register is split in two bytes, the exception enable byte, and


  the mode control byte.




2.2.1.2.1  Exception Enable byte




  This byte contains a bit for each of the possible eight


  exceptions that may be generated by the FPU.  Setting or clearing


  one of these bits will enable/disable the corresponding exception.




  The exception bytes are organized this way:


  


  Bit   Name    Meaning


  ========================


  7     BSUN    Branch/Set on UNordered


  6     SNAN    Signalling Not A Number


  5     OPERR   OPerand ERRor


  4     OVFL    OVerFLow


  3     UNFL    UNderFLow


  2     DZ      Divide by Zero


  1     INEX2   INEXact operation


  0     INEX1   INEXact decimal input




2.2.1.2.2  Mode control byte




  This register controls the rounding modes and rounding precisions.


  A result may be rounded or chopped to either double, single or


  extended precision.  For most usage of the FPU, however, this


  register could be set to all zeroes, which will round the result


  to the nearest extended precision value.


  Mode control byte:


  


  Bit   Name    Meaning


  ========================


  7     PREC1   Precision bit 1


  6     PREC0   Precision bit 0


  5     RND1    Rounding bit 1


  4     RND0    Rounding bit 0


  3     ----    -----


  2     ----    -----


  1     ----    -----


  0     ----    -----




  PREC=00  Round to extended precision


  PREC=01  Round to single precision


  PREC=10  Round to double precision


  


  RND=00   Round toward nearest possible number


  RND=01   Round toward zero


  RND=10   Round toward minus infinity


  RND=11   Round toward plus infinity
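
  As a rough illustration of how the two FPCR bytes fit together, the
  Python sketch below builds a control register value from the tables
  above.  The function and dictionary names are invented for this
  example.  It assumes the exception enable byte sits in bits 15-8 and
  the mode control byte in bits 7-0, as in the register diagram, and
  that RND=10 and RND=11 round toward minus and plus infinity.

```python
# Encodings copied from the PREC and RND tables above.
PREC = {"extended": 0b00, "single": 0b01, "double": 0b10}
RND = {"nearest": 0b00, "zero": 0b01, "minus_inf": 0b10, "plus_inf": 0b11}

# Bit numbers inside the exception enable byte.
EXC_BITS = {"BSUN": 7, "SNAN": 6, "OPERR": 5, "OVFL": 4,
            "UNFL": 3, "DZ": 2, "INEX2": 1, "INEX1": 0}


def fpcr_value(precision="extended", rounding="nearest", enabled=()):
    """Return a 32-bit value for FPCR (hypothetical helper)."""
    mode = (PREC[precision] << 6) | (RND[rounding] << 4)
    enable = 0
    for name in enabled:
        enable |= 1 << EXC_BITS[name]
    # Exception enable byte in bits 15-8, mode control byte in bits 7-0.
    return (enable << 8) | mode


print(hex(fpcr_value("double", "zero")))  # prints 0x90
```

  The default call, fpcr_value(), gives 0, which is the "all zeroes"
  setting recommended above for most uses.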


  


2.2.1.3  Floating point Status Register




  This register is just what you may think it is, the parallel to


  the CPU CCR register, and reflects the status of the last


  floating point computation.  The quotient byte is used with


  floating point remaindering operations.  The exception status


  byte tells which exceptions occurred during the last operation.


  The accrued exception byte contains a bitmask of the exceptions


  that have occurred since the last time this field was cleared.




  Condition code byte (bits are numbered within the byte):


  


  Bit   Name    Meaning


  =============================


  7     ----    -----


  6     ----    -----


  5     ----    -----


  4     ----    -----


  3     N       Negative


  2     Z       Zero


  1     I       Infinity


  0     NAN     Not A Number
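
  A small Python sketch of how a value read from FPSR could be taken
  apart.  The function name is made up, and the field positions are
  assumed from the register diagram in 2.2.1 (condition codes in bits
  31-24, quotient in 23-16, exception status in 15-8 and accrued
  exceptions in 7-0).

```python
def decode_fpsr(fpsr):
    """Split a 32-bit FPSR value into its four bytes (hypothetical helper)."""
    cc = (fpsr >> 24) & 0xFF  # condition code byte
    return {
        "N": bool(cc & 0x08),    # bit 3 of the byte: negative
        "Z": bool(cc & 0x04),    # bit 2: zero
        "I": bool(cc & 0x02),    # bit 1: infinity
        "NAN": bool(cc & 0x01),  # bit 0: not a number
        "quotient": (fpsr >> 16) & 0xFF,
        "exception_status": (fpsr >> 8) & 0xFF,
        "accrued_exception": fpsr & 0xFF,
    }


flags = decode_fpsr(0x0A000000)
print(flags["N"], flags["I"])  # prints True True (negative infinity)
```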




2.2.1.4  Floating point Instruction Address Register




  This register contains the address of the instruction currently


  executing.  The FPU can execute instructions in parallel with


  the CPU, so that time-consuming instructions, such as division


  and multiplication don't tie up the CPU unnecessarily.  This means


  that if an exception occurs in the floating point operation,


  the address that is pushed on to the stack is not necessarily


  the address of the instruction that caused the exception.


  The exception handler would have to read this register to find


  the address of the offending instruction.








--- III ----------------FLOATING POINT PROGRAMMING ------------- III ---






3.1  Floating Point Data Formats




  The FPU can handle 3 integer formats, and 2 IEEE compatible formats.


  In addition, it has an extended-precision format and can handle a


  Packed-Decimal Real format.




3.1.1  Integer Formats




  The 3 integer formats that are supported by the FPU are compatible


  with the formats used by the 68000 CPUs.  They are Byte (8 bits),


  Word (16 bits), and Longword (32 bits).




3.1.2  Real Formats




  The FPU supports 4 real formats, the Single precision (32 bits),


  Double precision (64 bits), Extended precision (80 bits), and


  Packed-decimal string (96 bits).




3.1.2.1 Single Precision




  The single precision format is indicated with the extension .S


  It consists of a 23-bit fraction, an 8-bit exponent, and 1 bit


  indicating the sign of the fraction.  The Single Precision format


  is defined by IEEE and uses excess-127 notation for the exponent.


  A hidden 1 is assumed as the most significant digit of the fraction.


  The format is defined as follows:


  


        30       22                    0


      S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF


      


      S=Sign of fraction


      E=Exponent


      F=Fraction


  


  The single precision format takes 4 bytes when written to memory.
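
  To make the field layout concrete, here is a short Python sketch
  that splits a single precision number into the three fields shown
  above and rebuilds the value from them.  The function names are
  invented for this example, and the rebuild step only covers normal
  numbers (not zero, denormals, infinity or NAN).

```python
import struct


def decode_single(x):
    """Return (sign, exponent, fraction) of x as a .S number."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, excess-127
    fraction = bits & 0x7FFFFF      # 23 bits, hidden 1 not stored
    return sign, exponent, fraction


def single_value(sign, exponent, fraction):
    """Rebuild the value of a normalized .S number from its fields."""
    mantissa = 1 + fraction / 2**23  # put the hidden 1 back
    return (-1.0) ** sign * mantissa * 2.0 ** (exponent - 127)


print(decode_single(-2.5))  # prints (1, 128, 2097152)
```

  -2.5 is -1.25 * 2^1, so the stored exponent is 1+127=128 and the
  fraction holds only the .25 part (bit 21 set, $200000).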




3.1.2.2 Double Precision




  The double precision format is indicated with the extension .D


  It consists of a 52-bit fraction, an 11-bit exponent, and 1 bit


  indicating the sign of the fraction.  As the single precision,


  this format is also defined by the IEEE, and uses excess-1023


  notation for the exponent.  A hidden 1 is assumed as the most


  significant digit of the fraction.  The format is defined as


  follows:


  


        62          51                      0


      S EEEEEEEEEEE FFFFFFF........FFFFFFFFFF


      


      S=Sign of fraction


      E=Exponent


      F=Fraction


      


  The double precision format takes 8 bytes when written to memory.




3.1.2.3 Extended precision




  The extended precision is indicated with the extension .X


  This is the format that is used in all computations, and


  consists of a 64-bit mantissa, a 15-bit exponent and 1 bit


  indicating the sign of the mantissa.  A hidden 1 is not assumed,


  so all digits of the mantissa are present.  Excess-16383


  is used for the exponent.  When data of this format is written


  to memory, it is "exploded" by 16 zero-bits between the mantissa


  and the exponent in order to make it longword aligned.


  The extended-precision format is defined as follows:


  


    79              63                             0


  S EEEEEEEEEEEEEEE MMMMMMMMMMMMMMM............MMMMM


  


  When written to memory, it looks like this:


    94            80          63                   0


  S EEEEEEEEEEEEEEE 000...000 MMMMMMMM...........MMM


  


  This format takes 12 bytes when written to memory.
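
  The 12-byte memory image can be sketched in Python as follows.  This
  is an illustration of the layout described above, not something the
  FPU runs: the function name is made up, and only finite numbers are
  handled (no infinity or NAN, and the sign of zero is ignored).

```python
import math
import struct


def to_extended_bytes(x):
    """Return the 12-byte .X memory image of a finite number."""
    if not math.isfinite(x):
        raise ValueError("only finite values are handled in this sketch")
    if x == 0:
        return bytes(12)
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e, 0.5 <= m < 1
    mantissa = int(m * 2**64)          # explicit leading 1 ends up in bit 63
    exponent = (e - 1) + 16383         # excess-16383
    first_word = (sign << 15) | exponent
    # sign+exponent word, 16 zero bits of padding, 64-bit mantissa
    return struct.pack(">HHQ", first_word, 0, mantissa)


print(to_extended_bytes(1.0).hex())  # prints 3fff00008000000000000000
```

  Note how the leading 1 of the mantissa is actually stored ($8000...),
  unlike in the single and double precision formats.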




3.1.2.4  Packed-decimal string




  To simplify input/output of floating point numbers, a special


  96-bit packed-decimal format is provided.


  This format consists of 17 BCD digits mantissa, some padding


  bits, 4 BCD digits exponent, 2 control bits, 1 bit indicating


  the sign of the exponent, and 1 indicating the sign of the


  mantissa.


  Bits 68-79 are stored as zero bits unless an overflow occurs


  during the conversion.


  Positive and negative infinity are represented by numbers that are


  outside the range of the floating point representation used.


  If the result of an operation has no mathematical meaning, a NAN


  is produced.  In the case of a NAN or infinity, bits 92 and 93


  are both 1.




3.2  Floating point Constant ROM




  The FPU has an on-chip ROM where frequently used mathematical


  constants are stored.  How to retrieve these constants will be


  discussed below.  Each constant has its own address in the ROM:


  


  Offset     Constant


  =============================


  $00        Pi


  $0b        Log10(2)


  $0c        e


  $0d        Log2(e)


  $0e        Log10(e)


  $0f        0.0


  $30        ln(2)


  $31        ln(10)


  $32        10^0


  $33        10^1


  $34        10^2


  $35        10^4


  .


  .


  .


  $3e        10^2048


  $3f        10^4096
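
  The powers of ten in the table follow a doubling pattern.  The
  Python sketch below lists them, under the assumption that the rows
  left out above continue the pattern of the rows that are shown.  The
  function name is made up for this example.

```python
def power_of_ten_exponent(offset):
    """Return n so that ROM offset holds 10^n (offsets $32-$3f only)."""
    if offset == 0x32:
        return 0
    if 0x33 <= offset <= 0x3F:
        # $33 -> 10^1, $34 -> 10^2, $35 -> 10^4, ... $3f -> 10^4096
        return 2 ** (offset - 0x33)
    raise ValueError("not a power-of-ten offset")


for offset in range(0x32, 0x40):
    print(f"${offset:02x}        10^{power_of_ten_exponent(offset)}")
```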




3.3  Floating Point Instructions




  The FPU provides an extension to the normal 68000 instruction set.


  The floating point instructions can be divided into 5 groups:


  


     1...Data transfer instructions


     2...Dyadic instructions (two operands)


     3...Monadic instructions (one operand)


     4...Program control instructions


     5...System control instructions


     


  The syntax and addressing modes for these instructions are the same


  as those of the ordinary 68000 instructions.




3.3.1  Data Transfer Instructions




  Instruction Syntax     Op. Sizes      Operation


  ==================================================================


  FMOVE  FPm,FPn      | X             | The FMOVE instruction


  FMOVE  <ea>,FPn     | B/W/L/S/D/X/P | copies data from the


  FMOVE  FPm,<ea>     | B/W/L/S/D/X   | source operand to the


                      |               | destination operand


  ==================================================================


  FMOVE  FPm,<ea>(#k) | P             | When writing a .P real, an


  FMOVE  FPm,<ea>(Dn) | P             | optional rounding precision


                      |               | may be specified as a constant


                      |               | or in a data register


                      |               | See below for details


  ==================================================================


  FMOVE  <ea>,FPcr    | L             | These two FMOVEs copy


  FMOVE  FPcr,<ea>    | L             | data to/from control registers


  ==================================================================


  FMOVECR #ccc,FPn    | X             | This instruction retrieves a


                      |               | constant from the ROM, where


                      |               | ccc is the offset to be read


  ==================================================================


  FMOVEM <ea>,<list>  | L/X           | Moves multiple FP registers


  FMOVEM <ea>,Dn      | X             | The register list may be


  FMOVEM <list>,<ea>  | L/X           | specified as in 68000 assembler,


  FMOVEM Dn,<ea>      | X             | or be contained as a bitmask


                      |               | in a data register


  ==================================================================


  


  When writing floating point numbers to the memory using the .P


  format, an optional precision may be specified as a constant or


  in a data register.


  


  Meaning of the precision:


  


  -64<=k<=0  Rounded to |k| decimal places


  0<k<=17    Mantissa is rounded to k places
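
  The two cases can be imitated with ordinary string formatting in
  Python.  This is only an analogy to show what the k value means: the
  function name is made up, and the output is a normal text string,
  not the packed-decimal layout the FPU writes.

```python
def format_like_k_factor(x, k):
    """Format x roughly the way the k value above would round it."""
    if not -64 <= k <= 17:
        raise ValueError("k must be in the range -64..17")
    if k <= 0:
        # |k| digits after the decimal point
        return format(x, f".{-k}f")
    # k significant digits in the mantissa
    return format(x, f".{k - 1}e")


print(format_like_k_factor(1234.5678, -2))  # prints 1234.57
print(format_like_k_factor(1234.5678, 3))   # prints 1.23e+03
```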


  


  Register bit mask of the FMOVEM instruction:


  


  Addressing mode    Bit 7   Bit 6 -------- Bit 1   Bit 0


  =========================================================


  -(An)           |   FP7  |  FP6  |------|  FP1  |  FP0  |


  All other modes |   FP0  |  FP1  |------|  FP6  |  FP7  |


  =========================================================
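
  The table above can be turned into a small Python sketch that
  computes the mask for a list of registers, for instance when the
  mask is to be passed in a data register.  The function name is made
  up for this example.

```python
def fmovem_mask(registers, predecrement=False):
    """Return the 8-bit register mask for FP register numbers 0-7."""
    mask = 0
    for n in registers:
        if not 0 <= n <= 7:
            raise ValueError("FP registers are numbered 0-7")
        # -(An): bit 0 = FP0 ... bit 7 = FP7
        # other modes: bit 7 = FP0 ... bit 0 = FP7
        bit = n if predecrement else 7 - n
        mask |= 1 << bit
    return mask


print(hex(fmovem_mask([0, 1, 2])))                     # prints 0xe0
print(hex(fmovem_mask([0, 1, 2], predecrement=True)))  # prints 0x7
```

  So the same register list FP0-FP2 gives two different masks,
  depending on the addressing mode used.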




3.3.2  Dyadic Operations




  Dyadic operations are operations computing with two operands.


  The first operand may be addressed with any addressing mode,


  while the second operand (destination) must be an FPU register.


  


  Instruction  Function


  ===================================================


  FADD       | Add two FP numbers


  FCMP       | Compare two FP numbers


  FDIV       | FP Division


  FMOD       | FP Modulo


  FMUL       | Multiply two FP numbers


  FREM       | Get IEEE remainder


  FSCALE     | Scale exponent


  FSGLDIV    | Single-precision Division


  FSGLMUL    | Single-precision Multiplication


  FSUB       | FP Subtraction


  ===================================================


  


  FSCALE adds the first operand to the exponent of the second


  operand.


  FREM returns the remainder of a division as specified by the IEEE


  definition.
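
  The difference between FMOD and FREM, and the effect of FSCALE, can
  be tried out in Python (3.7 or later).  The function names are made
  up, and the standard math functions are used as stand-ins for the
  FPU operations: fmod for a quotient rounded toward zero, remainder
  for the IEEE rule (quotient rounded to nearest), and ldexp for
  adding to the exponent.  Operands are given in the same order as in
  the assembler syntax, source first.

```python
import math


def fscale(src, dst):
    """dst * 2^int(src): add the integer part of src to the exponent."""
    return math.ldexp(dst, int(src))


def fmod_(src, dst):
    """Remainder of dst / src with the quotient rounded toward zero."""
    return math.fmod(dst, src)


def frem(src, dst):
    """IEEE remainder of dst / src (quotient rounded to nearest)."""
    return math.remainder(dst, src)


print(fscale(4, 3.0))   # prints 48.0
print(fmod_(4.0, 7.0))  # prints 3.0
print(frem(4.0, 7.0))   # prints -1.0
```

  7/4 is 1.75.  FMOD uses the quotient 1 and leaves 3, while FREM
  uses the nearest quotient 2 and leaves -1.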




3.3.3  Monadic Operations




  Monadic operations are operations computing with only one operand.


  The operand may be addressed with any addressing mode.


  The syntax of monadic operations is:


  


    Instruction.Size  <ea>,FPn


  


  Instruction  Function


  ====================================================


  FABS       | Absolute value


  FACOS      | FP Arc Cosine


  FASIN      | FP Arc Sine


  FATAN      | FP Arc Tangent


  FATANH     | FP Hyperbolic Arc Tangent


  FCOS       | FP Cosine


  FCOSH      | FP Hyperbolic Cosine


  FETOX      | FP e^x


  FETOXM1    | FP (e^x)-1


  FGETEXP    | Get exponent


  FGETMAN    | Get mantissa


  FINT       | FP Integer


  FINTRZ     | Get integer, rounding toward zero


  FLOGN      | FP Ln(n)


  FLOGNP1    | FP Ln(n+1)


  FLOG10     | FP Log10(n)


  FLOG2      | FP Log2(n)


  FNEG       | Negate a floating point number


  FSIN       | FP Sine


  FSINH      | FP Hyperbolic Sine


  FSQRT      | FP Square Root


  FTAN       | FP Tangent


  FTANH      | FP Hyperbolic Tangent


  FTENTOX    | FP 10^x


  FTWOTOX    | FP 2^x


  ====================================================




  There is one more monadic operation that uses a double


  destination operand, the FSINCOS instruction:




  FSINCOS <ea>,FPc:FPs        Calculates sine and cosine


  FSINCOS FPm,FPc:FPs         of the same argument




  All trigonometric operations operate on values in


  radians.




3.3.4 Program Control Instructions




3.3.4.1  Instructions


  This group of instructions allows control of program flow based


  on condition codes generated by the FPU.  These instructions are


  parallels to the 68000 instructions with the same names.




  Instruction      Formats  Operation


  =====================================================================


  FBcc <Label>    | W/L   | Branch on FPU conditions


  FDBcc Dn,<Label>| W     | Test FPU conditions, decrement and branch


  FNOP            |       | No Operation   (Like NOP)


  FScc <ea>       | B     | Set on FPU conditions


  FTST <ea>       | All   | Test FP number at <ea>


  FTST FPn        | X     | Test FP number in FPn


  =====================================================================




3.3.4.2 Condition codes




  The FPU condition codes that may be acted upon are divided into two


  groups, with and without NAN exception.




3.3.4.2.1  Condition codes with NAN exception




  Symbol  Meaning


  ===========================================


  GE    | Greater or equal


  GL    | Greater or less


  GLE   | Greater, less or equal


  LE    | Less or equal


  LT    | Less


  NGE   | Not (greater or equal)


  NGL   | Not (greater or less)


  NGLE  | Not (greater, less or equal)


  NGT   | Not greater


  NLE   | Not (less or equal)


  NLT   | Not less


  SEQ   | Signalling equal


  SNE   | Signalling unequal


  SF    | Signalling Always FALSE


  ST    | Signalling Always TRUE


  ===========================================




3.3.4.2.2  Condition codes without NAN exception




  Symbol  Meaning


  ===========================================


  OGE   | Ordered and greater or equal


  OGL   | Ordered and greater or less


  OR    | Ordered


  OGT   | Ordered and greater


  OLE   | Ordered and less or equal


  OLT   | Ordered and less


  UGE   | Unordered or greater or equal


  UEQ   | Unordered or equal


  UN    | Unordered


  UGT   | Unordered or greater


  ULE   | Unordered or less or equal


  ULT   | Unordered or less


  EQ    | Equal


  NE    | Unequal


  F     | Always FALSE


  T     | Always TRUE




3.3.4.2.3  Notes




  You might wonder why there are different symbols for (unequal) and


  (greater or less).


  Floating point numbers may represent an infinity, something that is


  impossible in the 68000 CPU.  In addition, they may represent illegal


  values (NAN).


  For detailed information on how these condition code symbols relate to the


  condition code flags, consult programming references for the 68881 FPU.
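
  The point about (unequal) versus (greater or less) is easy to try
  out in any language with IEEE floating point.  The Python sketch
  below uses made-up function names that mirror three of the symbols
  from the tables.  It only models the true/false result of each test,
  not the exception that the first group of condition codes can raise.

```python
import math


def un(a, b):
    """UN: unordered, at least one operand is a NAN."""
    return math.isnan(a) or math.isnan(b)


def ne(a, b):
    """NE: unequal.  True when a NAN is involved."""
    return not (a == b)


def ogl(a, b):
    """OGL: ordered and greater or less.  False when a NAN is involved."""
    return a < b or a > b


nan = float("nan")
print(ne(1.0, 2.0), ogl(1.0, 2.0))  # prints True True
print(ne(nan, 2.0), ogl(nan, 2.0))  # prints True False
```

  For ordinary numbers the two tests agree.  They only differ when one
  of the operands is a NAN.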




3.3.5  System Control Instructions




  This group consists of 3 instructions, FSAVE, FRESTORE and FTRAPcc.


  FSAVE and FRESTORE are privileged instructions and are used to save


  and restore the FPU state frame to memory.




  FSAVE <ea>     Copies the internal registers to the specified state frame


  FRESTORE <ea>  Loads the internal registers with the specified state frame.




  The FTRAPcc instruction can generate an exception dependent on the condition


  codes.




  FTRAPcc                        If the specified condition is true, TRAP.


  FTRAPcc #<data>.(W/L)






--- IV --------------------------- THE 68040 FPU ----------------- IV ---






4.1  Differences




  The FPU that is built into the 040, and indeed into the yet to come 060,


  is highly optimized.  It omits some instructions that are found on


  the 68881/2, but the ones that are unimplemented are usually emulated


  in software.  By avoiding the emulated instructions, a program can run


  several times faster on the 040.


  The 040 also omits the Packed Decimal format (.P).  When a program attempts


  to write it, the 040 will respond with an illegal format exception.




4.2  Instruction set of the FPU-40  (Or whatever it is called)




4.2.1  68881/2 instructions that are unimplemented




  Type        Instructions


  =================================================================


  Monadic    | FACOS,FASIN,FATAN,FATANH,FCOS,FCOSH,FETOX,FETOXM1,


             | FGETEXP,FGETMAN,FINT,FINTRZ,FLOG10,FLOG2,FLOGN,


             | FLOGNP1,FSIN,FSINCOS,FSINH,FTAN,FTANH,FTENTOX,


             | FTWOTOX


  Dyadic     | FMOD,FREM,FSCALE


  Transfer   | FMOVECR


  =================================================================




4.2.2  68881/2 instructions that WORK on an 040




  FABS     FADD      FBcc      FCMP      FDBcc     FDIV      FMOVE


  FMOVEM   FMUL      FNEG      FNOP      FRESTORE  FSAVE     FScc


  FSGLDIV  FSGLMUL   FSQRT     FSUB      FTRAPcc   FTST






--- V ----------------------------- SOURCES ----------------------- V ---




5.1  Sourcecodes




  There is no source code in this text, but in the same archive as you


  got this file there should be an assembler source for a julia fractal


  using the FPU.


  The program is a very simple one, and doesn't use the most advanced


  operations, but illustrates clearly how to program for the FPU.


  The program is written for Amiga computers with AGA chipset and at


  least OS 2.0, but it should be easy to degrade to earlier Amiga


  versions and to other platforms.






************************************************************************************************


*             This text was written by Erik H. Bakke 14/10-93   © 1993 Bakke SoftDev           *


*                                                                                              *


* This text is freely redistributable as long as all files are kept together and in unmodified *


* form.                                                                                        *


*             Permission is granted to include this text in the HowToCode archive              *


************************************************************************************************


*  Error corrections, comments and questions should be directed to the author:                 *


*                                                                                              *


*                               e-mail:   db36@hp825.bih.no                                    *


*                               phone:    +47-5630-5537   (13:00-21:00 GMT)                    *


*                               Post:     Erik H. Bakke                                        *


*                                         Bjørnen                                              *


*                                         N-5227 SØRE NESET                                    *


*                                         NORWAY                                               *


************************************************************************************************


  
