FPU assembler programming

By Erik H. Bakke

Written 13/10-93

--- I ------------------------- INTRODUCTION ---------------------- I ---

1.1 Introduction

Many people have asked me to explain how to program the 68881/68882/040
floating point coprocessors, and here it is: a guide to the "magic art".
I have tried to keep this text as system neutral as possible, but it
may, like the other articles in this series, be influenced by the fact
that I usually program on Amiga computers.

If you need more information about the topics discussed herein, please
contact the author.

1.2 Index

Chapter I--------Introduction--------------------

1.1 Introduction

1.2 Index

Chapter II-----The Coprocessor interface---------

2.1 The Interface mechanics

2.2 The Floating Point Coprocessor

Chapter III----Floating Point Programming--------

3.1 Floating Point Data Formats

3.2 Floating Point Constant ROM

3.3 Floating Point Instructions

1...Data transfer instructions

2...Dyadic operations

3...Monadic operations

4...Program control instructions

5...System control instructions

Chapter IV-------The 68040 FPU-------------------

4.1 Differences

4.2 Instruction set

Chapter V------------Sources---------------------

5.1 Sourcecodes

--- II -------------------THE COPROCESSOR INTERFACE -------------- II ---

2.1 The Interface Mechanics

A coprocessor may be thought of as an extension to the main CPU,

extending its register set and instructions.

The coprocessors that can be interfaced to the 68020+ CPUs include
the 68881/2 FPU and the 68851 MMU.

Coprocessor instructions are placed inline with ordinary CPU instructions,
and are all recognized by being LINE-F instructions (having an op-code
of the form $Fxxx). In assembler, they are generally written as cpXXXX
instructions.

The coprocessors require a communication protocol with the CPU for

various reasons:

-----------------------------------------------------------

1. The CPU must recognize that a coprocessor is to receive

the LINE-F op-code, and establish contact with that

coprocessor.

2. The coprocessor may need to signal its internal status

to the CPU.

3. The coprocessor may need to read/write data to/from

system memory or CPU registers.

4. The coprocessor may have to inform the CPU of error

conditions, such as an illegal instruction or divide by

zero. The CPU will have to process the corresponding

exceptions.

-----------------------------------------------------------

This protocol is called the MC68000 coprocessor interface. Knowledge

of this interface is not required of a programmer who wishes to

utilize a coprocessor, therefore I will not go into specific detail

about the interface, but briefly sum up the main mechanisms.

The coprocessor instructions are F-line instructions: all bits in the
upper nibble are set, giving $Fxxx op-codes.

Up to 8 coprocessors may reside on the bus (the Amiga coprocessors
are NOT part of this system, and should not be counted).
Each of these coprocessors has its own 3-bit address.
Two such addresses are reserved by Motorola:

%000 MC68851 PMMU
%001 MC68881/2 FPU

It is perfectly possible to install 6 FPUs in the same
system.

The general format of a coprocessor op-code is shown below:

 15      12 11     9 8     6 5                     0
====================================================
  1 1 1 1 |  Cp-ID  | Type  | Instruction dependent

Followed by a number of coprocessor defined extension words and

effective address extension words.

If the instruction is not accepted by the coprocessor it is

addressed to (if the CP is not present) the CPU will take

an F-line exception.
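
To make the op-code layout above concrete, here is a small Python sketch that
splits a 16-bit op-code word into the fields shown in the diagram. The function
name and the dictionary keys are made up for this illustration; only the bit
positions and the two reserved addresses come from the text.

```python
# Reserved coprocessor addresses, as listed in the text above.
COPROCESSORS = {0b000: "MC68851 PMMU", 0b001: "MC68881/2 FPU"}


def decode_fline(opword):
    """Split a 16-bit op-code word into coprocessor fields.

    Returns None when the upper nibble is not %1111, i.e. when the
    word is not an F-line op-code at all.
    """
    if (opword >> 12) & 0xF != 0xF:
        return None
    return {
        "cp_id": (opword >> 9) & 0x7,  # bits 11-9: coprocessor address
        "type": (opword >> 6) & 0x7,   # bits 8-6: instruction type
        "rest": opword & 0x3F,         # bits 5-0: instruction dependent
    }


fields = decode_fline(0xF200)
print(fields, COPROCESSORS.get(fields["cp_id"], "not reserved"))
```

A word such as $4E71 (the ordinary NOP) has a different upper nibble, so the
sketch returns None for it.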

2.2 The Floating Point Coprocessor

The Motorola floating point coprocessor has the number 68881 or
68882. The 68882 is considerably faster than the 881, due
to an optimized internal design. In addition, the 68040 CPU has an
internal FPU that is even faster than the 882. There are other
differences between the 881/2 and the 040, but I'll return to
those later. The 68881 and 68882 are pin compatible, and available
in speeds up to 20 MHz and 50 MHz respectively.

The FPU implements IEEE compatible floating point formats, and
instructions to perform arithmetic on these formats,
as well as several transcendental functions, such as SIN(x), e^x
and so on. In addition the FPU has an on-chip constant ROM where
different mathematical constants are stored.

2.2.1 The floating point registers

The FPU has 8 floating point registers, each 80 bits wide.
These are named FP0-FP7, just as D0-D7 in the CPU. In addition
the FPU has 3 32-bit registers:

Control Register FPCR

31.................15..........7..........0
|    (unused)      | Exception | Mode     |
|                  | Enable    | Control  |

Status Register FPSR

31.......23........15..........7..........0
| Condition| Quotient| Exception| Accrued  |
| Codes    |         | Status   | Exception|

Instruction Address Register FPIAR

31.......23........15..........7..........0

2.2.1.1 Floating point data registers

The data registers always contain an 80 bit wide extended precision

floating point number. Before any floating point data is used in

calculation, it is converted to extended-precision.

For example, the instruction FMOVE.L #10,FP3 converts the

longword #10 to extended precision before transferring it to register

FP3. All calculations with the FPU use the internal registers
as either source or destination, or both.

2.2.1.2 Floating Point Control Register

This register is split in two bytes, the exception enable byte, and

the mode control byte.

2.2.1.2.1 Exception Enable byte

This byte contains a bit for each of the eight possible
exceptions that may be generated by the FPU. Setting or clearing
one of these bits will enable/disable the corresponding exception.

The exception enable byte is organized this way:

Bit Name Meaning

========================

7 BSUN Branch/Set on UNordered

6 SNAN Signalling Not A Number

5 OPERR OPerand ERRor

4 OVFL OVerFLow

3 UNFL UNderFLow

2 DZ Divide by Zero

1 INEX2 INEXact operation

0 INEX1 INEXact decimal input
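
As a sketch of how the table above turns into an actual value, the following
Python builds the exception enable byte from a list of exception names and
places it in the upper byte of the low word of the FPCR (bits 15-8 in the
register diagram earlier). The helper names are invented for this example.

```python
# Bit numbers inside the exception enable byte, from the table above.
EXCEPTION_BITS = {
    "BSUN": 7, "SNAN": 6, "OPERR": 5, "OVFL": 4,
    "UNFL": 3, "DZ": 2, "INEX2": 1, "INEX1": 0,
}


def exception_enable_byte(*names):
    """Return the enable byte with the named exceptions switched on."""
    value = 0
    for name in names:
        value |= 1 << EXCEPTION_BITS[name]
    return value


def fpcr_exception_field(*names):
    """Shift the enable byte into FPCR bits 15-8."""
    return exception_enable_byte(*names) << 8


# Enable divide-by-zero and overflow exceptions only.
print(hex(fpcr_exception_field("DZ", "OVFL")))
```

The resulting longword is what would be written to the control register with
an FMOVE.L to FPCR, combined with the mode control byte in bits 7-0.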

2.2.1.2.2 Mode control byte

This byte controls the rounding mode and the rounding precision.
A result may be rounded or chopped to either double, single or
extended precision. For most usage of the FPU, however, this
is left at the default, which rounds to the nearest extended
precision value.

Mode control byte:

Bit Name Meaning

========================

7 PREC1 Precision bit 1

6 PREC0 Precision bit 0

5 RND1 Rounding bit 1

4 RND0 Rounding bit 0

3 ---- -----

2 ---- -----

1 ---- -----

0 ---- -----

PREC=00 Round to extended precision

PREC=01 Round to single precision

PREC=10 Round to double precision

RND=00 Round toward the nearest representable number

RND=01 Round toward zero

RND=10 Round toward minus infinity

RND=11 Round toward plus infinity
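
The two-bit fields above can be packed into the mode control byte as in this
minimal Python sketch. The dictionary keys ("minus_inf", "plus_inf" and so on)
are names chosen for the example; the bit positions and encodings are the ones
listed above.

```python
# Encodings of the PREC field (bits 7-6) and the RND field (bits 5-4).
PRECISION = {"extended": 0b00, "single": 0b01, "double": 0b10}
ROUNDING = {"nearest": 0b00, "zero": 0b01, "minus_inf": 0b10, "plus_inf": 0b11}


def mode_control_byte(precision="extended", rounding="nearest"):
    """Build the mode control byte; bits 3-0 are unused and stay zero."""
    return (PRECISION[precision] << 6) | (ROUNDING[rounding] << 4)


def decode_mode_control(byte):
    """Return the (PREC, RND) field values of a mode control byte."""
    return (byte >> 6) & 0b11, (byte >> 4) & 0b11


print(hex(mode_control_byte("double", "zero")))
```

The default arguments give a byte of zero, which matches the usual case of
rounding to the nearest extended precision value.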

2.2.1.3 Floating point Status Register

This register is just what you may think it is, the parallel to
the CPU CCR register, and reflects the status of the last
floating point computation. The quotient byte is used with
floating point remaindering operations. The exception status
byte tells which exceptions occurred during the last operation.
The accrued exception byte contains a bitmask of the exceptions
that have occurred since the last time this field was cleared.

Condition code byte (FPSR bits 31-24):

Bit Name Meaning

=============================

7 ---- -----

6 ---- -----

5 ---- -----

4 ---- -----

3 N Negative

2 Z Zero

1 I Infinity

0 NAN Not A Number
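
A short Python sketch of reading these flags out of a 32-bit FPSR value
follows. It assumes the condition code byte sits in bits 31-24 of the register,
as in the register diagram; the function name is hypothetical.

```python
def decode_condition_codes(fpsr):
    """Extract the N, Z, I and NAN flags from a 32-bit FPSR value."""
    cc = (fpsr >> 24) & 0xFF  # condition code byte
    return {
        "N": bool(cc & 0x08),    # bit 3: negative
        "Z": bool(cc & 0x04),    # bit 2: zero
        "I": bool(cc & 0x02),    # bit 1: infinity
        "NAN": bool(cc & 0x01),  # bit 0: not a number
    }


# N and I set together would describe negative infinity.
print(decode_condition_codes(0x0A000000))
```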

2.2.1.4 Floating point Instruction Address Register

This register contains the address of the instruction currently
executing. The FPU can execute instructions in parallel with
the CPU, so that time-consuming instructions, such as division
and multiplication, don't tie up the CPU unnecessarily. This means
that if an exception occurs in the floating point operation,
the address that is pushed on to the stack is not necessarily
the address of the instruction that caused the exception.
The exception handler would have to read this register to find
the address of the offending instruction.

--- III ----------------FLOATING POINT PROGRAMMING ------------- III ---

3.1 Floating Point Data Formats

The FPU can handle 3 integer formats, and 2 IEEE compatible formats.

In addition, it has an extended-precision format and can handle a

Packed-Decimal Real format.

3.1.1 Integer Formats

The 3 integer formats that are supported by the FPU are compatible
with the formats used by the 68000 CPUs. They are Byte (8 bits),
Word (16 bits), and Longword (32 bits).

3.1.2 Real Formats

The FPU supports 4 real formats: Single precision (32 bits),
Double precision (64 bits), Extended precision (80 bits), and
Packed-decimal string (96 bits).

3.1.2.1 Single Precision

The single precision format is indicated with the extension .S

It consists of a 23-bit fraction, an 8-bit exponent, and 1 bit

indicating the sign of the fraction. The Single Precision format

is defined by IEEE and uses excess-127 notation for the exponent.

A hidden 1 is assumed as the most significant digit of the fraction.

The format is defined as follows:

31 30     23 22                     0
S  EEEEEEEE  FFFFFFFFFFFFFFFFFFFFFFF

S=Sign of fraction

E=Exponent

F=Fraction

The single precision format takes 4 bytes when written to memory.
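
The description above can be checked with a few lines of Python. This is only
a sketch for normalized numbers: it decodes the three fields by hand, applies
the hidden 1 and the excess-127 exponent, and offers a cross-check against
Python's own struct module. The function names are invented for the example.

```python
import struct


def decode_single(bits):
    """Decode a normalized single precision number from its 32-bit pattern."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent, excess-127
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    if exponent in (0, 0xFF):
        raise ValueError("zero, denormals, infinity and NAN are not handled here")
    mantissa = 1.0 + fraction / 2 ** 23   # the hidden 1 is added back
    value = mantissa * 2.0 ** (exponent - 127)
    return -value if sign else value


def decode_single_reference(bits):
    """The same conversion done by struct, used only as a cross-check."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


print(decode_single(0x40490FDB))  # the single precision value closest to pi
```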

3.1.2.2 Double Precision

The double precision format is indicated with the extension .D

It consists of a 52-bit fraction, an 11-bit exponent, and 1 bit

indicating the sign of the fraction. As the single precision,

this format is also defined by the IEEE, and uses excess-1023

notation for the exponent. A hidden 1 is assumed as the most

significant digit of the fraction. The format is defined as

follows:

63 62        52 51                        0
S  EEEEEEEEEEE  FFFFFFF........FFFFFFFFFF

S=Sign of fraction

E=Exponent

F=Fraction

The double precision format takes 8 bytes when written to memory.
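
Going the other way, this Python sketch takes an ordinary float (which Python
stores as an IEEE double) and splits it into the sign, exponent and fraction
fields described above. The function name is an assumption of the example.

```python
import struct


def double_fields(value):
    """Return (sign, biased exponent, fraction) of a double precision number."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit exponent, excess-1023
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction, hidden 1 not stored
    return sign, exponent, fraction


print(double_fields(1.0))
print(double_fields(-2.0))
```

For 1.0 the stored exponent equals the bias itself, since the true exponent
is zero and the fraction after the hidden 1 is empty.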

3.1.2.3 Extended precision

The extended precision is indicated with the extension .X
This is the format that is used in all computations, and
consists of a 64-bit mantissa, a 15-bit exponent and 1 bit
indicating the sign of the mantissa. A hidden 1 is not assumed,
so all digits of the mantissa are present. Excess-16383
is used for the exponent. When data of this format is written
to memory, it is "exploded" by 16 zero-bits between the mantissa
and the exponent in order to make it longword aligned.

The extended-precision format is defined as follows:

79 78           64 63                                0
S  EEEEEEEEEEEEEEE  MMMMMMMMMMMMMMM............MMMMM

When written to memory, it looks like this:

95 94           80 79     64 63                      0
S  EEEEEEEEEEEEEEE  000...000 MMMMMMMM...........MMM

The extended precision format takes 12 bytes when written
to memory.
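
As an illustration of the 12-byte memory image, the Python sketch below decodes
one into an ordinary float. It assumes big-endian byte order (as on the 68000
family) and ignores infinities and NANs; the result is rounded to double
precision, so the last bits of the 64-bit mantissa are lost. The function name
is made up for the example.

```python
import math


def decode_extended(image):
    """Decode a 12-byte extended precision memory image into a Python float."""
    if len(image) != 12:
        raise ValueError("expected a 12-byte memory image")
    sign_exp = int.from_bytes(image[0:2], "big")   # sign bit + 15-bit exponent
    # image[2:4] holds the 16 zero bits of padding and is skipped
    mantissa = int.from_bytes(image[4:12], "big")  # 64 bits, integer bit included
    sign = sign_exp >> 15
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity and NAN are not handled in this sketch")
    # The explicit integer bit is bit 63, so the mantissa is scaled by 2^-63.
    value = math.ldexp(mantissa, exponent - 16383 - 63)
    return -value if sign else value


print(decode_extended(bytes.fromhex("3FFF00008000000000000000")))
```

Note how 1.0 needs the top mantissa bit set ($8000...), since no hidden 1 is
assumed in this format.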

3.1.2.4 Packed-decimal string

To simplify input/output of floating point numbers, a special
96-bit packed-decimal format is provided.

This format consists of a 17 BCD digit mantissa, some padding
bits, a 4 BCD digit exponent, 2 control bits, 1 bit indicating
the sign of the exponent, and 1 bit indicating the sign of the
mantissa.

Bits 68-79 are stored as zero bits unless an overflow occurs
during the conversion.

Positive and negative infinity are represented by numbers that are
outside the range of the floating point representation used.
If the result of an operation has no mathematical meaning, a NAN
is produced. In the case of a NAN or infinity, bits 92 and 93
are both 1.
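
The following Python sketch decodes a packed-decimal image under some stated
assumptions that go beyond the text: the mantissa sign is taken to be bit 95,
the exponent sign bit 94, the three ordinary exponent digits bits 91-80, and
the 17 mantissa digits bits 67-0 with the first digit before the decimal point.
The extra exponent digit that only appears on overflow is ignored. Check the
layout against a 68881 reference before relying on it.

```python
from decimal import Decimal


def decode_packed(image):
    """Decode a 12-byte packed-decimal image; returns None for NAN/infinity."""
    if len(image) != 12:
        raise ValueError("expected a 12-byte memory image")
    bits = int.from_bytes(image, "big")
    sign_mantissa = (bits >> 95) & 1   # assumed position, see lead-in
    sign_exponent = (bits >> 94) & 1   # assumed position, see lead-in
    if (bits >> 92) & 0b11 == 0b11:    # bits 93 and 92 both set
        return None

    def bcd_digits(low_bit, count):
        """Read `count` BCD digits, most significant first, ending at low_bit."""
        return [(bits >> (low_bit + 4 * (count - 1 - i))) & 0xF for i in range(count)]

    exponent = int("".join(str(d) for d in bcd_digits(80, 3)))
    if sign_exponent:
        exponent = -exponent
    digits = bcd_digits(0, 17)         # d.dddddddddddddddd
    return Decimal((sign_mantissa, tuple(digits), exponent - 16))


print(decode_packed(bytes.fromhex("800100012500000000000000")))
```

The example image holds the digits 1.25 with exponent +1 and a negative
mantissa sign, which is -12.5.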

3.2 Floating point Constant ROM

The FPU has an on-chip ROM where frequently used mathematical
constants are stored. How to retrieve these constants will be
discussed below. Each constant has its own address in the ROM:

Offset Constant

=============================

$00 Pi

$0b Log10(2)

$0c e

$0d Log2(e)

$0e Log10(e)

$0f 0.0

$30 ln(2)

$31 ln(10)

$32 10^0

$33 10^1

$34 10^2

$35 10^4

$3e 10^2048

$3f 10^4096
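
For reference, the ROM table can be mirrored in Python as a lookup from offset
to value. This is only a model of the table, computed with Python's math
module at double precision; the real ROM holds extended precision values. The
two largest powers of ten are left out because they do not fit in a double,
and the offsets the table above skips are not filled in. The function name is
borrowed from the FMOVECR instruction for readability.

```python
import math

# Offset -> value, following the table above (double precision approximations).
CONSTANT_ROM = {
    0x00: math.pi,
    0x0B: math.log10(2),
    0x0C: math.e,
    0x0D: math.log2(math.e),
    0x0E: math.log10(math.e),
    0x0F: 0.0,
    0x30: math.log(2),
    0x31: math.log(10),
    0x32: 1e0,
    0x33: 1e1,
    0x34: 1e2,
    0x35: 1e4,
}


def fmovecr(offset):
    """Look up a constant by ROM offset; raises KeyError if not modelled."""
    return CONSTANT_ROM[offset]


print(fmovecr(0x00), fmovecr(0x0B))
```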

3.3 Floating Point Instructions

The FPU provides an extension to the normal 68000 instruction set.

The floating point instructions can be divided into 5 groups:

1...Data transfer instructions

2...Dyadic instructions (two operands)

3...Monadic instructions (one operand)

4...Program control instructions

5...System control instructions

The syntax and addressing modes for these instructions are the same

as those of the ordinary 68000 instructions.

3.3.1 Data Transfer Instructions

Instruction Syntax     | Op. Sizes     | Operation
==================================================================
FMOVE  FPm,FPn         | X             | The FMOVE instruction
FMOVE  <ea>,FPn        | B/W/L/S/D/X/P | copies data from the
FMOVE  FPm,<ea>        | B/W/L/S/D/X   | source operand to the
                       |               | destination operand
==================================================================
FMOVE  FPm,<ea>{#k}    | P             | When writing a .P real, an
FMOVE  FPm,<ea>{Dn}    | P             | optional rounding precision
                       |               | may be specified as a constant
                       |               | or in a data register
                       |               | See below for details
==================================================================
FMOVE  <ea>,FPcr       | L             | These two FMOVE forms copy
FMOVE  FPcr,<ea>       | L             | data to/from control registers
==================================================================
FMOVECR #ccc,FPn       | X             | This instruction retrieves a
                       |               | constant from the ROM, where
                       |               | ccc is the offset to be read
==================================================================
FMOVEM <ea>,<list>     | L/X           | Moves multiple FP registers
FMOVEM <ea>,Dn         | X             | The register list may be
FMOVEM <list>,<ea>     | L/X           | specified as in 68000 assembler,
FMOVEM Dn,<ea>         | X             | or be contained as a bitmask
                       |               | in a data register
==================================================================

When writing floating point numbers to the memory using the .P

format, an optional precision may be specified as a constant or

in a data register.

Meaning of the precision:

-64<=k<=0 Rounded to |k| decimal places

0<k<=17 Mantissa is rounded to k places
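
The effect of the precision value k can be imitated with Python's decimal
module, as in this sketch. It follows the two rules above literally and
assumes round-half-even rounding, which may differ from what the FPU does
under other rounding modes. The function name is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def apply_k_factor(value, k):
    """Round a decimal string the way the two k ranges above describe."""
    if not -64 <= k <= 17:
        raise ValueError("k must be in the range -64..17")
    number = Decimal(value)
    if k <= 0:
        # |k| decimal places: quantize to 10^k (for example 0.01 for k = -2)
        return number.quantize(Decimal(1).scaleb(k), rounding=ROUND_HALF_EVEN)
    # k significant digits in the mantissa
    return Context(prec=k, rounding=ROUND_HALF_EVEN).create_decimal(number)


print(apply_k_factor("3.14159", -2))  # two decimal places
print(apply_k_factor("3.14159", 3))   # three significant digits
```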

When the FMOVEM register list is given as a bitmask in a data register,
the bits select the registers as follows:

Addressing mode   | Bit 7 | Bit 6 |------| Bit 1 | Bit 0 |
=========================================================
-(An)             | FP7   | FP6   |------| FP1   | FP0   |
All other modes   | FP0   | FP1   |------| FP6   | FP7   |
=========================================================
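
Since the bit order flips with the addressing mode, it is easy to get the
mask wrong. This small Python sketch builds the mask for a list of register
numbers according to the table above; the function and parameter names are
invented for the example.

```python
def fmovem_mask(registers, predecrement=False):
    """Build the FMOVEM bitmask for the given FP register numbers (0-7)."""
    mask = 0
    for n in registers:
        if not 0 <= n <= 7:
            raise ValueError("FP register numbers run from 0 to 7")
        # -(An): bit n selects FPn. All other modes: bit 7-n selects FPn.
        bit = n if predecrement else 7 - n
        mask |= 1 << bit
    return mask


print(hex(fmovem_mask([0, 1])))                     # FP0/FP1, e.g. for (An)+
print(hex(fmovem_mask([0, 1], predecrement=True)))  # FP0/FP1 for -(An)
```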

3.3.2 Dyadic Operations

Dyadic operations are operations computing with two operands.

The first operand may be addressed with any addressing mode,

while the second operand (destination) must be an FPU register.

Instruction Function

===================================================

FADD | Add two FP numbers

FCMP | Compare two FP numbers

FDIV | FP Division

FMOD | FP Modulo

FMUL | Multiply two FP numbers

FREM | Get IEEE remainder

FSCALE | Scale exponent

FSGLDIV | Single-precision Division

FSGLMUL | Single-precision Multiplication

FSUB | FP Subtraction

===================================================

FSCALE adds the first operand to the exponent of the second

operand.

FREM returns the remainder of a division as specified by the IEEE

definition.
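
The difference between FMOD and FREM, and what FSCALE does, can be tried out
in Python. The sketch below uses math.fmod (quotient truncated toward zero) as
a stand-in for FMOD, math.remainder (quotient rounded to nearest, the IEEE
definition) for FREM, and math.ldexp for FSCALE. These are analogies on
doubles, not emulations of the FPU, and the function names are made up.

```python
import math


def fmod_like(dest, src):
    """Remainder of dest / src with the quotient truncated toward zero."""
    return math.fmod(dest, src)


def frem_like(dest, src):
    """IEEE remainder of dest / src: quotient rounded to the nearest integer."""
    return math.remainder(dest, src)


def fscale_like(dest, src):
    """Add the integer part of src to the exponent of dest."""
    return math.ldexp(dest, math.trunc(src))


print(fmod_like(5.0, 3.0), frem_like(5.0, 3.0), fscale_like(1.5, 3.0))
```

For 5 and 3 the truncated quotient is 1, leaving 2, while the nearest integer
quotient is 2, leaving -1. Scaling 1.5 by 3 multiplies it by 2^3.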

3.3.3 Monadic Operations

Monadic operations are operations computing with only one operand.

The operand may be addressed with any addressing mode.

The syntax of monadic operations is:

Instruction.Size <ea>,FPn

Instruction Function

====================================================

FABS | Absolute value

FACOS | FP Arc Cosine

FASIN | FP Arc Sine

FATAN | FP Arc Tangent

FATANH | FP Hyperbolic Arc Tangent

FCOS | FP Cosine

FCOSH | FP Hyperbolic Cosine

FETOX | FP e^x

FETOXM1 | FP (e^x)-1

FGETEXP | Get exponent

FGETMAN | Get mantissa

FINT | FP Integer (rounded using the current rounding mode)

FINTRZ | FP Integer, rounded toward zero

FLOGN | FP Ln(n)

FLOGNP1 | FP Ln(n+1)

FLOG10 | FP Log10(n)

FLOG2 | FP Log2(n)

FNEG | Negate a floating point number

FSIN | FP Sine

FSINH | FP Hyperbolic Sine

FSQRT | FP Square Root

FTAN | FP Tangent

FTANH | FP Hyperbolic Tangent

FTENTOX | FP 10^x

FTWOTOX | FP 2^x

====================================================

There is one more monadic operation that uses a double

destination operand, the FSINCOS instruction:

FSINCOS <ea>,FPc:FPs Calculates sine and cosine

FSINCOS FPm,FPc:FPs of the same argument

All trigonometric operations operate on values in

radians.
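
A few of these operations have close relatives in Python's math module, which
makes their purpose easier to see. The sketch below is an analogy on doubles,
with invented function names: FINT is modelled with the default
round-to-nearest mode only, FINTRZ with truncation, and FETOXM1 (which
computes e^x - 1) and FLOGNP1 with expm1 and log1p.

```python
import math


def fint_like(x):
    """Integer part under round-to-nearest (ties go to the even integer)."""
    return float(round(x))


def fintrz_like(x):
    """Integer part rounded toward zero, whatever the rounding mode."""
    return float(math.trunc(x))


def fetoxm1_like(x):
    """e^x - 1, accurate even when x is very close to zero."""
    return math.expm1(x)


def flognp1_like(x):
    """ln(x + 1), accurate even when x is very close to zero."""
    return math.log1p(x)


print(fint_like(-2.7), fintrz_like(-2.7))
print(fetoxm1_like(1e-10), math.exp(1e-10) - 1.0)  # compare the two results
```

The last line shows why a separate e^x - 1 operation is worth having: for a
tiny argument, computing e^x first and subtracting 1 afterwards throws away
most of the significant digits.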

3.3.4 Program Control Instructions

3.3.4.1 Instructions

This group of instructions allows control of program flow based
on condition codes generated by the FPU. These instructions are
parallels to the 68000 instructions with the same names.

Instruction Formats Operation

=====================================================================

FBcc <Label> | W/L | Branch on FPU conditions

FDBcc Dn,<Label>| W | Test FPU conditions, decrement and branch

FNOP | | No Operation (Like NOP)

FScc <ea> | B | Set on FPU conditions

FTST <ea> | All | Test FP number at <ea>

FTST FPn | X | Test FP number in FPn

=====================================================================

3.3.4.2 Condition codes

The FPU condition codes that may be acted upon are divided into two
groups: those that raise an exception (BSUN) when a NAN makes the
operands unordered, and those that do not.

3.3.4.2.1 Condition codes with NAN

Symbol Meaning

===========================================

GE | Greater or equal

GL | Greater or less

GLE | Greater, less or equal

GT | Greater

LE | Less or equal

LT | Less

NGE | Not (greater or equal)

NGL | Not (greater or less)

NGLE | Not (greater, less or equal)

NGT | Not greater

NLE | Not (less or equal)

NLT | Not less

SEQ | Signalling equal

SNE | Signalling unequal

SF | Signalling Always FALSE

ST | Signalling Always TRUE

===========================================

3.3.4.2.2 Condition codes without NAN

Symbol Meaning

===========================================

OGE | Ordered and greater or equal

OGL | Ordered and greater or less

OR | Ordered

OGT | Ordered and greater

OLE | Ordered and less or equal

OLT | Ordered and less

UGE | Unordered or greater or equal

UEQ | Unordered or equal

UN | Unordered

UGT | Unordered or greater

ULE | Unordered or less or equal

ULT | Unordered or less

EQ | Equal

NE | Unequal

F | Always FALSE

T | Always TRUE

3.3.4.2.3 Notes

You might wonder why there are different symbols for (unequal) and
(greater or less).

Floating point numbers may represent an infinity, something that is
impossible in the 68000 CPU. In addition, they may represent illegal
values (NAN). A NAN is unordered: it is neither greater than, less than,
nor equal to any other value. A comparison involving a NAN is therefore
(unequal) without being (greater or less).

For detailed information on how these condition code symbols relate to the
condition code flags, consult programming references for the 68881 FPU.
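
Python floats follow the same IEEE rules for NAN, so the distinction can be
demonstrated directly. The sketch below models a handful of the predicates
from the tables as plain functions of two operands; it says nothing about how
the FPU derives them from its flags, and the table of lambdas is purely an
illustration.

```python
import math


def unordered(a, b):
    """True when at least one operand is a NAN."""
    return math.isnan(a) or math.isnan(b)


# A few of the condition symbols, expressed on two compared operands.
PREDICATES = {
    "EQ": lambda a, b: a == b,
    "NE": lambda a, b: a != b,
    "OGL": lambda a, b: a < b or a > b,
    "OGT": lambda a, b: a > b,
    "UGT": lambda a, b: unordered(a, b) or a > b,
    "UEQ": lambda a, b: unordered(a, b) or a == b,
    "UN": unordered,
    "OR": lambda a, b: not unordered(a, b),
}

nan = float("nan")
for name in ("NE", "OGL", "UN"):
    print(name, PREDICATES[name](nan, 1.0))
```

With a NAN operand, NE comes out true while OGL comes out false, which is
exactly why the two symbols are not interchangeable.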

3.3.5 System Control Instructions

This group consists of 3 instructions, FSAVE, FRESTORE and FTRAPcc.
FSAVE and FRESTORE are privileged instructions and are used to save
the FPU state frame to memory and to restore it again.

FSAVE <ea>      Copies the internal registers to the specified state frame.
FRESTORE <ea>   Loads the internal registers with the specified state frame.

The FTRAPcc instruction can generate an exception depending on the condition
codes.

FTRAPcc                  If the specified condition is true, TRAP.
FTRAPcc.(W/L) #<data>

--- IV --------------------------- THE 68040 FPU ----------------- IV ---

4.1 Differences

The FPU that is built into the 040, and indeed into the yet to come 060,
is highly optimized. It omits some instructions that are found on
the 68881/2, but the unimplemented ones are usually emulated
in software. By avoiding the emulated instructions, a program can run
several times faster on the 040.

The 040 also omits the Packed Decimal format (.P). When an attempt is
made to use it, the 040 will respond with an illegal format exception.

4.2 Instruction set of the FPU-40 (Or whatever it is called)

4.2.1 68881/2 instructions that are unimplemented

Type Instructions

=================================================================

Monadic | FACOS,FASIN,FATAN,FATANH,FCOS,FCOSH,FETOX,FETOXM1,

| FGETEXP,FGETMAN,FINT,FINTRZ,FLOG10,FLOG2,FLOGN,

| FLOGNP1,FSIN,FSINCOS,FSINH,FTAN,FTANH,FTENTOX,

| FTWOTOX

Dyadic | FMOD,FREM,FSCALE,FSGLDIV,FSGLMUL

Transfer | FMOVECR

=================================================================

4.2.2 68881/2 instructions that WORK on an 040

FABS FADD FBcc FCMP FDBcc FDIV FMOVE

FMOVEM FMUL FNEG FNOP FRESTORE FSAVE FScc

FSQRT FSUB FTRAPcc FTST

--- V ----------------------------- SOURCES ----------------------- V ---

5.1 Sourcecodes

There are no sourcecodes in this text, but in the same archive as you

got this file there should be an assembler source for a julia fractal

using the FPU.

The program is a very simple one, and doesn't use the most advanced

operations, but illustrates clearly how to program for the FPU.

The program is written for Amiga computers with AGA chipset and at

least OS 2.0, but it should be easy to degrade to earlier Amiga

versions and to other platforms.

************************************************************************************************

* *

* This text is freely redistributable as long as all files are kept together and in unmodified *

* form. *

* Permission is granted to include this text in the HowToCode archive *

************************************************************************************************

* Error corrections, comments and questions should be directed to the author: *

* *

* e-mail: db36@hp825.bih.no *

* phone: +47-5630-5537 (13:00-21:00 GMT) *

* Post: Erik H. Bakke *

* Bjørnen *

* N-5227 SØRE NESET *

* NORWAY *

************************************************************************************************
