;   DESCRIPTION
;
;   This source file provides a simple, slightly slower 32-bit floating point
;   divide routine for the MSP430 processor family.  (It is slower because it
;   does not unroll the division loop.)
;
;   I have arranged things here to be similar to something a C compiler might
;   call and not necessarily how I may use it for an assembly-code-only TSB
;   system. But it is relatively easy to adapt and I had to pick some kind of
;   calling convention for illustration purposes.
;
;   I am also keeping closer to the IEEE 754 single precision format than the
;   HP 2116 floating point format here. This is just a starting concept point
;   for the purposes of asking a question here. It's not necessarily where I
;   will end up.
;
;
;   ROUNDING
;
;   A special note on rounding is in order.  There are several rounding modes
;   possible; such as round-towards-zero, round-down, round-up, and another
;   which is more commonly used ... round-to-nearest.  In the case of round-to-
;   nearest, this can be similar to what is often taught in school (where .5 is
;   always rounded up) or it can be a more statistically useful method called
;   round-to-nearest-even, which rounds towards the even value (up or down,
;   depending.)  The default assumption made by floating point libraries is to
;   use round-to-nearest-even.
;
;   This may sound a lot more complicated to implement than it really is, in a
;   subroutine like this one.  It turns out that it is impossible to have an
;   exact .5 to worry about.  Since it never can occur, a very simple rounding
;   choice is actually used in the following code without any need to refer to
;   the quotient.
;
;   For those interested in the mathematical argument about why, here it is.
;
;   There are two possible cases we need to cope with.  The division performed
;   is actually an integer one and conceptually shifts the dividend by either
;   24 or 25 bits to the left, depending on whether the divisor is larger or
;   smaller than the dividend.  The main idea here is that the leading bit out
;   of the division should be a '1' so that the quotient is already normalized.
;
;   All this hand-waving means we have two cases to contend with:
;
;       (1)    dividend * 2^24 / divisor = quotient + .5
;       (2)    dividend * 2^25 / divisor = quotient + .5
;
;   The .5 value shown in these is really the remainder/divisor, but where we
;   are assuming that it results in an exact .5 that we'd need to worry about,
;   if we are to round-to-nearest-even.  (We assume the .5 as an exact result
;   of computing remainder/divisor.)
;
;   This is then:
;
;       (1)    dividend * 2^24 = quotient * divisor + divisor/2
;       (2)    dividend * 2^25 = quotient * divisor + divisor/2
;
;   Now, these two can only be true if the divisor itself has its least
;   significant bit as '0', which we will assume is true for now.  Anyway, we
;   can multiply through by 2, to get:
;
;       (1)    dividend * 2^25 = 2 * quotient * divisor + divisor
;       (2)    dividend * 2^26 = 2 * quotient * divisor + divisor
;
;   Now, let's use the modulo function on both sides:
;
;       (1)    (2 * quotient * divisor + divisor) mod 2^25 = 0
;       (2)    (2 * quotient * divisor + divisor) mod 2^26 = 0
;
;   Now, just to reduce the cases to one, let's use instead mod 2^25 for the
;   second case.  It's still true that it will equal 0 and this helps to make
;   only one case to worry about.  The other one comes along for free, then:
;
;       (2 * quotient * divisor + divisor) mod 2^25 = 0
;
;   This must be true in both cases.  We can now extract the divisor, this way:
;
;       (divisor * [2 * quotient + 1]) mod 2^25 = 0
;
;   This is a product and it requires that the product produce at least 25 '0'
;   bits in the least significant bits of the product result.  This can only be
;   true if and only if there are >=25 least significant '0's present in the
;   divisor's least significant bits plus those also found in [2*quotient + 1].
;   But we can easily see that [2*quotient + 1] has no least signficant bits of
;   '0' -- shifting the quotient by one bit to the left and adding one pretty
;   much guarantees that the least significant bit is a '1', making that
;   contribution nil.  So this means that the divisor must have at least 25
;   least significant '0' bits.  But this is impossible, since the divisor in
;   a 32-bit FP value is only 24 bits in size.  One cannot provide a divisor
;   that can cause this to occur.
;
;   Thus, having to deal with an exact .5 isn't needed here.  So a simpler
;   rounding method is applied in this code -- namely, rounding anything .5
;   and greater, upwards.  Doing that works out to exactly the same thing as
;   is required in the rounding specification for round-to-nearest-even.
;
;
;   SPECIAL THANKS
;
;   I actually owe thanks to a lot of places.  This includes books on VHDL,
;   various web sites, and so on.  But I also owe a special debt to W.
;   Dijkstra for his thoughts about an earlier version I'd written.


;   --------------------------------------------------------------------------
;   fpdiv           32-bit floating-point divide
;   --------------------------------------------------------------------------
;
;   This routine divides a 32-bit floating point dividend by a 32-bit floating
;   point divisor, producing the 32-bit floating point quotient.
;
;   No support for denormals at this time.
;
;   Inputs:
;
;       R15:R14         32-bit floating-point dividend
;       4(SP):2(SP)     32-bit floating-point divisor
;
;   Outputs:
;
;       R15:R14 32-bit floating-point quotient
;
;   Scratches:
;
;       none
;
;   Stack Handling:
;
;       The passed parameters must be cleaned up by this routine.  The caller
;       does not do it.  In this sense, this is unlike a c function.
;
;   Detailed Discussion:
;
;   This routine uses non-restoring division.  If you aren't familiar with it,
;   consider this idea:
;
;   Call each step of the trial subtraction T[n].  At each step we want to test
;   the result of 2 times the prior remainder minus the divisor.  If the trial
;   subtraction succeeds, we just shift the remainder one bit to the left and
;   do the subtraction again.  So at each step our successful trial subtraction
;   it is something like this: ... T[i]=2*T[i-1]-D, T[i+1]=2*T[i]-D ... and so
;   on.  But if the trial subtraction fails, the divisor needs to added back
;   to the remainder before before shifting and doing the next subtraction.
;   Instead, this is T[i+1]=2*(T[i]+D)-D.  But this is just T[i+1]=2*T[i]+D.
;   Comparing these two results, you can see that the shift of T[i] still takes
;   place, but that in this case addition of the divisor is used on the next
;   step instead of subtraction.  Non-restoring division is named that way
;   because it replaces the addition+shift+subtraction used when subtraction
;   from the remainder fails, with shift+addition.  It alternates between
;   shift+subtract and shift+add, depending on the quotient bit generated.
;
;   The division loop is not unrolled here.
;
;   This code needs to produce 24 good mantissa bits as a quotient, where the
;   leading bit is '1', and we then need to also produce one final bit for
;   rounding purposes.  (If the leading bit -- the most significant one --
;   isn't a '1', then we'd lose significance because of the fixed count of
;   generated quotient bits.)
;
;   The actual division code loop is best examined as a coroutine.  Imagine
;   the division routine something like the following diagram.  Since program
;   text is usually viewed under fixed-point fonts, it should render as a
;   readable picture.  I apologize for any difficulties in interpreting it:
;
;             start
;               |
;               v
;           ,-------,
;           |frstbit|
;           '-------'                            ,-------,
;       ,-----> |                                |       |
;       |       v                                v       |
;       |   ,-------,                        ,-------,   |
;       |   |  sub  |                        |  add  |   |
;       |   '-------'----------,  ,----------'-------'   |
;       |  c=1  |        c=0    \/    c=1        |  c=0  |
;       |       v               /\               v       |
;       |   ,-------, <--------'  '--------> ,-------,   |
;       |   |  rlc  |                        |  rla  |   |
;       |   '-------'                        '-------'   |
;       |       |                                |       |
;       |       v                                v       |
;       |   ,-------,                        ,-------,   |
;       '<--|  dec  |                        |  dec  |-->'
;      z=0  '-------'                        '-------'  z=0
;               |  z=1                      z=1  |
;               v                                v
;           ,-------,                        ,-------,
;           | round1|                        | round0|
;           '-------'                        '-------'
;               |                                |
;               v                                v
;               '-------------> end <------------'
;
;   The 'sub' code performs a trial subtraction step and the 'add' code per-
;   forms the trial addition step.  The 'rlc' code shifts in the carry in the
;   case where the quotient bit is a '1'.  The 'rla' code shifts in a '0' bit
;   for the quotient.  The 'dec' code checks for the loop end.  'frstbit'
;   represents the generation of the first quotient bit.  This is special
;   because we know something about the relationship of dividend and divisor
;   (the dividend and divisor both have their MSB=1 and are within a factor of
;   two of each other, which as soon as the first subtraction takes place is no
;   longer known) and we must ensure the leading bit is '1' and may need to
;   adjust the exponent, plus init the loop counter.  The 'round1' and 'round0'
;   parts deal with differences in interpreting the final remainder for
;   rounding, depending on what the last generated quotient bit was.
;   --------------------------------------------------------------------------

fpdiv::
            ;   ----
            ;   Save registers we'll use throughout, below.
            ;   ----

                    push    R11             ; will hold exponent and sign
                    push    R10             ; temporary & partial quotient
                    push    R9              ; temporary & partial quotient

            ;   ----
            ;   Get the divisor into R13:R12.
            ;   ----

                    push    R13
                    push    R12

                    mov     14(SP), R13
                    mov     12(SP), R12

            ;   ----
            ;   Compute the sign of the quotient and extract both exponents.
            ;
            ;       R11     dividend excess exponent (upper byte is garbage)
            ;       R10     divisor excess exponent  (upper byte is garbage)
            ;       R9      quotient sign in least significant bit
            ;   ----

                ;   Handle dividend exponent and extract its sign.

                    mov     R15, R11        ; shift dividend exponent to upper
                    rla     R11             ;   8 bits of R11 and compute the
                    mov     SR, R9          ;   quotient's sign into R9
                    swpb    R11             ; flip exponent to lower 8 bits

                ;   Handle divisor exponent and compute final sign of quotient.

                    mov     R13, R10        ; shift divisor exponent to upper
                    rla     R10             ;   8 bits of R10 and copy its
                    xor     SR, R9          ;   sign (now in the carry) to R9
                    swpb    R10             ; flip exponent to lower 8 bits

            ;   ----
            ;   Now check for early-out conditions based on the exponents of
            ;   the dividend and divisor.  Remember that the upper byte of
            ;   each is garbage.  The order of the following tests are
            ;   important.  The following tests treat denormals as zero.
            ;
            ;   The tests are handled this way so that normal floating point
            ;   operations are as fast as possible.  Jumps are expensive, so
            ;   arithmetic operations are preferred here.
            ;   ----

                ;   Upper byte of dividend exponent is 0 for valid exponents.

                    add.b   #1, R11
                    sub     #2, R11

                ;   Upper byte of divisor exponent is 0 for valid exponents.

                    add.b   #1, R10
                    sub     #2, R10

                ;   Sign bit is set in either, only for exception cases.

                    bit     R10, R11
                    jn      fp32divs_except

                ;   At this point, the exponents are offset by -1.

            ;   ----
            ;   Compute the quotient's binary exponent, early.  Since both
            ;   exponents include the floating point excess notation along
            ;   with any above coded biases (-1, at this time), subtracting
            ;   the two from each other removes this excess from the result.
            ;
            ;   The floating point excess will need to be added back to the
            ;   quotient's exponent, later.  For now, it will be in signed
            ;   binary form.
            ;
            ;   The code here also pack the sign into the signed exponent to
            ;   free up R9 for other use, later on.
            ;
            ;       R11     preliminary, binary exponent of quotient
            ;       R10     freed for use
            ;       R9      freed for use
            ;   ----

                    sub     R10, R11        ; compute binary exponent value
                    rrc     R9              ; quotient sign into carry and then
                    rlc     R11             ;   to low order bit of exponent

            ;   ----
            ;   Extract the normalized mantissas of the dividend and divisor.
            ;   This includes inserting the hidden bit implied by the notation.
            ;   ----

                ;   R15:R14 dividend mantissa in MMMM:MM00 format.
                    
                    bis.b   #0x80, R15      ; add the hidden bit
                    xor     R14, R15        ;   then shift the
                    mov.b   R14, R14        ;   dividend up by 8
                    xor     R14, R15        ;   bits, placing zeros
                    swpb    R14             ;   into the lower-
                    swpb    R15             ;   most 8 bits and

                ;   R13:R12 divisor mantissa in MMMM:MM00 format.

                    bis.b   #0x80, R13      ; add the hidden bit and
                    xor     R12, R13        ;   then shift the
                    mov.b   R12, R12        ;   divisor up by 8
                    xor     R12, R13        ;   bits, placing zeros
                    swpb    R12             ;   into the lower-
                    swpb    R13             ;   most 8 bits and

            ;   ----
            ;   Before the division loop proper starts, we need to handle the
            ;   first quotient bit step a little differently.
            ;
            ;   The division loop improves execution time by not having to
            ;   check for significant bits shifted into the carry before doing
            ;   a trial subtraction (or addition.)  This means that it must
            ;   work entirely within the registers, and that means that the
            ;   divisor has a leading zero bit (the dividend does not require
            ;   this.)  So we already know we must shift the divisor down by 1
            ;   bit before we even start.  If dividend >= divisor, then we also
            ;   need to shift down the dividend, too.  Otherwise, no.  But all
            ;   this can be combined a bit, as the following code shows.
            ;
            ;   If it turns out that the dividend < divisor, then we must also
            ;   shift the dividend left by one bit, so dividend >= divisor.  In
            ;   this case, there is a -1 adjustment to the exponent to account
            ;   for this.
            ;
            ;   Inputs:
            ;       R15:R14         normalized dividend
            ;       R13:R12         normalized divisor
            ;
            ;   Outputs:
            ;       R15:R14         adjusted remainder
            ;       R13:R12         divisor, denormalized right by 1 bit
            ;       R11             exponent is updated, 
            ;   ----

                ;   Do the first subtraction.  If dividend >= divisor, then
                ;   the remainder is good and all that is left to do is to set
                ;   up the loop and plug in the first quotient bit and go.

                    sub     R12, R14
                    subc    R13, R15
                    jc      fp32divs_0

                ;   Otherwise, dividend < divisor.  So fix up the remainder as
                ;   though we'd shifted the dividend up 1 bit before the trial.

                    rla     R14
                    rlc     R15
                    add     R12, R14
                    addc    R13, R15        ; (carry = 1 here)

                ;   Adjust the exponent, since we effectively left-shifted the
                ;   dividend.  Note that R11 also holds quotient sign, so the
                ;   value subtracted is left shifted 1 bit, first.

                    sub     #(1 << 1), R11

                ;   Now, shift the divisor down by one bit.

fp32divs_0:         clrc
                    rrc     R13
                    rrc     R12

                ;   The first quotient bit is always '1'.

                    mov     #1, R10

                ;   Prepare the loop counter and start the division loop.

                    mov     #23, R9          ; set up the counter -- 23 to go

                ;   Quotient bit = '1' portion of coroutine.

fp32divs_1:         sub     R12, R14        ; perform trial subtraction for
                    subc    R13, R15        ;   next quotient bit
                    jnc     fp32divs_2a     ; bit is "0", go make it so
fp32divs_1a:        rlc     R10             ; bit is "1", shift it in
                    rlc     R14
                    rlc     R15
                    dec     R9              ; another 8 bits to go?
                    jnz     fp32divs_1      ; yes -- unpack these 8 and repeat

                ;   Loop ends on '1' bit generation.  Perform rounding as if
                ;   a final division step.

                    mov.b   R14, R9
                    bic     R9, R14
                    sub     R12, R14        ; perform trial subtraction
                    subc    R13, R15        ;   for rounding purposes
                    jmp     fp32divs_3      ; waste two cycles for code space

                ;   Quotient bit = '0' portion of coroutine.

fp32divs_2:         add     R12, R14        ; perform trial addition for
                    addc    R13, R15        ;   next quotient bit
                    jc      fp32divs_1a     ; bit is "1", go make it so
fp32divs_2a:        rla     R10             ; bit is "0", shift it in
                    rlc     R14
                    rlc     R15
                    dec     R9              ; another 8 bits to go?
                    jnz     fp32divs_2      ; yes -- unpack these 8 and repeat

                ;   Loop ends on '0' bit generation.  Perform rounding as if
                ;   a final division step.

                    mov.b   R14, R9
                    bic     R9, R14
                    add     R12, R14        ; perform trial addition for
                    addc    R13, R15        ;   rounding purposes

                ;   Regardless of whether the loop ended on '0' or '1' bit
                ;   generation, the rounding bit is in the carry.  Add it
                ;   into the quotient and move the quotient into R15:R14.

fp32divs_3:         addc    #0, R10         ; add rounding bit from carry
                    addc.b  #0, R9          ;   into the quotient in R9:R10
                    mov     R9, R15
                    mov     R10, R14

            ;   ----
            ;   Add excess notation to exponent and validate it.
            ;
            ;   The exponent, on arrival here, is in binary form.  It should
            ;   range from the minimum allowed of -126 to the maximum of 127.
            ;   Because of the excess notation, these values will range from
            ;
            ;   The exponent bias of 127 needs to be added.  This is handled
            ;   as part of the test for the minimum value of -126, but leaving
            ;   it just short by 1.  This is because the mantissa has a leading
            ;   bit that is 1 and when the exponent is added to pack up the
            ;   floating point value, this extra 1 is added back.
            ;
            ;   (Note that R11 still holds a combined signed exponent and the
            ;   sign of the operation.)
            ;
            ;   Inputs:
            ;       R11     signed exponent + sign, form: EEEE EEEE EEEE EEES
            ;
            ;   Outputs:
            ;       R11     updated exponent + sign, form: 0000 000E EEEE EEES
            ;   ----

                ;   Validate that the exponent is >= -126.  If not, the value
                ;   underlows to zero.  (Could denormal, if supported.)

                    add     #126 << 1, R11
                    jn      fp32divs_z

                ;   127 is the largest allowable exponent.  We've already added
                ;   126 to it, so the maximum allowable value is 127+126.

                    cmp     #(127+126+1) << 1, R11
                    jhs     fp32divs_i

            ;   ----
            ;   Pack the sign, exponent, and mantissa into the final floating-
            ;   point result, such that:
            ;
            ;       R15:R14 final floating point quotient
            ;   ----

                    rrc     R11             ; place exponent into position
                    swpb    R11             ;   with the sign bit
                    rrc     R11
                    add     R11, R15        ; merge into final quotient

            ;   ----
            ;   Restore registers.
            ;   ----

fp32divs_r:         pop     R12
                    pop     R13
                    pop     R9
                    pop     R10
                    pop     R11
                    mov     @SP, 4(SP)      ; move return address
                    add     #4, SP          ; and get rid of parameters
                    ret

            ;   ----
            ;   Handle exception cases.  At this point, we've determined that
            ;   either the dividend or the divisor or both is one of zero or
            ;   infinity.  Denormals are treated as zero by earlier code that
            ;   branches here, so any denormal handling (currently none) would
            ;   start here.
            ;
            ;   The value of an exponent here is 0xFFFF for 0.0 values and is
            ;   0xFFFE for INF values.  If the exponent isn't for one of these
            ;   two special values, then it is the usual excess notation for
            ;   floating point exponent, less 1.  (The test that arrives here
            ;   subtracts one from the usual excess notated exponent as part of
            ;   the test.)
            ;
            ;   Inputs:
            ;       R11             biased, excess notated dividend exponent
            ;       R10             biased, excess notated divisor exponent
            ;   ----

fp32divs_except:    add     #1, R10         ; check for divisor = 0.0 and
                    jeq     fp32divs_i      ;   return signed Inf, if so
                    add     #2, R11         ; check for dividend = Inf and
                    jeq     fp32divs_i      ;   return signed Inf, if so

            ;   ----
            ;   Return 0 result as quotient.
            ;   ----

fp32divs_z:         clr     R15             ; set 0.0 exponent and mantissa
fp32divs_s:         clr     R14             ; clear the low order mantiaa
                    jmp     fp32divs_r

            ;   ----
            ;   Return INF result as quotient.
            ;   ----

fp32divs_i:         mov     #0x7F80, R15    ; INF return
                    jmp     fp32divs_s


	.dbend