; DESCRIPTION ; ; This source file provides a simple, slightly slower 32-bit floating point ; divide routine for the MSP430 processor family. (It is slower because it ; does not unroll the division loop.) ; ; I have arranged things here to be similar to something a C compiler might ; call and not necessarily how I may use it for an assembly-code-only TSB ; system. But it is relatively easy to adapt and I had to pick some kind of ; calling convention for illustration purposes. ; ; I am also keeping closer to the IEEE 754 single precision format than the ; HP 2116 floating point format here. This is just a starting concept point ; for the purposes of asking a question here. It's not necessarily where I ; will end up. ; ; ; ROUNDING ; ; A special note on rounding is in order. There are several rounding modes ; possible; such as round-towards-zero, round-down, round-up, and another ; which is more commonly used ... round-to-nearest. In the case of round-to- ; nearest, this can be similar to what is often taught in school (where .5 is ; always rounded up) or it can be a more statistically useful method called ; round-to-nearest-even, which rounds towards the even value (up or down, ; depending.) The default assumption made by floating point libraries is to ; use round-to-nearest-even. ; ; This may sound a lot more complicated to implement than it really is, in a ; subroutine like this one. It turns out that it is impossible to have an ; exact .5 to worry about. Since it never can occur, a very simple rounding ; choice is actually used in the following code without any need to refer to ; the quotient. ; ; For those interested in the mathematical argument about why, here it is. ; ; There are two possible cases we need to cope with. The division performed ; is actually an integer one and conceptually shifts the dividend by either ; 24 or 25 bits to the left, depending on whether the divisor is larger or ; smaller than the dividend. The main idea here is that the leading bit out ; of the division should be a '1' so that the quotient is already normalized. ; ; All this hand-waving means we have two cases to contend with: ; ; (1) dividend * 2^24 / divisor = quotient + .5 ; (2) dividend * 2^25 / divisor = quotient + .5 ; ; The .5 value shown in these is really the remainder/divisor, but where we ; are assuming that it results in an exact .5 that we'd need to worry about, ; if we are to round-to-nearest-even. (We assume the .5 as an exact result ; of computing remainder/divisor.) ; ; This is then: ; ; (1) dividend * 2^24 = quotient * divisor + divisor/2 ; (2) dividend * 2^25 = quotient * divisor + divisor/2 ; ; Now, these two can only be true if the divisor itself has its least ; significant bit as '0', which we will assume is true for now. Anyway, we ; can multiply through by 2, to get: ; ; (1) dividend * 2^25 = 2 * quotient * divisor + divisor ; (2) dividend * 2^26 = 2 * quotient * divisor + divisor ; ; Now, let's use the modulo function on both sides: ; ; (1) (2 * quotient * divisor + divisor) mod 2^25 = 0 ; (2) (2 * quotient * divisor + divisor) mod 2^26 = 0 ; ; Now, just to reduce the cases to one, let's use instead mod 2^25 for the ; second case. It's still true that it will equal 0 and this helps to make ; only one case to worry about. The other one comes along for free, then: ; ; (2 * quotient * divisor + divisor) mod 2^25 = 0 ; ; This must be true in both cases. We can now extract the divisor, this way: ; ; (divisor * [2 * quotient + 1]) mod 2^25 = 0 ; ; This is a product and it requires that the product produce at least 25 '0' ; bits in the least significant bits of the product result. This can only be ; true if and only if there are >=25 least significant '0's present in the ; divisor's least significant bits plus those also found in [2*quotient + 1]. ; But we can easily see that [2*quotient + 1] has no least signficant bits of ; '0' -- shifting the quotient by one bit to the left and adding one pretty ; much guarantees that the least significant bit is a '1', making that ; contribution nil. So this means that the divisor must have at least 25 ; least significant '0' bits. But this is impossible, since the divisor in ; a 32-bit FP value is only 24 bits in size. One cannot provide a divisor ; that can cause this to occur. ; ; Thus, having to deal with an exact .5 isn't needed here. So a simpler ; rounding method is applied in this code -- namely, rounding anything .5 ; and greater, upwards. Doing that works out to exactly the same thing as ; is required in the rounding specification for round-to-nearest-even. ; ; ; SPECIAL THANKS ; ; I actually owe thanks to a lot of places. This includes books on VHDL, ; various web sites, and so on. But I also owe a special debt to W. ; Dijkstra for his thoughts about an earlier version I'd written. ; -------------------------------------------------------------------------- ; fpdiv 32-bit floating-point divide ; -------------------------------------------------------------------------- ; ; This routine divides a 32-bit floating point dividend by a 32-bit floating ; point divisor, producing the 32-bit floating point quotient. ; ; No support for denormals at this time. ; ; Inputs: ; ; R15:R14 32-bit floating-point dividend ; 4(SP):2(SP) 32-bit floating-point divisor ; ; Outputs: ; ; R15:R14 32-bit floating-point quotient ; ; Scratches: ; ; none ; ; Stack Handling: ; ; The passed parameters must be cleaned up by this routine. The caller ; does not do it. In this sense, this is unlike a c function. ; ; Detailed Discussion: ; ; This routine uses non-restoring division. If you aren't familiar with it, ; consider this idea: ; ; Call each step of the trial subtraction T[n]. At each step we want to test ; the result of 2 times the prior remainder minus the divisor. If the trial ; subtraction succeeds, we just shift the remainder one bit to the left and ; do the subtraction again. So at each step our successful trial subtraction ; it is something like this: ... T[i]=2*T[i-1]-D, T[i+1]=2*T[i]-D ... and so ; on. But if the trial subtraction fails, the divisor needs to added back ; to the remainder before before shifting and doing the next subtraction. ; Instead, this is T[i+1]=2*(T[i]+D)-D. But this is just T[i+1]=2*T[i]+D. ; Comparing these two results, you can see that the shift of T[i] still takes ; place, but that in this case addition of the divisor is used on the next ; step instead of subtraction. Non-restoring division is named that way ; because it replaces the addition+shift+subtraction used when subtraction ; from the remainder fails, with shift+addition. It alternates between ; shift+subtract and shift+add, depending on the quotient bit generated. ; ; The division loop is not unrolled here. ; ; This code needs to produce 24 good mantissa bits as a quotient, where the ; leading bit is '1', and we then need to also produce one final bit for ; rounding purposes. (If the leading bit -- the most significant one -- ; isn't a '1', then we'd lose significance because of the fixed count of ; generated quotient bits.) ; ; The actual division code loop is best examined as a coroutine. Imagine ; the division routine something like the following diagram. Since program ; text is usually viewed under fixed-point fonts, it should render as a ; readable picture. I apologize for any difficulties in interpreting it: ; ; start ; | ; v ; ,-------, ; |frstbit| ; '-------' ,-------, ; ,-----> | | | ; | v v | ; | ,-------, ,-------, | ; | | sub | | add | | ; | '-------'----------, ,----------'-------' | ; | c=1 | c=0 \/ c=1 | c=0 | ; | v /\ v | ; | ,-------, <--------' '--------> ,-------, | ; | | rlc | | rla | | ; | '-------' '-------' | ; | | | | ; | v v | ; | ,-------, ,-------, | ; '<--| dec | | dec |-->' ; z=0 '-------' '-------' z=0 ; | z=1 z=1 | ; v v ; ,-------, ,-------, ; | round1| | round0| ; '-------' '-------' ; | | ; v v ; '-------------> end <------------' ; ; The 'sub' code performs a trial subtraction step and the 'add' code per- ; forms the trial addition step. The 'rlc' code shifts in the carry in the ; case where the quotient bit is a '1'. The 'rla' code shifts in a '0' bit ; for the quotient. The 'dec' code checks for the loop end. 'frstbit' ; represents the generation of the first quotient bit. This is special ; because we know something about the relationship of dividend and divisor ; (the dividend and divisor both have their MSB=1 and are within a factor of ; two of each other, which as soon as the first subtraction takes place is no ; longer known) and we must ensure the leading bit is '1' and may need to ; adjust the exponent, plus init the loop counter. The 'round1' and 'round0' ; parts deal with differences in interpreting the final remainder for ; rounding, depending on what the last generated quotient bit was. ; -------------------------------------------------------------------------- fpdiv:: ; ---- ; Save registers we'll use throughout, below. ; ---- push R11 ; will hold exponent and sign push R10 ; temporary & partial quotient push R9 ; temporary & partial quotient ; ---- ; Get the divisor into R13:R12. ; ---- push R13 push R12 mov 14(SP), R13 mov 12(SP), R12 ; ---- ; Compute the sign of the quotient and extract both exponents. ; ; R11 dividend excess exponent (upper byte is garbage) ; R10 divisor excess exponent (upper byte is garbage) ; R9 quotient sign in least significant bit ; ---- ; Handle dividend exponent and extract its sign. mov R15, R11 ; shift dividend exponent to upper rla R11 ; 8 bits of R11 and compute the mov SR, R9 ; quotient's sign into R9 swpb R11 ; flip exponent to lower 8 bits ; Handle divisor exponent and compute final sign of quotient. mov R13, R10 ; shift divisor exponent to upper rla R10 ; 8 bits of R10 and copy its xor SR, R9 ; sign (now in the carry) to R9 swpb R10 ; flip exponent to lower 8 bits ; ---- ; Now check for early-out conditions based on the exponents of ; the dividend and divisor. Remember that the upper byte of ; each is garbage. The order of the following tests are ; important. The following tests treat denormals as zero. ; ; The tests are handled this way so that normal floating point ; operations are as fast as possible. Jumps are expensive, so ; arithmetic operations are preferred here. ; ---- ; Upper byte of dividend exponent is 0 for valid exponents. add.b #1, R11 sub #2, R11 ; Upper byte of divisor exponent is 0 for valid exponents. add.b #1, R10 sub #2, R10 ; Sign bit is set in either, only for exception cases. bit R10, R11 jn fp32divs_except ; At this point, the exponents are offset by -1. ; ---- ; Compute the quotient's binary exponent, early. Since both ; exponents include the floating point excess notation along ; with any above coded biases (-1, at this time), subtracting ; the two from each other removes this excess from the result. ; ; The floating point excess will need to be added back to the ; quotient's exponent, later. For now, it will be in signed ; binary form. ; ; The code here also pack the sign into the signed exponent to ; free up R9 for other use, later on. ; ; R11 preliminary, binary exponent of quotient ; R10 freed for use ; R9 freed for use ; ---- sub R10, R11 ; compute binary exponent value rrc R9 ; quotient sign into carry and then rlc R11 ; to low order bit of exponent ; ---- ; Extract the normalized mantissas of the dividend and divisor. ; This includes inserting the hidden bit implied by the notation. ; ---- ; R15:R14 dividend mantissa in MMMM:MM00 format. bis.b #0x80, R15 ; add the hidden bit xor R14, R15 ; then shift the mov.b R14, R14 ; dividend up by 8 xor R14, R15 ; bits, placing zeros swpb R14 ; into the lower- swpb R15 ; most 8 bits and ; R13:R12 divisor mantissa in MMMM:MM00 format. bis.b #0x80, R13 ; add the hidden bit and xor R12, R13 ; then shift the mov.b R12, R12 ; divisor up by 8 xor R12, R13 ; bits, placing zeros swpb R12 ; into the lower- swpb R13 ; most 8 bits and ; ---- ; Before the division loop proper starts, we need to handle the ; first quotient bit step a little differently. ; ; The division loop improves execution time by not having to ; check for significant bits shifted into the carry before doing ; a trial subtraction (or addition.) This means that it must ; work entirely within the registers, and that means that the ; divisor has a leading zero bit (the dividend does not require ; this.) So we already know we must shift the divisor down by 1 ; bit before we even start. If dividend >= divisor, then we also ; need to shift down the dividend, too. Otherwise, no. But all ; this can be combined a bit, as the following code shows. ; ; If it turns out that the dividend < divisor, then we must also ; shift the dividend left by one bit, so dividend >= divisor. In ; this case, there is a -1 adjustment to the exponent to account ; for this. ; ; Inputs: ; R15:R14 normalized dividend ; R13:R12 normalized divisor ; ; Outputs: ; R15:R14 adjusted remainder ; R13:R12 divisor, denormalized right by 1 bit ; R11 exponent is updated, ; ---- ; Do the first subtraction. If dividend >= divisor, then ; the remainder is good and all that is left to do is to set ; up the loop and plug in the first quotient bit and go. sub R12, R14 subc R13, R15 jc fp32divs_0 ; Otherwise, dividend < divisor. So fix up the remainder as ; though we'd shifted the dividend up 1 bit before the trial. rla R14 rlc R15 add R12, R14 addc R13, R15 ; (carry = 1 here) ; Adjust the exponent, since we effectively left-shifted the ; dividend. Note that R11 also holds quotient sign, so the ; value subtracted is left shifted 1 bit, first. sub #(1 << 1), R11 ; Now, shift the divisor down by one bit. fp32divs_0: clrc rrc R13 rrc R12 ; The first quotient bit is always '1'. mov #1, R10 ; Prepare the loop counter and start the division loop. mov #23, R9 ; set up the counter -- 23 to go ; Quotient bit = '1' portion of coroutine. fp32divs_1: sub R12, R14 ; perform trial subtraction for subc R13, R15 ; next quotient bit jnc fp32divs_2a ; bit is "0", go make it so fp32divs_1a: rlc R10 ; bit is "1", shift it in rlc R14 rlc R15 dec R9 ; another 8 bits to go? jnz fp32divs_1 ; yes -- unpack these 8 and repeat ; Loop ends on '1' bit generation. Perform rounding as if ; a final division step. mov.b R14, R9 bic R9, R14 sub R12, R14 ; perform trial subtraction subc R13, R15 ; for rounding purposes jmp fp32divs_3 ; waste two cycles for code space ; Quotient bit = '0' portion of coroutine. fp32divs_2: add R12, R14 ; perform trial addition for addc R13, R15 ; next quotient bit jc fp32divs_1a ; bit is "1", go make it so fp32divs_2a: rla R10 ; bit is "0", shift it in rlc R14 rlc R15 dec R9 ; another 8 bits to go? jnz fp32divs_2 ; yes -- unpack these 8 and repeat ; Loop ends on '0' bit generation. Perform rounding as if ; a final division step. mov.b R14, R9 bic R9, R14 add R12, R14 ; perform trial addition for addc R13, R15 ; rounding purposes ; Regardless of whether the loop ended on '0' or '1' bit ; generation, the rounding bit is in the carry. Add it ; into the quotient and move the quotient into R15:R14. fp32divs_3: addc #0, R10 ; add rounding bit from carry addc.b #0, R9 ; into the quotient in R9:R10 mov R9, R15 mov R10, R14 ; ---- ; Add excess notation to exponent and validate it. ; ; The exponent, on arrival here, is in binary form. It should ; range from the minimum allowed of -126 to the maximum of 127. ; Because of the excess notation, these values will range from ; ; The exponent bias of 127 needs to be added. This is handled ; as part of the test for the minimum value of -126, but leaving ; it just short by 1. This is because the mantissa has a leading ; bit that is 1 and when the exponent is added to pack up the ; floating point value, this extra 1 is added back. ; ; (Note that R11 still holds a combined signed exponent and the ; sign of the operation.) ; ; Inputs: ; R11 signed exponent + sign, form: EEEE EEEE EEEE EEES ; ; Outputs: ; R11 updated exponent + sign, form: 0000 000E EEEE EEES ; ---- ; Validate that the exponent is >= -126. If not, the value ; underlows to zero. (Could denormal, if supported.) add #126 << 1, R11 jn fp32divs_z ; 127 is the largest allowable exponent. We've already added ; 126 to it, so the maximum allowable value is 127+126. cmp #(127+126+1) << 1, R11 jhs fp32divs_i ; ---- ; Pack the sign, exponent, and mantissa into the final floating- ; point result, such that: ; ; R15:R14 final floating point quotient ; ---- rrc R11 ; place exponent into position swpb R11 ; with the sign bit rrc R11 add R11, R15 ; merge into final quotient ; ---- ; Restore registers. ; ---- fp32divs_r: pop R12 pop R13 pop R9 pop R10 pop R11 mov @SP, 4(SP) ; move return address add #4, SP ; and get rid of parameters ret ; ---- ; Handle exception cases. At this point, we've determined that ; either the dividend or the divisor or both is one of zero or ; infinity. Denormals are treated as zero by earlier code that ; branches here, so any denormal handling (currently none) would ; start here. ; ; The value of an exponent here is 0xFFFF for 0.0 values and is ; 0xFFFE for INF values. If the exponent isn't for one of these ; two special values, then it is the usual excess notation for ; floating point exponent, less 1. (The test that arrives here ; subtracts one from the usual excess notated exponent as part of ; the test.) ; ; Inputs: ; R11 biased, excess notated dividend exponent ; R10 biased, excess notated divisor exponent ; ---- fp32divs_except: add #1, R10 ; check for divisor = 0.0 and jeq fp32divs_i ; return signed Inf, if so add #2, R11 ; check for dividend = Inf and jeq fp32divs_i ; return signed Inf, if so ; ---- ; Return 0 result as quotient. ; ---- fp32divs_z: clr R15 ; set 0.0 exponent and mantissa fp32divs_s: clr R14 ; clear the low order mantiaa jmp fp32divs_r ; ---- ; Return INF result as quotient. ; ---- fp32divs_i: mov #0x7F80, R15 ; INF return jmp fp32divs_s .dbend