[video member=pervognsen stream_platform=twitch project=bitwise title="Logic Design, Part 7" vod_platform=youtube id=7MEVpbFGB8I annotator=Miblo]
[0:07][Recap and set the stage for the day][:speech]
[0:20][Review the off-stream implementation of the combined funnel shifter][:"logic design" :research]
[2:09][Review simple_shifter_unit()][:"logic design" :research]
[3:02][Tweak the initialisation of s in right_shifter_radix2()][:"logic design"]
[3:10][Continue to review simple_shifter_unit()][:"logic design" :research]
[4:28][Set up to cover multipliers that may be almost as fast as adders][:"logic design" :speech]
[8:24][The basic multiplier algorithm][:"logic design" :speech]
[14:17][Introduce partial_products()][:"logic design"]
[16:18][Introduce naive_multiplier() that simply sums partial products][:"logic design"]
[17:44][Define Example33 as a multiplier module][:"logic design"]
[19:30][Simulate our multiplier and fail the test][:emulation :"logic design" :run]
[20:26][Check out our multiplier graph][:"debug visualisation" :"logic design" :run]
[20:52][Fix partial_products() to correctly shift the result][:"logic design"]
[21:04][Simulate our multiplier successfully][:emulation :"logic design" :run]
[21:08][Simulate our multiplier successfully on signed values, noting the absence of "mulu" (for the lower bits) in RISC-V][:emulation :isa :"logic design" :run]
[22:27][Check out the delay of our multiplier][:"logic design" :performance :run]
[23:18][Change naive_multiplier() to reduce the partial products in a binary tree][:"logic design"]
[23:37][Check out the (halved) delay of our binary-reduced multiplier][:"logic design" :performance :run]
[23:51][Instantiate an 8-bit multiplier and check out its delay][:"logic design" :performance :run]
[24:31][Check out the delay of our original (not binary-reduced) multiplier][:"logic design" :performance :run]
[26:08][Carry-save adder][:"logic design" :optimisation :speech]
[30:20][Introduce csa() as a carry-save adder][:"logic design" :optimisation]
[32:09][Introduce array_multiadder() and array_multiplier()][:"logic design"]
[35:27][Define Example34 as a carry-save adder][:"logic design"]
[36:02][Simulate our carry-save adder and fail the test][:emulation :"logic design" :run]
[36:28][@rygorous][@pervognsen 4\:2 compressor if you want the more symmetric (binary tree-like) structure later: "tmp, cout = add3(x,y,z)"; "sum, carry = add3(tmp, w, cin)" where the cout / cin are linked "horizontally"]
[36:53][Fix csa() to pre-shift the second operand][:"logic design"]
[37:38][Simulate our carry-save adder successfully][:emulation :"logic design" :run]
[38:43][Check the delay of our carry-save adder][:"logic design" :performance :run]
[41:53][Analyse the critical path of our carry-save adder][:"logic design" :performance :research]
[44:09][Wallace tree multiply adder][:"logic design" :speech]
[46:56][Introduce wallace_multiadder() and add2()][:"logic design"]
[1:02:59][Define Example35 as a Wallace tree multiply adder, renaming wallace_multiadder() to wallace_tree_multiadder()][:"logic design"]
[1:06:16][Simulate our Wallace tree multiply adder and fail the test][:emulation :"logic design" :run]
[1:07:27][Check the graph of our Wallace tree multiply adder][:"debug visualisation" :"logic design" :run]
[1:14:10][Introduce weighted_partial_products() to save our Wallace tree multiply adder from having to pre-shift anything][:"logic design"]
[1:15:43][Check the graph of our Wallace tree multiply adder without pre-shifting][:"debug visualisation" :"logic design" :run]
[1:16:02][Encapsulate add2() into a module][:"logic design"]
[1:17:12][Check the graph of our Wallace tree multiply adder with the Add2 module][:"debug visualisation" :"logic design" :run]
[1:22:05][Trace wallace_tree_multiadder() in an effort to reveal our bug][:"logic design"]
[1:25:22][Simulate our Wallace tree multiply adder and inspect the trace][:emulation :"logic design" :run]
[1:28:43][@rygorous][@pervognsen I don't really get why you're wiring this up at the individual-bit level to begin with? As in, why not build it out of full-word CSAs? That seems easier to reason about][:"logic design"]
[1:29:25][Work through Wallace tree multiply addition, guided by our traced values][:"logic design" :speech]
[1:32:13][Print the pending values in wallace_tree_multiadder()][:"logic design"]
[1:34:00][Check out our pending values in the context of the full trace][:emulation :"logic design" :run]
[1:35:03][A few words on @rygorous's suggestion to use full-word CSAs][:"logic design" :speech]
[1:35:30][Continue to scrutinise wallace_tree_multiadder()][:"logic design" :research]
[1:36:11][Prevent wallace_tree_multiadder() from shadowing variables][:"logic design"]
[1:37:16][Simulate our Wallace tree multiply adder, getting to a second round but still failing the test][:emulation :"logic design" :run]
[1:42:45][Check the graph of our Wallace tree multiply adder to see the same output routed to different locations][:"debug visualisation" :"logic design" :run]
[1:44:11][Temporarily make trace() do nothing][:"logic design"]
[1:44:43][@miotatsu][The graph is a gift that keeps on giving... except when the graph has bugs]
[1:45:07][Check the graph and still see the same output routed to different locations][:"debug visualisation" :"logic design" :run]
[1:45:32][Fix a typo in the trace() calls in wallace_tree_multiadder()][:"logic design"]
[1:45:42][Simulate our Wallace tree multiply adder successfully][:emulation :"logic design" :run]
[1:46:47][Check the graph of our working Wallace tree multiply adder][:"debug visualisation" :"logic design" :run]
[1:47:21][Instantiate 16-bit and 64-bit Wallace tree multiply adders and check out their delay][:"logic design" :performance :run]
[1:49:30][A few words on 4\:2 compressors][:"logic design" :optimisation :speech]
[1:51:54][@rygorous][Another piece of Real Good Stuff is booth recoding, which he didn't get to][:"logic design" :optimisation]
[1:55:59][That's it for today][:speech]
[/video]