cinera_handmade.network/cmuratori/hero/code/code114.hmml

[video output=day114 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Preparing a Function for Optimization" vod_platform=youtube id=_vkI9BedvKA annotator=dspecht annotator=Miblo]
[1:31][Open things up and recap]
[2:48][DrawRectangleSlowly: Increase efficiency]
[3:33][Create DrawRectangleHopefullyQuickly]
[4:34][DrawRectangleHopefullyQuickly: Skip the preamble]
[5:42][Remove all unnecessary code]
[6:44][Look at what's happening]
[8:01][Make the edge testing code more explicit]
[9:49][Blackboard: See what's happening with these inner products]
[12:04][DrawRectangleHopefullyQuickly: Test U and V instead]
[13:12][Run the game]
[13:33][Make these U and V computations more efficient]
[14:40][Run the game and ensure that everything still blits fine]
[15:16][Continue pruning]
[18:02][Flatten the routine]
[19:55][Blow out v4 Blended into scalar form]
[21:18][Take a close look at the routine and precompute InvTexelA]
[23:35][Blow out v4 Dest and Texel into scalar form]
[25:30][Flatten BilinearSample and SRGBBilinearBlend]
[28:02][Assess our situation]
[28:55][Unpack and optimise the Lerps]
[33:57][Run the game and annotate the code]
[35:33][Flatten SRGB255ToLinear1]
[36:38][Flatten Unpack4x8]
[38:59][That's everything flattened]
[39:22][Note that the code is faster]
[40:58][We have a nasty problem with the unpackings]
[44:01][Blackboard: What is our "wide" strategy?]
[48:43][Set the stage for SIMD]
[50:45][Consider solidifying texture boundaries]
[51:53][Leave it for today]
[53:09][Q&A][:speech]
[53:28][@braincruser][The way the code is written now you have a very long dependency chain (between instructions). Will you break down the code to remove it?]
[56:42][@stelar7][Why did you write float instead of real32 this stream?]
[57:14][@stelar7][Why use -O2 instead of -O3 or -Ofast (possibly with -fverbose-asm)?]
[58:06][@garryjohanson][Do you ever use exclusive or operations to avoid pipeline stalls? If not, what do you use?]
[59:04][@g3rain1][Aren't those square roots pretty expensive?[ref
    site="Intel"
    page="Intrinsics Guide"
    url="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"]]
[1:03:31][@andsz_][Will you make multiple SIMD backends? (SSE?/AVX/FMA versions)]
[1:04:04][@davidthomas426][You could loft some of those variables out one more loop]
[1:04:58][@waterlimon][How expensive is the float<>int conversion compared to the rest of the workload?[ref
    site="Intel"
    page="Intrinsics Guide"
    url="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"]]
[1:05:40][@davidthomas426][Since xAxis and yAxis are usually perpendicular, should we special case for that? In the same vein, should we special-case for axis-aligned?]
[1:06:56][@waterlimon][Does the compiler do any automatic SSE optimization (or have option for it?)]
[1:09:01][@stelar7][sqrt_ss vs sqrt_ps vs sqrt_pd?[ref
    site="Intel"
    page="Intrinsics Guide"
    url="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"]]
[1:11:56][@waterlimon][Would SSE allow doing sRGB using exponent 2.2 instead of approximating using one of 2, without a huge performance hit?]
[1:12:41][@pseudonym73][The main reason why you don't get automatic SIMD is precise exceptions. You probably need to tell the compiler that you don't need them]
[1:14:44][@waterlimon][What happens if "/arch:AVX2" switch is enabled?]
[1:15:26][Look at this AVX-512 stuff[ref
    site="Intel"
    page="Intrinsics Guide"
    url="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"]]
[1:16:51][@braincruser][FMA is fused multiply add]
[1:18:48][@andsz_][Yeah, looks like different caps bits]
[1:19:23][Wrap things up][:speech]
[/video]