[video output=day440 member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code medium=speech title="Introduction to Function Approximation with Andrew Bromage" vod_platform=youtube id=0b68cEY2wKs guest=Pseudonym73 annotator=Miblo]
[0:00][Welcome to a special episode with [@Pseudonym73 Andrew Bromage]]
[2:42][@Pseudonym73][How floating point numbers are represented by a computer[ref
publisher=IEEE
title="754-2008 - IEEE Standard for Floating-Point Arithmetic"
url=http://ieeexplore.ieee.org/servlet/opac?punumber=4610933]]
[4:36][@Pseudonym73][Scientific notation, and how IEEE 754 represents numbers in binary][:mathematics :"numeral system"]
[7:14][Get [@Pseudonym73 Andrew] back][:admin]
[9:27][@Pseudonym73][Return]
[10:36][@Pseudonym73][Continuing IEEE 754's[ref
publisher=IEEE
title="754-2008 - IEEE Standard for Floating-Point Arithmetic"
url=http://ieeexplore.ieee.org/servlet/opac?punumber=4610933] representation of floating point numbers[ref
site="Float Toy"
url=http://evanw.github.io/float-toy/]][:mathematics :"numeral system"]
[16:20][@Pseudonym73][Subnormal numbers, the special-case numbers infinity, quiet NaN and signaling NaN, and the quality of being "algebraically closed"][:mathematics :"numeral system"]
[24:10][@Pseudonym73][Any questions?]
[24:35][Is it just a peculiarity of binary as a number system, that you can skip encoding the leading digit?][:mathematics :"numeral system"]
[26:06][@desuused][Q: Is there a representation for underflowing numbers?][:mathematics :"numeral system"]
[27:28][@Pseudonym73][Note the binary and decimal representations of floating point numbers in the IEEE 754 standard[ref
publisher=IEEE
title="754-2008 - IEEE Standard for Floating-Point Arithmetic"
url=http://ieeexplore.ieee.org/servlet/opac?punumber=4610933]][:mathematics :"numeral system"]
[27:51][@Pseudonym73][Constant definitions in handmade_numerics.h][:mathematics :"numeral system" :research]
[30:22][@Pseudonym73][Constant definitions in C's float.h[ref
site=Wikibooks
page="C Programming/float.h"
url=https://en.wikibooks.org/wiki/C_Programming/float.h] as compared with those in handmade_numerics.h, with a special mention of machine epsilon][:mathematics :"numeral system" :research]
[33:32][@Pseudonym73][Describe the IEEEBinary32 union, ieee754_number_category enum and the special-case number functions][:mathematics :"numeral system" :research]
[36:48][@Pseudonym73][Describe Real32_Abs(), Real32_SetSign() and CategoryOfReal32()][:mathematics :"numeral system" :research]
[39:07][@Pseudonym73][Describe ExtractExponent() as similar to the CRT's frexp()[ref
site=MSDN
page=frexp
url=https://msdn.microsoft.com/en-us/library/w1xfschh.aspx] with an example of its use in a sqrt() function][:mathematics :"numeral system" :research]
[47:36][When multiplying a subnormal number by a power of two, does the floating point unit first shift the numbers into the normal range before incrementing the exponent?][:mathematics :"numeral system"]
[50:38][@Pseudonym73][Describe ScaleByExponent()][:mathematics :"numeral system" :research]
[53:58][@Pseudonym73][Note the differing range of absolute values of the mantissa in textbooks (as used in handmade_numerics.h) and the CRT's frexp()[ref
site=MSDN
page=frexp
url=https://msdn.microsoft.com/en-us/library/w1xfschh.aspx]][:mathematics :"numeral system" :research]
[57:01][@Pseudonym73][Quote James H. Wilkinson on the state of computer arithmetic in 1971][:mathematics :"numeral system"]
[58:30][@Pseudonym73][Describe SlowDivision(), with emphasis on the sheer amount of specification compliance it contains[ref
publisher=IEEE
title="754-2008 - IEEE Standard for Floating-Point Arithmetic"
url=http://ieeexplore.ieee.org/servlet/opac?punumber=4610933]][:mathematics :research]
[1:06:29][@Pseudonym73][Walk through an example of SlowDivision(), noting why it uses 11 bits of precision in its application of Horner's rule[ref
site=Wikipedia
page="Horner's method"
url=https://en.wikipedia.org/wiki/Horner%27s_method] (IA-32's RCPSS instruction[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html])][:isa :mathematics :research]
[1:14:13][@Pseudonym73][How SlowDivision() finishes up its computation to the highest precision it can][:mathematics :research]
[1:16:54][@Pseudonym73][Note the difference between SlowDivision() and how our FPU performs division][:mathematics :research]
[1:19:13][@Pseudonym73][Calculating those polynomial approximations from SlowDivision(), with range illustrations in Mathematica][:mathematics :research]
[1:25:12][@Pseudonym73][Relative vs absolute error][:mathematics :research]
[1:26:43][@Pseudonym73][Plot the error function in Mathematica, with a mention of Chebyshev's Equioscillation theorem[ref
site=Wikipedia
page="Equioscillation theorem"
url=https://en.wikipedia.org/wiki/Equioscillation_theorem] and Chebyshev nodes[ref
site=Wikipedia
page="Chebyshev nodes"
url=https://en.wikipedia.org/wiki/Chebyshev_nodes]][:mathematics :research]
[1:31:35][@Pseudonym73][Plot the eighth order Chebyshev polynomial in the range -1 to 1 in Mathematica][:mathematics :research]
[1:33:47][@Pseudonym73][Using the Remez exchange algorithm[ref
site=Wikipedia
page="Remez algorithm"
url=https://en.wikipedia.org/wiki/Remez_algorithm] to find approximations to Chebyshev's set of polynomials][:mathematics :research]
[1:38:21][On the initial guesses in the Remez exchange algorithm][:mathematics]
[1:39:15][@Pseudonym73][Plot the ninth order Chebyshev polynomial, for comparison with the eighth order, to explain extrema][:mathematics :research]
[1:40:20][Struggle with the communication link][:rant :speech]
[1:41:49][As the Remez exchange algorithm proceeds, what values does it use as its new guesses?][:mathematics]
[1:42:24][@Pseudonym73][Searching for the extremum, perhaps using golden-section search,[ref
site=Wikipedia
page="Golden-section search"
url=https://en.wikipedia.org/wiki/Golden-section_search] as the Remez exchange algorithm proceeds][:mathematics :research]
[1:46:50][@Pseudonym73][Plot sine in Mathematica, and introduce the computation of sine in the range 0 to 2π][:mathematics :research]
[1:56:43][@gg_nate][Can you set video / audio bit rate on hangouts? If he could lower the bit rate of the video the audio might work better]
[1:56:57][@Pseudonym73][Describe SinCos_TableVersion()][:mathematics :research]
[1:58:31][@Pseudonym73][Deriving trigonometric identities][:mathematics]
[2:02:28][@Pseudonym73][Calculating cosine around "a", branch-free][:mathematics]
[2:04:00][@Pseudonym73][Point out the experimental SinCos_QuadrantVersion() for SIMD sines and cosines][:mathematics :optimisation :research]
[2:05:54][@Pseudonym73][Describe the XSinCosX table look up][:mathematics :research]
[2:07:56][@Pseudonym73][Counting has always started at zero[ref
author="Philip Wadler"
title="As Natural as 0, 1, 2"
url=https://homepages.inf.ed.ac.uk/wadler/papers/natural/natural.pdf]][:mathematics :"numeral system" :research]
[2:08:47][@Pseudonym73][Continued description of the XSinCosX table look up][:mathematics :research]
[2:10:38][@Pseudonym73][:Run calculate_sincos_tables and explain the result for .5][:mathematics]
[2:12:10][@Pseudonym73][Describe how FindSinCosAround() searches adjacent floating point numbers][:mathematics :research]
[2:15:55][Could you explain why part of the table version looks up into XSinCosX while the polynomial part is the same no matter where you are in the table?][:mathematics]
[2:17:10][@Pseudonym73][Further explain the table lookups of cos(a) and sin(a) and the sin(e) and cos(e) polynomial approximations][:mathematics :research]
[2:19:46][Since table lookups are hard in SIMD, what sort of stuff would you end up doing if you couldn't use a table?][:mathematics :optimisation]
[2:20:09][@Pseudonym73][Describe the experimental SinCos_QuadrantVersion()][:mathematics :optimisation :research]
[2:28:25][@x13pixels][Q: Why do C0, C2, C4, C6, etc. have more precision than can fit in a float?][:mathematics]
[2:29:32][@Pseudonym73][Describe ATan() and ATan2(), noting the current use of atan2 in [~hero Handmade Hero]][:mathematics :research]
[2:35:01][Determine to remove atan2 from [~hero Handmade Hero] with thanks to [@Pseudonym73 Andrew]][:mathematics]
[2:35:31][Q&A]
[2:36:24][@0b0000000000000][If the values are denormal, they will run way slower][:mathematics]
[2:36:46][@filiadelski][Q: What was the reason we couldn't do two's complement in the exponent?][:mathematics :"numeral system"]
[2:37:32][@0b0000000000000][sin and cos[ref
site="what-when-how"
page="The Multiplane Downshooter (Non-Traditional Animation Techniques) Part 1"
url=http://what-when-how.com/non-traditional-animation-techniques/the-multiplane-downshooter-non-traditional-animation-techniques-part-1/]][:mathematics]
[2:41:32][@0b0000000000000][Unless you explicitly flush them to zero, they will run super slow][:mathematics]
[2:42:44][@spacealiens][Q: Is there a version of this maths source code anywhere, or will it be included in the [~hero Handmade Hero] project eventually?]
[2:43:17][@Pseudonym73][Note the educational nature of this sine and cosine implementation, with a mention of Cody-Waite reduction][:mathematics]
[2:45:43][@vateferfout][Q: Will you go through acos as well in a later stream?]
[2:46:10][@0b0000000000000][What is the guy speaking's username?]
[2:46:51][SSE denormal flushing]
[2:48:13][@0b0000000000000][I can show you some versions that are completely branchless without tables if you are interested][:mathematics]
[2:49:05][@sneakybob_wot][Q: Have you done a speed comparison vs the C versions?][:performance]
[2:50:34][@Pseudonym73][Re-emphasise the slowness of SlowDivision()][:mathematics :performance :research]
[2:52:55][@staythirsty90][Not industrial strength?! Why am I here??]
[2:53:15][@vateferfout][Q: To be sure, it means the SIMD intrinsics are not standards-compliant?]
[2:56:22][@Pseudonym73][Quote the Intel 64 and IA-32 Architectures Optimization Reference Manual:[ref
site="Intel"
page="Intel 64 and IA-32 Architectures Software Developer Manuals"
url=https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html] "Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases"][:isa :mathematics :performance :research]
[2:57:04][Thank you to [@Pseudonym73 Andrew] for walking us through sine and cosine, with closing thoughts on numerical approximations][:mathematics]
[/video]