[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Changing to Single Dispatch Per Pass (Part 2)" vod_platform=youtube id=0d0_NitChCY annotator=Miblo] [0:00][Recap our switch to texture vertex indices][:hardware :rendering :speech] [1:36][Augment game_render_commands with an IndexArray, with a few words on the GPU's ability to cache vertex shader transforms][:hardware :rendering] [7:29][Remove renderer_texture_group and renderer_memory_layout, and pass our new IndexArray down the pipe, also augmenting open_gl with IndexArray and MaxIndexCount][:hardware :rendering] [14:00][:Run the Renderer Test successfully][:hardware :rendering] [14:12][Reduce the MaxQuadCountPerFrame in RenderTest() and the (grass) CoverIndex in PushSimpleScene(), with a few words on breaking the problem into steps][:hardware :rendering] [16:30][:Run the Renderer Test with our single vertex buffer working][:hardware :rendering] [16:51][Augment render_entry_textured_quads with IndexArrayOffset for OpenGLEndFrame() to use the index buffer passed in][:hardware :rendering] [20:19][Change OpenGLEndFrame() to call glBufferData() once, instead of while processing every render command][:hardware :rendering] [22:17][:Run it to see problems with the depth peeling][:hardware :rendering] [22:44][Make OpenGLInit() generate a separate CompositeVertexBuffer and ScreenFillVertexBuffer[ref site="OpenGL 4 Reference Pages" page=glBufferData url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glBufferData.xhtml]][:hardware :rendering] [28:52][:Run it with the depth peel working correctly][:hardware :rendering] [29:03][Revert the MaxQuadCountPerFrame in RenderTest() and the (grass) CoverIndex in PushSimpleScene() to their original high values][:hardware :rendering] [29:55][:Run it to see that it is much zippier][:hardware :performance :rendering] [30:34][Temporarily disable the :lighting][:rendering] [31:15][:Run it to see that it didn't affect it much][:lighting :performance :rendering] [31:51][Make RenderLoop() disable the :lighting][:rendering] [33:17][:Run it and gauge the :performance with the :lighting disabled][:rendering] [35:40][Turn off all the grass and augment open_gl with an IndexBuffer for OpenGLEndFrame() to use, rather than initialising Indices itself[ref site="OpenGL 4 Reference Pages" page=glBufferData url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glBufferData.xhtml][ref site=Khronos page=glcorearb.h url=https://www.khronos.org/registry/OpenGL/api/GL/glcorearb.h]][:hardware :rendering] [45:31][Make PushQuad() fill our new IndexBuffer with the VertIndex][:hardware :rendering] [49:08][:Run it without crashing, but also without seeing anything][:hardware :rendering] [49:37][Fix OpenGLEndFrame() to pass GL_UNSIGNED_SHORT to glDrawElements()][:hardware :rendering] [50:36][:Run it to see that our triangle must be wound wrong][:hardware :rendering] [50:46][Triangle winding][:blackboard :hardware :rendering] [51:25][Fix the vertex winding in PushQuad()][:hardware :rendering] [51:29][:Run it to see that it looks about right][:hardware :rendering] [51:38][Let OpenGLEndFrame() texture our quads with their bitmaps][:hardware :rendering] [51:58][:Run it to see that it looks about right][:hardware :rendering] [52:05][Switch OpenGLEndFrame() to perform one draw call per scene, using the white bitmap for each sprite][:hardware :rendering] [53:21][:Run it to see that our :performance has improved][:hardware :rendering] [54:56][Re-enable all the grass with a view to overcoming the unsigned short vertex buffer index limit][:hardware :rendering] [55:40][:Run it and hit our VI == VertIndex assertion][:hardware :rendering] [56:23][Enable GetCurrentQuads() and PushQuad() to handle more than 65535 / 4 quads using glDrawElementsBaseVertex()[ref site=docs.GL page=glDrawElementsBaseVertex url=http://docs.gl/gl2/glDrawElementsBaseVertex][ref site=Khronos page=glcorearb.h url=https://www.khronos.org/registry/OpenGL/api/GL/glcorearb.h]][:hardware :rendering] [1:04:41][:Run it to see that it looks good][:hardware :rendering] [1:05:29][Let OpenGLEndFrame() texture our quads with the grass bitmap][:hardware :rendering] [1:06:06][:Run it to see that we now don't have any :performance spikes][:hardware :rendering] [1:08:30][Begin to switch the renderer over to use texture arrays, first enabling the vertex shader to understand the notion of a vertex index][:hardware :rendering] [1:18:44][:Run it successfully][:hardware :rendering] [1:18:51][Switch OpenGLAllocateTexture() to specify a feed-forward texture array, calling glTexSubImage3D()[ref site="OpenGL Wiki" page="Array Texture" url=https://www.khronos.org/opengl/wiki/Array_Texture][ref site=docs.GL page=glTexSubImage3D url=http://docs.gl/gl3/glTexSubImage3D]][:hardware :memory :rendering] [1:38:09][Switch [~hero Handmade Hero] over to use the renderer's new texture array][:"asset loading" :hardware :memory :rendering] [1:45:01][:Run it and hit an OpenGL error: "Texture name does not refer to a texture object generated by OpenGL"][:hardware :rendering] [1:46:08][Temporarily add a TextureHandles array to the open_gl struct][:hardware :rendering] [1:48:42][:Run it to see our tree-texture scene][:hardware :rendering] [1:49:00][Remove the TextureHandles in favour of sampling slices into the lone texture array[ref site=docs.GL page=glBindTexture url=http://docs.gl/gl3/glBindTexture][ref site="OpenGL Wiki" page="Array Texture" url=https://www.khronos.org/opengl/wiki/Array_Texture][ref site=Khronos page=glcorearb.h url=https://www.khronos.org/registry/OpenGL/api/GL/glcorearb.h][ref site="OpenGL Registry" page="The OpenGL Shading Language 1.50 Quick Reference Card" url=https://www.khronos.org/files/opengl-quick-reference-card.pdf]][:hardware :rendering] [2:02:05][:Run it and hit an OpenGL error: "Invalid texture format"][:hardware :rendering] [2:03:38][Prevent OpenGLInit() from allocating the WhiteBitmap, and make it specify a texture level of 1 when calling glTexStorage2D()][:hardware :rendering] [2:04:29][:Run it to see it all working with the correct textures, just that they are not always correctly sized[ref site="OpenGL 4 Reference Pages" page=glGenerateMipmap url=https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glGenerateMipmap.xhtml]][:"asset loading" :hardware :rendering] [2:07:26][:Run [~hero Handmade Hero] to see that it has issues][:"asset loading" :hardware :rendering] [2:08:09][Remove the WhiteBitmap from the renderer in favour of allowing the game itself to specify it if needed][:"asset loading"] [2:11:56][Enable AllocateGameAssets() in [~hero Handmade Hero] to create its needed WhiteBitmap][:"asset loading"] [2:15:24][Augment the renderer_texture with Width and Height for PushQuad() to use and so fix the texture sizing, renaming TextureHandle() to ReferToTexture()][:"asset loading"] [2:27:13][Make PushQuad() scale the texture UVs based on their sizes][:rendering] [2:29:14][:Run the Renderer Test and [~hero Handmade Hero] to see that the textures are correctly sized][:rendering] [2:29:37][Define TEXTURE_ARRAY_DIM][:rendering] [2:30:42][:Run it, doing texture arrays][:rendering] [2:31:22][Q&A][:speech] [2:31:46][@lkalinovcic][Q: Are there any benefits to using PBOs for async texture transfers these days? I've heard that drivers copy texture data into their internal :memory anyway, so the call returns immediately and behaves as if it were asynchronous][:hardware :rendering] [2:34:27][@mmozeiko][Q: In GL Core Profile you need to generate all "names" (textures, buffers, FBO, ...). Only in Compatibility Profile you are allowed to provide your own "name" values][:hardware :rendering] [2:34:49][@lkalinovcic][Q: Unrelated: What's the recommended way to deal with Windows doing an infinite WindowProc loop while moving or resizing a window? My current approach is to jump in and out of the event processing loop with fibers. Is there something less hacky?][:"platform layer"] [2:36:14][@mmozeiko][Q: You probably would want to replace glTexStorage3D() call with glTexImage3D(). TexStorage is from OpenGL 4.2 version (or ARB_texture_storage extension). TexStorage differs from TexImage in a way that its texture properties (size / format) are immutable – it cannot be changed once texture is created. It may help :performance but often it does not change][:hardware :rendering] [2:36:58][Replace glTexStorage3D() with glTexImage3D()[ref site=docs.GL page=glTexImage3D url=http://docs.gl/gl3/glTexImage3D]][:hardware :rendering] [2:40:04][@vateferfout][Q: Do you still multiply the offset in the glDrawElementsBaseVertex() call? Because I never do it and never had any problem, so I don't understand what's going on[ref site=docs.GL page=glDrawElementsBaseVertex url=http://docs.gl/gl2/glDrawElementsBaseVertex][ref site=OpenGL page="OpenGL 3.2 (Core Profile)" url=https://www.khronos.org/registry/OpenGL/specs/gl/glspec32.core.pdf]][:hardware :rendering] [2:44:19][@mmozeiko][Q: Please show glTexSubImage3D() call that fails][:hardware :rendering :run] [2:47:38][@mmozeiko][Q: It needs to be GL_RGBA or whatever (value does not matter, because it is for last argument pixel data, which is NULL, so no data specified)][:hardware :rendering] [2:48:28][@mmozeiko][Q: "does not matter", as long as it is valid, nonzero][:hardware :rendering] [2:49:49][@cemuzunlar][Q: I think glTexImage3D() is the failing call. Can you please check the call stack again at crash time?][:hardware :rendering] [2:51:45][Wrap it up with the determination to find out why we can't use glTexImage3D() here in OpenGLInit(), and a plug of [~cmuratori Casey] and [~checker Chris Hecker]'s upcoming talks at Busan Indie Connect Festival 2018[ref site="BIC Festival" url=http://www.bicfest.org/en/]][:speech] [/video]