orca/vector_renderer_notes.txt

Various notes/thoughts about the 2D vector graphics renderer

Triangle Rasterization
----------------------
https://fgiesen.wordpress.com/2013/02/08/triangle-rasterization-in-practice/

https://github.com/rygorous/trirast/blob/master/main.cpp

https://joshbeam.com/articles/triangle_rasterization/

https://nlguillemot.wordpress.com/2016/07/10/rasterizer-notes/

https://web.archive.org/web/20120625103536/http://devmaster.net/forums/topic/1145-advanced-rasterization/

Bindless textures
-----------------
It's tempting to use bindless textures to be able to draw individual images using only one draw call. This would avoid much of the complexity of either managing a texture atlas under the hood or breaking the draw list into batches...
But it's only an extension and doesn't seem to be supported everywhere. Moreover, there might be a problem where the texture handle used by the shader cannot differ within a single draw call (it must be "dynamically uniform"), which defeats the purpose in our case -> it requires OES_gpu_shader5 or GLES 3.2

Ideally, the atlas should be built on top of lower level image features of the renderer, e.g. mg_image_upload_sub_region(), mg_image_render_sub_region() etc...

This would mean individual textures can be set and used in a frame. So without bindless textures, we would need to break the draw list into batches, starting a new one each time the texture attribute changes. This also means we need to blend each batch result onto the previous one.
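
The batching fallback described above could look roughly like the sketch below. It is only an illustration: draw_cmd, draw_batch and split_batches() are hypothetical names, not existing renderer types, and it assumes each draw-list entry carries a texture id and an index count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical draw-list entry: a run of indices tagged with the texture it samples.
typedef struct {
    uint32_t texture;     // texture id (0 could mean "no texture")
    uint32_t indexCount;  // number of indices this entry contributes
} draw_cmd;

// Hypothetical batch: a contiguous index range that uses a single texture.
typedef struct {
    uint32_t texture;
    uint32_t firstIndex;
    uint32_t indexCount;
} draw_batch;

// Walks the draw list in order and starts a new batch each time the texture
// changes. Returns the number of batches needed, and writes at most
// maxBatches of them into batches.
size_t split_batches(const draw_cmd* cmds, size_t cmdCount,
                     draw_batch* batches, size_t maxBatches)
{
    assert(cmds != NULL || cmdCount == 0);

    size_t batchCount = 0;
    uint32_t nextIndex = 0;
    uint32_t currentTexture = 0;

    for(size_t i = 0; i < cmdCount; i++)
    {
        if(batchCount == 0 || cmds[i].texture != currentTexture)
        {
            // Texture changed: open a new batch at the current index offset.
            currentTexture = cmds[i].texture;
            if(batchCount < maxBatches)
            {
                batches[batchCount] = (draw_batch){currentTexture, nextIndex, 0};
            }
            batchCount++;
        }
        if(batchCount <= maxBatches)
        {
            batches[batchCount - 1].indexCount += cmds[i].indexCount;
        }
        nextIndex += cmds[i].indexCount;
    }
    return batchCount;
}
```

Batches have to be drawn in list order, one draw call each, so that each result is blended on top of the previous one. Note that a texture that comes back later in the list (A, B, A) still costs a third batch, since reordering would change the blending result.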

 - It seems possible to implement bindless textures in Metal using argument buffers
 - We could investigate whether Angle/our targets are likely to support OES_gpu_shader5
 - But, this means the canvas renderer relies on the backend to provide this kind of feature
 - It also assumes the upper bound for indexable bindless textures is enough on every backend
 - We'll likely need a batching fallback anyway?


-> Angle doesn't seem to support GL_IMG_bindless_texture for now.

Workaround: we could use a desktop GL 4.3+ context for the canvas renderer on Windows, _BUT_ the functions would conflict with the GLES canvas, unless we use function pointers that are loaded separately for each context (which we probably should do, but I'd rather keep that for later).
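
A minimal sketch of the per-context function pointer idea, assuming each context comes with a loader in the style of eglGetProcAddress/wglGetProcAddress. Everything here is hypothetical (gl_api, gl_api_load(), gl_api_select(), and the fake_* stand-ins that let the sketch run without a GL context), and a real table would hold one typed pointer per entry point rather than just one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Generic pointer type returned by a loader, and the loader signature itself.
typedef void (*gl_proc)(void);
typedef gl_proc (*gl_loader)(const char* name);

// Typed entry point. Plain C types are used to avoid depending on GL headers.
typedef void (*bind_texture_proc)(unsigned int target, unsigned int texture);

// Hypothetical per-context API table.
typedef struct {
    bind_texture_proc BindTexture;
} gl_api;

// Fills the table through the context's own loader.
// Returns 1 if every entry point was found, 0 otherwise.
int gl_api_load(gl_api* api, gl_loader loader)
{
    assert(api != NULL && loader != NULL);
    api->BindTexture = (bind_texture_proc)loader("glBindTexture");
    return api->BindTexture != NULL;
}

// Table used by subsequent calls. It would be switched whenever the current
// context changes (and would likely be thread-local in practice).
gl_api* gl_current_api = NULL;

void gl_api_select(gl_api* api)
{
    gl_current_api = api;
}

// Stand-ins for two backends, so the sketch runs without any GL context.
int fakeGlesBindCalls = 0;
int fakeDesktopBindCalls = 0;

void fake_gles_bind(unsigned int target, unsigned int texture)
{
    (void)target; (void)texture;
    fakeGlesBindCalls++;
}

void fake_desktop_bind(unsigned int target, unsigned int texture)
{
    (void)target; (void)texture;
    fakeDesktopBindCalls++;
}

gl_proc fake_gles_loader(const char* name)
{
    return strcmp(name, "glBindTexture") == 0 ? (gl_proc)fake_gles_bind : NULL;
}

gl_proc fake_desktop_loader(const char* name)
{
    return strcmp(name, "glBindTexture") == 0 ? (gl_proc)fake_desktop_bind : NULL;
}
```

Call sites would then go through gl_current_api->BindTexture(...) instead of a global symbol, so the GLES and desktop GL functions never share a name at link time.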

-> We'll probably want to do that, or make 1 draw call per changing texture.

Anyway, for now, is it possible to have an _under the hood_ atlas, and reserve a way to change the API so that we make the atlas explicit / allow using single textures for big images etc?

* We could decide that we can set an atlas, and all mg_images get allocated from that atlas. If no atlas is set, a default one is used.

* Or, we can expose mg_texture and related APIs for now, as if textures were individual, but back them with a hidden atlas, and expose mg_image/atlas a bit later -> maybe better.

* Or just implement breaking the triangle stream into batches now...
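
For the hidden-atlas options above, the allocation side could be as simple as the shelf (row) allocator sketched below. This is an assumption about one possible strategy, not a decision: atlas, atlas_rect, atlas_init() and atlas_alloc() are hypothetical names, and there is no freeing or repacking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical rectangle inside the atlas texture, in pixels.
typedef struct {
    uint32_t x, y, w, h;
} atlas_rect;

// Hypothetical shelf allocator state: images are placed left to right on the
// current shelf, and a new shelf is opened below it when the row is full.
typedef struct {
    uint32_t width, height;  // atlas texture size
    uint32_t shelfX;         // next free x on the current shelf
    uint32_t shelfY;         // top of the current shelf
    uint32_t shelfHeight;    // tallest image placed on the current shelf
} atlas;

void atlas_init(atlas* a, uint32_t width, uint32_t height)
{
    assert(a != NULL);
    *a = (atlas){ .width = width, .height = height };
}

// Reserves a w x h region. Returns 1 and fills out on success, or 0 if the
// image does not fit (the atlas state is left unchanged in that case).
int atlas_alloc(atlas* a, uint32_t w, uint32_t h, atlas_rect* out)
{
    assert(a != NULL && out != NULL);
    if(w == 0 || h == 0 || w > a->width)
    {
        return 0;
    }
    uint32_t x = a->shelfX;
    uint32_t y = a->shelfY;
    uint32_t shelfHeight = a->shelfHeight;

    if(w > a->width - x)
    {
        // Not enough room left on this row: open a new shelf below it.
        x = 0;
        y += shelfHeight;
        shelfHeight = 0;
    }
    if(h > a->height - y)
    {
        return 0;
    }
    if(h > shelfHeight)
    {
        shelfHeight = h;
    }
    a->shelfX = x + w;
    a->shelfY = y;
    a->shelfHeight = shelfHeight;
    *out = (atlas_rect){x, y, w, h};
    return 1;
}
```

An image handle exposed to the user would then just wrap an atlas_rect. A failed allocation is the point where a big image could get its own texture instead, which is the case the explicit API would need to cover later.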

Perf issue of binding large vertex buffer
-----------------------------------------

Binding big buffers has a high cost. We should send updates in smaller batches, either
	- use a temporary storage to build vertex buffer, then send with glBufferData just before rendering
	- stream data to large buffer using glMapBufferRange (instead of mapping the whole buffer)
	- stream data to large buffer using glBufferSubData

We have to account for these edge cases:
	- how we handle overflowing the sub range (ie space allocated or mapped to build vertices)
	- how we handle overflowing buffer capacity (if using a pre-allocated buffer and glMapBufferRange/glBufferSubData)

* Using a temporary store and glBufferData forces a draw call when exceeding the temporary buffer limits. But the two cases of overflow are handled at once.
* Using a temporary store and glBufferSubData distributes data transfer asynchronously, and doesn't force a draw call when exceeding the temporary buffer. We'd still need to force a draw when exceeding the GL buffer size.
* Using glMapBufferRange also distributes data transfer asynchronously. We need to force a draw when exceeding the GL buffer size.
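
The bookkeeping for the second option (temporary store + glBufferSubData) can be sketched as follows, with both overflow cases handled. It is a simulation only: no GL call is made, the places where glBufferSubData and the draw call would go are marked in comments, and vertex_stream and the stream_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical streaming state. Counters stand in for the actual GL work.
typedef struct {
    unsigned char* staging;  // temporary build buffer (CPU side)
    size_t stagingCap;
    size_t stagingUsed;
    size_t gpuCap;           // size of the pre-allocated GL buffer
    size_t gpuUsed;          // bytes uploaded since the last draw
    int uploadCount;         // number of simulated glBufferSubData calls
    int drawCount;           // number of simulated draw calls
} vertex_stream;

// Sends the staging contents to the GL buffer at the current offset.
static void stream_upload(vertex_stream* s)
{
    if(s->stagingUsed)
    {
        // A real backend would call glBufferSubData here, with
        // offset = s->gpuUsed and size = s->stagingUsed.
        s->uploadCount++;
        s->gpuUsed += s->stagingUsed;
        s->stagingUsed = 0;
    }
}

// Uploads pending data, then draws everything uploaded so far.
void stream_flush(vertex_stream* s)
{
    stream_upload(s);
    if(s->gpuUsed)
    {
        // A real backend would issue the draw call here.
        s->drawCount++;
        s->gpuUsed = 0;
    }
}

// Appends vertex data, handling the two overflow cases from the notes.
void stream_push(vertex_stream* s, const void* data, size_t size)
{
    assert(size <= s->stagingCap && s->stagingCap <= s->gpuCap);

    if(s->gpuUsed + s->stagingUsed + size > s->gpuCap)
    {
        // Exceeding the GL buffer capacity: a draw is forced.
        stream_flush(s);
    }
    else if(s->stagingUsed + size > s->stagingCap)
    {
        // Exceeding the staging range: upload only, no draw needed.
        stream_upload(s);
    }
    memcpy(s->staging + s->stagingUsed, data, size);
    s->stagingUsed += size;
}
```

With a 16-byte staging buffer and a 48-byte GL buffer, pushing seven 8-byte chunks triggers three uploads and a single forced draw. The forced draw is the problematic case, since nothing here guarantees it falls on a shape boundary.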

The first solution (temporary build buffer + glBufferData) is simpler and probably OK for a low number of vertices. We can even build the vertices in an arena and virtually never care about exceeding the build buffer's capacity. But if we have many vertices, maybe we care about distributing the transfer across asynchronous calls.

What happens if we exceed the GL buffer size? -> We need to make a draw call to use the vertices, and maybe then grow the buffer to a bigger size. But this implies breaking the batch, probably in the middle of a shape. This isn't really possible, because we'd need the previous candidate color and flipcount transferred between batches. We could use a texture for that, but it complicates things quite a bit...

Notes:
	* Mapping/Unmapping smaller ranges of a big buffer doesn't seem to lower the cost of binding that buffer. Does the driver send the full buffer regardless of the range that was changed?
	* Orphaning the buffer before mapping doesn't seem to do any good
	* Doing glBufferData from a small build buffer is surprisingly slow...
		* Angle seems to take into account only the first call to glBufferData to allocate size, and then sends the full buffer???
		* Not pre-allocating in creation procedure "solves" the problem with glBufferData...