using temporary buffer to build canvas vertex data and use glBufferData with exact number of vertices to submit buffer to gpu

martinfouilleul 2023-02-06 16:14:43 +01:00
parent 3dfaea1bba
commit 5754fc1ad2
8 changed files with 284 additions and 22 deletions

3
.gitignore vendored

@ -8,5 +8,8 @@ bin/*
*.ilk
*.vs
*.obj
*.lib
*.dll
*.sln
src/gles_canvas_shaders.h


@ -0,0 +1,4 @@
set INCLUDES=/I ..\..\src /I ..\..\src\util /I ..\..\src\platform /I ../../ext /I ../../ext/angle_headers
cl /we4013 /Zi /Zc:preprocessor /DMG_IMPLEMENTS_BACKEND_GLES /std:c11 %INCLUDES% main.c /link /LIBPATH:../../bin milepost.lib /LIBPATH:../../bin libEGL.dll.lib libGLESv2.dll.lib user32.lib opengl32.lib gdi32.lib /out:../../bin/perf_text.exe


@ -0,0 +1,11 @@
#!/bin/bash
BINDIR=../../bin
RESDIR=../../resources
SRCDIR=../../src
INCLUDES="-I$SRCDIR -I$SRCDIR/util -I$SRCDIR/platform -I$SRCDIR/app -I$SRCDIR/graphics"
LIBS="-L$BINDIR -lmilepost -framework Cocoa -framework Carbon -framework Metal -framework QuartzCore"
FLAGS="-O2 -mmacos-version-min=10.15.4"
clang++ -g $FLAGS $LIBS $INCLUDES -o $BINDIR/textbench main.cpp

210
examples/perf_text/main.c Normal file

@ -0,0 +1,210 @@
#include<stdio.h>
#include<stdlib.h>
#define LOG_DEFAULT_LEVEL LOG_LEVEL_MESSAGE
#define LOG_COMPILE_DEBUG
#include"milepost.h"
#define LOG_SUBSYSTEM "Main"
static const char* TEST_STRING =
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla quam enim, aliquam in placerat luctus, rutrum in quam.\n" \
"Cras urna elit, pellentesque ac ipsum at, lobortis scelerisque eros. Aenean et turpis nibh. Maecenas lectus augue, eleifend\n" \
"nec efficitur eu, faucibus eget turpis. Suspendisse vel nulla mi. Duis imperdiet neque orci, ac ultrices orci molestie a.\n"
"Etiam malesuada vulputate hendrerit. Cras ultricies diam in lectus finibus, eu laoreet diam rutrum.\n" \
"\n" \
"Etiam dictum orci arcu, ac fermentum leo dapibus lacinia. Integer vitae elementum ex. Vestibulum tempor nunc eu hendrerit\n" \
"ornare. Nunc pretium ligula sit amet massa pulvinar, vitae imperdiet justo bibendum. Maecenas consectetur elementum mi, sed\n" \
"vehicula neque pulvinar sit amet. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc tortor erat, accumsan in laoreet\n" \
"quis, placerat nec enim. Nulla facilisi. Morbi vitae nibh ligula. Suspendisse in molestie magna, eget aliquet mauris. Sed \n" \
"aliquam faucibus magna.\n" \
"\n" \
"Sed metus odio, imperdiet et consequat non, faucibus nec risus. Suspendisse facilisis sem neque, id scelerisque dui mattis sit\n" \
"amet. Nullam tincidunt nisl nec dui dignissim mattis. Proin fermentum ornare ipsum. Proin eleifend, mi vitae porttitor placerat,\n" \
"neque magna elementum turpis, eu aliquet mi urna et leo. Pellentesque interdum est mauris, sed pellentesque risus blandit in.\n" \
"Phasellus dignissim consequat eros, at aliquam elit finibus posuere. Proin suscipit tortor leo, id vulputate odio lobortis in.\n" \
"Vestibulum et orci ligula. Sed scelerisque nunc non nisi aliquam, vel eleifend felis suscipit. Integer posuere sapien elit, \n" \
"lacinia ultricies nibh sodales nec.\n" \
"\n" \
"Etiam aliquam purus sit amet purus ultricies tristique. Nunc maximus nunc quis magna ornare, vel interdum urna fermentum.\n" \
"Vestibulum cursus nisl ut nulla egestas, quis mattis elit venenatis. Praesent malesuada mi non magna aliquam fringilla eget eu\n" \
"turpis. Integer suscipit elit vel consectetur vulputate. Integer euismod, erat eget elementum tempus, magna metus consectetur\n" \
"elit, sed feugiat urna sapien sodales sapien. Sed sit amet varius nunc. Curabitur sodales nunc justo, ac scelerisque ipsum semper\n" \
"eget. Integer ornare, velit ut hendrerit dapibus, erat mauris commodo justo, vel semper urna justo non mauris. Proin blandit,\n" \
"enim ut posuere placerat, leo nibh tristique eros, ut pulvinar sapien elit eget enim. Pellentesque et mauris lectus. Curabitur\n" \
"quis lobortis leo, sit amet egestas dui. Nullam ut sapien eu justo lacinia ultrices. Ut tincidunt, sem non luctus tempus, felis\n" \
"purus imperdiet nisi, non ultricies libero ipsum eu augue. Mauris at luctus enim.";
mg_font create_font()
{
//NOTE(martin): create font
/* str8 fontPath = mp_app_get_resource_path(mem_scratch(), "../resources/OpenSansLatinSubset.ttf");
char* fontPathCString = str8_to_cstring(mem_scratch(), fontPath);
*/
char* fontPathCString = "resources/OpenSansLatinSubset.ttf";
FILE* fontFile = fopen(fontPathCString, "rb"); // binary mode, so the font bytes are not altered on Windows
if(!fontFile)
{
LOG_ERROR("Could not load font file '%s'\n", fontPathCString);
return(mg_font_nil());
}
unsigned char* fontData = 0;
fseek(fontFile, 0, SEEK_END);
u32 fontDataSize = ftell(fontFile);
rewind(fontFile);
fontData = (unsigned char*)malloc(fontDataSize);
fread(fontData, 1, fontDataSize, fontFile);
fclose(fontFile);
unicode_range ranges[5] = {UNICODE_RANGE_BASIC_LATIN,
UNICODE_RANGE_C1_CONTROLS_AND_LATIN_1_SUPPLEMENT,
UNICODE_RANGE_LATIN_EXTENDED_A,
UNICODE_RANGE_LATIN_EXTENDED_B,
UNICODE_RANGE_SPECIALS};
mg_font font = mg_font_create_from_memory(fontDataSize, fontData, 5, ranges);
free(fontData);
return(font);
}
int main()
{
LogLevel(LOG_LEVEL_MESSAGE);
mp_init();
mp_clock_init();
mp_rect rect = {.x = 100, .y = 100, .w = 980, .h = 600};
mp_window window = mp_window_create(rect, "test", 0);
//NOTE: create surface, canvas and font
#if defined(OS_MACOS)
mg_surface surface = mg_metal_surface_create_for_window(window);
#elif defined(OS_WIN64)
mg_surface surface = mg_gles_surface_create_for_window(window);
#else
#error "unsupported OS"
#endif
mg_canvas canvas = mg_canvas_create(surface);
mg_font font = create_font();
mg_font_extents extents = mg_font_get_extents(font);
f32 fontScale = mg_font_get_scale_for_em_pixels(font, 12);
f32 lineHeight = fontScale*(extents.ascent + extents.descent + extents.leading);
int codePointCount = utf8_codepoint_count_for_string(str8_from_cstring((char*)TEST_STRING));
u32* codePoints = malloc_array(utf32, codePointCount);
utf8_to_codepoints(codePointCount, codePoints, str8_from_cstring((char*)TEST_STRING));
// start app
mp_window_bring_to_front(window);
mp_window_focus(window);
f64 frameTime = 0;
while(!mp_should_quit())
{
f64 startFrameTime = mp_get_time(MP_CLOCK_MONOTONIC);
mp_pump_events(0);
mp_event event = {0};
while(mp_next_event(&event))
{
switch(event.type)
{
case MP_EVENT_WINDOW_CLOSE:
{
mp_request_quit();
} break;
default:
break;
}
}
f32 textX = 10;
f32 textY = 600 - lineHeight;
mg_surface_prepare(surface);
mg_set_color_rgba(1, 1, 1, 1);
mg_clear();
mg_set_font(font);
mg_set_font_size(12);
mg_set_color_rgba(0, 0, 0, 1);
mg_move_to(textX, textY);
int startIndex = 0;
while(startIndex < codePointCount)
{
bool lineBreak = false;
int subIndex = 0;
for(; (startIndex+subIndex) < codePointCount && subIndex < 512; subIndex++)
{
if(codePoints[startIndex + subIndex] == '\n')
{
lineBreak = true;
break;
}
}
ASSERT(subIndex < 512 && (startIndex+subIndex)<=codePointCount);
u32 glyphs[512];
mg_font_get_glyph_indices(font, (str32){subIndex, codePoints+startIndex}, (str32){512, glyphs});
mg_glyph_outlines((str32){subIndex, glyphs});
mg_fill();
if(lineBreak)
{
textY -= lineHeight;
mg_move_to(textX, textY);
startIndex++;
}
startIndex += subIndex;
}
f64 startFlushTime = mp_get_time(MP_CLOCK_MONOTONIC);
mg_set_color_rgba(0, 0, 1, 1);
mg_set_font(font);
mg_set_font_size(12);
mg_move_to(50, 50);
str8 text = str8_pushf(mem_scratch(),
"Milepost vector graphics test program (frame time = %fs, fps = %f)...",
frameTime,
1./frameTime);
mg_text_outlines(text);
mg_fill();
mg_flush();
f64 startPresentTime = mp_get_time(MP_CLOCK_MONOTONIC);
mg_surface_present(surface);
f64 endFrameTime = mp_get_time(MP_CLOCK_MONOTONIC);
frameTime = (endFrameTime - startFrameTime);
printf("frame time: %.2fms (%.2fFPS), draw = %.2fms, flush = %.2fms, present = %.2fms\n",
frameTime*1000,
1./frameTime,
(startFlushTime - startFrameTime)*1000,
(startPresentTime - startFlushTime)*1000,
(endFrameTime - startPresentTime)*1000);
mem_arena_clear(mem_scratch());
}
mp_terminate();
return(0);
}


@ -158,7 +158,6 @@ int main()
y += dy;
mg_surface_prepare(surface);
// background
mg_set_color_rgba(0, 1, 1, 1);
mg_clear();
@ -194,8 +193,8 @@ int main()
frameTime,
1./frameTime);
mg_text_outlines(text);
//*/
mg_fill();
//*/
//*
printf("Milepost vector graphics test program (frame time = %fs, fps = %f)...\n",


@ -60,6 +60,7 @@ typedef struct debug_vertex
u8 align2[12];
} debug_vertex;
#define LayoutNext(prevName, prevType, nextType) \
AlignUpOnPow2(_cat3_(LAYOUT_, prevName, _OFFSET)+_cat3_(LAYOUT_, prevType, _SIZE), _cat3_(LAYOUT_, nextType, _ALIGN))
@ -83,7 +84,7 @@ enum {
};
enum {
MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH = 8<<10,
MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH = 1<<20,
MG_GLES_CANVAS_VERTEX_BUFFER_SIZE = MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH * LAYOUT_VERTEX_SIZE,
MG_GLES_CANVAS_INDEX_BUFFER_SIZE = MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH * LAYOUT_INT_SIZE,
MG_GLES_CANVAS_TILE_COUNTER_BUFFER_SIZE = 65536,
@ -93,12 +94,6 @@ enum {
void mg_gles_canvas_update_vertex_layout(mg_gles_canvas_backend* backend)
{
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->vertexBuffer);
backend->vertexMapping = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, MG_GLES_CANVAS_VERTEX_BUFFER_SIZE, GL_MAP_WRITE_BIT);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->indexBuffer);
backend->indexMapping = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, MG_GLES_CANVAS_INDEX_BUFFER_SIZE, GL_MAP_WRITE_BIT);
backend->interface.vertexLayout = (mg_vertex_layout){
.maxVertexCount = MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH,
.maxIndexCount = MG_GLES_CANVAS_DEFAULT_BUFFER_LENGTH,
@ -118,6 +113,14 @@ void mg_gles_canvas_update_vertex_layout(mg_gles_canvas_backend* backend)
.indexStride = LAYOUT_INT_SIZE};
}
void mg_gles_send_buffers(mg_gles_canvas_backend* backend, int vertexCount, int indexCount)
{
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->vertexBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, LAYOUT_VERTEX_SIZE*vertexCount, backend->vertexMapping, GL_DYNAMIC_DRAW);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->indexBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, LAYOUT_INT_SIZE*indexCount, backend->indexMapping, GL_DYNAMIC_DRAW);
}
void mg_gles_canvas_begin(mg_canvas_backend* interface)
{
mg_gles_canvas_backend* backend = (mg_gles_canvas_backend*)interface;
@ -161,10 +164,7 @@ void mg_gles_canvas_draw_batch(mg_canvas_backend* interface, u32 vertexCount, u3
debug_vertex vertex;
printf("foo %p\n", &vertex);
//*/
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->vertexBuffer);
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->indexBuffer);
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
mg_gles_send_buffers(backend, vertexCount, indexCount);
mp_rect frame = mg_surface_get_frame(backend->surface);
@ -194,16 +194,17 @@ void mg_gles_canvas_draw_batch(mg_canvas_backend* interface, u32 vertexCount, u3
glUniform1ui(2, tileSize);
glUniform1ui(3, tileArraySize);
glDispatchCompute(indexCount/3, 1, 1);
u32 threadCount = indexCount/3;
glDispatchCompute((threadCount + 255)/256, 1, 1);
//NOTE: next we sort triangles in each tile
glUseProgram(backend->sortProgram);
/*
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, backend->vertexBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, backend->indexBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, backend->tileCounterBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, backend->tileArrayBuffer);
*/
glUniform1ui(0, indexCount);
glUniform2ui(1, tileCountX, tileCountY);
glUniform1ui(2, tileSize);
@ -215,12 +216,12 @@ void mg_gles_canvas_draw_batch(mg_canvas_backend* interface, u32 vertexCount, u3
// glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
glUseProgram(backend->drawProgram);
/*
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, backend->vertexBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, backend->indexBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, backend->tileCounterBuffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, backend->tileArrayBuffer);
*/
glBindImageTexture(0, backend->outTexture, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8);
glUniform1ui(0, indexCount);
@ -304,11 +305,11 @@ mg_canvas_backend* mg_gles_canvas_create(mg_surface surface)
glGenBuffers(1, &backend->vertexBuffer);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->vertexBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, MG_GLES_CANVAS_VERTEX_BUFFER_SIZE, 0, GL_DYNAMIC_DRAW);
// glBufferData(GL_SHADER_STORAGE_BUFFER, MG_GLES_CANVAS_VERTEX_BUFFER_SIZE, 0, GL_DYNAMIC_DRAW);
glGenBuffers(1, &backend->indexBuffer);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->indexBuffer);
glBufferData(GL_SHADER_STORAGE_BUFFER, MG_GLES_CANVAS_INDEX_BUFFER_SIZE, 0, GL_DYNAMIC_DRAW);
// glBufferData(GL_SHADER_STORAGE_BUFFER, MG_GLES_CANVAS_INDEX_BUFFER_SIZE, 0, GL_DYNAMIC_DRAW);
glGenBuffers(1, &backend->tileCounterBuffer);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, backend->tileCounterBuffer);
@ -439,6 +440,9 @@ mg_canvas_backend* mg_gles_canvas_create(mg_surface surface)
}
}
backend->vertexMapping = malloc_array(char, 1<<30);
backend->indexMapping = malloc_array(char, 1<<30);
mg_gles_canvas_update_vertex_layout(backend);
}


@ -1,5 +1,5 @@
#version 310 es
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
precision mediump float;
layout(std430) buffer;
@ -36,7 +36,11 @@ layout(location = 3) uniform uint tileArraySize;
void main()
{
uint triangleIndex = gl_WorkGroupID.x * 3u;
uint triangleIndex = (gl_WorkGroupID.x*gl_WorkGroupSize.x + gl_LocalInvocationIndex) * 3u;
if(triangleIndex >= indexCount)
{
return;
}
uint i0 = indexBuffer.elements[triangleIndex];
uint i1 = indexBuffer.elements[triangleIndex+1u];
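
The dispatch size on the host side and the bounds check in the shader work together: the host rounds the thread count up to a whole number of workgroups, and the shader discards the surplus invocations. A minimal sketch of the rounding, assuming a hypothetical helper `workgroup_count` (not part of milepost) whose `groupSize` argument should equal the `local_size_x` declared in the shader:

```c
#include <assert.h>

// Hypothetical helper: smallest number of workgroups such that
// groupCount*groupSize >= threadCount (integer ceiling division).
// groupSize is assumed to match the shader's local_size_x.
static unsigned int workgroup_count(unsigned int threadCount, unsigned int groupSize)
{
	assert(groupSize > 0);
	return((threadCount + groupSize - 1)/groupSize);
}
```

With this helper the call would read `glDispatchCompute(workgroup_count(indexCount/3, groupSize), 1, 1)`. Passing a `groupSize` smaller than the shader's local size still covers every triangle, but launches more invocations than needed, all of which exit early through the bounds check.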


@ -41,3 +41,30 @@ Anyway for now, is it possible have an _under the hood_ atlas, and reserve a way
* or, we can expose mg_texture and related APIs for now, as if they were individual, but back them by a hidden atlas. And a bit later expose mg_image/atlas -> maybe better.
* Or just implement breaking the triangle stream into batches now...
Perf issue of binding large vertex buffer
-----------------------------------------
Binding big buffers has a high cost. We should send updates in smaller batches, using one of the following:
- use temporary storage to build the vertex buffer, then send it with glBufferData just before rendering
- stream data to large buffer using glMapBufferRange (instead of mapping the whole buffer)
- stream data to large buffer using glBufferSubData
We have to account for these edge cases:
- how we handle overflowing the sub range (i.e. the space allocated or mapped to build vertices)
- how we handle overflowing buffer capacity (if using a pre-allocated buffer and glMapBufferRange/glBufferSubData)
* Using a temporary store and glBufferData forces a draw call when exceeding the temporary buffer limits. But the two cases of overflow are handled at once.
* Using a temporary store and glBufferSubData distributes data transfer asynchronously, and doesn't force a draw call when exceeding the temporary buffer. We'd still need to force a draw when exceeding the gl buffer size.
* Using glMapBufferRange also distributes data transfer asynchronously. We need to force a draw when exceeding the gl buffer size.
The first solution (temporary building buffer + glBufferData) is simpler and probably ok for a low number of vertices. We can even build the vertices in an arena and virtually never care about exceeding the building buffer capacity. But if we have many vertices, maybe we care about distributing the transfer across asynchronous calls.
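
A minimal sketch of the first solution, under stated assumptions: `build_buffer`, `build_buffer_push`, `build_buffer_flush` and `count_upload` are hypothetical names that do not exist in milepost, and the `upload` hook stands in for the `glBufferData` call (followed by the draw) so the sketch runs without a GL context. It shows how a full building buffer forces a flush, and how only the bytes actually written are submitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical upload hook. In a GL backend, this is where
// glBufferData(target, size, data, GL_DYNAMIC_DRAW) and the draw would go.
typedef void (*upload_proc)(const void* data, size_t size, void* user);

typedef struct build_buffer
{
	unsigned char* base; // CPU-side building memory
	size_t capacity;     // size of the building memory, in bytes
	size_t used;         // bytes written since the last flush
	upload_proc upload;
	void* user;
} build_buffer;

// Submit exactly the bytes written so far, then start over.
static void build_buffer_flush(build_buffer* buffer)
{
	if(buffer->used)
	{
		buffer->upload(buffer->base, buffer->used, buffer->user);
		buffer->used = 0;
	}
}

// Append data, forcing a flush first if it would not fit.
// Returns 0 on success, -1 if the data can never fit.
static int build_buffer_push(build_buffer* buffer, const void* data, size_t size)
{
	if(size > buffer->capacity)
	{
		return(-1);
	}
	if(buffer->used + size > buffer->capacity)
	{
		build_buffer_flush(buffer);
	}
	memcpy(buffer->base + buffer->used, data, size);
	buffer->used += size;
	return(0);
}

// Example hook that only records what would have been sent to the GPU.
typedef struct upload_stats
{
	int calls;
	size_t bytes;
} upload_stats;

static void count_upload(const void* data, size_t size, void* user)
{
	(void)data;
	upload_stats* stats = (upload_stats*)user;
	stats->calls++;
	stats->bytes += size;
}
```

Both overflow cases collapse into the single check in `build_buffer_push`, which is the simplification the first solution buys. The cost is that the flush can land in the middle of a shape.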
What happens if we exceed the gl buffer size? -> We need to make a draw call to use the vertices, and maybe then grow the buffer to a bigger size. But this implies breaking the batch, probably in the middle of a shape. This isn't really possible, because we'd need the previous candidate color and flipcount transferred between batches. We could use a texture for that, but it complicates things quite a bit...
Notes:
* Mapping/Unmapping smaller ranges of a big buffer doesn't seem to lower the cost of binding that buffer. Does the driver send the full buffer regardless of the range that was changed?
* Orphaning the buffer before mapping doesn't seem to do any good
* Doing glBufferData from a small build buffer is surprisingly slow...
* ANGLE seems to take into account only the first call to glBufferData to allocate size, and then sends the full buffer???
* Not pre-allocating in creation procedure "solves" the problem with glBufferData...