Handmade Hero Day 001 - Setting Up the Windows Build - by Emmanuel Vaccaro
diff --git a/cmuratori/hero/code/code001.hmml b/cmuratori/hero/code/code001.hmml
index 2daffae..ab8b73e 100644
--- a/cmuratori/hero/code/code001.hmml
+++ b/cmuratori/hero/code/code001.hmml
@@ -1,4 +1,4 @@
-[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code title="Setting Up the Windows Build" vod_platform=youtube id=Ee3EtYb8d1o annotator=jacebennett annotator=Miblo annotator=Mannilie annotator=theinternetftw annotator=wheatdog]
+[video member=cmuratori stream_platform=twitch stream_username=handmade_hero project=code template=code001_template.html title="Setting Up the Windows Build" vod_platform=youtube id=Ee3EtYb8d1o annotator=jacebennett annotator=Miblo annotator=Mannilie annotator=theinternetftw annotator=wheatdog]
 [0:46][Course of the Handmade Hero series]
 [3:04][Start the project]
 [5:06][Command line in Windows]
diff --git a/cmuratori/hero/code/code001_template.html b/cmuratori/hero/code/code001_template.html
new file mode 100644
index 0000000..3d37058
--- /dev/null
+++ b/cmuratori/hero/code/code001_template.html
@@ -0,0 +1,16 @@
+Handmade Hero Day 001 - Setting Up the Windows Build - by Emmanuel Vaccaro
+Handmade Hero Day 002 - Opening a Win32 Window - by Emmanuel Vaccaro
+Handmade Hero Day 003 - Allocating a Back Buffer - by Emmanuel Vaccaro
+You may have found yourself confused when looking at this code:
+#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
+typedef X_INPUT_GET_STATE(x_input_get_state);
+X_INPUT_GET_STATE(XInputGetStateStub)
+{
+ return(0);
+}
+global_variable x_input_get_state *XInputGetState_ = XInputGetStateStub;
+#define XInputGetState XInputGetState_
+
What on God's green earth is happening here? Well, don't despair. It's really quite simple; we just need to unwrap a few things and break it down line by line.
+#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
+
In C (as well as C++ and Objective-C), the first step of the compilation process is a code transformation pass by the preprocessor. As we have covered previously in the series, any command which begins with a "#" is what is known as a preprocessor directive. This is just a command that tells the preprocessor to do something, such as #include a file. In this case, "#define" is a directive which creates a macro. A macro can be thought of as a code substitution: any time the preprocessor finds the macro used in your code, it swaps it out for whatever you defined.
+There are two types of macros. The first type is very simple and defines a constant value:
+#define PI 3.14159265359
+
This macro will cause the preprocessor to convert any instances of "PI" in the code directly into "3.14159265359". If we break it down a bit, on the left hand side we have what is called the identifier ("PI"), and on the right hand side we have what is known as the token string ("3.14159265359"). The preprocessor searches the code for the identifier, and replaces it with the token string. This means that the literal text "PI" will no longer appear in our code, having been replaced with "3.14159265359". This can be useful for constant values that we may want to use repeatedly, but not type in full all the time. Note that the preprocessor will not convert PI if it appears in a comment or as part of a longer identifier such as "PieceOfPI".
+So let's go back to the macro we were looking at earlier.
+#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
+
It is somewhat more complex: on the right-hand side it has what looks like a function declaration for a function called "name". This is the second type of macro, a function-like macro, whose arguments are substituted into its token string. As a simpler, but somewhat different example:
+#define ADD(argument1, argument2) argument1 + argument2
+
Looking at this, we can see that it breaks down into the same two parts as the previous macro. On the left side we have the identifier "ADD(argument1, argument2)". This works similarly to the first kind of macro, but because we have written it like a function, the preprocessor will capture whatever it finds in place of "argument1" and "argument2", and then plug those values into the token string wherever the parameter names appear.
+For example, if later on in the code we were to type:
+int c = ADD(a,b);
+
Our macro would expand this out to:
+int c = a + b;
+
So, again we will go back to our original macro, but this time we are going to pair it with its first use:
+#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
+typedef X_INPUT_GET_STATE(x_input_get_state);
+
Based on what we learned earlier, we can see that the second line will expand out into the following:
+typedef DWORD WINAPI x_input_get_state(DWORD dwUserIndex, XINPUT_STATE *pState);
+
So that's easy enough to see, but what does 'typedef' mean?
+In C, typedef declares a different name for a type. This can almost be thought of as similar to #define, but it is limited to type names only and is performed by the compiler instead of the preprocessor.
+As a basic example, if we were to look at the following code:
+typedef char CHARACTER;
+CHARACTER x;
+
The compiler will treat the type of x exactly as though it were 'char.' Why would we want to do this? Mostly for readability and maintainability: a well-named typedef documents intent, and if we ever need to change the underlying type we only have to change it in one place. Note that a typedef in C creates an alias rather than a distinct type, so the compiler will not warn us if we pass a plain char to a function that declares a CHARACTER argument; as far as type checking is concerned, they are the same type.
+typedef DWORD WINAPI x_input_get_state(DWORD dwUserIndex, XINPUT_STATE *pState);
+
+But in the Handmade Hero code, we are using typedef with a function declaration. This declares a function signature as a type. It is important to understand that although we can use that type to declare a function like this (note that there are no parentheses, because the typedef already carries the whole parameter list):
+x_input_get_state _XInputGetState;
+
Due to the rules of C, we are not allowed to then go on to define the function as follows:
+//INVALID C CODE FOLLOWS
+x_input_get_state _XInputGetState
+{
+ //Do some things.
+}
+
+We could declare a function using our typedef and then define that same function by writing out its full type information; however, doing so makes the typedef itself fairly pointless.
+So why are we using a typedef with a function declaration here? Because we are planning to use it to create a function pointer. A function pointer is just what it sounds like: a pointer to a function. The main benefit of function pointers is that they can be treated like variables; that is, they can be reassigned or passed as arguments.
+So this is how we would use our "x_input_get_state" typedef to declare a function pointer:
+x_input_get_state *PointerToXInputGetStateFunction;
+
We can then assign that pointer to any function which matches the signature of x_input_get_state:
+typedef DWORD WINAPI x_input_get_state(DWORD dwUserIndex, XINPUT_STATE *pState);
+DWORD WINAPI XInputGetStateStub(DWORD dwUserIndex, XINPUT_STATE *pState)
+{
+ return(0);
+}
+global_variable x_input_get_state *XInputGetState_ = XInputGetStateStub;
+
This is almost exactly what we would have if we unwrapped the macros in the original code. Here is the original again for comparison:
+#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
+typedef X_INPUT_GET_STATE(x_input_get_state);
+X_INPUT_GET_STATE(XInputGetStateStub)
+{
+ return(0);
+}
+global_variable x_input_get_state *XInputGetState_ = XInputGetStateStub;
+
Now all we have left is the last line:
+#define XInputGetState XInputGetState_
+
+This is another simple macro, which will replace "XInputGetState" with "XInputGetState_" any time it appears in the code. We do this because XInputGetState has already been declared as a function in xinput.h, and we cannot reuse that exact name for our own function pointer without a conflict. By using the preprocessor, however, we can write our code as though we were calling XInputGetState directly, while actually going through a function pointer which will either point to 'XInputGetStateStub' (in the case that we are unable to dynamically load the XInput library) or to the real 'XInputGetState' function provided by Microsoft (in the case that we can).
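To round out the picture, here is a minimal sketch of how that pointer actually gets re-pointed at the real function at runtime. The helper name Win32LoadXInput and the exact set of DLL names are illustrative assumptions following the general pattern used on stream, not the verbatim code:

#include <windows.h>
#include <xinput.h>

#define X_INPUT_GET_STATE(name) DWORD WINAPI name(DWORD dwUserIndex, XINPUT_STATE *pState)
typedef X_INPUT_GET_STATE(x_input_get_state);
X_INPUT_GET_STATE(XInputGetStateStub)
{
    return(0); // a later episode changes this to ERROR_DEVICE_NOT_CONNECTED
}
static x_input_get_state *XInputGetState_ = XInputGetStateStub;
#define XInputGetState XInputGetState_

// Try to pull XInputGetState out of whichever XInput DLL is present;
// if nothing loads, the pointer simply keeps pointing at the stub.
static void
Win32LoadXInput(void)
{
    HMODULE XInputLibrary = LoadLibraryA("xinput1_4.dll");
    if(!XInputLibrary) { XInputLibrary = LoadLibraryA("xinput1_3.dll"); }
    if(XInputLibrary)
    {
        XInputGetState_ = (x_input_get_state *)GetProcAddress(XInputLibrary, "XInputGetState");
        if(!XInputGetState_) { XInputGetState_ = XInputGetStateStub; }
    }
}

After Win32LoadXInput() runs, anything written as a call to XInputGetState(...) goes through the pointer, landing either in the stub or in the real DLL.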
+Hope that's helped clear some things up for you. Happy Heroing!
+This episode starts with some cleanup and fixes to the input handling code from yesterday.
+XInput functions return a 0 return value to indicate success, so our stubs probably shouldn't return 0.
+We were loading xinput1_4.dll only, so we need to try each version in turn.
+Casey starts with a high level overview of sound programming for games. The key ideas here are that we are allocating a circular buffer for sound, and the system will play it continually on a loop. If you haven't worked with circular buffers (or ring buffers) before, much of this code will be confusing. It's worth taking some time to familiarize yourself with them.
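If the idea is new, here is a tiny sketch of the wrap-around arithmetic in isolation. This is not the DirectSound code itself, and the buffer size is an arbitrary illustrative number:

#include <stdint.h>

#define RING_BUFFER_SIZE 48000            // arbitrary size, in samples

static int16_t RingBuffer[RING_BUFFER_SIZE];
static uint32_t WriteIndex;               // grows forever; we wrap it on every access

// Write one sample, wrapping back to the start when we run off the end.
static void WriteSample(int16_t Sample)
{
    RingBuffer[WriteIndex % RING_BUFFER_SIZE] = Sample;
    ++WriteIndex;
}

The playback side chases the write side around the same buffer in exactly the same wrapping fashion.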
+Resources:
+The basic process for initializing DirectSound is as follows (a sketch in code follows the list):
1. LoadLibrary("dsound.dll")
2. DirectSoundCreate()
3. IDirectSound8::SetCooperativeLevel()
4. IDirectSound8::CreateSoundBuffer()
5. IDirectSoundBuffer8::Play()
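Here is a minimal sketch of those steps in code. The structure follows the list above, but error handling is reduced to early returns, the buffer format values are illustrative choices (16-bit stereo PCM), and the primary-buffer/SetFormat work done on stream is omitted for brevity:

#include <windows.h>
#include <dsound.h>
#include <stdint.h>

// Signature of DirectSoundCreate, so we can GetProcAddress it the same way we did with XInput.
#define DIRECT_SOUND_CREATE(name) HRESULT WINAPI name(LPCGUID pcGuidDevice, LPDIRECTSOUND *ppDS, LPUNKNOWN pUnkOuter)
typedef DIRECT_SOUND_CREATE(direct_sound_create);

static LPDIRECTSOUNDBUFFER
Win32InitDSound(HWND Window, int32_t SamplesPerSecond, int32_t BufferSize)
{
    // 1. Load the library at runtime.
    HMODULE DSoundLibrary = LoadLibraryA("dsound.dll");
    if(!DSoundLibrary) { return(0); }

    direct_sound_create *DirectSoundCreate_ =
        (direct_sound_create *)GetProcAddress(DSoundLibrary, "DirectSoundCreate");

    // 2. Create the DirectSound object.
    LPDIRECTSOUND DirectSound;
    if(!DirectSoundCreate_ || FAILED(DirectSoundCreate_(0, &DirectSound, 0))) { return(0); }

    // 3. Set the cooperative level so our window can set the sound format.
    if(FAILED(DirectSound->SetCooperativeLevel(Window, DSSCL_PRIORITY))) { return(0); }

    // Describe the format: 16-bit stereo PCM.
    WAVEFORMATEX WaveFormat = {};
    WaveFormat.wFormatTag = WAVE_FORMAT_PCM;
    WaveFormat.nChannels = 2;
    WaveFormat.nSamplesPerSec = SamplesPerSecond;
    WaveFormat.wBitsPerSample = 16;
    WaveFormat.nBlockAlign = (WaveFormat.nChannels*WaveFormat.wBitsPerSample)/8;
    WaveFormat.nAvgBytesPerSec = WaveFormat.nSamplesPerSec*WaveFormat.nBlockAlign;

    // 4. Create the secondary buffer we will actually write samples into.
    DSBUFFERDESC BufferDescription = {};
    BufferDescription.dwSize = sizeof(BufferDescription);
    BufferDescription.dwBufferBytes = BufferSize;
    BufferDescription.lpwfxFormat = &WaveFormat;

    LPDIRECTSOUNDBUFFER SecondaryBuffer;
    if(FAILED(DirectSound->CreateSoundBuffer(&BufferDescription, &SecondaryBuffer, 0))) { return(0); }

    // 5. Start it playing on a loop; filling it with samples happens elsewhere.
    SecondaryBuffer->Play(0, 0, DSBPLAY_LOOPING);
    return(SecondaryBuffer);
}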
In the next episode we will look closely at how to fill this buffer and implement it in the game loop.
+DirectSound is an object-oriented API. What does that mean? Casey starts this episode with a discussion of "method" dispatch through vtables, and why C++ virtual calls are costly.
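As a rough illustration of what dispatching through a vtable means (this is a hand-rolled stand-in, not DirectSound's actual layout), an object-oriented "method call" boils down to a lookup through a table of function pointers:

#include <stdio.h>

// A hypothetical "interface": a table of function pointers plus per-object data.
typedef struct sound_device sound_device;
typedef struct sound_device_vtable
{
    int (*SetVolume)(sound_device *Device, int Volume);
    int (*Play)(sound_device *Device);
} sound_device_vtable;

struct sound_device
{
    sound_device_vtable *VTable;  // every object carries a pointer to its table
    int Volume;
};

static int NullSetVolume(sound_device *Device, int Volume) { Device->Volume = Volume; return(0); }
static int NullPlay(sound_device *Device) { printf("playing at volume %d\n", Device->Volume); return(0); }

static sound_device_vtable NullVTable = {NullSetVolume, NullPlay};

int main(void)
{
    sound_device Device = {&NullVTable, 0};

    // A "method call" is: follow the vtable pointer, index the table, call through the pointer.
    Device.VTable->SetVolume(&Device, 7);
    Device.VTable->Play(&Device);
    return(0);
}

Each virtual call therefore costs an extra pointer chase plus an indirect call that the compiler generally cannot inline, which is why they come up when talking about cost.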
+The basic process for initializing DirectSound is as follows:
1. LoadLibrary("dsound.dll")
2. DirectSoundCreate()
3. IDirectSound8::SetCooperativeLevel()
4. IDirectSound8::CreateSoundBuffer()
5. IDirectSoundBuffer8::Play()
Audio is a complicated topic, and we start with a discussion of audio "waveforms" and how PCM audio data is encoded in memory.
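For reference, in the 16-bit stereo format the series uses, PCM data is just interleaved left/right pairs of signed 16-bit samples laid end to end; a sketch of a hypothetical one-second buffer (the 48000 sample rate is an illustrative choice):

#include <stdint.h>

// Interleaved 16-bit stereo PCM: L R L R L R ... back to back in memory.
typedef struct
{
    int16_t Left;
    int16_t Right;
} stereo_sample;

#define SAMPLES_PER_SECOND 48000          // illustrative sample rate
static stereo_sample OneSecondOfAudio[SAMPLES_PER_SECOND];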
+The procedure for writing sound data into a buffer is as follows (a sketch in code follows the list):
1. IDirectSoundBuffer8::GetCurrentPosition()
2. IDirectSoundBuffer8::Lock()
3. Write the samples into the locked region(s)
4. IDirectSoundBuffer8::Unlock()
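A minimal sketch of that lock/fill/unlock round trip, assuming a 16-bit stereo buffer; FillSample() is a hypothetical stand-in for whatever produces the next sample. Lock hands back two regions because the requested byte range may wrap around the end of the circular buffer:

#include <windows.h>
#include <dsound.h>
#include <stdint.h>

static int16_t FillSample(void) { return(0); }   // hypothetical placeholder; a real synth goes here

static void
Win32FillSoundBuffer(LPDIRECTSOUNDBUFFER SecondaryBuffer, DWORD ByteToLock, DWORD BytesToWrite)
{
    VOID *Region1;
    DWORD Region1Size;
    VOID *Region2;
    DWORD Region2Size;

    if(SUCCEEDED(SecondaryBuffer->Lock(ByteToLock, BytesToWrite,
                                       &Region1, &Region1Size,
                                       &Region2, &Region2Size, 0)))
    {
        // First region: from ByteToLock up to (at most) the end of the buffer.
        int16_t *SampleOut = (int16_t *)Region1;
        DWORD Region1SampleCount = Region1Size/(2*sizeof(int16_t));
        for(DWORD SampleIndex = 0; SampleIndex < Region1SampleCount; ++SampleIndex)
        {
            int16_t SampleValue = FillSample();
            *SampleOut++ = SampleValue;   // left
            *SampleOut++ = SampleValue;   // right
        }

        // Second region: the wrapped-around portion at the start of the buffer (may be empty).
        SampleOut = (int16_t *)Region2;
        DWORD Region2SampleCount = Region2Size/(2*sizeof(int16_t));
        for(DWORD SampleIndex = 0; SampleIndex < Region2SampleCount; ++SampleIndex)
        {
            int16_t SampleValue = FillSample();
            *SampleOut++ = SampleValue;
            *SampleOut++ = SampleValue;
        }

        SecondaryBuffer->Unlock(Region1, Region1Size, Region2, Region2Size);
    }
}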
Audio latency is determined not by the size of the buffer, but by how far ahead of the PlayCursor you write. The optimal amount of latency is the amount that will cause this frame's audio to coincide with the display of this frame's image. On most platforms, it is very difficult to ascertain the proper amount of latency. It's an unsolved problem, and games that need precise AV sync (like Guitar Hero) go to some lengths to achieve it.
+Don't let this happen to you, kids. You need good audio hardware to debug audio code.
+Because square waves are already pretty harsh, they limit our ability to diagnose some audio bugs. A sine wave is a "purer" tone, and will enhance our ear's ability to pick up on weirdness. The sin function, however, is defined to return a value between -1 and 1, so we need to talk about how to represent fractional numbers on a computer.
Fixed point is just integer math. We define some number of bits at the low end of our integer to represent the fractional part of the number, and the remaining bits represent the whole part. Addition and subtraction work unchanged; multiplication and division need a corrective shift to keep the binary point in place, and we need to be aware of the rounding characteristics of fixed point when doing any numeric computation.
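A tiny sketch of a 16.16 fixed-point type; the 16.16 split is an illustrative choice rather than anything used on stream:

#include <stdint.h>
#include <stdio.h>

// 16.16 fixed point: the low 16 bits are the fractional part, the high 16 bits the whole part.
typedef int32_t fixed16_16;

#define FIXED_ONE (1 << 16)

static fixed16_16 FixedFromFloat(float Value) { return (fixed16_16)(Value*FIXED_ONE); }
static float      FloatFromFixed(fixed16_16 Value) { return (float)Value/FIXED_ONE; }

// Addition and subtraction are just the integer operations.
// Multiplication needs a corrective shift so the binary point stays in place.
static fixed16_16 FixedMul(fixed16_16 A, fixed16_16 B)
{
    return (fixed16_16)(((int64_t)A*B) >> 16);
}

int main(void)
{
    fixed16_16 A = FixedFromFloat(1.5f);
    fixed16_16 B = FixedFromFloat(2.25f);
    printf("%f\n", FloatFromFixed(A + B));          // 3.750000
    printf("%f\n", FloatFromFixed(FixedMul(A, B))); // 3.375000
    return(0);
}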
+Fixed-point math was used more widely before computers commonly had floating point hardware. Today every computer, GPU, and phone has very strong floating point capabilities, and so it is the de facto way to do numerics on a modern computer.
+Floating-point is a more complicated (although very rigorously defined) way to represent fractional values. It approaches the problem by dividing the available bits into a sign bit, an exponent, and a mantissa, such that the value represented is given by (sign)(mantissa * 2^exponent). This allows us to preserve a consistent number of bits of precision (like "significant figures" from your physics class), given by the size of the mantissa, regardless of the scale of our numbers, given by the exponent. This means that values representable by floating point will be more densely packed near zero, and more sparse near the limits.
+Floating-point values come in a few different precisions: float (single-precision, 32-bit), double (double-precision, 64-bit), and long double (extended precision, whose size varies by compiler and platform). We will rely on single-precision floats almost exclusively, because they are good enough, and often we can operate on them twice as quickly as doubles.
For the test code, we use the C standard sinf function. It's defined in math.h, accepts a float "angle", and returns a float in the range [-1.0f, 1.0f]. The angle is a function of:
- A running count of every sample we've written, called RunningSampleIndex.
- When we set the tone frequency, we calculate its period in samples, and call it WavePeriod.
- The "angle" is then given by 2.0f*PI*((float)RunningSampleIndex / (float)WavePeriod).
- The SampleValue is given by sinf(angle) * Volume.
When you change the frequency with the current code, you'll end up with an artifact. To combat this, you need to track an additional value in your synth, basically your progress through the period of the wave, here called tSine. Incrementally accumulate it per sample written:
tSine += 2.0f*Pi32*1.0f/(float)WavePeriod; // tSine = 2*Pi*(how many "WavePeriods" we've played since we started)
+
Then just use it as the angle for the SampleValue calculation:
SampleValue = sinf(tSine) * ToneVolume;
+
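Putting those pieces together, a per-sample synth routine might look like the sketch below. Names follow the notes above; the explicit phase wrap is a precaution against float precision loss rather than something spelled out here:

#include <math.h>
#include <stdint.h>

#define Pi32 3.14159265358979f

static float tSine;                 // accumulated phase, in radians
static uint32_t RunningSampleIndex; // count of samples written so far

// Produce one 16-bit sample of a sine tone and advance the phase.
static int16_t NextSineSample(int WavePeriod, int16_t ToneVolume)
{
    int16_t SampleValue = (int16_t)(sinf(tSine)*ToneVolume);

    tSine += 2.0f*Pi32*1.0f/(float)WavePeriod;        // one sample's worth of phase
    if(tSine > 2.0f*Pi32) { tSine -= 2.0f*Pi32; }     // keep tSine small so float precision holds up
    ++RunningSampleIndex;

    return(SampleValue);
}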
+There is another XInput DLL, xinput9_1_0.dll. Add it to the chain when loading libs.
+Today we look at some techniques to get basic timing information from your running game. Timing, like everything, is more complicated than it first appears.
+A couple of ideas of time: wall-clock time (how much real time has actually passed, measured here with the QueryPerformance calls) and processor time (how many cycles the CPU has spent, measured with RDTSC).
+The Windows platform attempts to provide us with some tools for high precision timing, but as it is a complicated topic, there are some gotchas.
+QueryPerformanceFrequency() returns a LARGE_INTEGER number of counts/sec. It's guaranteed to be stable, so you can get away with just calling it once at startup. QueryPerformanceCounter() returns a LARGE_INTEGER number of counts.
So, dividing counter/freq will give you a number of seconds since some unknown time in the past. More useful is (counter - last_counter)/freq, which gives us the elapsed time since a known point in the past. However, almost anything we want to time should take less than a second, and since this is an integer divide, anything between 0 and 1 seconds will return 0. Not super useful. So we instead multiply the elapsed counts by 1000, which turns the formula into one for elapsed milliseconds:
+elapsedMs = (1000*(counter - last_counter)) / freq
+
To get instantaneous frames per second, we can just divide without changing to milliseconds:
+fps = freq / (counter - last_counter)
+
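A minimal sketch of that measurement wrapped around a frame loop; the Sleep call stands in for the real frame's work:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER PerfCountFrequency;
    QueryPerformanceFrequency(&PerfCountFrequency);  // stable; query once at startup

    LARGE_INTEGER LastCounter;
    QueryPerformanceCounter(&LastCounter);

    for(int Frame = 0; Frame < 10; ++Frame)
    {
        Sleep(16);  // placeholder for the real frame's work

        LARGE_INTEGER EndCounter;
        QueryPerformanceCounter(&EndCounter);

        LONGLONG CounterElapsed = EndCounter.QuadPart - LastCounter.QuadPart;
        LONGLONG MSPerFrame = (1000*CounterElapsed)/PerfCountFrequency.QuadPart;
        LONGLONG FPS = PerfCountFrequency.QuadPart/CounterElapsed;
        printf("%lldms/f, %lldf/s\n", MSPerFrame, FPS);

        LastCounter = EndCounter;
    }
    return(0);
}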
Important ideas:
+Every x86 family processor has a Timestamp Counter (TSC), which increments with every clock cycle since it was reset. RDTSC is a processor instruction that reads the TSC into general purpose registers.
+For processors before Sandy Bridge but after dynamic clocking, RDTSC gave us actual clocks, but it was difficult to correlate to wall time because of the variable frequency. Since Sandy Bridge, they give us "nominal" clocks, which is to say the number of clocks elapsed at the chip's nominal frequency. These should correlate closely to wall clock time, but make tracking the "number of cycles" notion of processor time more difficult.
+RDTSC is usually exposed in a compiler intrinsic. Check the docs for your compiler.
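On MSVC, for example, the intrinsic is __rdtsc from <intrin.h>; a minimal sketch of timing a chunk of work in cycles:

#include <intrin.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t StartCycles = __rdtsc();

    // ... the work you want to measure goes here ...
    volatile int Sink = 0;
    for(int Index = 0; Index < 1000000; ++Index) { Sink += Index; }

    uint64_t EndCycles = __rdtsc();
    printf("elapsed: %llu cycles\n", (unsigned long long)(EndCycles - StartCycles));
    return(0);
}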
+Resources:
+Casey had to cover a couple of new corners of C in order to work with the techniques above.
+Union types are a C feature that let you superimpose a number of different layouts over the same chunk of memory. For example, LARGE_INTEGER, the return type of the QueryPerf calls: we can treat it as an int64 by accessing its QuadPart, or as two int32s via HighPart and LowPart.
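The same idea with a small hand-rolled union (LARGE_INTEGER itself already comes defined by windows.h; this is just to show the mechanism, and the low/high split shown assumes a little-endian machine):

#include <stdint.h>
#include <stdio.h>

// Two views of the same 8 bytes: one 64-bit integer, or a low/high pair of 32-bit integers.
typedef union
{
    int64_t QuadPart;
    struct
    {
        uint32_t LowPart;
        int32_t HighPart;
    };
} large_counter;

int main(void)
{
    large_counter Counter;
    Counter.QuadPart = 0x0000000100000002LL;
    // On a little-endian machine this prints low=2 high=1.
    printf("low=%u high=%d\n", Counter.LowPart, Counter.HighPart);
    return(0);
}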
+An intrinsic is a compiler-specific extension that allows direct invocation of some processor instruction. They generally need to be extensions to the compiler so that the instruction can be emitted directly, without paying the usual cost of a function call.
+Key ideas:
alloca is a compiler feature that allows for dynamic allocation on the stack. Much was learned and discussed, but it should be noted that the function is deprecated and probably shouldn't be used in shipping code.
Continued discussion of how the dll uses memory: https://hero.handmadedev.org/forum/code-discussion/99-day-21-s-statement-about-msvcrt-is-correct
+Masking the write:
+In SIMD, doing operations "4-wide" means that one wide (packed) operation operates on four pixels. So there's no difference between doing an operation on one pixel or two or three or four, except when it comes to reading and writing.
+The way we can make sure we only write the pixels we're actually operating on meaningfully is by masking out the ones we aren't. Instead of doing a conditional check every loop, we want to build a mask that's filled with 1s in the places where we'll keep the pixels, and 0s in the places where we'll throw out the pixels. If we're operating on four pixels at once and we're hanging 2 off the edge, the mask might look like:
+[0x00000000,0x00000000,0xFFFFFFFF,0xFFFFFFFF]
+By doing a bitwise AND with the pixel data we generate, we can mask out the values that are invalid, since the zeroes in the mask will knock out any bits set in our data. Likewise, the 1s will ensure any values we want to keep will remain in place.
+We still need to preserve the destination as it was, and the easiest way to do that is to remember what the destination looked like before, and use those values wherever we knocked out values in our data. So we generate an inverted mask that might look something like:
+[0xFFFFFFFF,0xFFFFFFFF,0x00000000,0x00000000]
+Using the same AND technique, we can grab out the destination values that should remain unchanged. Then we can use a bitwise OR to combine that with the set of valid pixel values we generated using the other mask. Since the places where the two sets of values overlap are set to 0s in one of them, the data will effectively just be copied from one onto the other with no interference.
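A sketch of that masked write with SSE2 intrinsics; building WriteMask itself (deciding which of the four lanes hang off the edge) is assumed to have happened already:

#include <emmintrin.h>  // SSE2

// Blend four freshly computed pixels into the destination, keeping the
// destination wherever WriteMask has 0s and our color wherever it has 1s.
static void WriteMaskedPixels(unsigned int *DestPixel, __m128i Color, __m128i WriteMask)
{
    __m128i Original = _mm_loadu_si128((__m128i *)DestPixel);

    __m128i MaskedColor    = _mm_and_si128(WriteMask, Color);        // keep only the lanes we want to write
    __m128i MaskedOriginal = _mm_andnot_si128(WriteMask, Original);  // keep the destination in the other lanes
    __m128i Blended        = _mm_or_si128(MaskedColor, MaskedOriginal);

    _mm_storeu_si128((__m128i *)DestPixel, Blended);
}

Newer instruction sets offer dedicated blend and masked-store instructions, but the AND/ANDNOT/OR combination works anywhere SSE2 does.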
+Semaphores:
+A semaphore is essentially a number that the operating system keeps track of, that can be incremented and decremented. When you wait for a semaphore, you're essentially telling the OS to let you know when the semaphore number becomes greater than zero. Once it does, then the Wait() call will return and the thread can do something. Calling ReleaseSemaphore(), maybe a little counterintuitively, increments the semaphore, allowing any threads waiting on it to continue working. (Thus, it releases those threads to do work.) It doesn't actually change the state of the semaphore other than making the number go up. The semaphore number goes down when a thread has successfully Wait()ed for the semaphore. In cases like the one demonstrated on stream, this usually means the semaphore number will go up/down really fast and stick close to 0 as most of the time the threads are waiting for the semaphore to increment.
+What this allows you to do is tell several threads at once that some work is ready without having to signal to each one individually. As long as each one is waiting on the same semaphore object, they'll all know when there's more work to be done.
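A bare-bones Win32 sketch of that pattern; the work queue itself is reduced to a counter so the semaphore traffic is easy to see:

#include <windows.h>
#include <stdio.h>

static HANDLE WorkSemaphore;
static volatile LONG WorkItemsQueued;

// Each worker sleeps in WaitForSingleObject until the semaphore count goes above zero.
static DWORD WINAPI WorkerThread(LPVOID Param)
{
    for(;;)
    {
        WaitForSingleObject(WorkSemaphore, INFINITE);  // decrements the count when it returns
        LONG ItemsLeft = InterlockedDecrement(&WorkItemsQueued);
        printf("thread %lu picked up work, %ld items left\n", GetCurrentThreadId(), ItemsLeft);
    }
}

int main(void)
{
    WorkSemaphore = CreateSemaphoreA(0, 0, 64, 0);  // initial count 0, max 64

    for(int Index = 0; Index < 4; ++Index)
    {
        CloseHandle(CreateThread(0, 0, WorkerThread, 0, 0, 0));
    }

    // "Publish" eight pieces of work: bump the queue, then release the semaphore once per item
    // so that waiting threads wake up and drain it.
    for(int Index = 0; Index < 8; ++Index)
    {
        InterlockedIncrement(&WorkItemsQueued);
        ReleaseSemaphore(WorkSemaphore, 1, 0);
    }

    Sleep(1000);  // crude: give the workers time to drain the queue before exiting
    return(0);
}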
+