2018-01-08 22:10:24 +00:00
<html>
<head>
<!-- __CINERA_INCLUDES__ -->
</head>
<body>
2018-01-15 22:08:37 +00:00
<div id="cinera">
<!-- __CINERA_MENUS__ -->
<!-- __CINERA_PLAYER__ -->
</div>
<!-- __CINERA_SCRIPT__ -->
<article id="video-notes">
<h1><!-- __CINERA_TITLE__ --></h1>
<p>Today we look at some techniques to get basic timing information from your running game. Timing, like everything, is
more complicated than it first appears.</p>
<p>A couple of ideas of time:</p>
<ul>
<li>Wall clock time - time as it passes in the real world. Measured in seconds.</li>
<li>Processor time - how many cycles have elapsed. This is related to wall clock time by the processor frequency, but for a long time now
that frequency has varied widely and quickly.</li>
</ul>
<h2 id="wall-clock-time">Wall Clock Time</h2>
<p>The Windows platform attempts to provide us with some tools for high-precision timing, but as it is a complicated topic,
there are some gotchas.</p>
<p><a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms644905.aspx"><code>QueryPerformanceFrequency()</code></a> gives us a
LARGE_INTEGER number of counts/sec. The frequency is fixed at system boot, so you can get away with just calling it once at
startup. <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904.aspx"><code>QueryPerformanceCounter()</code></a> gives us
a LARGE_INTEGER number of counts. Both write their result through a pointer argument rather than returning it directly.</p>
<p>So, dividing counter/freq will give you a number of seconds since some unknown time in the past. More useful would be
(counter - last_counter)/freq, which gives us the time elapsed since some known point in the past. However,
almost anything we want to time should take less than a second, and since this is an integer divide, any elapsed time between 0
and 1 seconds will come out as 0. Not super useful. So we instead multiply the elapsed counts by 1000 before dividing,
which gives us elapsed milliseconds:</p>
<pre><code>elapsedMs = (1000*(counter - last_counter)) / freq
</code></pre>
<p>To get instantaneous frames per second, we can just divide without changing to milliseconds:</p>
<pre><code>fps = freq / (counter - last_counter)
</code></pre>
<p>Important ideas:</p>
<ul>
<li>To time a frame, only query the timer once per frame. Otherwise your timer will leave out the time between the end of the last frame
and the start of this one.</li>
</ul>
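<p>As a rough sketch of the two formulas and the once-per-frame rule above, the arithmetic can be written in portable C. The names <code>elapsed_ms</code>, <code>frames_per_second</code>, <code>frame_timer</code> and <code>frame_timer_tick</code> are illustrative assumptions, not part of any API. On Windows the counter values would be the <code>QuadPart</code> of what <code>QueryPerformanceCounter()</code> and <code>QueryPerformanceFrequency()</code> fill in.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed milliseconds between two counter readings.
   Multiplying before dividing keeps sub-second intervals from truncating to 0. */
static int64_t elapsed_ms(int64_t last_counter, int64_t counter, int64_t freq)
{
    assert(freq > 0);
    return (1000 * (counter - last_counter)) / freq;
}

/* Instantaneous frames per second: counts per second over counts per frame. */
static int64_t frames_per_second(int64_t last_counter, int64_t counter, int64_t freq)
{
    assert(counter > last_counter);
    return freq / (counter - last_counter);
}

/* Hypothetical helper that remembers the reading from the previous frame. */
typedef struct frame_timer {
    int64_t freq;         /* counts per second, queried once at startup */
    int64_t last_counter; /* reading taken at the previous frame boundary */
} frame_timer;

/* Call once per frame with a single fresh counter reading. The same reading
   ends this frame and starts the next, so no time is lost between frames. */
static int64_t frame_timer_tick(frame_timer *timer, int64_t counter)
{
    int64_t ms = elapsed_ms(timer->last_counter, counter, timer->freq);
    timer->last_counter = counter;
    return ms;
}
```

<p>For instance, assuming a frequency of 10,000,000 counts per second, a frame that spans 160,000 counts comes out as 16 ms.</p>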
<h2 id="processor-time">Processor Time</h2>
<p>Every x86 family processor has a <a href="http://en.wikipedia.org/wiki/Time_Stamp_Counter">Timestamp Counter (TSC)</a>, which
increments with every clock cycle since it was reset. RDTSC is a processor instruction that reads the TSC into general
purpose registers.</p>
<p>For processors before Sandy Bridge but after dynamic clocking, RDTSC gave us actual clocks, but they were difficult to
correlate to wall time because of the variable frequency. Since Sandy Bridge, it gives us "nominal" clocks, which
is to say the number of clocks elapsed at the chip's nominal frequency. These should correlate closely to wall clock
time, but they make tracking the "number of cycles" notion of processor time more difficult.</p>
<p>RDTSC is usually exposed as a compiler intrinsic. Check the docs for your compiler.</p>
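<p>A minimal sketch of reading the TSC through an intrinsic, assuming an x86 target: MSVC declares <code>__rdtsc()</code> in <code>intrin.h</code>, while GCC and Clang declare it in <code>x86intrin.h</code>. The wrapper names <code>read_tsc</code> and <code>tsc_delta</code> are illustrative, and the non-x86 branch is only a stub so that the sketch compiles elsewhere.</p>

```c
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

/* Read the timestamp counter via the compiler intrinsic. */
static uint64_t read_tsc(void)
{
#if HAVE_RDTSC
    return (uint64_t)__rdtsc();
#else
    return 0; /* stub: this architecture has no RDTSC instruction */
#endif
}

/* Counter ticks between two readings; unsigned subtraction is well defined. */
static uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    return end - start;
}
```

<p>One way to use this is to call <code>read_tsc()</code> at the same point in consecutive frames and pass the two readings to <code>tsc_delta()</code>. On newer processors the result is in nominal clocks, as described above, rather than cycles actually executed.</p>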
<p>Resources:</p>
<ul>
<li><a href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html">Intel Architecture Manuals</a></li>
</ul>
<h2 id="other-topics">Other topics</h2>
<p>Casey had to cover a couple of new corners of C in order to work with the techniques above.</p>
<h3 id="union-types">Union types</h3>
<p><a href="http://en.wikipedia.org/wiki/Union_type">Union types</a> are a C feature that lets you superimpose a number of different
layouts over the same chunk of memory. One example is LARGE_INTEGER, the type that the QueryPerf calls fill in. We can treat
it as an int64 by accessing its QuadPart, or as two 32-bit halves via HighPart and LowPart.</p>
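<p>To make the overlay concrete, here is a small sketch of a union shaped like <code>LARGE_INTEGER</code>. The type <code>counter_value</code> and the helper <code>counter_from_parts</code> are hypothetical stand-ins (the real type comes from the Windows headers), and the field order assumes a little-endian machine such as x86.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER: two views of the same 8 bytes. */
typedef union counter_value {
    struct {
        uint32_t LowPart;  /* low 32 bits (first in memory on little-endian) */
        int32_t  HighPart; /* high 32 bits */
    } u;
    int64_t QuadPart;      /* all 64 bits as one signed integer */
} counter_value;

/* Build a value through the two-halves view; read it back through QuadPart. */
static counter_value counter_from_parts(int32_t high, uint32_t low)
{
    counter_value value;
    assert(sizeof(value) == sizeof(int64_t)); /* both views cover the same bytes */
    value.u.HighPart = high;
    value.u.LowPart = low;
    return value;
}
```

<p>Under that little-endian assumption, writing a <code>HighPart</code> of 1 and a <code>LowPart</code> of 2 and then reading <code>QuadPart</code> yields 0x0000000100000002.</p>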
<h3 id="compiler-intrinsics">Compiler Intrinsics</h3>
<p>An <a href="http://en.wikipedia.org/wiki/Intrinsic_function">intrinsic</a> is a compiler-specific extension that allows direct
invocation of some processor instruction. Intrinsics generally need to be built into the compiler so that they can avoid all
the expensive niceties compilers have to afford ordinary functions, such as call and return overhead.</p>
</article>
</body>
</html>