68 lines
		
	
	
		
			4.8 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
| <html>
 | |
|     <head>
 | |
|         <!-- __CINERA_INCLUDES__ -->
 | |
|     </head>
 | |
|     <body>
 | |
|         <div>
 | |
|             <!-- __CINERA_MENUS__ -->
 | |
|             <!-- __CINERA_PLAYER__ -->
 | |
|         </div>
 | |
|         <!-- __CINERA_SCRIPT__ -->
 | |
| 
 | |
|         <article id="video-notes">
 | |
|             <h1><!-- __CINERA_TITLE__ --></h1>
 | |
|             <p>Today we look at some techniques to get basic timing information from your running game. Timing, like everything, is
 | |
|                 more complicated than it first appears.</p>
 | |
|             <p>A couple of notions of time:</p>
 | |
|             <ul>
 | |
|                 <li>Wall clock time - time as it passes in the real world. Measured in seconds.</li>
 | |
|                 <li>Processor time - how many cycles did something take? This is related to wall clock time by the processor frequency, but for a long time now
 | |
|                     that frequency has varied widely and quickly.</li>
 | |
|             </ul>
 | |
|             <h2 id="wall-clock-time">Wall Clock Time</h2>
 | |
|             <p>The Windows platform attempts to provide us with some tools for high precision timing, but as it is a complicated topic,
 | |
|                 there are some gotchas.</p>
 | |
|             <p><a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms644905.aspx"><code>QueryPerformanceFrequency()</code></a> gives us a
 | |
|                 LARGE_INTEGER number of counts/sec. It's guaranteed to be stable while the system is running, so you can get away with just calling it once at
 | |
|                 startup. <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904.aspx"><code>QueryPerformanceCounter()</code></a> gives us
 | |
|                 a LARGE_INTEGER number of counts. Both write their result through a pointer argument rather than returning it directly.</p>
 | |
|             <p>So, dividing counter/freq will give you a number of seconds since some unknown time in the past. More useful would be
 | |
|                 (counter - last_counter)/freq, which gives us the elapsed time since a known point in the past. However,
 | |
|                 almost anything we want to time should take less than a second, and since this is an integer divide, any interval
 | |
|                 shorter than one second truncates to 0. Not super useful. So, we instead multiply the elapsed counts by 1000 before
 | |
|                 dividing, which gives us elapsed milliseconds.</p>
 | |
|         <pre><code>elapsedMs = (1000*(counter - last_counter)) / freq
 | |
|         </code></pre><p>To get instantaneous frames per second, we can just divide without changing to milliseconds:</p>
 | |
|         <pre><code>fps = freq / (counter - last_counter)
 | |
|         </code></pre><p>Important ideas:</p>
 | |
|         <ul>
 | |
|             <li>To time a frame, only query the timer once per frame and use that reading as both the end of the last frame and the
 | |
|                 start of this one. Otherwise your measurement will leave out the time between last frame's end and this frame's start.</li>
 | |
|         </ul>
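<p>The two formulas above can be sketched as plain integer arithmetic. This is a minimal, portable sketch rather than the code from the episode: <code>elapsed_ms</code> and <code>frames_per_second</code> are hypothetical helper names, and the counter and frequency values are assumed to have been read elsewhere (on Windows, from the <code>QuadPart</code> of the values filled in by the QueryPerformance calls).</p>

```c
#include <stdint.h>

/* Hypothetical helper: whole milliseconds between two counter readings.
   Multiplying by 1000 before the integer divide keeps the sub-second
   part that (counter - last_counter) / freq would truncate to 0. */
int64_t elapsed_ms(int64_t last_counter, int64_t counter, int64_t freq)
{
    return (1000 * (counter - last_counter)) / freq;
}

/* Hypothetical helper: instantaneous frames per second for one frame.
   Returns 0 when no counts have elapsed, to avoid dividing by zero. */
int64_t frames_per_second(int64_t last_counter, int64_t counter, int64_t freq)
{
    int64_t elapsed = counter - last_counter;
    return elapsed > 0 ? freq / elapsed : 0;
}
```

<p>As an illustration with assumed numbers, a frequency of 10,000,000 counts/sec and a frame that took 166,667 counts gives 16 from <code>elapsed_ms</code> and 59 from <code>frames_per_second</code>, whereas the plain integer divide of counts by frequency gives 0.</p>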
 | |
|         <h2 id="processor-time">Processor Time</h2>
 | |
|         <p>Every x86 family processor has a <a href="http://en.wikipedia.org/wiki/Time_Stamp_Counter">Timestamp Counter (TSC)</a>, which
 | |
|             increments with every clock cycle since it was reset. RDTSC is a processor instruction that reads the TSC into general
 | |
|             purpose registers.</p>
 | |
|         <p>For processors before Sandy Bridge but after dynamic clocking, RDTSC gave us actual clocks, but it was difficult to
 | |
|             correlate to wall time because of the variable frequency. Since Sandy Bridge, it gives us "nominal" clocks, which
 | |
|             is to say the number of clocks that would have elapsed at the chip's nominal frequency. These should correlate closely to wall clock
 | |
|             time, but make tracking the "number of cycles" notion of processor time more difficult.</p>
 | |
|         <p>RDTSC is usually exposed as a compiler intrinsic. Check the docs for your compiler.</p>
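<p>As a rough sketch of what using the intrinsic can look like: MSVC declares <code>__rdtsc()</code> in <code>&lt;intrin.h&gt;</code>, and GCC and Clang declare it in <code>&lt;x86intrin.h&gt;</code>. The wrapper names <code>read_tsc</code> and <code>cycles_elapsed</code> and the <code>HAVE_RDTSC</code> macro are hypothetical, and the fallback of returning 0 on non-x86 targets is an assumption made only so the sketch compiles everywhere.</p>

```c
#include <stdint.h>

#if defined(_M_IX86) || defined(_M_X64)
  #include <intrin.h>        /* MSVC on x86/x64 */
  #define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
  #include <x86intrin.h>     /* GCC and Clang on x86/x64 */
  #define HAVE_RDTSC 1
#else
  #define HAVE_RDTSC 0       /* assumption: no TSC intrinsic on this target */
#endif

/* Hypothetical wrapper: read the timestamp counter, or 0 where unavailable. */
uint64_t read_tsc(void)
{
#if HAVE_RDTSC
    return __rdtsc();
#else
    return 0;
#endif
}

/* Hypothetical helper: TSC ticks between two readings. */
uint64_t cycles_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

<p>Reading the counter before and after a piece of work and passing both readings to <code>cycles_elapsed</code> gives the tick count for that work, with the caveat from above that on newer chips these are nominal clocks rather than actual cycles.</p>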
 | |
|         <p>Resources:</p>
 | |
|         <ul>
 | |
|             <li><a href="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html">Intel Architecture Manuals</a></li>
 | |
|         </ul>
 | |
|         <h2 id="other-topics">Other topics</h2>
 | |
|         <p>Casey had to cover a couple of new corners of C in order to work with the techniques above.</p>
 | |
|         <h3 id="union-types">Union types</h3>
 | |
|         <p><a href="http://en.wikipedia.org/wiki/Union_type">Union types</a> are a C feature that let you superimpose a number of different
 | |
|             layouts over the same chunk of memory. For example, LARGE_INTEGER, the type the QueryPerf calls fill in, is a union. We can treat
 | |
|             it as a 64-bit integer by accessing its QuadPart, or as two 32-bit halves via HighPart and LowPart.</p>
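<p>To make the overlapping layout concrete, here is a small portable sketch. The type <code>large_integer</code> is a hypothetical stand-in that mimics the shape of LARGE_INTEGER, not the Windows definition itself, and it assumes a little-endian target (as on x86) so that the low half comes first in memory. <code>combine_parts</code> is likewise a made-up helper.</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER. Both members occupy the
   same 8 bytes. Assumes a little-endian target such as x86. */
typedef union {
    struct {
        uint32_t LowPart;    /* low 32 bits */
        int32_t  HighPart;   /* high 32 bits */
    } u;
    int64_t QuadPart;        /* the same bytes viewed as one 64-bit integer */
} large_integer;

/* Write the two halves, then read the bytes back as a single integer. */
int64_t combine_parts(int32_t high, uint32_t low)
{
    large_integer value;
    value.u.HighPart = high;
    value.u.LowPart = low;
    return value.QuadPart;
}
```

<p>Under the little-endian assumption, writing 1 to the high half and 2 to the low half reads back as 4294967298 (that is, 2<sup>32</sup> + 2) through <code>QuadPart</code>.</p>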
 | |
|         <h3 id="compiler-intrinsics">Compiler Intrinsics</h3>
 | |
|         <p>An <a href="http://en.wikipedia.org/wiki/Intrinsic_function">intrinsic</a> is a compiler-specific extension that allows direct
 | |
|             invocation of some processor instruction. Intrinsics generally need to be built into the compiler so that the instruction can be
 | |
|             emitted inline, avoiding all the expensive niceties compilers have to afford ordinary function calls.</p>
 | |
|         </article>
 | |
|     </body>
 | |
| </html>
 |