37 lines
		
	
	
		
			2.3 KiB
		
	
	
	
		
			HTML
		
	
	
	
			
		
		
	
	
<html>
    <head>
        <!-- __CINERA_INCLUDES__ -->
    </head>
    <body>
        <div>
            <!-- __CINERA_MENUS__ -->
            <!-- __CINERA_PLAYER__ -->
        </div>
        <!-- __CINERA_SCRIPT__ -->

        <article id="video-notes">
            <h1><!-- __CINERA_TITLE__ --></h1>
            <p>Masking the write:</p>
            <p>In SIMD, working "4-wide" means that a single wide (packed) operation processes four pixels at once. It makes no
                difference to the operation whether one, two, three or all four of those pixels are actually needed; the only
                place it matters is when reading from and writing to memory.</p>
            <p>To make sure we only write the pixels we are meaningfully operating on, we mask out the ones we aren't. Instead of
                doing a conditional check on every pass through the loop, we build a mask filled with 1s in the places where
                we'll keep the pixels and 0s in the places where we'll throw them out. If we're operating on four pixels at
                once and two of them hang off the edge, the mask might look like:</p>
            <p>[0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF]</p>
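            <p>As a rough illustration of how such a mask could be built, here is a minimal sketch in plain C that models the four
                lanes as an array. The helper name build_write_mask and the parameters x, min_x and max_x are assumptions made
                for this example, not names from the video; real SIMD code would produce the same result with a packed compare
                instead of a per-lane loop.</p>

```c
#include <stdint.h>

/* Hypothetical helper: builds a 4-lane write mask for the pixels x .. x+3,
   keeping only the lanes whose x coordinate lies in [min_x, max_x). */
static void build_write_mask(int x, int min_x, int max_x, uint32_t mask[4])
{
    for (int lane = 0; lane < 4; ++lane) {
        int px = x + lane;
        int keep = (px >= min_x) && (px < max_x); /* 1 if in bounds, else 0 */

        /* Unsigned 0 - 1 wraps to 0xFFFFFFFF; 0 - 0 stays 0x00000000. */
        mask[lane] = (uint32_t)0 - (uint32_t)keep;
    }
}
```

            <p>With x = -2, min_x = 0 and max_x = 100, the first two lanes fall off the left edge, so this sketch produces the
                mask shown above.</p>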
            <p>By doing a bitwise AND of this mask with the pixel data we generate, we knock out the values that are invalid: the
                0s in the mask clear any bits set in our data, while the 1s leave the values we want to keep in place.</p>
            <p>We still need to leave the destination as it was in the places we masked out. The easiest way to do that is to
                remember what the destination looked like beforehand and use those values wherever we knocked out values in our
                data. So we generate an inverted mask that might look something like:</p>
            <p>[0xFFFFFFFF, 0xFFFFFFFF, 0x00000000, 0x00000000]</p>
            <p>Using the same AND technique with the inverted mask, we can pull out the destination values that should remain
                unchanged. We then combine those, using a bitwise OR, with the valid pixel values we got from the first mask.
                Since every place is set to 0s in one of the two sets of values, the OR effectively just copies one onto the
                other with no interference.</p>
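            <p>The whole masked write can be sketched as follows, again in plain C with the four lanes modelled as an array. The
                helper name masked_write and its parameters are assumptions made for this example. With SSE2 the same three
                steps would map onto _mm_and_si128, _mm_andnot_si128 and _mm_or_si128, each handling all four lanes in one
                instruction.</p>

```c
#include <stdint.h>

/* Hypothetical helper: writes four new pixels over dest, but only in the
   lanes where mask is 0xFFFFFFFF. Lanes where mask is 0 keep their old value. */
static void masked_write(uint32_t dest[4], const uint32_t src[4],
                         const uint32_t mask[4])
{
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t inverted = ~mask[lane];            /* 1s where dest is preserved */
        uint32_t kept_new = src[lane] & mask[lane]; /* valid new pixels, 0 elsewhere */
        uint32_t kept_old = dest[lane] & inverted;  /* old pixels, 0 where overwritten */

        /* In each lane one of the two operands is all 0s, so the OR acts as a copy. */
        dest[lane] = kept_new | kept_old;
    }
}
```

            <p>Called with the mask [0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF], this leaves the first two destination
                pixels untouched and replaces the last two with the new values.</p>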
        </article>
    </body>
</html>