<html>
<head>
<!-- __CINERA_INCLUDES__ -->
</head>
<body>
<div>
<!-- __CINERA_MENUS__ -->
<!-- __CINERA_PLAYER__ -->
</div>
<!-- __CINERA_SCRIPT__ -->
<article id="video-notes">
<h1><!-- __CINERA_TITLE__ --></h1>
<p>Masking the write:</p>
<p>In SIMD, working "4-wide" means that a single wide (packed) operation processes four pixels at once. So it costs
the same whether one, two, three or all four of those pixels are meaningful; the only place the difference matters is
when reading and writing memory.</p>
<p>To make sure we only write the pixels we actually mean to operate on, we mask out the ones we don't. Instead of
doing a conditional check on every loop iteration, we build a mask that's filled with 1s in the lanes where we'll keep
the pixels, and 0s in the lanes where we'll throw them out.
If we're operating on four pixels at once and two of them hang off the edge, the mask might look like:</p>
<p>[0x00000000,0x00000000,0xFFFFFFFF,0xFFFFFFFF]</p>
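<p>As a rough illustration, one way such a mask could be built with SSE2 intrinsics is to compare each lane's index
against the number of pixels still inside the edge. This is a minimal sketch, not necessarily how the mask is built in
the video: <code>build_write_mask</code> and <code>valid_count</code> are hypothetical names, and lane 0 is assumed to be
the leftmost pixel, so the lane order may not match the way the mask is written out above.</p>

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns a 4-lane mask with all bits set (0xFFFFFFFF)
   in every lane whose index is below valid_count, and 0 in the others.
   Assumption: lane 0 holds the leftmost of the four pixels. */
static __m128i build_write_mask(int valid_count)
{
    __m128i lane_index = _mm_setr_epi32(0, 1, 2, 3);  /* lane 0 = 0 ... lane 3 = 3 */
    __m128i count      = _mm_set1_epi32(valid_count); /* same count in all lanes   */

    /* The compare yields all 1s where lane_index < count, all 0s elsewhere,
       so no per-pixel branch is needed. */
    return _mm_cmplt_epi32(lane_index, count);
}
```

<p>With <code>valid_count</code> set to 2, lanes 0 and 1 come out as 0xFFFFFFFF and lanes 2 and 3 as 0x00000000.</p>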
<p>By doing a bitwise AND of the mask with the pixel data we generate, we knock out the values that are invalid: the 0s
in the mask clear any bits set in our data, while the 1s leave the values we want to keep exactly as they were.</p>
<p>We still need to preserve the destination as it was, and the easiest way to do that is to read what the destination
contained beforehand, and use those values wherever we knocked out values in our data. So we generate an inverted mask
that might look something like:</p>
<p>[0xFFFFFFFF,0xFFFFFFFF,0x00000000,0x00000000]</p>
<p>Using the same AND technique, we can pull out the destination values that should remain unchanged. Then we combine
them, using a bitwise OR, with the valid pixel values we kept with the other mask. Since every lane is all 0s in one of
the two sets, the OR effectively just copies each value through from whichever set holds it, with no
interference.</p>
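<p>The whole masked write can be sketched with SSE2 intrinsics as follows. This is an illustrative sketch under stated
assumptions rather than the code from the video: <code>masked_write4</code> and its parameter names are hypothetical, and
pixels are assumed to be packed 32-bit values. <code>_mm_andnot_si128</code> inverts its first argument before the AND,
so the inverted mask never has to be built separately.</p>

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: writes four 32-bit pixels to dest, but only in the
   lanes where write_mask is all 1s. Lanes where write_mask is 0 keep the
   value that was already in dest. */
static void masked_write4(uint32_t *dest, __m128i new_pixels, __m128i write_mask)
{
    /* Read what the destination held before we touch it. */
    __m128i original = _mm_loadu_si128((const __m128i *)dest);

    /* new & mask: keeps the generated pixels in the valid lanes only. */
    __m128i kept_new = _mm_and_si128(new_pixels, write_mask);

    /* (~mask) & original: keeps the old pixels in the masked-out lanes. */
    __m128i kept_old = _mm_andnot_si128(write_mask, original);

    /* Each lane is zero in one of the two operands, so OR merges them
       without either one disturbing the other. */
    _mm_storeu_si128((__m128i *)dest, _mm_or_si128(kept_new, kept_old));
}
```

<p>Note that the load and store still touch all four destination pixels in memory, so those 16 bytes must be valid to
read and write even for the lanes that end up unchanged.</p>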
</article>
</body>
</html>