<html>
    <head>
        <!-- __CINERA_INCLUDES__ -->
    </head>
    <body>
        <div>
            <!-- __CINERA_MENUS__ -->
            <!-- __CINERA_PLAYER__ -->
        </div>
        <!-- __CINERA_SCRIPT__ -->

        <article id="video-notes">
            <h1><!-- __CINERA_TITLE__ --></h1>
            <p>Masking the write:</p>
            <p>In SIMD, doing operations &quot;4-wide&quot; means that a single wide (packed) operation works on four pixels at once. The
                arithmetic is the same whether one, two, three, or all four of those pixels are meaningful; the only place the
                difference matters is when reading and writing memory.</p>
            <p>To make sure we only write the pixels we&#39;re actually operating on meaningfully, we mask out the ones we aren&#39;t.
                Instead of doing a conditional check on every loop iteration, we build a mask that&#39;s filled with 1s in the places
                where we&#39;ll keep the pixels, and 0s in the places where we&#39;ll throw out the pixels.
                If we&#39;re operating on four pixels at once and two of them hang off the edge, the mask might look like:</p>
            <p>[0x00000000,0x00000000,0xFFFFFFFF,0xFFFFFFFF]</p>
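            <p>As a rough illustration, a mask like this could be built per group of four pixels as sketched below in plain C. The
                helper name <code>build_write_mask</code> and the parameters <code>x</code> and <code>max_x</code> are assumptions made
                for this example, not names from the video. Index 0 is taken to be the first pixel in memory, so the order in
                which the lanes are listed may differ from the printed mask above.</p>

```c
#include <stdint.h>

/* Hypothetical helper: lane i covers pixel x + i of the current row.
   A lane gets all 1s while x + i is still inside the row (x + i < max_x)
   and all 0s once it hangs off the edge. */
static void build_write_mask(uint32_t mask[4], int x, int max_x)
{
    for (int lane = 0; lane < 4; ++lane) {
        mask[lane] = (x + lane < max_x) ? 0xFFFFFFFFu : 0x00000000u;
    }
}
```

            <p>With <code>x = 6</code> and <code>max_x = 8</code>, the first two lanes are kept and the last two are thrown out. In a
                real SIMD loop the same result would more likely come from a packed comparison than from a per-lane branch.</p>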
            <p>By doing a bitwise AND with the pixel data we generate, we can mask out the values that are invalid, since the zeroes in
                the mask will knock out any bits set in our data. Likewise, the 1s will ensure any values we want to keep will remain in
                place.</p>
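            <p>The effect of the AND can be seen lane by lane in a small scalar sketch. The helper name <code>mask_pixels</code> is
                hypothetical, and the four lanes are modelled as a plain array rather than a SIMD register to keep the example
                portable.</p>

```c
#include <stdint.h>

/* Hypothetical helper: AND each generated pixel with its mask lane.
   A lane of all 1s passes the pixel through unchanged; a lane of all 0s
   clears every bit of it. */
static void mask_pixels(uint32_t out[4], const uint32_t pixels[4],
                        const uint32_t mask[4])
{
    for (int lane = 0; lane < 4; ++lane) {
        out[lane] = pixels[lane] & mask[lane];
    }
}
```

            <p>A packed AND does the same thing to all four lanes in a single instruction, which is why no per-pixel branch is
                needed.</p>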
            <p>We still need to leave the destination unchanged in the places we masked out. The easiest way to do that is to load
                what the destination held before the write, and use those values wherever we knocked out values in our data. So
                we generate an inverted mask that might look something like:</p>
            <p>[0xFFFFFFFF,0xFFFFFFFF,0x00000000,0x00000000]</p>
            <p>Using the same AND technique with the inverted mask, we can pull out the destination values that should remain
                unchanged. Then we combine them, with a bitwise OR, with the valid pixel values we kept using the first mask. Since
                every position is all 0s in one of the two sets, the OR simply copies the value from the other set with no
                interference.</p>
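            <p>Put together, the whole masked write might look like the following sketch using SSE2 intrinsics from
                <code>emmintrin.h</code>. The function name <code>masked_store4</code> and its parameters are assumptions for
                illustration. It also assumes that all four destination pixels sit in addressable memory, since the full 16 bytes
                are still loaded and stored even when some lanes are masked out.</p>

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: write four 32-bit pixels to dest, but only in the
   lanes where write_mask is all 1s. Lanes where it is all 0s keep whatever
   dest held before the call. */
static void masked_store4(uint32_t *dest, __m128i pixels, __m128i write_mask)
{
    /* Remember what the destination looked like before the write. */
    __m128i original = _mm_loadu_si128((const __m128i *)dest);

    /* Keep the new pixels only where the mask is set. */
    __m128i kept_new = _mm_and_si128(write_mask, pixels);

    /* _mm_andnot_si128(a, b) computes (~a) & b, so this applies the
       inverted mask to the original destination values. */
    __m128i kept_old = _mm_andnot_si128(write_mask, original);

    /* Each lane is zero in exactly one operand, so OR merges them cleanly. */
    _mm_storeu_si128((__m128i *)dest, _mm_or_si128(kept_new, kept_old));
}
```

            <p>Using the and-not operation avoids having to build the inverted mask as a separate value. The unaligned load and
                store are chosen here only to keep the sketch free of alignment requirements.</p>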
        </article>
    </body>
</html>