import React from 'react';
import styled from 'styled-components';

const StyledDownload = styled.a`
display: block;
border: 1px solid #AAA;
border-radius: 1rem;
color: inherit;
text-decoration: none;
padding: 0 1rem;
user-select: none;
`;
export default function BBCode() {
    return (<div>
        <h1>BBCode Parser</h1>
        <p>This project was first created in 2011. Although much of the work discussed here is quite old, I have fond memories of it and have chosen to keep it, as it chronicles an evolving effort to solve a relatively narrow problem, which lends itself to retrospection. It summarises many iterations and rebuilds of a library that converts a string of text (typically a forum post) from "BBCode" formatting into HTML; initially, the library was also required to convert the HTML back to an approximation of the original string.</p>
        <h2>Requirements</h2>
        <p>Each iteration of the BBCode engine had to deal with the same problem: users enter plain text containing formatting tokens, which must be safely converted to an HTML representation. Users must not be able to submit their own HTML, as this would create a significant security risk by allowing malicious scripts, opening the door to a wide range of cross-site scripting (XSS) attacks.</p>
        <p>Although not immediately apparent, the system has to deal with many of the same problems as an XML parser (though fortunately, a reduced set). Formatting instructions may be nested, though some combinations of formatting instructions may be less desirable when nested. Some BBCode tags, such as list item identifiers, are special and do not require a closing token. Other tags may have special formatting requirements for their content, such as requiring a valid URI.</p>
        <h2>Desirable Features</h2>
        <p>Beyond the requirements around handling of tags, I was interested in creating a better editing experience than other implementations at the time.</p>
        <h3>Autocorrect Mistakes</h3>
        <p>Although many basic errors in BBCode are easy to spot, they were typically noticed only after publication. Few BBCode libraries at the time were capable of even identifying a mistake, and fewer still were capable of handling it. Commonly, these mistakes were caused by people being aware of the formatting tokens (e.g. <tt>[b]</tt> for emboldened text), but making an error when typing them (for example, mixing formatting tokens but not closing them in the correct order, or forgetting closing tokens entirely).</p>
        <p>Though error correction is not always perfectly accurate, the common cases covered by the library resulted in positive user feedback. Undesirable output from invalid input was assumed to almost always be better than a raw formatting token displayed within the published article, an assumption that the feedback also supported.</p>
        <h3>Support Complex Use-Cases</h3>
        <p>Although it was important for the project to be able to cover basic formatting such as emboldened or italicised text, it was also desirable to allow for more complicated features to be embedded into the editor. A common request was for a "gallery", essentially a list of specified images, with captioning support. Although technically possible with some other implementations, implementing this via a regular expression and correctly validating the input was challenging with contemporary solutions to the problem.</p>
        <h3>Condense Output</h3>
        <p>Mixing and nesting formatting tokens frequently created a relatively complicated tree of tags and a lengthy output string. Though this didn't <em>actually</em> cause a problem, it apparently bothered me enough at the time to explicitly fix it. The desired result was a smaller, less complex output string that ignored repeated uses of a tag where they would have no effect, and reduced the depth of the resulting node tree for more complex formatting.</p>
        <h2>Initial Version (2011)</h2>
        <p>When I first tackled this problem, the usual solution was regular expressions, generally one separate regular expression for each formatting method the implementation supported. Because the problem is similar to XML parsing, this approach had many of the same problems that developers face when parsing XML via regular expressions, such as outputting invalid HTML. There was also no way to give feedback to the user when their input was incorrect (for example, a <tt>[link]</tt> token with an invalid URI); the message would simply be published with the mistake included.</p>
        <p>The 2011 version of this BBCode parser converted the input string to an intermediate XML representation, then leveraged PHP's native XML parsing to handle common issues in user-supplied BBCode. The version available in this portfolio is slightly outdated, missing some of the tags and fixes added later on, along with some API improvements. The included code sample has also been commented to indicate how the code is intended to work, a lesson learned later.</p>
        <p>The solution allowed for a callback to be run for each formatting method, and always resulted in valid XHTML, ensuring that it was unlikely to affect the rest of the layout. It also allowed for reversing the output to resemble the original input, understanding its own resulting strings and converting them back to one or more formatting tokens.</p>
        <h3>Criticism</h3>
        <p>This piece of code was written many years ago, targeting both PHP4 and PHP5. As such, it misses out on things I've learned in the intervening years, and on features of newer PHP releases. In particular, the API could benefit from some improvements: many things could have been offloaded to separate functions to reduce duplication of code. Some convenience functions could have been added, and some code should have been broken up into separate functions in order to allow different use cases.</p>
        <p>One core issue with the API is that it coerces the BBCode into input for an XML parser. This is useful for the auto-correction feature, but means that quirks in XML must be accounted for in the parser. These quirks are avoided wherever possible, but avoiding them required knowledge of lesser-used XML features. It's also possible that other aspects of XML, not required for this project, cause quirks or other issues with the rendering.</p>
        <p>The code also created different output depending on the version of PHP in use, as it relied on the underlying parser. This tended to mean that a more recent version of PHP resulted in better output, which was desirable, but left deployments on older versions of the software with a less effective solution.</p>
        <p>Despite the room for potential improvements, however, the library was successfully deployed as part of a web forum and several news systems. Messages were parsed quickly and errors were corrected automatically, rather than forcing users to edit them.</p>
        <StyledDownload href="./download/bbcode/old.zip">
            <h2>Download (2011)</h2>
            <p>A 2011 version of this parser is available for download. Not recommended for use, but provided to satisfy curiosity.</p>
        </StyledDownload>
        <h2>Updated Version (2017)</h2>
        <p>The requirement for a BBCode engine resurfaced in 2017 for a bespoke gaming network website. After the initial deployment of the site, there was sufficient time to revisit the problem as part of a feature increment, eventually resulting in a near-total rewrite.</p>
        <p>Instead of using the (admittedly tried-and-tested) BBCode-to-XML regex, a bespoke parser was created which took the input and parsed it, character-by-character, into several intermediate representations. This implementation eventually resembled a compiler, and borrowed several tricks from compilers in order to make the code sufficiently fast. It also avoided many tricks from compilers in order to make the code actually readable.</p>
        <p>The result was a well-understood library which worked the same way on any version of PHP that supported the language features used (7.0+), and could quickly-but-capably parse long strings of formatted text. The implementation retained the flexibility of the earlier solution, but was more maintainable. The code also included unit and component tests, allowing for more confidence in the resulting output when changes were made.</p>
        <h3>Criticism</h3>
        <p>There's a question here regarding how necessary this solution was over its predecessor, but I had concerns about the security of the earlier solution, as it relied on XML and I did not know how well secured the underlying library and implementation were. The former library was slow, but sufficiently fast for most use cases, though the new implementation did help pave the way toward responding fast enough to display more "posts" (messages) per page.</p>
        <p>As this rewrite was more about replacing the parsing of BBCode than rewriting the existing tags, this version still suffers from a large entry point. It could have been beneficial to separate this into a series of functions that each provide a token formatter to allow easier editing.</p>
        <p>Unfortunately, moving this to the client side was not valuable enough, as the site itself did not separate the UI from the API. As such, a server-side version was still useful, if not necessarily desirable.</p>
        <StyledDownload href="./download/bbcode/new.zip">
            <h2>Download</h2>
            <p>A 2017 version of this parser is available for download.</p>
        </StyledDownload>
        <h2>Future(?)</h2>
        <p>Were this problem to be attempted for a third time, and no existing solution deemed suitable, this project would likely not be developed on the server side. In the intervening years, I have come to believe that this problem is not within the responsibility of a server, and should instead be handled by the client.</p>
        <p>Since the original version of this software library, browsers have developed to the point where they could confidently be used to implement this solution without relying on the server. This moves several performance concerns to the client, where they can be better addressed (e.g. by delaying parsing), and allows the server to respond faster with content.</p>
        <p>It is also questionable whether <em>BBCode</em> is still commonly requested by users, or whether <em>Markdown</em> is a better option. Markdown is now much more popular than it was ten years ago, and libraries for parsing it are readily available. It also has the advantage of being much simpler for most use cases, allowing users and developers to think about something else.</p>
</div>);
}