Developing a Web Browser - Part 1 - Text Elements

Web browsers. We use them every day, but what’s involved in the development of such an application? I intend to find out by developing one from scratch over the following months (and quite possibly years). Regular updates will be posted to this blog as milestones are completed. The aim is to support the features found in modern web browsers, including but not limited to:

  • JavaScript

  • CSS3

  • Video

  • HTML5

  • WebGL (Maybe)

The web browser will be implemented in the Java programming language and, since I’m writing way too much JavaScript these days, I’m thinking about embedding Perl support for scripting, adding an element of uniqueness to the application.

HTML Downloading and Parsing

A logical place to start a project like this is the downloading and parsing of HTML, as these are the first steps a browser must perform before anything can be presented to the user. Both tasks are handled by JSoup: it acts as the HTTP client that downloads the page and it also parses the downloaded HTML, and using the same library for both is very convenient. I’ve used JSoup for a few projects in the past and I’ve grown to like its convenient high-level API. Parsing the downloaded HTML produces a DOM (Document Object Model) tree, which for the following example HTML (figure 1) will look something like the tree presented in figure 2.
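To make the idea of a DOM tree concrete, here is a minimal plain-Java sketch of one. It is not JSoup and it is not the markup from figure 1: the Node class, the sample page and the dump method are all hypothetical stand-ins, used only to show the shape of the data a parser hands back (elements nested inside elements) and the depth-first walk a renderer would later perform over it.

```java
import java.util.ArrayList;
import java.util.List;

public class DomSketch {
    /** Hypothetical stand-in for a parsed element: a tag name, its own text, and child elements. */
    static final class Node {
        final String tag;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this; // return the parent so calls can be chained
        }
    }

    /** Depth-first walk that renders the tree as one indented line per node. */
    static String dump(Node node, int depth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            out.append("  "); // two spaces per nesting level
        }
        out.append(node.tag);
        if (!node.text.isEmpty()) {
            out.append(" \"").append(node.text).append('"');
        }
        out.append('\n');
        for (Node child : node.children) {
            out.append(dump(child, depth + 1));
        }
        return out.toString();
    }

    /** Tree for a made-up page: html > body > (h1 "Title", p "Hello" > b "world"). */
    static Node sample() {
        Node p = new Node("p", "Hello").add(new Node("b", "world"));
        Node body = new Node("body", "").add(new Node("h1", "Title")).add(p);
        return new Node("html", "").add(body);
    }

    public static void main(String[] args) {
        System.out.print(dump(sample(), 0));
    }
}
```

Running main prints html at the left margin, body indented beneath it, then h1 and p as siblings, with b nested inside p. A real parser builds this structure from the markup itself; the hand-built version just makes the parent and child relationships visible.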

HTML Rendering - Text Elements

Using the above example, once the DOM has been built it’s time to present it to the user graphically. The browser achieves this by constructing another tree composed of Views, one for each type of element present in the DOM (paragraphs, headings, spans, etc.). Each view is effectively a customised Swing component, composed of various child components and layouts, and holds a reference to its associated DOM element. Most of the text elements were quite easy to represent using Swing’s APIs and classes. The only real difficulty was handling the difference between elements that are supposed to be rendered as a block and those that are inline, and how different combinations of the two interact with each other.
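The block versus inline distinction can be sketched without any Swing code at all. The example below is an illustration under stated assumptions, not the browser’s actual implementation: the FlowSketch class, the BLOCK_TAGS set (a small hand-picked subset of block-level tags) and the toRows method are hypothetical. It takes the tags of sibling elements and groups them into rows, giving every block element a row of its own and letting adjacent inline elements share one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlowSketch {
    // Assumption: a small hand-picked subset of block-level tags.
    // Every tag not listed here is treated as inline.
    static final Set<String> BLOCK_TAGS =
            new HashSet<>(Arrays.asList("p", "h1", "h2", "h3", "div", "blockquote", "pre"));

    static boolean isBlock(String tag) {
        return BLOCK_TAGS.contains(tag);
    }

    /** Groups sibling tags into rows: each block gets its own row, adjacent inlines share one. */
    static List<List<String>> toRows(List<String> siblingTags) {
        List<List<String>> rows = new ArrayList<>();
        List<String> inlineRun = new ArrayList<>();
        for (String tag : siblingTags) {
            if (isBlock(tag)) {
                // A block element ends the current run of inline elements...
                if (!inlineRun.isEmpty()) {
                    rows.add(inlineRun);
                    inlineRun = new ArrayList<>();
                }
                // ...and then occupies a row by itself.
                List<String> blockRow = new ArrayList<>();
                blockRow.add(tag);
                rows.add(blockRow);
            } else {
                inlineRun.add(tag);
            }
        }
        if (!inlineRun.isEmpty()) {
            rows.add(inlineRun); // flush trailing inline elements
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints [[h1], [span, em], [p], [strong]]
        System.out.println(toRows(Arrays.asList("h1", "span", "em", "p", "strong")));
    }
}
```

One way to map this onto Swing would be to stack the rows vertically in a panel using BoxLayout along the Y axis, and to lay out each inline row horizontally with FlowLayout. That mapping is only a suggestion; line wrapping and blocks nested inside inline elements need more care than this sketch gives them.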

Representation of the (visual) elements will be kept as close as possible to the standards set out by the W3C, which can be found here: HTML: The Markup Language (an HTML language reference). Using this in conjunction with testing against modern browsers such as Chrome and Firefox, I should be able to achieve a pretty good result.

Showcase - Text Elements

Click through the following showcase to view the text elements that are currently supported.

What’s Next

Next I’m planning on supporting structural elements (sections) such as header, footer, section, nav, article, etc.: anything that’s supposed to represent a specific portion of a website. I might also implement div and possibly touch some CSS; we’ll see.
