hbml-html-parser

A fork of rem, an HTML5 parser written in Zig.

This fork exists for use by the HBML compiler.

How to use the parser:

const std = @import("std");
const rem = @import("html_parser");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    // This is the text that will be read by the parser.
    // Since the parser accepts Unicode codepoints, the text must be decoded before it can be used.
    const input = "<!doctype html><html><h1 style=bold>Your text goes here!</h1>";
    const decoded_input = try rem.util.utf8DecodeString(input);

    // Create the DOM in which the parsed Document will be created.
    var dom = rem.dom.Dom{ .allocator = allocator };
    defer dom.deinit();

    // Create the HTML parser.
    var parser = try rem.Parser.init(&dom, decoded_input, allocator, .report, false);
    defer parser.deinit();

    // This causes the parser to read the input and produce a Document.
    try parser.run();

    // `errors` returns the list of parse errors that were encountered while parsing.
    // Since we know that our input was well-formed HTML, we expect there to be 0 parse errors.
    const errors = parser.errors();
    std.debug.assert(errors.len == 0);

    // We can now print the resulting Document to the console.
    const stdout = std.io.getStdOut().writer();
    const document = parser.getDocument();
    try rem.util.printDocument(stdout, document, &dom, allocator);
}
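
The example above asserts that there are no parse errors, which only holds for well-formed input. For arbitrary input it may be more useful to print the errors. The following is a minimal sketch, not part of the library: `reportParseErrors` is a hypothetical helper, and it assumes that `rem.Parser` is the type returned by `rem.Parser.init` and that `parser.errors()` returns a slice, as the `errors.len` check above suggests.

```zig
const std = @import("std");
const rem = @import("html_parser");

/// Hypothetical helper (not provided by the library): prints every parse
/// error to stderr and returns how many there were.
/// Assumption: `rem.Parser` is the parser type and `errors()` returns a slice.
fn reportParseErrors(parser: *rem.Parser) usize {
    const errors = parser.errors();
    for (errors, 0..) |err, i| {
        // `{any}` formats a value of any type, so nothing is assumed
        // about what an individual parse error looks like.
        std.debug.print("parse error {d}: {any}\n", .{ i, err });
    }
    return errors.len;
}
```

Under these assumptions, it could be called right after `try parser.run();`, for example `const error_count = reportParseErrors(&parser);`, in place of the assertion on `errors.len`.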

Test the code

rem uses html5lib-tests as a test suite. Specifically, it tests against the 'tokenizer' and 'tree-construction' tests from that suite.

`zig build test-tokenizer` runs the 'tokenizer' tests. `zig build test-tree-construction` runs the 'tree-construction' tests in two ways: first with scripting disabled, then with scripting enabled. The expected results are as follows:

tokenizer: All tests pass.
tree-construction (scripting disabled): Some tests are skipped because they rely on HTML features that aren't yet implemented in this library (specifically, templates). All other tests pass.
tree-construction (scripting enabled): Similar to testing with scripting off, but in addition, some entire test files are skipped because they would cause panics.

License

GPL-3.0-only

rem is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this library. If not, see https://www.gnu.org/licenses/.

References

HTML Parsing Specification

DOM Specification
