READMEs/README.html-parser.md
Lws is able to parse and render a subset of CSS + HTML5 on very constrained devices such as ESP32, which have at best a total of 200KB heap available after boot. Some technology advances in lws allow much greater capability than was previously possible on those platforms.
The goal is that all system display content is expressed in HTML/CSS by user code, which may also be dynamically generated, with CSS responsive layout simplifying managing the same UI over different display dimensions.
There are restrictions - most generic HTML on the internet is too complex, or wants more assets from more hosts than tiny devices can connect to - but the capabilities are quite far beyond what you would expect from a 200KB heap limit. It is quite possible to mix remote and local http content over h2, including large JPEG and PNG images, and express all UI in HTML/CSS.
`style=` element attributes are not supported.

Heap costs during active decode (while rendering a line that includes the image):
| Feature | Decoder Cost in heap (600px width) |
|---|---|
| JPEG-Grayscale | 6.5KB |
| JPEG-YUV 4:4:4 | 16.4KB |
| JPEG-YUV 4:2:2v | 16.4KB |
| JPEG-YUV 4:2:2h | 31KB |
| JPEG-YUV 4:2:0 | 31KB |
| PNG | 36KB |
Connecting to an external TLS source costs around 50KB of heap. So for very constrained targets like ESP32, the only practical approach is a single h2 connection that provides all the assets as streams multiplexed inside one TLS tunnel.
Lws integrates dynamic querying of the CA trust bundle, with both OpenSSL and mbedTLS. It can support all of the typical 130+ Mozilla trusted CAs, using the trust chain information from the server certificate to identify which CA certificate is required, and instantiating just that one to validate the server certificate, if it trusts it. The trust CTX is kept around in heap for a little while, for the case that multiple connections need it.
No heap is needed for trusted certs that are not actively required. This means lws can securely connect over tls to arbitrary servers like a browser would without using up all the memory; without this it's not possible to support arbitrary connections securely within the memory constraints.
Lws supports a logical Display List for graphical primitives common in HTML + CSS, including compressed antialiased fonts, JPEG, PNG and rounded rectangles.
This intermediate representation allows display surface layout without having all the details to hand, and provides flexibility for how to render the logical representation of the layout.
There may not be enough heap to hold a framebuffer for even a midrange display device, eg an RGB buffer for the 600 x 448 display at the top of the page is 800KB. Even if there is, for display devices that hold a framebuffer on the display, eg, SPI TFT, OLED, or Electrophoretic displays, the display data is anyway sent linewise (perhaps in two planes, but still linewise) to the display.
In this case, there is no need for a framebuffer at the device, if the software stack is instead written to stream-parse all the page elements asynchronously, processing and composing whatever has been buffered each time enough has arrived to produce the next line's worth of pixels. Only one or two lines' worth of buffer is then required.
This is the lws approach: rewrite the asset decoders to operate completely statefully, so they can collaborate to provide just the next line's data Just-in-Time.
Lws includes fully stream-parsed decoders, which can run dry for input or output at any state safely, and pick up where they left off when more data or space is next available.
These were rewritten from UPNG and Picojpeg to be wholly stateful. These DLOs are bound to flow-controlled SS so the content can be provided to the composer Just-in-Time. The rewrite requires that the decoder can exit at any byte boundary, because it ran out of input or needs to flush output, and later resume with the same state. This is a complete inversion of the original program flow, where the decoder only returned once it had rendered the whole image into a full-size buffer, with decode state spread around stack or file-scope variables.
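The resulting control flow can be sketched generically (this is an illustrative pattern, not the actual rewritten Picojpeg or UPNG code): all decode state lives in a context struct, so the function can return at any byte boundary and pick up exactly where it left off.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_W 4

typedef enum { DEC_AGAIN, DEC_LINE_READY } dec_ret_t;

typedef struct {
	uint8_t line[LINE_W];	/* only one line's worth of output buffer */
	size_t  pos;		/* resume point within the current line */
} dec_ctx_t;

/* consume bytes from *in until a full line is ready or input runs dry;
 * *in / *len are advanced so the caller can top them up and call again */
dec_ret_t
decode(dec_ctx_t *ctx, const uint8_t **in, size_t *len)
{
	while (*len) {
		ctx->line[ctx->pos++] = *(*in)++;
		(*len)--;
		if (ctx->pos == LINE_W) {
			ctx->pos = 0;		/* line emitted, reset */

			return DEC_LINE_READY;
		}
	}

	return DEC_AGAIN;	/* ran dry mid-line; state is preserved */
}
```

The caller keeps feeding input as it arrives on the event loop; a `DEC_AGAIN` return is safe at any byte and the next call resumes mid-line.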
PNG transparency is supported via its A channel and composed by modulating alpha.
Based on mcufont, these are 4-bit antialiased fonts produced from arbitrary TTFs. They are compressed; a set of a dozen different font sizes from 10px through 32px, including bold variants, costs only 100KB of storage. Users can choose their own fonts and sizes; the encoder is included in lws.
The mcufont decompressor was likewise rewritten to be completely stateful: glyphs present on the current line are statefully decoded to produce only that line's worth of output, then paused until the next line. Only glyphs that appear on the current line have instantiated decoders.
The anti-alias information is composed into the line buffer as alpha.
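The composition step can be illustrated like this (a minimal sketch, not the actual lws compositor): the glyph's 4-bit coverage value acts as an alpha that blends the foreground colour into the existing line-buffer pixel.

```c
#include <stdint.h>

/* blend one glyph pixel into an 8-bit grayscale line buffer, using the
 * 4-bit antialias coverage (0..15) as alpha */
static inline uint8_t
blend_aa(uint8_t dst, uint8_t fg, uint8_t cov4)
{
	unsigned int a = (cov4 * 255u) / 15u;	/* expand coverage to 0..255 */

	return (uint8_t)((fg * a + dst * (255u - a)) / 255u);
}
```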
Secure Streams and lws VFS now work together via file:// URLs: an SS can be directed to a local VFS resource the same way as to an https:// resource. Resources from https:// and file:// can refer to each other cleanly in CSS or HTML.
Dynamic content, such as dynamic HTML, can be registered in a DLO VFS filesystem and referenced via SS either as the toplevel html document or by URLs inside the HTML.
.jpg and .png resources can be used in the HTML and are fetched using their own SS. If they come from the same server over h2, they have very modest extra memory needs, since they share the existing h2 connection and TLS session.
All of the effort to make JPEG and PNG stream-parsed is wasted if either each asset needs an h1 connection with a new TLS session that exhausts the heap, or, even when multiplexed into the same h2 session, the whole JPEG or PNG is dumped too quickly into a device that cannot buffer it.
On constrained devices, the only mix that allows multiple streaming assets, decoded as they arrive, is an h2 server with streaming modulated by h2 tx credit. The demos stream CSS, HTML, JPEG and PNG from libwebsockets.org over h2.
In lws, lws_flow provides the link between maximum buffering targets and the
tx_credit flow control management.
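The idea can be sketched as a simplified model (this is not the `lws_flow` API itself, just the invariant it maintains): credit advertised to the peer never exceeds the local buffering target, so a fast server cannot overrun a slow decoder.

```c
#include <stddef.h>

#define BUF_TARGET 4096	/* maximum bytes we are prepared to buffer */

typedef struct {
	size_t buffered;	/* bytes received but not yet consumed */
	size_t outstanding;	/* credit granted but not yet used by peer */
} flow_t;

/* how much new tx credit we can advertise to the peer right now */
size_t
flow_grant(flow_t *f)
{
	size_t room = BUF_TARGET - f->buffered - f->outstanding;

	f->outstanding += room;

	return room;
}

/* peer sent n bytes against previously granted credit */
void
flow_rx(flow_t *f, size_t n)
{
	f->outstanding -= n;
	f->buffered += n;
}

/* the decoder consumed n buffered bytes, freeing room for more credit */
void
flow_consumed(flow_t *f, size_t n)
{
	f->buffered -= n;
}
```

The invariant `buffered + outstanding <= BUF_TARGET` holds at all times, which is what bounds the heap needed per stream.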
The number of assets that can be handled simultaneously on an HTML page is restricted by the irreducible heap cost of decoding them: about 36KB plus an RGB line buffer for PNG, and either an 8-line (YUV 4:4:4) or 16-line (4:2:2 or 4:2:0) RGB line buffer for JPEG.
However, PNG and JPEG decode occurs lazily, starting at the render line where the object first becomes visible, and all DLO objects are destroyed after the last line where they are visible. The SS responsible for fetching and regulating the needed bufferspace is started at layout-time, and the parser is run up to the point that the header with the image dimensions has been decoded, but not beyond it to where the large decoder allocation is required.
It means only images that appear on the same line have decoders that are instantiated in memory at the same time; images that don't share any horizontal common lines do not exist in heap simultaneously; basically multiple vertically stacked images cost little more than one.
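This lifetime rule can be modelled with a short sketch (a hypothetical simplification, not the lws DLO code): decoders are instantiated at the first line an image is visible and destroyed after its last, so stacked images never coexist in heap.

```c
#define LINES 100

typedef struct {
	int y0, y1;	/* first and last line the image is visible on */
	int active;	/* decoder currently instantiated? */
} img_t;

/* walk the display lines top to bottom, returning the peak number of
 * simultaneously live decoders */
int
render(img_t *imgs, int n)
{
	int max_active = 0;

	for (int y = 0; y < LINES; y++) {
		int active = 0;

		for (int i = 0; i < n; i++) {
			img_t *im = &imgs[i];

			if (!im->active && y >= im->y0 && y <= im->y1)
				im->active = 1;	/* lazy instantiation */
			if (im->active)
				active++;	/* decode this line's pixels */
			if (im->active && y == im->y1)
				im->active = 0;	/* destroyed after last line */
		}
		if (active > max_active)
			max_active = active;
	}

	return max_active;
}
```

Two vertically stacked images peak at one live decoder; only images sharing horizontal lines pay for two.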
The demo shows that even on ESP32, the images are cheap enough to allow a full size background JPEG with a partially-transparent PNG composed over it.
Internally, lws provides either an 8-bit Y (grayscale) or 32-bit RGBA (truecolor) composition pipeline for all display elements, based on whether the display device is monochrome or not. Alpha (opacity) is supported. This is true regardless of the final bit depth of the display device, so even B&W devices can approximate the same output.
A gamma of 2.2 is applied before palettization, followed by Floyd-Steinberg dithering, all with just a line buffer and no framebuffer needed at the device. Assets like JPEGs can be normal RGB ones; the rendering adapts down to the display palette and capabilities dynamically.
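The per-line pipeline can be sketched like this (an illustrative model, not the lws implementation; the 2.2 gamma curve is crudely approximated by squaring): only the current and next lines' error terms are kept, so no framebuffer is needed.

```c
#include <stdint.h>
#include <string.h>

#define W 8	/* toy display width */

/* error rows carry a guard slot each side for the x-1 / x+1 taps */
static int err_cur[W + 2], err_next[W + 2];

/* dither one 8-bit grayscale line to 1bpp (0 or 255); the caller streams
 * lines top to bottom so the diffused error carries down naturally */
void
dither_line(const uint8_t *in, uint8_t *out)
{
	memcpy(err_cur, err_next, sizeof(err_cur));
	memset(err_next, 0, sizeof(err_next));

	for (int x = 0; x < W; x++) {
		/* crude gamma ~2 approximation of the 2.2 curve,
		 * plus the error diffused into this pixel */
		int v = (in[x] * in[x]) / 255 + err_cur[x + 1],
		    q = v < 128 ? 0 : 255, e = v - q;

		out[x] = (uint8_t)q;
		/* Floyd-Steinberg weights: 7/16 right, 3/16 below-left,
		 * 5/16 below, 1/16 below-right */
		err_cur[x + 2]  += (e * 7) / 16;
		err_next[x]     += (e * 3) / 16;
		err_next[x + 1] += (e * 5) / 16;
		err_next[x + 2] += (e * 1) / 16;
	}
}
```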
The lws_display support in lws has been extended to a variety of common EPD
controllers such as UC8171, supporting B&W, B&W plus a third colour (red or
yellow typically) and 4-level Gray. The ILI9341 driver for the displays found
on WROVER KIT and the ESP32S Kaluga KIT has been enhanced to work with the new
display pipeline using native 565.
To maximize scalability, HTML is parsed into an element stack, consisting of a set of nested parent-child elements. As an element goes out of scope and parsing moves on to the next, it is destroyed; parents are kept on the stack only while they still have children in scope, and are destroyed in turn when their last child closes. This keeps strict pressure against large instantaneous heap allocations during HTML parsing, but it has some implications.
This "goldfish memory" / "keyhole parsing" scheme by itself is inadequate when the dimensions of future elements affect the dimensions of the current one, eg, a table where we don't find out until later how many rows it has, and so how high it is. There's also a class of retrospective dimension acquisition, eg, where a JPEG img is in a table, but we don't discover its dimensions until we parse its header much later, long after the whole html parser stack related to it has been destroyed, and possibly many other things have been laid out after it.
```c
int
lws_lhp_ss_browse(struct lws_context *cx, lws_display_render_state_t *rs,
		  const char *url, sul_cb_t render);
```
You basically give it an https:// or file:// URL, a structure for the
render state, and a callback for when the DLOs have been created and lines of
pixels are being emitted. The source fetching, parsing, layout, and finally
rendering proceed asynchronously on the event loop without blocking beyond the
time taken to emit by default 4 lines.
In these examples, the renderer callback passes the lines of pixels to the
lws_display blit op.
See ./include/libwebsockets/lws-html.h for more information.
Also see the ./minimal-examples-lowlevel/api-tests/api-test-lhp-dlo/main.c
example; you can render to 24-bit RGB on stdout by giving it a URL, eg
```
$ ./bin/lws-api-test-lhp-dlo https://libwebsockets.org/lhp-tests/t1.html >/tmp/a.data
```
The raw RGB can be opened in GIMP.