examples/README.md
This page contains a curated list of examples, tutorials, and blog posts about WebLLM use cases. Please send a pull request if you find things that belong here.
Note that all examples below run in-browser and use WebGPU as a backend.
- get-started: minimal get-started example with chat completion.
- simple-chat-js: a minimal and complete chat bot app in vanilla JavaScript.
- simple-chat-ts: a minimal and complete chat bot app in TypeScript.
- get-started-web-worker: same as get-started, but using a web worker.
- next-simple-chat: a minimal and complete chat bot app with Next.js.
- subgroups-usage: capability-based routing between baseline and subgroup WebGPU WASM builds.
- multi-round-chat: while the APIs are functional (each request is stateless), we internally optimize so that multi-round chat usage can reuse the KV cache.
- text-completion: demonstrates the API `engine.completions.create()`, which is pure text completion with no conversation, as opposed to `engine.chat.completions.create()`.
- embeddings: demonstrates the API `engine.embeddings.create()`, integration with `EmbeddingsInterface` and `MemoryVectorStore` of Langchain.js, and RAG with Langchain.js using WebLLM for both the LLM and embeddings in a single engine.
- multi-models: demonstrates loading multiple models in a single engine concurrently.
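A minimal sketch of the multi-round chat pattern above. The field names follow WebLLM's OpenAI-like interface; the model id is illustrative, and the engine call itself needs a WebGPU-capable browser, so it is shown as a comment:

```typescript
// Multi-round chat: each turn resends the full message history, so the
// engine can reuse its KV cache for the unchanged prefix (per the note above).
type Message = { role: "system" | "user" | "assistant"; content: string };

const history: Message[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Name one planet." },
];

// In a WebGPU-capable browser (model id is illustrative):
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");
//   const reply = await engine.chat.completions.create({ messages: history });
//   history.push({ role: "assistant", content: reply.choices[0].message.content ?? "" });

// A second round appends to the same array; the earlier turns are untouched,
// which is what lets the KV cache for that prefix be reused.
history.push({ role: "user", content: "How many moons does it have?" });

console.log(history.map((m) => m.role).join(","));
```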
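The `create()` endpoints listed above differ mainly in request shape. A rough sketch of the payloads (field names follow the OpenAI-style convention; values and model ids are illustrative):

```typescript
// Chat completion: conversational, takes a list of role-tagged messages.
const chatRequest = {
  messages: [{ role: "user" as const, content: "Hello!" }],
};

// Text completion: no conversation, just a raw prompt to continue.
const textRequest = {
  prompt: "The capital of France is",
};

// Embeddings: input text(s) to embed into vectors.
const embeddingRequest = {
  input: ["WebLLM runs models in the browser."],
};

// With multiple models loaded in one engine, a request can name which
// loaded model should serve it (model id is illustrative):
const routedRequest = { ...textRequest, model: "Llama-3.1-8B-Instruct-q4f32_1-MLC" };

console.log("messages" in chatRequest, "prompt" in textRequest, "input" in embeddingRequest);
```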
The examples below demonstrate various capabilities via WebLLM's OpenAI-like API.
- seed-to-reproduce: use seeding (`seed`) to ensure reproducible model output.
- function-calling: usage of `tools` and `tool_choice` (with preliminary support).
- logit-processor: while `logit_bias` is supported, we additionally support stateful logit processing where users can specify their own rules. We also expose the low-level API `forwardTokensAndSample()`.
- cache-usage: demonstrates selecting a cache backend via `appConfig.cacheBackend`. Also demonstrates various cache utils such as checking whether a model is cached, deleting a model's weights from cache, deleting a model library wasm from cache, etc. Note: the cross-origin backend currently does not support programmatic tensor-cache deletion.
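A sketch of the `logit_bias` request field mentioned above, plus the cache checks. The token id is illustrative; the cache utility names are assumptions drawn from WebLLM's docs and need a browser context, so they appear as comments:

```typescript
// logit_bias maps token ids (as strings) to additive biases in [-100, 100];
// -100 effectively bans a token, +100 effectively forces it.
const request = {
  messages: [{ role: "user" as const, content: "Pick a number." }],
  logit_bias: { "1234": -100 } as Record<string, number>, // token id is illustrative
};

// Cache utilities (names assumed from WebLLM's docs) run in the browser:
//   import { hasModelInCache, deleteModelAllInfoInCache } from "@mlc-ai/web-llm";
//   if (await hasModelInCache(modelId)) await deleteModelAllInfoInCache(modelId);

// Sanity check: every bias stays within the accepted range.
const inRange = Object.values(request.logit_bias).every((b) => b >= -100 && b <= 100);
console.log(inRange);
```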