.. _quick-start:
To get started, try MLC LLM with the int4-quantized Llama 3 8B model. At least 6GB of free VRAM is recommended to run it.
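
Before downloading the model, it can help to check how much VRAM is free. The sketch below is one way to do that on machines with an NVIDIA GPU, assuming the ``nvidia-smi`` tool is on the ``PATH``; it is not part of MLC LLM. The helper names and the 6 GiB threshold (this page's 6GB figure read as GiB) are illustrative assumptions.

.. code:: python

   import shutil
   import subprocess

   # Assumption: the 6GB recommendation above, expressed in MiB.
   RECOMMENDED_FREE_MIB = 6 * 1024


   def parse_free_mib(smi_output):
       """Parse one free-memory value (MiB) per line, skipping non-numeric lines."""
       values = []
       for line in smi_output.splitlines():
           text = line.strip()
           if text.isdigit():
               values.append(int(text))
       return values


   def query_free_mib():
       """Return free VRAM per NVIDIA GPU in MiB, or None if it cannot be queried."""
       if shutil.which("nvidia-smi") is None:
           return None
       try:
           result = subprocess.run(
               ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
               capture_output=True,
               text=True,
               timeout=10,
           )
       except (OSError, subprocess.TimeoutExpired):
           return None
       if result.returncode != 0:
           return None
       return parse_free_mib(result.stdout)


   free = query_free_mib()
   if free is None:
       print("Could not query nvidia-smi; check VRAM with your platform's tools.")
   else:
       for index, mib in enumerate(free):
           status = "ok" if mib >= RECOMMENDED_FREE_MIB else "below the recommendation"
           print(f"GPU {index}: {mib} MiB free ({status})")

On other GPU vendors or on Apple Silicon this check does not apply, and the function simply returns ``None``.
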
.. tabs::
.. tab:: Python
**Install MLC LLM**. :ref:`MLC LLM <install-mlc-packages>` is available via pip.
It is always recommended to install it in an isolated conda virtual environment.
**Run chat completion in Python.** The following Python script showcases the Python API of MLC LLM:
.. code:: python

   from mlc_llm import MLCEngine

   # Create engine
   model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
   engine = MLCEngine(model)

   # Run chat completion in OpenAI API.
   for response in engine.chat.completions.create(
       messages=[{"role": "user", "content": "What is the meaning of life?"}],
       model=model,
       stream=True,
   ):
       for choice in response.choices:
           print(choice.delta.content, end="", flush=True)
   print("\n")

   engine.terminate()

.. Todo: link the colab notebook when ready:
**Documentation and tutorial.** Python API reference and its tutorials are :ref:`available online <deploy-python-engine>`.
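
The streaming loop in the script above prints each piece of text as it arrives. If you would rather collect the full reply as one string, a small helper along the following lines may be useful. This is a sketch, not part of the MLC LLM API: ``collect_stream`` and ``fake_chunk`` are hypothetical names, and the chunk shape (``choices[i].delta.content``) is assumed from the script above. The stand-in chunks let the helper run without a GPU or a model download.

.. code:: python

   from types import SimpleNamespace


   def collect_stream(chunks):
       """Concatenate the text deltas from every choice of every streamed chunk."""
       parts = []
       for chunk in chunks:
           for choice in chunk.choices:
               # A chunk may carry no text (content is None or empty), so skip it.
               if choice.delta.content:
                   parts.append(choice.delta.content)
       return "".join(parts)


   def fake_chunk(text):
       """Build a stand-in object shaped like one streamed response chunk."""
       delta = SimpleNamespace(content=text)
       return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


   demo = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk(None), fake_chunk("world")]
   print(collect_stream(demo))  # prints "Hello, world"

With a real engine, the iterator returned by ``engine.chat.completions.create(..., stream=True)`` would take the place of ``demo``. The helper assumes a single choice per chunk; with several choices the deltas would be interleaved.
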
.. figure:: https://raw.githubusercontent.com/mlc-ai/web-data/main/images/mlc-llm/tutorials/python-engine-api.jpg
:width: 600
:align: center
MLC LLM Python API
.. tab:: REST Server
**Install MLC LLM**. :ref:`MLC LLM <install-mlc-packages>` is available via pip.
It is always recommended to install it in an isolated conda virtual environment.
**Launch a REST server.** Run the following command from command line to launch a REST server at ``http://127.0.0.1:8000``.
.. code:: shell
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
**Send requests to server.** When the server is ready (showing ``INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)``),
open a new shell and send a request via the following command:
.. code:: shell
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
"messages": [
{"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
]
}' \
http://127.0.0.1:8000/v1/chat/completions
**Documentation and tutorial.** Check out :ref:`deploy-rest-api` for the REST API reference and tutorial.
Our REST API has complete OpenAI API support.
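
The same request can also be sent from Python using only the standard library. The sketch below mirrors the ``curl`` command above. ``build_request`` and ``chat`` are hypothetical helper names, and the response parsing assumes the server replies in the OpenAI chat completion shape (``choices[0].message.content``).

.. code:: python

   import json
   import urllib.request

   MODEL = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
   URL = "http://127.0.0.1:8000/v1/chat/completions"


   def build_request(prompt, model=MODEL, url=URL):
       """Build the POST request that the curl command above sends."""
       payload = {
           "model": model,
           "messages": [{"role": "user", "content": prompt}],
       }
       return urllib.request.Request(
           url,
           data=json.dumps(payload).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   def chat(prompt, timeout=60.0):
       """Send the prompt to a running server and return the reply text.

       Assumes an OpenAI-style response body; raises URLError if no server is up.
       """
       with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
           body = json.load(resp)
       return body["choices"][0]["message"]["content"]


   # Example call, once `mlc_llm serve` is running:
   # print(chat("Hello! Our project is MLC LLM. What is the name of our project?"))

Keeping request construction separate from sending makes the payload easy to inspect or adapt before any network call is made.
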
.. figure:: https://raw.githubusercontent.com/mlc-ai/web-data/main/images/mlc-llm/tutorials/python-serve-request.jpg
:width: 600
:align: center
Send HTTP request to REST server in MLC LLM
.. tab:: Command Line
**Install MLC LLM**. :ref:`MLC LLM <install-mlc-packages>` is available via pip.
It is always recommended to install it in an isolated conda virtual environment.
For Windows/Linux users, make sure to have the latest :ref:`Vulkan driver <vulkan_driver>` installed.
**Run in command line**.
.. code:: bash
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
If you are using Windows, Linux, or Steam Deck and would like to use Vulkan,
we recommend installing the necessary Vulkan loader dependency via conda
to avoid "Vulkan not found" issues.
.. code:: bash
conda install -c conda-forge gcc libvulkan-loader
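
To get a rough idea of whether a Vulkan loader is visible on your system, you can ask Python's standard library to look for it. This is only a sketch: ``find_vulkan_loader`` is a hypothetical helper, and ``ctypes.util.find_library`` searches system locations, so it may return ``None`` for a loader that exists only inside a conda environment. Finding the loader also does not prove that a working driver is installed.

.. code:: python

   import ctypes.util
   import sys


   def find_vulkan_loader():
       """Return the name or path of the Vulkan loader library, or None if not found."""
       # The loader is vulkan-1.dll on Windows and libvulkan.so on Linux.
       name = "vulkan-1" if sys.platform == "win32" else "vulkan"
       return ctypes.util.find_library(name)


   loader = find_vulkan_loader()
   if loader is None:
       print("No Vulkan loader found in the system search paths.")
   else:
       print(f"Vulkan loader found: {loader}")
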
.. tab:: Web Browser
`WebLLM <https://webllm.mlc.ai/#chat-demo>`__. MLC LLM generates performant code for WebGPU and WebAssembly,
so that LLMs can be run locally in a web browser without server resources.
**Download pre-quantized weights**. This step is self-contained in WebLLM.
**Download pre-compiled model library**. WebLLM automatically downloads WebGPU code to execute.
**Check browser compatibility**. The latest Google Chrome provides a WebGPU runtime, and `WebGPU Report <https://webgpureport.org/>`__ is a useful tool for verifying the WebGPU capabilities of your browser.
.. figure:: https://blog.mlc.ai/img/redpajama/web.gif
:width: 300
:align: center
MLC LLM on Web
.. tab:: iOS
**Install MLC Chat iOS**. It is available on the App Store:
.. image:: https://developer.apple.com/assets/elements/badges/download-on-the-app-store.svg
:width: 135
:target: https://apps.apple.com/us/app/mlc-chat/id6448482937
|
**Note**. Larger models may need more VRAM, so try starting with smaller models first.
**Tutorial and source code**. The source code of the iOS app is fully `open source <https://github.com/mlc-ai/mlc-llm/tree/main/ios>`__,
and a :ref:`tutorial <deploy-ios>` is included in the documentation.
.. figure:: https://blog.mlc.ai/img/redpajama/ios.gif
:width: 300
:align: center
MLC Chat on iOS
.. tab:: Android
**Install MLC Chat Android**. A prebuilt APK is available:
.. image:: https://seeklogo.com/images/D/download-android-apk-badge-logo-D074C6882B-seeklogo.com.png
:width: 135
:target: https://github.com/mlc-ai/binary-mlc-llm-libs/releases/download/Android-09262024/mlc-chat.apk
|
**Note**. Larger models may need more VRAM, so try starting with smaller models first.
The demo has been tested on:
- Samsung S23 with Snapdragon 8 Gen 2 chip
- Redmi Note 12 Pro with Snapdragon 685
- Google Pixel phones
**Tutorial and source code**. The source code of the Android app is fully `open source <https://github.com/mlc-ai/mlc-llm/tree/main/android>`__,
and a :ref:`tutorial <deploy-android>` is included in the documentation.
.. figure:: https://blog.mlc.ai/img/android/android-recording.gif
:width: 300
:align: center
MLC LLM on Android

- Check out :ref:`introduction-to-mlc-llm` for the introduction of a complete workflow in MLC LLM.
- Depending on your use case, check out our API documentation and tutorial pages:

  - :ref:`webllm-runtime`
  - :ref:`deploy-rest-api`
  - :ref:`deploy-cli`
  - :ref:`deploy-python-engine`
  - :ref:`deploy-ios`
  - :ref:`deploy-android`
  - :ref:`deploy-ide-integration`

- :ref:`convert-weights-via-MLC`, if you want to run your own models.
- :ref:`compile-model-libraries`, if you want to deploy to web/iOS/Android or control the model optimizations.
- Report any problem or ask any question: open new issues in our `GitHub repo <https://github.com/mlc-ai/mlc-llm/issues>`_.