docs_new/docs/advanced_features/model_loading.mdx
--model-path selects the checkpoint to serve; --load-format and the weight-loading flags below control how those weights are read into memory. To stream weights from cloud object storage (S3/GCS/Azure), see Loading Models from Object Storage.
SGLang picks a loader from --load-format, falling back to auto-detection from the checkpoint or model path. The default auto loader reads safetensors and falls back to PyTorch .bin.
python -m sglang.launch_server \
--model-path Qwen/Qwen3.6-35B-A3B \
--load-format auto
Some formats are auto-detected and override auto:
- […] mistral.
- A .gguf model path is detected and loaded with gguf.
- A model path on object storage (s3://, gs://, az://) is loaded with runai_streamer.
- […] remote.

Set with --load-format:
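To make the precedence concrete, here is a minimal sketch of how an explicit --load-format and the path-based detection could interact. The function name `resolve_load_format` is hypothetical and this is not SGLang's actual implementation; it assumes that an explicit non-auto value always wins, and it models only the .gguf and object-storage rules described above.

```python
# Hypothetical illustration of the detection rules; not SGLang's code.
OBJECT_STORAGE_SCHEMES = ("s3://", "gs://", "az://")


def resolve_load_format(model_path: str, load_format: str = "auto") -> str:
    """Return the loader name that would be used for model_path."""
    # Assumption: an explicit --load-format value is never overridden.
    if load_format != "auto":
        return load_format
    # A .gguf model path is detected and loaded with gguf.
    if model_path.endswith(".gguf"):
        return "gguf"
    # Object-storage URIs are loaded with runai_streamer.
    if model_path.startswith(OBJECT_STORAGE_SCHEMES):
        return "runai_streamer"
    # Otherwise the default auto loader handles the checkpoint.
    return "auto"


if __name__ == "__main__":
    for path in ("Qwen/Qwen3.6-35B-A3B", "models/model.gguf", "s3://bucket/model"):
        print(path, "->", resolve_load_format(path))
```

Passing the format explicitly skips detection entirely in this sketch, which matches the description of detection as something that only overrides auto.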
--model-loader-extra-config takes a JSON string passed to the loader selected by --load-format.
python -m sglang.launch_server \
--model-path Qwen/Qwen3.6-35B-A3B \
--model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 16}'
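Hand-writing the JSON string is easy to get wrong (Python's `True` versus JSON's `true`, shell quoting). As a sketch, the same command can be assembled from a Python dict with the standard library; the helper name `build_launch_command` is hypothetical, and the config keys are simply the ones from the example above.

```python
import json
import shlex


def build_launch_command(model_path: str, extra_config: dict) -> list[str]:
    """Build the launch_server argv with the loader config serialized as JSON."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # json.dumps emits valid JSON literals (true/false, double quotes).
        "--model-loader-extra-config", json.dumps(extra_config),
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        "Qwen/Qwen3.6-35B-A3B",
        {"enable_multithread_load": True, "num_threads": 16},
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list also means the command can be passed directly to `subprocess.run` without any shell quoting.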
Top-level arguments that tune how safetensors weights are read, independent of --load-format.