README

docs/gen-ai/README.md

This folder contains the design docs for the GenAI Model package.

Basic

Contracts && API

Need further investigation

  • Dynamic loading: load only part of the model onto the GPU when GPU memory is limited. We explore the result w/o dynamic loading in this report.
  • Improve loading speed: I noticed that loading a model from disk into memory is slower in TorchSharp than in Hugging Face. We need to investigate the reason and improve the loading speed.
  • Quantization: quantize the model to reduce its size and improve inference speed.
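
To make the quantization item concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is written in plain Python for brevity rather than TorchSharp, the function names `quantize_int8` and `dequantize_int8` are hypothetical, and the scheme is only an illustrative assumption, not the approach chosen for this package. Each weight is stored as one signed byte plus a shared scale, instead of four bytes as a 32-bit float.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero tensor has no range, so fall back to a scale of 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values and the scale."""
    return [q * scale for q in quantized]


weights = [1.27, -0.64, 0.0, 0.32]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is bounded by half of the scale.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, scale, max_error)
```

The trade-off is accuracy: every weight is rounded to the nearest multiple of the scale, so whether the size and speed gains are worth it has to be measured on the actual model.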