
Evaluate GPT-NeoX with half precision (float16) on test suite


This code evaluates GPT-NeoX, with its parameters in float16, on a suite of tasks.

```python
import argparse

import torch
from torch import nn

from labml_nn.neox.evaluation import run_eval_harness
from labml_nn.neox.model import LayerGenerator
```

```python
def main():
```

Argument parser

```python
    parser = argparse.ArgumentParser()

    parser.add_argument("--flash", action='store_true', help="whether to use Flash Attention")

    opt = parser.parse_args()
```
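The `--flash` flag uses argparse's `store_true` action, so it defaults to `False` and flips to `True` only when the flag is passed. A standalone sketch of that behaviour:

```python
import argparse

# Rebuild just the flag from the script above, in isolation.
parser = argparse.ArgumentParser()
parser.add_argument("--flash", action='store_true', help="whether to use Flash Attention")

# Passing explicit argument lists avoids reading sys.argv.
print(parser.parse_args([]).flash)           # False: flag omitted
print(parser.parse_args(["--flash"]).flash)  # True: flag given
```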

Device

```python
    device = torch.device('cuda:0')
```

Load layers

```python
    layers = list(LayerGenerator(is_clone_layers=True,
                                 filter_layers=None,
                                 dtype=torch.float16,
                                 device=device,
                                 is_flash_attention=opt.flash,
                                 ).load())
```
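`LayerGenerator.load()` yields the transformer layers one at a time, and (judging by the `is_clone_layers` argument) can clone an already-built template layer rather than constructing every block from scratch. A rough pure-Python sketch of that lazy pattern, with purely illustrative names that are not the `labml_nn` API:

```python
def layer_generator(n_layers, clone=True):
    """Yield n_layers layer objects lazily (hypothetical stand-in)."""
    template = {"weights": [0.0] * 4}  # stand-in for a constructed layer
    for i in range(n_layers):
        # Cloning a template skips repeating expensive construction work;
        # each clone still gets its own parameters loaded afterwards.
        layer = dict(template) if clone else {"weights": [0.0] * 4}
        layer["index"] = i
        yield layer

# Materialize the generator, as the script does with list(...).
layers = list(layer_generator(3))
print(len(layers))  # 3
```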

Create `nn.Sequential` model

```python
    model = nn.Sequential(*layers)
```
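Wrapping the layers in `nn.Sequential` works because each layer's output is the next layer's input; the whole model is then a single callable. Conceptually (a pure-Python sketch, no torch):

```python
class Sequential:
    """Minimal stand-in for nn.Sequential: call layers in order."""

    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)  # feed each output into the next layer
        return x

model = Sequential(lambda x: x + 1, lambda x: x * 2)
print(model(3))  # (3 + 1) * 2 = 8
```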

Run evaluation harness

```python
    print(run_eval_harness(model, 'half_precision', ['lambada'], device))
```
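The LAMBADA task scores a model on predicting the final word of a passage, reported as plain accuracy. A toy sketch of that metric, where the stub model is hypothetical and stands in for GPT-NeoX:

```python
def last_word_accuracy(model, passages):
    """Fraction of passages whose final word the model predicts correctly."""
    correct = 0
    for context, target in passages:
        if model(context) == target:
            correct += 1
    return correct / len(passages)

# Stub model that always predicts "mat" (illustrative only).
stub = lambda context: "mat"
data = [("the cat sat on the", "mat"),
        ("he opened the", "door")]
print(last_word_accuracy(stub, data))  # 0.5
```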

```python
if __name__ == '__main__':
    main()
```

labml.ai