Denoising Diffusion Implicit Models (DDIM) Sampling

This implements DDIM sampling from the paper Denoising Diffusion Implicit Models.

from typing import Optional, List

import numpy as np
import torch

from labml import monit
from labml_nn.diffusion.stable_diffusion.latent_diffusion import LatentDiffusion
from labml_nn.diffusion.stable_diffusion.sampler import DiffusionSampler

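DDIM sampler

This extends the DiffusionSampler base class.

DDIM samples images by repeatedly removing noise, sampling step by step using

$$x_{\tau_{i-1}} = \sqrt{\alpha_{\tau_{i-1}}}\left(\frac{x_{\tau_i} - \sqrt{1 - \alpha_{\tau_i}}\,\epsilon_\theta(x_{\tau_i})}{\sqrt{\alpha_{\tau_i}}}\right) + \sqrt{1 - \alpha_{\tau_{i-1}} - \sigma_{\tau_i}^2} \cdot \epsilon_\theta(x_{\tau_i}) + \sigma_{\tau_i} \epsilon_{\tau_i}$$

where $\epsilon_{\tau_i}$ is random noise, $\tau$ is a subsequence of $[1, 2, \dots, T]$ of length $S$, and

$$\sigma_{\tau_i} = \eta \sqrt{\frac{1 - \alpha_{\tau_{i-1}}}{1 - \alpha_{\tau_i}}} \sqrt{1 - \frac{\alpha_{\tau_i}}{\alpha_{\tau_{i-1}}}}$$

Note that $\alpha_t$ in the DDIM paper refers to $\bar\alpha_t$ from DDPM.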
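As a worked special case, setting $\eta = 0$ in the two formulas above removes the random term entirely. The shorthand $\hat{x}_0$ for the predicted clean sample is introduced here for readability and is not notation from the original text.

```latex
\begin{aligned}
\eta = 0 \;&\Rightarrow\; \sigma_{\tau_i} = 0 \\
\hat{x}_0 &= \frac{x_{\tau_i} - \sqrt{1 - \alpha_{\tau_i}}\,\epsilon_\theta(x_{\tau_i})}{\sqrt{\alpha_{\tau_i}}} \\
x_{\tau_{i-1}} &= \sqrt{\alpha_{\tau_{i-1}}}\,\hat{x}_0 + \sqrt{1 - \alpha_{\tau_{i-1}}}\,\epsilon_\theta(x_{\tau_i})
\end{aligned}
```

With no $\sigma_{\tau_i} \epsilon_{\tau_i}$ term left, the same starting noise always maps to the same output for a fixed model and condition.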

class DDIMSampler(DiffusionSampler):

    model: LatentDiffusion

model is the model to predict noise $\epsilon_\text{cond}(x_t, c)$
n_steps is the number of DDIM sampling steps, $S$
ddim_discretize specifies how to extract $\tau$ from $[1, 2, \dots, T]$. It can be either uniform or quad.
ddim_eta is $\eta$, used to calculate $\sigma_{\tau_i}$. $\eta = 0$ makes the sampling process deterministic.

    def __init__(self, model: LatentDiffusion, n_steps: int, ddim_discretize: str = "uniform", ddim_eta: float = 0.):

        super().__init__(model)

Number of steps, $T$

        self.n_steps = model.n_steps

Calculate $\tau$ to be uniformly distributed across $[1, 2, \dots, T]$

        if ddim_discretize == 'uniform':
            c = self.n_steps // n_steps
            self.time_steps = np.asarray(list(range(0, self.n_steps, c))) + 1

Calculate $\tau$ to be quadratically distributed across $[1, 2, \dots, T]$

        elif ddim_discretize == 'quad':
            self.time_steps = ((np.linspace(0, np.sqrt(self.n_steps * .8), n_steps)) ** 2).astype(int) + 1
        else:
            raise NotImplementedError(ddim_discretize)

        with torch.no_grad():
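The two discretization branches can be tried in isolation. The following is a minimal sketch, assuming $T = 1000$ and $S = 50$; the helper name `ddim_time_steps` is hypothetical and only mirrors the expressions above.

```python
import numpy as np


def ddim_time_steps(n_total: int, n_steps: int, discretize: str = "uniform") -> np.ndarray:
    """Return the sub-sequence tau of [1, ..., n_total] (hypothetical helper)."""
    if discretize == "uniform":
        # Every c-th step, shifted by one so that the sequence starts at 1
        c = n_total // n_steps
        return np.asarray(list(range(0, n_total, c))) + 1
    if discretize == "quad":
        # Squared linear spacing: dense near t = 1, sparse near t = T
        return (np.linspace(0, np.sqrt(n_total * .8), n_steps) ** 2).astype(int) + 1
    raise NotImplementedError(discretize)


uniform = ddim_time_steps(1000, 50, "uniform")
quad = ddim_time_steps(1000, 50, "quad")
print(uniform[:5], uniform[-1])
print(quad[:5], quad[-1])
```

Note that the uniform branch returns exactly `n_steps` entries only when `n_steps` divides $T$; for example, with $T = 1000$ and `n_steps = 30` the stride is 33 and the range produces 31 entries.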

Get $\bar\alpha_t$

            alpha_bar = self.model.alpha_bar

$\alpha_{\tau_i}$

            self.ddim_alpha = alpha_bar[self.time_steps].clone().to(torch.float32)

$\sqrt{\alpha_{\tau_i}}$

            self.ddim_alpha_sqrt = torch.sqrt(self.ddim_alpha)

$\alpha_{\tau_{i-1}}$

            self.ddim_alpha_prev = torch.cat([alpha_bar[0:1], alpha_bar[self.time_steps[:-1]]])

$$\sigma_{\tau_i} = \eta \sqrt{\frac{1 - \alpha_{\tau_{i-1}}}{1 - \alpha_{\tau_i}}} \sqrt{1 - \frac{\alpha_{\tau_i}}{\alpha_{\tau_{i-1}}}}$$

            self.ddim_sigma = (ddim_eta *
                               ((1 - self.ddim_alpha_prev) / (1 - self.ddim_alpha) *
                                (1 - self.ddim_alpha / self.ddim_alpha_prev)) ** .5)

$\sqrt{1 - \alpha_{\tau_i}}$

            self.ddim_sqrt_one_minus_alpha = (1. - self.ddim_alpha) ** .5
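To see what these tensors look like, the same expressions can be evaluated on a toy schedule. This is a sketch under stated assumptions: the linear `beta` schedule below is a hypothetical stand-in for `model.alpha_bar`, not the values the real model uses, and $\eta = 1$ is chosen only so that $\sigma_{\tau_i}$ is non-zero.

```python
import numpy as np
import torch

# Hypothetical toy noise schedule standing in for model.alpha_bar (an assumption)
T = 1000
beta = torch.linspace(1e-4, 2e-2, T, dtype=torch.float64)
alpha_bar = torch.cumprod(1. - beta, dim=0)

# Uniform tau with S = 50, as in the 'uniform' branch
time_steps = np.asarray(list(range(0, T, T // 50))) + 1
ddim_eta = 1.0

# alpha_{tau_i} and alpha_{tau_{i-1}}; the first "previous" value is alpha_bar[0]
ddim_alpha = alpha_bar[time_steps].clone().to(torch.float32)
ddim_alpha_prev = torch.cat([alpha_bar[0:1], alpha_bar[time_steps[:-1]]]).to(torch.float32)

# sigma_{tau_i} from the formula above
ddim_sigma = ddim_eta * ((1 - ddim_alpha_prev) / (1 - ddim_alpha) *
                         (1 - ddim_alpha / ddim_alpha_prev)) ** .5

print(ddim_alpha[:3], ddim_alpha_prev[:3], ddim_sigma[:3])
```

Because $\alpha_{\tau_{i-1}} \le 1$, the quantity $1 - \alpha_{\tau_{i-1}} - \sigma_{\tau_i}^2$ stays non-negative for $\eta \le 1$, so the square root in the direction term of the update is well defined.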

Sampling loop

shape is the shape of the generated images in the form [batch_size, channels, height, width]
cond is the conditional embeddings $c$
temperature is the noise temperature (random noise gets multiplied by this)
x_last is $x_{\tau_S}$. If not provided, random noise will be used.
uncond_scale is the unconditional guidance scale $s$. This is used for $\epsilon_\theta(x_t, c) = s\,\epsilon_\text{cond}(x_t, c) + (1 - s)\,\epsilon_\text{cond}(x_t, c_u)$
uncond_cond is the conditional embedding for the empty prompt $c_u$
skip_steps is the number of time steps to skip, $i'$. We start sampling from $S - i'$, and x_last is then $x_{\tau_{S - i'}}$.

    @torch.no_grad()
    def sample(self,
               shape: List[int],
               cond: torch.Tensor,
               repeat_noise: bool = False,
               temperature: float = 1.,
               x_last: Optional[torch.Tensor] = None,
               uncond_scale: float = 1.,
               uncond_cond: Optional[torch.Tensor] = None,
               skip_steps: int = 0,
               ):

Get device and batch size

        device = self.model.device
        bs = shape[0]

Get $x_{\tau_S}$

        x = x_last if x_last is not None else torch.randn(shape, device=device)

Time steps to sample at $\tau_{S - i'}, \tau_{S - i' - 1}, \dots, \tau_1$

        time_steps = np.flip(self.time_steps)[skip_steps:]

        for i, step in monit.enum('Sample', time_steps):

Index $i$ in the list $[\tau_1, \tau_2, \dots, \tau_S]$

            index = len(time_steps) - i - 1

Time step $\tau_i$

            ts = x.new_full((bs,), step, dtype=torch.long)

Sample $x_{\tau_{i-1}}$

            x, pred_x0, e_t = self.p_sample(x, cond, ts, step, index=index,
                                            repeat_noise=repeat_noise,
                                            temperature=temperature,
                                            uncond_scale=uncond_scale,
                                            uncond_cond=uncond_cond)

Return $x_0$

        return x
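The bookkeeping between the reversed loop counter and `index` is easy to get wrong, so here is a small sketch of it on its own. The five-element `time_steps_all` is a hypothetical $\tau$ with $S = 5$, and plain `enumerate` stands in for `monit.enum`, which only adds progress reporting.

```python
import numpy as np

time_steps_all = np.asarray([1, 201, 401, 601, 801])  # hypothetical tau, S = 5
skip_steps = 2                                         # i' = 2

# Reverse tau and drop the first i' (noisiest) steps
time_steps = np.flip(time_steps_all)[skip_steps:]

visited = []
for i, step in enumerate(time_steps):
    # Position of this step in the original ascending list
    index = len(time_steps) - i - 1
    visited.append((index, int(step)))

print(visited)  # [(2, 401), (1, 201), (0, 1)]
```

Each `index` points back at the matching entry of the ascending list, which is what lets `p_sample` look up `ddim_alpha[index]` and the other precomputed tensors.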

Sample $x_{\tau_{i-1}}$

x is $x_{\tau_i}$ of shape [batch_size, channels, height, width]
c is the conditional embeddings $c$ of shape [batch_size, emb_size]
t is $\tau_i$ of shape [batch_size]
step is the step $\tau_i$ as an integer
index is the index $i$ in the list $[\tau_1, \tau_2, \dots, \tau_S]$
repeat_noise specifies whether the noise should be the same for all samples in the batch
temperature is the noise temperature (random noise gets multiplied by this)
uncond_scale is the unconditional guidance scale $s$. This is used for $\epsilon_\theta(x_t, c) = s\,\epsilon_\text{cond}(x_t, c) + (1 - s)\,\epsilon_\text{cond}(x_t, c_u)$
uncond_cond is the conditional embedding for the empty prompt $c_u$

    @torch.no_grad()
    def p_sample(self, x: torch.Tensor, c: torch.Tensor, t: torch.Tensor, step: int, index: int, *,
                 repeat_noise: bool = False,
                 temperature: float = 1.,
                 uncond_scale: float = 1.,
                 uncond_cond: Optional[torch.Tensor] = None):

Get $\epsilon_\theta(x_{\tau_i})$

        e_t = self.get_eps(x, t, c,
                           uncond_scale=uncond_scale,
                           uncond_cond=uncond_cond)

Calculate $x_{\tau_{i-1}}$ and the predicted $x_0$

        x_prev, pred_x0 = self.get_x_prev_and_pred_x0(e_t, index, x,
                                                      temperature=temperature,
                                                      repeat_noise=repeat_noise)

        return x_prev, pred_x0, e_t
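`get_eps` is inherited from `DiffusionSampler` and is not shown in this section. As a rough illustration of what the guidance scale does, the sketch below assumes the common classifier-free guidance combination $\epsilon_u + s(\epsilon_c - \epsilon_u)$; the function name `guided_eps` is hypothetical and the base class may differ in detail.

```python
import torch


def guided_eps(e_cond: torch.Tensor, e_uncond: torch.Tensor, uncond_scale: float) -> torch.Tensor:
    """Blend conditional and unconditional noise predictions (hypothetical helper)."""
    # Equivalent to s * e_cond + (1 - s) * e_uncond
    return e_uncond + uncond_scale * (e_cond - e_uncond)


e_c = torch.randn(2, 4, 8, 8)  # stand-in for eps_cond(x_t, c)
e_u = torch.randn(2, 4, 8, 8)  # stand-in for eps_cond(x_t, c_u)

same_as_cond = guided_eps(e_c, e_u, 1.0)   # s = 1 leaves the conditional prediction unchanged
pushed = guided_eps(e_c, e_u, 7.5)         # s > 1 extrapolates away from the unconditional one
```

With $s = 1$ the unconditional prediction drops out, which matches the default `uncond_scale=1.` in the signatures above.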

Sample $x_{\tau_{i-1}}$ given $\epsilon_\theta(x_{\tau_i})$

    def get_x_prev_and_pred_x0(self, e_t: torch.Tensor, index: int, x: torch.Tensor, *,
                               temperature: float,
                               repeat_noise: bool):

$\alpha_{\tau_i}$

        alpha = self.ddim_alpha[index]

$\alpha_{\tau_{i-1}}$

        alpha_prev = self.ddim_alpha_prev[index]

$\sigma_{\tau_i}$

        sigma = self.ddim_sigma[index]

$\sqrt{1 - \alpha_{\tau_i}}$

        sqrt_one_minus_alpha = self.ddim_sqrt_one_minus_alpha[index]

Current prediction for $x_0$, $\frac{x_{\tau_i} - \sqrt{1 - \alpha_{\tau_i}}\,\epsilon_\theta(x_{\tau_i})}{\sqrt{\alpha_{\tau_i}}}$

        pred_x0 = (x - sqrt_one_minus_alpha * e_t) / (alpha ** 0.5)

Direction pointing to $x_t$, $\sqrt{1 - \alpha_{\tau_{i-1}} - \sigma_{\tau_i}^2} \cdot \epsilon_\theta(x_{\tau_i})$

        dir_xt = (1. - alpha_prev - sigma ** 2).sqrt() * e_t

No noise is added when $\eta = 0$

        if sigma == 0.:
            noise = 0.

If the same noise is used for all samples in the batch

        elif repeat_noise:
            noise = torch.randn((1, *x.shape[1:]), device=x.device)

Different noise for each sample

        else:
            noise = torch.randn(x.shape, device=x.device)

Multiply noise by the temperature

        noise = noise * temperature

$$x_{\tau_{i-1}} = \sqrt{\alpha_{\tau_{i-1}}}\left(\frac{x_{\tau_i} - \sqrt{1 - \alpha_{\tau_i}}\,\epsilon_\theta(x_{\tau_i})}{\sqrt{\alpha_{\tau_i}}}\right) + \sqrt{1 - \alpha_{\tau_{i-1}} - \sigma_{\tau_i}^2} \cdot \epsilon_\theta(x_{\tau_i}) + \sigma_{\tau_i} \epsilon_{\tau_i}$$

        x_prev = (alpha_prev ** 0.5) * pred_x0 + dir_xt + sigma * noise

        return x_prev, pred_x0
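The update can be checked numerically without a model. The sketch below is a standalone re-statement of the step, assuming scalar `alpha`, `alpha_prev` and `sigma`; the function `ddim_step` and the toy values are hypothetical. The true noise is fed in place of the model prediction $\epsilon_\theta$, so the predicted $x_0$ should match the clean input.

```python
import torch


def ddim_step(x, e_t, alpha, alpha_prev, sigma, noise=None):
    """One DDIM update with scalar schedule values (hypothetical helper)."""
    # Predicted clean sample
    pred_x0 = (x - (1. - alpha) ** 0.5 * e_t) / alpha ** 0.5
    # Direction pointing back towards the noisy sample
    dir_xt = (1. - alpha_prev - sigma ** 2) ** 0.5 * e_t
    if noise is None:
        noise = torch.randn_like(x) if sigma > 0 else torch.zeros_like(x)
    x_prev = alpha_prev ** 0.5 * pred_x0 + dir_xt + sigma * noise
    return x_prev, pred_x0


torch.manual_seed(0)
x0 = torch.randn(2, 4, 8, 8)
eps = torch.randn_like(x0)
alpha, alpha_prev = 0.5, 0.8  # toy values for alpha_{tau_i} and alpha_{tau_{i-1}}

# Noisy sample at tau_i built from known x0 and eps
x_t = alpha ** 0.5 * x0 + (1. - alpha) ** 0.5 * eps

# Use the true noise as a stand-in for eps_theta (an assumption for this demo)
x_prev, pred_x0 = ddim_step(x_t, eps, alpha, alpha_prev, sigma=0.)
```

With a perfect noise prediction and $\sigma = 0$, the step lands exactly on the less noisy sample $\sqrt{\alpha_{\tau_{i-1}}}\,x_0 + \sqrt{1 - \alpha_{\tau_{i-1}}}\,\epsilon$ built from the same $x_0$ and $\epsilon$.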

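Sample from $q_{\sigma,\tau}(x_{\tau_i} \mid x_0)$

$$q_{\sigma,\tau}(x_{\tau_i} \mid x_0) = \mathcal{N}\left(x_{\tau_i}; \sqrt{\alpha_{\tau_i}}\,x_0, (1 - \alpha_{\tau_i}) \mathbf{I}\right)$$

x0 is $x_0$ of shape [batch_size, channels, height, width]
index is the time step $\tau_i$ index $i$
noise is the noise, $\epsilon$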

    @torch.no_grad()
    def q_sample(self, x0: torch.Tensor, index: int, noise: Optional[torch.Tensor] = None):

Random noise, if noise is not specified

        if noise is None:
            noise = torch.randn_like(x0)

Sample from $q_{\sigma,\tau}(x_{\tau_i} \mid x_0) = \mathcal{N}\left(x_{\tau_i}; \sqrt{\alpha_{\tau_i}}\,x_0, (1 - \alpha_{\tau_i}) \mathbf{I}\right)$

        return self.ddim_alpha_sqrt[index] * x0 + self.ddim_sqrt_one_minus_alpha[index] * noise
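A quick empirical check of this distribution, as a sketch with a hypothetical scalar `alpha` in place of `ddim_alpha[index]`: for a constant $x_0$, the samples should have mean $\sqrt{\alpha_{\tau_i}}\,x_0$ and variance $1 - \alpha_{\tau_i}$.

```python
import torch

torch.manual_seed(0)

alpha = 0.3                       # hypothetical alpha_{tau_i}
x0 = torch.full((100000,), 2.0)   # constant "image" so the statistics are easy to read
noise = torch.randn_like(x0)

# Same expression as the return statement above, with scalar schedule values
x_t = alpha ** 0.5 * x0 + (1. - alpha) ** 0.5 * noise

mean, var = x_t.mean().item(), x_t.var().item()
print(mean, var)  # close to sqrt(0.3) * 2 and 0.7 respectively
```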

Painting loop

x is $x_{S'}$ of shape [batch_size, channels, height, width]
cond is the conditional embeddings $c$
t_start is the sampling step to start from, $S'$
orig is the original image in latent space that we are inpainting. If this is not provided, it will be an image-to-image transformation.
mask is the mask to keep the original image.
orig_noise is fixed noise to be added to the original image.
uncond_scale is the unconditional guidance scale $s$. This is used for $\epsilon_\theta(x_t, c) = s\,\epsilon_\text{cond}(x_t, c) + (1 - s)\,\epsilon_\text{cond}(x_t, c_u)$
uncond_cond is the conditional embedding for the empty prompt $c_u$

    @torch.no_grad()
    def paint(self, x: torch.Tensor, cond: torch.Tensor, t_start: int, *,
              orig: Optional[torch.Tensor] = None,
              mask: Optional[torch.Tensor] = None, orig_noise: Optional[torch.Tensor] = None,
              uncond_scale: float = 1.,
              uncond_cond: Optional[torch.Tensor] = None,
              ):

Get batch size

        bs = x.shape[0]

Time steps to sample at $\tau_{S'}, \tau_{S' - 1}, \dots, \tau_1$

        time_steps = np.flip(self.time_steps[:t_start])

        for i, step in monit.enum('Paint', time_steps):

Index $i$ in the list $[\tau_1, \tau_2, \dots, \tau_S]$

            index = len(time_steps) - i - 1

Time step $\tau_i$

            ts = x.new_full((bs,), step, dtype=torch.long)

Sample $x_{\tau_{i-1}}$

            x, _, _ = self.p_sample(x, cond, ts, step, index=index,
                                    uncond_scale=uncond_scale,
                                    uncond_cond=uncond_cond)

Replace the masked area with the original image

            if orig is not None:

Get $q_{\sigma,\tau}(x_{\tau_i} \mid x_0)$ for the original image in latent space

                orig_t = self.q_sample(orig, index, noise=orig_noise)

Replace the masked area

                x = orig_t * mask + x * (1 - mask)

        return x
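The masking line is a per-element blend. As a small sketch with hypothetical 2×2 tensors: where `mask` is 1 the noised original is kept, and where it is 0 the freshly sampled value is kept.

```python
import torch

x = torch.zeros(1, 1, 2, 2)       # stand-in for the current sample x_{tau_{i-1}}
orig_t = torch.ones(1, 1, 2, 2)   # stand-in for the noised original latent
mask = torch.tensor([[[[1., 0.],
                       [0., 1.]]]])

# Same expression as in the loop above
blended = orig_t * mask + x * (1 - mask)
print(blended)
```

Note that the loop uses `mask` whenever `orig` is given, so in practice the two arguments need to be supplied together.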
