ML.NET 4.0

New Features

  • Add sweepable estimator to NER (#6965)
  • Introducing Tiktoken Tokenizer (#6981)
  • Add text normalizer transformer to AutoML (#6998)
  • Introducing Llama Tokenizer (#7078)
  • Introducing CodeGen Tokenizer (#7139)
  • Support Gpt-4o tokenizer model (#7157)
  • Add GenAI core package (#7177)
  • Use new System.Numerics.Tensors library for DataFrame arithmetic operations (.NET 8) (#7179) - Thanks @asmirnov82!
  • Add Microsoft.ML.GenAI.Phi (#7184)
  • [GenAI] Add LLaMA support (#7220)
  • [GenAI] Support Llama 3.2 1B and 3B model (#7245)
  • [GenAI] Introduce CausalLMPipelineChatClient for MEAI.IChatClient (#7270)
  • Allow setting advanced runtime settings in the MLContext (#7273)
  • Introducing WordPiece and Bert tokenizers (#7275)

Enhancements

  • Add support for Apache.Arrow.Types.TimestampType to DataFrame (#6871) - Thanks @asmirnov82!
  • Add new type to key-value converter (#6973)
  • Update OnnxRuntime to 1.16.3 (#6975)
  • Tokenizer's Interfaces Cleanup (#7001)
  • Match SweepableEstimatorFactory name with ML.NET name (#7007)
  • First round of perf improvements for tiktoken (#7012)
  • Tweak CreateByModelNameAsync (#7015)
  • Avoid LruCache in Tiktoken when cacheSize specified is 0 (#7016)
  • Tweak Tiktoken's BytePairEncode for improved perf (#7017)
  • Optimize regexes used in tiktoken (#7020)
  • Address the feedback on the tokenizer's library (#7024)
  • Add Span support in tokenizer's Model abstraction (#7035)
  • Adding needed Tokenizer's APIs (#7047)
  • Add Tiktoken Synchronous Creation Using Model Name (#7080)
  • Embed Tiktoken data files (#7098)
  • Tokenizer's APIs Polishing (#7108)
  • More tokenizer's APIs cleanup (#7110)
  • Add more required Tokenizer APIs (#7114)
  • Tokenizer's APIs Update (#7128)
  • Allow developers to supply their own function to infer column data types from data while loading CSVs (#7142) - Thanks @sevenzees!
  • Implement DataFrameColumn Apply and DropNulls methods (#7123) - Thanks @asmirnov82!
  • Extend dataframe orderby method to allow defining preferred position for null values (#7118) - Thanks @asmirnov82!
  • Implement ToString() method for DataFrameColumn class (#7103) - Thanks @asmirnov82!
  • Added error handling, removed unwanted null check and enhanced readability (#7147) - Thanks @ravibaghel!
  • Add targeting .NET 8.0 for DataFrame package (#7168) - Thanks @asmirnov82!
  • create unique temporary directories to prevent permission issues (#7173) - Thanks @ErikApption!
  • Tokenizer APIs Update (#7190)
  • Make most Tokenizer abstract methods virtual (#7198)
  • Reduce Tiktoken Creation Memory Allocation (#7202)
  • Refactor Namespace and Sealed Classes in Microsoft.ML.AutoML.SourceGenerator Project (#7223) - Thanks @mhshahmoradi!
  • [GenAI] Add generateEmbedding API to CausalLMPipeline (#7227)
  • [GenAI] Add Mistral 7B Instruction V0.3 (#7231)
  • Move the Tokenizer's data into separate packages. (#7248)
  • Load onnx model from Stream of bytes (#7254)
  • Update tiktoken regexes (#7255)
  • Misc Changes (#7264)
  • Address the feedback regarding Bert tokenizer (#7280)
  • Add Timeout to Regex used in the tokenizers (#7284)
  • Final tokenizer's cleanup (#7291)

Bug Fixes

  • Fix formatting that fails in VS (#7023)
  • Issue #6606 - Add sample variance and standard deviation to NormalizeMeanVariance (#6885) - Thanks @tearlant!
  • Rename NameEntity to NamedEntity (#6917)
  • Fixes NER to correctly expand/shrink the labels (#6928)
  • fix #6949 (#6951)
  • Fix DataFrame NullCount property of StringDataFrameColumn (#7090) - Thanks @asmirnov82!
  • Fix Logical binary operations not supported exception (#7093) - Thanks @asmirnov82!
  • Fix inconsistency in DataFrameColumns Clone API implementation (#7100) - Thanks @asmirnov82!
  • Add Tiktoken's missing model names (#7111)
  • Fix accessing data by column after adding columns to a DataFrame returning incorrect data (#7136) - Thanks @feiyun0112!
  • Fix iterator type so that it matches boundary condition type (#7150)
  • Fix crash in Microsoft.ML.Recommender with validation set (#7196)
  • Fix #7203 (#7207)
  • Fix decoding special tokens in SentencePiece tokenizer (#7233)
  • Fix DataFrame incorrectly parsing CSV when renameDuplicatedColumns is true (#7242) - Thanks @asmirnov82!
  • Fixes #7271 AOT for ML.Tokenizers (#7272) - Thanks @euju-ms!

Build / Test updates

  • [main] Update dependencies from dotnet/arcade (#6703)
  • Migrate to the 'locker' GitHub action for locking closed/stale issues/PRs (#6896)
  • Reorganize dataframe files (#6872) - Thanks @asmirnov82!
  • Updated ml.net versioning (#6907)
  • Don't include the SDK in our helix payload (#6918)
  • Make double assertions compare with tolerance instead of precision (#6923)
  • Fix assert by only accessing idx (#6924)
  • Only use semi-colons for NoWarn - fixes build break (#6935)
  • Packaging cleanup (#6939)
  • Add Backport github workflow (#6944)
  • [main] Update dependencies from dotnet/arcade (#6957)
  • Update .NET Runtimes to latest version (#6964)
  • Testing light gbm bad allocation (#6968)
  • [main] Update dependencies from dotnet/arcade (#6969)
  • [main] Update dependencies from dotnet/arcade (#6976)
  • FabricBot: Onboarding to GitOps.ResourceManagement because of FabricBot decommissioning (#6983)
  • [main] Update dependencies from dotnet/arcade (#6985)
  • [main] Update dependencies from dotnet/arcade (#6995)
  • Temp fix for the race condition during the tests. (#7021)
  • Make MlImage tests not block file for reading (#7029)
  • Remove SourceLink SDK references (#7037)
  • Change official build to use 1ES templates (#7048)
  • Auto-generated baselines by 1ES Pipeline Templates (#7051)
  • Update package versions in use by ML.NET tests (#7055)
  • testing arm python brew overwrite (#7058)
  • Split out non concurrent test collections. (#6937)
  • [release/3.0] Update dependencies from dotnet/arcade (#6938)
  • Branding for 3.0.1 (#6943)
  • Torch sharp version updates and test fixes (#6954)
  • Working on memory issue during tests for TorchSharp (#7022)
  • M1 helix testing (#7033)
  • [main] Update dependencies from dotnet/arcade (#7052)
  • [main] Update dependencies from dotnet/arcade (#7075)
  • Reenable log publishing (#7076)
  • [main] Update dependencies from dotnet/arcade (#7079)
  • Update VMs (#7087)
  • Don't trigger PR validation builds for docs only changes (#7096)
  • Add CodeQL exclusions file (#7105)
  • Don't use deprecated -pt images (#7131)
  • Update locker.yml (#7133)
  • [main] Update dependencies from dotnet/arcade (#7138)
  • Try enabling TSA scan during build (#7149)
  • [main] Update dependencies from dotnet/arcade (#7151)
  • Remove Codeql.SourceRoot (#7155)
  • [main] Update dependencies from dotnet/arcade (#7161)
  • [main] Update dependencies from dotnet/arcade (#7165)
  • Add a stub packageSourceMapping (#7171)
  • update torchsharp and helix image (#7188)
  • Publish source index directly from repo (#7189)
  • Add package readmes (#7200)
  • Update dependency versions. (#7216)
  • [main] Update dependencies from dotnet/arcade (#7218)
  • Directly reference the sql data client 4.8.6 package in GenAI tests to replace a package with a security vulnerability (#7228)
  • [main] Update dependencies from dotnet/arcade (#7235)
  • docs: update nuget package badge (#7236) - Thanks @WeihanLi!
  • [GenAI] Enable pack (#7237)
  • [GenAI] pack GenAI core package (#7246)
  • Enable SDL tools (#7247)
  • Add Service Tree ID for .NET Libraries (#7252)
  • fixing apple silicon official build (#7278)
  • fixing osx ci (#7279)
  • Fixing native lookup (#7282)
  • Add the components governance file cgmanifest.json for tokenizer's vocab files (#7283)
  • Update To MacOS 13 (#7285)
  • Updated remote executor (#7295)
  • Update dependencies from maintenance-packages to latest versions (#7301)

Documentation Updates

  • Update developer-guide.md (#6870) - Thanks @computerscienceiscool!
  • Update release-3.0.0.md (#6895) - Thanks @taeerhebend!
  • Update branding for 3.0.2 (#6970)
  • Add release notes for 4.0-preview1 (#7064)
  • Update readmes for Tokenizers and Microsoft.ML (#7070)
  • Adding migration guide for deepdev (#7073)
  • Update PACKAGE.md to include Llama info (#7104)
  • Update the tokenizer migration guide (#7109)
  • add document for GenAI (#7170)
  • [GenAI] Add readme to Microsoft.ML.GenAI.Phi (#7206)
  • Update wording in LDA docs (#7253)