ML.NET is an open-source, cross-platform machine learning framework for .NET developers that enables integration of custom machine learning into .NET apps.

We are excited to update you on what we’ve been working on over the past few months.

AutoML Updates

Training machine learning models is a time-consuming and iterative task. Automated Machine Learning (AutoML) automates that process by making it easier to find the best algorithm for your scenario and dataset. AutoML is the backend that powers the training experiences in Model Builder and the ML.NET CLI. Last year we announced updates to the AutoML implementation in our Model Builder and ML.NET CLI tools based on Neural Network Intelligence (NNI) and Fast and Lightweight AutoML (FLAML) technologies from Microsoft Research. These updates brought several improvements over the previous solution, including:

  • Increase in the number of models explored.
  • Lower time-out error rate.
  • Improved performance metrics (for example, accuracy and R-squared).

Until recently, you could only take advantage of these AutoML improvements inside our tools.

We’re excited to announce that we’ve integrated the NNI / FLAML implementations of AutoML into the ML.NET framework so you can use them from a code-first experience.

To get started with the AutoML API today, install the latest pre-release versions of the Microsoft.ML and Microsoft.ML.AutoML NuGet packages from the ML.NET daily feed:

https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json

The Experiment API

An experiment is a collection of training runs or trials. Each trial produces information about itself such as:

  • Evaluation metrics: The metrics used to assess the predictive capabilities of a model.
  • Pipeline: The algorithm and hyperparameters used to train a model.
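The trial structure described above can be sketched in plain C#. This is a minimal illustration, not ML.NET code: the `TrialResult` record and the `ExperimentSketch.BestTrial` helper are hypothetical names, and the sketch assumes a metric where higher is better (such as R-squared or accuracy).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the information a single trial produces.
// Not an ML.NET type: Pipeline is only a human-readable description here.
public sealed record TrialResult(string Pipeline, double Metric);

public static class ExperimentSketch
{
    // Returns the trial with the highest metric, assuming "higher is better"
    // (true for R-squared and accuracy, but not for error metrics such as RMSE).
    public static TrialResult BestTrial(IEnumerable<TrialResult> trials)
    {
        TrialResult? best = null;
        foreach (var trial in trials)
        {
            if (best is null || trial.Metric > best.Metric)
            {
                best = trial;
            }
        }
        return best ?? throw new ArgumentException("The experiment has no trials.", nameof(trials));
    }
}
```

For an error metric where lower is better, the comparison would simply be flipped.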

The Experiment API provides a set of AutoML defaults, making it simpler to add AutoML to your training pipeline.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

// mlContext, dataPrepPipeline, trainTestSplit and validateTestSplit
// are assumed to be defined earlier (not shown).

// Configure AutoML pipeline
var experimentPipeline =
    dataPrepPipeline
        .Append(mlContext.Auto().Regression(labelColumnName: "fare_amount"));

// Configure experiment
var experiment = mlContext.Auto().CreateExperiment()
    .SetPipeline(experimentPipeline)
    .SetTrainingTimeInSeconds(50)
    .SetDataset(trainTestSplit.TrainSet, validateTestSplit.TrainSet)
    .SetEvaluateMetric(RegressionMetric.RSquared, "fare_amount", "Score");

// Run experiment
var result = await experiment.Run();
```

In this code snippet, dataPrepPipeline is the series of transforms that gets the data into the right format for training. The AutoML components that train a regression model are appended to that pipeline. The same concept applies to other supported scenarios, such as classification.

When you create an experiment from the training pipeline you've defined, you can customize settings such as the training time, the training and validation datasets, and the evaluation metric to optimize.

Once your pipeline and experiment are defined, call the Run method to start training.
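The experiment above optimizes for RegressionMetric.RSquared. As a rough guide to what that number means, the sketch below computes the textbook coefficient of determination, R² = 1 - SS_res / SS_tot. `MetricsSketch.RSquared` is a hypothetical helper for illustration; ML.NET computes this metric for you, and its internal implementation may differ in details.

```csharp
using System;

public static class MetricsSketch
{
    // Textbook coefficient of determination: R^2 = 1 - SS_res / SS_tot, where
    // SS_res sums the squared prediction errors and SS_tot sums the squared
    // deviations of the actual labels from their mean.
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Inputs must be non-empty and have the same length.");

        double mean = 0;
        foreach (double y in actual) mean += y;
        mean /= actual.Length;

        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            double error = actual[i] - predicted[i];
            double deviation = actual[i] - mean;
            ssRes += error * error;
            ssTot += deviation * deviation;
        }

        if (ssTot == 0)
            throw new ArgumentException("R-squared is undefined when all actual values are equal.");

        return 1.0 - ssRes / ssTot;
    }
}
```

A value of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean label, which is why a higher R-squared marks a better trial.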

Search Spaces and Sweepable Estimators

If you need more control over the hyperparameter search space, you can define your search space and add it to your training pipeline using a sweepable estimator.

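```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.SearchSpace;
using Microsoft.ML.Trainers.LightGbm;

// Configure search space.
// LgbmOption is assumed to be a user-defined class (not shown) whose properties
// declare the ranges for NumberOfLeaves, NumberOfTrees,
// MinimumExampleCountPerLeaf and LearningRate.
var searchSpace = new SearchSpace<LgbmOption>();

// Initialize estimator pipeline: the lambda receives sampled hyperparameters
// (param) and builds a LightGBM regression trainer from them.
var sweepingEstimatorPipeline =
    dataPrepPipeline
        .Append(mlContext.Auto().CreateSweepableEstimator((context, param) =>
        {
            var option = new LightGbmRegressionTrainer.Options()
            {
                NumberOfLeaves = param.NumberOfLeaves,
                NumberOfIterations = param.NumberOfTrees,
                MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,
                LearningRate = param.LearningRate,
                LabelColumnName = "fare_amount",
                FeatureColumnName = "Features",
                HandleMissingValue = true
            };

            return context.Regression.Trainers.LightGbm(option);
        }, searchSpace));
```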

The search space defines the range of values to explore for each hyperparameter.

Sweepable estimators enable you to use the search space inside an ML.NET pipeline just like any other estimator.

To create and run the experiment, use the CreateExperiment and Run methods as before.
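To make the idea of a search space more concrete, here is a framework-free C# sketch of hyperparameter ranges and sampling. It is an illustration under stated assumptions only: `ParameterRange`, `SearchSpaceSketch` and the parameter names in the usage are hypothetical, and it uses plain random search, whereas the AutoML implementation described above relies on NNI/FLAML tuners designed to pick configurations more efficiently than random sampling.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical range for one hyperparameter. LogScale ranges are sampled
// uniformly in log space and require Min > 0.
public sealed record ParameterRange(double Min, double Max, bool LogScale = false, bool IsInteger = false);

public static class SearchSpaceSketch
{
    // Draws one configuration by sampling every hyperparameter from its range.
    public static Dictionary<string, double> Sample(
        IReadOnlyDictionary<string, ParameterRange> space, Random rng)
    {
        var config = new Dictionary<string, double>();
        foreach (var (name, range) in space)
        {
            double u = rng.NextDouble(); // uniform in [0, 1)
            double value = range.LogScale
                ? Math.Exp(Math.Log(range.Min) + u * (Math.Log(range.Max) - Math.Log(range.Min)))
                : range.Min + u * (range.Max - range.Min);
            if (range.IsInteger)
                value = Math.Round(value);
            config[name] = value;
        }
        return config;
    }

    // Plain random search: sample `trials` configurations and keep the highest score.
    // `evaluate` stands in for "train a model with this configuration and score it".
    public static (Dictionary<string, double> Config, double Score) RandomSearch(
        IReadOnlyDictionary<string, ParameterRange> space,
        Func<Dictionary<string, double>, double> evaluate,
        int trials,
        int seed = 0)
    {
        var rng = new Random(seed);
        Dictionary<string, double>? bestConfig = null;
        double bestScore = double.NegativeInfinity;
        for (int i = 0; i < trials; i++)
        {
            var config = Sample(space, rng);
            double score = evaluate(config);
            if (score > bestScore)
            {
                bestScore = score;
                bestConfig = config;
            }
        }
        return (bestConfig ?? throw new ArgumentException("trials must be positive"), bestScore);
    }
}
```

Log-scale sampling is a common choice for parameters such as learning rate, whose useful values span several orders of magnitude; integer parameters such as the number of leaves are rounded after sampling.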

Model Builder and ML.NET CLI Updates

We’ve made several updates to Model Builder and the ML.NET CLI. Two that I want to highlight are:

  • Time Series forecasting scenario in Model Builder
  • New version of the ML.NET CLI

Time Series Forecasting Scenario (Preview)

Time-series forecasting is the process of identifying patterns in time-dependent observations and making predictions several periods into the future. Examples of real-world use cases are:

  • Forecasting product demand
  • Forecasting energy consumption

In ML.NET, choosing the trainer for time-series forecasting isn’t too difficult because you only have one choice: ForecastBySsa. The hard part is finding parameters such as the time window to analyze and how far into the future to predict. Finding the right parameters is an experimental process, which makes it an excellent job for AutoML. Updates to our AutoML implementation make it possible to search through hyperparameters intelligently, which simplifies the process of training a time-series forecasting model.
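ForecastBySsa exposes parameters such as windowSize, seriesLength and horizon, and choosing them is the experimental part. The sketch below shows why, using a deliberately simple stand-in: a naive moving-average forecaster, not the SSA algorithm. `ForecastSketch` and its methods are hypothetical; the point is only that each candidate window is evaluated on held-out points and the one with the lowest error is kept, which is the kind of search AutoML automates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: a naive moving-average forecaster,
// NOT the singular spectrum analysis (SSA) behind ML.NET's ForecastBySsa.
public static class ForecastSketch
{
    // Forecasts `horizon` future values by averaging the last `windowSize`
    // observations, feeding each forecast back in as a new observation.
    public static double[] Forecast(IReadOnlyList<double> history, int windowSize, int horizon)
    {
        if (windowSize < 1 || windowSize > history.Count)
            throw new ArgumentOutOfRangeException(nameof(windowSize));

        var series = new List<double>(history);
        var forecast = new double[horizon];
        for (int step = 0; step < horizon; step++)
        {
            double sum = 0;
            for (int i = series.Count - windowSize; i < series.Count; i++)
                sum += series[i];
            forecast[step] = sum / windowSize;
            series.Add(forecast[step]);
        }
        return forecast;
    }

    // Holds out the last `horizon` points, forecasts them with each candidate
    // window (candidates must be >= 1), and returns the window with the lowest
    // mean absolute error, or -1 if no candidate fits the training data.
    public static int BestWindowSize(IReadOnlyList<double> series, IEnumerable<int> candidates, int horizon)
    {
        var train = new List<double>();
        for (int i = 0; i < series.Count - horizon; i++) train.Add(series[i]);

        int bestWindow = -1;
        double bestError = double.PositiveInfinity;
        foreach (int window in candidates)
        {
            if (window > train.Count) continue; // window longer than the training data
            double[] predicted = Forecast(train, window, horizon);
            double error = 0;
            for (int step = 0; step < horizon; step++)
                error += Math.Abs(series[train.Count + step] - predicted[step]);
            error /= horizon;
            if (error < bestError)
            {
                bestError = error;
                bestWindow = window;
            }
        }
        return bestWindow;
    }
}
```

On a steadily rising series, for example, short windows tend to win because long averages lag behind the trend; on noisy data the balance can shift, which is why the choice is searched rather than fixed.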

As a result of these efforts, we’re excited to share that you can now train time-series forecasting models in Model Builder.

Model Builder evaluation screen for forecasting scenario

Download or update to the latest version of Model Builder to start training your time-series forecasting models.