Solving regression use-cases with Data Attribute Recommendation

In this post, I give an overview of the new regression template that was recently added to Data Attribute Recommendation. If you would like to learn more about the service and its features, I recommend reading the other blog posts about Data Attribute Recommendation as well.

Regression in machine learning

In machine learning, the term regression denotes the task of predicting a continuous outcome from the values of one or more independent variables (also called explanatory variables). A continuous outcome is a real number, typically stored as an integer or floating-point value. An example of a regression use-case is demand forecasting, where the goal is to predict customer demand based on historical sales data. Accurate demand forecasting enables organizations to make data-driven decisions and optimize their production planning and inventory management.
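
To make the idea concrete, here is a minimal sketch of a regression in plain Python: an ordinary least-squares line fitted to a handful of data points. The numbers are toy data invented for illustration, and the technique is deliberately much simpler than the neural network used by the service; it only shows what "predicting a continuous outcome from an explanatory variable" means.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Return the continuous prediction for a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Toy data (invented): units sold at a given unit price.
prices = [10.0, 12.0, 14.0, 16.0]
units_sold = [200.0, 180.0, 160.0, 140.0]

model = fit_line(prices, units_sold)
forecast = predict(model, 13.0)  # predicted demand at a price not seen before
```

The prediction is a real number rather than one of a fixed set of categories, which is what distinguishes regression from classification.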

If you need categorical predictions instead, Data Attribute Recommendation offers several classification templates which target different scenarios. For more details, refer to the list of available model templates.

How to use the regression template in Data Attribute Recommendation 

Data Attribute Recommendation is a versatile machine learning service, as it allows users to train and deploy machine learning models for a variety of different use-cases, covering classification tasks and now regression problems, too. The workflow is quite simple and applies to classification as well as regression scenarios. In this post, I will briefly summarize the main steps involved, but you can refer to this tutorial for more details.

The user must first define a dataset schema that consists of features and labels (for regression, a single label of type NUMBER is allowed), and then upload a dataset that conforms to this schema. Once the training data is uploaded, the user can start the training process by specifying the ID of the regression template and the desired dataset. The result of the training process is a model which can be used to make predictions for new records.
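
As a rough illustration of the schema constraint, the sketch below describes a schema for the product rating scenario as a Python dictionary and checks the regression rule locally. The field names (`features`, `labels`, `label`, `type`, `name`), the `TEXT` feature type, and the helper `check_regression_schema` are assumptions made for this example; please check the official service documentation for the exact schema format before relying on them.

```python
# Hypothetical schema for a rating-prediction dataset (field names are assumptions).
rating_schema = {
    "name": "product-rating-schema",
    "features": [{"label": "review", "type": "TEXT"}],
    "labels": [{"label": "rating", "type": "NUMBER"}],
}


def check_regression_schema(schema):
    """Return a list of problems; an empty list means the schema passes this local check."""
    problems = []
    # Assumption: a usable schema declares at least one feature.
    if not schema.get("features"):
        problems.append("schema needs at least one feature")
    labels = schema.get("labels", [])
    # From the text: regression allows a single label, and it must be of type NUMBER.
    if len(labels) != 1:
        problems.append(f"regression expects exactly one label, found {len(labels)}")
    for label in labels:
        if label.get("type") != "NUMBER":
            problems.append(f"label {label.get('label')!r} must be of type NUMBER")
    return problems


problems = check_regression_schema(rating_schema)
```

A check like this is only a convenience before uploading; the service itself remains the authority on whether a schema is valid.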

Once the model has been trained successfully, the user can assess the model quality using different metrics. For regression, Data Attribute Recommendation reports the following metrics, where lower values indicate better model performance:

  • Mean Absolute Error (MAE): Average absolute difference between the predicted values and the actual values.
  • Mean Absolute Percentage Error (MAPE): Average of the absolute percentage errors of the predictions.
  • Mean Squared Error (MSE): Average squared difference between the predicted values and the actual values. This is the target metric that is optimized during training. 
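
The three metrics above can be sketched in a few lines of plain Python. These are the standard textbook definitions; whether the service reports MAPE as a fraction or multiplied by 100 is not stated here, so the fraction form below is an assumption. The sample values are invented for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the actual value, as a fraction.

    Undefined when any actual value is zero (division by zero).
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference; large errors are penalized more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Invented example: true ratings versus model predictions.
actual = [4.0, 2.0, 5.0]
predicted = [3.5, 2.5, 4.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
```

Because MSE squares each error, the single prediction that is off by 1.0 contributes four times as much as each prediction that is off by 0.5, which is why optimizing MSE pushes the model to avoid large misses.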

In case you need further support with these or other machine learning concepts, please refer to the service help page.

Insights on the regression template

The regression template available in Data Attribute Recommendation leverages a neural network with dropout layers that remain active not only during the training phase but at inference too. This allows the service to sample several predictions, which are inherently stochastic, and estimate the model uncertainty by returning the average (value) and the standard deviation (std) of the sampled predictions. The larger the standard deviation compared to the magnitude of the average, the lower the confidence of the model. This helps users understand the model behavior and facilitates the integration of the service into other applications and solutions.

For example, assume that we are interested in predicting a product rating (which could be any real value between 1 and 5) from its review.

Example of confident and uncertain responses.

Since the standard deviation represents how spread out the sampled predictions are from their average, we can clearly see that the response on the left-hand side is noticeably confident, as the sampled predictions were well concentrated around the mean value. On the other hand, the response on the right-hand side is more uncertain, as the standard deviation is significantly larger, indicating that the sampled predictions were quite different from each other. This could suggest that the given product review was particularly difficult to interpret, or that additional labeling is needed to improve the model performance. 
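
The sampling idea can be sketched with a toy network in plain Python. This is not the service's implementation: the one-hidden-layer network, its weights, the dropout rate, and the number of samples are all hypothetical choices made for illustration. Only the output names `value` and `std` follow the description above.

```python
import random
from statistics import mean, pstdev


def forward_with_dropout(x, hidden_weights, output_weights, drop_rate, rng):
    """One stochastic forward pass through a tiny one-hidden-layer network."""
    keep = 1.0 - drop_rate
    output = 0.0
    for w_hidden, w_out in zip(hidden_weights, output_weights):
        activation = max(0.0, w_hidden * x)  # ReLU
        # Dropout stays on at inference: each hidden unit is kept with
        # probability `keep` and rescaled so the expected output is unchanged.
        if rng.random() < keep:
            output += w_out * activation / keep
    return output


def predict_with_uncertainty(x, hidden_weights, output_weights,
                             drop_rate=0.2, n_samples=200, seed=0):
    """Sample several stochastic predictions and summarize them."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, hidden_weights, output_weights, drop_rate, rng)
        for _ in range(n_samples)
    ]
    return {"value": mean(samples), "std": pstdev(samples)}


def relative_spread(prediction):
    """Standard deviation relative to the magnitude of the average."""
    if prediction["value"] == 0:
        return float("inf")
    return prediction["std"] / abs(prediction["value"])


# Hypothetical weights; a real model would learn these during training.
hidden_weights = [0.5, 1.0, 1.5, 2.0]
output_weights = [0.4, 0.3, 0.2, 0.1]

stable = predict_with_uncertainty(2.0, hidden_weights, output_weights, drop_rate=0.0)
noisy = predict_with_uncertainty(2.0, hidden_weights, output_weights, drop_rate=0.5)
```

A consuming application could compare `relative_spread` against a threshold of its own choosing, for instance to accept confident predictions automatically and route uncertain ones to manual review.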

Final remarks

In this post I presented the new regression template available in Data Attribute Recommendation and showed some of its features. If you would like to read more about Data Attribute Recommendation and be notified of future posts, please follow the tag Data Attribute Recommendation. 

Useful resources