Oracle has released a new version of HeatWave, its in-memory accelerator for MySQL databases, with improved machine learning. HeatWave is an in-memory analytical query accelerator integrated with the InnoDB storage engine of the MySQL database, so analytical (OLAP) and transactional (OLTP) processing can be carried out from a single database. With HeatWave ML, the new version adds native support for machine learning directly in the database.
“The problem with machine learning is that customers need to extract their data from MySQL before processing it,” said Nipun Agarwal, senior vice president, MySQL DB and HeatWave at Oracle. “In the past, customers have faced the same disadvantages when analysing. Once the data leaves the database, it’s not secure; it brings complexity to the application, and you have to store the data elsewhere and run those machine learning models from other services, which costs more.”
HeatWave ML is intended to train machine learning models, run inference with them, and explain them. Oracle cites several patents covering techniques to clean up and standardise the features needed to train a model, automatically choose an algorithm, select the right sample from the data set, identify the optimal hyperparameters, generate explanations, and train the required models. HeatWave ML does all of this in a single pass, Agarwal points out.
“Explainability is a very important concept for customers. A bank wants to understand the model used in production and at the same time be able to explain to its customers why a loan is approved or a transaction is rejected,” says Agarwal.
In addition to HeatWave ML, Oracle has also added the ability to scale automatically. According to the company, its new data compression can reduce data storage costs by nearly 50 per cent or, equivalently, store about twice as much data per node.
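The two ways of stating the compression benefit describe the same effect. As a rough worked check, assume a single uniform compression ratio c (a simplification, since real ratios vary with the data), with D the uncompressed data size, S the stored size and N the raw capacity of one node:

```latex
S = \frac{D}{c}, \qquad
\text{saving} = 1 - \frac{S}{D} = 1 - \frac{1}{c}, \qquad
\text{data per node} = c \, N
% With c = 2: saving = 1 - 1/2 = 50%, and data per node = 2N.
```

A ratio just under 2 gives the "nearly 50 per cent" saving Oracle quotes.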
In addition, HeatWave environments can be paused and restarted at a chosen time with a “pause-and-resume” function to reduce costs. In practice, the service keeps the data in an object store; when the environment is restarted, compute resources are provisioned again and the data is restored from the object store.
Compared with other cloud database services, HeatWave ML offers the following capabilities:
Fully Automated Model Building
All phases of model building with HeatWave ML are fully automated and require no developer intervention. The result is a tuned model that, according to Oracle, is more accurate, requires no manual work, and comes from a training process that always runs to completion. Other cloud database services, such as Amazon Redshift, instead integrate with machine learning capabilities in external services that require extensive manual input from developers during the ML training process.
Model and Inference Explanations
Model explainability helps developers understand the behaviour of a machine learning model. For example, if a bank denies a customer a loan, the bank needs to determine which input features the model took into account, or whether the model contains bias. Prediction explainability is a set of techniques that help answer why a machine learning model made a particular prediction. Explanations of predictions are becoming increasingly important as companies need to be able to justify the decisions made by their machine learning models. HeatWave ML integrates model and prediction explanations into model training. As a result, all models created by HeatWave ML can offer both model and inference explanations, and explaining an inference does not require access to the training data. Oracle says it has enhanced its explanation techniques to improve performance, interpretability, and quality, and that other cloud database services do not provide such rich explanations for all of their machine learning models.
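Oracle does not describe which explanation techniques HeatWave ML uses. As a rough illustration only, this Python sketch swaps in two standard techniques: permutation importance as a model-level explanation, and baseline-relative feature contributions of a linear model as a per-prediction explanation. Both are computed once at training time and stored with the model, which is one way an inference can later be explained without the training data. `LinearModel`, `attach_explanations` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinearModel:
    """Toy linear model that carries its own explanation metadata (hypothetical design)."""
    weights: list[float]
    bias: float
    baseline: list[float] = field(default_factory=list)     # feature means, saved at training time
    importances: list[float] = field(default_factory=list)  # model-level explanation, saved at training time

    def predict(self, row: list[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, row))

    def explain_prediction(self, row: list[float]) -> list[float]:
        # Per-feature contribution relative to the stored baseline. Only values saved
        # with the model are used, so no training data is needed at explanation time.
        return [w * (x - b) for w, x, b in zip(self.weights, row, self.baseline)]


def mean_squared_error(model: LinearModel, rows, targets) -> float:
    return sum((model.predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model: LinearModel, rows, targets) -> list[float]:
    # Increase in error when one feature column is permuted. A rotation by one
    # position replaces the usual random shuffle so the result is reproducible.
    base_error = mean_squared_error(model, rows, targets)
    scores = []
    for j in range(len(model.weights)):
        column = [r[j] for r in rows]
        rotated = column[1:] + column[:1]
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, rotated)]
        scores.append(mean_squared_error(model, permuted, targets) - base_error)
    return scores


def attach_explanations(model: LinearModel, rows, targets) -> LinearModel:
    # Hypothetical final training step: compute both kinds of explanation metadata
    # while the training data is still available.
    n = len(rows)
    model.baseline = [sum(r[j] for r in rows) / n for j in range(len(model.weights))]
    model.importances = permutation_importance(model, rows, targets)
    return model


# Usage: feature 0 has a large weight, feature 1 a small one.
model = LinearModel(weights=[3.0, 0.1], bias=1.0)
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [model.predict(r) for r in rows]
attach_explanations(model, rows, targets)
print(model.importances)
print(model.explain_prediction([4.0, 2.0]))
```

For a linear model, the baseline prediction plus the per-feature contributions adds back up to the actual prediction, which makes this kind of explanation easy to check.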
Hyperparameter Tuning
Hyperparameter tuning is the most time-consuming phase of ML model training. HeatWave ML implements a new gradient-search-based reduction algorithm for hyperparameter tuning, which allows the hyperparameter search to run in parallel without affecting model accuracy. Oracle says this capability gives HeatWave ML a significant performance advantage over other cloud machine learning model building services.
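Oracle does not publish the details of its gradient-search-based reduction algorithm, so what follows is not that algorithm. It is a minimal sketch of a generic parallel range-reduction search over a single hyperparameter: each round scores a grid of candidates concurrently and then narrows the range around the best one. `validation_loss` is a hypothetical stand-in for training and validating a model, and the thread pool stands in for cluster nodes (for CPU-bound Python code, a process pool would be needed for a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def range_reduction_search(
    objective: Callable[[float], float],
    low: float,
    high: float,
    points_per_round: int = 5,
    rounds: int = 6,
    shrink: float = 0.5,
) -> float:
    """Minimise `objective` over [low, high] by scoring a grid of candidates in
    parallel, then shrinking the search range around the best one each round."""
    best = (low + high) / 2
    with ThreadPoolExecutor() as pool:
        for _ in range(rounds):
            step = (high - low) / (points_per_round - 1)
            candidates = [low + i * step for i in range(points_per_round)]
            # Candidates are independent, so all of them can be evaluated at once.
            scores = list(pool.map(objective, candidates))
            best = candidates[scores.index(min(scores))]
            # Narrow the range around the current best, staying inside the old bounds.
            half_width = (high - low) * shrink / 2
            low, high = max(low, best - half_width), min(high, best + half_width)
    return best


# Hypothetical validation loss as a function of the learning rate, minimal at 0.3.
def validation_loss(learning_rate: float) -> float:
    return (learning_rate - 0.3) ** 2


best_rate = range_reduction_search(validation_loss, 0.0, 1.0)
print(f"best learning rate ~ {best_rate:.4f}")
```

Because candidates within a round depend only on the range left by the previous round, each round's evaluations can run side by side, which is what makes this family of searches easy to parallelise.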
Algorithm Selection
HeatWave ML uses the concept of proxy models (simple models that exhibit the properties of a full, complex model) to determine the best ML algorithm for training. Oracle says no other database service for building machine learning models has this proxy modelling capability.
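A minimal sketch of proxy-based algorithm selection, assuming that a "proxy" is simply each candidate algorithm trained on a small, evenly spaced slice of the training data; the details of HeatWave ML's own proxy models are not given in the source. Each proxy is scored on a validation set, and only the winning algorithm is trained on the full data. The three toy algorithms (`fit_mean`, `fit_line`, `fit_nearest`) and the data are hypothetical.

```python
from typing import Callable, Sequence

Predictor = Callable[[float], float]
Fit = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(xs, ys) -> Predictor:
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys) -> Predictor:
    # Ordinary least-squares line through the points.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(xs, ys) -> Predictor:
    # 1-nearest-neighbour regression.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict: Predictor, xs, ys) -> float:
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_algorithm(candidates: dict[str, Fit], train_x, train_y, val_x, val_y, proxy_size: int = 10):
    # Proxy models: each candidate is trained on an evenly spaced slice of about proxy_size rows.
    step = max(1, len(train_x) // proxy_size)
    proxy_x, proxy_y = train_x[::step], train_y[::step]
    scores = {name: mse(fit(proxy_x, proxy_y), val_x, val_y) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    # Only the winning algorithm is trained on the full training set.
    return best, candidates[best](train_x, train_y)


train_x = [float(i) for i in range(100)]
train_y = [2 * x + 1 for x in train_x]
val_x = [i + 0.5 for i in range(0, 100, 7)]
val_y = [2 * x + 1 for x in val_x]
name, model = select_algorithm(
    {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest},
    train_x, train_y, val_x, val_y,
)
print(name, model(50.0))
```

The cost saving comes from training every candidate only on the small slice; the expensive full-data training happens once.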
Intelligent Data Sampling
HeatWave ML samples a small percentage of the data during model training to improve performance. The sampling is designed so that the sample data set captures all representative data points. Other cloud services for building machine learning models take a less efficient approach, random sampling, in which a small percentage of the data is drawn without considering the characteristics of the data distribution.
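Oracle does not say exactly how HeatWave ML picks its sample. Stratified sampling is one standard distribution-aware alternative to plain random sampling, sketched here under that assumption: rows are grouped by a key (in this hypothetical example, a loan outcome), and the same fraction is drawn from every group, with at least one row per group.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

Row = TypeVar("Row")


def stratified_sample(
    rows: Sequence[Row], key: Callable[[Row], Hashable], fraction: float, seed: int = 0
) -> list[Row]:
    """Draw `fraction` of the rows from every stratum (rows grouped by `key`), keeping
    at least one row per stratum so rare groups are never dropped from the sample."""
    rng = random.Random(seed)
    strata: dict = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample: list[Row] = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical loan data: 950 approved rows and 50 rejected rows.
loans = [("approved", i) for i in range(950)] + [("rejected", i) for i in range(50)]
sample = stratified_sample(loans, key=lambda row: row[0], fraction=0.02)
print(Counter(label for label, _ in sample))
```

With a 2 per cent sample of this data, a plain random sample of 20 rows contains no rejected loans about a third of the time, whereas the stratified sample always keeps the rare class in its original proportion.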
Feature Selection
Feature selection helps determine which attributes of the training data affect the behaviour of the machine learning model when it makes predictions. The feature selection techniques in HeatWave ML have been trained on a wide range of data sets from different domains and applications. Based on the collected statistics and meta-information, HeatWave ML can efficiently identify the relevant features in a new data set.
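HeatWave ML's feature selection relies on statistics and meta-information learned across many data sets, which this sketch does not attempt to reproduce. It swaps in a simple, well-known filter method instead: rank each feature by the absolute Pearson correlation with the target and keep those at or above a threshold. The column names, the data and the 0.3 threshold are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column carries no information about the target.
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)


def select_features(columns: dict[str, list[float]], target: list[float], threshold: float = 0.3) -> list[str]:
    """Keep features whose absolute correlation with the target reaches `threshold`,
    strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


# Hypothetical training columns and target.
columns = {
    "income": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "debt": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "noise": [2.0, 1.0, 1.0, 1.0, 1.0, 2.0],
    "constant": [5.0] * 6,
}
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(select_features(columns, target))
```

A correlation filter only sees linear, one-feature-at-a-time relationships, which is one reason a learned approach such as the one Oracle describes can pick features more reliably on new data.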