Google’s Simple ML has been released in beta for Sheets users. What could this mean for your big datasets?
Last week, Google announced and released a beta version of Simple ML for Sheets, an add-on for Google Sheets built by the TensorFlow Decision Forests team. This release is one of the first of its kind, bringing a range of simple and some more complex machine learning capabilities directly to Google Sheets users.
Although Simple ML has been touted as the machine learning solution for people with no prior knowledge of machine learning, the Advanced Tasks it offers promise value to data scientists, machine learning experts and anyone else working with bigger datasets. Read on to learn more about this release and how it may shape spreadsheet-based data and machine learning projects in the future.
Fast facts about the Simple ML release
Simple ML for Sheets is currently available in beta. The Google Sheets add-on was created by a group of TensorFlow developers to make machine learning accessible to Sheets users, even if they have no previous machine learning knowledge. This is primarily achieved through automated model training and other no-code features.
This machine learning add-on has been designed to support two primary ML tasks: predicting missing values and spotting abnormal values. However, Simple ML for Sheets can also be used for more advanced use cases, like training, evaluating and analyzing ML models. Data scientists and other advanced users who want to make predictions from their own trained models will likely need Simple ML’s Advanced Tasks.
Simple ML’s most compelling features include:
- Beginner Tasks for automated and simple ML functionality
- Advanced Tasks for ML model training and management
- Model training via WebAssembly in browser
- Support for prototyping tabular datasets
- Model exporting for TensorFlow, Colab and TensorFlow Serving
- No data sharing with third parties
- Models saved to Google Drive for easy access and sharing
How does Simple ML work?
Once Simple ML for Sheets is installed in your add-on library, it can be used to predict missing values and identify abnormal values in a dataset. Users will start by opening their data in Google Sheets and selecting which of those two tasks is the best fit for their project.
After making their selection, users should run that task; they can expect to have Simple ML’s statistical predictions back in a few seconds.
For predicting missing values, Simple ML trains a model on the non-missing values provided in a dataset. For identifying abnormal values, Simple ML trains a set of models with cross-validation to predict the values currently there. Then, based on how the actual data and predicted data differ, Simple ML will identify abnormal parts of the dataset and provide an abnormality probability score between 0% and 100%.
From there, users can review the ML-generated model and use it as a guide for any changes they need to make to their dataset.
Models are initially saved in a Google Drive folder called simple_ml_for_sheets. For Simple ML to work properly, users will need to grant it the following permissions:
- See, edit, create and delete all Google Drive files
- See, edit, create and delete all Google Sheets spreadsheets
- Display and run third-party web content prompts and sidebars inside Google applications
Tips and tricks for using Simple ML
Although Simple ML is quick and fairly accurate, it’s still important for users to understand how to set up their data and interpret the newly generated model.
First, users need to understand that predictive ML analysis is only possible if a large enough dataset is provided for model training. At least 20 rows of data need to be present for a worthwhile model, but 100 or more rows are preferable and more likely to produce an accurate model.
Also, keep in mind that the predictive data generated by Simple ML models is just that: predictive. While it can come close to the true missing values, data science professionals should review the model before filling in the gaps.
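The row-count guidance above can be turned into a quick pre-check before training. This is a small illustrative helper, not part of Simple ML; the function name and the three category labels are assumptions, and only the 20 and 100 thresholds come from the guidance itself.

```python
MIN_ROWS = 20         # below this, a model is unlikely to be worthwhile
PREFERRED_ROWS = 100  # at or above this, an accurate model is more likely


def training_readiness(row_count):
    """Classify a dataset by the row-count guidance above (labels are hypothetical)."""
    if row_count < MIN_ROWS:
        return "too small"
    if row_count < PREFERRED_ROWS:
        return "usable"
    return "preferred"


for count in (12, 45, 250):
    print(count, training_readiness(count))
```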
How to install Simple ML
To install Simple ML for Sheets, users should open the Extensions menu in Google Sheets, hover over the Add-ons option and click Get add-ons. From there, it is a fairly straightforward process to search for and install Simple ML.
Using Simple ML for big data-driven projects
Although Simple ML truly is simple and aimed at a less ML-savvy audience, big data and machine learning experts alike can use this tool to manage and draw further insights from their datasets and existing models. The tool is flexible enough to manage very large datasets, allowing users to run models on millions of rows without SQL queries. It’s also a useful add-on for Google BigQuery users, because Simple ML is able to analyze data in instances of this cloud data warehouse.
So how exactly can Simple ML be leveraged for more complex big data projects? Briefly, here are some of the Advanced Task options Simple ML offers for this kind of user:
- Train a model: With this task, users can train their own machine learning models with training data values they provide in tabular format.
- Make predictions: This task predicts column values in every row, rather than just missing values, based on an already-trained model.
- Evaluate a model: This task measures the quality of a trained model based on the labels and metrics that were used to train it. For a model with a categorical label, this task primarily reports accuracy; for a model with a numerical label, it focuses on regression metrics like RMSE.
- Understand a model: With this task, users can learn all kinds of facts about a previous model. The model-understanding window offers information on training date, target and source columns, quality, columnar statistics, important input features, and predictions.
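To make the two evaluation metrics named in the list above concrete, here is a minimal Python sketch of accuracy and RMSE. It only illustrates the standard definitions of those metrics; it is not Simple ML’s evaluation code, and the `evaluate` helper and its rule for telling categorical from numerical labels (checking for text values) are assumptions.

```python
import math


def accuracy(actual, predicted):
    """Fraction of predictions that exactly match a categorical label."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def rmse(actual, predicted):
    """Root mean squared error for a numerical label."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate(actual, predicted):
    """Pick the metric by label type: accuracy for text labels, RMSE for numbers."""
    if all(isinstance(a, str) for a in actual):
        return "accuracy", accuracy(actual, predicted)
    return "rmse", rmse(actual, predicted)


# Hypothetical labels and predictions.
print(evaluate(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))
print(evaluate([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Accuracy rewards exact matches only, which suits categories, while RMSE penalizes predictions in proportion to how far they miss, which suits numbers.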
Making Simple ML work for complex use cases
For the simple operations Simple ML is mostly designed for, users shouldn’t have any problems processing data and generating models quickly. However, as with many tools, new issues can arise as datasets grow.
For example, extremely large datasets can require several minutes rather than seconds for a model to be trained or predictions to be generated. The processing time may be even longer for datasets that contain text or other unstructured data.
That being said, Simple ML is still in beta and optimizations are being made regularly. The Simple ML team is open to new test users as well as algorithm suggestions, so now is the time for data scientists to learn how this tool works and how it could be incorporated into business operations.