Machine Learning in Production - Concepts you should know
To productionize machine learning, know the concepts first
Are you interested in learning about the Machine Learning side of data? Hurry 🎉 , you have reached the right place to start learning about it.
Here is a list of concepts for you to get started:
ML algorithm is a procedure that runs on data and produces a machine learning model. Some of the popular ones are Decision trees, Naive Bayes, and Linear Regression.
ML model is the ML algorithm process outcome; It often contains a statistical representation of the data ingested into the algorithm. ML model input is data, and the output is either a prediction, decision, or classification.
The training set is the data ingested into the machine learning algorithm; it trains the ML model.
The testing set is the dataset we test the ml model with. To test the ML model’s accuracy, we ingest the data into the model and measure the accuracy level of the outcome. It helps us reason about the quality of the machine learning model.
Machine Learning pipeline
The machine learning pipeline is an automation process of the machine learning workflow. It includes data transformation and correlation to fit the ML algorithm, running the algorithm to produce a model, and testing it with a test set.
ML Model interpretability is the degree to which a human can reason the machine learning model’s output. The higher the degree, the easier it is for a human to understand the model’s decision or prediction.
Data quality measures the data’s condition based on accuracy, precision, legitimacy, validity, reliability, consistency, completeness, and more. In machine learning, data quality is important for producing high-quality, non-bias machine learning models.
Data drift is unexpected and undocumented changes to the data structure, semantics. Data drift can result in corrupted data and data low quality. Lack of awareness of data drift can result in a lesser quality of ML models.
Concept drift refers to the changes in target variables. Target variables are the outcomes of the prediction process you do with machine learning models. You can detect concept drift by measuring the statistical properties of the target variables. Machine learning’s actual target variable can change over time in unforeseen ways and presents a challenge since the predictions become less accurate as time passes.
I hope it was helpful for you and gave you more clarity about the concepts.
💡 Curious to learn more?
Read here about how to create machine learning models with python.