In this guide, we’ll take a practical, concise tour through modern machine learning algorithms. While other such lists exist, they don’t really explain the practical tradeoffs of each algorithm, which we hope to do here. We’ll discuss the advantages and disadvantages of each algorithm based on our experience.
Categorizing machine learning algorithms is tricky, and there are several reasonable approaches; they can be grouped into generative/discriminative, parametric/non-parametric, supervised/unsupervised, and so on.
For example, Scikit-Learn’s documentation page groups algorithms by their learning mechanism. This produces categories such as:
- Generalized linear models
- Support vector machines
- Nearest neighbors
- Decision trees
- Neural networks
- And so on…
However, from our experience, this isn’t always the most practical way to group algorithms. That’s because for applied machine learning, you’re usually not thinking, “boy do I want to train a support vector machine today!”
Instead, you usually have an end goal in mind, such as predicting an outcome or classifying your observations.
Therefore, we want to introduce another approach to categorizing algorithms, which is by machine learning task.
No Free Lunch
In machine learning, there’s something called the “No Free Lunch” theorem. In a nutshell, it states that no one algorithm works best for every problem, and it’s especially relevant for supervised learning (i.e. predictive modeling).
For example, you can’t say that neural networks are always better than decision trees or vice-versa. There are many factors at play, such as the size and structure of your dataset.
As a result, you should try many different algorithms for your problem, while using a hold-out “test set” of data to evaluate performance and select the winner.
Of course, the algorithms you try must be appropriate for your problem, which is where picking the right machine learning task comes in. As an analogy, if you need to clean your house, you might use a vacuum, a broom, or a mop, but you wouldn’t bust out a shovel and start digging.
Machine Learning Tasks
This is Part 1 of this series. In this part, we will cover the “Big 3” machine learning tasks, which are by far the most common ones. They are:
- Regression
- Classification
- Clustering
In Part 2: Dimensionality Reduction Algorithms, we will cover:
- Feature Selection
- Feature Extraction
Two notes before continuing:
- We will not cover domain-specific adaptations, such as natural language processing.
- We will not cover every algorithm. There are too many to list, and new ones pop up all the time. However, this list will give you a representative overview of successful contemporary algorithms for each task.
1. Regression
Regression is the supervised learning task for modeling and predicting continuous, numeric variables. Examples include predicting real-estate prices, stock price movements, or student test scores.
Regression tasks are characterized by labeled datasets that have a numeric target variable. In other words, you have some “ground truth” value for each observation that you can use to supervise your algorithm.
1.1. (Regularized) Linear Regression
Linear regression is one of the most common algorithms for the regression task. In its simplest form, it attempts to fit a straight hyperplane to your dataset (i.e. a straight line when you only have 2 variables). As you might guess, it works well when there are linear relationships between the variables in your dataset.
In practice, simple linear regression is often outclassed by its regularized counterparts (LASSO, Ridge, and Elastic-Net). Regularization is a technique for penalizing large coefficients in order to avoid overfitting, and the strength of the penalty should be tuned.
- Strengths: Linear regression is straightforward to understand and explain, and it can be regularized to avoid overfitting. In addition, linear models can be updated easily with new data using stochastic gradient descent.
- Weaknesses: Linear regression performs poorly when there are non-linear relationships. It is not naturally flexible enough to capture more complex patterns, and adding the right interaction terms or polynomials can be tricky and time-consuming.
- Implementations: Python / R
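As a concrete (if simplified) sketch of the tuning described above, here’s how it might look with scikit-learn — an assumed implementation choice, on synthetic data. RidgeCV selects the penalty strength by cross-validation; LASSO and Elastic-Net follow the same pattern:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic dataset with a known linear relationship plus noise
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_coef + 0.1 * rng.randn(200)

# RidgeCV tunes the penalty strength (alpha) by cross-validation
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])
model.fit(X, y)

print(model.alpha_)       # penalty strength chosen by cross-validation
print(model.score(X, y))  # R^2 on the training data
```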
1.2. Regression Tree (Ensembles)
Regression trees (a.k.a. decision trees) learn in a hierarchical fashion by repeatedly splitting your dataset into separate branches that maximize the information gain of each split. This branching structure allows regression trees to naturally learn non-linear relationships.
Ensemble methods, such as Random Forests (RF) and Gradient Boosted Trees (GBM), combine predictions from many individual trees. We won’t go into their underlying mechanics here, but in practice, RF’s often perform very well out-of-the-box while GBM’s are harder to tune but tend to have higher performance ceilings.
- Strengths: Decision trees can learn non-linear relationships, and are fairly robust to outliers. Ensembles perform very well in practice, winning many classical (i.e. non-deep-learning) machine learning competitions.
- Weaknesses: Unconstrained, individual trees are prone to overfitting because they can keep branching until they memorize the training data. However, this can be alleviated by using ensembles.
- Implementations: Random Forest – Python / R, Gradient Boosted Tree – Python / R
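To illustrate the out-of-the-box vs. tuning tradeoff, here’s a minimal comparison using scikit-learn’s implementations (an assumed choice) on a synthetic non-linear dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# A non-linear target that a straight line cannot fit
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(300)

# Random forest: strong results with default settings
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting: more knobs to tune (learning rate, tree depth)
gbm = GradientBoostingRegressor(learning_rate=0.1, max_depth=3, random_state=0).fit(X, y)

print(rf.score(X, y), gbm.score(X, y))
```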
1.3. Deep Learning
Deep learning refers to multi-layer neural networks that can learn extremely complex patterns. They use “hidden layers” between inputs and outputs in order to model intermediary representations of the data that other algorithms cannot easily learn.
They have several important mechanisms, such as convolutions and drop-out, that allow them to efficiently learn from high-dimensional data. However, deep learning still requires much more data to train compared to other algorithms because the models have orders of magnitude more parameters to estimate.
- Strengths: Deep learning is the current state-of-the-art for certain domains, such as computer vision and speech recognition. Deep neural networks perform very well on image, audio, and text data, and they can be easily updated with new data using batch propagation. Their architectures (i.e. number and structure of layers) can be adapted to many types of problems, and their hidden layers reduce the need for feature engineering.
- Weaknesses: Deep learning algorithms are usually not suitable as general-purpose algorithms because they require a very large amount of data. In fact, they are usually outperformed by tree ensembles for classical machine learning problems. In addition, they are computationally intensive to train, and they require much more expertise to tune (i.e. set the architecture and hyperparameters).
- Implementations: Python / R
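Real deep learning work happens in dedicated frameworks, but the core idea above — hidden layers learning intermediate representations — can be sketched with scikit-learn’s small MLPRegressor (a shallow stand-in we’ve assumed for illustration, not a deep learning library):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0])

# Two hidden layers of 32 units; the architecture is itself a hyperparameter
net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                   solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))
```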
1.4. Honorable Mention: Nearest Neighbors
Nearest neighbors algorithms are “instance-based,” which means that they save each training observation. They then make predictions for new observations by searching for the most similar training observations and pooling their values.
These algorithms are memory-intensive, perform poorly for high-dimensional data, and require a meaningful distance function to calculate similarity. In practice, training a regularized regression or a tree ensemble is almost always a better use of your time.
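For completeness, the instance-based idea looks like this with scikit-learn’s KNeighborsRegressor (an assumed implementation, on synthetic data) — each prediction pools the values of the k most similar training points:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + rng.randn(100)

# Predict by averaging the 5 closest training observations;
# the prediction should land near 2.0 * 5.0 = 10, plus noise
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[5.0]]))
```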
2. Classification
Classification is the supervised learning task for modeling and predicting categorical variables. Examples include predicting employee churn, email spam, financial fraud, or student letter grades.
As you’ll see, many regression algorithms have classification counterparts. The algorithms are adapted to predict a class (or class probabilities) instead of real numbers.
2.1. (Regularized) Logistic Regression
Logistic regression is the classification counterpart to linear regression. Predictions are mapped to be between 0 and 1 through the logistic function, which means that predictions can be interpreted as class probabilities.
The models themselves are still “linear,” so they work well when your classes are linearly separable (i.e. they can be separated by a single decision surface). Logistic regression can also be regularized by penalizing coefficients with a tunable penalty strength.
- Strengths: Outputs have a nice probabilistic interpretation, and the algorithm can be regularized to avoid overfitting. Logistic models can be updated easily with new data using stochastic gradient descent.
- Weaknesses: Logistic regression tends to underperform when there are multiple or non-linear decision boundaries. It is not flexible enough to naturally capture more complex relationships.
- Implementations: Python / R
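A short sketch of those probabilistic outputs, using scikit-learn (an assumed choice) on synthetic, linearly separable data. The C parameter is the inverse of the regularization strength:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable classes

# Smaller C means a stronger penalty on the coefficients
clf = LogisticRegression(C=1.0).fit(X, y)

# predict_proba maps predictions through the logistic function,
# returning [P(class 0), P(class 1)] for each observation
print(clf.predict_proba([[2.0, 2.0]]))
```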
2.2. Classification Tree (Ensembles)
Classification trees are the classification counterparts to regression trees. They are both commonly referred to as “decision trees” or by the umbrella term “classification and regression trees (CART).”
- Strengths: As with regression, classification tree ensembles also perform very well in practice. They are robust to outliers, scalable, and able to naturally model non-linear decision boundaries thanks to their hierarchical structure.
- Weaknesses: Unconstrained, individual trees are prone to overfitting, but this can be alleviated by ensemble methods.
- Implementations: Random Forest – Python / R, Gradient Boosted Tree – Python / R
2.3. Deep Learning
To continue the trend, deep learning is also easily adapted to classification problems. In fact, classification is often the more common use of deep learning, such as in image classification.
- Strengths: Deep learning performs very well when classifying audio, text, and image data.
- Weaknesses: As with regression, deep neural networks require very large amounts of data to train, so they are not treated as general-purpose algorithms.
- Implementations: Python / R
2.4. Support Vector Machines
Support vector machines (SVM) use a mechanism called kernels, which essentially calculate the distance between two observations. The SVM algorithm then finds a decision boundary that maximizes the distance between the closest members of separate classes.
For example, an SVM with a linear kernel is similar to logistic regression. Therefore, in practice, the benefit of SVM’s typically comes from using non-linear kernels to model non-linear decision boundaries.
- Strengths: SVM’s can model non-linear decision boundaries, and there are many kernels to choose from. They are also fairly robust against overfitting, especially in high-dimensional space.
- Weaknesses: However, SVM’s are memory intensive, trickier to tune due to the importance of picking the right kernel, and don’t scale well to larger datasets. Currently in the industry, random forests are usually preferred over SVM’s.
- Implementations: Python / R
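To see the kernel tradeoff in action, here’s a small comparison on two concentric rings — a dataset no straight line can separate (scikit-learn assumed, data synthetic):

```python
import numpy as np
from sklearn.svm import SVC

# Two concentric rings: class 0 at radius 1, class 1 at radius 3
rng = np.random.RandomState(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.where(np.arange(200) % 2 == 0, 1.0, 3.0)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = (radii > 2).astype(int)

linear = SVC(kernel="linear").fit(X, y)  # no straight line works here
rbf = SVC(kernel="rbf").fit(X, y)        # non-linear decision boundary
print(linear.score(X, y), rbf.score(X, y))
```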
2.5. Naive Bayes
Naive Bayes (NB) is a very simple algorithm based around conditional probability and counting. Essentially, your model is actually a probability table that gets updated through your training data. To predict a new observation, you’d simply “look up” the class probabilities in your “probability table” based on its feature values.
It’s called “naive” because its core assumption of conditional independence (i.e. all input features are independent from one another) rarely holds true in the real world.
- Strengths: Even though the conditional independence assumption rarely holds true, NB models actually perform surprisingly well in practice, especially for how simple they are. They are easy to implement and can scale with your dataset.
- Weaknesses: Due to their sheer simplicity, NB models are often beaten by models properly trained and tuned using the previous algorithms listed.
- Implementations: Python / R
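The “probability table” intuition can be seen with scikit-learn’s GaussianNB (an assumed variant that models each feature as an independent Gaussian per class):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated Gaussian classes
rng = np.random.RandomState(0)
X0 = rng.randn(100, 2) + [-2, -2]
X1 = rng.randn(100, 2) + [2, 2]
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

nb = GaussianNB().fit(X, y)
print(nb.score(X, y))
print(nb.predict_proba([[2.0, 2.0]]))  # "look up" the class probabilities
```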
3. Clustering
Clustering is an unsupervised learning task for finding natural groupings of observations (i.e. clusters) based on the inherent structure within your dataset. Examples include customer segmentation, grouping similar items in e-commerce, and social network analysis.
Because clustering is unsupervised (i.e. there’s no “right answer”), data visualization is usually used to evaluate results. If there is a “right answer” (i.e. you have pre-labeled clusters in your training set), then classification algorithms are typically more appropriate.
3.1. K-Means
K-Means is a general purpose algorithm that makes clusters based on geometric distances (i.e. distance on a coordinate plane) between points. The clusters are grouped around centroids, causing them to be globular and have similar sizes.
This is our recommended algorithm for beginners because it’s simple, yet flexible enough to get reasonable results for most problems.
- Strengths: K-Means is hands-down the most popular clustering algorithm because it’s fast, simple, and surprisingly flexible if you pre-process your data and engineer useful features.
- Weaknesses: The user must specify the number of clusters, which won’t always be easy to do. In addition, if the true underlying clusters in your data are not globular, then K-Means will produce poor clusters.
- Implementations: Python / R
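Here’s a minimal run on synthetic globular blobs (scikit-learn assumed). Note that the number of clusters, k, must be chosen up front:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three globular, similarly sized blobs
rng = np.random.RandomState(0)
centers = [[0, 0], [5, 5], [0, 5]]
X = np.vstack([rng.randn(50, 2) * 0.5 + c for c in centers])

# The user must specify the number of clusters (k) in advance
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```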
3.2. Affinity Propagation
Affinity Propagation is a relatively new clustering technique that makes clusters based on graph distances between points. The clusters tend to be smaller and have uneven sizes.
- Strengths: The user doesn’t need to specify the number of clusters (but does need to specify ‘sample preference’ and ‘damping’ hyperparameters).
- Weaknesses: The main disadvantage of Affinity Propagation is that it’s quite slow and memory-heavy, making it difficult to scale to larger datasets. In addition, it assumes the true underlying clusters are globular.
- Implementations: Python / R
3.3. Hierarchical / Agglomerative
Hierarchical clustering, a.k.a. agglomerative clustering, is a suite of algorithms based on the same idea: (1) Start with each point in its own cluster. (2) For each cluster, merge it with another based on some criterion. (3) Repeat until only one cluster remains and you are left with a hierarchy of clusters.
- Strengths: The main advantage of hierarchical clustering is that the clusters are not assumed to be globular. In addition, it scales well to larger datasets.
- Weaknesses: Much like K-Means, the user must choose the number of clusters (i.e. the level of the hierarchy to “keep” after the algorithm completes).
- Implementations: Python / R
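The start-merge-cut loop above can be sketched with SciPy’s hierarchical clustering routines (an assumed implementation choice) on synthetic data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two well-separated blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2) * 0.3 + c for c in [[0, 0], [5, 5]]])

# Steps (1)-(3): build the full merge hierarchy (Ward's criterion),
# then "cut" it at the level that keeps 2 clusters
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(sorted(set(labels)))
```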
3.4. DBSCAN
DBSCAN is a density-based algorithm that makes clusters for dense regions of points. There’s also a recent development called HDBSCAN that allows clusters of varying density.
- Strengths: DBSCAN does not assume globular clusters, and its performance is scalable. In addition, it doesn’t require every point to be assigned to a cluster, reducing the noise of the clusters (this may be a weakness, depending on your use case).
- Weaknesses: The user must tune the hyperparameters ‘epsilon’ and ‘min_samples,’ which define the density of clusters. DBSCAN is quite sensitive to these hyperparameters.
- Implementations: Python / R
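A quick demonstration of the density idea (scikit-learn assumed, synthetic data) — points in dense regions form clusters, while isolated points are labeled -1 (noise):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus three far-away stray points
rng = np.random.RandomState(0)
X = np.vstack([
    rng.randn(50, 2) * 0.2 + [0, 0],
    rng.randn(50, 2) * 0.2 + [3, 3],
    [[10, 10], [-10, 10], [10, -10]],
])

# 'eps' and 'min_samples' together define what counts as "dense"
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
print(sorted(set(db.labels_)))  # -1 marks noise points
```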
We’ve just taken a whirlwind tour through modern algorithms for the “Big 3” machine learning tasks: Regression, Classification, and Clustering.
In Part 2: Dimensionality Reduction Algorithms, we will look at algorithms for Feature Selection and Feature Extraction.
However, we want to leave you with a few words of advice based on our experience:
- First… practice, practice, practice. Reading about algorithms can help you find your footing at the start, but true mastery comes with practice. As you work through projects and/or competitions, you’ll develop practical intuition, which unlocks the ability to pick up almost any algorithm and apply it effectively.
- Second… master the fundamentals. There are dozens of algorithms we couldn’t list here, and some of them can be quite effective in specific situations. However, almost all of them are some adaptation of the algorithms on this list, which will provide you with a strong foundation for applied machine learning.
- Finally, remember that better data beats fancier algorithms. In applied machine learning, algorithms are commodities because you can easily switch them in and out depending on the problem. However, effective exploratory analysis, data cleaning, and feature engineering can significantly boost your results.