Decision Tree in Machine Learning

What is a Decision Tree?

A decision tree is a machine learning technique that can be used for regression and classification. A decision tree gets its name because the algorithm divides the dataset into smaller and smaller pieces until the data has been broken down into single instances, which are then categorized. It’s possible to visualize the outcomes of the algorithm by imagining a tree with many branches.

  • In a decision tree, internal nodes represent tests on features, leaf nodes represent class labels, and branches represent the conjunctions of features that lead to those class labels. Additionally, there are three different decision tree types in machine learning based on the type of question.

To learn more about decision trees, let’s take a closer look at how they work. Your machine learning initiatives will be more successful if you have a deeper understanding of how decision trees work and their applications.

The Architecture of a Decision Tree

A decision tree algorithm in machine learning is quite similar to a flowchart in that it shows a series of options and their consequences. To use a flowchart, you start at the root of the chart and then go to one of the next possible nodes depending on how you answer the filtering criteria of that starting node. All of these steps must be carried out repeatedly until a conclusion is reached.

  • Decision trees in machine learning use an algorithm to break down a large dataset into individual data points based on several criteria.

Every internal node in a decision tree is a test or filtering criterion, so they all work in the same way. The “leaves” are the nodes at the outer edge of the tree; they hold the labels assigned to the data points that reach them. The branches connect each internal node to the next node and correspond to features or conjunctions of features.

When using a decision tree, the dataset is divided into individual data points based on various criteria. Each division separates the data into segments according to the values of a variable or attribute. For example, a decision tree could use the input features to decide whether a dog or a cat is being described.
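The structure of internal nodes, branches, and leaves described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `predict` function, and the features `weight_kg` and `barks` with their thresholds are all made up for this example and do not come from any particular library or dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # take the left branch if value < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on leaf nodes


def predict(node: Node, sample: dict) -> str:
    """Start at the root and follow one branch per test until a leaf is reached."""
    while node.label is None:
        if sample[node.feature] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Toy tree with invented features and thresholds for a dog-vs-cat example.
tree = Node(
    feature="weight_kg", threshold=6.0,
    left=Node(feature="barks", threshold=0.5,
              left=Node(label="cat"), right=Node(label="dog")),
    right=Node(label="dog"),
)

print(predict(tree, {"weight_kg": 4.0, "barks": 0}))   # light and does not bark
print(predict(tree, {"weight_kg": 25.0, "barks": 1}))  # heavy
```

Each call walks the same way a reader would follow a flowchart: answer the test at the current node, take the matching branch, and stop at a leaf.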

Which algorithms divide the data into branches and leaves? While there are a variety of ways to split a tree, “recursive binary split” is arguably the most commonly used method. It starts at the root with the whole dataset and considers every feature to determine which alternative splits can be made. The algorithm estimates how much accuracy each candidate split would cost, and then makes the split that sacrifices as little accuracy as possible. The resulting sub-groups are then split again by repeating the same process.
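A minimal sketch of this greedy procedure is shown below. It makes several assumptions that are not in the text: features are numeric, each split is a threshold test of the form `value < threshold`, the cost is a simple count of samples that a majority-vote leaf would misclassify, and recursion stops at a hypothetical `max_depth`. The function names and the toy data are invented for illustration.

```python
from collections import Counter


def misclassified(labels):
    """Stand-in cost: samples a majority-vote leaf would get wrong."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_split(rows, labels, cost=misclassified):
    """Try every (feature, threshold) pair and keep the cheapest one."""
    best = None  # (cost, feature_index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] < threshold]
            right = [y for row, y in zip(rows, labels) if row[f] >= threshold]
            if not left or not right:
                continue  # everything on one side is not a real split
            c = cost(left) + cost(right)
            if best is None or c < best[0]:
                best = (c, f, threshold)
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Recursively split until a group is pure, unsplittable, or too deep."""
    split = best_split(rows, labels) if depth < max_depth else None
    if len(set(labels)) == 1 or split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left],
                      depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth),
    }


# Toy data: [weight_kg, height_cm] with invented values.
rows = [[4.0, 25.0], [5.0, 23.0], [20.0, 50.0], [30.0, 60.0]]
labels = ["cat", "cat", "dog", "dog"]
print(build(rows, labels))
```

The search is greedy: each split is chosen because it is cheapest at that step, without looking ahead to later splits.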


A cost function is used to calculate the cost of a split. Regression and classification tasks use different cost functions. Both cost functions seek the splits whose resulting groups have the most similar response values, that is, the most homogeneous groups. This matches the intuition that test data from a specific class should follow the same paths through the tree.

  • Cost function for regression: sum((y - prediction)^2)

The prediction for a group of data points is the mean of the responses of the training data in that group. The cost function is applied to all of the data points to determine the cost of every candidate split, and the split with the lowest cost is chosen.
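The regression cost can be worked through on a small example. The sketch below assumes the cost of a split is the sum of the squared errors of its two groups; the helper names `sse` and `split_cost` and the response values are made up for illustration.

```python
def sse(values):
    """Sum of squared errors when the group's mean is used as the prediction."""
    if not values:
        return 0.0
    prediction = sum(values) / len(values)
    return sum((y - prediction) ** 2 for y in values)


def split_cost(left, right):
    """Cost of a split: squared error of both resulting groups added together."""
    return sse(left) + sse(right)


# Invented responses: three small values and three large ones.
good = split_cost([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
bad = split_cost([1.0, 2.0], [3.0, 10.0, 11.0, 12.0])
print(good, bad)
```

The first split keeps similar responses together, so each group's mean is a close prediction and the cost is low. The second split mixes a small value into the group of large ones, which raises the cost, so the first split would be chosen.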

  • Cost function for classification: G = sum(pk * (1 - pk))

The Gini score is an evaluation of a split’s efficacy based on how many occurrences of different classes are in the groups formed as a result of the split. In other words, it measures the degree to which the classes are intermingled within the groups after the split. Here “pk” is the proportion of inputs of class k in a group. When every group formed by the split contains only inputs from one class, the split is said to be ideal. For an ideal split, each value of “pk” is either 0 or 1, and G is zero. In the case of binary classification, the worst-case split is one with a 50-50 representation of the classes in each group. The “pk” value would be 0.5 for both classes in this scenario, and G would be 0.5 as well.
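The two cases described above, an ideal split and a 50-50 split, can be checked numerically. In the sketch below, `gini` computes G for one group. Combining the groups by weighting each one by its share of the samples in `gini_split` is a common convention and an assumption here, not something stated in the text. The labels are invented.

```python
from collections import Counter


def gini(labels):
    """G = sum(pk * (1 - pk)), with pk the proportion of class k in the group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((count / n) * (1 - count / n) for count in Counter(labels).values())


def gini_split(groups):
    """Gini score of a split: per-group G weighted by group size (assumed convention)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


ideal = gini_split([["cat", "cat"], ["dog", "dog"]])  # each group holds one class
worst = gini_split([["cat", "dog"], ["cat", "dog"]])  # each group is 50-50
print(ideal, worst)
```

The ideal split scores 0 and the 50-50 split scores 0.5, so a tree-building algorithm that minimizes this cost would prefer the first.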

Decision trees can be very helpful when classification is required but computing time is a major limitation. They can also be used to find out which properties in a given dataset are most predictive. The rules that many machine learning algorithms use to classify data are difficult to read; decision trees, on the other hand, produce interpretable rules. Because they can handle both categorical and continuous variables, decision trees require less preprocessing than algorithms that can only process one of these variable kinds.