Logistic Regression Tutorial
Long after its initial discovery, the Logistic Regression algorithm still finds its way into the Data Scientist's toolbox. Arguably one of the most efficient machine learning algorithms that also provides probabilistic output, Logistic Regression can crack many classification problems as long as the data has linear relationships.
Logistic Regression is a linear model that's very similar to Linear Regression but used only for classification; when continuous predictions are needed, Ordinary Least Squares (OLS) models can be a great alternative.
Ideally, Logistic Regression can be used when:
- Data is linear
- Probabilistic results are favored/needed
- Prediction class is binary: 1 vs 0, True vs False, 2 vs 4, default vs no default, malignant vs benign, etc.
- More insight is needed regarding the significance of features in the dataset
Why Use the Logistic Regression Algorithm?
Regression is one of the oldest algorithms used in Machine Learning today, and Logistic Regression is certainly quite old as well. The fact that it has lasted so long and is still actively and beneficially utilized across industry, science and academia suggests that it's one of the finest and most practically useful algorithms out there. But what are some of its instantly recognizable benefits? Here is a short summary of our Logistic Regression pros article:
- Based on fundamental regression concepts: a cost function and gradient descent
- Fast and scalable
- Offers probability reports
- Lots of optimization opportunities through hyperparameters
- Makes relatively few assumptions about the data
- Accurate prediction capability
Logistic Regression Pros
Logistic Regression appeals to scientists, entrepreneurs, financiers, data scientists, analysts, teachers, and practically anyone else, particularly for its fast, scalable and probability-oriented prediction capabilities.
Additionally, Logistic Regression makes very accurate predictions, and its regularization-related tweaks give a tremendous range of possibilities for managing fitting-related issues.
See full list of: Logistic Regression Advantages
Logistic Regression Cons
Logistic Regression also requires dealing with concepts such as regularization and normalization. On top of that, the dataset will (generally) need to have linear relationships for Logistic Regression to be an eligible algorithm to work with.
Also, Logistic Regression can only make classification predictions, which restricts its use cases, but there are plenty of other linear models for regression problems.
See full list of: Logistic Regression Disadvantages
3- Key Industries
- Cyber Security
Who Found Logistic Regression?
4- Logistic Regression History
Belgian mathematician Pierre François Verhulst introduced the logistic function, the foundation of Logistic Regression, in 1838. He then published a refined treatment of it in 1845.
You can see a more detailed article about the history of Logistic Regression below:
How fast is Logistic Regression?
5- Logistic Regression Complexity
A great advantage of regression algorithms is their optimization opportunities and usually favorable time complexity, hence good scalability. The situation is no different for Logistic Regression: it has linear worst-case complexity in the number of samples, which can be shown in Big O notation as O(N) (more precisely, roughly O(n·d) per optimization pass for n samples and d features).
You can also refer to our Logistic Regression runtime performance tests to get an idea of the algorithm's scalability and runtime performance in a practical sense. Here is a little summary of the results; you can find the complete list, as well as different time-complexity scenarios, on the Logistic Regression Complexity page below.
- Logistic Regression (1 million rows, 2 features): 0.98 seconds
- Logistic Regression (1 million rows, 50+ features): 4.16 seconds
Tests were done using an i7 8th Gen processor and 16GB RAM.
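As a rough sketch of how such a timing test might be run on your own machine (the dataset here is synthetic and deliberately smaller than the 1-million-row tests above; sizes and results are illustrative assumptions, not a reproduction of the benchmark):

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic dataset: 100k rows, 2 informative features
X, y = make_classification(n_samples=100_000, n_features=2,
                           n_informative=2, n_redundant=0,
                           random_state=0)

model = LogisticRegression()

# Time only the training (fit) step
start = time.perf_counter()
model.fit(X, y)
elapsed = time.perf_counter() - start

print(f"Training time: {elapsed:.2f} seconds")
```

Scaling the row and feature counts up lets you observe the roughly linear growth in training time yourself.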
You can read a more detailed tutorial about Logistic Regression Complexity and Runtime Performance in the article below:
How to use Logistic Regression?
6- Scikit-Learn Logistic Regression Implementation
Logistic Regression is a fast classification algorithm that scales well and produces probabilistic reports. A Logistic Regression model instance can easily be created using Scikit-Learn's linear_model module, then trained and used to predict outcomes.
You can see the basics of a Logistic Regression Classifier model in the tutorial below:
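As a minimal sketch of those basics (using Scikit-Learn's bundled breast cancer dataset, a binary malignant-vs-benign problem; the `max_iter` bump is an assumption to help the default solver converge on unscaled data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binary classification dataset: malignant vs benign tumors
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression(max_iter=5000)  # raised max_iter so the solver converges
lr.fit(X_train, y_train)

print(lr.score(X_test, y_test))       # accuracy on held-out data
print(lr.predict_proba(X_test[:3]))   # class probabilities for 3 samples
```

Note `predict_proba`: this is the probabilistic output mentioned above, which many classifiers do not offer.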
How can I tune Logistic Regression?
7- Logistic Regression Optimization
Like other Scikit-Learn implementations, the Logistic Regression model has plenty of hyperparameters that can be tweaked and optimized.
Regularization is a smart technique used to optimize machine learning models to reduce overfitting or underfitting and find a more optimal balance in the algorithm’s fitting based on the problem at hand.
Regularization being such a major concept in Logistic Regression, a couple of important related parameters are available for optimization: C (the inverse of regularization strength) and penalty (the regularization norm). Solver selection is another important concept in Logistic Regression: the solver parameter defines the optimization algorithm used to minimize Logistic Regression's cost function.
You can find a comprehensive tutorial about Logistic Regression hyperparameters and their optimization techniques below:
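One common way to search these hyperparameters is a cross-validated grid search. The sketch below is a minimal example (the grid values are illustrative assumptions; the liblinear solver is chosen here because it supports both l1 and l2 penalties):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate values for regularization strength and penalty norm
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty": ["l1", "l2"],
}

# 5-fold cross-validated search over the grid
grid = GridSearchCV(LogisticRegression(solver="liblinear"),
                    param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)   # best C / penalty combination found
print(grid.best_score_)    # its mean cross-validated accuracy
```

Smaller C means stronger regularization, so the search effectively tunes how tightly the model is allowed to fit the training data.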
Is there a Logistic Regression Implementation Example?
8- Logistic Regression Example
An example can be the best means to truly understand a Logistic Regression implementation. We prepared a Logistic Regression example where you can learn the basics of Logistic Regression models as well as finer details such as log-loss visualization, decision boundaries and optimization of the C parameter.
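As a quick taste of one of those details, log loss can be computed directly from predicted probabilities. This small sketch uses made-up labels and probabilities purely for illustration:

```python
from sklearn.metrics import log_loss

# True binary labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# log_loss averages -[y*log(p) + (1-y)*log(1-p)] over all samples,
# penalizing confident wrong predictions heavily
print(log_loss(y_true, y_prob))
```

Lower values are better; a perfectly confident and correct classifier would score a log loss of 0.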
Pipeline For Non-linear Logistic Regression Applications
Please note that Logistic Regression can also be applied to non-linear data using polynomial transformation.
You can use a pipeline, which merges multiple steps and makes machine learning models more easily manageable and practical by reducing them to a single object. Here is an example utilization:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

poly = PolynomialFeatures(degree=2)   # adds polynomial feature terms
lr = LogisticRegression(max_iter=5000)

# X_train, y_train, X_test, y_test: an existing train/test split
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)