Root Mean Squared Error & Root Mean Squared Logarithmic Error
Root Mean Squared Error (RMSE) and Root Mean Squared Logarithmic Error (RMSLE) are two metrics for measuring the difference between the values predicted by a machine learning model and the actual values.
To understand these metrics and their differences, it helps to first know what Mean Squared Error (MSE) means. MSE incorporates both the variance and the bias of the predictor. RMSE is the square root of the mean of the squared differences between the predicted and actual values. It is used to evaluate the performance of a regression model: it measures how well the model predicts the target variable, and it is sensitive to the scale of that variable.
Note: the square root of the variance is the standard deviation.
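As a quick sanity check of that note, here is a small sketch (using illustrative numbers, not data from this article) showing that when the residuals have zero mean, the RMSE equals the standard deviation of the residuals:

```python
import numpy as np

targets = np.array([10.0, 20.0, 30.0, 40.0])
predictions = targets + np.array([1.0, -1.0, 1.0, -1.0])  # zero-mean residuals

mse = np.mean((predictions - targets) ** 2)     # mean squared error
rmse = np.sqrt(mse)                             # root mean squared error
residual_std = np.std(predictions - targets)    # std of the residuals

print(rmse, residual_std)  # both 1.0 here
```

When the residuals have a nonzero mean (i.e., the model is biased), RMSE exceeds the residual standard deviation, which is the variance/bias decomposition mentioned above.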
RMSLE is similar to RMSE, but it is computed on the logarithms of the predicted and actual values. It is useful when the target variable is skewed or spans a large range of values, and it is less sensitive to the scale of the target variable than RMSE. In effect, what changes is the scale on which the errors are measured. RMSLE is typically used when you don't want to penalize large absolute differences between predicted and actual values when both are large numbers, because on the log scale such differences amount to small relative errors.
- If both predicted and actual values are small: RMSE and RMSLE are close.
- If either the predicted or the actual value is large: RMSE > RMSLE.
- If both predicted and actual values are large: RMSE > RMSLE (RMSLE becomes almost negligible).
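To make these bullet points concrete, here is a small sketch (values chosen purely for illustration) comparing the two metrics at different scales. It also shows a related property of RMSLE worth knowing: because errors are measured on the log scale, it penalizes under-prediction more heavily than over-prediction of the same absolute size.

```python
import numpy as np

def rmse(predictions, targets):
    predictions, targets = np.asarray(predictions, float), np.asarray(targets, float)
    return np.sqrt(np.mean((predictions - targets) ** 2))

def rmsle(predictions, targets):
    predictions, targets = np.asarray(predictions, float), np.asarray(targets, float)
    return np.sqrt(np.mean((np.log(predictions + 1) - np.log(targets + 1)) ** 2))

# The same 10% relative error at two very different scales:
# RMSE grows with the scale, RMSLE stays roughly constant.
print(rmse([11], [10]), rmsle([11], [10]))          # ~1.0, ~0.087
print(rmse([1100], [1000]), rmsle([1100], [1000]))  # ~100.0, ~0.095

# Under-prediction by 400 costs more RMSLE than over-prediction by 400.
print(rmsle([600], [1000]))   # ~0.510
print(rmsle([1400], [1000]))  # ~0.336
```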
Here is some example code for calculating RMSE and RMSLE in Python:
import numpy as np

def rmse(predictions, targets):
    """Calculate the root mean squared error between predictions and targets."""
    predictions, targets = np.asarray(predictions, dtype=float), np.asarray(targets, dtype=float)
    return np.sqrt(np.mean((predictions - targets) ** 2))

def rmsle(predictions, targets):
    """Calculate the root mean squared logarithmic error between predictions and targets."""
    predictions, targets = np.asarray(predictions, dtype=float), np.asarray(targets, dtype=float)
    return np.sqrt(np.mean((np.log(predictions + 1) - np.log(targets + 1)) ** 2))
To use these functions, you can pass in the predicted values and the actual values as arguments. For example:
predictions = [10, 20, 30, 40]
targets = [9, 19, 29, 39]
rmse_error = rmse(predictions, targets)
print(f'RMSE: {rmse_error:.4f}')
rmsle_error = rmsle(predictions, targets)
print(f'RMSLE: {rmsle_error:.4f}')
This will output the following:
RMSE: 1.0000
RMSLE: 0.0573
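One small implementation note: NumPy provides np.log1p(x), which computes log(1 + x) and is more accurate than np.log(x + 1) when x is very close to zero, so it is a natural drop-in for the RMSLE formula above:

```python
import numpy as np

# log1p(x) = log(1 + x), computed accurately even for tiny x.
tiny = 1e-15
print(np.log(1 + tiny))  # suffers from floating-point rounding in 1 + tiny
print(np.log1p(tiny))    # accurate: ~1e-15

# Drop-in replacement inside rmsle:
# np.sqrt(np.mean((np.log1p(predictions) - np.log1p(targets)) ** 2))
```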
Here is another example that calculates RMSE and RMSLE, visualizes the results, and compares the performance of two different models:
import numpy as np
import matplotlib.pyplot as plt
def rmse(predictions, targets):
    """Calculate the root mean squared error between predictions and targets."""
    return np.sqrt(np.mean((predictions - targets) ** 2))

def rmsle(predictions, targets):
    """Calculate the root mean squared logarithmic error between predictions and targets."""
    return np.sqrt(np.mean((np.log(predictions + 1) - np.log(targets + 1)) ** 2))
# Generate some example data (seed fixed for reproducibility)
np.random.seed(0)
predictions_1 = np.random.normal(100, 10, 1000)
targets = np.random.normal(100, 10, 1000)
predictions_2 = np.random.normal(100, 5, 1000)
# Calculate the RMSE and RMSLE for both models
rmse_1 = rmse(predictions_1, targets)
rmse_2 = rmse(predictions_2, targets)
rmsle_1 = rmsle(predictions_1, targets)
rmsle_2 = rmsle(predictions_2, targets)
# Visualize the results
x = np.arange(2)
errors = [rmse_1, rmse_2]
plt.bar(x, errors)
plt.xticks(x, ['Model 1', 'Model 2'])
plt.ylabel('RMSE')
plt.title('RMSE Comparison')
plt.show()
errors = [rmsle_1, rmsle_2]
plt.bar(x, errors)
plt.xticks(x, ['Model 1', 'Model 2'])
plt.ylabel('RMSLE')
plt.title('RMSLE Comparison')
plt.show()
This code will generate two bar plots, one for RMSE and one for RMSLE, comparing the performance of the two models. A lower value on the y-axis indicates a better-performing model.
Here is another example, this time comparing two sets of predictions against the true values with regression plots:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Generate some fake data for the plot (seed fixed for reproducibility)
np.random.seed(42)
true = np.random.normal(loc=10, scale=2, size=100)
pred1 = true + np.random.normal(loc=0, scale=1, size=100)
pred2 = true + np.random.normal(loc=0, scale=3, size=100)
# Calculate the RMSE and RMSLE for each prediction
rmse1 = np.sqrt(np.mean((pred1 - true) ** 2))
rmsle1 = np.sqrt(np.mean((np.log(pred1 + 1) - np.log(true + 1)) ** 2))
rmse2 = np.sqrt(np.mean((pred2 - true) ** 2))
rmsle2 = np.sqrt(np.mean((np.log(pred2 + 1) - np.log(true + 1)) ** 2))
# Create a dataframe with the true values and the two predictions
df = pd.DataFrame({'true': true, 'pred1': pred1, 'pred2': pred2})
# Use seaborn to create a scatterplot with a regression line for the first prediction
sns.lmplot(x='true', y='pred1', data=df, scatter_kws={'alpha': 0.5})
# Add a text label with the RMSE and RMSLE values for this prediction
plt.text(x=true.min(), y=12, s='RMSE: {:.2f}\nRMSLE: {:.2f}'.format(rmse1, rmsle1), fontsize=12)
# Use seaborn to create a scatterplot with a regression line for the second prediction
sns.lmplot(x='true', y='pred2', data=df, scatter_kws={'alpha': 0.5})
# Add a text label with the RMSE and RMSLE values for this prediction
plt.text(x=true.min(), y=10, s='RMSE: {:.2f}\nRMSLE: {:.2f}'.format(rmse2, rmsle2), fontsize=12)
plt.show()
This code generates two scatterplots, one per prediction, each with a regression line. The RMSE and RMSLE values for each prediction are displayed as text labels on the plots.
You can customize the plots further using the various options available in seaborn and matplotlib, such as changing the colors, markers, and formatting of the plot elements.
If you find this helpful, feel free to share it. You can also drop a comment or ping me on LinkedIn if you have any doubts or questions.