Comparison between Sigmoid and Softmax Activation Function with Python

Created date: Mar 9, 2024 09:16 AM

Difference between Sigmoid and Softmax

Comparison between Sigmoid and Softmax Activation Function with Python (youtube.com)

A dialogue between Alex, a Machine Learning Engineer, and Jordan, a Product Manager, on the use of activation functions in predicting loan defaults.

Jordan: Alex, I'm trying to understand how we're using neural networks for our loan default prediction model. Specifically, what's this about using different activation functions?

Alex: Sure, Jordan. In our neural network, an activation function transforms each neuron's weighted input into its output, which determines how strongly that neuron's signal is passed on. It's like deciding how relevant a piece of information is for the prediction.

Jordan: Okay, and what’s the role of the Sigmoid function here?

Alex: The Sigmoid function is perfect when we’re making a binary decision. In the context of loan defaults, it helps us decide between two classes: will default or will not default. It outputs a value between 0 and 1, which we can interpret as a probability.
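As a minimal sketch of what Alex describes, the snippet below maps a few raw scores to default probabilities and applies a cutoff. The logit values and the 0.5 threshold are assumptions chosen for illustration, not values from the model discussed here.

```python
import numpy as np

def sigmoid(z):
    """Map a raw score (logit) to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for three loan applications (illustrative values only)
logits = np.array([-2.0, 0.0, 1.5])
default_prob = sigmoid(logits)

# Assumed decision threshold of 0.5; in practice this would be tuned
will_default = default_prob >= 0.5

for z, p, flag in zip(logits, default_prob, will_default):
    print(f"logit={z:+.1f}  P(default)={p:.3f}  predicted default={flag}")
```

A logit of 0 maps to exactly 0.5, so negative scores lean toward "will not default" and positive scores toward "will default".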

Jordan: Got it. And the Softmax function?

Alex: Softmax is used when we have more than two classes. Although for loan defaults we generally have a yes or no decision, if we had multiple levels of risk we wanted to classify, like 'low', 'medium', or 'high', Softmax would be suitable as it gives a probability distribution across those classes.
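To make the multi-class case concrete, here is a small sketch that turns three raw scores into a probability distribution over the risk levels Alex mentions. The logits are made-up values, assumed only for the example.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # shifting does not change the result
    e = np.exp(shifted)
    return e / e.sum()

risk_levels = ["low", "medium", "high"]

# Hypothetical output-layer scores for one application (illustrative values only)
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

for level, p in zip(risk_levels, probs):
    print(f"P({level} risk) = {p:.3f}")

# The predicted class is the one with the highest probability
predicted = risk_levels[int(np.argmax(probs))]
print(f"Predicted risk level: {predicted}")
```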

Jordan: That makes sense. But what do you mean by using them in different layers?

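Alex: In a neural network, we have an input layer, hidden layers, and an output layer. Sigmoid is typically used in the final layer for binary outcomes like ours, while Softmax is typically used in the final layer for multi-class problems. Hidden layers need activation functions too, since that non-linearity is what lets the network model complex relationships. Sigmoid can serve there, although ReLU is the more common choice today, and Softmax is rarely used in hidden layers outside of attention mechanisms.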
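The layer structure Alex outlines can be sketched as a tiny forward pass. Everything here is hypothetical: the three input features, the layer sizes, and the random weights stand in for a trained model and are not taken from the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical scaled features for one applicant (e.g. income, debt ratio, credit history)
x = np.array([0.4, -1.2, 0.7])

# Randomly initialised weights stand in for a trained network (assumption)
W_hidden = rng.normal(size=(4, 3))  # hidden layer: 3 inputs -> 4 neurons
b_hidden = np.zeros(4)
w_out = rng.normal(size=4)          # output layer: 4 hidden values -> 1 score
b_out = 0.0

# Hidden layer: linear combination followed by a sigmoid activation
hidden = sigmoid(W_hidden @ x + b_hidden)

# Output layer: a single logit squashed by sigmoid into P(default)
p_default = sigmoid(w_out @ hidden + b_out)

print(f"Hidden activations: {hidden}")
print(f"P(default): {p_default:.3f}")
```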

Jordan: So in hidden layers, they help in understanding the complex patterns regarding who might default on a loan?

Alex: Exactly! They determine what information is passed forward through the network, contributing to our final prediction.

Jordan: Makes sense now. The activation function is crucial in shaping the output at each layer, whether it's recognizing simple patterns or making the final prediction in our loan default scenario.

Alex: Precisely. Each function plays a significant role in our model's ability to learn from the data and make accurate predictions.

A dialogue between Taylor, a Data Scientist, and Casey, a Data Analyst, discussing the code

Code from Softmax/SoftmaxActivation.py at main · AIMLModeling/Softmax (github.com)


import numpy as np
import matplotlib.pyplot as plt


# calculate the softmax of a vector
def softmax(vector):
    vector = np.asarray(vector, dtype=float)
    # subtract the max before exponentiating to avoid overflow; the result is unchanged
    e = np.exp(vector - vector.max())
    return e / e.sum()


# calculate the sigmoid of a single value
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# define data
data = [-1.5, 2.2, -0.8, 3.6]
print(f"Input vector: {data}")

# convert list of numbers to a list of probabilities
result_softmax = softmax(data)
# report the probabilities
print(f"softmax result: {result_softmax}")
# the softmax outputs should sum to 1
sum_softmax = 0.0
for i in range(len(result_softmax)):
    sum_softmax = sum_softmax + result_softmax[i]
print(f"Sum of all the elements of softmax results: {sum_softmax}")
print("")

# apply sigmoid to each element separately; these outputs need not sum to 1
sig_result = [0.0] * len(data)
sum_sigmoid = 0.0
for i in range(len(data)):
    sig_result[i] = sigmoid(data[i])
    print(f"Sigmoid result {i}: {sig_result[i]}")
    sum_sigmoid = sum_sigmoid + sig_result[i]
print(f"Sum of all the elements of Sigmoid results: {sum_sigmoid}")

# plot softmax over 100 evenly spaced inputs treated as a single vector
x = np.linspace(-10, 10, 100)
y = softmax(x)
plt.scatter(x, y)
plt.title('Softmax Function')
plt.show()

Casey: Hey Taylor, I came across this code snippet that uses softmax and sigmoid functions, and I'm having trouble understanding it. Can you walk me through it?

Taylor: Of course, Casey! Let's start with the basics. As you already know, both softmax and sigmoid are activation functions used in neural networks. This code defines these functions and then applies them to a data vector.

Casey: Okay, I see two functions defined here, softmax and sigmoid. What's the difference between them?

Taylor: The softmax function converts a vector of numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding input. The sigmoid function, on the other hand, maps an individual value to a number between 0 and 1, which we can read as a probability.

Casey: I see, so softmax is about the whole vector, and sigmoid is for individual values. Why do we need to convert numbers into probabilities?

Taylor: In the context of machine learning, probabilities help us make decisions. For instance, if we're trying to classify data into categories, probabilities give us a measure of confidence about our classifications.

Casey: Got it. Now, the code has a data vector. What does it represent?

Taylor: It's just an example data vector to demonstrate the functions. Think of it as raw scores or logits that you might get from the output layer of a neural network before activation.

Casey: Makes sense. And then we apply softmax to this vector, right?

Taylor: Yes, we pass the data vector through the softmax function which normalizes these values into probabilities that sum up to 1, making it a proper probability distribution.

Casey: The code prints the result and the sum of the softmax results. Why is the sum important?

Taylor: It's to show that softmax has done its job correctly. The sum of probabilities should be 1, which confirms that we have a valid probability distribution.
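One practical caveat to the sum check: a softmax written directly as exp(v) / sum(exp(v)) can overflow when the logits are large. The sketch below compares that naive form with a commonly used variant that subtracts the maximum first. The function names and the shifted vector are illustrative assumptions; only the four data values come from the code above.

```python
import numpy as np

def softmax_naive(v):
    e = np.exp(v)
    return e / e.sum()

def softmax_stable(v):
    v = np.asarray(v, dtype=float)
    e = np.exp(v - v.max())  # largest exponent is now 0, so exp cannot overflow
    return e / e.sum()

small = np.array([-1.5, 2.2, -0.8, 3.6])
large = small + 1000.0  # same differences between entries, much larger magnitude

# Suppress NumPy's overflow warnings so the failure shows up as nan values
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)

print("naive on large logits: ", naive_large)
print("stable on large logits:", softmax_stable(large))
print("stable sum:", softmax_stable(large).sum())
```

Because softmax depends only on the differences between its inputs, the stable version returns the same probabilities for both vectors, and they still sum to 1.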

Casey: Okay, and the sigmoid function is applied in a loop. Why is that?

Taylor: The sigmoid function is meant for individual numbers. The loop applies sigmoid to each number in the data vector separately, giving us a list of probabilities.

Casey: So we end up with two different lists of probabilities, one from softmax and another from sigmoid?

Taylor: Exactly. softmax gives a distribution across the whole vector, which is useful for multi-class classification. sigmoid gives independent probabilities that don't have to sum to 1, which is useful for binary classification.
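The two functions are more closely related than the two lists suggest. As a sketch, the check below shows that a two-class softmax over the scores [0, z] assigns the second class the same probability as sigmoid(z), and that element-wise sigmoid outputs do not form a distribution. The value z = 0.8 is an arbitrary choice for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

# Two-class softmax with scores [0, z] versus sigmoid(z)
z = 0.8
two_class = softmax(np.array([0.0, z]))
print(f"softmax([0, z])[1] = {two_class[1]:.6f}")
print(f"sigmoid(z)         = {sigmoid(z):.6f}")

# Element-wise sigmoid on the data vector: each value lies in (0, 1),
# but the values are independent and their total is not constrained to 1
data = np.array([-1.5, 2.2, -0.8, 3.6])
sig = sigmoid(data)
print(f"sigmoid outputs: {sig}, sum = {sig.sum():.3f}")
print(f"softmax outputs: {softmax(data)}, sum = {softmax(data).sum():.3f}")
```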

Casey: What about the scatter plot at the end?

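Taylor: That's a visual representation of the softmax function. It treats 100 evenly spaced values from -10 to 10 as a single vector and plots the probability softmax assigns to each one. Because the outputs must sum to 1, the curve is an exponential that stays near zero for most inputs and rises sharply for the largest ones.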

Casey: Now it's clearer. We're using these functions to understand the probabilities of different outcomes, and the plot is to see how softmax assigns probabilities.

Taylor: You've got it, Casey! And remember, understanding the output of these functions is key in predicting outcomes like whether a loan will default or not, based on the learned patterns.

Casey: Thanks, Taylor. This was really helpful!

A similar conversation between Taylor and Casey discussing how to apply the softmax and sigmoid functions in the context of loan default predictions.

Refined Code


import numpy as np
import matplotlib.pyplot as plt

# Sigmoid function for binary classification
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Define logits for loan default probabilities
# A real-world model would output these logits based on application data
loan_logits = np.array([0.8, -1.2, 3.0])  # Example logits from our model

# Calculate the probability of default using sigmoid
default_probabilities = sigmoid(loan_logits)

# Print probabilities of default
print(f"Loan default probabilities: {default_probabilities}")

# Plotting the sigmoid function
x = np.linspace(-10, 10, 100)
y = sigmoid(x)
plt.plot(x, y) 
plt.title('Sigmoid Function') 
plt.xlabel('Logits')
plt.ylabel('Default Probability')
plt.show()

Taylor: Here, we have an array of example logits, loan_logits, standing in for what our neural network would output from loan application data. The sigmoid function is then applied to calculate the probability of default for each application.

Casey: I understand now. We're not using softmax here because we don't have multiple categories, right?

Taylor: Exactly. We're only predicting if someone will default or not, which is a binary outcome. If we were assigning applications to different risk categories, that's when softmax would come into play.

Casey: And the plot shows us how the sigmoid function translates logits into probabilities?

Taylor: Correct. It's a visual way to understand how changes in logits affect the probability of default.
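One way to quantify how a change in a logit moves the probability is the slope of the sigmoid, which equals p * (1 - p) where p = sigmoid(z). The sketch below evaluates that slope at a few assumed logits and compares it against a finite-difference estimate. The helper name sigmoid_slope is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_slope(z):
    """Derivative of sigmoid with respect to the logit: p * (1 - p)."""
    p = sigmoid(z)
    return p * (1.0 - p)

# Illustrative logits: strongly negative, neutral, strongly positive
logits = np.array([-6.0, 0.0, 6.0])
slopes = sigmoid_slope(logits)

# Central finite difference as an independent check of the formula
h = 1e-5
numeric = (sigmoid(logits + h) - sigmoid(logits - h)) / (2 * h)

for z, s, n in zip(logits, slopes, numeric):
    print(f"logit={z:+.1f}  slope={s:.5f}  finite-difference={n:.5f}")
```

The slope peaks at 0.25 when the logit is 0 and shrinks toward zero at the extremes, so the same change in a logit moves the default probability most when the model is undecided.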

Casey: Thanks, that makes it clear how we apply these functions to loan defaults.

References:

Comparison between Sigmoid and Softmax Activation Function with Python (youtube.com)

Softmax/SoftmaxActivation.py at main · AIMLModeling/Softmax (github.com)