Created date
Apr 1, 2024 06:41 AM
Data Science
Machine Learning
Artificial Intelligence
Applied forecasting
This page lists the fundamental concepts of neural networks.
Jason Ching Yuen Siu
  1. Weights:
      • like dials that adjust how much influence different pieces of information have on the outcome.
      • The network tunes these dials to get better at making predictions.
  1. Bias:
      • an adjustment knob that lets a neuron produce a non-zero output even when all of its inputs are zero.
      • It shifts the weighted sum before the activation function is applied, which makes the network more flexible and accurate.
  1. Weighted Sum:
      • The weighted sum is the dot product of the inputs to a neuron and its corresponding weights, plus the bias.
      • a calculation that combines all the inputs a neuron gets, each multiplied by its importance (weight), and then adds a little extra (bias) to find the total influence on the neuron's output.
  1. Activation Function:
      • introduces non-linearity into the network, allowing it to model complex relationships.
      • Many types exist, such as ReLU, sigmoid, and tanh.
      • Needed for decisions that aren't just straight lines or simple yes/no answers.
  1. Gradient of the Loss Function:
      • indicates how the loss / error (the difference between the network's predictions and the actual target values) changes with a slight tweak to the weights and biases.
      • crucial for guiding how the network learns and improves from mistakes.
  1. dw = input * error:
      • computed during backpropagation.
      • calculates how much a weight needs to change to reduce errors.
      • uses the input and how far off the prediction was to adjust the weight properly.
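  1. db = gradient of the loss with respect to the bias:
      • is the change needed in the bias, not the weight, to help reduce errors in predictions.
      • for a single neuron it equals the error term dz, because the bias is added directly to the weighted sum.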
  1. dz = gradient of the loss with respect to the weighted sum (the error term):
      • Computed as part of the backpropagation process.
      • measures how the loss changes when the neuron's output before the activation function (the weighted sum) changes; in simple cases, such as a sigmoid output with cross-entropy loss, it reduces to prediction minus target.
      • part of figuring out how to adjust weights and biases to make better predictions.
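  1. da = gradient of the activation function:
      • tells us how changes in the weighted sum affect the output of the activation function, which is key for fine-tuning the network during learning.
      • by the chain rule, this derivative is multiplied by the gradient of the loss with respect to the activation output to obtain dz (in some course notation, da refers to that second quantity instead).
      • helps in tuning the weights so that the network's predictions become more accurate over iterations.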
  1. Backpropagation:
      • is the process of tweaking all the dials (weights) in the network from the output back to the input to reduce errors and make better predictions.
      • involves calculating the gradient of the loss function with respect to each weight by the chain rule, moving backwards from the output layer to the input layer.
  1. Optimization Algorithms:
      • are different strategies for adjusting weights in the best way possible to make a neural network learn efficiently.
      • examples include SGD (Stochastic Gradient Descent), Adam, and RMSprop.
  1. Regularization:
      • are techniques that prevent a neural network from overfitting, so that the model generalizes well to unseen data.
      • examples include L1 and L2 penalties and dropout; batch normalization is mainly a training stabilizer but can also have a mild regularizing effect.
  1. Loss Functions:
      • are ways of measuring how far off a network's predictions are from the actual answers, guiding it to learn better.
      • (e.g., cross-entropy loss for classification, mean squared error for regression)
  1. Convolutional Neural Networks (CNNs):
      • Specialized for grid-like data such as images, and used for tasks like image classification and object detection.
  1. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM):
      • are designed for data that comes in sequences, like language or time series. Plain RNNs struggle to retain information over long sequences; LSTMs add gates that let the network remember information for longer.
  1. Transformer Models:
      • great at handling sequences of data by paying attention to the parts that matter most, powering a lot of the latest language understanding systems.
  1. Learning Rate and Learning Rate Schedules:
      • the learning rate controls how big a step to take when adjusting weights.
      • Learning rate schedules, such as learning rate decay or cyclical learning rates, change this rate during training and can improve training efficiency and model performance.
  1. Embeddings:
      • Transform discrete items such as words into dense numeric vectors, ideally so that similar items end up with similar vectors.
  1. Transfer Learning and Fine-tuning:
      • involves taking a network that's already been trained on one task and tweaking it to do well on a different, but related, task.
  1. Federated Learning:
      • An approach where multiple devices or servers learn together without sharing their raw data; only model updates are exchanged, preserving privacy while still improving the model.
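
The core mechanics in the list above (weights, bias, weighted sum, activation function, dz, dw, db, and the learning rate) can be sketched for a single neuron. This is a minimal illustration rather than a reference implementation: the sigmoid activation, the binary cross-entropy loss, the logical-OR toy data, and all function names are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b):
    """Return the weighted sum z and the activation a for one neuron."""
    # Weighted sum: dot product of inputs and weights, plus the bias.
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    a = sigmoid(z)
    return z, a


def gradients(x, a, y):
    """Return dw, db, dz for one sample.

    Assumes sigmoid + binary cross-entropy, where the error term
    simplifies to dz = prediction - target.
    """
    dz = a - y                      # error at the weighted sum
    dw = [xi * dz for xi in x]      # dw = input * error
    db = dz                         # the bias gradient equals the error
    return dw, db, dz


def train(samples, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on a two-input neuron."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            _, a = forward(x, w, b)
            dw, db, _ = gradients(x, a, y)
            # Step against the gradient, scaled by the learning rate.
            w = [wi - lr * dwi for wi, dwi in zip(w, dw)]
            b -= lr * db
    return w, b


# Hypothetical toy data: the logical OR of two inputs.
samples = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]
w, b = train(samples)
predictions = [round(forward(x, w, b)[1]) for x, _ in samples]
print(predictions)
```

Calling `forward([0.0, 0.0], [1.0, 1.0], 0.5)` shows the role of the bias: the inputs are all zero, yet the weighted sum is 0.5 and the neuron still produces an output above 0.5.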

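The loss functions, L2 regularization, and learning rate decay mentioned in the list above can each be written in a few lines. The sketch below is illustrative only: the function names, the clipping constant `eps`, and the exponential form of the decay schedule are assumptions, and real libraries offer their own versions of each.

```python
import math


def mean_squared_error(preds, targets):
    """Average squared difference; a common loss for regression."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Average negative log-likelihood; a common loss for binary classification."""
    total = 0.0
    for p, t in zip(probs, targets):
        # Clip the probability so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights.

    Added to the loss, it discourages any single weight from growing large.
    """
    return lam * sum(w * w for w in weights)


def exponential_decay(initial_lr, decay_rate, step):
    """One possible learning rate schedule: shrink the rate by a fixed factor per step."""
    return initial_lr * decay_rate ** step
```

For example, `mean_squared_error([1.0, 2.0], [1.0, 4.0])` averages the squared errors 0 and 4, and `l2_penalty([3.0, 4.0], 0.5)` scales the sum of squares 25 by 0.5.
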
Storytime: How can cooking a dish of pumpkin risotto relate to these concepts?

The Italian Pumpkin Risotto Adventure of Jamie and Alex

Jamie: Welcome to the culinary adventure, Alex! Today, we're embarking on the journey of creating the perfect Italian pumpkin risotto.
Alex: I'm thrilled, Jamie! I've always wanted to learn how to make a delicious risotto.
Jamie: Let's dive in. The first step in our risotto is choosing our ingredients, much like setting the weights in a recipe. Each spice, the pumpkin, and the rice—each plays a crucial role, like ingredients coming together in harmony.
Alex: So, getting the balance right from the start is key, then?
Jamie: Precisely! And there's always a secret in every great risotto—our baseline flavor or bias, which in this case, is our homemade vegetable broth. It ensures that even before the other flavors meld, our risotto has a foundational taste.
Alex: Ah, the base that carries everything else. What comes after?
Jamie: We combine these flavors, akin to the weighted sum. Imagine stirring our risotto, where the broth, pumpkin, and rice come together, each contributing its flavor, including that foundational broth, to form the base of our dish.
Alex: I see, so it's all about how these flavors mix.
Jamie: Right. Now, deciding when to add the pumpkin and cheese is like our activation function. It's not just about mixing; it's about when and how to add them to bring out the risotto's character, much like choosing the right moment to introduce complexity into our dish.
Alex: Timing is everything, then.
Jamie: Indeed. And if our risotto tastes a bit off, we adjust, akin to the gradient of the loss function. Perhaps it needs a pinch more salt or a dash of pepper—small adjustments based on tasting, to edge closer to our ideal flavor.
Alex: Taste and tweak, got it.
Jamie: If we find the risotto too bland, we might decide it needs more pumpkin. This adjustment is like calculating dw, changing our 'ingredient weight' based on our 'flavor error'.
Alex: So, more pumpkin to enhance the flavor!
Jamie: Exactly. And sometimes, it's the broth that needs adjusting, our bias, to better suit the overall taste. That's adjusting db, changing the baseline to improve the dish.
Alex: Adjusting the foundation if needed.
Jamie: Correct. And if our balance is off before the dish is finished, we're in the realm of dz, adjusting our flavor mix based on feedback before our final seasoning.
Alex: Keeping an eye on the balance as we go.
Jamie: Right. Adjusting how much an ingredient like Parmesan affects the overall flavor is akin to da, the activation gradient. It’s crucial for fine-tuning our risotto during cooking.
Alex: So, the cheese can really change the game.
Jamie: After we finish, we might reflect on the entire cooking process, identifying where changes are needed. This is backpropagation, reviewing and tweaking our approach based on the outcome to make it better next time.
Alex: Learning from the whole cooking experience, then?
Jamie: Yes, and experimenting with different amounts of pumpkin or cooking times is like using optimization algorithms, finding the best way to achieve that perfect creamy texture and rich flavor.
Alex: Finding the secret to the perfect risotto!
Jamie: Remember, not letting any single ingredient dominate is crucial. This balance is like regularization, ensuring our risotto is delightful for everyone, not just us.
Alex: A balanced dish for all tastes.
Jamie: Lastly, evaluating our risotto compared to the ideal flavor we aim for is like the loss functions. It guides us on what to tweak next time.
Alex: So, always striving for that perfect risotto.
Jamie: Exactly, Alex! Through this process, we learn the art of balance, timing, and adjustment, enhancing our culinary skills over time, much like training a neural network.

Explanation and Implications:

Jamie: Our journey in making the Italian pumpkin risotto mirrors the training of neural networks.
The ingredients and their weights represent the input features and their importance.
The baseline flavor, or broth, acts as the bias, providing a starting point for flavor.
The weighted sum is the combination of all these elements before final adjustments, much like the initial output calculation in a neuron.
Alex: So, the cooking process is akin to how AI learns and adjusts?
Jamie: Yes!
The activation function decision—like adding pumpkin and cheese at the right moment—adds complexity and refines our dish, similar to how an activation function introduces non-linearity in a model.
Adjusting based on taste is akin to the gradient of the loss function, guiding how to tweak the recipe to reduce the 'error' or difference from our ideal taste.
Alex: It's fascinating to see how cooking parallels AI training.
Jamie: Backpropagation in our context means reflecting on and improving each cooking step, much like how a neural network learns from the output back through its layers.
Optimization algorithms are our experiments to find the best cooking method, and regularization ensures no single flavor overpowers the dish, maintaining general appeal.
The loss functions measure how close we are to the ideal dish, guiding improvements—just as they guide AI models to learn better.
Alex: This makes the concepts so much more relatable. Thanks, Jamie!
Jamie: You're welcome, Alex! Whether in the kitchen or in AI, it's all about learning, adjusting, and striving for perfection.