AI/ML Definitions Summary

A consolidated glossary of key terms and concepts in Artificial Intelligence and Machine Learning.

General AI/ML

Artificial Intelligence (AI): The simulation of human intelligence processes by machines, especially computer systems.
Machine Learning (ML): A subset of AI that enables systems to learn from data and improve from experience without being explicitly programmed.
Deep Learning (DL): A subset of ML utilizing neural networks with three or more layers to learn complex patterns from large amounts of data.
Data Science: The interdisciplinary field of extracting knowledge and insights from data.
Dataset: A collection of data used for training/testing. (Features = Columns, Samples = Rows).
Feature: An individual measurable property or variable (input).
Label/Target: The outcome variable we want to predict (output).
Training/Validation/Test Split: Dividing data into a training set (fit the model), a validation set (tune hyperparameters), and a test set (evaluate final performance on unseen data).
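A minimal sketch of such a split, using only the Python standard library. The function name `train_val_test_split`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not a prescribed recipe.

```python
import random

def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of `samples` and cut it into three disjoint lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the data is ordered (for example, sorted by label); time-series data is a common exception where the original order should be kept.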
Overfitting: Model performs well on training data but poorly on test data (high variance).
Underfitting: Model performs poorly on both training and test data (high bias).
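Bias-Variance Tradeoff: The tension between bias (error from overly simple or wrong assumptions) and variance (error from sensitivity to small fluctuations in the training data). Reducing one typically increases the other, so the goal is the balance that minimizes total error.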
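For squared-error loss the tradeoff can be written out explicitly. Assuming data generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ (symbols introduced here for illustration), the expected error of a learned model $\hat{f}$ at a point $x$, averaged over training sets and noise, decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting corresponds to a large first term, overfitting to a large second term.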

Supervised Learning

The machine learns from labeled data (Input + Correct Output).

1. Regression

Goal: Predict a continuous numerical value.
- Linear Regression: Fits a straight line (or hyperplane) that minimizes the sum of squared errors. Best for simple, roughly linear relationships.
- Polynomial Regression: Models non-linear relationships by adding powers of the features (e.g., $x^2$, $x^3$) as extra inputs.
- Ridge (L2) & Lasso (L1) Regression: Linear regression with a "Regularization" penalty on the coefficients to prevent overfitting. L1 can shrink coefficients exactly to zero (feature selection); L2 shrinks them toward zero without eliminating them.
- Support Vector Regression (SVR): Fits a function that keeps as many points as possible within a tolerance margin (epsilon) around it; only points outside the margin are penalized.
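A small sketch of single-feature linear regression in closed form, with an optional L2 (ridge) penalty on the slope. The function name `fit_line` and the parameter `l2_penalty` are hypothetical; the intercept is left unpenalized, which is a common convention assumed here.

```python
def fit_line(xs, ys, l2_penalty=0.0):
    """Least-squares line y = slope * x + intercept for one feature.

    With l2_penalty > 0 this is ridge regression on the slope only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + l2_penalty)       # penalty shrinks the slope toward 0
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]        # points on the line y = 2x + 1
print(fit_line(xs, ys))                    # plain least squares
print(fit_line(xs, ys, l2_penalty=5.0))    # ridge: smaller slope
```

The ridge fit trades a slightly worse match on the training points for a smaller coefficient, which is the shrinking behavior described for L2.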

2. Classification

Goal: Predict a categorical (discrete) class label.
- Binary Classification: Two classes (e.g., Spam vs. Not Spam).
- Multi-Class Classification: More than two classes (e.g., Handwritten digits 0-9).
- Logistic Regression: Uses a Sigmoid function to map a linear score to a value between 0 and 1 (probability). Used mainly for Binary Classification.
- Decision Trees: Split data into branches, like a flowchart, based on feature values. Prone to overfitting.
- Random Forest: An "Ensemble" of many Decision Trees, each trained on a random resample of the data (Bagging). Reduces overfitting by averaging or voting over their predictions.
- Support Vector Machines (SVM): Finds the "Maximum Margin Hyperplane" that best separates the classes. Effective in high dimensions.
- K-Nearest Neighbors (KNN): "Lazy learner" that classifies a point based on the majority class of its 'K' nearest neighbors.
- Naive Bayes: Probabilistic classifier based on Bayes' Theorem. Assumes features are independent of each other given the class. Good for text classification.
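As one concrete case, K-Nearest Neighbors can be sketched in a few lines. The function name `knn_predict`, the Euclidean distance, and the toy points are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

No training step exists: all work happens at prediction time, which is why KNN is called a lazy learner.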

Unsupervised Learning

The machine learns from unlabeled data (Input only, No Output).

Clustering: Grouping similar data points together.
- K-Means: Partitions data into K distinct clusters.
- Hierarchical Clustering: Builds a tree of clusters.
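K-Means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below does this for one-dimensional data; the name `kmeans_1d`, the fixed iteration count, and the caller-supplied starting centroids are simplifying assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on scalars; K is the number of starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centroids, clusters)
```

Results depend on the starting centroids, so practical implementations usually run several random initializations and keep the best one.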
Dimensionality Reduction: Reducing the number of input variables while retaining important information.
- PCA (Principal Component Analysis): Projects data onto a smaller set of orthogonal directions (Principal Components) chosen to retain as much of the variance as possible.
- t-SNE: Non-linear technique mainly for data visualization.
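For PCA, the two-dimensional case is small enough to work by hand: the first principal component is the direction of greatest variance, which for a 2x2 covariance matrix has a closed-form angle. The function name `first_principal_axis` is hypothetical, and the sketch handles 2D points only.

```python
import math

def first_principal_axis(points):
    """Return the unit direction of maximum variance and the 1D projections."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # Angle of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    axis = (math.cos(theta), math.sin(theta))
    # Reduce each 2D point to a single coordinate along that axis.
    scores = [x * axis[0] + y * axis[1] for x, y in centered]
    return axis, scores

axis, scores = first_principal_axis([(1, 1), (2, 2), (3, 3)])
print(axis, scores)
```

For points lying on the line y = x, the axis comes out along the diagonal and the projections keep all of the variance. Higher-dimensional PCA generally relies on an eigendecomposition or SVD from a numerical library.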

Deep Learning

Neural Network (ANN): Computing system inspired by the biological neural networks of animal brains.
Neuron/Perceptron: The basic unit, applying weights, bias, and an activation function to inputs.
Activation Functions:
- Sigmoid: $1/(1+e^{-x})$. Output between 0 and 1. Saturates for large $|x|$, which contributes to the vanishing gradient problem.
- ReLU (Rectified Linear Unit): $\max(0, x)$. Standard for hidden layers.
- Softmax: Converts a vector of numbers into a probability distribution (sum = 1). Used for output layer in multi-class classification.
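The three activation functions and a single neuron can be sketched directly from these definitions. The function names are illustrative, and subtracting the maximum inside `softmax` is a standard numerical-stability trick added here rather than part of the definition.

```python
import math

def sigmoid(x):
    """Squash a real number into (0, 1); fine for moderate inputs."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(xs)                        # avoids overflow in exp()
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

print(sigmoid(0.0), relu(-2.0), relu(3.0))
print(softmax([1.0, 2.0, 3.0]))
print(neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))
```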
CNN (Convolutional Neural Network): Specialized for grid data (images). Uses Convolution (filters) and Pooling (downsampling) layers to detect spatial hierarchies.
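The two CNN building blocks can be illustrated on a tiny grayscale "image" stored as nested lists. As in most deep learning frameworks, the "convolution" below is technically cross-correlation (the filter is not flipped). The function names, the hand-written edge filter, and the absence of padding are assumptions for this sketch; in a real CNN the filter values are learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

image = [[0, 0, 1, 1, 1] for _ in range(5)]   # dark left, bright right
edge_filter = [[-1, 1], [-1, 1]]              # responds to left-to-right brightening
features = conv2d_valid(image, edge_filter)
print(features)
print(max_pool2d(features))
```

The filter responds only where the brightness changes, and pooling keeps that response while shrinking the map.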
RNN (Recurrent Neural Network): Specialized for sequential data (time-series, text). Has "memory" of previous inputs. Prone to vanishing/exploding gradients.
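The "memory" of an RNN is a hidden state that is fed back in at every step. A scalar sketch, assuming a tanh activation and hypothetical parameter names `w_x`, `w_h`, and `b`:

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=1.0, b=0.0)
print(states)
```

The later inputs are zero, yet the hidden state stays non-zero because it still carries information from the first input. Repeatedly multiplying by `w_h` across many steps is also where vanishing and exploding gradients originate.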
LSTM/GRU: Advanced RNNs with "Gates" to handle long-term dependencies.

Generative AI & LLMs

Generative AI: Creates new content (text, image, audio) rather than just classifying existing data.
Transformer: Architecture introduced in 2017 relying entirely on "Attention" mechanisms. Parallelizable and scalable.
- Self-Attention: Weighs the importance of words in a sentence relative to each other.
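A stripped-down sketch of scaled dot-product self-attention. It assumes the queries, keys, and values are the token vectors themselves; real Transformers first apply learned projection matrices and use multiple heads, both omitted here. The function names are illustrative.

```python
import math

def softmax(xs):
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """tokens: list of equal-length vectors. Returns (outputs, attention weights)."""
    d = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score this token against every token, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        weights = softmax(scores)          # how much to attend to each token
        # Output is the attention-weighted average of all token vectors.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, tokens))
            for j in range(d)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(weights)
```

Each row of `weights` sums to 1 and every row can be computed independently of the others, which is what makes the architecture parallelizable.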
LLM (Large Language Model): Probabilistic model trained on massive text corpora to predict the next token.
Token: The basic unit of text (a word or word piece) processed by LLMs. As a rule of thumb for English, one token is $\approx 0.75$ words.
Embedding: Vector representation of a token capturing semantic meaning.
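Semantic closeness between embeddings is commonly measured with cosine similarity. The three-dimensional vectors below are made-up values for illustration only; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```

With these toy values, "cat" scores much closer to "dog" than to "car".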
Context Window: Limit on the amount of text the model can consider at one time.
Hallucination: Confident but factually incorrect generation.
RAG (Retrieval-Augmented Generation): Fetching external data to include in the prompt context to improve accuracy.
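A toy sketch of the retrieval and prompt-assembly steps of RAG. Word overlap stands in for the embedding-based similarity search that real systems typically use, and no model is called; the function names, documents, and prompt layout are all hypothetical.

```python
def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query, documents, top_k=1):
    """Place the retrieved text ahead of the question in the prompt."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"

documents = [
    "PCA projects data onto principal components",
    "K-Means partitions data into K clusters",
    "ReLU is a common activation function",
]
print(build_prompt("how does k-means group data", documents))
```

The resulting string is what would be sent to the LLM, so the model can ground its answer in the retrieved text rather than relying on its training data alone.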
Fine-Tuning: Further training of a pre-trained model on a smaller, task-specific dataset to adapt it to that task.
RLHF: Reinforcement Learning from Human Feedback. Tuning models to be helpful and safe using human preferences.
Prompt Engineering: The art of crafting inputs (prompts) to get the best output from an LLM.

Model Evaluation Metrics

Accuracy: (Correct Predictions) / (Total Predictions).
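Precision: (True Positives) / (True Positives + False Positives). "Quality": of everything predicted positive, the fraction that is actually positive.
Recall (Sensitivity): (True Positives) / (True Positives + False Negatives). "Quantity": of everything actually positive, the fraction the model found.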
F1-Score: Harmonic mean of Precision and Recall.
Confusion Matrix: Table of predicted versus actual class counts (True/False Positives and Negatives) that shows where a classifier makes its errors.
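The classification metrics above all derive from the four confusion-matrix counts. A sketch for the binary case; the function names are illustrative, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```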
MSE (Mean Squared Error): Average squared difference between predicted and actual values (Regression).
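A direct translation of the MSE definition, with a hypothetical function name and made-up sample values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors are 1, 0, and -2, so the squared errors are 1, 0, and 4.
print(mean_squared_error([3, 5, 7], [2, 5, 9]))
```

Because errors are squared, one large miss outweighs several small ones.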