AI/ML Definitions Summary
A consolidated glossary of key terms and concepts in Artificial Intelligence and Machine Learning.
General AI/ML
- Artificial Intelligence (AI): The simulation of human intelligence processes by machines, especially computer systems.
- Machine Learning (ML): A subset of AI that enables systems to learn from data and improve from experience without being explicitly programmed.
- Deep Learning (DL): A subset of ML utilizing neural networks with three or more layers to learn complex patterns from large amounts of data.
- Data Science: The interdisciplinary field of extracting knowledge and insights from data.
- Dataset: A collection of data used for training and testing models. In tabular form, each row is a sample and each column is a feature.
- Feature: An individual measurable property or variable (input).
- Label/Target: The outcome variable we want to predict (output).
- Training/Validation/Test Split: Dividing the data into a Training set (to fit the model), a Validation set (to tune hyperparameters), and a Test set (to evaluate final performance on data the model has never seen).
- Overfitting: Model performs well on training data but poorly on test data (high variance).
- Underfitting: Model performs poorly on both training and test data (high bias).
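- Bias-Variance Tradeoff: The tension between bias (error from erroneous assumptions) and variance (error from sensitivity to small fluctuations in the training data). Making a model more flexible usually lowers bias but raises variance, so the goal is the balance that minimizes total error.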
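The Training/Validation/Test Split described in the list above can be sketched in plain Python. This is a minimal illustration, assuming a 70/15/15 ratio and a seeded shuffle for reproducibility; `train_val_test_split` is a hypothetical helper written for this example, not a library function.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle samples and split them into train, validation, and test lists.

    Hypothetical helper for illustration; the default fractions are assumptions.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is for training
    return train, val, test


if __name__ == "__main__":
    train, val, test = train_val_test_split(list(range(100)))
    print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before splitting matters when the data is ordered (for example, sorted by label); for time-series data, a chronological split is usually preferred so the model is never trained on the future.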
Supervised Learning
The machine learns from labeled data (Input + Correct Output).
1. Regression
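Goal: Predict a continuous numerical value.
- Linear Regression: Fits a straight line (a hyperplane when there are several features) that minimizes the sum of squared errors. Best for simple, roughly linear relationships.
- Polynomial Regression: Models non-linear relationships by adding powers of the features (e.g., $x^2$, $x^3$) as extra inputs, then fitting a linear model to them.
- Ridge (L2) & Lasso (L1) Regression: Linear regression with "Regularization" (a penalty on the size of the coefficients) to prevent overfitting. L1 can shrink coefficients exactly to zero (feature selection); L2 shrinks them toward zero without eliminating them.
- Support Vector Regression (SVR): Fits a function that keeps as many points as possible within a tolerance margin $\epsilon$ of it, penalizing only deviations larger than $\epsilon$.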
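The closed-form least-squares fit behind Linear Regression, and the way an L2 (Ridge) penalty shrinks the slope, can be sketched for a single feature in plain Python. `fit_line` and its `l2` parameter are hypothetical names; the sketch assumes one input feature and leaves the intercept unpenalized, which is the usual convention.

```python
def fit_line(xs, ys, l2=0.0):
    """Fit y ~ w * x + b for a single feature by minimizing squared error.

    With l2 > 0 this becomes ridge regression: the penalty l2 * w**2 shrinks
    the slope toward zero. The intercept b is not penalized.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Centering the data lets us solve for the slope on its own.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    denom = sxx + l2
    if denom == 0:
        raise ValueError("all x values are identical and l2 is 0: slope is undefined")
    w = sxy / denom           # least-squares slope (l2 = 0) or ridge slope (l2 > 0)
    b = y_mean - w * x_mean   # the fitted line passes through the mean point
    return w, b


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]                  # exactly y = 2x + 1
    print(fit_line(xs, ys))            # (2.0, 1.0)
    print(fit_line(xs, ys, l2=5.0))    # (1.0, 2.5): slope shrunk toward 0
```

Lasso (L1) has no closed-form solution in general and is usually fitted iteratively, for example by coordinate descent.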
2. Classification
Goal: Predict a categorical (discrete) class label.
- Binary Classification: Two classes (e.g., Spam vs. Not Spam).
- Multi-Class Classification: More than two classes (e.g., handwritten digits 0-9).
- Logistic Regression: Passes a linear combination of the features through a Sigmoid function, squeezing the output between 0 and 1 so it can be read as a probability. Used for Binary Classification.
- Decision Trees: Split data into branches, like a flowchart, based on feature values. Prone to overfitting.
- Random Forest: An "Ensemble" of many Decision Trees, each trained on a bootstrap sample of the data (Bagging) and considering a random subset of features at each split. Reduces overfitting by averaging the trees' predictions (or taking a majority vote for classification).
- Support Vector Machines (SVM): Finds the "Maximum Margin Hyperplane" that best separates the classes. Effective in high-dimensional spaces.
- K-Nearest Neighbors (KNN): A "lazy learner" that classifies a point by the majority class among its 'K' nearest neighbors.
- Naive Bayes: A probabilistic classifier based on Bayes' Theorem that assumes features are conditionally independent given the class. Good for text classification.
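K-Nearest Neighbors is simple enough to sketch in a few lines of plain Python. This minimal version assumes Euclidean distance and numeric feature tuples; `knn_predict` is a hypothetical name, and real implementations usually add feature scaling and a spatial index for speed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Uses Euclidean distance. Ties in the vote go to the label encountered first
    among the sorted neighbours (Counter.most_common keeps insertion order).
    """
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # "Lazy" learning: there is no training step; all work happens here.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.5, 0.5)))  # a
    print(knn_predict(points, labels, (5.5, 5.5)))  # b
```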
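Logistic Regression can be sketched with plain Python batch gradient descent on the log loss. The learning rate, epoch count, and the helper names `train_logistic` and `predict_proba` are assumptions for illustration; library implementations typically use faster solvers and often add regularization by default.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class: sigmoid of the linear score w.x + b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log loss.

    xs: list of feature vectors; ys: list of 0/1 labels.
    """
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]
    ys = [0, 0, 1, 1]
    w, b = train_logistic(xs, ys)
    for x in xs:
        print(x, round(predict_proba(w, b, x), 3))
```

Predicting a class label then amounts to thresholding the probability, most commonly at 0.5.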
Unsupervised Learning
The machine learns from unlabeled data (Input only, No Output).
- Clustering: Grouping similar data points together.
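  - K-Means: Partitions data into K distinct clusters by repeatedly assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.
  - Hierarchical Clustering: Builds a tree (dendrogram) of nested clusters, either by repeatedly merging the closest clusters (agglomerative) or by recursively splitting them (divisive).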
- Dimensionality Reduction: Reducing the number of input variables while retaining important information.
  - PCA (Principal Component Analysis): Projects data onto a smaller set of orthogonal directions (Principal Components) chosen to capture as much of the variance as possible.
  - t-SNE: Non-linear technique used mainly to visualize high-dimensional data in two or three dimensions.
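K-Means is commonly implemented with Lloyd's algorithm, sketched below in plain Python. `k_means` is a hypothetical helper; the initial centroids are assumed to be k randomly chosen data points (libraries often use smarter seeding such as k-means++), and the final clusters can depend on that choice.

```python
import math
import random


def k_means(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k starting points
    for _ in range(max(1, iters)):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = k_means(pts, 2)
    print(labels)
```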
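PCA's first principal component is the top eigenvector of the data's covariance matrix. The sketch below finds it with power iteration in plain Python; this stands in for the eigendecomposition or SVD that numerical libraries use. `first_principal_component` and `project` are hypothetical names, and the sketch assumes at least two rows and a unique largest eigenvalue.

```python
import math
import random


def first_principal_component(rows, iters=500, seed=0):
    """Return (component, means): the unit direction of maximum variance and
    the per-feature means used for centering. Requires at least two rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]   # random start vector
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iters):
        # Power iteration: multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v: nothing to amplify
        v = [x / norm for x in w]
    # Eigenvectors have arbitrary sign; make the largest-magnitude entry positive.
    if v[max(range(d), key=lambda i: abs(v[i]))] < 0:
        v = [-x for x in v]
    return v, means


def project(rows, component, means):
    """Coordinate of each centered row along the component (d dims -> 1)."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]


if __name__ == "__main__":
    rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    pc, means = first_principal_component(rows)
    print(pc)                       # close to [0.7071, 0.7071]
    print(project(rows, pc, means))
```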
Deep Learning
- Neural Network (ANN): Computing system inspired by the biological neural networks of animal brains.
- Neuron/Perceptron: The basic unit, which computes a weighted sum of its inputs, adds a bias, and applies an activation function.
- Activation Functions:
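  - Sigmoid: $\sigma(x) = 1/(1+e^{-x})$. Outputs values between 0 and 1. Saturates for large $|x|$, which causes the vanishing gradient problem in deep networks.
  - ReLU (Rectified Linear Unit): $\max(0, x)$. The standard choice for hidden layers.
  - Softmax: $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. Converts a vector of numbers into a probability distribution (sum = 1). Used in the output layer for multi-class classification.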
- CNN (Convolutional Neural Network): Specialized for grid-like data such as images. Uses Convolution (learned filters) and Pooling (downsampling) layers to learn spatial hierarchies of features, from edges to shapes to whole objects.
- RNN (Recurrent Neural Network): Specialized for sequential data (time-series, text). Has "memory" of previous inputs. Prone to vanishing/exploding gradients.
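- LSTM/GRU: Advanced RNNs with "Gates" that control what information is kept, updated, or forgotten, which mitigates vanishing gradients and lets them handle long-term dependencies.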
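The Neuron/Perceptron and the three activation functions listed above translate directly into plain Python. The function names are illustrative; the branch in `sigmoid` and the max-subtraction in `softmax` are standard tricks to avoid floating-point overflow, not part of the mathematical definitions.

```python
import math


def sigmoid(x):
    """1 / (1 + e^-x), computed without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    """max(0, x)."""
    return max(0.0, x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)                          # subtracting the max avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


if __name__ == "__main__":
    print(neuron([1.0, 2.0], [0.5, -1.0], 0.25))           # relu(-1.25) -> 0.0
    print(neuron([1.0, 2.0], [0.5, -1.0], 0.25, sigmoid))  # about 0.223
    print(softmax([1.0, 2.0, 3.0]))                        # sums to 1
```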
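The Convolution and Pooling layers of a CNN can be sketched on a tiny image with a hand-picked kernel. Assumptions: no padding and stride 1 for the convolution, non-overlapping 2x2 max pooling, and (as in most deep learning libraries) the kernel is not flipped, so strictly this is cross-correlation. In a real CNN the kernel values are learned rather than chosen by hand; `conv2d` and `max_pool` are hypothetical names.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation: the kernel
    is not flipped). No padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size
    window. Rows or columns that do not fill a whole window are dropped."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


if __name__ == "__main__":
    # A 4x4 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1] for _ in range(4)]
    edge_kernel = [[-1, 1]]              # responds to left-to-right increases
    fmap = conv2d(image, edge_kernel)
    print(fmap)                          # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
    print(max_pool(fmap))                # [[1], [1]]
```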
Generative AI & LLMs
- Generative AI: Creates new content (text, image, audio) rather than just classifying existing data.
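- Transformer: Architecture introduced in 2017 that relies entirely on "Attention" mechanisms, with no recurrence. Because all tokens in a sequence are processed in parallel, it trains efficiently and scales well.
- Self-Attention: Lets each token weigh the importance of every other token in the sequence when computing its own representation.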
- LLM (Large Language Model): Probabilistic model trained on massive text corpora to predict the next token.
- Token: The basic unit of text processed by LLMs, typically a word or part of a word (roughly 0.75 English words per token on average).
- Embedding: Vector representation of a token capturing semantic meaning.
- Context Window: The maximum number of tokens the model can consider at one time (typically the prompt plus the generated output).
- Hallucination: Confident but factually incorrect generation.
- RAG (Retrieval-Augmented Generation): Fetching relevant external data and including it in the prompt context, so the answer is grounded in that data; this improves accuracy and reduces hallucination.
- Fine-Tuning: Further training of a pre-trained model on a smaller, specialized dataset to adapt it to specific tasks.
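- RLHF: Reinforcement Learning from Human Feedback. Tuning models to be helpful and safe by training a reward model on human preference rankings and then optimizing the LLM against that reward model with reinforcement learning.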
- Prompt Engineering: The art of crafting inputs (prompts) to get the best output from an LLM.
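Self-Attention in a Transformer is usually computed as scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})V$. The plain-Python sketch below shows a single head on a tiny sequence; the projection matrices passed in are hypothetical placeholders (in a trained model they are learned), and multi-head attention, masking, and positional information are deliberately left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: one row per token (seq_len x d_model); w_q, w_k, w_v: projection
    matrices (d_model x d_k). Returns one output row per token.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    out = []
    for q_row in q:
        # How well this token's query matches every token's key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)  # attention weights for this token, summing to 1
        # Output: attention-weighted average of all tokens' value vectors.
        out.append([sum(a * v_row[j] for a, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]     # two toy token embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder projections
    print(self_attention(tokens, identity, identity, identity))
```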
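The retrieval half of RAG can be sketched as ranking documents by cosine similarity between embeddings and pasting the best matches into the prompt. The three-number "embeddings" below are made-up toy vectors, not output from a real embedding model; a real system would compute them with an embedding model and typically keep them in a vector index. `retrieve` and `build_prompt` are hypothetical helpers, and the LLM call itself is omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, top_k=1):
    """Return the texts of the top_k (text, vector) docs most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, query_vec, docs, top_k=1):
    """Augment the prompt with retrieved context before it is sent to an LLM."""
    context = "\n".join(retrieve(query_vec, docs, top_k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy documents with hand-made vectors (hypothetical, for illustration only).
    docs = [
        ("The Transformer architecture was introduced in 2017.", [0.9, 0.1, 0.0]),
        ("ReLU outputs max(0, x).", [0.0, 0.2, 0.9]),
    ]
    query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the question below
    print(build_prompt("When was the Transformer introduced?", query_vec, docs))
```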
Model Evaluation Metrics
- Accuracy: (Correct Predictions) / (Total Predictions).
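- Precision: (True Positives) / (True Positives + False Positives). "Quality": of everything predicted positive, the fraction that is actually positive.
- Recall (Sensitivity): (True Positives) / (True Positives + False Negatives). "Quantity": of all actual positives, the fraction the model finds.
- F1-Score: Harmonic mean of Precision and Recall, $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
- Confusion Matrix: Table of predicted vs. actual classes. For binary classification it holds the counts of True Positives, False Positives, False Negatives, and True Negatives, from which the metrics above are computed.
- MSE (Mean Squared Error): Average squared difference between predicted and actual values, $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ (Regression).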
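The classification metrics above can all be computed from the four confusion-matrix counts, and MSE from the raw predictions, with a few lines of plain Python. The helper names are hypothetical, and returning 0.0 when a ratio is undefined (for example, precision when nothing is predicted positive) is an assumed convention; libraries handle that case in different ways.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed: 0.0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
    print(confusion_counts(y_true, y_pred))    # (2, 1, 2, 3)
    print(classification_metrics(y_true, y_pred))
    print(mse([3.0, 5.0], [2.0, 7.0]))         # 2.5
```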