1 · Machine-Learning Model Taxonomy 🔗
1.1 Discriminative vs Generative 🔗
- Discriminative models learn the conditional distribution $P(y\mid x)$ to separate classes.
  - Primary use · classification/regression.
  - Typical nature · often deterministic at inference.
  - Examples · Logistic Regression, SVM, plain Feed‑forward NNs, ResNet‑like CNNs.
- Generative models learn the joint distribution $P(x,y)=P(x\mid y)P(y)$ and can synthesize new $x$.
  - Primary use · both classification and data generation/simulation.
  - Typical nature · probabilistic/stochastic.
  - Examples · Naïve Bayes, Hidden Markov Models, GANs, VAEs, Diffusion models.
Aspect | Discriminative | Generative |
---|---|---|
Learns | $P(y\mid x)$ | $P(x,y)$ or $P(x\mid y)P(y)$ |
Produces | Class label / score | Sample, likelihood or label |
Strength | Decision boundaries | Full data distribution |
Weakness | Needs labelled data | Harder to train/scale |
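A minimal sketch of the contrast on toy 2‑D data (assuming scikit-learn and NumPy are available): logistic regression models $P(y\mid x)$ directly, while Gaussian Naïve Bayes models $P(x\mid y)P(y)$, so its class-conditional statistics can also be used to sample new $x$.

```python
# Sketch: discriminative vs generative on toy 2-D data (assumes scikit-learn + NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Discriminative: models P(y|x) directly and outputs a decision.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:1]))          # P(y|x) for one point

# Generative: models P(x|y)P(y); Bayes rule recovers P(y|x),
# and the class-conditional Gaussians let us synthesize new x.
gnb = GaussianNB().fit(X, y)
print(gnb.predict_proba(X[:1]))
mu, var = X[y == 1].mean(axis=0), X[y == 1].var(axis=0)
new_x = rng.normal(mu, np.sqrt(var))     # a plausible new sample for class 1
print(new_x)
```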
1.2 Deterministic vs Probabilistic 🔗
Aspect | Deterministic | Probabilistic |
---|---|---|
Output | Same value for same input | Distribution or random sample |
Pros | Fast, predictable | Express uncertainty, robust to missing data |
Cons | Ignores uncertainty | More compute, stochastic results |
Examples | Decision‑Tree inference, SVM, frozen CNN | Naïve Bayes, Gaussian Mixture, Bayesian Network, VAE decoder |
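A short sketch of the behavioural difference (again assuming scikit-learn): a fitted decision tree returns the identical label for identical input on every call, while a Gaussian mixture is a distribution that can be sampled and expresses uncertainty.

```python
# Sketch: deterministic vs probabilistic inference (assumes scikit-learn + NumPy).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Deterministic: same output for the same input, every time.
tree = DecisionTreeClassifier().fit(X, y)
assert (tree.predict(X[:5]) == tree.predict(X[:5])).all()

# Probabilistic: the model is a distribution; drawing from it is stochastic.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
samples, _ = gmm.sample(3)        # fresh random draws each call
print(gmm.predict_proba(X[:1]))   # soft assignment = uncertainty estimate
```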
2 · Vectors & Modalities 🔗
2.1 Vectors (Embeddings) 🔗
Definition · Numeric representations that encode the semantics of entities (words, images, users …).
- Enable similarity search (cosine, dot product), clustering, recommendation, retrieval‑augmented generation.
- Learned via Word2Vec, GloVe, fastText, BERT, CLIP, etc.
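A tiny NumPy sketch of cosine similarity over made‑up 3‑dimensional "embeddings" (real embeddings have hundreds of dimensions; the values below are illustrative only):

```python
# Sketch: cosine similarity between embedding vectors (toy, hand-made vectors).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

king, queen, pizza = (np.array(v, dtype=float) for v in
                      ([0.9, 0.8, 0.1], [0.85, 0.82, 0.15], [0.1, 0.0, 0.95]))
print(cosine(king, queen))  # high: semantically close
print(cosine(king, pizza))  # low: unrelated
```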
2.2 Modal vs Multimodal 🔗
- Modal = one input type (text or image or audio).
- Multimodal = multiple types jointly (e.g., image + caption).
- Rising trend: vision–language models (e.g., GPT‑4V, Gemini), audio‑text (e.g., Whisper), video‑text.
2.3 Neural Language Models 🔗
Large neural nets pretrained on text with self‑supervised objectives.
- Families: GPT‑n (decoder‑only), BERT (encoder‑only), T5 / BART (encoder‑decoder), LLaMA series.
- Tasks: text generation, summarization, translation, code completion, search ranking.
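A hedged illustration of the family split (assuming the Hugging Face `transformers` library is installed and the `gpt2` / `bert-base-uncased` checkpoints can be downloaded): a decoder‑only model generates left‑to‑right, an encoder‑only model fills masked tokens.

```python
# Sketch: decoder-only vs encoder-only LMs via Hugging Face pipelines
# (assumes `transformers` is installed and weights can be downloaded).
from transformers import pipeline

# Decoder-only (GPT family): autoregressive text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=10)[0]["generated_text"])

# Encoder-only (BERT family): bidirectional masked-token prediction.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Paris is the [MASK] of France.")[0]["token_str"])
```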
3 · Transformer Family 🔗
3.1 Core Architecture (“Attention is All You Need”, 2017) 🔗
- Input embeddings + positional encodings.
- Repeated N× blocks:
  - Multi‑head self‑attention → Add & LayerNorm.
  - Position‑wise Feed‑Forward NN → Add & LayerNorm.
- (Decoder adds masked self‑attention + cross‑attention to encoder outputs.)
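A minimal PyTorch sketch of one post‑norm encoder block mirroring the list above (dimensions and a recent PyTorch version with `batch_first` support are assumptions, not prescriptions):

```python
# Sketch: one post-norm Transformer encoder block (assumes PyTorch; sizes illustrative).
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # multi-head self-attention
        x = self.norm1(x + attn_out)       # Add & LayerNorm
        x = self.norm2(x + self.ff(x))     # position-wise FFN, Add & LayerNorm
        return x

x = torch.randn(2, 10, 512)               # (batch, tokens, d_model)
print(EncoderBlock()(x).shape)             # torch.Size([2, 10, 512])
```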
3.2 Self‑Attention (Quick Intuition) 🔗
Each token forms a Query (Q) vector that is matched against Keys (K) of every token; the similarities weight the Values (V) to build a context‑aware representation.
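A plain‑NumPy sketch of single‑head scaled dot‑product attention, with random toy matrices standing in for the learned Q/K/V projections:

```python
# Sketch: scaled dot-product self-attention for one head, in plain NumPy.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # each query matched against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # context-aware token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```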
3.3 Transformer Variants 🔗
Variant | Architecture | Flagship Models | Typical Tasks |
---|---|---|---|
Encoder‑only | Auto‑encoding | BERT, RoBERTa | Classification, QA, embeddings |
Decoder‑only | Auto‑regressive | GPT, LLaMA‑2 Chat | Text/code generation |
Encoder‑Decoder | Seq‑to‑Seq | T5, BART, Pegasus | Translation, summarization |
3.4 Language‑Modeling Objectives 🔗
Objective | Context Used | Predicts | Archetype |
---|---|---|---|
Masked (MLM) | Bidirectional | Masked tokens | BERT |
Autoregressive (AR) | Left‑to‑right (or right‑to‑left) | Next token | GPT |
Prefix LM | Bidirectional over the prefix, left‑to‑right after it | Next token | T5 (as one explored pre‑training variant) |
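A toy sketch of how the two main objectives turn the same token sequence into training pairs (the tokens and masked positions below are made up):

```python
# Sketch: building (input, target) pairs for MLM vs autoregressive objectives.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Masked LM (BERT-style): hide some tokens, predict them using both sides of the context.
mlm_input  = ["the", "[MASK]", "sat", "on", "the", "[MASK]"]
mlm_target = {1: "cat", 5: "mat"}           # predict only the masked positions

# Autoregressive LM (GPT-style): predict each next token from the left context.
ar_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(ar_pairs[0])                          # (['the'], 'cat')
```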
4 · Learning Paradigms 🔗
Paradigm | Labels | Model Learns | Canonical Example |
---|---|---|---|
Supervised | Explicit | Map x → y | ImageNet classification |
Unsupervised | None | Structure of x | Word2Vec, PCA, clustering |
Self‑Supervised | Labels from data | Predict masked / future parts | GPT pre‑training, BERT MLM |
5 · Model Adaptation & Fine‑Tuning 🔗
Technique | Data Need | Compute | Brief |
---|---|---|---|
Prompt Engineering | None (or a few exemplars) | – | Steer behavior via instructions/examples |
Supervised Fine‑Tuning (SFT) | Labelled pairs | High | Adjust all weights to task domain |
LoRA / Adapters | Labelled pairs | Low–Med | Train tiny rank‑update layers; mergeable |
RLHF | Human preference scores | Very High | Align model to helpful/safe outputs |
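A NumPy sketch of the LoRA idea under illustrative shapes: the pretrained weight W stays frozen while a low‑rank product B·A is trained, and the update can later be merged back into W.

```python
# Sketch: LoRA low-rank update (NumPy; shapes and scaling are illustrative).
import numpy as np

d, r, alpha = 512, 8, 16                    # hidden size, rank, scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))                 # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01          # trainable, r x d
B = np.zeros((d, r))                        # trainable, d x r (zero-init: no change at start)

def lora_forward(x):
    # W x + (alpha / r) * B A x, without ever modifying W
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

# After training, the update is mergeable: W_merged = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * B @ A
x = rng.normal(size=(1, d))
assert np.allclose(lora_forward(x), x @ W_merged.T)
```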
6 · Prompt Engineering 🔗
6.1 Anatomy of a Good Prompt 🔗
Instruction → Context → Input → Output‑format → Tone/Role
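A small sketch of a prompt builder following this anatomy; the template wording and field names are illustrative, not a standard.

```python
# Sketch: assembling a prompt in the order Instruction -> Context -> Input -> Output-format -> Tone/Role.
def build_prompt(instruction, context, user_input, output_format, role):
    return (
        f"{instruction}\n\n"                  # Instruction
        f"Context:\n{context}\n\n"            # Context
        f"Input:\n{user_input}\n\n"           # Input
        f"Respond as {output_format}.\n"      # Output format
        f"Answer in the voice of {role}."     # Tone / Role
    )

print(build_prompt(
    instruction="Summarize the ticket in two sentences.",
    context="Customer support ticket, SaaS billing domain.",
    user_input="I was charged twice for my March invoice...",
    output_format="a JSON object with keys 'summary' and 'sentiment'",
    role="a concise support analyst",
))
```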
6.2 Prompting Techniques 🔗
Technique | Best For | Key Idea |
---|---|---|
Zero‑Shot | Simple, common tasks | Ask directly |
Few‑Shot | Pattern imitation | Give 2–5 exemplars |
Chain‑of‑Thought | Reasoning/maths | “Let’s think step by step” |
Self‑Consistency | Reliable CoT answer | Sample K reasoning paths, majority vote |
ReAct | Tool‑using agents | Interleave reasoning & external actions |
Tree‑of‑Thought | Complex planning | Explore multiple branches, backtrack |
Retrieval‑Augmented (RAG) | Factual or domain answers | Retrieve docs → feed as context |
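A sketch of self‑consistency voting from the table above; `ask_llm` is a hypothetical stand‑in for any sampling‑enabled model call.

```python
# Sketch: self-consistency — sample K chain-of-thought answers, take the majority vote.
from collections import Counter
import random

def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real call would sample one reasoning path
    # and return its final answer.
    return random.choice(["42", "42", "41"])

def self_consistent_answer(prompt: str, k: int = 5) -> str:
    answers = [ask_llm(prompt + "\nLet's think step by step.") for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]   # majority vote over k samples

print(self_consistent_answer("What is 6 * 7?"))
```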
6.3 Security Concerns in Prompting 🔗
- Prompt injection / jailbreaks
- Data leakage (keys, PII)
- Prompt leakage (system prompt exposure)
- Malicious content generation (spam, phishing, code exploits)
- Token flooding / prompt DoS
Mitigations · input sanitization, guardrail LLMs, content filters, max‑token limits, rate limits, red‑teaming.
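A deliberately naive sketch of two of these mitigations (a length cap and a crude injection‑pattern check); real deployments layer many more defenses, and the patterns below are illustrative only.

```python
# Sketch: naive pre-LLM guardrails — length cap plus a crude injection pattern check.
import re

MAX_CHARS = 4000
INJECTION_PATTERNS = [r"ignore (all|previous) instructions", r"reveal your system prompt"]

def screen_user_input(text: str) -> str:
    if len(text) > MAX_CHARS:                              # token-flooding / prompt-DoS guard
        raise ValueError("Input too long")
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):  # crude prompt-injection filter
            raise ValueError("Potential prompt injection detected")
    return text

print(screen_user_input("Summarize this article for me."))
```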
7 · Foundation Models 🔗
Large, self‑supervised, general‑purpose models adaptable to many downstream tasks.
- Examples: GPT‑4 (text), CLIP (image+text), DALL·E (image), SAM (vision segmentation).
- Benefits: reuse, performance, economy of scale.
- Risks: bias, compute cost, ecological footprint, opacity.
8 · Scaling Laws & Emergent Abilities 🔗
- Empirical power‑law relationships link loss to parameter count, dataset size, and compute (OpenAI, DeepMind scaling studies).
- Emergence: qualitative jumps (few‑shot learning, tool use) above certain scales (≈10 B+, 100 B+ params).
- Implications: unpredictable behaviours but strong generalization, which drives interest in alignment & evals.
9 · Variational Autoencoder (VAE) — Quick Recap 🔗
- Encoder → μ, σ² (latent distribution).
- Reparameterization trick → sample z.
- Decoder → reconstruct/generate x̂.
- Loss = Reconstruction loss + $D_{\mathrm{KL}}\big(q(z\mid x)\,\|\,\mathcal{N}(0, I)\big)$.
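A minimal PyTorch sketch of this loss with a toy linear encoder/decoder (sizes and the Bernoulli reconstruction term are illustrative choices):

```python
# Sketch: VAE forward pass and loss (assumes PyTorch; toy linear encoder/decoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 16)      # toy encoder: outputs [mu, logvar] for a 16-dim latent
dec = nn.Linear(16, 784)          # toy decoder

def vae_loss(x):
    mu, logvar = enc(x).chunk(2, dim=-1)                      # q(z|x) parameters
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
    x_hat = torch.sigmoid(dec(z))                             # reconstruction x̂
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") # reconstruction loss
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) ‖ N(0, I))
    return recon + kl

x = torch.rand(8, 784)            # fake batch of flattened images in [0, 1]
print(vae_loss(x).item())
```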
Strength | Why It Matters |
---|---|
Generative | New images/text variants |
Smooth latent space | Interpolation, arithmetic |
Structured | Semi‑supervised & controllable |
Stable training | No adversarial collapse (vs GAN) |
10 · Quick Cheat‑Sheet — Which Technique When? 🔗
- Need a fast tweak? → Prompt Engineering.
- Domain‑specific answers? → SFT or LoRA.
- Politeness / helpfulness? → RLHF.
- Up‑to‑date factuality? → RAG.
- Creative image/text synthesis? → Diffusion, VAE, GAN.