Introduction to Large Language Models
ISBN: 9789363864740
484 pages
For more information, write to us at acadmktg@wiley.com

Description
Introduction to Large Language Models (LLMs) is a comprehensive guide for understanding the foundations and advancements of Generative AI for Text. Designed for educators and enthusiasts, the book starts with key linguistic concepts and progresses through NLP fundamentals—from word embeddings to pretrained foundational models.
Readers will learn how LLMs process and generate language, overcome limitations, and enhance performance using techniques like prompt engineering, retrieval-augmented generation, and human alignment. The book uniquely presents cutting-edge research in a concise format, enriched with visual aids, exercises, and practical resources.
Ideal for computer science faculty, this resource offers both theoretical insights and real-world applications, showcasing how LLMs like ChatGPT are transforming technology and advancing AI innovation.
Table of Contents
Endorsement
Preface
Acknowledgement
Foreword
1 Introduction
1.1 What is a Language Model?
1.2 Evolution of Language Modelling Technologies
1.3 Scaling Laws in Language Models
1.4 Evolution of LLMs
1.4.1 The Emergence and Development of LLMs
1.4.2 Implications of Encoder-Decoder in LLM Development
1.4.3 Optimising Scale and Resource Efficiency in LLMs
1.5 Organisation of the Book
Additional Resources
Bibliography
2 An Overview of Natural Language Processing and Neural Networks
Part I: Natural Language Processing
2.1 Computational Linguistics and Natural Language Processing
2.2 Overview of the Natural Language Processing Pipeline
2.3 Morphology
2.3.1 Morphemes
2.3.2 Stemming
2.3.3 Lemmatisation
2.3.4 Lexicon
2.4 Tokenisation
2.4.1 Advanced Techniques: Subword Tokenisation
2.5 Syntactics
2.6 Semantics
2.7 Introduction to Language Modelling
Part II: Neural Networks
2.8 The Perceptron
2.8.1 Definition
2.8.2 Implementing AND, OR, and XOR Logic
2.9 Multilayer Perceptron
2.9.1 Neural Networks
2.9.2 Types of Activation Functions
2.10 Training Neural Networks
2.10.1 Backpropagation
2.10.2 Batching
2.10.3 Hyperparameters
2.10.4 Regularisation
2.11 Vanishing and Exploding Gradients
2.12 Evaluation Metrics
2.13 Summary
Additional Resources
Exercises
Bibliography
3 Word Embedding
3.1 Distributional Hypothesis
3.2 Vector Semantics
3.2.1 Defining and Measuring Semantic Similarity
3.3 Types of Word Embedding
3.3.1 Frequency-Based Embeddings
3.3.2 Word2Vec
3.3.3 Global Vectors for Word Representation
3.3.4 FastText
3.4 Bias in Word Embedding
3.5 Limitations of Word Embedding Methods
3.6 Applications of Word Embeddings
3.7 Summary
Additional Resources
Exercises
Bibliography
4 Statistical Language Model
4.1 Statistical Language Model
4.1.1 The Conditional Probability
4.1.2 The Chain Rule of Probability
4.1.3 The Markov Assumption
4.1.4 Unigram Language Model
4.1.5 Bigram Language Model
4.2 Smoothing
4.2.1 The Unknown Tokens
4.2.2 Smoothing
4.2.3 Back-Off
4.2.4 Interpolation
4.2.5 Good-Turing
4.3 Evaluation of Language Model
4.3.1 Extrinsic Evaluation
4.3.2 Intrinsic Evaluation
4.3.3 Human Evaluation
4.3.4 Evaluation Metrics
4.3.5 Benchmark Suites
4.4 Limitations of Statistical Language Models
4.5 Summary
Additional Resources
Exercises
Bibliography
5 Neural Language Models
5.1 Convolutional Neural Networks
5.1.1 Components of CNNs: Kernel, Stride, Pooling, and Padding
5.1.2 Hierarchical and Dilated Convolutions
5.1.3 Applications of CNNs in NLP
5.2 Recurrent Neural Networks
5.2.1 Training RNNs
5.2.2 Applications of RNNs
5.2.3 Challenges in Sequence Modelling
5.2.4 RNN Variants: LSTM, GRU, and Bidirectional RNNs
5.3 Sequence-to-Sequence Models
5.3.1 Training Sequence-to-Sequence Models
5.3.2 Inference Decoding
5.3.3 Applications of Sequence-to-Sequence Models
5.4 Attention Mechanisms
5.4.1 Introduction to Attention
5.4.2 Advantages of Attention
5.4.3 Variants of Attention
5.5 Limitations of Neural Language Models
5.6 Summary
Additional Resources
Exercises
Bibliography
6 Transformers
6.1 Self-Attention
6.1.1 Multi-Head Self-Attention
6.2 Transformer Encoder Block
6.2.1 Components of the Transformer Encoder Block
6.2.2 Feed-Forward Neural Network
6.2.3 Layer Normalisation
6.2.4 Residual Connections
6.3 Transformer Decoder Block
6.3.1 Masked Multi-Head Self-Attention
6.3.2 Cross-Attention (Encoder-Decoder Attention)
6.4 Positional Embeddings
6.4.1 Types of Positional Embeddings
6.4.2 Rotary Position Embedding
6.5 Efficient Attention Mechanisms
6.5.1 KV Caching in Multi-Head Self-Attention
6.5.2 Multi-Query Attention
6.5.3 Grouped-Query Attention
6.5.4 Sliding Window Attention
6.6 An Alternate Formulation of Transformers
6.6.1 Residual Stream Perspective of Transformers
6.6.2 Attention Heads: Reading and Writing
6.6.3 Feed-Forward Networks: Transformation of Residual Streams
6.6.4 Prediction Head: Generating the Next Token
6.6.5 Decomposing the Transformer: Attention and Feed-Forward Contributions
6.6.6 Residual Networks as Shallow Ensembles
6.6.7 Interpreting the Mechanism of LLMs
6.7 Summary
Additional Resources
Exercises
Bibliography
7 Language Model Pretraining
7.1 Embeddings from Language Model
7.1.1 Architecture and Training of ELMo
7.1.2 Applications of ELMo
7.1.3 Limitations of ELMo
7.2 Evaluation Datasets
7.3 Encoder-Based Pretraining
7.3.1 Fundamentals of Encoder-Based Models
7.3.2 Training Paradigm
7.3.3 BERT Pretraining
7.3.4 Applications and Limitations
7.4 Decoder-Based Pretraining
7.4.1 Decoder-Based Architecture
7.4.2 Training Paradigm
7.4.3 GPT Pretraining
7.4.4 Applications and Limitations
7.5 Encoder-Decoder Based Pretraining
7.5.1 Architecture
7.5.2 Joint Pretraining Strategy
7.5.3 T5 Pretraining
7.5.4 Applications and Limitations
7.6 Emergence of Large Language Models
7.7 Limitations of Pretraining
7.8 Summary
Additional Resources
Exercises
Bibliography
8 Fine-Tuning and Alignment of LLMs
8.1 Moving from Pretraining to Fine-Tuning
8.2 Fine-Tuning on Various Task-Specific Applications
8.2.1 Sequence Classification
8.2.2 Pairwise Sequence Classification
8.2.3 Sequence Labelling
8.2.4 Learning Spans
8.2.5 Challenges in Classical Fine-Tuning Methods
8.3 Instruction Tuning
8.4 Alignment Methods
8.4.1 Reinforcement Learning from Human Feedback
8.4.2 Direct Preference Optimisation
8.5 Summary
Additional Resources
Exercises
Bibliography
9 Prompting Strategies in LLMs
9.1 Prompt Engineering
9.1.1 Prompt Shape
9.1.2 Manual Template Engineering
9.1.3 Automated Template Learning
9.1.4 Continuous Prompts
9.2 Prompt Application
9.2.1 In-Context Learning
9.2.2 Knowledge Probing
9.2.3 Classification-Based Tasks
9.2.4 Information Extraction
9.2.5 Reasoning in Natural Language Processing
9.2.6 Question Answering
9.2.7 Text Generation
9.2.8 Automatic Evaluation of Text Generation
9.3 Chain-of-Thoughts
9.4 Tree-of-Thoughts
9.5 Graph-of-Thoughts
9.6 Summary
Additional Resources
Exercises
Bibliography
10 Efficient Methods for Fine-Tuning LLMs
10.1 Model Compression with Knowledge Distillation
10.1.1 White-Box Knowledge Distillation
10.1.2 Meta Knowledge Distillation
10.1.3 Black-Box Knowledge Distillation
10.2 Model Compression Techniques
10.2.1 Model Pruning
10.2.2 Model Quantisation
10.3 Parameter-Efficient Fine-Tuning
10.3.1 Adapters
10.3.2 Prefix Tuning
10.3.3 Prompt Tuning
10.3.4 Selective PEFT Techniques
10.3.5 Reparameterisation-Based PEFT Techniques
10.3.6 Hybrid Approaches for Efficient Fine-Tuning
10.4 Efficient Strategies for Fine-Tuning LLMs
10.4.1 Mixed-Precision Tuning
10.4.2 Data Selection for Efficient Fine-Tuning
10.4.3 Prompt Compression
10.5 Summary
Additional Resources
Exercises
Bibliography
11 Augmented Large Language Models
11.1 Retrieval-Augmented Generation
11.1.1 Indexing in RAGs
11.1.2 Context Searching in RAGs
11.1.3 Prompting in RAGs
11.1.4 Inferencing in RAGs
11.1.5 Comparison of RAGs with LLMs
11.2 Evaluation of RAGs
11.2.1 Assessing Retrieval Quality
11.2.2 Generation Quality
11.2.3 Knowledge Integration and Factuality Evaluation
11.2.4 Response Time and Efficiency
11.2.5 User Satisfaction
11.2.6 RAGAs Framework for RAG Evaluation
11.3 Tool Calling with LLMs
11.3.1 Autonomously Determining Which Tools to Use and Where
11.3.2 Examples of Different Tools
11.3.3 Evaluation of Code Generation Capabilities of Agents
11.3.4 Error Handling and Optimisation
11.4 LLM Augmentation with Agents
11.4.1 Reasoning in LLM Agents
11.4.2 Planning in LLM Agents
11.4.3 Handling Memory in LLM Agents
11.5 Summary
Additional Resources
Exercises
Bibliography
12 Multilingual and Multimodal LLMs
12.1 Multilingual Language Models
12.1.1 The Evolution of Multilingual NLP
12.1.2 The Need for Multilingual LLMs
12.1.3 Cross-Lingual Representation Learning
12.1.4 Applications
12.2 Multimodal Language Models
12.2.1 Integration of Diverse Modalities
12.2.2 Applications
12.3 Training Multilingual and Multimodal LLMs
12.3.1 Efficient Data Collection and Preprocessing
12.3.2 Model Training Strategies
12.4 Addressing Challenges in Multilingual and Multimodal LLMs
12.4.1 Challenges in Multilingual LLMs
12.4.2 Challenges in Multimodal LLMs
12.5 Future Directions and Emerging Trends
12.6 Limitations of Multilingual and Multimodal LLMs
12.7 Summary
Additional Resources
Exercises
Bibliography
13 Responsible LLMs
13.1 Inaccurate, Inappropriate, and Unethical Behaviour of LLMs
13.2 Responsible AI
13.3 Bias
13.3.1 Visibility of Bias
13.3.2 Source of Bias
13.4 Bias Mitigation
13.5 Summary
Additional Resources
Exercises
Bibliography
14 Advanced Topics in Large Language Models
14.1 Reasoning with LLMs
14.1.1 Advancements in Reasoning Capabilities
14.1.2 Challenges in Reasoning with LLMs
14.1.3 Types of Reasoning Tasks
14.1.4 How Do LLMs Approach Reasoning?
14.1.5 Evaluating Reasoning Abilities in LLMs
14.2 Handling Long Context in LLMs
14.2.1 Challenges in Processing Long Context
14.2.2 Training and Fine-Tuning Approaches to Extend Context Length
14.2.3 Evaluation of Long-Context LLMs
14.3 Model Editing
14.3.1 Conditions for Successful Editing
14.3.2 Methods for Model Editing
14.3.3 Metrics for Evaluation of Model Editing
14.4 Hallucination in LLMs
14.4.1 Definition
14.4.2 Sources of Hallucination
14.4.3 Metrics Measuring Hallucination
14.4.4 Hallucination Mitigation
14.5 Self-Evolving LLMs
14.5.1 Conceptual Framework
14.5.2 Evolution Objectives and Techniques
14.5.3 Challenges
14.6 Summary
Additional Resources
Exercises
Bibliography
15 LLMs in Action
15.1 An Overview of the Landscape
15.1.1 Tracing the Evolution and Importance of LLMs in Contemporary AI
15.1.2 Open-Source vs Closed-Source Paradigms: Benefits and Trade-offs
15.2 A Panoramic View of LLMs
15.2.1 General-Purpose Large Language Models
15.2.2 Language-Specific LLMs
15.2.3 Domain-Specific LLMs
15.2.4 Task-Specific LLMs
15.3 Diverse Applications of LLMs
15.3.1 Healthcare: Enhancing Diagnostics and Patient Care
15.3.2 Finance: Transforming Data Analysis and Risk Management
15.3.3 Legal: Streamlining Research and Case Management
15.3.4 Education: Personalised Learning and Academic Support
15.4 Emerging Trends and Future Directions in LLMs
15.4.1 Beyond Text: The Advent of Multimodal LLMs
15.4.2 Autonomous Agents: The LLM Leap in AI Evolution (AutoGPT)
15.5 Summary
Additional Resources
Exercises
Bibliography
Index