Course Catalog
AI Model Development
95-864
Units: 6
Description
This is a core course for AIM students, who receive priority enrollment. If seats remain after AIM students have registered, students on the waitlist will be invited to enroll.
Large Language Models (LLMs) are reshaping how we reason over language, generate structured outputs, and build intelligent systems. At the core of these capabilities are transformer-based architectures and scalable pipelines for adapting, evaluating, and deploying models in production. AI Model Development is a rigorous, hands-on course that teaches students how to design, refine, and operationalize LLM-based systems across diverse real-world applications.
The course opens with a focused review of ML/DL foundations and transitions rapidly into LLM fundamentals—tokenization, embeddings, attention, and decoding. Students learn to engineer prompts, construct retrieval-augmented generation (RAG) pipelines, adapt models using fine-tuning and parameter-efficient methods (LoRA, QLoRA), and evaluate safety, alignment, and performance with tools like MMLU, GSM8K, and pass@k.
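To make the attention mechanism mentioned above concrete, here is a minimal single-head sketch in NumPy. It is illustrative only (no masking, batching, or learned projections) and is not the course's reference implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity, scaled
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

# Toy example: 2 query tokens attending over 3 key/value tokens, dim 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

Each output row is a convex combination of the value vectors, which is the property later modules build on when discussing decoding and efficient inference.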
In later modules, students develop agentic workflows by integrating tools, memory, and reasoning into LLM systems. Domain-specific applications are explored in software engineering and the biosciences, including protein design with AlphaFold and agentic scientific discovery with AlphaEvolve. As one student noted, “NLX/LLM is quite useful as its contents are closely related to the latest trends of AI and the job requirements of most AI/ML engineer positions.”
The course is structured around three hands-on assignments and a cumulative final project. These are designed to showcase practical fluency in LLM pipelines—from architecture decisions to fine-tuning, deployment, and evaluation. Projects frequently draw on cutting-edge toolchains and real-world design constraints. One student remarked, “The projects provide good opportunities to apply what we learned into industry-level use cases.”
By the end of the course, students will be able to build, adapt, and evaluate modular LLM systems, balance performance and cost in deployment, and bring architectural clarity to complex design choices. Prior experience with Python and foundational machine learning is required; experience with PyTorch or TensorFlow is recommended.
Learning Outcomes
Upon completion of this course, students will be able to:
- Understand the Model Development Lifecycle for Large Language Models (LLMs): Map each stage of developing LLM systems—from tokenization, architecture selection, and adaptation, to evaluation, deployment, and iteration—while identifying key success metrics and risks.
- Apply and Explain the Foundations of Transformer Models: Describe the role of embeddings, attention mechanisms, positional encodings, and decoder-based autoregression; implement and analyze core transformer components and decoding strategies.
- Engineer Prompts and Build Retrieval-Augmented Systems: Design zero-shot, few-shot, and chain-of-thought prompts; construct and tune RAG (Retrieval-Augmented Generation) pipelines with chunking, dense/hybrid retrieval, reranking, and grounding.
- Adapt and Fine-Tune Models with Supervised and Efficient Techniques: Perform supervised and instruction fine-tuning; apply parameter-efficient methods such as LoRA and QLoRA; compare tradeoffs in performance, scalability, and resource usage.
- Evaluate Model Performance, Alignment, and Robustness: Use benchmarks such as MMLU, GSM8K, and pass@k; measure toxicity, hallucination, and groundedness; design reproducible evaluation pipelines; align model behavior with task intent.
- Integrate Tools, Memory, and Reasoning in LLMs: Develop tool-using agents with structured outputs and planning; extend models with memory and reasoning mechanisms including vector memory, reflection, and tree-of-thought prompting.
- Deploy and Optimize Inference Systems: Improve performance through batching, caching, quantization, and speculative decoding; monitor and manage latency, cost, and throughput in real-time deployments.
- Explore Advanced and Domain-Specific AI Applications: Assess model use in domains such as healthcare, biosciences, and software engineering; analyze agentic scientific workflows and understand regulatory, privacy, and safety constraints.
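As a concrete illustration of one evaluation metric from the outcomes above, the following sketch computes pass@k using the standard unbiased estimator (the probability that at least one of k samples drawn from n generations is correct, given that c of the n generations pass); this is a minimal example, not the course's grading harness:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k).

    n: total generations sampled per problem
    c: number of those generations that pass the tests
    k: number of samples the metric conditions on
    """
    if n - c < k:
        # Too few failing samples to fill k draws: at least one must pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 3 of which pass the unit tests
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
```

Averaging this quantity over a benchmark's problems yields the reported pass@k score; the same pattern (per-item metric, aggregate, report) generalizes to the other evaluation pipelines covered in the course.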
Introductory courses in machine learning and deep learning are required prerequisites.