Mar 14, 2026
CSC 6714 - Large Language Models

4 lecture hours, 0 lab hours, 4 credits

Course Description

This course introduces large language models (LLMs) from the ground up. Beginning with the implementation and training of a modern LLM, the course then covers applications of LLMs and a mechanism-aware treatment of prompt engineering. By the end of the course, students will have a concrete understanding of how LLMs perform computations and will be able to apply that knowledge to solve real-world tasks with LLMs.

Prereq: (MTH 2340, MTH 5810, or similar coursework) and (CSC 2621, CSC 5610, or similar coursework) or instructor consent

Note: None

Course Learning Outcomes

Upon successful completion of this course, the student will be able to:
- Describe the inference process for a token-prediction transformer model, including token generation
- Explain the structure and mechanics of the transformer architecture and its components, including recent advances
- Train a small transformer model from scratch to predict the next token in a sequence
- Compare techniques for training transformer models including pretraining, instruction fine-tuning, and reinforcement learning
- Demonstrate working applications of LLMs on real-world problems of varying difficulty
- Use advanced prompting techniques such as chain of thought and structured output to improve LLM performance on a task
- Evaluate the performance of an LLM on tasks using approaches such as perplexity and AI-as-judge
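As a rough illustration of the token-generation loop named in the first outcome, here is a minimal greedy-decoding sketch. The bigram table `TOY_MODEL` is invented purely for illustration; a real LLM scores the full context with a transformer, but the generation loop has the same shape:

```python
# Hypothetical toy bigram "model": maps the previous token to a
# probability distribution over possible next tokens.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.7, "</s>": 0.3},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate(model, start="<s>", max_tokens=10):
    """Greedy decoding: repeatedly pick the highest-probability next
    token and append it, stopping at the end-of-sequence token."""
    tokens = [start]
    for _ in range(max_tokens):
        dist = model[tokens[-1]]
        next_tok = max(dist, key=dist.get)
        if next_tok == "</s>":
            break
        tokens.append(next_tok)
    return tokens[1:]  # drop the start marker

print(generate(TOY_MODEL))  # ['the', 'cat']
```

Real systems replace the greedy `max` with sampling strategies (temperature, top-k, nucleus), but the autoregressive structure is unchanged.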
Prerequisites by Topic
- Python programming experience including the use of data science and machine learning libraries
- Able to perform and interpret matrix and vector arithmetic including addition, dot products, and matrix-vector multiplication
- Able to train and apply various classical machine learning models for classification and regression problems
- Able to design and execute experiments to evaluate machine learning models
- Able to interpret metrics such as accuracy, precision, and recall to evaluate model prediction performance
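The matrix and vector arithmetic assumed above can be expressed in a few lines of NumPy (the array values here are arbitrary illustrative numbers):

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0, -1.0])     # a 4-dimensional vector
W = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])    # a 3x4 matrix

y = W @ x      # matrix-vector multiplication: y == [1., 2., -3.]
dot = x @ x    # dot product of x with itself: 1 + 0 + 4 + 1 == 6.0
s = x + x      # elementwise addition: [2., 0., 4., -2.]
```

These same operations, applied at much larger scale, make up the bulk of the computation inside a transformer layer.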
Course Topics
- The modern transformer architecture including embedding components
- Process for inference and token generation from transformer architectures
- Computational efficiency of and optimizations for transformer inference
- Techniques for training LLMs as foundation and instruction-following models
- Examples of LLM applications (varies depending on instructor preference)
- Foundational and advanced prompt engineering techniques such as prompt structure, chain of thought, few-shot learning, and structured output
- Evaluation of LLMs, including the challenges posed by open-ended output and approaches for evaluating it
Coordinator

Dr. Josiah Yoder