Mar 14, 2026  
2026-2027 Graduate Academic Catalog

CSC 6607 - LLMs in Production

4 lecture hours 0 lab hours 4 credits
Course Description
Students will design and implement a holistic software system incorporating large language model (LLM) components, then deploy and operate the system in a production environment. Real-world use cases of LLMs of varying complexity will be introduced. Workflows and design patterns, including those for agentic AI and the incorporation of multiple specialized LLMs, will be discussed and their application practiced. Infrastructure components for data processing, storage, and retrieval, as well as libraries and hardware for model fine-tuning, serving, and evaluation, will be discussed. Emphasis will be placed on system prediction performance, resource usage, operational cost, and reliability. This course builds on and integrates previous coursework in machine learning, LLMs, and microservices.
Prereq: CSC 5201 and (CSC 4611/CSC 5611, CSC 6621, or CSC 6714), or consent of instructor.
Note: None
Course Learning Outcomes
Upon successful completion of this course, the student will be able to:
  • Demonstrate working applications of LLMs on several real-world problems of varying difficulty
  • Apply prompt engineering techniques to solve a given task using an LLM
  • Design LLM workflows to solve complex, multi-step tasks using techniques such as prompt routing, prompt rewriting, orchestration patterns, retrieval augmented generation (RAG), tool calling with model context protocol (MCP), and agents
  • Design and implement a software system that uses an LLM workflow
  • Describe the impact of model size, model architecture, and serving configuration on storage and compute requirements, token generation rates, and prediction performance
  • Run, configure, and benchmark software for serving LLMs as RESTful services
  • Apply strategies to reduce model size and computational requirements such as quantization, pruning, and fine-tuning smaller models for specialized tasks
  • Deploy and operate a software system with an LLM component in a production environment
  • Use monitoring systems to detect operational and prediction failures
  • Identify potential sources of operational failures and mitigation strategies from a given system design
  • Describe security risks associated with LLMs, RAG, tool calling, and agentic AI
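As one small illustration of the RAG outcome above, retrieval-augmented generation can be sketched as assembling a prompt from retrieved context. The corpus, overlap-based scoring, and prompt template below are illustrative assumptions, not material from the course:

```python
# Toy retrieval-augmented generation (RAG) prompt assembly.
# Corpus, scoring function, and template are hypothetical examples.

def retrieve(query, corpus, k=2):
    """Rank documents by naive word overlap with the query; return top k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, corpus):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

corpus = [
    "Quantization reduces model size by lowering numeric precision.",
    "Circuit breakers stop calls to a failing downstream service.",
    "RAG augments prompts with retrieved documents.",
]
print(build_prompt("How does quantization reduce model size?", corpus))
```

A production pipeline would replace word overlap with embedding similarity over a vector store, but the shape of the assembled prompt is the same.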

Prerequisites by Topic
  • Proficiency with Python programming
  • Proficiency with the Linux command-line
  • Microservices including distributed architectures and serving and calling RESTful APIs
  • Cloud computing technologies including packaging and running containerized applications
  • Applied training and inference using deep learning models

Course Topics
  • Configuring and operating software for serving LLMs
  • Effects of prompt size, model architecture and size, quantization, pruning, fine-tuning, and knowledge distillation on the computational resource requirements of LLMs, with discussion of accuracy and cost tradeoffs
  • Use of common REST APIs for querying LLMs
  • Patterns such as circuit breakers for reliability in the face of overload or failures
  • Engineering specialized prompts for individual tasks with considerations for security, output format, and accuracy of results
  • Formatting inputs to and outputs from LLMs in structured and unstructured formats
  • Use of memory and context engineering for maintaining session continuity
  • Use and operation of accelerator hardware (e.g., GPUs) for deploying and running LLMs
  • LLM workflow orchestration patterns and their applications to different tasks including serial pipelines of LLMs, prompt routing, and delegation of subtasks to specialized LLMs
  • Implementation of retrieval augmented generation (RAG) including document preparation, embedding, storage, retrieval, and prompting
  • Using LLMs to call tools and evaluate outputs in service of a task with considerations for security
  • Agentic AI techniques for allowing LLMs to make decisions and control loops
  • Monitoring a deployed system’s computational and predictive performance

Coordinator
Dr. RJ Nowling


