Mar 14, 2026  
2026-2027 Graduate Academic Catalog

CSC 6607 - LLMs in Production

4 lecture hours 0 lab hours 4 credits
Course Description
Students will design and implement a holistic software system incorporating large language model (LLM) components, then deploy and operate the system in a production environment. Real-world use cases of LLMs of varying complexity will be introduced. Workflows and design patterns, including those for agentic AI and the incorporation of multiple specialized LLMs, will be discussed and their application practiced. Infrastructure components for data processing, storage, and retrieval, as well as libraries and hardware for model fine-tuning, serving, and evaluation, will be discussed. Emphasis will be placed on system prediction performance, resource usage, operational cost, and reliability. This course builds on and integrates previous coursework in machine learning, LLMs, and microservices.
Prereq: CSC 5201 and (CSC 4611/CSC 5611, CSC 6621, or CSC 6714), or consent of instructor.
Note: None
Course Learning Outcomes
Upon successful completion of this course, the student will be able to:
  • Demonstrate working applications of LLMs on several real-world problems of varying difficulty
  • Apply prompt engineering techniques to solve a given task using an LLM
  • Design LLM workflows to solve complex, multi-step tasks using techniques such as prompt routing, prompt rewriting, orchestration patterns, retrieval augmented generation (RAG), tool calling with model context protocol (MCP), and agents
  • Design and implement a software system that uses an LLM workflow
  • Describe the impact of model size, model architecture, and serving configuration on storage and compute requirements, token generation rates, and prediction performance
  • Run, configure, and benchmark software for serving LLMs as RESTful services
  • Apply strategies to reduce model size and computational requirements such as quantization, pruning, and fine-tuning smaller models for specialized tasks
  • Deploy and operate a software system with an LLM component in a production environment
  • Use monitoring systems to detect operational and prediction failures
  • Identify potential sources of operational failures and mitigation strategies from a given system design
  • Describe security risks associated with LLMs, RAG, tool calling, and agentic AI
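As one small illustration of the RAG outcome above, retrieval-augmented generation can be sketched as assembling a prompt from retrieved context. The corpus, overlap-based scoring, and prompt template below are illustrative assumptions, not material from the course:

```python
# Toy retrieval-augmented generation (RAG) prompt assembly.
# Corpus, scoring function, and template are hypothetical examples.

def retrieve(query, corpus, k=2):
    """Rank documents by naive word overlap with the query; return top k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, corpus):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

corpus = [
    "Quantization reduces model size by lowering numeric precision.",
    "Circuit breakers stop calls to a failing downstream service.",
    "RAG augments prompts with retrieved documents.",
]
print(build_prompt("How does quantization reduce model size?", corpus))
```

A production pipeline would replace word overlap with embedding similarity over a vector store, but the shape of the assembled prompt is the same.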

Prerequisites by Topic
  • Proficiency with Python programming
  • Proficiency with the Linux command-line
  • Microservices including distributed architectures and serving and calling RESTful APIs
  • Cloud computing technologies including packaging and running containerized applications
  • Applied training and inference using deep learning models

Course Topics
  • Configuring and operating software for serving LLMs
  • Effects of prompt size, model architecture and size, quantization, pruning, fine-tuning, and knowledge distillation on the computational resource requirements of LLMs, with discussion of accuracy and cost tradeoffs
  • Use of common REST APIs for querying LLMs
  • Patterns such as circuit breakers for reliability in the face of overload or failures
  • Engineering specialized prompts for individual tasks with considerations for security, output format, and accuracy of results
  • Formatting inputs to and outputs from LLMs in structured and unstructured formats
  • Use of memory and context engineering for maintaining session continuity
  • Use and operation of accelerator hardware (e.g., GPUs) for deploying and running LLMs
  • LLM workflow orchestration patterns and their applications to different tasks including serial pipelines of LLMs, prompt routing, and delegation of subtasks to specialized LLMs
  • Implementation of retrieval augmented generation (RAG) including document preparation, embedding, storage, retrieval, and prompting
  • Using LLMs to call tools and evaluate outputs in service of a task with considerations for security
  • Agentic AI techniques for allowing LLMs to make decisions and control loops
  • Monitoring a deployed system’s computational and predictive performance

Coordinator
Dr. RJ Nowling


