CSC 4241 - GPU Programming

2 lecture hours, 2 lab hours, 3 credits
Course Description
This course provides an introduction to GPU programming. Topics include parallel programming paradigms, the CUDA programming model and libraries, code profiling, optimization strategies, GPU architecture, parallel algorithms, and applications of GPU acceleration. Students will implement linear algebra operations, image processing algorithms, parallel algorithm patterns, and application case studies, and compare their implementations with those provided by CUDA libraries. The course ends with a multi-week team-based project.
Prereq: CSC 2210 or CPE 2600 (quarter system prereq: CS 2040 or CS 3210)
Note: None
This course meets the following Raider Core CLO Requirement: None
Course Learning Outcomes
Upon successful completion of this course, the student will be able to:
  • Program massively parallel processors using the CUDA programming API, tools, and techniques
  • Design and develop algorithms that exploit highly parallel co-processors to solve technical and scientific problems
  • Apply principles and patterns of parallel algorithms
  • Describe NVIDIA GPU architecture features and constraints
  • Explain how data-parallel hardware works in order to develop efficient algorithms
  • Design numerical methods optimized for data-parallel architectures
  • Leverage data-parallel hardware to process the large data sets common in modern big data applications
  • Implement and analyze parallel algorithm patterns in the CUDA programming model
  • Identify performance bottlenecks in parallel code
  • Improve performance by applying common parallel optimization techniques

Prerequisites by Topic
  • A working knowledge of the C or C++ programming language
  • Familiarity with basic linear algebra

Course Topics
  • Heterogeneous parallel computing
  • GPU architecture
  • Data parallelism
  • CUDA programming structure
  • Mapping threads to multidimensional data
  • Memory access efficiency
  • CUDA memory types
  • Warps and SIMD hardware
  • Floating-point data representation
  • Numerical stability
  • Parallel patterns

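To give a flavor of the "mapping threads to multidimensional data" topic above, the following is a minimal CUDA sketch (the kernel name `scale2D` and its parameters are hypothetical illustrations, not taken from the course materials):

```cuda
// Hypothetical example: map a 2D grid of threads onto an image-like
// row-major array so each thread handles one element.
__global__ void scale2D(float *data, int width, int height, float factor) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension -> column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension -> row
    if (row < height && col < width)   // guard: grid may overhang the data
        data[row * width + col] *= factor;
}

// Host-side launch: round the grid size up so every element is covered.
// dim3 block(16, 16);
// dim3 grid((width + block.x - 1) / block.x,
//           (height + block.y - 1) / block.y);
// scale2D<<<grid, block>>>(d_data, width, height, 2.0f);
```

The bounds check matters because the grid is rounded up to whole blocks, so some threads in the last row or column of blocks fall outside the data.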
Laboratory Topics
  • Query of hardware resources and coding environment/tools
  • Linear algebra operations such as vector addition and matrix multiplication
  • Performance optimization leveraging shared and constant memory
  • Image processing algorithms such as image blur, color-to-grayscale conversion, convolution, and stencil computations
  • Using CUDA libraries
  • Common parallel programming patterns
  • Application case studies

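As a sketch of the kind of linear algebra exercise the early labs cover, a vector-addition kernel might look like the following (a minimal, hypothetical example; names and launch parameters are illustrative, not from the lab handouts):

```cuda
#include <cuda_runtime.h>

// Hypothetical vector-addition kernel: each thread computes one element
// of c = a + b, the canonical first example of data parallelism.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)              // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

// Typical launch for n elements, assuming d_a, d_b, d_c are device pointers:
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;  // ceiling division
// vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
```

Comparing a hand-written kernel like this against a library routine (e.g. a cuBLAS axpy-style call) is the kind of exercise the "Using CUDA libraries" lab topic suggests.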
Coordinator
Dr. Sebastian Berisha

