CSC 5241 - GPU Programming

4 lecture hours 0 lab hours 4 credits
Course Description
This course provides an introduction to GPU programming. Topics include parallel programming paradigms, the CUDA programming model and libraries, code profiling, optimization strategies, GPU architecture, parallel algorithms, and applications of GPU acceleration. Students will implement linear algebra operations, image processing algorithms, and common parallel algorithm patterns, work through application case studies, and compare their implementations with existing CUDA libraries. The course concludes with a multi-week, team-based project.
Prereq: CSC 2210 or instructor consent (quarter system prereq: CS 2040 or consent of instructor)
Note: This course is open to qualified undergraduate students.
Course Learning Outcomes
Upon successful completion of this course, the student will be able to:
  • Program massively parallel processors using the CUDA programming API, tools, and techniques
  • Design and develop algorithms that take advantage of highly parallel co-processors to solve technical and scientific problems
  • Explain principles and patterns of parallel algorithms
  • Describe NVIDIA GPU architecture features and constraints
  • Understand data-parallel hardware in order to develop efficient algorithms
  • Design numerical methods optimized for data parallel architectures
  • Leverage data-parallel hardware to process large data sets common in modern big data applications
  • Implement and analyze parallel algorithm patterns in the CUDA programming model
  • Identify performance bottlenecks in parallel code
  • Improve performance by applying common parallel techniques
  • Compare the performance of a from-scratch parallel implementation to an existing CUDA library when applied to accelerate a particular application
  • Review and analyze recent research papers on using GPU programming to accelerate scientific computing applications

Prerequisites by Topic
  • A working knowledge of the C/C++ programming language
  • Familiarity with basic linear algebra

Course Topics
  • Heterogeneous parallel computing
  • GPU architecture
  • Data parallelism
  • CUDA programming structure
  • Mapping threads to multidimensional data (illustrated in the sketch after this list)
  • Memory access efficiency
  • CUDA memory types
  • Warps and SIMD hardware
  • Floating-point data representation
  • Numerical stability
  • Parallel patterns
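
As an illustration only (a minimal sketch, not part of the official course materials), the thread-to-data mapping topic covers kernels like the following, where a 2D grid of CUDA threads is laid over a width-by-height image and each thread handles one pixel:

    // Illustrative sketch: mapping a 2D grid of CUDA threads onto an image.
    __global__ void scaleImage(float *img, int width, int height, float factor) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;  // x position within the image
        int row = blockIdx.y * blockDim.y + threadIdx.y;  // y position within the image
        if (row < height && col < width)                  // guard threads that fall outside the image
            img[row * width + col] *= factor;             // row-major linearized access
    }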

Laboratory Topics
  • Querying hardware resources and the coding environment/tools
  • Linear algebra operations such as vector addition and matrix multiplication (see the vector-addition sketch after this list)
  • Performance optimization leveraging shared and constant memory
  • Image processing algorithms such as image blur, color-to-grayscale conversion, convolution, and stencil operations
  • Using CUDA libraries
  • Common parallel programming patterns
  • Application case studies
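
As a further illustration (a minimal, self-contained sketch assuming the labs begin with element-wise kernels; it is not official course code), a first vector-addition exercise might look like this, using unified memory to keep the host code short:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Element-wise vector addition: one thread per element.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];                  // guard against the final partial block
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);                   // unified memory keeps the host code short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;       // round up so every element is covered
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                    // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }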

Coordinator
Dr. Sebastian Berisha

