NVIDIA CUDA

GPU Computing using NVIDIA CUDA


NVIDIA CUDA is one of the most popular GPU computing platforms among researchers.

Click here to know more about our training programs in OpenCL

GPGPU Using NVIDIA CUDA

Course Overview:

This course (HBCS102) is divided into two modules, Level "A" and Level "B", with Level "B" being the more advanced.

Level "A" is an introductory course on parallel programming, with about 40% of the time devoted to CUDA programming. This level does not require any prior knowledge of parallel computing; only a data-structures-level course is required. The course starts with the C programming language and covers the details of graphics card hardware (GPU architecture, DRAM, PCIe, etc.). Apart from these, we also cover elementary concepts in parallel programming and CUDA, including "computational thinking", algorithms, and some discussion of shared memory usage. Programming in both Windows and Linux environments is taught. An introduction to Linux is included so that students and professionals can get the full benefit of working on an open-source OS.

Learning Outcome: The course aims to teach the trainee how to write simple CUDA programs, such as squaring the first 10000 integers, and how to compile them on Linux and Windows. In short, the candidate learns to write simple CUDA programs and to understand basic hardware and software details, without worrying about performance. The discussion of algorithms also builds a solid base for getting ahead in the world of parallel programming.
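The squaring exercise mentioned above can be sketched first as a plain C baseline, the kind of sequential program one might write before porting it to a CUDA kernel. This is only an illustrative sketch and not taken from the course material: the function name `square_first_n` is hypothetical, and it assumes "the first 10000 integers" means 1 through 10000. The loop body is the per-element work that a single GPU thread would take over in a CUDA version.

```c
#include <stddef.h>

/* Hypothetical helper (not from the course material): stores the square
 * of the integer i + 1 in out[i], for i = 0 .. n-1.
 * Assumption: "the first n integers" means 1 .. n. */
void square_first_n(long long *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        long long v = (long long)i + 1;  /* the (i+1)-th positive integer */
        out[i] = v * v;                  /* each element is independent of the others */
    }
}
```

Calling `square_first_n(buf, 10000)` on a 10000-element array leaves `buf[0] == 1` and `buf[9999] == 100000000`. Because no iteration depends on another, this loop is a natural first candidate for one-thread-per-element parallelisation.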
Level "B" is an advanced course on CUDA programming. The training covers most of the features of CUDA 2.3 and 3.0.

Learning Outcome: The trainee learns to implement and optimize complex algorithms such as reduction, prefix sum (scan), and matrix transpose. This level aims to prepare candidates to develop their own applications. Please refer to the detailed syllabus below.

SYLLABUS

LEVEL A

Contents

1. Fundamentals of C Programming Language

Building a C Program on Linux
  • Linux basics
  • Compiling
  • Running a Program
2. Parallel Algorithms and Computational Thinking
3. Introduction to GPU Hardware
  • Modern GPU Architecture
  • Types of Memory
  • Difference between CPU and GPU
  • PCI Express vs. PCI
4. Getting Started with CUDA

Installation, Driver, SDK, Toolkit, Basic Programming Concepts, Modes of Parallel Programming
CUDA Programming Model, Kernels, Calling a Kernel on the Device, Compiling and Running a CUDA Program

5. Shared Memory Usage in CUDA

LEVEL B

LEVEL "B" discusses parallel programming concepts in detail, with a specific focus on CUDA programming. You are exposed to the following special topics:
  • Performance metrics - speed-up, utilization, efficiency
  • Transparent Scalability
  • Memory organization in CUDA, Discussion on Pinned memory, texture memory & constant memory usage
  • Error Handling
  • CUDA events
  • Models of Parallel Computation: SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction Multiple Data)
  • GPU Compute Architecture
  • Memory Optimization (Removing Bank Conflicts in Shared Memory, Partition camping, Global Memory Optimization)
  • Using streams: Overlapping GPU and CPU tasks, Overlapping Computation with Memory Copy
  • Atomic Operations and their limitations
  • Using the occupancy calculator, CUDA profiler, and debugger
  • Performance Guidelines
  • CUBLAS & CUFFT usage
  • Thrust Library & CUDA Data Parallel Primitives Library (CUDPP)
  • Implementation of fast matrix multiplication, scan (prefix sum), reduction, and matrix transpose algorithms. These algorithms are basic building blocks of many of the complex applications being developed today.
Each concept is explained with practical examples. In short, the complete course exposes you to almost all of the features of CUDA 2.3 and 3.0, preparing you to write and optimise your own applications. The course concludes with a project.
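As a point of reference for the building-block algorithms named in the topic list, the following is a minimal sequential sketch in C of reduction, inclusive prefix sum (scan), and matrix transpose. It is not the course's implementation and not CUDA code: the function names are hypothetical, and a CPU reference of this kind is simply one common way to check the output of an optimised GPU version.

```c
#include <stddef.h>

/* Hypothetical reference helpers (not from the course material). */

/* Sum reduction: combines all n input elements into a single value. */
long long reduce_sum(const int *in, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += in[i];
    return total;
}

/* Inclusive prefix sum (scan): out[i] = in[0] + in[1] + ... + in[i]. */
void inclusive_scan(const int *in, long long *out, size_t n)
{
    long long running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* Transpose of a rows x cols row-major matrix into a cols x rows
 * row-major matrix. in and out must not overlap. */
void transpose(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}
```

For the input `{3, 1, 4, 1, 5}`, `reduce_sum` returns 14 and `inclusive_scan` produces `{3, 4, 8, 9, 14}`; transposing the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` yields the 3 x 2 matrix `{1, 4, 2, 5, 3, 6}`. Unlike the per-element squaring loop, the scan carries a running total from one iteration to the next, which is why its parallel version needs a different algorithm rather than a direct one-thread-per-element port.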

After the training, we can demonstrate CUDA applications in the following domains:
  1. Image Processing
  2. Artificial Neural Network
  3. Computational Finance
  4. Linear Algebra
  5. Pattern Matching

Target Audience:

Professionals, researchers and students with a background in Mathematics, Computer Science, IT, Electrical, Electronics & Communications, and similar fields can enrol for this course.

Prerequisites:

For Level "A", the participant should be familiar with the concepts of the C programming language. Although parallel programming is taught from the basics in Level "A", some prior exposure to it will help you grasp the concepts quickly.

Reference Books

Introduction to Algorithms, Third Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein

Introduction to Parallel Computing by Ananth Grama, George Karypis, Vipin Kumar and Anshul Gupta (Pearson)

CUDA Programming Guide and CUDA Best Practices Guide (download from nvidia.com)

GPU Gems 3, edited by Hubert Nguyen

For any specific information or queries, contact us at info@hbeongpgpu.com