

The term GPU was popularized by NVIDIA in 1999, when it marketed a graphics card, the GeForce 256, as "the world's first GPU". That card, however, was designed mainly for rendering high-end computer graphics and accelerating computer-based gaming. In contrast, today's GPUs also provide an inexpensive platform for developing and executing high-performance non-graphical applications. Developing general-purpose applications on GPUs is often termed GPGPU (General-Purpose Computing on GPUs), and a number of companies now produce GPUs capable of general-purpose computation.
CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. When it was first introduced, the name was an acronym for Compute Unified Device Architecture; today it is simply called CUDA. Among NVIDIA's GPU lines, Tesla is a popular development platform, and later architectures such as Fermi, Kepler (e.g., the K80), Pascal, and Volta do not differ greatly in their programming basics. All of these devices allow programmers to develop applications in an easily programmable C-like language, CUDA C.
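To give a flavor of this C-like language, the following is a minimal sketch of a CUDA C program (the kernel name `vecAdd` and the array sizes are illustrative, not from the text); it adds two vectors element-wise, with one GPU thread per element:

```cuda
#include <cstdio>

// Kernel: each thread computes one element of the output vector.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory (CUDA 6.0 and above): one pointer usable
    // from both the host CPU and the GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);   // launch on the GPU
    cudaDeviceSynchronize();                   // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compiled with `nvcc`, the same source mixes ordinary host C code with device kernels marked `__global__`; the `<<<blocks, threads>>>` syntax is CUDA's extension for specifying the launch configuration.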
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs: scattered reads, unified virtual memory (CUDA 4.0 and above), unified memory (CUDA 6.0 and above), shared memory, faster downloads to and readbacks from the GPU, and full support for integer and bitwise operations, including integer texture lookups.
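Two of these advantages, shared memory and scattered reads, can be sketched in one kernel (the name `blockSum` and the block size of 256 are assumptions for illustration): each thread block loads a tile of the input into fast on-chip shared memory and reduces it to a single partial sum.

```cuda
// Illustrative kernel: each 256-thread block sums its slice of the
// input using shared memory, writing one partial sum per block.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[256];          // on-chip, shared by the block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    // Scattered reads: each thread may load from an arbitrary address.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // wait until the tile is loaded
    // Tree reduction within the block, halving the stride each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block
}
```

Because `tile` lives in shared memory rather than device DRAM, the repeated accesses during the reduction are far cheaper than they would be through global memory, which is precisely why CUDA exposes this memory space to the programmer.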
The CUDA platform is designed to work with programming languages such as C, C++, and Fortran, and wrappers exist for many other languages, including Python and Java. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs such as Direct3D and OpenGL, which required advanced skills in graphics programming.
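As one example of such a wrapper, Numba exposes CUDA kernels directly from Python (this sketch assumes Numba is installed and an NVIDIA GPU is available; the kernel name `scale` is illustrative):

```python
# Hedged sketch of a CUDA kernel written via Numba's Python wrapper.
from numba import cuda
import numpy as np

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)              # global thread index
    if i < arr.size:
        arr[i] *= factor

data = np.arange(1024, dtype=np.float32)
d_data = cuda.to_device(data)     # copy the host array to the GPU
threads = 256
blocks = (data.size + threads - 1) // threads
scale[blocks, threads](d_data, 2.0)   # launch, Numba's bracket syntax
result = d_data.copy_to_host()        # read the doubled values back
```

The structure mirrors the CUDA C model exactly (kernels, a grid of thread blocks, explicit transfers), which is what makes these wrappers thin and predictable for programmers who already know the platform.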