GPU stands for Graphics Processing Unit, a single-chip processor that manages and boosts graphics and video performance (AMD 34). Its features include texture mapping, 2-D and 3-D graphics, digital output to flat-panel display monitors, support for graphics-intensive applications such as AutoCAD, YUV color space support, MPEG decoding, hardware overlays, and polygon rendering (Lindholm & Nickolls 21). These features are meant to produce faster graphics and video while reducing the work of the CPU. The processor is also used in display adaptors, mobile phones, game consoles, and workstations. The term visual processing unit (VPU) is sometimes used to mean GPU (Asanovic, Bodic, & Catanzaro 71).
Overview of GPU
The first GPU was developed in 1999 (Moss 33). It was designed by NVIDIA and named GeForce 256. This model could process 10 million polygons per second (AMD 34) and contained over 22 million transistors. It was a single-chip processor with integrated BitBLT support, triangle setup, lighting effects, and rendering engines. GPUs became popular as demand for graphics applications increased, and they eventually became a necessity for optimum PC performance (AMD 34). Today, chips with specialized logic enable faster video and graphics processing. In this arrangement the GPU is connected to the CPU and is separate from the motherboard, and it communicates with random access memory (RAM) through Peripheral Component Interconnect Express (PCIe) (Moss 33). When the GPU is instead integrated into the motherboard, it tends to be slower and to perform more poorly.
Most GPUs devote their transistors to 3-D computer graphics (Asanovic, Bodic, & Catanzaro 76). However, some contain accelerated memory for mapping vertices, as used in Geographic Information System (GIS) applications (Lindholm & Nickolls 12). The GPU has numerous uses in computers, yet it is inexpensive. Newer GPU models offer greater quality. For applications such as Computer-Aided Design (CAD), some can process more than 200 billion operations per second and deliver no fewer than 17 million polygons per second. Scientists and engineers use GPUs for more in-depth computational studies, relying on the matrix and vector features of the GPU (Asanovic, Bodic, & Catanzaro 73).
GPUs have taken advantage of the growing number of transistors, even though the balance between on-chip and off-chip memory may have changed. Moore's Law, which has long dominated computing, is expected to keep doubling the number of transistors in the future (Moss 33). This will sustain a rapid rise in the number of processing cores in each GPU (AMD 34), to levels well above what is achieved today. The affordability of the GPU has led to sales of over 100 million units. This figure is far higher than that of earlier parallel machines, such as the MasPar MP-2. Software engineers may also harness the teraflop performance of the GPU for uses other than entertainment.
History of GPU
NVIDIA invented the GPU in 1999, and it has since become the most pervasive parallel processor (Moss 33). Its evolution has been driven by the desire for up-to-date graphics. The GPU outperforms the CPU in arithmetic throughput and memory bandwidth, which makes it an ideal processor for accelerating a variety of data-parallel applications. Since 2003, efforts have been made to use the GPU for non-graphical applications (Moss 33). Numerous data-parallel workloads have been ported to the GPU using high-level shading languages and APIs such as OpenGL, Cg, and DirectX. Problems including SQL queries, protein folding, and MRI, among others, have been solved using the GPU. Prior to the development of GPU computing, this approach was known as GPGPU (Asanovic, Bodic, & Catanzaro 72). Though the GPGPU model proved to deliver great speedups, it faced several setbacks.
These setbacks included the need to express problems in terms of vertex coordinates and shader programs, which made programs more complex (AMD 33). The GPU also lacked double-precision support, which meant that a number of scientific programs could not run on it (Moss 33). NVIDIA introduced two key technologies to address these problems. The first was the G80 unified graphics and compute architecture, introduced in the GeForce 8800, Quadro FX 5600, and Tesla C870 GPUs. The second was CUDA, a hardware and software architecture that allowed the GPU to run programs written in high-level programming languages (Lindholm & Nickolls 24). These technologies gave new impetus to GPU use. With the CUDA extensions, programs for the GPU could now be written in C. The new approach was named GPU computing (Moss 34). It signified broader application support, broader programming language support, and a clear separation from the earlier GPGPU programming model (Asanovic, Bodic, & Catanzaro 71).
NVIDIA built the GPU computing model on the GeForce 8800, which came into existence in November 2006. The G80-based GeForce 8800 brought several innovations to GPU computing (AMD 33). G80 introduced the single-instruction multiple-thread (SIMT) execution model, as well as shared memory and barrier synchronization for communication between threads. It was also the first GPU to support C, which allowed programmers to use the power of the GPU (Lindholm & Nickolls 22).
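The cooperation between threads described above, through shared memory and barrier synchronization, can be sketched on an ordinary CPU. The following is a minimal illustration in Python, not CUDA code: `block_sum` is a hypothetical name, a plain list stands in for per-block shared memory, and `threading.Barrier` stands in for the hardware barrier. It assumes the number of values is a power of two.

```python
import threading

def block_sum(values):
    """Sum `values` with one thread per element, tree-reduction style.

    Assumes len(values) is a power of two (an assumption of this sketch).
    """
    n = len(values)
    shared = list(values)            # stands in for per-block shared memory
    barrier = threading.Barrier(n)   # stands in for barrier synchronization

    def thread_body(tid):
        stride = n // 2
        while stride > 0:
            if tid < stride:
                # Only this thread writes shared[tid] during this step, and
                # shared[tid + stride] is not written by anyone in this step.
                shared[tid] += shared[tid + stride]
            barrier.wait()           # all threads finish the step before the next
            stride //= 2

    threads = [threading.Thread(target=thread_body, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]
```

Calling `block_sum([1, 2, 3, 4, 5, 6, 7, 8])` combines partial sums in three synchronized steps and leaves the total in the first shared slot. Without the barrier, a thread could read a partial sum that its neighbour has not finished writing.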
NVIDIA later introduced major revisions to the GPU architecture. These led to the second-generation unified architecture, G200, which increased the number of CUDA cores from 128 to 240. In developing each new generation of GPU, NVIDIA's philosophy has been to improve existing programs (Asanovic, Bodic, & Catanzaro 75). Earlier, in 2001, NVIDIA had exposed the instructions of the VS/T&L stage to application developers (Moss 33), and programmability was then extended to the shader stage. Exploration of the use of GPUs for solving computing problems led to the development of GPGPU. Because the hardware had been designed for graphics data processing, GPGPU had several limitations. It ran into problems when working through the graphics API. Texture addressing modes were limited, and shader capabilities allowed only limited outputs. Integer and bit operations were missing from the instruction sets. In addition, communication between pixels was limited, as was scatter (AMD 33).
History of GPGPU
The GPU was originally a disruptive but inflexible and inaccurate technology, which meant that it needed a new design. The initial GPU design delivered low speed and low computing performance (AMD 33) and was not well suited to the market (Moss 33). A new model that was a better fit for the market then emerged. In 2006, NVIDIA carried out research aimed at improving the old GPU model (Moss 33). The new model, named GPGPU, had far more advantages than its predecessor (Asanovic, Bodic, & Catanzaro 71). It suited the market well and demonstrated high speed. Today, many supercomputers are connected to GPUs and draw much of their power from them. Research has allowed software and hardware to develop in step, which has further enhanced the model.
There have been numerous adjustments to this technology. GPU computing has a completely new programming model, quite different from that of the traditional GPU and the CPU, and it offers greater speed than either (Moss 33). In its execution model, a kernel is launched on a grid consisting of blocks, and each block consists of a number of threads. Threads in the same block can synchronize and cooperate through fast shared memory (Asanovic, Bodic, & Catanzaro 74). The model maps onto the hardware so that each block runs on a single multiprocessor, and a single multiprocessor can execute several blocks in a time-sliced fashion (Lindholm & Nickolls 16). The dimensions of the grid and the blocks determine the number of threads to be used. Each thread has an identifier that is unique within its block, and each block has an identifier that is unique within the grid (Moss 36). The two are combined to form a globally unique identifier for each thread. The massively threaded architecture of the GPU is used to hide memory latencies. Although the memory bandwidth of the GPU is vastly superior to that of the CPU, accessing memory still takes a long time (AMD 33).
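The way a block identifier and a per-block thread identifier combine into a global identifier can be sketched in a few lines. This is an illustrative Python stand-in for a one-dimensional kernel launch, not real GPU code: the names `global_thread_id`, `launch`, and `vector_add` are hypothetical, and the nested loops run sequentially where a GPU would run the threads in parallel.

```python
def global_thread_id(block_id, threads_per_block, thread_id):
    """Combine the block identifier and the per-block thread identifier."""
    return block_id * threads_per_block + thread_id

def launch(kernel, num_blocks, threads_per_block, *args):
    """Run `kernel` once per thread; a sequential stand-in for a grid launch."""
    for block_id in range(num_blocks):
        for thread_id in range(threads_per_block):
            kernel(block_id, threads_per_block, thread_id, *args)

def vector_add(block_id, threads_per_block, thread_id, a, b, out):
    """Kernel body: each thread handles the element at its global identifier."""
    i = global_thread_id(block_id, threads_per_block, thread_id)
    if i < len(out):  # guard: the grid may contain more threads than elements
        out[i] = a[i] + b[i]

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
launch(vector_add, 2, 4, a, b, out)  # 2 blocks of 4 threads cover 5 elements
```

Here the grid has eight threads for five elements, so the bounds check inside the kernel keeps the three surplus threads from writing out of range.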
Research in computer graphics hardware has long been active. It dates back to 1978, with machines such as the Ikonas and Pixel-Planes 5 being widely studied. The last several years have seen wide use of GPU-related machines, which has contributed to a rise in the amount of research in the field of graphics hardware (Asanovic, Bodic, & Catanzaro 71). This research has focused on how the GPU model can be improved so that it keeps pace with continuous changes in technology (Moss 33). Programmable graphics hardware has been used within the realm of graphics applications. Several other lines of research have also drawn on graphics hardware, including the use of rasterization hardware for robot motion planning and pixel SIMD computer graphics, among others.
NVIDIA has been at the center of much of this research. The arrival of GPGPU did not stop the company from doing more research (Lindholm & Nickolls 21), which was driven by the desire to bring improved products to market. GPU hardware has evolved from a single core to a set of highly parallel, programmable cores. The trend in GPU technology has been to keep adding programmability and parallelism to the core GPU architecture, which is evolving toward more CPU-like general-purpose cores (Moss 33). Future generations of GPU are therefore likely to resemble a general-purpose CPU. Another development in computing came with the introduction of CTM (Close to Metal) in November 2006 (Asanovic, Bodic, & Catanzaro 71), which changed the interface that the industry offered. With CTM, the GPU could be used effectively as a powerful, programmable open architecture, similar to today's CPUs. Opening the architecture gave developers deterministic, repeatable, low-level access to the hardware, which was conducive to the development of essential tools such as debuggers, compilers, application platforms, and math libraries (AMD 33). The introduction of CTM made a new class of applications possible. General-purpose computing on GPUs (GPGPU) has since become established, and its use in computer science continues to rise. The use of GPGPU in artificial intelligence (AI) has been documented. At the GECCO and CIGPU conferences held in 2010, about 25 papers on the use of GPGPU in AI were published (Moss 35). The use of genetic programming (GP) on the GPU is new, although there were various early implementations that joined GPU and GP. Game manufacturers are also considering using the GPU to reduce CPU load.
Much of that load comes from the artificial intelligence needed to drive the autonomous agents that the computer provides to compete with the user during the game. The GPU has also been used to speed up neural networks. A general-purpose GPU is fast, reliable, and powerful (Lindholm & Nickolls 24).
Overview of general purpose GPU
NVIDIA developed the general-purpose GPU following intensive research. It had emerged that the earlier GPU faced numerous challenges, such as low speed. This model, in contrast, has high processing power (Asanovic, Bodic, & Catanzaro 71). Developers ensured that it improved AI and game physics. The model nevertheless has limitations when it comes to non-graphics processing. Graphics hardware has developed considerably (AMD 33). Present graphics architectures provide remarkable computational horsepower and memory bandwidth. A good example is the NVIDIA GeForce 6800 Ultra, which achieves a memory bandwidth of 35.2 GB/sec (Chong 34).
GPUs are currently closing the gap with processor technology (Moss 33). For example, some GPGPU chips exceed 300 million transistors (Asanovic, Bodic, & Catanzaro 71). GPGPU hardware is not only fast but also accelerating quickly, and its performance keeps being boosted by new discoveries. Currently, GPUs lack some essential computing constructs, such as integer data operands (Moss 33). The absence of these, along with operations such as bit shifts and bitwise logical operations, reduces the working capacity of the GPU. GPGPU also faces challenges despite advances in programmability, high-level languages, and graphics hardware (Moss 33): the programmer must recast the computation in terms of the underlying hardware. However, the advantages of GPGPU outweigh the disadvantages (Asanovic, Bodic, & Catanzaro 71). 3D applications require high computation rates (Lindholm & Nickolls 24), and custom hardware that takes advantage of their native parallelism delivers higher graphics performance than a traditional CPU. Graphics computation on the GPU has led to an increase in visual realism, which is vital for rendered images. Over the last six years, the pipeline has been transformed into a programmable pipeline (Moss 33), with the focus on the geometry and fragment stages. A typical GPGPU program uses the fragment processor as its computation engine in the GPU (Lindholm & Nickolls 24). GPUs have thus developed into stream processors. Unlike earlier GPUs, which served 2D applications, GPGPU hardware serves 3D applications (AMD 33).
Apart from video games and scientific research, the GPU has several other uses. It is highly likely that people's daily lives are affected by GPUs in one way or another (Chong 34). GPUs are found in the computers we use today, in cars, and in video equipment, among other places. Most mobile phone applications rely on services that run on GPUs (Aslett 12). Stores also use GPUs to analyze retail or web data. GPUs are used by websites as well, mainly for accuracy purposes. Engineers are another group of users, and they prefer the GPU for its efficiency (Chong 34). The benefits of the GPU keep increasing.
CUDA cores are similar to the cores of graphics processors (Howes 23). CUDA has mostly been used in games to improve their quality. The technology is among the latest developed by NVIDIA, and it has been catching on fast (Heaton 55). Several product lines are CUDA-enabled, including GeForce, Quadro, and Tesla. Quadro is intended for high-resolution graphics (Aslett 12). GeForce comes in a variety of high-performance products at very low prices, owing to stiff competition in the gaming market (Chong 34). Finally, Tesla supports high-performance computing and parallel processing and is available in the form of plug-in cards. CUDA itself brings several things together, including:
- Massively parallel hardware whose purpose is to run non-graphics (generic) code, with the drivers and design that allow it to perform that role (Howes 23).
- A programming language based on C. Its assembly language is designed so that it can be targeted by many programming languages.
- A software development kit that includes libraries and numerous tools for debugging, compiling, and profiling.
The main reason for developing CUDA was to write code that runs on massively parallel SIMD architectures (Aslett 12). These include various GPU types as well as non-GPU hardware such as NVIDIA Tesla. Such massively parallel hardware can run far more operations per second than a CPU (Chong 34). Its purchase and running costs are fair compared with those of a CPU, so this hardware is worth considering. In addition, the hardware is reported to deliver up to 50 times the performance of a CPU (Aslett 12). Rather than emulating general computation through pixel operations, as is common on other graphics hardware, this hardware can, depending on the situation, avoid pixels altogether. NVIDIA has also been willing to support general-purpose parallelism (Heaton 55), since this support benefits its hardware (Chong 34). With such support, general-purpose use of the GPU no longer appears to be a hack of the GPU. Instead, it appears to be the use of a vendor-supported technology. Owing to this, its adoption is easier, particularly when non-technical stakeholders are involved (Howes 23).
CUDA is used as follows. The first step is to read the manual carefully and download the SDK (Howes 23). This is easier for anyone familiar with C (Chong 34). One then needs CUDA-compatible hardware; anyone who is not in a position to purchase it can use an emulator instead, although performance is the ultimate point of the exercise. The next step is to try out the code (Howes 23). It is important to note that not every program benefits from CUDA. A CPU is better at performing a relatively small number of complex operations (Aslett 12), whereas a GPU performs large numbers of simple tasks at a very fast pace (Heaton 55). For such workloads it is better to use the GPU than the CPU, which is why using CUDA together with a GPU can greatly improve performance. CUDA is a development toolchain for creating programs that run on NVIDIA GPUs, together with an API for controlling those programs from the CPU (Aslett 11). The benefits of GPU programming over CPU programming show up in problems that are highly parallelizable, for which one can gain substantial speedups of approximately two times (Howes 23). However, there are numerous problems that are difficult or impossible to formulate in a way that makes them suitable for parallelization.
To some extent, CUDA appears straightforward, since one can write programs in C, a familiar tool (Chong 34). To achieve good performance, however, one must take into account numerous low-level details of the Tesla GPU architecture. Many people confuse CUDA with API languages (Howes 23). As a parallel computing platform, CUDA makes GPGPU computing simple and elegant. The languages supported for CUDA programming include FORTRAN, C, and C++, among others (Howes 23). These are what most programmers use, and the set of supported languages has been expanding (Heaton 55). CUDA adds language extensions in the form of a few keywords (Howes 23). The keywords allow the developer to express a substantial amount of parallelism and direct the compiler to the portion of the application that maps to the GPU (Chong 34).
It is easy to learn to program with CUDA, thanks to the self-study exercises and webinars available on the CUDA developer website (Howes 23). Apart from the FORTRAN, C, and C++ toolkits, there are numerous GPU libraries, as well as other programming approaches (Aslett 12). CUDA has been shown to increase computing performance by harnessing the processing power of the GPU (Chong 34). It was introduced in 2006 and has since been widely deployed, as shown by the numerous applications and research papers that have been published (Heaton 55). It is also supported by an installed base of more than one million CUDA-enabled GPUs in notebooks, supercomputers, and computer clusters (Aslett 12). Software developers, researchers, and scientists can add GPU acceleration to their work through various approaches (Chong 34). They can replace or supplement CPU libraries such as MKL, IPP, BLAS, and FFTW. They can automatically parallelize loops in FORTRAN or C code using OpenACC directives for accelerators (Chong 34). Finally, they can develop custom parallel algorithms in a familiar programming language such as Python, C, or C++ (Howes 23).
CUDA has various uses. Used together with a GPU, it improves the performance and latency of the computer (Aslett 13). CUDA can offload the computationally intensive portions of an algorithm. CUDA and GPUs are also used in medicine, where they accelerate the algorithmic processing of images (Howes 23). CUDA is also simple, which has made it popular (Aslett 12). When speed-up is the goal, CUDA and GPUs are among the best choices, and they have found use in flight operations, where a company runs its code on multiple GPUs using MPI, OpenMP, and CUDA (Howes 23). In addition, various researchers have described CUDA and Tesla as a disruptive technology for solving problems in clinical research (Heaton 55), because they make it possible, for example, to double the number of cells studied. Results can also be delivered within a short period of time (Chong 34).
In addition, GPUs and CUDA have been used in motion design through a GPU-accelerated 3D ray tracer, which brings this capability into the compositing workflow (Howes 23). As a result, motion graphics artists can now quickly design geometric text and shapes in 3D space (Chong 35). This has reduced both the time and the cost of the process. CUDA and GPUs are also used to produce photorealistic computer-generated images and in seismic applications, which are most common in the oil and gas industry (Aslett 12).
Advantages of using CUDA with GPGPU
- The cost of buying CUDA-GPU products is low compared to other products. The reduced cost is due to competition in the gaming market. A good example is a plug-in card with an NVIDIA GPU containing multiple processors, priced below $500.
- NVIDIA CUDA technology has greatly helped to revolutionize computing since its introduction in 2006 (Howes 23).
- The technology is fast and cheap and has a high potential to produce the desired results. NVIDIA CUDA technology plays an important role in computing. It is worth checking that a computer is CUDA-enabled before buying it, as this will be of great benefit to the user.
- Compute Unified Device Architecture (CUDA) enables application developers to write code that can then be uploaded to an NVIDIA-based card for execution by NVIDIA's massively parallel GPUs.
- Compared with a CPU, a CUDA-enabled GPU can process larger amounts of data.
- CUDA can perform visual and physics simulations much faster than the CPU. It makes it possible to solve many mathematical problems and to run physics and game simulations.
- For little money, CUDA makes it possible to achieve true supercomputer performance on the desktop.
- CUDA can offer a great performance gain.
Limitations of using GPU-enabled CUDA
- CUDA-enabled cards can carry a high price because of what they offer in return. This is all the more so because NVIDIA earns a great deal from gaming.
- NVIDIA cards must be installed in the system (Chong 34). The best CUDA-enabled cards at the time of writing are mainly from the 200 and 400 series. The gaming industry keeps up the pressure for more CUDA cards to be produced.
- The PC system must be fast enough and must supply enough power for the NVIDIA card. For instance, the memory must be fast enough to handle the bandwidth that CUDA requires (Howes 23). The power supply must be able to deliver enough power to run the NVIDIA cards, and extra PCI-E power cables are needed. It also helps to seek advice from a technically minded gamer (Aslett 12).
- Up-to-date NVIDIA Windows drivers must be installed. They can be downloaded from the NVIDIA website and automatically install the software required by CUDA.
- When running 64-bit Windows, the 64-bit, CUDA-enabled NVIDIA drivers must be installed (Chong 34).
- It is extremely difficult to write massively parallel algorithms that implement special functions. Owing to this, only a small fraction of the functions in Manifold can currently leverage CUDA, although many more are underway (Howes 23).
- The existing CUDA-enabled functions in Manifold are operations in the surface-transform dialogue. This dialogue is part of the surface tools extension; without that extension, the dialogue cannot be used and CUDA cannot be leveraged (Chong 34). Future updates and releases of Manifold are likely to extend CUDA usage beyond the operators of the surface-transform dialogue (Heaton 55). Functions executed on CUDA cards are practically instantaneous compared with their execution speed on the main processor (Howes 23). However, NVIDIA's stream processors execute tasks so quickly that it is difficult to deliver data from disk and memory fast enough to keep them occupied (Aslett 11). As a result, performance is limited by how fast data can be drawn from the hard disk or other memory rather than by processor speed (Chong 34). In addition, a large portion of many tasks is not compute-bound but consists of overhead, e.g., re-computing levels, writing results to disk, and other work that CUDA processors do not accelerate (Heaton 55). As a consequence, the visible speed increase from CUDA-enabled processors depends on how much of a task is computation (Howes 23).
- Much can be gained from CUDA, but this depends on the rest of the machine not slowing down the ability to feed NVIDIA's insatiable stream processors with data. For maximum speed, it is vital to use 64-bit Windows on quad-core machines with plenty of RAM (Chong 34), as well as large, fast disk drives. Before configuring a new 64-bit system, one should check the NVIDIA website to ensure that 64-bit drivers are available for the Windows operating system that one plans to install (Howes 23).
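The limitation about overhead that the GPU does not accelerate (reading data, writing results to disk) can be made concrete with a small back-of-the-envelope model. The sketch below is a simplification introduced here for illustration, not a measurement: the function name `overall_speedup` and the timings in the usage note are hypothetical, and the model assumes the overhead and compute phases do not overlap.

```python
def overall_speedup(compute_s, overhead_s, kernel_speedup):
    """Overall speedup when only the compute portion of a task gets faster.

    compute_s      -- seconds of computation on the CPU (hypothetical input)
    overhead_s     -- seconds of disk and memory work that is not accelerated
    kernel_speedup -- how many times faster the GPU runs the compute portion
    """
    before = compute_s + overhead_s
    after = compute_s / kernel_speedup + overhead_s
    return before / after
```

For a hypothetical task with 8 seconds of computation and 2 seconds of overhead, a 50-fold faster compute phase gives `overall_speedup(8, 2, 50)`, which is 10 / 2.16, or roughly 4.6 times overall. The same kernel with no overhead would give the full 50 times, which is why fast disks and ample RAM matter as much as the card itself.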
Though faced with the above limitations, CUDA remains a groundbreaking technology (Aslett 12). Although many developers have commented that the NVIDIA processor is superfluous, NVIDIA GPUs are very fast and are consistently rated among the fastest processors. They have been reported to perform some jobs 300 times faster than Intel CPUs, which are known to be among the fastest.
Programming Concept of GPGPU
GPGPU is presently a vital, high-performance platform for scientific computing. CUDA is, by definition, a high-level language. Various software must be installed before the GPU can be used (Chong 34). An OpenCL driver is needed to complement the GPU. OpenCL defines a C-like language that is used to write small programs for the GPU (Aslett 12). These small programs are called kernels. Java and C# cannot communicate with OpenCL directly, so an intermediary called a binding is needed. For Java, the JOCL binding is used; for C#, the Cloo binding is used.
GPU programming is a form of parallel programming (Aslett 12), which is a key trend in computer programming. GPU programming makes it possible to use a graphics card for computation in much the same way as a CPU, and the two can be used together. GPGPU has several limitations, including the following (Chong 34):
- Lack of official support in mainstream programming languages
- Need for programming GPU portions in a language which is C-like.
- Isolation in execution of kernels
The GPU is structured as a graphics pipeline. The pipeline is meant to improve the computing rate of the hardware.
The hardware is divided into several stages, each of which operates on geometric primitives (Chong 34). The input is a list of geometry expressed in object coordinates (AMD 33). The first stage of the pipeline transforms each vertex into screen space and assembles the vertices into triangles; its output consists of triangles in screen space. The second stage determines the screen positions that each triangle covers (Asanovic, Bodic, & Catanzaro 71) and is commonly called rasterization. The third stage is the fragment stage, which computes the color of each fragment using interpolated values derived from the geometry stage. Global memory, whose values take the form of textures, can also act as a source of values for this computation (Moss 33). In general, the fragment stage generates addresses into texture memory, fetches the associated texture values, and uses them to compute the fragment color. The final stage assembles and composites the fragments into an image of pixels (Moss 35).
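The four stages described above can be sketched as a toy software pipeline for a single triangle. This is an illustrative Python sketch under stated assumptions, not how any particular GPU implements the pipeline: all function names are hypothetical, object coordinates are assumed to lie in [-1, 1], colors are interpolated from per-vertex values only (no texture fetch), and compositing simply overwrites the pixel.

```python
def to_screen(vertex, width, height):
    """Stage 1: map object coordinates in [-1, 1] to pixel coordinates."""
    x, y = vertex
    return ((x + 1) * 0.5 * (width - 1), (1 - y) * 0.5 * (height - 1))

def edge(a, b, p):
    """Signed-area term telling which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Stage 2: yield each covered pixel with its barycentric weights."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle covers nothing
    for py in range(height):
        for px in range(width):
            p = (px, py)
            # Dividing by the signed area makes the test winding-independent.
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield px, py, (w0, w1, w2)

def shade(weights, colors):
    """Stage 3: compute a fragment color by interpolating vertex colors."""
    return tuple(sum(w * col[k] for w, col in zip(weights, colors))
                 for k in range(3))

def render(vertices, colors, width, height):
    """Run all stages and return the image as rows of RGB tuples."""
    tri = [to_screen(v, width, height) for v in vertices]
    image = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for px, py, weights in rasterize(tri, width, height):
        image[py][px] = shade(weights, colors)  # Stage 4: composite
    return image
```

Rendering the triangle `[(-1, -1), (1, -1), (-1, 1)]` with red, green, and blue vertex colors onto a 5 by 5 image gives pure red, green, and blue at the three corner pixels and blended colors in between, while pixels outside the triangle stay black.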
Parallel programming is hard (Aslett 12). GPU programming is organized around a grid, as opposed to traditional multicore thread processing. The difficulty is most apparent when GPGPU tasks are mixed with CPU tasks. When a task is not parallel, using the GPU is likely to be difficult. The case for GPGPU is clearest in mathematically intensive tasks (Chong 34). Some programming constructs that are routine in Java, such as the if-statement, can severely hurt an OpenCL application. Managing local memory is a further challenge (Aslett 12).
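One plausible reading of the warning about if-statements is branch divergence: when a group of threads executes in lockstep and the threads disagree on a condition, the hardware runs both branches one after the other with some threads masked off. The sketch below illustrates that idea in Python. It is an assumption about what the text refers to, and `simt_if` is a hypothetical helper rather than part of OpenCL or CUDA.

```python
def simt_if(values, predicate, then_fn, else_fn):
    """Apply an if/else across lanes that execute in lockstep.

    Returns the per-lane results and the number of passes the group needed.
    """
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):          # pass over the lanes where the condition holds
        passes += 1
        for lane, v in enumerate(values):
            if mask[lane]:
                results[lane] = then_fn(v)
    if not all(mask):      # pass over the remaining lanes
        passes += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                results[lane] = else_fn(v)
    return results, passes
```

With the inputs `[1, 2, 3, 4]` and a test for even numbers, the group needs two passes because the lanes disagree; with `[2, 4]` it needs only one. In this simplified model, a branch costs extra only when lanes in the same group take different paths.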
General Trend in GPGPU
General-purpose GPUs are gaining popularity, which has raised questions about how the GPGPU architecture will evolve in the future (Moss 33). NVIDIA, which developed GPGPU in 2006, has continued researching how to improve the existing model. The performance of single-core processors has grown tremendously over time. The development of the GPU has paved the way for further technological advancement: since 1999, when the GPU was developed, computing capability has grown, and the introduction of the general-purpose GPU has brought improvements in graphics (AMD 33). This growth has led to a heterogeneous computing environment. GPGPU programming has become more accessible, which has led to a rise in the use of GPGPU to accelerate parallel computations such as image processing, linear algebra, molecular dynamics, finance, and graph traversal (Lindholm & Nickolls 24). Moore's Law will continue to apply to GPGPU. To keep the growth of GPGPU steady, NVIDIA plans to use dark silicon to improve the architectural design of GPGPU (Esmaeilzadeh 12). High-level architectural parameters will be the focus of future GPGPU designs. Several improvements have been made to GPGPU, and older CPU designs are being phased out. Apart from GPGPU, other technologies such as FPGAs have been developed, and these developments have increased computing speed (Asanovic, Bodic, & Catanzaro 71). Networking solutions have also been developed. GPGPU technology continues to make an impact in fields such as business, medicine, and engineering, because it can solve complex problems in little time (Moss 36). It has also managed to keep memory use constrained (Lindholm & Nickolls 24). As indicated earlier, GPGPU makes use of 3D graphics hardware, and its core counts are high.
Using GPGPU also reduces cost and has lowered power consumption. In conclusion, many changes have occurred in the field of computing, and GPGPU and related technologies such as CUDA continue to gain tremendous popularity (AMD 35).