{"id":1752,"date":"2016-09-09T12:05:29","date_gmt":"2016-09-09T03:05:29","guid":{"rendered":"http:\/\/163.152.162.219\/wordpress\/?p=1752"},"modified":"2020-01-02T16:37:56","modified_gmt":"2020-01-02T07:37:56","slug":"visual-and-science-computing-in-mapreduce-framework","status":"publish","type":"post","link":"https:\/\/hvcl.korea.ac.kr\/?p=1752","title":{"rendered":"Visual and Science computing in MapReduce framework"},"content":{"rendered":"<h4><strong>\u00a0Vispark : GPU-Accelerated distributed visual computing using Spark<\/strong><\/h4>\n<p>With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. To address these problems, we propose Vispark, a novel extension of Spark for GPU-accelerated MapReduce processing on array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes data partitions transparently to the user, which is necessary for many scientific computing problems in distributed systems.
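As a rough illustration of what halo communication does (a hypothetical plain-Python sketch, not the actual Vispark API), each partition can be modeled as a block of rows padded with ghost rows that are refreshed from neighboring partitions before each stencil pass:

```python
# Hypothetical sketch of halo (ghost-cell) exchange between partitions.
# Vispark performs this over sockets between distributed workers; here
# the partitions live in one process purely for illustration.

def split_with_halo(rows, nparts):
    """Split `rows` into contiguous blocks, each padded with one ghost
    row on each side (None at the domain boundary)."""
    size = len(rows) // nparts
    blocks = [rows[i * size:(i + 1) * size] for i in range(nparts)]
    return [[None] + b + [None] for b in blocks]

def exchange_halos(blocks):
    """Refresh each block's ghost rows from its neighbors' interiors."""
    for i, b in enumerate(blocks):
        b[0] = blocks[i - 1][-2] if i > 0 else None                # top ghost
        b[-1] = blocks[i + 1][1] if i < len(blocks) - 1 else None  # bottom ghost

blocks = split_with_halo(list(range(8)), 2)
exchange_halos(blocks)
# partition 1's top ghost row now holds partition 0's last interior row
```

After the exchange, a stencil such as the mean filter below can read one row beyond its partition without any shuffle or reduce stage, which is the point of the abstraction.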
Vispark also provides domain-specific functions and language support specifically designed for high-performance computing and image processing applications.<\/p>\n<pre>from pyspark import SparkContext\nfrom PIL import Image\nimport numpy as np\n\n# point_query_2d is a Vispark built-in for reading neighbor pixels\ndef meanfilter(data, x, y):\n  u = point_query_2d(data, x, y+1)\n  d = point_query_2d(data, x, y-1)\n  r = point_query_2d(data, x+1, y)\n  l = point_query_2d(data, x-1, y)\n  ret = (u+d+r+l)\/4.0\n  return ((x,y), ret)\n\nif __name__ == \"__main__\":\n  sc = SparkContext(appName=\"meanfilter_vispark\")\n  img = np.fromstring(Image.open(\"lenna.png\").tostring())\n  imgRDD = sc.parallelize(img, Tag=\"VISPARK\")\n  imgRDD = imgRDD.vmap(meanfilter(data, x, y).range(512, 512))\n  ret = np.array(sorted(imgRDD.collect()))[:,1].astype(np.uint8)\n  Image.fromstring(\"L\", (512,512), ret.tostring()).save(\"out.png\")<\/pre>\n<h6 style=\"text-align: center;\"><span style=\"color: #333333;\"><strong>Simple Vispark mean image filter example code<\/strong><\/span><\/h6>\n<div>[bibtex file=woohyuk_2016_vispark.bib]<\/div>\n<div><\/div>\n<div>\n<hr \/>\n<\/div>\n<div>\n<h4 class=\"entry-title\"><strong>GPU in-memory processing using Spark for iterative computation<\/strong><\/h4>\n<\/div>\n<div>Due to its simplicity and scalability, MapReduce has become a de facto standard computing model for big data processing. Since the original MapReduce model was only appropriate for embarrassingly parallel batch processing, many follow-up studies have focused on improving the efficiency and performance of the model. Spark follows one of these recent trends by providing in-memory processing capability to reduce slow disk I\/O for iterative computing tasks. However, the acceleration of Spark\u2019s in-memory processing using graphics processing units (GPUs) is challenging due to its deep memory hierarchy and host-to-GPU communication overhead. 
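To see why caching data on the device matters, a toy model in plain Python can count host-to-GPU transfers when data is copied lazily once and then reused across iterations (`DeviceBuffer` and its methods are illustrative inventions for this sketch, not Spark, CUDA, or Vispark API):

```python
# Toy model of GPU in-memory caching with lazy transfer: the host array
# is copied to the "device" only on first use, then reused across
# iterations instead of being re-uploaded for every mapper execution.

class DeviceBuffer:
    """Hypothetical device-side buffer with a lazy host-to-device copy."""
    transfers = 0  # counts simulated host-to-GPU copies

    def __init__(self, host_data):
        self.host_data = host_data
        self._device_data = None       # nothing resident on the device yet

    def device(self):
        if self._device_data is None:  # lazy: copy on first touch only
            DeviceBuffer.transfers += 1
            self._device_data = list(self.host_data)
        return self._device_data

def iterate(buf, steps):
    """Run an iterative 'kernel' entirely on the cached device copy."""
    for _ in range(steps):
        data = buf.device()            # cache hit after the first step
        for i in range(len(data)):
            data[i] += 1               # stand-in for a GPU kernel launch
    return data

buf = DeviceBuffer([0, 0, 0])
result = iterate(buf, 10)
# only one simulated host-to-device transfer for all 10 iterations
```

An eager design would pay the transfer cost on every iteration; deferring and caching the copy is the same idea, in miniature, as keeping iterative state resident in GPU memory across mapper executions.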
In this paper, we introduce a novel GPU-accelerated MapReduce framework that extends Spark\u2019s in-memory processing so that iterative computing is performed only in the GPU memory. Having discovered that the main bottleneck in the current Spark system for GPU computing is data communication on a Java virtual machine, we propose a modification of the current Spark implementation to bypass expensive data management for iterative task offloading to GPUs. We also propose a novel GPU in-memory processing and caching framework that minimizes host-to-GPU communication via lazy evaluation and reuses GPU memory over multiple mapper executions. The proposed system employs message-passing interface (MPI)-based data synchronization for inter-worker communication so that more complicated iterative computing tasks, such as iterative numerical solvers, can be efficiently handled.<\/div>\n<div><\/div>\n<div><\/div>\n<p><a href=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482.png\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-5103 aligncenter\" src=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482-1024x442.png\" alt=\"\" width=\"800\" height=\"345\" srcset=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482-1024x442.png 1024w, https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482-300x129.png 300w, https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482-768x331.png 768w, https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2016\/09\/\uc774\ubbf8\uc9c0-482.png 1417w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p>[bibtex file=hong_ccgrid_2017.bib]<\/p>\n<hr \/>\n<h4><strong>Distributed Interactive Visualization using GPU-optimized Spark<\/strong><\/h4>\n<div>With recent advances in imaging and computing technology, large-scale data acquisition and processing has become commonplace in many 
science and engineering disciplines. Conventional workflows for large-scale data processing usually rely on in-house or commercial software that is designed for domain-specific computing tasks. Recent advances in MapReduce, which was originally developed for batch processing textual data via a simplified programming model of the map and reduce functions, have expanded its applications to more general tasks in big-data processing, such as scientific computing and biomedical image processing. However, as shown in previous work, volume rendering and visualization using MapReduce is still considered challenging and impractical due to the disk-based, batch-processing nature of its computing model. In this paper, contrary to this common belief, we show that the MapReduce computing model can be effectively used for interactive visualization. Our proposed system is a novel extension of Spark, one of the most popular open-source MapReduce frameworks, which offers GPU-accelerated MapReduce computing. To minimize CPU-GPU communication and overcome slow, disk-based shuffle performance, the proposed system supports GPU in-memory caching and MPI-based direct communication between compute nodes. 
To allow for GPU-accelerated in-situ visualization using raster graphics in Spark, we leveraged CUDA-OpenGL interoperability, achieving processing speeds several orders of magnitude faster than conventional MapReduce systems.<\/div>\n<div><\/div>\n<div style=\"text-align: center;\"><a href=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2018\/09\/teaser.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-2293\" src=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2018\/09\/teaser-1024x709.png\" alt=\"\" width=\"800\" height=\"554\" srcset=\"https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2018\/09\/teaser-1024x709.png 1024w, https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2018\/09\/teaser-300x208.png 300w, https:\/\/hvcl.korea.ac.kr\/wp-content\/uploads\/2018\/09\/teaser.png 1301w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/div>\n<div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>\u00a0Vispark : GPU-Accelerated distributed visual computing using Spark With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. 
Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model <a class=\"read-more\" href=\"https:\/\/hvcl.korea.ac.kr\/?p=1752\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/posts\/1752"}],"collection":[{"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1752"}],"version-history":[{"count":17,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/posts\/1752\/revisions"}],"predecessor-version":[{"id":5109,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=\/wp\/v2\/posts\/1752\/revisions\/5109"}],"wp:attachment":[{"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1752"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1752"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hvcl.korea.ac.kr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1752"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}