Exploiting parallelism in processing large scale multi-dimensional datasets
▣ Title: Exploiting parallelism in processing large-scale multi-dimensional datasets
▣ Speaker: Beomseok Nam (Assistant Professor, UNIST)
▣ Date & Time: Friday, March 22 (2:00-3:30 pm)
▣ Place: LG Research Building, Room #101
▣ Host: Prof. Sungjoo Yoo (Tel. 2379)
▣ Abstract:
This talk will present two different ways of exploiting parallelism in processing large-scale multi-dimensional datasets. General-purpose computing on graphics processing units (GPGPU) has emerged as a cost-effective parallel computing paradigm in high-performance computing research, enabling large amounts of scientific data to be processed in parallel. A common access pattern in such scientific data analysis applications is the multi-dimensional range query, but multi-dimensional indexing trees such as R-trees are inherently ill-suited to GPU environments because of their irregular tree traversal patterns: irregular search paths make it hard to keep the GPU's massively parallel processing units fully utilized. In this talk, I will introduce two novel R-tree traversal algorithms for multi-dimensional indexes that convert recursive accesses to hierarchical tree nodes into sequential accesses.
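As a rough illustration of what "converting recursive access into sequential access" can mean, here is a minimal sketch of one well-known stackless technique, not necessarily either of the speaker's two algorithms. It assumes the R-tree is flattened into an array in depth-first (pre-order) order and that each entry carries a hypothetical `skip` index pointing just past its subtree. A traversal then only moves forward: if an entry's bounding rectangle overlaps the query it steps to the next array slot (its first child), otherwise it jumps to `skip`. No stack or recursion is needed, which is what makes this style attractive for GPU threads. The `Node` layout and all names are assumptions, and Python is used only for readability; on a GPU each thread would run the same loop for its own query.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: Rect         # minimum bounding rectangle of this node or data entry
    skip: int         # hypothetical: index of the first entry after this subtree
    is_leaf: bool     # True for data entries
    obj_id: int = -1  # identifier of the indexed object (leaf entries only)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles intersect (boundaries inclusive)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(nodes: List[Node], q: Rect) -> List[int]:
    """Stackless range query over a pre-order array using skip pointers."""
    hits: List[int] = []
    i = 0
    while i < len(nodes):
        n = nodes[i]
        if overlaps(n.mbr, q):
            if n.is_leaf:
                hits.append(n.obj_id)
            i += 1       # descend: in pre-order, the first child is the next slot
        else:
            i = n.skip   # prune: jump over the entire subtree in one step
    return hits


# Example tree: root -> {A: objects 1, 2}, {B: objects 3, 4}, stored in pre-order.
tree = [
    Node((0, 0, 10, 10), skip=7, is_leaf=False),        # 0: root
    Node((0, 0, 4, 4), skip=4, is_leaf=False),          # 1: internal node A
    Node((0, 0, 1, 1), skip=3, is_leaf=True, obj_id=1),
    Node((3, 3, 4, 4), skip=4, is_leaf=True, obj_id=2),
    Node((6, 6, 10, 10), skip=7, is_leaf=False),        # 4: internal node B
    Node((6, 6, 7, 7), skip=6, is_leaf=True, obj_id=3),
    Node((9, 9, 10, 10), skip=7, is_leaf=True, obj_id=4),
]
```

For example, `range_query(tree, (3, 3, 7, 7))` visits entries strictly left to right and returns the objects 2 and 3, skipping object 4's rectangle without ever returning to a parent. Because the access pattern is a forward scan over contiguous memory, threads in the same warp tend to read nearby entries, which is the property a GPU-oriented traversal is after.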
The second half of this talk will discuss how to leverage cached data in a distributed cache infrastructure using task parallelism. As more servers are added to distributed and parallel systems, more memory becomes available for caching data objects. However, the cached objects are dispersed across servers, and traditional query scheduling policies that consider only load balancing do not effectively utilize the increased cache space. This talk will introduce and compare batch job scheduling policies that employ statistical prediction methods and probability distribution estimates derived from recent queries in order to improve both load balancing and the cache hit ratio in a shared-nothing environment.
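To make the idea of using distribution estimates from recent queries for both load balancing and cache reuse concrete, here is a minimal sketch of one simple policy of that kind; it is an illustration, not one of the policies presented in the talk. It assumes each query has already been reduced to a 1-D key (for instance, a space-filling-curve value of the query's center, which is an assumption). The scheduler keeps a sliding window of recent keys, takes their empirical quantiles as split points, and sends each query in a batch to the server that owns its key range. Equal-probability ranges aim to give each server a similar share of the load, while contiguous ranges keep similar queries on the same server so its cached objects are more likely to be reused. The class name, window size, and round-robin fallback are all hypothetical choices.

```python
import bisect
from collections import deque
from typing import Deque, Iterable, List


class QuantileBatchScheduler:
    """Hypothetical cache-aware batch scheduler (illustrative sketch only).

    The key space is split into one contiguous range per server so that each
    range holds roughly equal probability mass under the empirical
    distribution of recently scheduled query keys.
    """

    def __init__(self, num_servers: int, window: int = 10_000) -> None:
        self.num_servers = num_servers
        self.history: Deque[float] = deque(maxlen=window)  # recent query keys

    def boundaries(self) -> List[float]:
        """Empirical quantiles of recent keys: num_servers - 1 split points."""
        keys = sorted(self.history)
        n = len(keys)
        return [keys[(k * n) // self.num_servers] for k in range(1, self.num_servers)]

    def schedule(self, batch: Iterable[float]) -> List[int]:
        """Assign each query key in the batch to a server index."""
        keys = list(batch)
        if not self.history:
            # No statistics yet: fall back to round-robin (pure load balancing).
            assignment = [i % self.num_servers for i in range(len(keys))]
        else:
            bounds = self.boundaries()
            # Nearby keys map to the same server, so its cache is likely reused.
            assignment = [bisect.bisect_right(bounds, key) for key in keys]
        self.history.extend(keys)  # the window slides as new queries arrive
        return assignment
```

With four servers and a history of keys 0 through 99, the split points are 25, 50, and 75, so a batch with keys 10, 30, 60, and 90 is spread across servers 0, 1, 2, and 3. Because the window has a fixed length, the ranges drift with the workload: if queries start to cluster in one region, that region is divided among more servers. A real policy would also have to account for migration cost when boundaries move, since shifting a range away from a server discards the benefit of what it has cached.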