UNLV National Supercomputing Institute


To install local R packages on Cherry Creek, see:
https://www.nscee.edu/cherry-creek-info-R-local.html

Using R 3.2.2 on Cherry Creek

Overview

The R package has been configured to use the Intel XEON PHI coprocessor
cards. To use the PHI cards successfully, you must request a PHI
(nmics=1) and include the "module" and "export" statements in the job
script before R is started.

Note: Except for short tests, DO NOT run R directly on cherry-creek's
front-end; you must use one of the compute nodes.

Recommended Method for running R on cherry-creek

Using the following commands will allow you to use the PHI coprocessors for
increased performance (this is highly recommended -- see below for the speed
difference with and without using the PHIs).

The recommended method for running R is to submit a PBS batch job. However,
if you need to interact with your R script, you will need to request an
interactive PBS session.
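
A batch job along these lines can be sketched as follows. This is a minimal sketch, not an official NSI template: the queue name and resource limits are copied from the interactive example on this page, while the file names r_job.pbs and my_analysis.R are hypothetical placeholders. The snippet only writes the job script to disk; nothing is submitted until you run qsub yourself.

```shell
# Write a hypothetical PBS job script (r_job.pbs). The resource request
# mirrors the interactive example on this page; adjust it to your workload.
cat > r_job.pbs <<'EOF'
#!/bin/bash
#PBS -q small
#PBS -l ncpus=1,mem=10gb,nmics=1,cput=15:0
#PBS -l walltime=15:0
#PBS -V

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# The module and export statements must come before R is started.
module load intel intelmpi R
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2

# my_analysis.R is a placeholder for your own R script.
Rscript my_analysis.R
EOF

echo "Submit from the head node with: qsub r_job.pbs"
```

Keeping the "module" and "export" lines inside the job script, rather than relying on your login environment, means the PHI settings travel with the job.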

Running an Interactive R session

To run R interactively, you need to request that an interactive job be started.

NOTE: Running a job interactively requires that the resources you
request be available BEFORE the job will start! If there are not enough
resources to honor your request, the qsub command will "hang" at the
"waiting for job..." message until they are available.

From cherry-creek's head-node, issue a "qsub -I" command (all on one line) similar to:

qsub -V -I -q small -l ncpus=1,mem=10gb,nmics=1,cput=15:0 -l walltime=15:0
module load intel intelmpi R
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2
R

After you q() to exit R, you will need to exit the interactive PBS session by typing exit.
This will put you back on cherry-creek's head-node.
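
Before starting R inside the session, it can help to confirm that the three variables were actually exported. The helper below is a hypothetical sketch (check_phi_env is not an NSI-provided command); it only checks that the variables named on this page are present in the environment, not that a PHI card was allocated.

```shell
# Hypothetical helper: list any of the three variables from this page
# that are not exported in the current shell.
check_phi_env() {
    missing=0
    for var in MKL_MIC_ENABLE MIC_OMP_NUM_THREADS OFFLOAD_REPORT; do
        # printenv exits non-zero when the variable is not in the environment
        if ! printenv "$var" > /dev/null; then
            echo "not exported: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

Run check_phi_env just before typing R; it prints nothing and returns 0 when all three variables are exported.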

Performance comparison between the "default gcc" build of R and the PHI-optimized build

The prebuilt executables for the Linux versions of R are built with the GNU* tools.
Unfortunately, this results in single-thread performance, even on multicore systems
with matrix operations that could be performed in parallel. The chart below shows
performance of R built with the Intel 14.0.1 compilers and Intel® Math Kernel Library
on Red Hat* 6.3 compared to a "default" build (i.e. no config options) using gcc 4.4.6.
The build with Intel® MKL runs matrix operations on multiple cores, so it is much faster
on those operations. The R benchmark-2.5 used is available at

http://r.research.att.com/benchmarks/R-benchmark-25.R

The matrix sizes were increased to reflect a larger workload size. The results show
R built with Intel® MKL is up to 15x faster than the gcc build. These results are
consistent with Intel's benchmark results.
(https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors)

A copy of the modified version of the benchmark with the increased matrix sizes is available at:
https://www.nscee.edu/R-benchmark-25.R-big

Test	Time for gcc build (sec)	Time for icc/MKL build (sec)
Creation, transp., deformation of a 5000x5000 matrix	3.75	3.14
5000x5000 normal distributed random matrix ^1000	3.34	1.59
Sorting of 14,000,000 random values	1.95	1.83
5600x5600 cross-product matrix (b = a' * a)	110.94	71.26
Linear regr. over a 4000x4000 matrix (c = a \ b')	51.46	30.35
FFT over 4,800,000 random values	0.74	0.67
Eigenvalues of a 1200x1200 random matrix	6.37	3.05
Determinant of a 5000x5000 random matrix	39.37	20.33
Cholesky decomposition of a 6000x6000 matrix	42.82	16.15
Inverse of a 3200x3200 random matrix	33.30	23.24
3,500,000 Fibonacci numbers calculation (vector calc)	0.79	0.40
Creation of a 6000x6000 Hilbert matrix (matrix calc)	0.78	0.81
Grand common divisors of 400,000 pairs (recursion)	0.45	0.29
Creation of a 1000x1000 Toeplitz matrix (loops)	1.89	1.76
Escoufier's method on a 90x90 matrix (mixed)	11.77	6.30
Total	309.71	181.16
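
The per-test speedup is the gcc time divided by the icc/MKL time. As a small sketch for working it out from the table, the hypothetical speedup function below (not part of the benchmark) does that division with awk; the timings passed to it are copied from the rows above.

```shell
# Hypothetical helper: speedup = gcc time / icc-MKL time, to two decimals.
# LC_ALL=C keeps the decimal point independent of the user's locale.
speedup() {
    LC_ALL=C awk -v gcc="$1" -v mkl="$2" 'BEGIN { printf "%.2f\n", gcc / mkl }'
}

speedup 42.82 16.15     # Cholesky decomposition    -> 2.65
speedup 0.78 0.81       # Hilbert matrix creation   -> 0.96
speedup 309.71 181.16   # Total                     -> 1.71
```

A value below 1 (as for the Hilbert matrix row) means the icc/MKL build was slightly slower on that test.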

The following commands were used to run the benchmark tests on the PHI:

qsub -I -l ncpus=1,mem=100gb,nmics=2,cput=1000:0:0 -l walltime=1:00:00 /bin/bash
module load R intel intelmpi
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2
R
source("R-benchmark-25.R-big")
q()

How the R package was built

The R 3.2.2 package was built using the following configuration:

module load intel intelmpi
./configure --prefix=/share/apps/R-3.2.2 \
    --with-blas="-L/share/apps/intel/composerxe/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm" \
    --with-lapack \
    CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 \
    F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2
make
make check
sudo make install
