If you want to install local R packages on Cherry Creek, click here:
https://www.nscee.edu/cherry-creek-info-R-local.html
Using R 3.2.2 on Cherry Creek
Overview
The R package has been configured to use the Intel Xeon Phi (PHI) coprocessor
cards. To use the PHI cards successfully, you must request a PHI
(nmics=1) and include the "module" and "export" statements in the job
script before R is started.
Note: Except for short tests, DO NOT run R directly on cherry-creek's
front-end; you must use one of the compute nodes.
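The requirement that the "export" statements come before R is started can be checked with a small shell function. This is only a sketch: check_phi_env is a hypothetical helper, not something installed on Cherry Creek, and it only tests that the three variables named in the job recipes on this page are exported; it does not verify that a PHI card was actually allocated.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not part of the Cherry Creek setup):
# report whether the three offload variables are exported in this shell.
check_phi_env() {
    missing=0
    for name in MKL_MIC_ENABLE MIC_OMP_NUM_THREADS OFFLOAD_REPORT; do
        # printenv prints the value of an exported variable, or nothing if
        # the variable is unset or was set without being exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "not exported: $name" >&2
            missing=1
        fi
    done
    # Exit status 0 when all three are exported, 1 otherwise.
    return "$missing"
}
```

One possible use inside a job script is "check_phi_env && R", so that R only starts once the variables are in place.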
Recommended Method for running R on cherry-creek
Using the following commands will allow you to use the PHI coprocessors for
increased performance (this is highly recommended -- see below for the speed
difference with and without using the PHIs).
The recommended method for running R is to submit a PBS batch job. However,
if you need to interact with your R script, you will need to request an interactive PBS
session.
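A batch job could look roughly like the sketch below, which writes a job script and leaves the submission as a comment. It reuses the resource request and environment settings of the interactive example on this page; the file names r_job.pbs and my_script.R are placeholders, and the queue name and limits are assumptions that should be adjusted to your own job.

```shell
#!/bin/sh
# Write a PBS job script (r_job.pbs is a placeholder name).
cat > r_job.pbs <<'EOF'
#!/bin/bash
# Resource request copied from the interactive example; adjust as needed.
#PBS -q small
#PBS -l ncpus=1,mem=10gb,nmics=1,cput=15:0
#PBS -l walltime=15:0
#PBS -V

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# The module and export statements must come before R is started.
module load intel intelmpi R
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2

# my_script.R is a placeholder for your own R script.
Rscript my_script.R
EOF

# Submit the job from cherry-creek's head-node:
# qsub r_job.pbs
```

Because the script runs non-interactively, any q() call or interactive prompt in the R script is unnecessary; output goes to the job's standard output file.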
Running an Interactive R session
To run R interactively, you need to request that an interactive job be started.
From cherry-creek's head-node, issue a "qsub -I" command (all on one line) similar to:
qsub -V -I -q small -l ncpus=1,mem=10gb,nmics=1,cput=15:0 -l walltime=15:0
module load intel intelmpi R
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2
R
After you q() to exit R, you will need to exit the interactive
PBS session by typing exit. This will put you back on cherry-creek's
head-node.
Performance comparison between the "default gcc" build of R and the PHI-optimized build
The prebuilt R executables for Linux are built with the GNU tools.
Unfortunately, this results in single-threaded performance, even on multicore systems,
for matrix operations that could be performed in parallel. The table below shows the
performance of R built with the Intel 14.0.1 compilers and Intel® Math Kernel Library
on Red Hat 6.3 compared to a "default" build (i.e. no config options) using gcc 4.4.6.
The build with Intel® MKL runs matrix operations on multiple cores, so it is much faster
on those operations. The R benchmark-2.5 used is available at
http://r.research.att.com/benchmarks/R-benchmark-25.R
The matrix sizes were increased to reflect a larger workload size. The results show
R built with Intel® MKL is up to 15x faster than the gcc build. These results are
consistent with Intel's benchmark results
(https://software.intel.com/en-us/articles/running-r-with-support-for-intel-xeon-phi-coprocessors).
A copy of the modified version of the benchmark, with the increased matrix sizes, is available at:
https://www.nscee.edu/R-benchmark-25.R-big
Test | Time for gcc build (sec) | Time for icc/MKL build (sec) |
---|---|---|
Creation, transp., deformation of a 5000x5000 matrix | 3.75 | 3.14 |
5000x5000 normal distributed random matrix ^1000 | 3.34 | 1.59 |
Sorting of 14,000,000 random values | 1.95 | 1.83 |
5600x5600 cross-product matrix (b = a' * a) | 110.94 | 71.26 |
Linear regr. over a 4000x4000 matrix (c = a \ b') | 51.46 | 30.35 |
FFT over 4,800,000 random values | 0.74 | 0.67 |
Eigenvalues of a 1200x1200 random matrix | 6.37 | 3.05 |
Determinant of a 5000x5000 random matrix | 39.37 | 20.33 |
Cholesky decomposition of a 6000x6000 matrix | 42.82 | 16.15 |
Inverse of a 3200x3200 random matrix | 33.30 | 23.24 |
3,500,000 Fibonacci numbers calculation (vector calc) | 0.79 | 0.40 |
Creation of a 6000x6000 Hilbert matrix (matrix calc) | 0.78 | 0.81 |
Grand common divisors of 400,000 pairs (recursion) | 0.45 | 0.29 |
Creation of a 1000x1000 Toeplitz matrix (loops) | 1.89 | 1.76 |
Escoufier's method on a 90x90 matrix (mixed) | 11.77 | 6.30 |
Total | 309.71 | 181.16 |
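To read the table as speedup factors, divide the gcc time by the icc/MKL time for each row. The sketch below does this for three rows copied from the table; speedup is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# speedup GCC_TIME MKL_TIME: print the gcc time divided by the icc/MKL time.
# (Hypothetical helper, for illustration only.)
speedup() {
    LC_ALL=C awk -v gcc="$1" -v mkl="$2" 'BEGIN { printf "%.2f\n", gcc / mkl }'
}

speedup 42.82 16.15     # Cholesky decomposition of a 6000x6000 matrix
speedup 0.78 0.81       # Creation of a 6000x6000 Hilbert matrix
speedup 309.71 181.16   # Total
```

With the figures in the table this prints 2.65, 0.96 and 1.71: the Cholesky decomposition shows the largest gain of these rows, the Hilbert matrix test is marginally slower with the icc/MKL build, and the full run is about 1.7 times faster overall.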
The following commands were used to run the benchmark tests on the PHI:
qsub -I -l ncpus=1,mem=100gb,nmics=2,cput=1000:0:0 -l walltime=1:00:00 /bin/bash
module load R intel intelmpi
export MKL_MIC_ENABLE=1
export MIC_OMP_NUM_THREADS=224
export OFFLOAD_REPORT=2
R
source("R-benchmark-25.R-big")
q()
How the R package was built
The R 3.2.2 package was built using the following configuration:
module load intel intelmpi
./configure --prefix=/share/apps/R-3.2.2 --with-blas="-L/share/apps/intel/composerxe/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm" --with-lapack CC=icc CFLAGS=-O2 CXX=icpc CXXFLAGS=-O2 F77=ifort FFLAGS=-O2 FC=ifort FCFLAGS=-O2
make
make check
sudo make install