Wednesday, March 25, 2015

CBLAS compilation as a shared library

The reference CBLAS from Netlib (http://www.netlib.org/blas/blast-forum/cblas.tgz) compiles into a static library by default. I needed a shared library.
First, download and compile BLAS, since CBLAS is just a C interface to the Fortran BLAS:
wget http://www.netlib.org/blas/blas.tgz
tar xzvf blas.tgz
cd BLAS
make
This produces the static library "blas_LINUX.a".
Next, you need to download and configure CBLAS:
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xzvf cblas.tgz
cd CBLAS
Replace the corresponding variables in "Makefile.in" with:
BLLIB = /path_to_compiled_BLAS/blas_LINUX.a
CBLIB = ../lib/cblas_$(PLAT).so
CFLAGS = -O3 -DADD_ -fPIC
FFLAGS = -O3 -fPIC
ARCH = gcc
ARCHFLAGS = -shared -o
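The same edits can also be applied non-interactively. Below is a minimal sketch, assuming each variable sits on its own line starting in column one in "Makefile.in"; the function name patch_makefile_in is made up for illustration, and the BLAS path must not contain "|" or "&" because it is pasted into a sed replacement.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the build variables in a CBLAS Makefile.in.
# $1 = path to Makefile.in, $2 = path to the compiled BLAS archive
patch_makefile_in() {
    local makefile="$1" blas_lib="$2"
    # -i.bak edits in place and keeps the original as Makefile.in.bak
    sed -i.bak \
        -e "s|^BLLIB[[:space:]]*=.*|BLLIB = ${blas_lib}|" \
        -e 's|^CBLIB[[:space:]]*=.*|CBLIB = ../lib/cblas_$(PLAT).so|' \
        -e 's|^CFLAGS[[:space:]]*=.*|CFLAGS = -O3 -DADD_ -fPIC|' \
        -e 's|^FFLAGS[[:space:]]*=.*|FFLAGS = -O3 -fPIC|' \
        -e 's|^ARCH[[:space:]]*=.*|ARCH = gcc|' \
        -e 's|^ARCHFLAGS[[:space:]]*=.*|ARCHFLAGS = -shared -o|' \
        "$makefile"
}

# Usage (not run here):
#   patch_makefile_in Makefile.in /path_to_compiled_BLAS/blas_LINUX.a
```

All other lines of "Makefile.in" (PLAT, RANLIB, and so on) are left untouched.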
Finally, make CBLAS:
make
This produces the shared library "cblas_LINUX.so".

Monday, March 9, 2015

Serializing classes or models in Apache Spark

Normal serialization works, but the deserialized objects cannot be used with RDDs, i.e. their functions cannot be applied to an RDD. Hack: save and load the object through Spark itself:
sc.parallelize(Seq(model), 1).saveAsObjectFile("path")
val sameModel = sc.objectFile[YourCLASS]("path").first()

Monday, March 2, 2015

Unzip files from a network drive and copy to Hadoop

The data is zipped on a shared drive. The goal is to unzip it and copy some of the files to Hadoop.
Mount the drive:
mkdir data
sudo mount --verbose -t cifs //some.net/shared/data -o username=user,password=**** data
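Putting the password on the command line can expose it in the process list. mount.cifs also accepts a credentials=<file> option that points to a file with "username=" and "password=" lines. Here is a minimal sketch of a helper that writes such a file with owner-only permissions; the function name write_cifs_credentials and the file location are assumptions for illustration.

```shell
# Hypothetical helper: write a mount.cifs credentials file readable only by its owner.
# $1 = file to create, $2 = user name, $3 = password
write_cifs_credentials() {
    local file="$1" user="$2" pass="$3"
    (
        umask 077   # a newly created file gets mode 600
        printf 'username=%s\npassword=%s\n' "$user" "$pass" > "$file"
    ) && chmod 600 "$file"   # also tighten a file that already existed
}

# Usage (not run here):
#   write_cifs_credentials "$HOME/.smbcredentials" user '****'
#   sudo mount --verbose -t cifs //some.net/shared/data -o credentials="$HOME/.smbcredentials" data
```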
Unzip-and-copy script:
#!/bin/bash
for ARCHFILE in ~/data/*.7z
do
   echo "$ARCHFILE"
   # list the archive and keep only the names of .txt and .bin entries
   7za l "$ARCHFILE" | grep '\.txt\|\.bin' | awk '{print $NF}' |
   while read -r MYFILE ; do
      # "7za e" extracts without directory structure, so the file lands in the current folder
      7za e "$ARCHFILE" "$MYFILE"
      FF=$(basename "$MYFILE")
      echo "$FF"
      # place here the corresponding Hadoop folder name
      "$HADOOP_HOME"/bin/hadoop fs -copyFromLocal "$FF" "/data/$FF"
      rm "$FF"
   done
done
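Before extracting anything, it can help to check which entries the name filter would select. The sketch below isolates that step as a function reading listing lines from stdin; the name select_entries is made up for illustration. Unlike the script, it anchors the pattern to the end of the name, so "notes.txt.bak" would not be selected. It assumes, like the script, that file names contain no spaces.

```shell
# Hypothetical helper: read listing lines on stdin and print the last
# whitespace-separated field of each line if it ends in .txt or .bin.
select_entries() {
    awk '{print $NF}' | grep -E '\.(txt|bin)$'
}

# Usage (not run here):
#   7za l ~/data/some_archive.7z | select_entries
```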