Monday, March 2, 2015

Unzip files from a network drive and copy them to Hadoop

The data sits in .7z archives on a shared network drive. The goal is to extract the .txt and .bin files from each archive and copy them to Hadoop.
Mount the drive:
mkdir ~/data
sudo mount --verbose -t cifs //some.net/shared/data ~/data -o username=user,password=****
Unzip-and-copy script:
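#!/bin/bash
# Extract the .txt and .bin members of every archive and copy them to Hadoop.
for ARCHFILE in ~/data/*.7z
do
   echo "$ARCHFILE"
   # The last column of the 7za listing is the member path (assumes names without spaces).
   7za l "$ARCHFILE" | grep -E '\.(txt|bin)$' | awk '{print $NF}' |
   while read -r MYFILE ; do
      # -y answers overwrite prompts; < /dev/null keeps 7za from consuming the file list.
      7za e -y "$ARCHFILE" "$MYFILE" < /dev/null
      FF=$(basename "$MYFILE")
      echo "$FF"
      # Place the corresponding Hadoop folder name here.
      # "hadoop fs" replaces "hadoop dfs", which newer Hadoop releases mark as deprecated.
      "$HADOOP_HOME/bin/hadoop" fs -copyFromLocal "$FF" "/data/$FF"
      rm "$FF"
   done
done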
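
The filtering step is the easiest part to get wrong, so it can help to try it on its own before touching any archive. Below is a minimal sketch, not part of the original script: list_members is a hypothetical helper name, the sample listing lines are made up, and the pattern is anchored to the end of the name, which is an assumption about which files are wanted.

```shell
#!/bin/bash
# Hypothetical helper: read a "7za l" style listing on stdin and print
# the names of members ending in .txt or .bin, one per line.
# Like the main script, it assumes member names contain no spaces.
list_members() {
   grep -E '\.(txt|bin)$' | awk '{print $NF}'
}

# Sample listing lines, made up for illustration only.
SAMPLE='2015-03-01 10:00:00 ....A         1024          512  logs/a.txt
2015-03-01 10:00:00 ....A         2048         1024  raw/b.bin
2015-03-01 10:00:00 ....A          100           50  notes/readme.md'

printf '%s\n' "$SAMPLE" | list_members
```

Running it prints logs/a.txt and raw/b.bin and skips notes/readme.md. With a real archive the same check would be: 7za l some.7z | list_members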
