Wednesday, August 6, 2014

How to use locally compiled Apache Spark libraries in your project

Officially released versions of the Apache Spark libraries are available in Maven Central (http://search.maven.org/), so you can always add dependencies on them in your project and Maven will download them. See how to create a Maven project that uses the Spark libraries at avulanov.blogspot.com/2014/07/how-to-create-scala-project-for-apache.html. However, I want to use the latest build of Spark in my Maven project, and moreover my own custom build of Spark. There are at least two ways of doing this.

The first option is to build Apache Spark with the `install` goal (or to run `install` only for a particular Spark sub-project), which puts the Spark artifacts into your local Maven repository:
    mvn -Dhadoop.version=1.2.1 -DskipTests clean install
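If you only need one Spark module, Maven can build and install just that module together with the modules it depends on. A minimal sketch, assuming the core module (the -pl and -am flags are standard Maven options, not something the commands above use):

    # build and install only spark-core plus the modules it depends on
    mvn -Dhadoop.version=1.2.1 -DskipTests -pl core -am clean install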
The second option is to compile your local version of Apache Spark with the `package` goal and then install the particular jar you need into the local repository by hand:

    mvn -Dhadoop.version=1.2.1 -DskipTests clean package
    mvn install:install-file -Dfile=/spark/core/target/spark-core_2.10-1.1.0-latest.jar -DpomFile=/spark/core/pom.xml -DgroupId=org.apache.spark -Dversion=1.1.0-latest -DartifactId=spark-core_2.10
  • Reference the new version of this library (1.1.0-latest) in your pom.xml; see the dependency snippet below.
  • There might be a problem with imports and their versions, so try running mvn install (I run it from the IDEA IDE). In my case Maven didn't like that the asm and lz4 dependencies didn't have versions specified. Specify them if needed; a sketch of that follows at the end.
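For reference, the dependency declaration for the manually installed jar would look something like this; the coordinates simply repeat the ones passed to install:install-file above:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.0-latest</version>
    </dependency>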
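Specifying the versions just means adding an explicit <version> element to each dependency Maven complains about. A generic sketch follows; the coordinates and version numbers here are assumptions for illustration, so use whatever artifacts and versions your own build reports as missing:

    <!-- example only: give the offending dependencies explicit versions -->
    <dependency>
      <groupId>net.jpountz.lz4</groupId>
      <artifactId>lz4</artifactId>
      <version>1.2.0</version> <!-- placeholder version -->
    </dependency>
    <dependency>
      <groupId>org.ow2.asm</groupId>
      <artifactId>asm</artifactId>
      <version>4.0</version> <!-- placeholder version -->
    </dependency>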
