Technical notes: August 2015

The process of generating the Spark docs on Linux is described in https://github.com/apache/spark/blob/master/docs/README.md. It is a little trickier on Windows: you need to install Ruby 2.0 with two gems, then Python 2.7 with two packages. A good guide for the Ruby part is at http://jekyll-windows.juthilo.com/.

Get Ruby 2.0.0 for your architecture from http://rubyinstaller.org/downloads/. Install it and check "Add Ruby executables to your PATH".
Get the Ruby Development Kit (DevKit) from the same site and unzip it to C:\RubyDevKit. Install it as follows. Before running the "install" command (the last one), check that "config.yml", which the "init" command generates, contains the path to your Ruby installation. Add it if it is missing.

cd C:\RubyDevKit
ruby dk.rb init
ruby dk.rb install
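The "config.yml" check can also be scripted. Below is a minimal sketch, assuming the file lists each Ruby installation as a "- <path>" entry after a "---" line (which is how my generated file looked). The function names and the C:\Ruby200-x64 folder are only illustrations, not part of DevKit.

```python
def ruby_roots(config_text):
    """Return the Ruby install paths listed as '- <path>' entries."""
    roots = []
    for line in config_text.splitlines():
        line = line.strip()
        # Comments, blank lines and the '---' marker do not start with '- '.
        if line.startswith("- "):
            roots.append(line[2:].strip().strip("'\""))
    return roots


def has_ruby(config_text, ruby_dir):
    """True if ruby_dir is listed, ignoring case and slash direction."""
    def norm(path):
        return path.replace("\\", "/").rstrip("/").lower()
    return norm(ruby_dir) in [norm(root) for root in ruby_roots(config_text)]


if __name__ == "__main__":
    # Sample content; C:/Ruby200-x64 is an assumed install folder.
    sample = "# DevKit configuration\n---\n- C:/Ruby200-x64\n"
    print(has_ruby(sample, "C:\\Ruby200-x64"))
```

To check the real file, read C:\RubyDevKit\config.yml into a string and pass it to has_ruby together with your own Ruby folder.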

Install "jekyll" and "jekyll-redirect-from" as follows, specifying your proxy if needed:

gem install --http-proxy http://proxy:port jekyll
gem install --http-proxy http://proxy:port jekyll-redirect-from

Get Python 2.7 from https://www.python.org/ and install it. Make sure that the "Python" and "Python\Scripts" folders have been added to your PATH.
Install "pygments" and "sphinx". To go through a proxy, set the environment variable "http_proxy" to your proxy:port.

pip install pygments
pip install sphinx
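Most problems at this point come from a tool missing on PATH. Here is a small sketch that looks up each tool the way the Windows shell roughly does, trying every extension from PATHEXT in every PATH folder. The helper name find_on_path is mine, and the tool list is just what this post installs. It avoids shutil.which so it runs on Python 2.7 as well as 3.

```python
import os


def find_on_path(name, path=None, pathext=None):
    """Return the first file matching name in the PATH folders, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # On Windows, PATHEXT holds extensions such as .EXE;.BAT;.CMD
        pathext = os.environ.get("PATHEXT", "")
    extensions = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    for folder in path.split(os.pathsep):
        folder = folder.strip('"')
        if not folder:
            continue
        for ext in extensions:
            candidate = os.path.join(folder, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    for tool in ("ruby", "gem", "jekyll", "python", "pip"):
        print("%-8s %s" % (tool, find_on_path(tool) or "NOT FOUND"))
```

Any tool reported as NOT FOUND means the matching folder still has to be added to PATH (open a new command prompt after changing it).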

If everything went well, you can now generate the docs from the docs folder. Let's skip the API docs:

cd %SPARK_HOME%\docs
set SKIP_API=1
jekyll build
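Note that "set SKIP_API=1" stays in effect for the rest of the command prompt session. If you prefer to set it for a single build only, a Python wrapper along these lines should work. This is a sketch: build_env and run_build are names I made up, and it assumes "jekyll" resolves through the shell (on my machine the gem installs it as a .bat file, hence shell=True).

```python
import os
import subprocess


def build_env(base_env=None):
    """Copy base_env (default: the current environment) and add SKIP_API=1."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKIP_API"] = "1"
    return env


def run_build(docs_dir):
    """Run "jekyll build" in docs_dir and return its exit code."""
    # A command string with shell=True lets the shell locate the jekyll launcher.
    return subprocess.call("jekyll build", shell=True,
                           cwd=docs_dir, env=build_env())
```

Call it as run_build(os.path.join(os.environ["SPARK_HOME"], "docs")); a return value of 0 means the build finished without errors.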

The build creates a "_site" folder containing all the docs as HTML.
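A quick way to confirm the build produced something is to count the HTML files. The sketch below walks a folder and lists every .html file in it. The function name list_html is hypothetical, and the index.html check assumes the Spark docs have a top-level index page, which was the case for me.

```python
import os


def list_html(site_dir):
    """Return sorted relative paths of all .html files under site_dir."""
    found = []
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            if name.lower().endswith(".html"):
                full = os.path.join(root, name)
                found.append(os.path.relpath(full, site_dir))
    return sorted(found)


if __name__ == "__main__":
    # Run from the docs folder so that "_site" resolves correctly.
    pages = list_html("_site")
    print("%d HTML pages generated" % len(pages))
    if "index.html" not in pages:
        print("warning: index.html is missing; the build may have failed")
```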

Monday, August 17, 2015

Generating Spark docs on Windows