Generating the Spark docs on Linux is described in https://github.com/apache/spark/blob/master/docs/README.md. On Windows it can be a little tricky. You need to install Ruby 2.0 and two gems, then Python 2.7 and two Python packages. A good guide for the first step is at http://jekyll-windows.juthilo.com/.
- Get Ruby 2.0.0 for your architecture from http://rubyinstaller.org/downloads/. Install it and check "Add Ruby executables to your PATH".
- Get the Ruby DevKit from the same site. Unzip it to C:\RubyDevKit and install it as follows. Before running the "install" command (the last one), check that "config.yml" contains the path to your Ruby installation. Add it if it is missing.
cd C:\RubyDevKit
ruby dk.rb init
ruby dk.rb install
- Install "jekyll" and "jekyll-redirect-from" as follows, specifying your proxy if needed:
gem install --http-proxy http://proxy:port jekyll
gem install --http-proxy http://proxy:port jekyll-redirect-from
- Get Python 2.7 from https://www.python.org/ and install it. Make sure that the "Python" and "Python\Scripts" folders are added to your PATH.
- Install "pygments" and "sphinx". To use a proxy, set the environment variable "http_proxy" to your proxy:port.
pip install pygments
pip install sphinx
If everything went OK, you can now generate the docs from the docs folder. Let's skip the API docs:
cd %SPARK_HOME%\docs
set SKIP_API=1
jekyll build
This creates a folder "_site" containing all the docs generated as HTML.
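If the build fails, a common cause is that one of the tools installed above is not on your PATH. As a rough sketch (not part of the Spark build itself), the script below checks for them. The tool list and the function name find_missing are my own assumptions based on the steps in this post, and shutil.which needs Python 3.3 or later, so with the Python 2.7 installed above you would have to swap in distutils.spawn.find_executable.

```python
import shutil

# Executables the steps above are expected to put on PATH.
# This list is an assumption drawn from this guide; adjust it to your setup.
REQUIRED_TOOLS = ["ruby", "gem", "jekyll", "python", "pip"]


def find_missing(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be located on PATH."""
    # shutil.which honours PATHEXT on Windows, so "jekyll" also matches jekyll.bat.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it from a fresh command prompt, since PATH changes made by the installers are only picked up by newly opened windows.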