Apache Hadoop is an open-source implementation of MapReduce, and Apache YARN is its resource manager and job scheduler. YARN takes care of running your Hadoop jobs in parallel across all your worker nodes.
You can run the Quobyte services on the same machines as the Hadoop workers (hyperconverged) or on dedicated machines. Both scenarios have advantages; see our blog post "Dedicated storage nodes or not?"
Step 1: Prerequisites
Step 2: Install and Configure Quobyte
Step 3: Quobyte Policies for Hadoop
Step 4: Prepare the Hadoop Install
Before we can start the actual installation, we have to prepare the cluster by installing Java, creating a hadoop user, and enabling SSH login with a key from the master node. Pick any of the machines to be your master node, then execute the following steps:
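The preparation could look roughly like the sketch below. It is not the original command listing: the Debian/Ubuntu package name, the worker host names, and the key type are assumptions to adapt to your cluster. The script only prints the commands unless you set DRY_RUN=0, so you can review them first. Run the node preparation as root on every machine, and the key setup as the hadoop user on the master.

```shell
#!/bin/sh
set -eu

# Assumptions (not from the original post): adjust before real use.
WORKERS="${WORKERS:-worker1 worker2}"   # hypothetical worker host names
KEY_FILE="${KEY_FILE:-${HOME:-/home/hadoop}/.ssh/id_ed25519}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print commands

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Run as root on every node: install Java and create the hadoop user.
prepare_node() {
    run apt-get install -y openjdk-8-jdk-headless
    run useradd --create-home --shell /bin/bash hadoop
}

# Run as the hadoop user on the master: create a key and copy it to each worker.
prepare_master_key() {
    run ssh-keygen -t ed25519 -N "" -f "$KEY_FILE"
    for host in $WORKERS; do
        run ssh-copy-id -i "$KEY_FILE.pub" "hadoop@$host"
    done
}

prepare_node
prepare_master_key
```

Note that ssh-copy-id needs some way to log in as hadoop on each worker the first time, for example a temporary password.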
Step 5: Install Hadoop
Finally, we can install Hadoop itself:
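As a rough sketch, assuming a binary release from the Apache archive: the Hadoop version, the install prefix /opt, and the use of curl are assumptions, not taken from the original instructions. As written, the script only prints the commands; set DRY_RUN=0 to execute them, and repeat the installation on every node.

```shell
#!/bin/sh
set -eu

# Assumptions (not from the original post): version and install prefix.
HADOOP_VERSION="${HADOOP_VERSION:-3.3.6}"
PREFIX="${PREFIX:-/opt}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print commands

TARBALL="hadoop-$HADOOP_VERSION.tar.gz"
URL="https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/$TARBALL"
HADOOP_HOME="$PREFIX/hadoop-$HADOOP_VERSION"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Download the release, unpack it, and hand it over to the hadoop user.
install_hadoop() {
    run curl -fLO "$URL"
    run tar -xzf "$TARBALL" -C "$PREFIX"
    run chown -R hadoop:hadoop "$HADOOP_HOME"
}

install_hadoop
echo "export HADOOP_HOME=$HADOOP_HOME"
```

The last line prints an export statement that you can add to the hadoop user's shell profile so later steps can find the installation.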
Step 6: Configure Hadoop and YARN
Next, we have to configure Hadoop and YARN so that Quobyte is the default file system. Execute the following steps on the master node.
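The generic part of this configuration might be generated as in the sketch below. The property names (fs.defaultFS, yarn.resourcemanager.hostname, yarn.nodemanager.aux-services, mapreduce.framework.name) are standard Hadoop settings, but the host names and especially the file system URI are placeholders. The Quobyte-specific driver properties are deliberately left out here; take them, and the real URI, from the Quobyte driver documentation. By default the files go to a scratch directory rather than the live configuration.

```shell
#!/bin/sh
set -eu

# Assumptions (not from the original post): adjust all four values.
CONF_DIR="${HADOOP_CONF_DIR:-./hadoop-conf}"  # normally $HADOOP_HOME/etc/hadoop
MASTER="${MASTER:-master}"                    # hypothetical master host name
WORKERS="${WORKERS:-worker1 worker2}"         # hypothetical worker host names
# Placeholder only: use the URI that the Quobyte driver documentation specifies.
DEFAULT_FS="${DEFAULT_FS:-quobyte://registry.example.com/hadoop-volume}"

mkdir -p "$CONF_DIR"

# write_site FILE NAME VALUE [NAME VALUE ...]: write a Hadoop *-site.xml file.
write_site() {
    file="$1"; shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ "$#" -ge 2 ]; do
            echo "  <property><name>$1</name><value>$2</value></property>"
            shift 2
        done
        echo '</configuration>'
    } > "$CONF_DIR/$file"
}

# Default file system for all Hadoop clients.
write_site core-site.xml fs.defaultFS "$DEFAULT_FS"
# Where the ResourceManager runs, plus the shuffle service MapReduce needs.
write_site yarn-site.xml \
    yarn.resourcemanager.hostname "$MASTER" \
    yarn.nodemanager.aux-services mapreduce_shuffle
# Run MapReduce jobs on YARN instead of the local runner.
write_site mapred-site.xml mapreduce.framework.name yarn

# One worker per line; Hadoop 3 reads "workers", Hadoop 2 reads "slaves".
printf '%s\n' $WORKERS > "$CONF_DIR/workers"
```

Once the values are correct, copy the generated files into the Hadoop configuration directory on every node.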
Step 7: Test the Quobyte HDFS Driver
Step 8: Start YARN and run a Test
The last step is to start YARN and submit a MapReduce job to the scheduler.
You can now run distributed Hadoop jobs directly on Quobyte and benefit from the easy data sharing between Hadoop, S3 and the Linux world.
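A minimal sketch of this step, assuming the stock scripts and the bundled examples jar from a Hadoop binary release: the version, the install path, and the choice of the pi example job are assumptions rather than the original test. The script prints the commands by default; run it with DRY_RUN=0 as the hadoop user on the master node.

```shell
#!/bin/sh
set -eu

# Assumptions (not from the original post): version and install location.
HADOOP_VERSION="${HADOOP_VERSION:-3.3.6}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-$HADOOP_VERSION}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print commands
EXAMPLES_JAR="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$HADOOP_VERSION.jar"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

start_and_test() {
    # Start the ResourceManager here and a NodeManager on every worker.
    run "$HADOOP_HOME/sbin/start-yarn.sh"
    # List the NodeManagers that registered with the ResourceManager.
    run "$HADOOP_HOME/bin/yarn" node -list
    # Submit the bundled pi estimator: 4 map tasks, 1000 samples each.
    run "$HADOOP_HOME/bin/yarn" jar "$EXAMPLES_JAR" pi 4 1000
}

start_and_test
```

If the node list is missing workers, check the SSH key setup and the workers file before submitting the job.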