How does the Native Quobyte HDFS Driver work?
The driver implements the HDFS API: it exposes the same interface as HDFS but replaces it entirely. Your big data applications, such as Hadoop, Spark, or Hive, talk to Quobyte instead of HDFS but won’t notice the difference, which means you don’t have to change your applications at all.
The native Quobyte HDFS driver ships as an RPM or DEB package and contains the Java-based plugin, libquobyte (the library that implements the Quobyte client), and the necessary dependencies.
The applications talk to the Quobyte driver via one of the two file system interfaces in Hadoop: FileSystem or AbstractFileSystem. The Quobyte driver translates the calls from the applications into RPCs to the Quobyte Registry, Metadata, and Data services. The communication between the native Quobyte HDFS driver and the Quobyte services runs over TCP.
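For illustration, here is a minimal sketch of application code that reaches Quobyte through the standard Hadoop FileSystem interface. Nothing in it is Quobyte-specific; the driver is selected purely by the configuration described below, and the file path is a made-up example inside the configured volume.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuobyteHdfsApiExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml from the classpath; with fs.defaultFS set to
        // quobyte:///, this returns the Quobyte driver instead of HDFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Standard FileSystem calls, identical to code written against HDFS.
        Path file = new Path("/demo/hello.txt"); // hypothetical path inside the configured volume
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello from the Quobyte HDFS driver");
        }
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}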
Benefits of using the native Quobyte HDFS driver
- Delivers the scale-out performance that Hadoop, Spark, and similar frameworks need, with no bottlenecks from NFS or object storage gateways.
- Support for data locality: the Quobyte driver tells Hadoop/Spark/YARN where the replicas or stripes of a file are located, so jobs can be scheduled on the machines that hold the data (see the sketch after this list). In addition, Quobyte supports placement policies that keep replicas or stripes local.
- Easy data sharing through all the other interfaces Quobyte supports: POSIX and Windows file systems, Object/S3, and NFS.
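To see how the locality support surfaces in practice, the following sketch queries block locations through the standard Hadoop API that schedulers such as YARN rely on; the Quobyte driver fills in the hosts that store the replicas or stripes of the file. The file path is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuobyteLocalityExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/demo/part-00000"); // hypothetical file

        // getFileBlockLocations() is the standard Hadoop locality API; the driver
        // reports the hosts that hold each replica or stripe, so the scheduler
        // can place tasks next to the data.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
        }
    }
}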
How do you configure the driver with Hadoop/Spark/HBase and other tools?
Each tool in the analytics ecosystem requires a slightly different configuration. Check out our tutorials for Hadoop, Spark, HBase, and Apache Drill for step-by-step instructions. Generally speaking, you need to do three things:
- Copy the Quobyte native HDFS driver plugin JAR into the library directory of your tool. Once you install the quobyte-hadoop package, you can find the JAR at /opt/quobyte/hadoop/quobyte_hadoop_plugin.jar.
- Configure Quobyte as the default file system and register the driver classes for the “quobyte://” scheme. In Hadoop you configure this in the core-site.xml config file:
<property>
  <name>fs.defaultFS</name>
  <value>quobyte:///</value>
</property>
<property>
  <name>fs.quobyte.impl</name>
  <value>com.quobyte.hadoop.interfaces.QuobyteFileSystemAdapter</value>
</property>
<property>
  <name>fs.AbstractFileSystem.quobyte.impl</name>
  <value>com.quobyte.hadoop.interfaces.QuobyteAbstractFileSystemAdapter</value>
</property>
<property>
  <name>com.quobyte.hadoop.backend</name>
  <value>JNI</value>
</property>
- Configure the Quobyte driver by adding the registry DNS record for your Quobyte cluster and the volume to use:
<property>
  <name>com.quobyte.hadoop.registry</name>
  <value>abcdefg.myquobyte.net</value>
</property>
<property>
  <name>com.quobyte.hadoop.volume</name>
  <value>demo-1</value>
</property>
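Once the three steps are done, a small program like the following sketch can verify the setup by listing the volume root through the driver. It sets the same properties programmatically that the core-site.xml snippets above configure, using the placeholder registry record and volume name from this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuobyteConfigCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same settings as in core-site.xml above, set programmatically
        // (placeholder registry record and volume name).
        conf.set("fs.defaultFS", "quobyte:///");
        conf.set("fs.quobyte.impl", "com.quobyte.hadoop.interfaces.QuobyteFileSystemAdapter");
        conf.set("fs.AbstractFileSystem.quobyte.impl", "com.quobyte.hadoop.interfaces.QuobyteAbstractFileSystemAdapter");
        conf.set("com.quobyte.hadoop.backend", "JNI");
        conf.set("com.quobyte.hadoop.registry", "abcdefg.myquobyte.net");
        conf.set("com.quobyte.hadoop.volume", "demo-1");

        // If the driver JAR is on the classpath and the settings are correct,
        // this lists the root of the demo-1 volume.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus entry : fs.listStatus(new Path("/"))) {
            System.out.println(entry.getPath());
        }
    }
}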