
Data Migration with Quobyte

Posted by Quobyte

Getting a lot of files from A to B is still not an easy task. While rsync is versatile, it's not the fastest, and parallelizing rsync is no fun. Quobyte comes with two mechanisms to make data migrations easy: the command line tool qcopy, used to move data onto a Quobyte cluster from any (Linux) file system, and the Quobyte Data Mover, used to move data between Quobyte clusters.

Migrating files onto Quobyte with qcopy

qcopy works on any file system you can mount on Linux, including NFS. The tool does a tree walk on the mounted file system and synchronizes the data onto a Quobyte volume. The file copying happens in parallel: you can control the parallelism, but by default it uses 30 threads to copy files concurrently. On the backend it uses a built-in Quobyte client and does not go through the Linux kernel, which improves performance.

qcopy also copies extended attributes (xattrs) and POSIX ACLs. Quobyte supports POSIX and NFSv4 ACLs and automatically translates between the two representations, so you don't have to manage different ACLs for the same file or directory manually.

For a live migration you can run qcopy to copy all files in the background, then stop access to the source file system and run qcopy again to copy the latest changes over. It automatically replaces the destination file if the source version has been modified (based on timestamps) or has a different size. As with any synchronization tool: make sure the clocks on both systems are synchronized!
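Conceptually, the copy-or-skip decision in that second pass looks like the following Python sketch. This is an illustration of the rule described above, not qcopy's actual implementation; the function name is made up for the example:

```python
import os

def needs_copy(src: str, dst: str) -> bool:
    """Decide whether src must be (re)copied to dst, using the criteria
    described above: destination missing, source modified after the
    destination copy (timestamps), or a size mismatch."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # A newer source mtime or a differing size triggers a re-copy.
    return s.st_mtime > d.st_mtime or s.st_size != d.st_size
```

Note that this check only works if the clocks on both sides agree, which is exactly why the synchronization caveat above matters.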

All you need to use qcopy is to install the quobyte-client package. The tool is also part of our free edition. Once you have it installed, running qcopy is simple:

qcopy /path/to/local/mount abcd.myquobyte.net/volume

Replace abcd.myquobyte.net with your registry DNS record. By default, qcopy will run with 30 parallel worker threads. You can change this with -t <num-threads>.
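To illustrate the general pattern of a thread-pooled tree copy, here is a simplified stand-in in Python. It is not qcopy (which uses a built-in Quobyte client and skips the kernel entirely), just a sketch of the fan-out idea with the same 30-worker default:

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(src_root: str, dst_root: str, threads: int = 30) -> int:
    """Walk src_root and copy every file into dst_root using a pool of
    worker threads. Returns the number of files copied."""
    jobs = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel = os.path.relpath(dirpath, src_root)
            target_dir = os.path.join(dst_root, rel)
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target_dir, name)
                # copy2 preserves timestamps, which a later sync pass
                # relies on to detect modified files.
                jobs.append(pool.submit(shutil.copy2, src, dst))
    return sum(1 for j in jobs if j.result())
```

Threads work well here because file copying is I/O-bound; the walk enqueues work while earlier copies are still in flight.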

If you want to check what files need to be copied you can use the --dry-run or --verify options.

Moving or Copying files between Quobyte clusters

The Quobyte data mover is a distributed, highly parallel, and reliable data transfer service that can copy data inside a Quobyte cluster or between multiple independent and geographically distributed Quobyte clusters. Despite its name, the data mover can also re-code files in transit, e.g. from replication to erasure coding, or to a wider erasure coding scheme for archival.

The data mover is policy based, like the rest of Quobyte. You can control through policies which files are moved or copied, and how they are protected (EC, replication) on the destination cluster. The policies also define how to resolve conflicts, i.e. two files with the same path and name: "last writer wins" or rename.

Data in transit is protected with cryptographic checksums (e.g. MD5), and files are verified after transfer, i.e. they are read back and the checksums are compared. This ensures that data isn't corrupted in transit, where TCP checksums are not strong enough for larger data blocks (read more in our blog post or in the publication by Google on DRAM errors).
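The verification idea can be sketched in a few lines of Python: checksum the source, re-read the destination after the transfer, and compare. A hedged sketch, not the data mover's code:

```python
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks, so even large files
    can be checksummed without loading them into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_transfer(src: str, dst: str) -> bool:
    """Read back both files after the copy and compare checksums,
    catching corruption that per-packet TCP checksums can miss."""
    return md5_of(src) == md5_of(dst)
```

The key point is the read-back: verifying against a checksum computed before the write would miss corruption introduced by the storage path itself.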

Apart from its versatility, the data mover is also highly efficient and fast: it works directly on the Quobyte metadata and doesn't require costly tree walks like rsync. The actual data transfer happens between the Quobyte data services on the source and destination clusters, highly parallelized. The Quobyte data mover can go as fast as your network link.

This parallelism comes in handy when you enable TLS for communication between Quobyte clusters. The expensive computation for cryptographic checksums and encryption in the TLS protocol is spread out across your storage servers, so you don't need an expensive VPN gateway that adds another congestion point. You can find out more about Quobyte's security features, like selective TLS or X.509 certificates.

Eventual Consistency with the Data Mover

The Quobyte Data Mover allows you to regularly synchronize one or more volumes bidirectionally between Quobyte clusters. With the “last writer wins” conflict resolution you can use it to implement eventual consistency between those clusters, similar to an object store.
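A minimal sketch of the "last writer wins" rule, assuming each cluster's view of a volume is summarized as a map from path to (mtime, data); this simplification is ours, purely for illustration:

```python
def last_writer_wins(a: dict, b: dict) -> dict:
    """Merge two {path: (mtime, data)} snapshots. For every path that
    exists in both, the entry with the newer mtime wins; paths unique
    to either side are kept as-is."""
    merged = dict(a)
    for path, entry in b.items():
        if path not in merged or entry[0] > merged[path][0]:
            merged[path] = entry
    return merged
```

Applying the merge in both directions yields the same result on both clusters, which is what makes the scheme converge, much like object store replication.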

Hydrating Temporary Quobyte Clusters in the Cloud

When you have use cases where you need a cluster in the cloud temporarily, e.g. to speed up a computation or because your on-prem cluster is full, Quobyte's automated install and the data mover come in handy.

Installing a Quobyte storage cluster on the public cloud can easily be automated; check out our Ansible scripts for automated installations. Once your cloud cluster is up and running, you can use the data mover to hydrate it, i.e. tell the data mover to copy the relevant volumes over. You can query the Quobyte API to start the data mover and to check for completion. Once done, you have a clone of your on-prem environment in the cloud.

Once the job is done you can copy the results back with the data mover and then terminate your cloud cluster.

Here is an example job description file:

{
  "job": {
    "source": {
      "quobyte": {
        "volume": "0c4b1be1-0b11-475f-a0dc-2bb02f5c1555"
      }
    },
    "destination": {
      "quobyte": {
        "volume": "db242b0b-44da-41ae-87c0-ce2427d09c17",
        "registry": "abcd.myquobyte.net"
      }
    },
    "destination_file_settings": {
      "create_behavior": "FAIL_IF_FILE_EXISTS"
    }
  }
}

To run the job you can submit it via qmgmt with

qmgmt files copy /path/to/jobfile.json

To monitor the progress you can check the task status via

qmgmt task show <taskId>

If you want to automate it, you can submit the task via the Quobyte REST API by calling createCopyFilesTask. The CreateTaskResponse object contains the task_id, which you can use to query the task status via the getTaskList call, with the task_id set in the GetTaskListRequest object.
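As a starting point for such automation, here is a hedged Python sketch that assembles a job description like the one above. The field names mirror the example job file; the function name is ours, and you should check the Quobyte API reference for the exact createCopyFilesTask request schema before wiring this up:

```python
import json

def build_copy_job(src_volume: str, dst_volume: str, dst_registry: str,
                   create_behavior: str = "FAIL_IF_FILE_EXISTS") -> str:
    """Assemble a copy-job description as a JSON string, following the
    structure of the example job file above (hypothetical helper)."""
    job = {
        "job": {
            "source": {"quobyte": {"volume": src_volume}},
            "destination": {
                "quobyte": {"volume": dst_volume, "registry": dst_registry}
            },
            "destination_file_settings": {
                "create_behavior": create_behavior
            },
        }
    }
    return json.dumps(job, indent=2)
```

Generating the job file programmatically like this makes it easy to hydrate a freshly installed cloud cluster from a script, then poll getTaskList until the returned task has completed.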

 
