Cloud Deployment Guide
Note
Be sure to read the Deployment Overview first.
- Scenario 1: Deploy your own instance of VIAME Web to GCP Compute Engine.
- Scenario 2: Run VIAME pipelines on a GCP Compute Engine VM from the command line.
- Scenario 3: Run a private GPU worker in GCP to process jobs from any VIAME Web instance, including viame.kitware.com (standalone mode).
The Terraform section is the same for all scenarios. The Ansible section differs by scenario.
Before you begin
You'll need a GCP Virtual Machine (VM) with the features listed below. This section will guide you through creating one and deploying VIAME using Terraform and Ansible.
- Terraform automates the process of creating and destroying cloud resources such as VMs.
- Ansible automates configuration, such as software installation, on newly created machines.
Together, these tools allow you to quickly create a reproducible environment. If you do not want to use them, you can create your own VM manually through the management console and skip to the Docker documentation instead.
Feature | Recommended value |
---|---|
Operating system | Ubuntu 20.04 |
Instance Type | n1-standard-4 or larger |
GPU Type | nvidia-tesla-t4, nvidia-tesla-p4, or similar |
Disk Type | SSD, 128 GB or more depending on your needs |
Install dependencies
To run the provisioning tools below, you need the following installed on your own workstation.
Note
Google Cloud worker provisioning can only be done from an Ubuntu Linux 18.04+ host. Ansible and Terraform should work on Windows Subsystem for Linux (WSL) if you only have a Windows host. You could also use a cheap CPU-only cloud instance to run these tools.
- Install Google Cloud SDK
- Install Terraform
- Install Ansible
- Find your Google Cloud project ID. It looks like project-name-123456.
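Before passing the project ID to the tools below, it can help to check that you copied the ID and not the display name. The sketch below is an illustration only: it encodes the general Google Cloud rules for project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper name `is_valid_project_id` is made up for this example, and the check says nothing about whether the project exists or whether you can access it.

```python
import re

# General GCP project ID shape: 6-30 characters, lowercase letters,
# digits, and hyphens; starts with a letter; does not end with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if the string is shaped like a GCP project ID."""
    return PROJECT_ID_RE.fullmatch(project_id) is not None


if __name__ == "__main__":
    # The sample ID from this guide passes; a display name does not.
    print(is_valid_project_id("project-name-123456"))
    print(is_valid_project_id("Project Name"))
```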
Tip
Google Cloud imposes GPU quotas. You may need to request a quota increase. Anecdotally, requests to increase the quota by 1 unit are approved automatically, while larger requests are rejected.
Creating a VM with Terraform
Run Terraform
Warning
GPU-accelerated VMs are significantly more expensive than typical VMs. Make sure you are familiar with the cost of the machine and GPU you choose. See main.tf for default values.
See devops/main.tf for a complete list of variables. The default machine_type, gpu_type, and gpu_count can be overridden.
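As a rough illustration of overriding those defaults, the sketch below assembles a `terraform apply` command line with one `-var` flag per variable, which is the standard Terraform CLI way to set an input variable. The variable names come from the text above and the values are the recommended ones from the table earlier in this guide; the helper name `terraform_apply_command` is made up for this example. It only prints the command and does not run Terraform. Check devops/main.tf for any other variables your setup needs.

```python
import shlex


def terraform_apply_command(overrides: dict) -> str:
    """Build a `terraform apply` command line with one -var flag per override."""
    args = ["terraform", "apply"]
    for name, value in overrides.items():
        # Each override becomes: -var name=value
        args += ["-var", f"{name}={value}"]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(terraform_apply_command({
        "machine_type": "n1-standard-4",
        "gpu_type": "nvidia-tesla-t4",
        "gpu_count": 1,
    }))
```

Review the printed command, and the cost of the machine and GPU it selects, before running it.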
Destroy the stack
Later, when you are done with the server and have backed up your data, use Terraform to destroy your resources.
Configure with Ansible
This step prepares the new host to run a VIAME worker by installing NVIDIA drivers and Docker, then downloading VIAME and all optional addons.
Warning
The playbook may take 30 minutes or more to run because it must install nvidia drivers and download several GB of software packages.
Ansible Extra Vars
These are all the variables that can be provided with --extra-vars.
Variable | Default | Description |
---|---|---|
run_server | no | Set run_server=yes for scenario 1 (Web Instance) (Fastest option) |
run_viame_cli | no | Set run_viame_cli=yes for scenario 2 (VIAME CLI) |
run_worker_container | no | Set run_worker_container=yes for scenario 3 (Standalone Worker) |
viame_bundle_url | latest bundle url | Optional for scenarios 2 & 3. Change to install a different version of VIAME. This should be a link to the latest Ubuntu Desktop (18/20) binaries from viame.kitware.com (Mirror 1) |
DIVE_USERNAME | null | Required for scenario 3. Username to start private queue processor |
DIVE_PASSWORD | null | Required for scenario 3. Password for private queue processor |
WORKER_CONCURRENCY | 2 | Optional for scenario 3. Max concurrent jobs. Change this to 1 if you run training |
DIVE_API_URL | https://viame.kitware.com/api/v1 | Optional for scenario 3. Remote URL to authenticate against. |
KWIVER_DEFAULT_LOG_LEVEL | warn | Optional for scenario 3. kwiver log level |
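To make the relationship between scenarios and variables concrete, here is a minimal sketch that builds the `--extra-vars` argument for one scenario and refuses to build it for scenario 3 when the required credentials are missing. It uses the JSON form that `--extra-vars` accepts alongside `key=value` pairs, which avoids quoting problems with passwords. The variable names are taken from the table above and the `inventory` file name from this guide. The playbook file name `playbook.yml` and the helper names are assumptions for illustration. The code only prints a command and does not run Ansible.

```python
import json
import shlex

# Scenario number -> the flag from the table that enables it.
SCENARIO_FLAGS = {1: "run_server", 2: "run_viame_cli", 3: "run_worker_container"}


def extra_vars(scenario: int, **settings) -> str:
    """Return a JSON string suitable for ansible-playbook --extra-vars."""
    if scenario not in SCENARIO_FLAGS:
        raise ValueError(f"unknown scenario: {scenario}")
    values = {SCENARIO_FLAGS[scenario]: "yes", **settings}
    if scenario == 3:
        # The table marks these two as required for the standalone worker.
        missing = [k for k in ("DIVE_USERNAME", "DIVE_PASSWORD") if not values.get(k)]
        if missing:
            raise ValueError(f"scenario 3 requires: {', '.join(missing)}")
    return json.dumps(values)


def ansible_command(scenario: int, playbook: str = "playbook.yml",
                    inventory: str = "inventory", **settings) -> str:
    """Build the command line; the playbook name is a placeholder."""
    return shlex.join([
        "ansible-playbook", "-i", inventory,
        "--extra-vars", extra_vars(scenario, **settings),
        playbook,
    ])


if __name__ == "__main__":
    print(ansible_command(2))
```

Only one of the three scenario flags is set per run, matching the table's one-flag-per-scenario layout.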
Run Ansible
This step assumes the inventory file was created by Terraform above.
Once provisioning is complete, jobs should begin processing from the job queue. You can check viame.kitware.com/#/jobs to see queue progress and logs.
Tip
This Ansible playbook can be run from any Ubuntu 18.04+ host against any Ubuntu 18.04+ target. To run it locally, use the inventory.local file instead. If you already have the NVIDIA drivers or Docker installed, you can comment out the corresponding lines in the playbook.
If you run locally, you'll need to restart the machine and run the playbook a second time. The playbook does this automatically for remote provisioning, but it cannot restart the machine when provisioning localhost.
Tip
You may need to run through the Docker post-install guide if you have permissions errors when trying to run docker.
Check that it worked
You can enable your private queue on the jobs page and begin running jobs.
Next Steps
- Scenario 1: Proceed to Docker Compose Deployment.
- Scenario 2: Setup is complete. Proceed to the VIAME Documentation.
- Scenario 3: Setup is complete. Make sure your private queue is enabled.
Troubleshooting
- Ansible provisioning is idempotent. If it fails, run it again once or twice.
- You may need to change the global GPUS_ALL_REGIONS quota in IAM -> Quotas.
- NVIDIA drivers may not install correctly the first time. Try installing manually using ubuntu-drivers.
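Because provisioning is idempotent, the "run it again once or twice" advice can be scripted. The following is a small sketch of that idea, not part of the project's tooling: the helper name `run_with_retries` and the default of three attempts are assumptions. It reruns a command until it exits with status 0 and gives up after the attempt limit.

```python
import subprocess
import sys


def run_with_retries(command: list, attempts: int = 3) -> int:
    """Run command until it exits 0; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}",
              file=sys.stderr)
    raise RuntimeError(f"command still failing after {attempts} attempts")
```

You would pass it the same `ansible-playbook` argument list you normally run, for example `run_with_retries(["ansible-playbook", "-i", "inventory", "playbook.yml"])`, where `playbook.yml` stands in for the actual playbook path. If the command still fails after the last attempt, read the playbook output before retrying further.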