Performance analysis

This document describes the performance you can expect from a ParaViewWeb setup compared with a similar configuration that uses ParaView's client/server architecture and its Qt client.
The tests were performed on the following setup:

Client - New Mexico

MacBook Pro (15-inch, 2016) - macOS High Sierra (10.13.3)
Processor: 2.9 GHz Intel Core i7
Memory: 16 GB 2133 MHz LPDDR3
GPU: Radeon Pro 460 4 GB

Server - Amazon EC2 - N Virginia

Amazon EC2 - g2.2xlarge ($0.650/hour) - US East (N Virginia)
ParaView 5.5 / EGL build

Speed test results

Ping: 22 ms
Download: 170 Mbps
Upload: 22 Mbps

Idle resource usage

Process	MacBookPro Memory	MacBookPro Real Memory
ParaView Qt client	344.2 MB	540.8 MB
pvserver	22.2 MB	57.1 MB
pvpython	13.0 MB	40.6 MB
pvpython + pv lib (1)	83.6 MB	147.4 MB
visualizer (server) (2)	123.4 MB	195.5 MB

Real Memory: Total Memory currently consumed by an application (including Virtual pages)
Memory: Memory used in RAM

Interactive resource usage

The test was run on the EC2 server, using top to monitor the resources consumed by ParaView.
The loaded dataset was the Lidar one, which reports 228 MB in the Information panel.
The filter was a Clip, which created a new dataset reported as 181.3 MB in the Information panel.

ParaView - Visualizer on EC2	CPU	Memory
Idle	1%	const (6.5%)
Interacting 30 FPS	265%	const (6.5%)
Apply a filter	100%	const + filter data (10.2%)

Loading cost analysis

This section focuses only on the initial cost of starting a given process.

Process	MacBookPro	EC2
ParaView Qt client	~ 2.4s
pvserver	< 1s	< 1s
pvpython + pv lib (1)	~ 1s	~ 1s
visualizer (server) (2)	~ 3.4s
visualizer (server+client)		~ 4s

(1) Starting the Python interpreter and loading the ParaView library.
(2) Starting the Python interpreter, loading the ParaView library, and starting the web server.

Loading data then adds to those numbers:

Loaded data type	Load time
1 MB exodus file	+ 0.4s
4 MB data + state	+ 1.5s
2.25 GB data + extract surface	+ 34.5s
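
To make the additive reading of the two tables above concrete, here is a minimal sketch that sums a start-up cost and each load cost. It assumes the ~4 s visualizer (server+client) figure measured on EC2 as the baseline and that the load times simply add to it; the names `STARTUP_SECONDS`, `LOAD_SECONDS`, and `time_to_loaded` are illustrative and not part of ParaView.

```python
# Start-up cost taken from the table above (visualizer server+client on EC2).
# Assumption: the load times below add linearly on top of this baseline.
STARTUP_SECONDS = 4.0

# Additional load times reported above, in seconds.
LOAD_SECONDS = {
    "1 MB exodus file": 0.4,
    "4 MB data + state": 1.5,
    "2.25 GB data + extract surface": 34.5,
}


def time_to_loaded(startup, load):
    """Approximate seconds from process launch until the data is loaded."""
    return round(startup + load, 1)


totals = {name: time_to_loaded(STARTUP_SECONDS, load)
          for name, load in LOAD_SECONDS.items()}

for name, total in totals.items():
    print(f"{name}: ~{total} s")
```

For small files the start-up cost dominates, while for the 2.25 GB case the data load accounts for most of the wait.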

Rendering performance with GPU

For the rendering performance tests, we loaded the same 4.8-million-point cloud dataset in each tool and interacted with it. The reported frame rates were measured while interacting and leave the final still render out of the picture. In either case, that last render does not affect how the tool's performance is perceived.

Running on localhost

Image resolution	ParaView Qt client	ParaView* (client/server)	ParaView* - Visualizer
1280 x 720	+ 600 fps	27/22/21/4 fps	30 fps
1920 x 1080	+ 600 fps	16/12/11/4 fps	30 fps

Compression modes* (the slash-separated frame rates above are listed in this row order):

ParaView* (client/server)	ParaView* - Visualizer
No compression (~ BMP)	50% JPEG / Ratio 1 - Default
LZ4 (default settings)	25% JPEG / Ratio 1
Squirt (default settings)	50% JPEG / Ratio 0.5
zlib (default settings)	25% JPEG / Ratio 0.25

Running on EC2

Image resolution	ParaView* (client/server)	ParaView - Visualizer
1280 x 720	3/3/3/2 fps	30/30/30/30 fps
1920 x 1080	3/1/2/2 fps	23/23/30/30 fps

Compression modes* (the slash-separated frame rates above are listed in this row order):

ParaView* (client/server)	ParaView* - Visualizer	Web image 1280x720	Web image 1920x1080
No compression (~ BMP)	50% JPEG / Ratio 1 - Default	45.6 KB vs 298.2 KB	83.5 KB vs 563.6 KB
LZ4 (default settings)	25% JPEG / Ratio 1	31.0 KB vs 295.4 KB	56.9 KB vs 562.0 KB
Squirt (default settings)	50% JPEG / Ratio 0.5	17.0 KB vs 293.7 KB	29.9 KB vs 560.8 KB
zlib (default settings)	25% JPEG / Ratio 0.25	4.6 KB vs 293.9 KB	7.82 KB vs 560.7 KB

Note:

ParaViewWeb targets 30 FPS, hence the constant 30 FPS value.
After raising the server's target FPS, I reached approximately 45 FPS with a 1280x720 image and a JPEG quality of 50% (ratio 1, i.e. the full image resolution).
After lowering the quality of the transferred image further, I reached 60 FPS, which was the target frame rate set on the server side.
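
As a rough check of how much bandwidth these frame rates imply, the sketch below multiplies the per-frame image size by the frame rate. It assumes that the first size in each "Web image" cell (45.6 KB for 1280x720 at 50% JPEG, ratio 1) is the image sent to the browser for every frame, and that 1 KB is 1000 bytes; `required_mbps` is a hypothetical helper, not a ParaViewWeb function.

```python
def required_mbps(image_kb, fps):
    """Sustained bandwidth in megabits per second needed to stream frames.

    Assumes 1 KB = 1000 bytes and one image of `image_kb` per frame.
    """
    bits_per_frame = image_kb * 1000 * 8
    return bits_per_frame * fps / 1_000_000


DOWNLOAD_MBPS = 170  # client download speed from the speed test above

for fps in (30, 45):
    mbps = required_mbps(45.6, fps)
    share = mbps / DOWNLOAD_MBPS
    print(f"1280x720 at {fps} FPS: {mbps:.1f} Mbps ({share:.0%} of the download speed)")
```

Under these assumptions the default 1280x720 stream needs roughly 11 Mbps at 30 FPS, well below the measured download speed, which suggests the link itself was not the limiting factor in this test.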

Rendering performance with CPU

For the software rendering performance analysis, we compare the llvm and OpenSWR backends across a variety of datasets and hardware. We then tune some parameters to see how they affect interactive rendering performance.
All tests were done with ParaView Visualizer.

Datasets

Name	Size	Purpose
disk_out_ref.ex2	700 KB / 7x10^3 Cells	Small dataset not stressing rendering
lidar.vtp	61 MB / 5x10^6 Cells	Point cloud dataset with a decent number of points
Enclosure.vtm	260 MB / 3x10^6 Cells	Surface mesh with a decent number of triangles

Hardware

MacBook Pro (15-inch, 2016) - macOS High Sierra (10.13.3)
Processor: 2.9 GHz Intel Core i7
Memory: 16 GB 2133 MHz LPDDR3
GPU: Radeon Pro 460 4 GB

Dell Precision Tower 7910 - Ubuntu 16.04 LTS
Processor: Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz × 32
Memory: 128 GB
Graphics: Quadro K2200/PCIe/SSE2

Amazon EC2 - US East (N Virginia)
ParaView 5.5 / OSMesa build
c5.2xlarge ($0.34/hour) | c5.4xlarge ($0.68/hour) | c5.9xlarge ($1.53/hour)
r4.2xlarge ($0.532/hour) | r4.4xlarge ($1.064/hour)

Rendering with osmesa-llvm

Docker command line used to run the following tests

docker run                           \
    -v /test-data:/data               \
    -e "SERVER_NAME=localhost:8081"    \
    -e "PROTOCOL=ws"                    \
    -e "EXTRA_PVPYTHON_ARGS=--mesa-llvm" \
    -p 0.0.0.0:8081:80                    \
    -ti kitware/paraviewweb:pvw-visualizer-osmesa-5.5.0
open http://localhost:8081

On local network

Hardware	Image resolution	disk_out_ref.ex2	lidar.vtp	Enclosure.vtm
MacBook Pro	1280 x 720	20/23/20/20/20/20 fps	1/1.2/1.2/1.1/0.9/0.8 fps	1/1/0.9/1.1/1/1 fps
Dell 7910	1280 x 720	26/26/20/20/20/20 fps	0.5/0.5/0.5/0.4/0.4/0.4 fps	0.5/0.5/0.5/0.5/0.6/0.6 fps

On EC2 with DSL network

Hardware	Image resolution	disk_out_ref.ex2	lidar.vtp	Enclosure.vtm
c5.2xlarge	1280 x 720	30/30/20/20/20/20 fps	1.3/1.3/1.3/1.2/0.9/0.9 fps	1.2/1.2/1.2/1.3/1.4/1.3 fps
c5.4xlarge	1280 x 720	20/30/20/20/20/20 fps	1.3/1.3/1.3/1.2/0.9/0.9 fps	1.2/1.2/1.2/1.3/1.4/1.4 fps
c5.9xlarge	1280 x 720	30/30/20/20/20/20 fps	1.3/1.3/1.4/1.2/0.9/0.9 fps	1.3/1.2/1.2/1.3/1.4/1.4 fps
r4.2xlarge	1280 x 720	26/30/20/20/20/20 fps	0.8/1.0/1.0/0.8/0.6/0.6 fps	0.9/0.9/0.9/1.0/1.1/1.1 fps
r4.4xlarge	1280 x 720	30/30/20/20/20/20 fps	0.8/1.0/1.0/0.7/0.6/0.6 fps	0.9/0.9/0.9/0.9/1.0/1.0 fps

Rendering configuration settings

Setting name	a	b	c	d	e	f
Max interactive server FPS	30	30	20	20	20	20
Interactive image quality (JPEG)	50	50	80	80	80	80
Interactive image ratio	1	1	1	0.5	0.1	0.1
Mouse event throttling per second	60	60	40	40	40	40
Use FXAA	1	0	0	0	0	1

Rendering with osmesa-swr

Docker command line used to run the following tests

docker run                          \
    -v /test-data:/data              \
    -p 0.0.0.0:8081:80                \
    -e "SERVER_NAME=localhost:8081"    \
    -e "PROTOCOL=ws"                    \
    -e "EXTRA_PVPYTHON_ARGS=--mesa-swr"  \
    -ti kitware/paraviewweb:pvw-visualizer-osmesa-5.5.0
open http://localhost:8081
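
The llvm and swr command lines above differ only in the value passed through EXTRA_PVPYTHON_ARGS. A minimal sketch of a helper that builds either command is shown below; the function name `docker_command` and its `data_dir` and `port` parameters are illustrative, while the image name, environment variables, and flags are the ones used in this document.

```python
import shlex


def docker_command(backend, data_dir="/test-data", port=8081):
    """Build the docker argv used in this document for an OSMesa backend.

    `backend` must be "llvm" or "swr", matching the --mesa-llvm and
    --mesa-swr arguments shown in the command lines above.
    """
    if backend not in ("llvm", "swr"):
        raise ValueError(f"unsupported backend: {backend!r}")
    return [
        "docker", "run",
        "-v", f"{data_dir}:/data",
        "-p", f"0.0.0.0:{port}:80",
        "-e", f"SERVER_NAME=localhost:{port}",
        "-e", "PROTOCOL=ws",
        "-e", f"EXTRA_PVPYTHON_ARGS=--mesa-{backend}",
        "-ti", "kitware/paraviewweb:pvw-visualizer-osmesa-5.5.0",
    ]


# Print a copy-pasteable command line for each backend.
for backend in ("llvm", "swr"):
    print(shlex.join(docker_command(backend)))
```

Building the argument list once keeps the two test runs identical apart from the backend, so differences in frame rate can be attributed to the renderer.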

On local network

Hardware	Image resolution	disk_out_ref.ex2	lidar.vtp	Enclosure.vtm
MacBook Pro	1280 x 720	12/28/20/20/20 fps	0.8/0.9/0.9/0.4/0.3/0.2 fps	2.2/2.6/3/5/12/11 fps
Dell 7910	1280 x 720	30/30/20/20/20 fps	2.6/3.1/3.1/1.6/0.3/0.3 fps	8/9/10/15/20/20 fps

On EC2 with DSL network

Hardware	Image resolution	disk_out_ref.ex2	lidar.vtp	Enclosure.vtm
c5.2xlarge	1280 x 720	15/28/20/20/20/20 fps	0.9/0.9/0.9/0.5/0.4/0.3 fps	3/4/4/5/14/14 fps
c5.4xlarge	1280 x 720	30/30/20/20/20/20 fps	1.8/2.0/2.1/1.0/0.4/0.4 fps	6/7/7/12/20/20 fps
c5.9xlarge	1280 x 720	30/30/20/20/20/20 fps	4.6/5.0/5.0/2.0/0.4/0.3 fps	13/15/15/20/20/20 fps
r4.2xlarge	1280 x 720	11/30/20/20/20/20 fps	0.6/0.7/0.7/0.3/0.3/0.2 fps	2/2/2/4/10/10 fps
r4.4xlarge	1280 x 720	22/30/20/20/20/20 fps	1.2/1.6/1.6/0.8/0.3/0.3 fps	4/5/5/9/20/20 fps

Rendering configuration settings

Setting name	a	b	c	d	e	f
Max interactive server FPS	30	30	20	20	20	20
Interactive image quality (JPEG)	50	50	80	80	80	80
Interactive image ratio	1	1	1	0.5	0.1	0.1
Mouse event throttling per second	60	60	40	40	40	40
Use FXAA	1	0	0	0	0	1

Performance comments

With software rendering, the frame rate is noticeably affected by the resolution of the image that must be generated. That is why reducing the image ratio while interacting greatly increases the frame rate, which is not the case for GPU rendering.
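
To illustrate how much work a lower image ratio removes from a software renderer, the sketch below computes the number of pixels rendered at each ratio used in the configuration table. It assumes the interactive image ratio scales both the width and the height of the image; `interactive_size` is a hypothetical helper, not a ParaView setting.

```python
def interactive_size(width, height, ratio):
    """Scale both image dimensions by the interactive image ratio.

    Assumption: the ratio applies per axis, so the pixel count scales
    with the square of the ratio.
    """
    return round(width * ratio), round(height * ratio)


FULL = (1280, 720)
full_pixels = FULL[0] * FULL[1]

# Ratios used by configurations a-c (1), d (0.5), and e-f (0.1).
for ratio in (1, 0.5, 0.1):
    w, h = interactive_size(FULL[0], FULL[1], ratio)
    share = (w * h) / full_pixels
    print(f"ratio {ratio}: {w} x {h} = {w * h} pixels ({share:.0%} of full resolution)")
```

Under that assumption, a ratio of 0.5 leaves a quarter of the pixels to rasterize and a ratio of 0.1 leaves one percent, which is consistent with the higher Enclosure.vtm frame rates measured for configurations d to f with osmesa-swr.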

WebGL download time

Name	Size	Purpose
disk_out_ref.ex2	700 KB / 7x10^3 Cells	Small dataset not stressing rendering
lidar.vtp	61 MB / 5x10^6 Cells	Point cloud dataset with a decent number of points
Enclosure.vtm	260 MB / 3x10^6 Cells	Surface mesh with a decent number of triangles

Download time

Dataset	Initial render	Clip half	FPS on MacBookPro
disk_out_ref.ex2	209 ms	116 ms	60/60 fps
lidar.vtp	47963 ms	31197 ms	30/60 fps
Enclosure.vtm	37123 ms	22841 ms	60/60 fps
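
One way to read the table above is to compare the download time of the clipped geometry with that of the initial render. The sketch below computes that fraction for each dataset; it is a derived figure, not a measurement from the original tests, and the names `DOWNLOAD_MS` and `clip_fraction` are illustrative.

```python
# Download times from the table above, in milliseconds:
# (initial render, clip half).
DOWNLOAD_MS = {
    "disk_out_ref.ex2": (209, 116),
    "lidar.vtp": (47963, 31197),
    "Enclosure.vtm": (37123, 22841),
}


def clip_fraction(initial_ms, clip_ms):
    """Fraction of the initial download time taken by the clipped geometry."""
    return clip_ms / initial_ms


fractions = {name: clip_fraction(initial, clip)
             for name, (initial, clip) in DOWNLOAD_MS.items()}

for name, fraction in fractions.items():
    print(f"{name}: clip takes {fraction:.0%} of the initial download time")
```

For all three datasets the clipped geometry takes more than half of the initial download time, so halving the data does not halve the wait.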