## Batch Size 128 Compile false

| Experiment | Warmup_latency (s) | Average_latency (s) | Throughput (samples/sec) | GPU Utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 5.355 +/- 0.293 | 14.172 +/- 0.267 | 518.884 +/- 7.854 | 56.108 +/- 0.862 |
| h2d_d2h_threads | 3.810 +/- 0.319 | 14.146 +/- 1.145 | 551.909 +/- 34.079 | 52.057 +/- 4.121 |
| 2_predict_workers | 3.639 +/- 0.037 | 11.161 +/- 0.160 | 636.701 +/- 14.753 | 53.279 +/- 2.659 |
| 3_predict_workers | 4.930 +/- 0.060 | 10.532 +/- 0.801 | 677.736 +/- 25.115 | 53.806 +/- 1.324 |
| 4_predict_workers | 3.819 +/- 0.253 | 11.451 +/- 0.439 | 638.146 +/- 22.611 | 50.129 +/- 1.764 |
