## Batch Size 32 Compile true

| Experiment | Warmup Latency (s) | Average Latency (s) | Throughput (samples/sec) | GPU Utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 13.559 +/- 0.183 | 4.756 +/- 0.960 | 401.554 +/- 58.539 | 43.026 +/- 1.221 |
| h2d_d2h_threads | 12.471 +/- 0.819 | 5.596 +/- 1.180 | 340.906 +/- 69.513 | 32.313 +/- 8.138 |
