ollama

mirror of https://github.com/tcsenpai/ollama.git synced 2025-07-24 02:00:19 +00:00

Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)

* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc

2024-01-10 19:08:51 -05:00
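The commit notes above describe reserving VRAM overhead before model layers are placed. A minimal Go sketch of one possible reading follows: reserve the larger of 10% of the free memory summed across all devices and a fixed floor per device. The function name `usableVRAM`, the constant `minOverheadPerGPU`, and its 1 GiB value are hypothetical and chosen for illustration; they are not taken from `gpu.go`, and the actual commit may combine the 10% figure and the minimum differently.

```go
package main

import "fmt"

// minOverheadPerGPU is a hypothetical per-device floor used only for
// illustration; the real value in the commit is not shown in this listing.
const minOverheadPerGPU uint64 = 1 << 30 // 1 GiB

// usableVRAM returns how many bytes remain for model layers after
// reserving overhead. freeBytes is the free memory summed over all
// devices. The reserved overhead is the larger of 10% of freeBytes
// and deviceCount * minOverheadPerGPU (an assumed policy).
func usableVRAM(freeBytes uint64, deviceCount int) uint64 {
	if deviceCount <= 0 {
		return 0
	}
	overhead := freeBytes / 10
	if floor := uint64(deviceCount) * minOverheadPerGPU; overhead < floor {
		overhead = floor
	}
	if overhead >= freeBytes {
		// Nothing is left once the overhead is reserved.
		return 0
	}
	return freeBytes - overhead
}

func main() {
	const gib = uint64(1) << 30
	// Single small GPU: the per-device floor exceeds 10% of free memory.
	fmt.Println(usableVRAM(8*gib, 1) / gib)
	// Two larger GPUs: 10% of the combined free memory exceeds the floor.
	fmt.Println(usableVRAM(40*gib, 2) / gib)
}
```

Scaling the floor by the device count is what keeps a multi-GPU system from reserving only a single device's worth of headroom, which appears to be the problem the "fix multi gpu overhead" bullet refers to.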

gpu_darwin.go

calculate overhead based number of gpu devices (#1875)

2024-01-09 15:53:33 -05:00

gpu_info_cpu.c

calculate overhead based number of gpu devices (#1875)

2024-01-09 15:53:33 -05:00

gpu_info_cuda.c

Harden GPU mgmt library lookup

2024-01-10 15:06:41 -08:00

gpu_info_cuda.h

Harden GPU mgmt library lookup

2024-01-10 15:06:41 -08:00

gpu_info_rocm.c

Harden GPU mgmt library lookup

2024-01-10 15:06:41 -08:00

gpu_info_rocm.h

Harden GPU mgmt library lookup

2024-01-10 15:06:41 -08:00

gpu_info.h

calculate overhead based number of gpu devices (#1875)

2024-01-09 15:53:33 -05:00

gpu_test.go

calculate overhead based number of gpu devices (#1875)

2024-01-09 15:53:33 -05:00

gpu.go

Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)

2024-01-10 19:08:51 -05:00

types.go

calculate overhead based number of gpu devices (#1875)

2024-01-09 15:53:33 -05:00