
Our default behavior today is to try to fit the model into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs, even when the model fits on one. This change exposes that behavior as a tunable, as sketched below.
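
A minimal sketch of how such a tunable could be read and applied in the scheduler. The environment variable name (OLLAMA_SCHED_SPREAD) and the helper names below are illustrative assumptions, not the exact implementation in this change.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// schedSpread reports whether the user asked to always spread a model
// across all available GPUs, even when it would fit on a single one.
// NOTE: the variable name is an assumption for this sketch.
func schedSpread() bool {
	v, err := strconv.ParseBool(os.Getenv("OLLAMA_SCHED_SPREAD"))
	return err == nil && v
}

// pickGPUs chooses which GPUs to load the model onto.
// gpuFreeVRAM maps GPU IDs to free VRAM in bytes; modelSize is the
// estimated VRAM the model needs. Hypothetical helper for illustration.
func pickGPUs(gpuFreeVRAM map[string]uint64, modelSize uint64) []string {
	if !schedSpread() {
		// Default: prefer a single GPU that can hold the whole model.
		for id, free := range gpuFreeVRAM {
			if free >= modelSize {
				return []string{id}
			}
		}
	}
	// Spread requested (or no single GPU fits): use every GPU.
	ids := make([]string, 0, len(gpuFreeVRAM))
	for id := range gpuFreeVRAM {
		ids = append(ids, id)
	}
	return ids
}

func main() {
	gpus := map[string]uint64{"GPU-0": 24 << 30, "GPU-1": 24 << 30}
	// With the spread variable unset, an 8 GiB model lands on one GPU;
	// with it set to 1, it is spread across both.
	fmt.Println(pickGPUs(gpus, 8<<30))
}
```

In practice a user would opt into the old behavior by setting the environment variable (e.g. `OLLAMA_SCHED_SPREAD=1`, under the naming assumption above) before starting the server, while leaving it unset keeps the single-GPU-first default.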