Jesse Gross eccd4dd8d2 runner.go: Use correct JSON field names for runners
The fields for inference parameters are very similar between the
Ollama API and Ollama/runners. However, some of the names are
slightly different. For these fields (such as NumKeep and
NumPredict), the values from Ollama were never read properly and
defaults were always used.

In the future, we can share a single interface rather than duplicating
structs. However, this keeps the interface consistent with minimal
changes in Ollama as long as we continue to use server.cpp.
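The failure mode described in this commit message can be reproduced in a few lines. The sketch below is illustrative only: the struct names, the JSON tag names (`n_keep`, `num_keep`, and so on), and the default values are assumptions chosen for the demonstration, not copied from runner.go. It shows that `encoding/json` silently ignores keys it cannot match to a field, so a receiver with mismatched tags keeps whatever defaults it started with.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// senderOptions is a hypothetical sender-side struct.
// The tag names are assumptions made for this example.
type senderOptions struct {
	NumKeep    int `json:"n_keep"`
	NumPredict int `json:"n_predict"`
}

// mismatchedOptions is a hypothetical receiver whose tags
// do not match the keys the sender emits.
type mismatchedOptions struct {
	NumKeep    int `json:"num_keep"`
	NumPredict int `json:"num_predict"`
}

// decodeMismatched starts from assumed defaults and unmarshals into
// the mismatched struct. Unknown keys are ignored without an error,
// so the defaults survive.
func decodeMismatched(payload []byte) (mismatchedOptions, error) {
	opts := mismatchedOptions{NumKeep: 4, NumPredict: -1}
	err := json.Unmarshal(payload, &opts)
	return opts, err
}

// decodeMatched does the same with tags that agree with the sender,
// so the defaults are overwritten by the sent values.
func decodeMatched(payload []byte) (senderOptions, error) {
	opts := senderOptions{NumKeep: 4, NumPredict: -1}
	err := json.Unmarshal(payload, &opts)
	return opts, err
}

func main() {
	payload, err := json.Marshal(senderOptions{NumKeep: 16, NumPredict: 128})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}

	bad, _ := decodeMismatched(payload)
	good, _ := decodeMatched(payload)

	fmt.Printf("mismatched tags: %+v\n", bad)
	fmt.Printf("matching tags:   %+v\n", good)
}
```

Because no error is returned in the mismatched case, this kind of bug is easy to miss; in a test, `json.Decoder` with `DisallowUnknownFields` is one way to surface it.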
2024-09-03 21:15:14 -04:00

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings

TODO

  • Parallelization
  • More tests