Daniel Hiltgen 46c5f5fd9e Runtime selection of new or old runners
This adjusts the new runners to comingle with existing runners so we can use an
env var to toggle the new runners on.
2024-08-01 09:06:01 -07:00

runner

Note: this is a work in progress

A minimal runner that loads a model and serves inference requests over an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings

TODO

  • Parallelization
  • More tests