176 Commits

Author SHA1 Message Date
jmorganca
f1f54c5bd5 fix README.md 2024-09-03 21:15:13 -04:00
jmorganca
18662d1180 consistent whitespace 2024-09-03 21:15:13 -04:00
jmorganca
083a9e9b4e link metal 2024-09-03 21:15:13 -04:00
jmorganca
d0703eaf44 wip 2024-09-03 21:15:13 -04:00
jmorganca
ce00e387c3 wip meta 2024-09-03 21:15:13 -04:00
jmorganca
763d7b601c sync 2024-09-03 21:15:13 -04:00
jmorganca
4d0e6c55b0 remove perl docs 2024-09-03 21:15:13 -04:00
jmorganca
3375b82c56 remove build scripts 2024-09-03 21:15:13 -04:00
jmorganca
b8c1065ab6 remove need for perl 2024-09-03 21:15:13 -04:00
jmorganca
a632a04426 fix output 2024-09-03 21:15:13 -04:00
jmorganca
110f37ffb0 arch build 2024-09-03 21:15:13 -04:00
jmorganca
f2f03ff7f2 add temporary makefile 2024-09-03 21:15:13 -04:00
jmorganca
ba0ff1c46a fix cuda and rocm builds 2024-09-03 21:15:13 -04:00
jmorganca
9966a055e5 fix cgo flags for darwin amd64 2024-09-03 21:15:13 -04:00
jmorganca
7aa7a3c1e5 remove -fPIC from build_hipblas.sh 2024-09-03 21:15:13 -04:00
jmorganca
de634b7fd7 fix issues with runner 2024-09-03 21:15:13 -04:00
jmorganca
795753be7e move sync script back in for now 2024-09-03 21:15:13 -04:00
jmorganca
0eed68fed4 llama: sync 2024-09-03 21:15:13 -04:00
jmorganca
783134a3bb update to d5c938cd 2024-09-03 21:15:13 -04:00
jmorganca
74a158a79e add patches 2024-09-03 21:15:13 -04:00
jmorganca
8f79a2e86a cleanup stop code 2024-09-03 21:15:13 -04:00
jmorganca
a4d402c403 fix example 2024-09-03 21:15:13 -04:00
jmorganca
e1dfc757b3 revert llm changes 2024-09-03 21:15:13 -04:00
jmorganca
7d0a452938 num predict 2024-09-03 21:15:13 -04:00
jmorganca
43efc893d7 basic progress 2024-09-03 21:15:13 -04:00
jmorganca
20afaae020 add more runner params 2024-09-03 21:15:13 -04:00
jmorganca
72f3fe4b94 truncate stop properly 2024-09-03 21:15:13 -04:00
jmorganca
a379d68aa9 wip stop tokens 2024-09-03 21:15:13 -04:00
jmorganca
b2ef3bf490 embeddings 2024-09-03 21:15:12 -04:00
jmorganca
ce15ed6d69 remove dependency on llm 2024-09-03 21:15:12 -04:00
jmorganca
c0b94376b2 grammar 2024-09-03 21:15:12 -04:00
jmorganca
72be8e27c4 sampling 2024-09-03 21:15:12 -04:00
jmorganca
d12db0568e better example module, add port 2024-09-03 21:15:12 -04:00
jmorganca
ec17359a68 wip 2024-09-03 21:15:12 -04:00
jmorganca
fbc8572859 add llava to runner 2024-09-03 21:15:12 -04:00
jmorganca
87af27dac0 fix output in build_hipblas.sh 2024-09-03 21:15:12 -04:00
jmorganca
54f391309f mods to build_hipblas.sh for linux 2024-09-03 21:15:12 -04:00
jmorganca
28bedcd807 wip 2024-09-03 21:15:12 -04:00
jmorganca
922d0acbdb improve cuda and hipblas build scripts 2024-09-03 21:15:12 -04:00
jmorganca
b22d78720e cuda linux 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
905568a47f Update README.md 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
a15ac52fbe Update README.md 2024-09-03 21:15:12 -04:00
jmorganca
9547aa53ff disable log file 2024-09-03 21:15:12 -04:00
jmorganca
e29205ad6d fix readme for llava 2024-09-03 21:15:12 -04:00
jmorganca
a8f91d3cc1 add llava 2024-09-03 21:15:12 -04:00
jmorganca
a9884ae136 llama: add clip dependencies 2024-09-03 21:15:12 -04:00
jmorganca
e37651cca0 add clip and parallel requests to the todo list 2024-09-03 21:15:12 -04:00
jmorganca
593d6836ab fix cuda build 2024-09-03 21:15:12 -04:00
jmorganca
533a7e7d50 fix build on windows 2024-09-03 21:15:12 -04:00
jmorganca
0873d28b16 fix ggml-metal.m build constraints 2024-09-03 21:15:12 -04:00
jmorganca
bb795faa6c fix ggml-metal.m 2024-09-03 21:15:12 -04:00
jmorganca
e86db9381a avx2 should only add avx2 2024-09-03 21:15:12 -04:00
jmorganca
4a5633e4bc fix sync script 2024-09-03 21:15:12 -04:00
jmorganca
86f453252b fix ggml-metal.m 2024-09-03 21:15:12 -04:00
jmorganca
dfd8f34806 fix ggml-metal.m 2024-09-03 21:15:12 -04:00
jmorganca
beb847b40f add license headers 2024-09-03 21:15:12 -04:00
jmorganca
785f76d390 pre-patch 2024-09-03 21:15:12 -04:00
jmorganca
9fe48978a8 move runner package down 2024-09-03 21:15:12 -04:00
jmorganca
01ccbc07fe replace static build in llm 2024-09-03 21:15:12 -04:00
jmorganca
ec09be97e8 fix build 2024-09-03 21:15:12 -04:00
jmorganca
6129f30479 wip... 2024-09-03 21:15:12 -04:00
jmorganca
eb1aa97961 rename server to runner 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
5e921e06ac Update README.md 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
02089baf70 Update README.md 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
870e91be76 Update README.md 2024-09-03 21:15:12 -04:00
Jeffrey Morgan
7ecc8e86c4 Update README.md 2024-09-03 21:15:12 -04:00
jmorganca
b1696e308e Add missing hipcc flags 2024-09-03 21:15:12 -04:00
jmorganca
0110994d06 Initial llama Go module 2024-09-03 21:15:12 -04:00
jmorganca
2ef3a217d1 add sync of llama.cpp 2024-09-03 21:15:12 -04:00
Michael Yang
fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00
Bruce MacDonald
984c9c628c fix embeddings invalid values 2023-08-09 16:50:53 -04:00
Bruce MacDonald
09d8bf6730 fix build errors 2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd embed text document in modelfile 2023-08-09 10:26:19 -04:00
Michael Yang
f2074ed4c0 Merge pull request #306 from jmorganca/default-keep-system
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Jeffrey Morgan
5eb712f962 trim whitespace before checking stop conditions
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd automatically set num_keep if num_keep < 0
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the left of the system instructions
2023-08-07 16:19:12 -07:00
Michael Yang
b9f4d67554 configurable rope frequency parameters 2023-08-03 22:11:58 -07:00
Michael Yang
c5bcf32823 update llama.cpp 2023-08-03 11:50:24 -07:00
Michael Yang
0e79e52ddd override ggml-metal if the file is different 2023-08-02 12:50:30 -07:00
Michael Yang
74a5f7e698 no gpu for 70B model 2023-08-01 17:12:50 -07:00
Michael Yang
7a1c3e62dc update llama.cpp 2023-08-01 16:54:01 -07:00
Michael Yang
319f078dd9 remove -Werror
there are compile warnings on Linux which -Werror elevates to errors,
preventing compile
2023-07-31 21:45:56 -07:00
Jeffrey Morgan
7da249fcc1 only build metal for darwin,arm target 2023-07-31 21:35:23 -04:00
Bruce MacDonald
184ad8f057 allow specifying stop conditions in modelfile 2023-07-28 11:02:04 -04:00
Jeffrey Morgan
dffc8b6e09 update llama.cpp to d91f3f0 2023-07-28 08:07:48 -04:00
Michael Yang
3549676678 embed ggml-metal.metal 2023-07-27 17:23:29 -07:00
Michael Yang
fadf75f99d add stop conditions 2023-07-27 17:00:47 -07:00
Michael Yang
ad3a7d0e2c add NumGQA 2023-07-27 14:05:11 -07:00
Michael Yang
18ffeeec45 update llama.cpp 2023-07-27 14:05:11 -07:00
Michael Yang
cca61181cb sample metrics 2023-07-27 09:31:44 -07:00
Michael Yang
c490416189 lock on llm.lock(); decrease batch size 2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760 add session expiration 2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc update predict code 2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb session id 2023-07-27 09:31:44 -07:00
Michael Yang
726bc647b2 enable k quants 2023-07-25 08:39:58 -07:00
Michael Yang
cb55fa9270 enable accelerate 2023-07-24 17:14:45 -07:00
Michael Yang
b71c67b6ba allocate a large enough tokens slice 2023-07-21 23:05:15 -07:00
Michael Yang
8526e1f5f1 add llama.cpp mpi, opencl files 2023-07-20 14:19:55 -07:00
Michael Yang
a83eaa7a9f update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc 2023-07-20 11:55:56 -07:00