Unify prompt config for user and system (#151)

* fix list number in readmes

* fix list number in readmes

* unify prompt config for role user and system

* update json sample file

* update documents and add test

* update readmes
Conan 2023-03-12 01:48:24 -05:00 committed by GitHub
parent 1c3fe7e55d
commit ae3e3ba558
11 changed files with 140 additions and 46 deletions

@@ -43,6 +43,7 @@ jobs:
run: |
python3 make_book.py --book_name "test_books/lemo.epub" --test --test_num 5 --language zh-hans
python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --model gpt3 --prompt prompt_template_sample.txt
python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --prompt prompt_template_sample.json
- name: Rename and Upload ePub

@@ -17,7 +17,7 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users
1. `pip install -r requirements.txt`
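2. Use `--openai_key` to specify the OpenAI API key. If you have multiple keys, separate them with ASCII commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
Alternatively, set the environment variable `OPENAI_API_KEY` to skip this option.
Alternatively, set the environment variable `BBM_OPENAI_API_KEY` to skip this option.
3. A sample book, `test_books/animal_farm.epub`, is included locally for testing.
4. The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` to use the GPT-3 model instead.
5. If you haven't paid for the API, add the `--test` option to preview the result first (there is a rate limit, so it is a little slow).
@@ -31,13 +31,17 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users
10. Use the `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mount point.
11. If your network blocks the API and you need to replace api_base with Cloudflare Workers, use `--api_base ${url}`.
**Note: the API you enter here should look like '`https://xxxx/v1`', and the domain must be wrapped in quotation marks.**
11. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
12. If an error occurs, or you interrupt the command with `CTRL+C` and don't want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it to whatever you want.
13. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: when possible, look for a more standards-compliant e-book.**
14. To tweak the prompt, use the `--prompt` parameter. It can be a prompt template string or the path to a template `.txt` file. Valid placeholders are `{text}` and `{language}`.
15. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
16. If an error occurs, or you interrupt the command with `CTRL+C` and don't want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it to whatever you want.
17. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: when possible, look for a more standards-compliant e-book.**
12. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
13. If an error occurs, or you interrupt the command with `CTRL+C` and don't want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it to whatever you want.
14. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: when possible, look for a more standards-compliant e-book.**
15. To tweak the prompt, use the `--prompt` parameter. Valid placeholders are `{text}` and `{language}`. You can configure the prompt in the following ways:
If you don't need to set the `system` role, use `--prompt "Translate {text} to {language}"` or `--prompt prompt_template_sample.txt` (a sample text file is at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
If you need to set the `system` role, use `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (a sample JSON file is at [./prompt_template_sample.json](./prompt_template_sample.json)).
You can also configure the `system` and `user` role prompts with the following environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
The parameter can be a prompt template string or the path to a template `.txt` file.
16. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
17. If an error occurs, or you interrupt the command with `CTRL+C` and don't want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it to whatever you want.
18. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: when possible, look for a more standards-compliant e-book.**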
e.g.
```shell

@@ -21,7 +21,7 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u
1. `pip install -r requirements.txt`
2. Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
Or, just set the environment variable `OPENAI_API_KEY` to skip this option.
Or, just set the environment variable `BBM_OPENAI_API_KEY` instead.
3. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
4. The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which ChatGPT currently uses. Use `--model gpt3` to change the underlying model to `GPT3`.
5. Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
@@ -35,14 +35,16 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u
10. Use the `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mount point.
11. If you want to change api_base, for example to use Cloudflare Workers, use `--api_base <URL>`.
**Note: the API URL should look like '`https://xxxx/v1`'. Quotation marks are required.**
11. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
12. If an error occurs, or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any name you like.
13. To translate strings in an e-book that aren't labeled with any tags, use the `--allow_navigable_strings` parameter. This adds those strings to the translation queue. **Note that it's best to look for more standardized e-books if possible.**
14. To tweak the prompt, use the `--prompt` parameter. The parameter can be a prompt template string or a path to a template `.txt` file. Valid placeholders for the template are `{text}` and `{language}`.
15. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
16. If an error occurs, or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any name you like.
17. To translate strings in an e-book that aren't labeled with any tags, use the `--allow_navigable_strings` parameter. This adds those strings to the translation queue. **Note that it's best to look for more standardized e-books if possible.**
12. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
13. If an error occurs, or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any name you like.
14. To translate strings in an e-book that aren't labeled with any tags, use the `--allow_navigable_strings` parameter. This adds those strings to the translation queue. **Note that it's best to look for more standardized e-books if possible.**
15. To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template are `{text}` and `{language}`. The prompt can be configured in a few ways:
If you don't need to set the `system` role content, simply use `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (an example text file is at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
If you need to set the `system` role content, use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (an example JSON file is at [./prompt_template_sample.json](./prompt_template_sample.json)).
You can also set the `user` and `system` role prompts with the environment variables `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
16. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
17. If an error occurs, or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any name you like.
18. To translate strings in an e-book that aren't labeled with any tags, use the `--allow_navigable_strings` parameter. This adds those strings to the translation queue. **Note that it's best to look for more standardized e-books if possible.**
### Examples
@@ -65,7 +67,10 @@ python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags di
# Tweaking the prompt
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.json
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"
# Translate books download from Rakuten Kobo on kobo e-reader
python3 make_book.py --book_from kobo --device_path /tmp/kobo
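Step 2 of the README, together with the `cli.py` change in this commit, fixes where the OpenAI key comes from: `--openai_key` first, then the legacy `OPENAI_API_KEY`, then the new `BBM_OPENAI_API_KEY`, with several keys allowed as a comma-separated string. The sketch below restates that lookup order so it can be checked in isolation. `resolve_openai_keys` is a hypothetical helper, not part of bilingual_book_maker; returning a list is an assumption (the committed `ChatGPTAPI` only does `len(key.split(","))`, and the body of its `rotate_key` is not shown here), and it raises `ValueError` where the committed code raises a plain `Exception`.

```python
import os
from typing import List, Mapping, Optional


def resolve_openai_keys(
    cli_key: Optional[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Hypothetical helper: choose the key string with the same precedence as
    the cli.py change, then split comma-separated keys into a list."""
    raw = (
        cli_key
        or env.get("OPENAI_API_KEY")  # legacy name, kept for backward compatibility
        or env.get("BBM_OPENAI_API_KEY")  # new BBM_-prefixed name
    )
    if not raw:
        raise ValueError("OpenAI API key not provided")
    # Strip whitespace and skip empty entries such as a trailing comma.
    return [k.strip() for k in raw.split(",") if k.strip()]


print(resolve_openai_keys(None, {"BBM_OPENAI_API_KEY": "sk-a,sk-b"}))
# ['sk-a', 'sk-b']
```

Because the legacy `OPENAI_API_KEY` is checked before `BBM_OPENAI_API_KEY`, a stale value in the old variable shadows the new one until the legacy name is removed.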

@@ -1,6 +1,7 @@
import argparse
import os
from os import environ as env
import json
from book_maker.loader import BOOK_LOADER_DICT
from book_maker.translator import MODEL_DICT
@@ -12,16 +13,42 @@ def parse_prompt_arg(prompt_arg):
prompt = None
if prompt_arg is None:
return prompt
if not prompt_arg.endswith(".txt"):
prompt = prompt_arg
if not any(prompt_arg.endswith(ext) for ext in [".json", ".txt"]):
try:
# user can define prompt by passing a json string
# eg: --prompt '{"system": "You are a professional translator who translates computer technology books", "user": "Translate `{text}` to {language}"}'
prompt = json.loads(prompt_arg)
except json.JSONDecodeError:
# if not a json string, treat it as a template string
prompt = {"user": prompt_arg}
else:
if os.path.exists(prompt_arg):
with open(prompt_arg, "r") as f:
prompt = f.read()
if prompt_arg.endswith(".txt"):
# if it's a txt file, treat it as a template string
with open(prompt_arg, "r") as f:
prompt = {"user": f.read()}
elif prompt_arg.endswith(".json"):
# if it's a json file, treat it as a json object
# eg: --prompt prompt_template_sample.json
with open(prompt_arg, "r") as f:
prompt = json.load(f)
else:
raise FileNotFoundError(f"{prompt_arg} not found")
if prompt is None or not (all(c in prompt for c in ["{text}", "{language}"])):
if prompt is None or not (
all(c in prompt["user"] for c in ["{text}", "{language}"])
):
raise ValueError("prompt must contain `{text}` and `{language}`")
if "user" not in prompt:
raise ValueError("prompt must contain the key of `user`")
if (prompt.keys() - {"user", "system"}) != set():
raise ValueError("prompt can only contain the keys of `user` and `system`")
print("prompt config:", prompt)
return prompt
@@ -124,9 +151,9 @@ def main():
)
parser.add_argument(
"--prompt",
dest="prompt_template",
dest="prompt_arg",
type=str,
metavar="PROMPT_TEMPLATE",
metavar="PROMPT_ARG",
help="used for customizing the prompt. It can be the prompt template string, or a path to the template file. The valid placeholders are `{text}` and `{language}`.",
)
@@ -139,7 +166,15 @@ def main():
translate_model = MODEL_DICT.get(options.model)
assert translate_model is not None, "unsupported model"
if options.model in ["gpt3", "chatgptapi"]:
OPENAI_API_KEY = options.openai_key or env.get("OPENAI_API_KEY")
OPENAI_API_KEY = (
options.openai_key
or env.get(
"OPENAI_API_KEY"
) # XXX: for backward compatibility, deprecate soon
or env.get(
"BBM_OPENAI_API_KEY"
) # suggest adding `BBM_` prefix for all the bilingual_book_maker ENVs.
)
if not OPENAI_API_KEY:
raise Exception(
"OpenAI API key not provided, please google how to obtain it"
@@ -183,7 +218,7 @@ def main():
test_num=options.test_num,
translate_tags=options.translate_tags,
allow_navigable_strings=options.allow_navigable_strings,
prompt_template=parse_prompt_arg(options.prompt_template),
prompt_config=parse_prompt_arg(options.prompt_arg),
)
e.make_bilingual_book()
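The new `parse_prompt_arg` normalizes every accepted `--prompt` form (inline template string, inline JSON, `.txt` file, `.json` file) into a dict with a required `user` key and an optional `system` key. Below is a standalone sketch of that normalization under the same rules, not the committed code. One deliberate difference: it checks the value is a dict with a `user` key and only allowed keys *before* reading `prompt["user"]`; in the diff the placeholder check indexes `prompt["user"]` first, so a JSON config without `user` would surface as a `KeyError` rather than the intended `ValueError`. The name `normalize_prompt_arg` is hypothetical, and reading files as UTF-8 is an assumption.

```python
import json
import os
from typing import Dict, Optional

REQUIRED_PLACEHOLDERS = ("{text}", "{language}")
ALLOWED_KEYS = {"user", "system"}


def normalize_prompt_arg(prompt_arg: Optional[str]) -> Optional[Dict[str, str]]:
    """Hypothetical re-statement of the --prompt parsing rules."""
    if prompt_arg is None:
        return None
    if prompt_arg.endswith((".txt", ".json")):
        if not os.path.exists(prompt_arg):
            raise FileNotFoundError(f"{prompt_arg} not found")
        with open(prompt_arg, "r", encoding="utf-8") as f:  # assumption: UTF-8
            if prompt_arg.endswith(".txt"):
                prompt = {"user": f.read()}  # whole file is the user template
            else:
                prompt = json.load(f)  # expected: {"user": ..., "system": ...}
    else:
        try:
            prompt = json.loads(prompt_arg)  # inline JSON config
        except json.JSONDecodeError:
            prompt = {"user": prompt_arg}  # plain template string
    # Validate the structure before touching prompt["user"].
    if not isinstance(prompt, dict) or "user" not in prompt:
        raise ValueError("prompt must contain the key of `user`")
    if prompt.keys() - ALLOWED_KEYS:
        raise ValueError("prompt can only contain the keys of `user` and `system`")
    if not all(p in prompt["user"] for p in REQUIRED_PLACEHOLDERS):
        raise ValueError("prompt must contain `{text}` and `{language}`")
    return prompt


print(normalize_prompt_arg("Translate {text} to {language}"))
# {'user': 'Translate {text} to {language}'}
```

A plain template string never parses as JSON (it is not quoted), so the `JSONDecodeError` fallback is what routes it to the `user` role.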

@@ -10,6 +10,7 @@ from rich import print
from tqdm import tqdm
from .base_loader import BaseBookLoader
from book_maker.utils import prompt_config_to_kwargs
class EPUBBookLoader(BaseBookLoader):
@@ -25,12 +26,15 @@ class EPUBBookLoader(BaseBookLoader):
test_num=5,
translate_tags="p",
allow_navigable_strings=False,
prompt_template=None,
prompt_config=None,
):
self.epub_name = epub_name
self.new_epub = epub.EpubBook()
self.translate_model = model(
key, language, model_api_base, prompt_template=prompt_template
key,
language,
api_base=model_api_base,
**prompt_config_to_kwargs(prompt_config),
)
self.is_test = is_test
self.test_num = test_num

@@ -2,6 +2,7 @@ import sys
from pathlib import Path
from .base_loader import BaseBookLoader
from book_maker.utils import prompt_config_to_kwargs
class TXTBookLoader(BaseBookLoader):
@@ -17,10 +18,15 @@ class TXTBookLoader(BaseBookLoader):
model_api_base=None,
is_test=False,
test_num=5,
prompt_template=None,
prompt_config=None,
):
self.txt_name = txt_name
self.translate_model = model(key, language, model_api_base)
self.translate_model = model(
key,
language,
api_base=model_api_base,
**prompt_config_to_kwargs(prompt_config),
)
self.is_test = is_test
self.p_to_save = []
self.bilingual_result = []

@@ -6,15 +6,39 @@ from os import environ
from .base_translator import Base
PROMPT_ENV_MAP = {
"user": "BBM_CHATGPTAPI_USER_MSG_TEMPLATE",
"system": "BBM_CHATGPTAPI_SYS_MSG",
}
class ChatGPTAPI(Base):
def __init__(self, key, language, api_base=None, prompt_template=None):
DEFAULT_PROMPT = "Please help me to translate,`{text}` to {language}, please return only translated content not include the origin text"
def __init__(
self,
key,
language,
api_base=None,
prompt_template=None,
prompt_sys_msg=None,
**kwargs,
):
super().__init__(key, language)
self.key_len = len(key.split(","))
if api_base:
openai.api_base = api_base
self.prompt_template = (
prompt_template
or "Please help me to translate,`{text}` to {language}, please return only translated content not include the origin text"
or environ.get(PROMPT_ENV_MAP["user"])
or self.DEFAULT_PROMPT
)
self.prompt_sys_msg = (
prompt_sys_msg
or environ.get(
"OPENAI_API_SYS_MSG"
) # XXX: for backward compatibility, deprecate soon
or environ.get(PROMPT_ENV_MAP["system"])
)
def rotate_key(self):
@@ -22,20 +46,23 @@ class ChatGPTAPI(Base):
def get_translation(self, text):
self.rotate_key()
messages = []
if self.prompt_sys_msg:
messages.append(
{"role": "system", "content": self.prompt_sys_msg},
)
messages.append(
{
"role": "user",
"content": self.prompt_template.format(
text=text, language=self.language
),
}
)
completion = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{
"role": "system",
"content": environ.get("OPENAI_API_SYS_MSG") or "",
},
{
"role": "user",
"content": self.prompt_template.format(
text=text, language=self.language
),
},
],
messages=messages,
)
t_text = (
completion["choices"][0]
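In the new `ChatGPTAPI`, each role's prompt is resolved with a fixed precedence: the constructor argument, then an environment variable, then (for `user` only) `DEFAULT_PROMPT`. The `system` message also honors the legacy `OPENAI_API_SYS_MSG` before `BBM_CHATGPTAPI_SYS_MSG`, and it is appended only when non-empty. The sketch below reproduces that resolution and the resulting `messages` list without calling OpenAI. `build_messages` and its `env` parameter are hypothetical, introduced so the logic can run without environment side effects or a network call.

```python
import os
from typing import Dict, List, Mapping, Optional

# Same environment variable names as PROMPT_ENV_MAP in the diff.
PROMPT_ENV_MAP = {
    "user": "BBM_CHATGPTAPI_USER_MSG_TEMPLATE",
    "system": "BBM_CHATGPTAPI_SYS_MSG",
}
DEFAULT_PROMPT = (
    "Please help me to translate,`{text}` to {language}, please return only "
    "translated content not include the origin text"
)


def build_messages(
    text: str,
    language: str,
    prompt_template: Optional[str] = None,
    prompt_sys_msg: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[Dict[str, str]]:
    """Hypothetical helper mirroring how ChatGPTAPI assembles `messages`."""
    template = prompt_template or env.get(PROMPT_ENV_MAP["user"]) or DEFAULT_PROMPT
    sys_msg = (
        prompt_sys_msg
        or env.get("OPENAI_API_SYS_MSG")  # legacy name, checked first
        or env.get(PROMPT_ENV_MAP["system"])
    )
    messages = []
    if sys_msg:  # the system message is omitted entirely when empty
        messages.append({"role": "system", "content": sys_msg})
    messages.append(
        {"role": "user", "content": template.format(text=text, language=language)}
    )
    return messages


print(build_messages("Hello", "German", "Translate {text} to {language}", env={}))
# [{'role': 'user', 'content': 'Translate Hello to German'}]
```

Compared with the removed lines, an empty system prompt is now left out of the request instead of being sent as a `system` message with empty content.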

@@ -8,7 +8,7 @@ class Google(Base):
google translate
"""
def __init__(self, key, language, api_base=None, prompt_template=None):
def __init__(self, key, language, **kwargs):
super().__init__(key, language)
self.api_url = "https://translate.google.com/translate_a/single?client=it&dt=qca&dt=t&dt=rmt&dt=bd&dt=rms&dt=sos&dt=md&dt=gt&dt=ld&dt=ss&dt=ex&otf=2&dj=1&hl=en&ie=UTF-8&oe=UTF-8&sl=auto&tl=zh-CN"
self.headers = {

@@ -5,7 +5,7 @@ from .base_translator import Base
class GPT3(Base):
def __init__(self, key, language, api_base=None, prompt_template=None):
def __init__(self, key, language, api_base=None, prompt_template=None, **kwargs):
super().__init__(key, language)
self.api_url = (
f"{api_base}v1/completions"

@@ -117,3 +117,11 @@ TO_LANGUAGE_CODE = {
"sinhalese": "si",
"castilian": "es",
}
def prompt_config_to_kwargs(prompt_config):
prompt_config = prompt_config or {}
return dict(
prompt_template=prompt_config.get("user", None),
prompt_sys_msg=prompt_config.get("system", None),
)
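`prompt_config_to_kwargs` turns the `{"user", "system"}` config into `prompt_template` / `prompt_sys_msg` keyword arguments, and both loaders now splat it into whichever model class was selected. That only works because every translator accepts extra keywords: `Google` now takes `**kwargs`, and `GPT3` adds `**kwargs` next to `prompt_template`. The sketch below copies `prompt_config_to_kwargs` from the diff and uses two hypothetical stand-in classes (not the real translators) to show the pattern.

```python
from typing import Any, Dict, Optional


def prompt_config_to_kwargs(prompt_config: Optional[Dict[str, str]]) -> Dict[str, Any]:
    # Same mapping as prompt_config_to_kwargs in book_maker.utils above.
    prompt_config = prompt_config or {}
    return dict(
        prompt_template=prompt_config.get("user", None),
        prompt_sys_msg=prompt_config.get("system", None),
    )


class FakeChatTranslator:
    """Hypothetical stand-in for a translator that uses both prompts."""

    def __init__(self, key, language, api_base=None, prompt_template=None,
                 prompt_sys_msg=None, **kwargs):
        self.prompt_template = prompt_template
        self.prompt_sys_msg = prompt_sys_msg


class FakeMachineTranslator:
    """Hypothetical stand-in for a translator (like Google) with no prompt."""

    def __init__(self, key, language, **kwargs):
        # Record which keyword arguments were accepted and ignored.
        self.ignored = sorted(kwargs)


def build_model(model, key, language, api_base=None, prompt_config=None):
    # Same call shape as the EPUB and TXT loaders after this commit.
    return model(key, language, api_base=api_base,
                 **prompt_config_to_kwargs(prompt_config))


chat = build_model(FakeChatTranslator, "sk-x", "ja",
                   prompt_config={"user": "{text} -> {language}", "system": "S"})
machine = build_model(FakeMachineTranslator, "sk-x", "ja")
print(chat.prompt_sys_msg)  # S
print(machine.ignored)  # ['api_base', 'prompt_sys_msg', 'prompt_template']
```

The trade-off of `**kwargs` is that a misspelled keyword is silently swallowed instead of raising `TypeError`, so the loaders and `prompt_config_to_kwargs` have to agree on the exact names.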

@@ -0,0 +1,4 @@
{
"system": "You are a professional translator.",
"user": "Translate the given text to {language}. Be faithful or accurate in translation. Make the translation readable or intelligible. Be elegant or natural in translation. If the text cannot be translated, return the original text as is. Do not translate person's name. Do not add any additional text in the translation. The text to be translated is:\n{text}"
}
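The sample `prompt_template_sample.json` is read with `json.load`, and its `user` value is later filled in by `ChatGPTAPI.get_translation` with Python's `str.format`. Two consequences follow: the JSON `\n` becomes a real newline before `{text}`, and any literal brace in a custom template has to be doubled (`{{` / `}}`) or `format` will fail. A quick check, with the sample's `user` text shortened for brevity (keys and placeholders unchanged):

```python
import json

# Shortened version of prompt_template_sample.json (same keys and placeholders).
SAMPLE = r'''
{
  "system": "You are a professional translator.",
  "user": "Translate the given text to {language}. The text to be translated is:\n{text}"
}
'''

config = json.loads(SAMPLE)
user_msg = config["user"].format(text="Hello, world.", language="French")
print(user_msg)  # two lines: the instruction, then "Hello, world."

# A literal brace must be escaped as {{ }} to survive str.format.
escaped = "Keep {{braces}} and translate {text} to {language}"
print(escaped.format(text="hi", language="Spanish"))
# Keep {braces} and translate hi to Spanish

try:
    "Keep {braces} {text}".format(text="hi", language="x")
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'braces'
```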