feat: support deepl

yihong0618 2023-03-13 23:09:30 +08:00
parent 20b4d59b70
commit ea3c3b2f95
10 changed files with 180 additions and 57 deletions

View File

@ -11,6 +11,7 @@ env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
BBM_CAIYUN_API_KEY: ${{ secrets.BBM_CAIYUN_API_KEY }}
BBM_DEEPL_API_KEY: ${{ secrets.BBM_DEEPL_API_KEY }}
jobs:
testing:
@ -51,8 +52,13 @@ jobs:
run: |
python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model caiyun
- name: make deepl translator test
if: env.BBM_DEEPL_API_KEY != null
run: |
python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model deepl
- name: make openai key ebook test
if: env.OPENAI_API_KEY != null
run: |
python3 make_book.py --book_name "test_books/lemo.epub" --test --test_num 5 --language zh-hans
python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --model gpt3 --prompt prompt_template_sample.txt

View File

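@ -15,34 +15,39 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users create
## Use
1. `pip install -r requirements.txt` or `pip install -U bbook_maker`
2. Use `--openai_key` to specify the OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API rate limits.
- `pip install -r requirements.txt` or `pip install -U bbook_maker`
- Use `--openai_key` to specify the OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API rate limits.
Or set the environment variable `BMM_OPENAI_API_KEY` to skip this option.
3. A sample book, `test_books/animal_farm.epub`, is included for testing.
4. The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` to switch to the gpt3 model.
5. Use `--test` to preview the result if you have not paid for the service (there is a rate limit, so it is somewhat slow).
6. Use `--language` to set the target language, for example `--language "Simplified Chinese"`. The default is `"Simplified Chinese"`.
- A sample book, `test_books/animal_farm.epub`, is included for testing.
- The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` to switch to the gpt3 model.
- Translation through a DeepL API wrapper is available as a paid service. Get a token from [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator) and use `--model deepl --deepl_key ${deepl_key}`.
- Use Google for translation with `--model google`.
- Use Caiyun for translation with `--model caiyun --caiyun_key ${caiyun_key}`.
- Use `--test` to preview the result if you have not paid for the service (there is a rate limit, so it is somewhat slow).
- Use `--language` to set the target language, for example `--language "Simplified Chinese"`. The default is `"Simplified Chinese"`.
See the help message for the available target languages: `python make_book.py --help`
7. Use `--proxy` so that users in mainland China can go through a proxy when testing locally. Pass a string such as `http://127.0.0.1:7890`.
8. Use `--resume` to continue after a manual interruption.
9. An epub is made of html files. By default, only the contents of `<p>` are translated.
- Use `--proxy` so that users in mainland China can go through a proxy when testing locally. Pass a string such as `http://127.0.0.1:7890`.
- Use `--resume` to continue after a manual interruption.
- An epub is made of html files. By default, only the contents of `<p>` are translated.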
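Use `--translate-tags` to specify the tags to translate, separating multiple tags with commas. For example:
`--translate-tags h1,h2,h3,p,div`
10. Use `--book_from` to specify the e-reader type (only `kobo` is available for now) and `--device_path` to specify the mount point.
11. If you are blocked by the firewall and need to replace api_base with Cloudflare Workers, use `--api_base ${url}`.
- Use `--book_from` to specify the e-reader type (only `kobo` is available for now) and `--device_path` to specify the mount point.
- If you are blocked by the firewall and need to replace api_base with Cloudflare Workers, use `--api_base ${url}`.
**Note that the api url you enter should look like '`https://xxxx/v1`', and it must be wrapped in quotation marks.**
12. When the translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
13. If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated. Rename it to whatever you like.
14. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed e-book instead.**
15. To adjust the prompt, use the `--prompt` parameter. Valid placeholders include `{text}` and `{language}`. The prompt can be configured in the following ways:
- When the translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
- If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated. Rename it to whatever you like.
- To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed e-book instead.**
- To adjust the prompt, use the `--prompt` parameter. Valid placeholders include `{text}` and `{language}`. The prompt can be configured in the following ways:
If you do not need to set the `system` role, use `--prompt "Translate {text} to {language}"` or `--prompt prompt_template_sample.txt` (a sample text file is at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
If you need to set the `system` role, use `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (a sample JSON file is at [./prompt_template_sample.json](./prompt_template_sample.json)).
You can also configure the `system` and `user` role prompts with the environment variables `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
This parameter can be a prompt template string or the path to a template `.txt` file.
16. When the translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
17. If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated. Rename it to whatever you like.
18. To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed e-book instead.**
19. Use the `--batch_size` parameter to set the number of lines translated per batch (default 10, currently only applies to txt files).
- When the translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
- If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated. Rename it to whatever you like.
- To translate untagged strings in an e-book, use the `--allow_navigable_strings` parameter, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed e-book instead.**
- Use the `--batch_size` parameter to set the number of lines translated per batch (default 10, currently only applies to txt files).
### Examples
**If you installed with `pip install bbook_maker`, every command below can be written as `bbook args`.**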
@ -60,6 +65,10 @@ export OPENAI_API_KEY=${your_api_key}
# Or use the gpt3 model
python3 make_book.py --book_name test_books/animal_farm.epub --model gpt3 --language ja
# Use the DeepL model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja
# Translate contents in <div> and <p>
python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p
@ -78,7 +87,7 @@ python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch
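# Translate with Caiyun Xiaoyi (the Caiyun api currently supports only Simplified Chinese <-> English and Simplified Chinese <-> Japanese)
# Caiyun provides a test token: 3975l6lr5pcbvidl6jl2
# Follow this tutorial to apply for your own token (https://bobtranslate.com/service/translate/caiyun.html)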
python3 make_book.py --model caiyun --openai_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
python3 make_book.py --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
# You can set BBM_CAIYUN_API_KEY in the environment to skip --caiyun_key
export BBM_CAIYUN_API_KEY=${your_api_key}
@ -96,8 +105,6 @@ python make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_b
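1. The free trial API token is rate limited. For more speed, consider a paid plan
2. PRs are welcome
3. In particular, results will be much better once batch translate is finished
4. The DeepL model will be updated later
# Thanks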

View File

@ -15,33 +15,34 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u
## Use
1. `pip install -r requirements.txt` or `pip install -U bbook_maker`
2. Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
- `pip install -r requirements.txt` or `pip install -U bbook_maker`
- Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
Or, just set environment variable `BMM_OPENAI_API_KEY` instead.
3. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
4. The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt3` to change the underlying model to `GPT3`
5. Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
6. Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
- A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
- The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt3` to change the underlying model to `GPT3`
- DeepL is supported through the paid [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator) API. Get a token there, then use `--model deepl --deepl_key ${deepl_key}`.
- Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
- Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
Read available languages by helper message: `python make_book.py --help`
7. Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
8. Use `--resume` option to manually resume the process after an interruption.
9. epub is made of html files. By default, we only translate contents in `<p>`.
- Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
- Use `--resume` option to manually resume the process after an interruption.
- epub is made of html files. By default, we only translate contents in `<p>`.
Use `--translate-tags` to specify the tags to translate. Use commas to separate multiple tags. For example:
`--translate-tags h1,h2,h3,p,div`
10. Use `--book_from` option to specify e-reader type (Now only `kobo` is available), and use `--device_path` to specify the mounting point.
11. If you want to change api_base like using Cloudflare Workers, use `--api_base <URL>` to support it.
- Use `--book_from` option to specify e-reader type (Now only `kobo` is available), and use `--device_path` to specify the mounting point.
- If you want to change api_base like using Cloudflare Workers, use `--api_base <URL>` to support it.
**Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**
12. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` would be generated.
13. If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` would be generated. You can simply rename it to any desired name.
14. If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
15. To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
- Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` would be generated.
- If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` would be generated. You can simply rename it to any desired name.
- If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
- To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (example of a text file can be found at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (example of a JSON file can be found at [./prompt_template_sample.json](./prompt_template_sample.json)).
You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
16. Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` would be generated.
17. If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` would be generated. You can simply rename it to any desired name.
18. If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
19. Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).
- Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` would be generated.
- If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` would be generated. You can simply rename it to any desired name.
- If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
- Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).
### Examples
@ -60,6 +61,10 @@ export OPENAI_API_KEY=${your_api_key}
# Use the GPT-3 model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model gpt3 --language ja
# Use the DeepL model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja
# Translate contents in <div> and <p>
python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p
@ -135,7 +140,6 @@ docker run --rm --name bilingual_book_maker --mount type=bind,source=/home/user/
1. API token from free trial has limit. If you want to speed up the process, consider paying for the service or use multiple OpenAI tokens
2. PR is welcome
3. The DeepL model will be updated later.
# Thanks

View File

@ -52,6 +52,7 @@ def parse_prompt_arg(prompt_arg):
def main():
translate_model_list = list(MODEL_DICT.keys())
parser = argparse.ArgumentParser()
parser.add_argument(
"--book_name",
@ -73,6 +74,7 @@ def main():
type=str,
help="Path of e-reader device",
)
########## KEYS ##########
parser.add_argument(
"--openai_key",
dest="openai_key",
@ -81,6 +83,19 @@ def main():
help="OpenAI api key,if you have more than one key, please use comma"
" to split them to go beyond the rate limits",
)
parser.add_argument(
"--caiyun_key",
dest="caiyun_key",
type=str,
help="you can apply caiyun key from here (https://dashboard.caiyunapp.com/user/sign_in/)",
)
parser.add_argument(
"--deepl_key",
dest="deepl_key",
type=str,
help="you can apply deepl key from here (https://rapidapi.com/splintPRO/api/deepl-translator)",
)
parser.add_argument(
"--test",
dest="test",
@ -100,7 +115,7 @@ def main():
dest="model",
type=str,
default="chatgptapi",
choices=["chatgptapi", "gpt3", "google", "caiyun"], # support DeepL later
choices=translate_model_list,
metavar="MODEL",
help="model to use, available: {%(choices)s}",
)
@ -162,12 +177,6 @@ def main():
default=10,
help="how many lines will be translated by aggregated translation(This options currently only applies to txt files)",
)
parser.add_argument(
"--caiyun_key",
dest="caiyun_key",
type=str,
help="you can apply caiyun key from here (https://dashboard.caiyunapp.com/user/sign_in/)",
)
options = parser.parse_args()
PROXY = options.proxy
@ -196,6 +205,10 @@ def main():
API_KEY = options.caiyun_key or env.get("BBM_CAIYUN_API_KEY")
if not API_KEY:
raise Exception("Please provide caiyun key")
elif options.model == "deepl":
API_KEY = options.deepl_key or env.get("BBM_DEEPL_API_KEY")
if not API_KEY:
raise Exception("Please provide deepl key")
else:
API_KEY = ""
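The key-selection branches above share one pattern: take the command-line value, fall back to an environment variable, and fail if neither is set. A minimal sketch of that pattern follows. The helper name `resolve_api_key` and its injectable `env` parameter are hypothetical and not part of this commit; they only illustrate the fallback order.

```python
import os


def resolve_api_key(cli_value, env_name, provider, env=os.environ):
    """Return the CLI value if given, else the environment variable.

    `resolve_api_key` is a hypothetical helper for illustration only.
    """
    key = cli_value or env.get(env_name)
    if not key:
        # Neither the flag nor the environment variable supplied a key.
        raise ValueError(f"Please provide {provider} key")
    return key


# The CLI value wins over the environment variable.
print(resolve_api_key("from-cli", "BBM_DEEPL_API_KEY", "deepl",
                      env={"BBM_DEEPL_API_KEY": "from-env"}))
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.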

View File

@ -5,6 +5,7 @@ from copy import copy
from pathlib import Path
from bs4 import BeautifulSoup as bs
from bs4.element import NavigableString
from ebooklib import ITEM_DOCUMENT, epub
from rich import print
from tqdm import tqdm
@ -114,8 +115,12 @@ class EPUBBookLoader(BaseBookLoader):
if self.resume and index < p_to_save_len:
new_p.string = self.p_to_save[index]
else:
new_p.string = self.translate_model.translate(p.text)
self.p_to_save.append(new_p.text)
if type(p) == NavigableString:
new_p = self.translate_model.translate(p.text)
self.p_to_save.append(new_p)
else:
new_p.string = self.translate_model.translate(p.text)
self.p_to_save.append(new_p.text)
p.insert_after(new_p)
index += 1
if index % 20 == 0:
@ -164,7 +169,10 @@ class EPUBBookLoader(BaseBookLoader):
# PR welcome here
if index < p_to_save_len:
new_p = copy(p)
new_p.string = self.p_to_save[index]
if type(p) == NavigableString:
new_p = self.p_to_save[index]
else:
new_p.string = self.p_to_save[index]
p.insert_after(new_p)
index += 1
else:

View File

@ -71,7 +71,11 @@ class TXTBookLoader(BaseBookLoader):
if self.resume and index < p_to_save_len:
pass
else:
temp = self.translate_model.translate(batch_text)
try:
temp = self.translate_model.translate(batch_text)
except Exception as e:
print(str(e))
raise Exception("Something went wrong during translation") from e
self.p_to_save.append(temp)
self.bilingual_result.append(batch_text)
self.bilingual_result.append(temp)

View File

@ -2,11 +2,13 @@ from book_maker.translator.chatgptapi_translator import ChatGPTAPI
from book_maker.translator.google_translator import Google
from book_maker.translator.gpt3_translator import GPT3
from book_maker.translator.caiyun_translator import Caiyun
from book_maker.translator.deepl_translator import DeepL
MODEL_DICT = {
"chatgptapi": ChatGPTAPI,
"gpt3": GPT3,
"google": Google,
"caiyun": Caiyun
"caiyun": Caiyun,
"deepl": DeepL,
# add more here
}
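Because the `--model` choices are now taken from the keys of `MODEL_DICT`, registering a translator in the dictionary is enough to expose it on the command line. The sketch below shows that registry pattern in isolation. `REGISTRY`, `EchoTranslator`, and `UpperTranslator` are hypothetical stand-ins, not classes from this repository.

```python
import argparse


class EchoTranslator:
    """Hypothetical translator that returns the text unchanged."""

    def __init__(self, key, language):
        self.key = key
        self.language = language

    def translate(self, text):
        return text


class UpperTranslator(EchoTranslator):
    """Hypothetical translator that upper-cases the text."""

    def translate(self, text):
        return text.upper()


# Adding an entry here is the only step needed to offer a new model.
REGISTRY = {"echo": EchoTranslator, "upper": UpperTranslator}


def make_translator(argv):
    parser = argparse.ArgumentParser()
    # The accepted values come from the registry keys, so they never drift apart.
    parser.add_argument("--model", default="echo", choices=list(REGISTRY.keys()))
    options = parser.parse_args(argv)
    return REGISTRY[options.model](key="", language="ja")


print(make_translator(["--model", "upper"]).translate("hello"))
```

The design choice is that the dictionary is the single source of truth: argparse rejects any model name that has no class behind it.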

View File

@ -1,5 +1,83 @@
import json
import time

import requests

from book_maker.utils import TO_LANGUAGE_CODE, LANGUAGES
from .base_translator import Base


class DeepL(Base):
    """
    DeepL translator
    """

    def __init__(self, key, language, **kwargs):
        super().__init__(key, language)
        self.api_url = "https://deepl-translator.p.rapidapi.com/translate"
        self.headers = {
            "content-type": "application/json",
            "X-RapidAPI-Key": "",
            "X-RapidAPI-Host": "deepl-translator.p.rapidapi.com",
        }
        # Accept either a language code or a language name.
        l = None
        if language in LANGUAGES:
            l = language
        else:
            l = TO_LANGUAGE_CODE.get(language)
        if l not in [
            "bg",
            "zh",
            "cs",
            "da",
            "nl",
            "en-US",
            "en-GB",
            "et",
            "fi",
            "fr",
            "de",
            "el",
            "hu",
            "id",
            "it",
            "ja",
            "lv",
            "lt",
            "pl",
            "pt-PT",
            "pt-BR",
            "ro",
            "ru",
            "sk",
            "sl",
            "es",
            "sv",
            "tr",
            "uk",
            "ko",
            "nb",
        ]:
            raise Exception(f"DeepL does not support {l}")
        self.language = l

    def rotate_key(self):
        # Use the next key from the key cycle for this request.
        self.headers["X-RapidAPI-Key"] = f"{next(self.keys)}"

    def translate(self, text):
        self.rotate_key()
        print(text)
        payload = {"text": text, "source": "EN", "target": self.language}
        try:
            response = requests.request(
                "POST", self.api_url, data=json.dumps(payload), headers=self.headers
            )
        except Exception as e:
            # Wait and retry once if the first request fails.
            print(str(e))
            time.sleep(30)
            response = requests.request(
                "POST", self.api_url, data=json.dumps(payload), headers=self.headers
            )
        t_text = response.json().get("text", "")
        print(t_text)
        return t_text
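The `translate` method above retries exactly once after a fixed 30-second sleep, and the second attempt is unguarded. One possible alternative is a bounded retry loop, sketched below under stated assumptions: `call_with_retry` and its parameters are hypothetical names, and the request is passed in as a plain callable so the sketch does not depend on `requests`.

```python
import time


def call_with_retry(send, attempts=3, delay=30.0, sleep=time.sleep):
    """Call `send()` up to `attempts` times, sleeping `delay` seconds between tries.

    Hypothetical helper for illustration; not part of this commit.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as e:
            last_error = e
            print(str(e))
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error


# Usage with a stand-in callable that always succeeds.
print(call_with_retry(lambda: "translated text", attempts=2, delay=0))
```

In the translator, `send` could wrap the existing `requests.request(...)` call. The loop re-raises the last error instead of letting a second failure escape from inside the `except` block.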

View File

@ -2,6 +2,7 @@
LANGUAGES = {
"en": "english",
"zh-hans": "simplified chinese",
"zh": "simplified chinese",
"zh-hant": "traditional chinese",
"de": "german",
"es": "spanish",

View File

@ -4,7 +4,7 @@ from setuptools import find_packages, setup
setup(
name="bbook_maker",
description="The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt files and books.",
version="0.1.0",
version="0.2.0",
license="MIT",
author="yihong0618",
author_email="zouzou0208@gmail.com",