diff --git a/.github/workflows/make_test_ebook.yaml b/.github/workflows/make_test_ebook.yaml
index 23c0105..6c77353 100644
--- a/.github/workflows/make_test_ebook.yaml
+++ b/.github/workflows/make_test_ebook.yaml
@@ -11,6 +11,7 @@ env:
   ACTIONS_ALLOW_UNSECURE_COMMANDS: true
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
   BBM_CAIYUN_API_KEY: ${{ secrets.BBM_CAIYUN_API_KEY }}
+  BBM_DEEPL_API_KEY: ${{ secrets.BBM_DEEPL_API_KEY }}

 jobs:
   testing:
@@ -51,8 +52,13 @@ jobs:
         run: |
           python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model caiyun

+      - name: make deepl translator test
+        if: env.BBM_DEEPL_API_KEY != null
+        run: |
+          python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model deepl
+
       - name: make openai key ebook test
         if: env.OPENAI_API_KEY != null
         run: |
           python3 make_book.py --book_name "test_books/lemo.epub" --test --test_num 5 --language zh-hans
           python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --model gpt3 --prompt prompt_template_sample.txt
diff --git a/README-CN.md b/README-CN.md
index e65a9b8..58ba915 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -15,34 +15,39 @@ bilingual_book_maker is an AI translation tool that uses ChatGPT to help users create

 ## Usage

-1. `pip install -r requirements.txt` or `pip install -U bbook_maker`
-2. Use `--openai_key` to specify the OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API rate limits.
+- `pip install -r requirements.txt` or `pip install -U bbook_maker`
+- Use `--openai_key` to specify the OpenAI API key. If you have several keys, separate them with commas (xxx,xxx,xxx) to reduce errors caused by API rate limits.
    Alternatively, set the environment variable `BMM_OPENAI_API_KEY` to skip this option.
-3. A sample book, `test_books/animal_farm.epub`, is included for testing.
-4. The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` to switch to the gpt3 model.
-5. Use the `--test` option to preview the result if you have not paid for the service (there is a rate limit, so it is somewhat slow).
-6. Use `--language` to specify the target language, for example `--language "Simplified Chinese"`. The default is `"Simplified Chinese"`.
+- A sample book, `test_books/animal_farm.epub`, is included for testing.
+- The default model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), the model ChatGPT currently uses. Use `--model gpt3` to switch to the gpt3 model.
+- You can translate through an API that wraps DeepL. It is a paid service: get a token from [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator), then use `--model deepl --deepl_key ${deepl_key}`
+- You can translate with Google: `--model google`
+- You can translate with Caiyun: `--model caiyun --caiyun_key ${caiyun_key}`
+- Use the `--test` option to preview the result if you have not paid for the service (there is a rate limit, so it is somewhat slow).
+- Use `--language` to specify the target language, for example `--language "Simplified Chinese"`. The default is `"Simplified Chinese"`.
    Read the help message for the available target languages: `python make_book.py --help`
-7. Use the `--proxy` option so that users in mainland China can go through a proxy when testing locally. Pass a string such as `http://127.0.0.1:7890`.
-8. Use the `--resume` option to continue after a manual interruption.
-9. An epub is made of html files. By default, only the contents of `<p>` are translated.
+- Use the `--proxy` option so that users in mainland China can go through a proxy when testing locally. Pass a string such as `http://127.0.0.1:7890`.
+- Use the `--resume` option to continue after a manual interruption.
+- An epub is made of html files. By default, only the contents of `<p>` are translated.
    Use `--translate-tags` to specify the tags to translate, separated by commas. For example: `--translate-tags h1,h2,h3,p,div`
-10. Use the --book_from option to specify the e-reader type (currently only kobo is available), and --device_path to specify the mount point.
-11. If you are blocked by the firewall and need to replace api_base with Cloudflare Workers, use `--api_base ${url}`.
+- Use the --book_from option to specify the e-reader type (currently only kobo is available), and --device_path to specify the mount point.
+- If you are blocked by the firewall and need to replace api_base with Cloudflare Workers, use `--api_base ${url}`.
    **Note that the api you enter here should look like '`https://xxxx/v1`', and the domain must be wrapped in quotes**
-12. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
-13. If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it as you like.
-14. If you want to translate untagged strings in the ebook, use the `--allow_navigable_strings` option, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed ebook instead**
-15. If you want to adjust the prompt, use the `--prompt` option. Valid placeholders include `{text}` and `{language}`. You can configure the prompt in the following ways:
+- When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
+- If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it as you like.
+- If you want to translate untagged strings in the ebook, use the `--allow_navigable_strings` option, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed ebook instead**
+- If you want to adjust the prompt, use the `--prompt` option. Valid placeholders include `{text}` and `{language}`. You can configure the prompt in the following ways:
    If you do not need to set the `system` role: `--prompt "Translate {text} to {language}"` or `--prompt prompt_template_sample.txt` (a sample text file is at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
    If you need to set the `system` role: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'`, or `--prompt prompt_template_sample.json` (a sample JSON file is at [./prompt_template_sample.json](./prompt_template_sample.json)).
    You can also configure the `system` and `user` role prompts with the environment variables `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
    The argument can be a prompt template string or the path to a template `.txt` file.
-16. When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
-17. If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it as you like.
-18. If you want to translate untagged strings in the ebook, use the `--allow_navigable_strings` option, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed ebook instead**
-19. Use the `--batch_size` option to specify the number of lines translated per batch (default 10; currently applies only to txt files).
+- When translation finishes, a bilingual book named ${book_name}_bilingual.epub is generated.
+- If an error occurs, or you interrupt the command with `CTRL+C` and do not want to continue translating, a book named ${book_name}_bilingual_temp.epub is generated; just rename it as you like.
+- If you want to translate untagged strings in the ebook, use the `--allow_navigable_strings` option, which adds navigable strings to the translation queue. **Note: where possible, look for a better-formed ebook instead**
+- Use the `--batch_size` option to specify the number of lines translated per batch (default 10; currently applies only to txt files).
+
+

 ### Examples

 **If you installed with `pip install bbook_maker`, every command below can be written as `bbook args`**
@@ -60,6 +65,10 @@ export OPENAI_API_KEY=${your_api_key}
 # Or use the gpt3 model
 python3 make_book.py --book_name test_books/animal_farm.epub --model gpt3 --language ja

+# Use the DeepL model with Japanese
+python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja
+
+
 # Translate contents in <div> and <p>
 python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p
@@ -78,7 +87,7 @@ python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch
 # Translate with Caiyun (the Caiyun API currently supports only: Simplified Chinese <-> English, Simplified Chinese <-> Japanese)
 # Caiyun provides a test token (3975l6lr5pcbvidl6jl2)
 # You can follow this tutorial to apply for your own token (https://bobtranslate.com/service/translate/caiyun.html)
-python3 make_book.py --model caiyun --openai_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
+python3 make_book.py --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub

 # You can set BBM_CAIYUN_API_KEY in the environment to skip --openai_key
 export BBM_CAIYUN_API_KEY=${your_api_key}
@@ -96,8 +105,6 @@ python make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_b
 1. Free trial API tokens are rate-limited; if you want more speed, consider a paid plan
 2. PRs are welcome
-3. In particular, results will be much better once batch translation is done
-4. The DeepL model will be added later

 # Thanks

diff --git a/README.md b/README.md
index d24b4a7..26f01c6 100644
--- a/README.md
+++ b/README.md
@@ -15,33 +15,34 @@ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist u

 ## Use

-1. `pip install -r requirements.txt` or `pip install -U bbook_maker`(you can use)
-2. Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
+- `pip install -r requirements.txt` or `pip install -U bbook_maker`(you can use)
+- Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
    Or, just set environment variable `BMM_OPENAI_API_KEY` instead.
-3. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
-4. The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt3` to change the underlying model to `GPT3`
-5. Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
-6. Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
+- A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
+- The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt3` to change the underlying model to `GPT3`
+- The DeepL model is supported through [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator); the token is paid. Use `--model deepl --deepl_key ${deepl_key}`
+- Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
+- Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
    Read available languages by helper message: `python make_book.py --help`
-7. Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
-8. Use `--resume` option to manually resume the process after an interruption.
-9. epub is made of html files. By default, we only translate contents in `<p>`.
+- Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
+- Use `--resume` option to manually resume the process after an interruption.
+- epub is made of html files. By default, we only translate contents in `<p>`.
    Use `--translate-tags` to specify the tags to translate. Use commas to separate multiple tags. For example:
    `--translate-tags h1,h2,h3,p,div`
-10. Use `--book_from` option to specify e-reader type (Now only `kobo` is available), and use `--device_path` to specify the mounting point.
-11. If you want to change api_base like using Cloudflare Workers, use `--api_base
python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p
@@ -135,7 +140,6 @@ docker run --rm --name bilingual_book_maker --mount type=bind,source=/home/user/
1. API token from free trial has limit. If you want to speed up the process, consider paying for the service or use multiple OpenAI tokens
2. PR is welcome
-3. The DeepL model will be updated later.
# Thanks
diff --git a/book_maker/cli.py b/book_maker/cli.py
index 7e04c53..a837f8d 100644
--- a/book_maker/cli.py
+++ b/book_maker/cli.py
@@ -52,6 +52,7 @@ def parse_prompt_arg(prompt_arg):
 def main():
+    translate_model_list = list(MODEL_DICT.keys())
     parser = argparse.ArgumentParser()
     parser.add_argument(
         "--book_name",
@@ -73,6 +74,7 @@ def main():
         type=str,
         help="Path of e-reader device",
     )
+    ########## KEYS ##########
     parser.add_argument(
         "--openai_key",
         dest="openai_key",
@@ -81,6 +83,19 @@ def main():
         help="OpenAI api key,if you have more than one key, please use comma"
         " to split them to go beyond the rate limits",
     )
+    parser.add_argument(
+        "--caiyun_key",
+        dest="caiyun_key",
+        type=str,
+        help="you can apply caiyun key from here (https://dashboard.caiyunapp.com/user/sign_in/)",
+    )
+    parser.add_argument(
+        "--deepl_key",
+        dest="deepl_key",
+        type=str,
+        help="you can apply deepl key from here (https://rapidapi.com/splintPRO/api/deepl-translator)",
+    )
+
     parser.add_argument(
         "--test",
         dest="test",
@@ -100,7 +115,7 @@ def main():
         dest="model",
         type=str,
         default="chatgptapi",
-        choices=["chatgptapi", "gpt3", "google", "caiyun"],  # support DeepL later
+        choices=translate_model_list,
         metavar="MODEL",
         help="model to use, available: {%(choices)s}",
     )
@@ -162,12 +177,6 @@ def main():
         default=10,
         help="how many lines will be translated by aggregated translation(This options currently only applies to txt files)",
     )
-    parser.add_argument(
-        "--caiyun_key",
-        dest="caiyun_key",
-        type=str,
-        help="you can apply caiyun key from here (https://dashboard.caiyunapp.com/user/sign_in/)",
-    )
     options = parser.parse_args()
     PROXY = options.proxy
@@ -196,6 +205,10 @@ def main():
         API_KEY = options.caiyun_key or env.get("BBM_CAIYUN_API_KEY")
         if not API_KEY:
             raise Exception("Please provid caiyun key")
+    elif options.model == "deepl":
+        API_KEY = options.deepl_key or env.get("BBM_DEEPL_API_KEY")
+        if not API_KEY:
+            raise Exception("Please provide deepl key")
     else:
         API_KEY = ""
diff --git a/book_maker/loader/epub_loader.py b/book_maker/loader/epub_loader.py
index 2af96ab..92b811c 100644
--- a/book_maker/loader/epub_loader.py
+++ b/book_maker/loader/epub_loader.py
@@ -5,6 +5,7 @@ from copy import copy
from pathlib import Path
from bs4 import BeautifulSoup as bs
+from bs4.element import NavigableString
from ebooklib import ITEM_DOCUMENT, epub
from rich import print
from tqdm import tqdm
@@ -114,8 +115,12 @@ class EPUBBookLoader(BaseBookLoader):
                         if self.resume and index < p_to_save_len:
                             new_p.string = self.p_to_save[index]
                         else:
-                            new_p.string = self.translate_model.translate(p.text)
-                            self.p_to_save.append(new_p.text)
+                            if type(p) == NavigableString:
+                                new_p = self.translate_model.translate(p.text)
+                                self.p_to_save.append(new_p)
+                            else:
+                                new_p.string = self.translate_model.translate(p.text)
+                                self.p_to_save.append(new_p.text)
                         p.insert_after(new_p)
                         index += 1
                         if index % 20 == 0:
@@ -164,7 +169,10 @@ class EPUBBookLoader(BaseBookLoader):
                         # PR welcome here
                         if index < p_to_save_len:
                             new_p = copy(p)
-                            new_p.string = self.p_to_save[index]
+                            if type(p) == NavigableString:
+                                new_p = self.p_to_save[index]
+                            else:
+                                new_p.string = self.p_to_save[index]
                             p.insert_after(new_p)
                             index += 1
                         else:
diff --git a/book_maker/loader/txt_loader.py b/book_maker/loader/txt_loader.py
index f5238a7..67b8937 100644
--- a/book_maker/loader/txt_loader.py
+++ b/book_maker/loader/txt_loader.py
@@ -71,7 +71,11 @@ class TXTBookLoader(BaseBookLoader):
                 if self.resume and index < p_to_save_len:
                     pass
                 else:
-                    temp = self.translate_model.translate(batch_text)
+                    try:
+                        temp = self.translate_model.translate(batch_text)
+                    except Exception as e:
+                        print(str(e))
+                        raise Exception("Something went wrong during translation") from e
                     self.p_to_save.append(temp)
                     self.bilingual_result.append(batch_text)
                     self.bilingual_result.append(temp)
diff --git a/book_maker/translator/__init__.py b/book_maker/translator/__init__.py
index 3eb54d1..b345d46 100644
--- a/book_maker/translator/__init__.py
+++ b/book_maker/translator/__init__.py
@@ -2,11 +2,13 @@ from book_maker.translator.chatgptapi_translator import ChatGPTAPI
from book_maker.translator.google_translator import Google
from book_maker.translator.gpt3_translator import GPT3
from book_maker.translator.caiyun_translator import Caiyun
+from book_maker.translator.deepl_translator import DeepL
 MODEL_DICT = {
     "chatgptapi": ChatGPTAPI,
     "gpt3": GPT3,
     "google": Google,
-    "caiyun": Caiyun
+    "caiyun": Caiyun,
+    "deepl": DeepL,
     # add more here
 }
diff --git a/book_maker/translator/deepl_translator.py b/book_maker/translator/deepl_translator.py
index b692769..817c3f1 100644
--- a/book_maker/translator/deepl_translator.py
+++ b/book_maker/translator/deepl_translator.py
@@ -1,5 +1,83 @@
+import json
+import time
+
+import requests
+
+from book_maker.utils import TO_LANGUAGE_CODE, LANGUAGES
from .base_translator import Base
 class DeepL(Base):
-    pass
+    """
+    DeepL translator
+    """
+
+    def __init__(self, key, language, **kwargs):
+        super().__init__(key, language)
+        self.api_url = "https://deepl-translator.p.rapidapi.com/translate"
+        self.headers = {
+            "content-type": "application/json",
+            "X-RapidAPI-Key": "",
+            "X-RapidAPI-Host": "deepl-translator.p.rapidapi.com",
+        }
+        lang = None
+        if language in LANGUAGES:
+            lang = language
+        else:
+            lang = TO_LANGUAGE_CODE.get(language)
+        if lang not in [
+            "bg",
+            "zh",
+            "cs",
+            "da",
+            "nl",
+            "en-US",
+            "en-GB",
+            "et",
+            "fi",
+            "fr",
+            "de",
+            "el",
+            "hu",
+            "id",
+            "it",
+            "ja",
+            "lv",
+            "lt",
+            "pl",
+            "pt-PT",
+            "pt-BR",
+            "ro",
+            "ru",
+            "sk",
+            "sl",
+            "es",
+            "sv",
+            "tr",
+            "uk",
+            "ko",
+            "nb",
+        ]:
+            raise Exception(f"DeepL does not support {lang}")
+        self.language = lang
+
+    def rotate_key(self):
+        self.headers["X-RapidAPI-Key"] = f"{next(self.keys)}"
+
+    def translate(self, text):
+        self.rotate_key()
+        print(text)
+        payload = {"text": text, "source": "EN", "target": self.language}
+        try:
+            response = requests.request(
+                "POST", self.api_url, data=json.dumps(payload), headers=self.headers
+            )
+        except Exception as e:
+            print(str(e))
+            time.sleep(30)
+            response = requests.request(
+                "POST", self.api_url, data=json.dumps(payload), headers=self.headers
+            )
+        t_text = response.json().get("text", "")
+        print(t_text)
+        return t_text
diff --git a/book_maker/utils.py b/book_maker/utils.py
index cfa74a4..ca5ac97 100644
--- a/book_maker/utils.py
+++ b/book_maker/utils.py
@@ -2,6 +2,7 @@
 LANGUAGES = {
     "en": "english",
     "zh-hans": "simplified chinese",
+    "zh": "simplified chinese",
     "zh-hant": "traditional chinese",
     "de": "german",
     "es": "spanish",
diff --git a/setup.py b/setup.py
index 8912e24..6ff077f 100644
--- a/setup.py
+++ b/setup.py
@@ -4,7 +4,7 @@ from setuptools import find_packages, setup
 setup(
     name="bbook_maker",
     description="The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt files and books.",
-    version="0.1.0",
+    version="0.2.0",
     license="MIT",
     author="yihong0618",
     author_email="zouzou0208@gmail.com",