From b42a33d9a8933db0927aa19dd6eaf36e72dc3fe8 Mon Sep 17 00:00:00 2001 From: umm <32997707+umm233@users.noreply.github.com> Date: Sat, 9 Nov 2024 12:46:22 +0800 Subject: [PATCH] Update README (#433) --- README-CN.md | 134 +++++++++++++++++++++++++++++++++++++-------- README.md | 152 ++++++++++++++++++++++++++++++++++++++------------- 2 files changed, 227 insertions(+), 59 deletions(-) diff --git a/README-CN.md b/README-CN.md index 427441c..07dce8a 100644 --- a/README-CN.md +++ b/README-CN.md @@ -30,57 +30,61 @@ bbook --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test - 默认用了 [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) 模型,也就是 ChatGPT 正在使用的模型。 * DeepL + 使用 DeepL 封装的 api 进行翻译,需要付费。[DeepL Translator](https://rapidapi.com/splintPRO/api/dpl-translator) 来获得 token - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} ``` * DeepL free + 使用 DeepL free - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model deeplfree ``` * Claude + 使用 [Claude](https://console.anthropic.com/docs) 模型进行翻译 - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key} ``` * 谷歌翻译 - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model google ``` * 彩云小译 - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model caiyun --caiyun_key ${caiyun_key} ``` * Gemini - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model gemini --gemini_key ${gemini_key} ``` * 腾讯交互翻译 - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model tencentransmart ``` * [xAI](https://x.ai) - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model xai --xai_key ${xai_key} ``` * [Ollama](https://github.com/ollama/ollama) + 使用 [Ollama](https://github.com/ollama/ollama) 自托管模型进行翻译。 如果 ollama 
server 不运行在本地,使用 `--api_base http://x.x.x.x:port/v1` 指向 ollama server 地址 @@ -88,9 +92,11 @@ bbook --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test python3 make_book.py --book_name test_books/animal_farm.epub --ollama_model ${ollama_model_name} ``` -* groq +* [Groq](https://console.groq.com/keys) - ``` + GroqCloud 当前支持的模型可以查看[Supported Models](https://console.groq.com/docs/models) + + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --groq_key [your_key] --model groq --model_list llama3-8b-8192 ``` @@ -101,24 +107,108 @@ bbook --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test ## 参数说明 -- `--test`: 如果大家没付费可以加上这个先看看效果(有 limit 稍微有些慢) +- `--test`: + + 如果大家没付费可以加上这个先看看效果(有 limit 稍微有些慢) + - `--language`: 指定目标语言 + - 例如: `--language "Simplified Chinese"`,预设值为 `"Simplified Chinese"`. - 请阅读 helper message 来查找可用的目标语言: `python make_book.py --help` -- `--proxy` 参数,方便中国大陆的用户在本地测试时使用代理,传入类似 `http://127.0.0.1:7890` 的字符串 -- 使用`--batch_size` 参数,指定批量翻译的行数(默认行数为 10,目前只对 txt 生效) -- `--resume` 命令,可以手动中断后,加入命令继续执行。 -- `--translate-tags` 指定需要翻译的标签,使用逗号分隔多个标签。epub 由 html 文件组成,默认情况下,只翻译 `
<p>
` 中的内容。例如: `--translate-tags h1,h2,h3,p,div` -- `--book_from` 选项指定电子阅读器类型(现在只有 kobo 可用),并使用 `--device_path` 指定挂载点。 -- `--api_base ${url}`: 如果你遇到了墙需要用 Cloudflare Workers 替换 api_base 请使用 `--api_base ${url}` 来替换。 + +- `--proxy` + + 方便中国大陆的用户在本地测试时使用代理,传入类似 `http://127.0.0.1:7890` 的字符串 + +- `--resume` + + 手动中断后,加入命令可以从之前中断的位置继续执行。 + + ```shell + python3 make_book.py --book_name test_books/animal_farm.epub --model google --resume + ``` + +- `--translate-tags` + + 指定需要翻译的标签,使用逗号分隔多个标签。epub 由 html 文件组成,默认情况下,只翻译 `
<p>
` 中的内容。例如: `--translate-tags h1,h2,h3,p,div`
+
+- `--book_from`
+
+  指定电子阅读器类型(现在只有 kobo 可用),并使用 `--device_path` 指定挂载点。
+
+- `--api_base ${url}`
+
+  如果你遇到了墙,需要用 Cloudflare Workers 替换 api_base,请使用 `--api_base ${url}` 来替换。
  **请注意,此处你输入的 api 应该是'`https://xxxx/v1`'的字样,域名需要用引号包裹**
+
+- `--allow_navigable_strings`
+
+  如果你想要翻译电子书中的无标签字符串,可以使用 `--allow_navigable_strings` 参数,会将可遍历字符串加入翻译队列。**注意,在条件允许的情况下,请寻找更规范的电子书**
+
+- `--prompt`
+
+  如果你想调整 prompt,你可以使用 `--prompt` 参数。有效的占位符包括 `{text}` 和 `{language}`。你可以用以下方式配置 prompt:
+
+  - 如果您不需要设置 `system` 角色,可以这样:`--prompt "Translate {text} to {language}"` 或者 `--prompt prompt_template_sample.txt`(示例文本文件可以在 [./prompt_template_sample.txt](./prompt_template_sample.txt) 找到)。
+
+  - 如果您需要设置 `system` 角色,可以使用以下方式配置:`--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'`,或者 `--prompt prompt_template_sample.json`(示例 JSON 文件可以在 [./prompt_template_sample.json](./prompt_template_sample.json) 找到)。
+
+  - 你也可以用以下环境变量来配置 `system` 和 `user` 角色 prompt:`BBM_CHATGPTAPI_USER_MSG_TEMPLATE` 和 `BBM_CHATGPTAPI_SYS_MSG`。
  该参数可以是提示模板字符串,也可以是模板 `.txt` 文件的路径。
+
+- `--batch_size`
+
+  指定批量翻译的行数(默认行数为 10,目前只对 txt 生效)
+
+- `--accumulated_num`:
+
+
达到累计的 token 数后才开始翻译。gpt3.5 将 total_token 限制为 4090。
+    例如,如果您使用 `--accumulated_num 1600`,则可能会输出 2200 个 token,另外约 200 个 token 用于系统指令(system_message)和用户指令(user_message),1600+2200+200 = 4000,所以 token 接近上限。你必须自己选择一个合适的值,我们无法在发送之前判断是否会达到限制
+
+- `--use_context`:
+
+  使用 `--use_context` 会提示模型创建三段摘要。如果是翻译的开始,它将总结发送的整个段落(大小取决于 `--accumulated_num`)。
+  对于后续的段落,它会在摘要中补充最近段落的细节,形成一份涵盖整个翻译作品重要细节的滚动上下文。这提高了整个翻译过程中行文与语气的一致性。此选项适用于所有 ChatGPT 兼容模型和 Gemini 模型。
+
+- `--context_paragraph_limit`:
+
+  使用 `--use_context` 选项时,使用 `--context_paragraph_limit` 设置上下文段落数限制。
+
+- `--temperature`:
+
+  使用 `--temperature` 设置 `chatgptapi`/`gpt4`/`claude` 模型的 temperature 值。
+  如 `--temperature 0.7`。
+
+- `--block_size`:
+
+  使用 `--block_size` 将多个段落合并到一个块中。这可能会提高准确性并加快处理速度,但可能会干扰原始格式。必须与 `--single_translate` 一起使用。
+  例如:`--block_size 5 --single_translate`。
+
+- `--single_translate`:
+
+  使用 `--single_translate` 只输出翻译后的图书,不创建双语版本。
+
+- `--translation_style`:
+
+  如:`--translation_style "color: #808080; font-style: italic;"`
+
+- `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`:
+
+  - 重新翻译,从 start_str 到 end_str 的标记:
+
+    ```shell
+    python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom.
Obviously' + ``` + + - 重新翻译, 从start_str 的标记开始: + + ```shell + python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' + ``` + ### 示范用例 **如果使用 `pip install bbook_maker` 以下命令都可以改成 `bbook args`** diff --git a/README.md b/README.md index 715421d..1c28fb6 100644 --- a/README.md +++ b/README.md @@ -51,59 +51,65 @@ bbook --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test * DeepL free - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model deeplfree ``` * [Claude](https://console.anthropic.com/docs) - 使用 [Claude](https://console.anthropic.com/docs) 模型进行翻译 - ``` + Use [Claude](https://console.anthropic.com/docs) model to translate + + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key} ``` -* 谷歌翻译 +* Google Translate - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model google ``` -* 彩云小译 +* Caiyun Translate - ``` + ```shell python3 make_book.py --book_name test_books/animal_farm.epub --model caiyun --caiyun_key ${caiyun_key} ``` * Gemini + Support Google [Gemini](https://aistudio.google.com/app/apikey) model, use `--model gemini` for Gemini Flash or `--model geminipro` for Gemini Pro. If you want to use a specific model alias with Gemini (eg `gemini-1.5-flash-002` or `gemini-1.5-flash-8b-exp-0924`), you can use `--model gemini --model_list gemini-1.5-flash-002,gemini-1.5-flash-8b-exp-0924`. `--model_list` takes a comma-separated list of model aliases. 
-  ```
+  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model gemini --gemini_key ${gemini_key}
  ```

* [Tencent TranSmart](https://transmart.qq.com)

-  ```
+  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model tencentransmart
  ```

* [xAI](https://x.ai)

-  ```
+  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model xai --xai_key ${xai_key}
  ```

-* Support [Ollama](https://github.com/ollama/ollama) self-host models,
+* [Ollama](https://github.com/ollama/ollama)
+
+  Support [Ollama](https://github.com/ollama/ollama) self-hosted models.
  If ollama server is not running on localhost, use `--api_base http://x.x.x.x:port/v1` to point to the ollama server address

-  ```
+  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --ollama_model ${ollama_model_name}
  ```

-* groq
+* [Groq](https://console.groq.com/keys)

-  ```
+  For the models GroqCloud currently supports, see [Supported Models](https://console.groq.com/docs/models)
+
+  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --groq_key [your_key] --model groq --model_list llama3-8b-8192
  ```

@@ -114,33 +120,105 @@ bbook --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test

## Params

-- Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
-- Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
  Read available languages by helper message: `python make_book.py --help`
-- Use `--proxy` option to specify proxy server for internet access.
Enter a string such as `http://127.0.0.1:7890`. -- Use `--resume` option to manually resume the process after an interruption. -`--translate-tags`: epub is made of html files. By default, we only translate contents in `
<p>
`. - Use `--translate-tags` to specify tags need for translation. Use comma to separate multiple tags. For example: - `--translate-tags h1,h2,h3,p,div` -- Use `--book_from` option to specify e-reader type (Now only `kobo` is available), and use `--device_path` to specify the mounting point. -`--api_base`: If you want to change api_base like using Cloudflare Workers, use `--api_base ` to support it. - **Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.** -`--allow_navigable_strings`: If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.** -`--prompt`: To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt: - If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (example of a text file can be found at [./prompt_template_sample.txt](./prompt_template_sample.txt)). - If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (example of a JSON file can be found at [./prompt_template_sample.json](./prompt_template_sample.json)). - You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`. -- Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files). -- `--accumulated_num` Wait for how many tokens have been accumulated before starting the translation. 
gpt3.5 limits the total_token to 4090. For example, if you use --accumulated_num 1600, maybe openai will - output 2200 tokens and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000, So you are close to reaching the limit. You have to choose your own + +- `--proxy`: + + Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`. + +- `--resume`: + + Use `--resume` option to manually resume the process after an interruption. + + ```shell + python3 make_book.py --book_name test_books/animal_farm.epub --model google --resume + ``` + +- `--translate-tags`: + + epub is made of html files. By default, we only translate contents in `
<p>
`.
+  Use `--translate-tags` to specify the tags that need translation. Use a comma to separate multiple tags.
+  For example: `--translate-tags h1,h2,h3,p,div`
+
+- `--book_from`:
+
+  Use `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mounting point.
+
+- `--api_base`:
+
+  If you want to change api_base, e.g. to use Cloudflare Workers, use `--api_base ${url}` to point to it.
+  **Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**
+
+- `--allow_navigable_strings`:
+
+  If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
+
+- `--prompt`:
+
+  To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
+
+  - If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (an example text file can be found at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
+
+  - If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (an example JSON file can be found at [./prompt_template_sample.json](./prompt_template_sample.json)).
+
+  - You can also set the `user` and `system` role prompts via environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
+
+- `--batch_size`:
+
+  Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).
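Conceptually, `--batch_size` just groups consecutive lines of the txt file before sending them to the translator. A minimal sketch of that grouping (illustrative only, not code from make_book.py):

```python
def batch_lines(lines, batch_size=10):
    # Split the text's lines into consecutive batches of batch_size;
    # each batch would be sent to the translation model as one request.
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]

# 25 lines with the default batch_size of 10 -> batches of 10, 10 and 5.
sizes = [len(b) for b in batch_lines([f"line {i}" for i in range(25)])]
print(sizes)  # [10, 10, 5]
```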
+ +- `--accumulated_num`: + + Wait for how many tokens have been accumulated before starting the translation. gpt3.5 limits the total_token to 4090. For example, if you use `--accumulated_num 1600`, maybe openai will output 2200 tokens and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000, So you are close to reaching the limit. You have to choose your own value, there is no way to know if the limit is reached before sending -- `--use_context` prompts the model to create a three-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`). For subsequent passages, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work. This improves consistency of flow and tone throughout the translation. This option is available for all ChatGPT-compatible models and Gemini models. -- Use `--context_paragraph_limit` to set a limit on the number of context paragraphs when using the `--use_context` option. -- Use `--temperature` to set the temperature parameter for `chatgptapi`/`gpt4`/`claude` models. For example: `--temperature 0.7`. -- Use `--block_size` to merge multiple paragraphs into one block. This may increase accuracy and speed up the process but can disturb the original format. Must be used with `--single_translate`. For example: `--block_size 5`. -- Use `--single_translate` to output only the translated book without creating a bilingual version. -- `--translation_style` example: `--translation_style "color: #808080; font-style: italic;"` -- `--retranslate` `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`
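The arithmetic described for `--accumulated_num` amounts to a simple budget check. A rough illustrative model (the helper name is made up, not the tool's actual code):

```python
TOKEN_LIMIT = 4090  # gpt3.5's total_token limit cited in the text above

def fits_budget(accumulated, expected_output, message_overhead, limit=TOKEN_LIMIT):
    # The accumulated input paragraphs, the model's expected reply, and the
    # system/user message overhead must all fit inside the total token limit.
    return accumulated + expected_output + message_overhead <= limit

# The README's example: 1600 + 2200 + 200 = 4000, just under the limit.
print(fits_budget(1600, 2200, 200))  # True
```

As the text notes, the output size cannot be known before sending, so the value passed to `--accumulated_num` has to leave headroom for the reply.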
+ +- `--use_context`: + + prompts the model to create a three-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`). + For subsequent passages, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work. This improves consistency of flow and tone throughout the translation. This option is available for all ChatGPT-compatible models and Gemini models. + +- `--context_paragraph_limit`: + + Use `--context_paragraph_limit` to set a limit on the number of context paragraphs when using the `--use_context` option. + +- `--temperature`: + + Use `--temperature` to set the temperature parameter for `chatgptapi`/`gpt4`/`claude` models. + For example: `--temperature 0.7`. + +- `--block_size`: + + Use `--block_size` to merge multiple paragraphs into one block. This may increase accuracy and speed up the process but can disturb the original format. Must be used with `--single_translate`. + For example: `--block_size 5 --single_translate`. + +- `--single_translate`: + + Use `--single_translate` to output only the translated book without creating a bilingual version. + +- `--translation_style`: + + example: `--translation_style "color: #808080; font-style: italic;"` + +- `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`: + Retranslate from start_str to end_str's tag: - `python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'`
+ + ```shell + python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously' + ``` + Retranslate start_str's tag: - `python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'` + + ```shell + python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' + ``` ### Examples