mirror of
https://github.com/yihong0618/bilingual_book_maker.git
synced 2025-06-01 00:50:12 +00:00
parent
d8ad734888
commit
c7ee4acb14
5
Makefile
@ -7,4 +7,7 @@ fmt:
|
||||
.PHONY:tests
|
||||
tests:
|
||||
@echo "Running tests ..."
|
||||
venv/bin/pytest tests/test_integration.py
|
||||
venv/bin/pytest tests/test_integration.py
|
||||
|
||||
serve-docs:
|
||||
mkdocs serve
|
||||
|
@ -51,7 +51,7 @@ Find more info here for using liteLLM: https://github.com/BerriAI/litellm/blob/m
|
||||
- `--accumulated_num` Wait for how many tokens have been accumulated before starting the translation. gpt3.5 limits the total_token to 4090. For example, if you use --accumulated_num 1600, maybe openai will
|
||||
output 2200 tokens and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000, So you are close to reaching the limit. You have to choose your own
|
||||
value, there is no way to know if the limit is reached before sending
|
||||
- `--use_context` prompts the GPT4 model to create a one-paragraph summary. If it's the beginning of the translation, it will summarise the entire passage sent (the size depending on `--accumulated_num`), but if it's any proceeding passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
|
||||
- `--use_context` prompts the GPT4 model to create a one-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`), but if it's any proceeding passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
|
||||
- `--translation_style` example: `--translation_style "color: #808080; font-style: italic;"`
|
||||
- `--retranslate` `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`<br>
|
||||
Retranslate from start_str to end_str's tag:
|
||||
@ -60,7 +60,7 @@ Retranslate start_str's tag:
|
||||
`python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'`
|
||||
### Examples
|
||||
|
||||
**Note if use `pip install bbook_maker` all commands can change to `bbook args`**
|
||||
**Note if use `pip install bbook_maker` all commands can change to `bbook_maker args`**
|
||||
|
||||
```shell
|
||||
# Test quickly
|
||||
|
@ -154,6 +154,7 @@ def main():
|
||||
# args to change api_base
|
||||
parser.add_argument(
|
||||
"--api_base",
|
||||
metavar="API_BASE_URL",
|
||||
dest="api_base",
|
||||
type=str,
|
||||
help="specify base url other than the OpenAI's official API address",
|
||||
|
@ -11,4 +11,4 @@ If you have any concerns or suggestions about the use of this project, please co
|
||||
1. 该项目设计目的是为了帮助用户制作多语言版本的epub文件和图书,仅适用于进入公共版权领域书籍,不适用于有版权的书籍。我们强烈建议用户在使用该项目时仔细阅读其版权信息并遵守相关法律和规定,以保护自己和他人的权益。
|
||||
2. 在任何情况下,作者和开发者不对因使用该项目而导致的任何损失或损害承担任何责任。使用该项目的风险由用户自行承担。用户必须在使用该项目之前,确认其已获得了原著作权人的许可或使用了公开可用的开源EPUB文件,以避免可能存在的版权风险。
|
||||
|
||||
如果您对该项目的使用有任何疑虑或建议,请通过 issus 与我们联系。
|
||||
如果您对该项目的使用有任何疑虑或建议,请通过 issues 与我们联系。
|
||||
|
20
docs/book_source.md
Normal file
@ -0,0 +1,20 @@
|
||||
# Translate from Different Sources
|
||||
|
||||
## txt/srt
|
||||
Txt files and srt files are plain text files. This program can translate plain text.
|
||||
|
||||
python3 make_book.py --book_name test_books/the_little_prince.txt --test --language zh-hans
|
||||
|
||||
## epub
|
||||
An epub is made of HTML files. By default, only the contents of `<p>` tags are translated. Use `--translate-tags` to specify which tags to translate, separating multiple tags with commas. For example: `--translate-tags h1,h2,h3,p,div`
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --translate-tags div,p
|
||||
|
||||
If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. <br>
|
||||
**Note that it's best to look for e-books that are more standardized if possible.**
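To make the tag selection concrete, here is a minimal sketch of how a comma-separated tag list could decide which text is queued for translation. It uses only the standard-library `html.parser`, and the names `TagTextCollector` and `collect_texts` are hypothetical; the project's own epub handling may work differently.

```python
from html.parser import HTMLParser
from typing import List


class TagTextCollector(HTMLParser):
    """Collect text found inside a chosen set of tags (illustrative only)."""

    def __init__(self, translate_tags: str = "p") -> None:
        super().__init__()
        # Turn "h1,h2,p" into {"h1", "h2", "p"}.
        self.wanted = {t.strip().lower() for t in translate_tags.split(",") if t.strip()}
        self.depth = 0  # number of wanted tags currently open
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside every wanted tag is skipped.
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


def collect_texts(html: str, translate_tags: str = "p") -> List[str]:
    """Return the text pieces that would be queued for translation."""
    parser = TagTextCollector(translate_tags)
    parser.feed(html)
    parser.close()
    return parser.texts


sample = "<h1>Title</h1><p>Hello</p>stray text"
print(collect_texts(sample))           # only <p> content
print(collect_texts(sample, "h1,p"))   # <h1> and <p> content
```

In this sketch the untagged `stray text` is never collected, which is the kind of string `--allow_navigable_strings` is described as adding to the queue.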
|
||||
|
||||
## e-reader
|
||||
Use the `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mount point.
|
||||
|
||||
# Translate books downloaded from Rakuten Kobo on a Kobo e-reader
|
||||
bbook_maker --book_from kobo --device_path /tmp/kobo
|
101
docs/cmd.md
Normal file
@ -0,0 +1,101 @@
|
||||
# Command Line Options
|
||||
|
||||
## Test translate
|
||||
`--test` <br>
|
||||
|
||||
Use this option to preview the result if you haven't paid for the service or just want to test. Note that there is a limit and it may take some time.
|
||||
|
||||
```sh
|
||||
bbook_maker --book_name test_books/Lex_Fridman_episode_322.srt --openai_key ${openai_key} --test
|
||||
```
|
||||
|
||||
```sh
|
||||
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test --language zh-hans
|
||||
```
|
||||
|
||||
`--test_num <TEST_NUM>`<br>
|
||||
|
||||
Use this option to set how many paragraphs to translate for testing. The default is 10.
|
||||
|
||||
## Resume
|
||||
`--resume` <br>
|
||||
|
||||
Use this option to manually resume the process after an interruption.
|
||||
|
||||
## Retranslate (epub only)
|
||||
`--retranslate <translated_filepath, file_name_in_epub, start_str [, end_str]>`<br>
|
||||
|
||||
If a file in the epub is not translated well, you can re-translate that part of the epub separately.
|
||||
|
||||
This option takes up to four arguments: `translated_filepath`, `file_name_in_epub`, `start_str`, and `end_str`. `end_str` is optional.
|
||||
|
||||
- Retranslate from start_str to end_str's tag:
|
||||
|
||||
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'
|
||||
|
||||
- Retranslate start_str's tag:
|
||||
|
||||
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'
|
||||
|
||||
- Retranslate start_str's tag, auto find filename:
|
||||
|
||||
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' '' 'in spite of the present book shortage which'
|
||||
|
||||
**Warning:**
|
||||
|
||||
**It deletes from the tag at start_str of the finished book to the next tag at end_str, and then re-translates.**
|
||||
|
||||
**Therefore, please make sure that the next tag of end_str is the translated content. (If end_str is not provided, the next label of start_str is guaranteed to be the translated content.) There can be missing translations between the two strings, but if end_str is not translated, there will be problems.**
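The three-or-four argument shape of `--retranslate` can be sketched with `argparse`. This is only an illustration of the calling convention described above: `build_parser` and `parse_retranslate` are hypothetical names, and the project's real argument handling may differ.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --retranslate option that takes several values."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--retranslate", nargs="+", type=str)
    return parser


def parse_retranslate(values: List[str]) -> Tuple[str, str, str, Optional[str]]:
    """Split the values into the four documented fields; end_str may be absent."""
    if len(values) not in (3, 4):
        raise ValueError("--retranslate expects 3 or 4 values")
    translated_filepath, file_name_in_epub, start_str = values[:3]
    end_str = values[3] if len(values) == 4 else None
    return translated_filepath, file_name_in_epub, start_str, end_str


args = build_parser().parse_args(
    [
        "--retranslate",
        "test_books/animal_farm_bilingual.epub",
        "",  # empty file name, as in the auto-find example above
        "in spite of the present book shortage which",
    ]
)
print(parse_retranslate(args.retranslate))
```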
|
||||
|
||||
|
||||
|
||||
|
||||
## Customize output style (epub only)
|
||||
`--translation_style <TRANSLATION_STYLE>`<br>
|
||||
|
||||
Support changing the output style of epub files.
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --translation_style "color: #4a4a4a; font-style: normal; background-color: #f7f7f7; padding: 5px; margin: 10px 0; border-radius: 5px;"
|
||||
|
||||

|
||||
## Proxy
|
||||
`--proxy <PROXY>` <br>
|
||||
|
||||
Use this option to specify a proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
|
||||
|
||||
## API base
|
||||
`--api_base <API_BASE_URL>`<br>
|
||||
|
||||
Use this option to change `api_base`, for example to route requests through Cloudflare Workers.<br>
|
||||
|
||||
bbook_maker --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
|
||||
**Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**
|
||||
|
||||
## Microsoft Azure Endpoints
|
||||
`--api_base <API_BASE_URL>` `--deployment_id <DEPLOYMENT_ID>`<br>
|
||||
|
||||
You can use the API endpoint provided by Microsoft.
|
||||
|
||||
|
||||
bbook_maker --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
|
||||
|
||||
**Note: Currently, `deployment_id` is supported only with the `chatgptapi` model, and `api_base` must be provided when using `deployment_id`. See [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) for more information about `deployment_id`.**
|
||||
|
||||
## Batch size (txt only)
|
||||
`--batch_size`<br>
|
||||
|
||||
Use this parameter to specify the number of lines for batch translation. Default is 10. (Currently only effective for txt files).
|
||||
```sh
|
||||
python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20
|
||||
```
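As a rough illustration of what batching by line count means, the sketch below groups lines into chunks of at most `batch_size`. The helper name `batch_lines` is hypothetical and this is not the project's implementation; only the default of 10 comes from the text above.

```python
from typing import Iterator, List


def batch_lines(lines: List[str], batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


lines = [f"line {i}" for i in range(45)]
# With a batch size of 20, 45 lines become groups of 20, 20 and 5.
print([len(batch) for batch in batch_lines(lines, batch_size=20)])
```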
|
||||
|
||||
## Accumulated Num
|
||||
`--accumulated_num <ACCUMULATED_NUM>`<br>
|
||||
|
||||
Sets how many tokens to accumulate before starting a translation request. gpt3.5 limits the total_token to 4090.
|
||||
|
||||
For example, if you use `--accumulated_num 1600`, OpenAI might output 2200 tokens, and the system and user messages might take another 200 tokens. 1600+2200+200=4000, so you are close to the limit.
|
||||
|
||||
You have to choose your own value; there is no way to tell whether the limit will be reached before sending the request.
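The arithmetic in this section can be written as a small budget check. This is an illustration only: `remaining_budget` is a hypothetical helper, and the output and overhead figures are the guesses from the example, not values the tool can know in advance.

```python
# Token limit for gpt3.5 as quoted in the text above.
TOTAL_TOKEN_LIMIT = 4090


def remaining_budget(accumulated_num: int, expected_output: int, overhead: int) -> int:
    """Return the tokens left after input, expected output and message overhead."""
    used = accumulated_num + expected_output + overhead
    return TOTAL_TOKEN_LIMIT - used


# The example from the text: 1600 + 2200 + 200 = 4000 tokens used.
print(remaining_budget(1600, 2200, 200))
```

A negative result would mean the request is expected to exceed the limit, so a smaller `--accumulated_num` would be the safer choice.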
|
14
docs/disclaimer.md
Normal file
@ -0,0 +1,14 @@
|
||||
Disclaimer:
|
||||
|
||||
1. The purpose of this project, bilingual_book_maker, is to assist users in creating multilingual versions of epub files and books. It is only applicable to books that have entered the public domain and is not intended for use with copyrighted material. We strongly advise users to read the copyright information carefully before using this project and to comply with relevant laws and regulations in order to protect their own and others' rights.
|
||||
2. In no event shall the authors or developers be liable for any loss or damage caused by the use of this project. Users assume all risks associated with the use of this project. Users must confirm that they have obtained permission from the original copyright holder or used open source EPUB files before using this project to avoid potential copyright risks.
|
||||
|
||||
If you have any concerns or suggestions about the use of this project, please contact us through the issues section.
|
||||
|
||||
|
||||
免责声明:
|
||||
|
||||
1. 该项目设计目的是为了帮助用户制作多语言版本的epub文件和图书,仅适用于进入公共版权领域书籍,不适用于有版权的书籍。我们强烈建议用户在使用该项目时仔细阅读其版权信息并遵守相关法律和规定,以保护自己和他人的权益。
|
||||
2. 在任何情况下,作者和开发者不对因使用该项目而导致的任何损失或损害承担任何责任。使用该项目的风险由用户自行承担。用户必须在使用该项目之前,确认其已获得了原著作权人的许可或使用了公开可用的开源EPUB文件,以避免可能存在的版权风险。
|
||||
|
||||
如果您对该项目的使用有任何疑虑或建议,请通过 issues 与我们联系。
|
11
docs/env_settings.md
Normal file
@ -0,0 +1,11 @@
|
||||
# Environment Settings
|
||||
You can also set environment variables to skip some command-line options.
|
||||
|
||||
## Model keys
|
||||
```
|
||||
# Set env BBM_OPENAI_API_KEY to ignore option --openai_key
|
||||
export BBM_OPENAI_API_KEY=${your_api_key}
|
||||
|
||||
# Set env BBM_CAIYUN_API_KEY to ignore option --caiyun_key
|
||||
export BBM_CAIYUN_API_KEY=${your_api_key}
|
||||
```
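A minimal sketch of how such a fallback could look in Python is shown below. The function `resolve_openai_key` is hypothetical, and the precedence (command-line value first, then the environment variable) is an assumption; only the variable name `BBM_OPENAI_API_KEY` comes from the text above.

```python
import os
from typing import Optional


def resolve_openai_key(cli_value: Optional[str] = None) -> str:
    """Use the --openai_key value if given, else fall back to BBM_OPENAI_API_KEY."""
    # Assumed precedence: an explicit command-line value wins over the environment.
    key = cli_value or os.environ.get("BBM_OPENAI_API_KEY", "")
    if not key:
        raise ValueError("pass --openai_key or set BBM_OPENAI_API_KEY")
    return key
```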
|
5
docs/index.md
Normal file
@ -0,0 +1,5 @@
|
||||
# bilingual book maker
|
||||
|
||||
The `bilingual_book_maker` is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt files and books.
|
||||
|
||||
This tool is exclusively designed for translating epub books that have entered the public domain and is not intended for copyrighted works. Before using this tool, please review the project's **[disclaimer](disclaimer.md)**.
|
13
docs/installation.md
Normal file
@ -0,0 +1,13 @@
|
||||
# Installation
|
||||
## pip
|
||||
bilingual_book_maker has been published as a [Python package](https://pypi.org/project/bbook-maker/) and can be installed with `pip` (a virtual environment is recommended).
|
||||
```sh
|
||||
pip install -U bbook_maker
|
||||
```
|
||||
|
||||
## git
|
||||
You can also install from GitHub if you want to use the latest version.
|
||||
```sh
|
||||
git clone git@github.com:yihong0618/bilingual_book_maker.git
cd bilingual_book_maker
|
||||
pip install .
|
||||
```
|
115
docs/model_lang.md
Normal file
@ -0,0 +1,115 @@
|
||||
# Model and Languages
|
||||
## Models
|
||||
`-m, --model <Model>` <br>
|
||||
|
||||
Currently `bbook_maker` supports these models: `chatgptapi` , `gpt3` , `google` , `caiyun` , `deepl` , `deeplfree` , `gpt4` , `claude` .
|
||||
Default model is `chatgptapi` .
|
||||
|
||||
### OPENAI models
|
||||
|
||||
There are three models you can choose from.
|
||||
|
||||
* gpt3
|
||||
|
||||
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model gpt3 --openai_key ${openai_key}
|
||||
|
||||
|
||||
|
||||
* chatgptapi
|
||||
|
||||
|
||||
`chatgptapi` is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently.
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key}
|
||||
|
||||
* gpt4
|
||||
|
||||
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model gpt4 --openai_key ${openai_key}
|
||||
|
||||
If using `gpt4` , you can add `--use_context` to add a context paragraph to each passage sent to the model for translation.
|
||||
|
||||
|
||||
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model gpt4 --openai_key ${openai_key} --use_context
|
||||
|
||||
The option `--use_context` prompts the GPT4 model to create a one-paragraph summary.
|
||||
|
||||
|
||||
|
||||
If it is the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num` ).
|
||||
|
||||
|
||||
|
||||
For each subsequent passage, it amends the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves the consistency of flow and tone across translations.
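The running-summary idea can be sketched as a loop that threads one summary string through every passage. This is not the project's code: `translate_with_context` is a hypothetical name, the two callables stand in for model calls, and updating the summary before translating each passage is an assumption about ordering.

```python
from typing import Callable, List


def translate_with_context(
    passages: List[str],
    summarize: Callable[[str, str], str],
    translate: Callable[[str, str], str],
) -> List[str]:
    """Translate passages while carrying a running one-paragraph summary."""
    summary = ""  # nothing has been summarized before the first passage
    results = []
    for passage in passages:
        # First passage: summarize it alone. Later passages: amend the old summary.
        summary = summarize(summary, passage)
        results.append(translate(summary, passage))
    return results


# Stand-ins for model calls, so the control flow can be run without an API key.
def fake_summarize(previous: str, passage: str) -> str:
    return (previous + " " + passage).strip()


def fake_translate(context: str, passage: str) -> str:
    return f"[{context}] {passage.upper()}"


print(translate_with_context(["first", "second"], fake_summarize, fake_translate))
```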
|
||||
|
||||
**Note 1: Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx, xxx, xxx) to reduce errors caused by API call limits.**
|
||||
|
||||
**Note 2: You can set the environment variable `BBM_OPENAI_API_KEY` instead of passing `--openai_key`. See [Environment setting](env_settings.md).**
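One way a comma-separated key list can spread requests across keys is simple round-robin rotation, sketched below. The helper names `parse_keys` and `key_rotation` are hypothetical, and the rotation strategy is an assumption; the notes above only say that supplying several keys reduces errors caused by API call limits.

```python
from itertools import cycle
from typing import Iterator, List


def parse_keys(raw: str) -> List[str]:
    """Split a comma-separated key string and trim surrounding whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def key_rotation(raw: str) -> Iterator[str]:
    """Return an endless iterator that hands out the keys in turn."""
    keys = parse_keys(raw)
    if not keys:
        raise ValueError("at least one key is required")
    return cycle(keys)


rotation = key_rotation("key-a, key-b, key-c")
# Each request would take the next key; after the last key it wraps around.
print([next(rotation) for _ in range(4)])
```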
|
||||
|
||||
### CAIYUN
|
||||
|
||||
Use the Caiyun model to translate. The API currently supports only:
|
||||
|
||||
|
||||
|
||||
1. Simplified Chinese <-> English
|
||||
2. Simplified Chinese <-> Japanese
|
||||
|
||||
Caiyun officially provides a test token (3975l6lr5pcbvidl6jl2). You can apply for your own token by following this [tutorial](https://bobtranslate.com/service/translate/caiyun.html).
|
||||
|
||||
|
||||
bbook_maker --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
|
||||
|
||||
### DEEPL
|
||||
|
||||
There are two models you can choose from.
|
||||
|
||||
|
||||
|
||||
* deepl: [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator). <br>
|
||||
|
||||
|
||||
|
||||
A paid token is required. Use `--model deepl --deepl_key ${deepl_key}`
|
||||
|
||||
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key}
|
||||
|
||||
|
||||
|
||||
* deeplfree: DeepL free model
|
||||
|
||||
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model deeplfree
|
||||
|
||||
### Claude
|
||||
|
||||
The [Claude](https://console.anthropic.com/docs) model is supported. Use `--model claude --claude_key ${claude_key}`.
|
||||
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key}
|
||||
|
||||
|
||||
|
||||
### Google
|
||||
|
||||
The Google model is supported. Use `--model google`
|
||||
|
||||
## Languages
|
||||
`--language <LANGUAGE>` <br>
|
||||
|
||||
Set the target language. All models except `caiyun` support many languages. You can use `bbook_maker --help` to check the available languages. The default target language is `"Simplified Chinese"`.
|
||||
|
||||
```sh
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key} --language ja
|
||||
```
|
||||
|
||||
```sh
|
||||
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key} --language "Simplified Chinese"
|
||||
```
|
28
docs/prompt.md
Normal file
@ -0,0 +1,28 @@
|
||||
# Tweak the prompt
|
||||
To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
|
||||
|
||||
- If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt`
|
||||
|
||||
# prompt_template_sample.txt
|
||||
Translate the given text to {language}. Be faithful or accurate in translation. Make the translation readable or intelligible. Be elegant or natural in translation. If the text cannot be translated, return the original text as is. Do not translate person's name. Do not add any additional text in the translation. The text to be translated is:
|
||||
{text}
|
||||
|
||||
|
||||
- If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json`
|
||||
|
||||
# prompt_template_sample.json
|
||||
{
|
||||
"system": "You are a professional translator.",
|
||||
"user": "Translate the given text to {language}. Be faithful or accurate in translation. Make the translation readable or intelligible. Be elegant or natural in translation. If the text cannot be translated, return the original text as is. Do not translate person's name. Do not add any additional text in the translation. The text to be translated is:\n{text}"
|
||||
}
|
||||
|
||||
You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
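The two prompt forms described above (a plain `user` template, or a JSON object with `user` and `system` keys) can be sketched as follows. The functions `load_prompt` and `render_user_message` are hypothetical, reading the template from a `.txt` or `.json` file is left out, and the real tool may parse and fill templates differently.

```python
import json
from typing import Dict


def load_prompt(prompt_arg: str) -> Dict[str, str]:
    """Accept a plain user template or a JSON object with user/system keys."""
    try:
        data = json.loads(prompt_arg)
    except json.JSONDecodeError:
        data = None  # not JSON, so treat the argument as a plain template
    if isinstance(data, dict) and "user" in data:
        return {"user": data["user"], "system": data.get("system", "")}
    return {"user": prompt_arg, "system": ""}


def render_user_message(template: str, text: str, language: str) -> str:
    """Fill the {language} and {text} placeholders of a user template."""
    # {text} is filled last so braces inside the book text are left untouched.
    return template.replace("{language}", language).replace("{text}", text)


prompt = load_prompt(
    '{"user": "Translate {text} to {language}", "system": "You are a professional translator."}'
)
print(prompt["system"])
print(render_user_message(prompt["user"], "Hello", "ja"))
```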
|
||||
|
||||
## Examples
|
||||
```sh
|
||||
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
|
||||
# or
|
||||
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.json
|
||||
# or
|
||||
python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"
|
||||
```
|
29
docs/quickstart.md
Normal file
@ -0,0 +1,29 @@
|
||||
# QuickStart
|
||||
After successfully installing the package, `bbook-maker` appears in the output of `pip list`.
|
||||
|
||||
## Preparation
|
||||
1. ChatGPT or OpenAI [token](https://platform.openai.com/account/api-keys)
|
||||
2. epub/txt books
|
||||
3. Environment with internet access or proxy
|
||||
4. Python 3.8+
|
||||
|
||||
## Use
|
||||
Run the tool with the `bbook_maker` command. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
|
||||
```sh
|
||||
bbook_maker --book_name ${path of a book} --openai_key ${openai_key}
|
||||
|
||||
# Example
|
||||
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key}
|
||||
```
|
||||
Alternatively, you can use the [script](https://github.com/yihong0618/bilingual_book_maker/blob/main/make_book.py) provided in the repository.
|
||||
```sh
|
||||
python3 make_book.py --book_name ${path of a book} --openai_key ${openai_key}
|
||||
|
||||
# Example
|
||||
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key}
|
||||
```
|
||||
|
||||
Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
|
||||
|
||||
|
||||
**Note: If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any desired name.**
|
21
mkdocs.yml
Normal file
@ -0,0 +1,21 @@
|
||||
site_name: bilingual book maker
|
||||
theme:
|
||||
name: material
|
||||
features:
|
||||
- navigation.tabs
|
||||
- navigation.tabs.sticky
|
||||
- content.code.copy
|
||||
|
||||
nav:
|
||||
- Home : index.md
|
||||
- Getting started:
|
||||
- Installation: installation.md
|
||||
- QuickStart: quickstart.md
|
||||
- Usage:
|
||||
- Model and languages: model_lang.md
|
||||
- Command line options: cmd.md
|
||||
- Translate from different sources: book_source.md
|
||||
- Environment setting: env_settings.md
|
||||
- Tweak the prompt: prompt.md
|
||||
- Disclaimer: disclaimer.md
|
||||
|
@ -1 +1,3 @@
|
||||
-e .
|
||||
-e .
|
||||
mkdocs
|
||||
mkdocs-material
|