Add formal doc page (#319)

* fix readme

* done
YYLIZH 2023-08-15 09:41:13 +08:00 committed by GitHub
parent d8ad734888
commit c7ee4acb14
16 changed files with 369 additions and 6 deletions

View File

@ -7,4 +7,7 @@ fmt:
.PHONY:tests
tests:
@echo "Running tests ..."
venv/bin/pytest tests/test_integration.py
venv/bin/pytest tests/test_integration.py
serve-docs:
mkdocs serve

View File

@ -51,7 +51,7 @@ Find more info here for using liteLLM: https://github.com/BerriAI/litellm/blob/m
- `--accumulated_num` Wait for how many tokens have been accumulated before starting the translation. gpt3.5 limits the total_token to 4090. For example, if you use --accumulated_num 1600, maybe openai will
output 2200 tokens and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000, So you are close to reaching the limit. You have to choose your own
value, there is no way to know if the limit is reached before sending
- `--use_context` prompts the GPT4 model to create a one-paragraph summary. If it's the beginning of the translation, it will summarise the entire passage sent (the size depending on `--accumulated_num`), but if it's any proceeding passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
- `--use_context` prompts the GPT4 model to create a one-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`), but if it's any proceeding passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
- `--translation_style` example: `--translation_style "color: #808080; font-style: italic;"`
- `--retranslate` `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`<br>
Retranslate from start_str to end_str's tag:
@ -60,7 +60,7 @@ Retranslate start_str's tag:
`python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'`
### Examples
**Note if use `pip install bbook_maker` all commands can change to `bbook args`**
**Note if use `pip install bbook_maker` all commands can change to `bbook_maker args`**
```shell
# Test quickly

View File

@ -154,6 +154,7 @@ def main():
# args to change api_base
parser.add_argument(
"--api_base",
metavar="API_BASE_URL",
dest="api_base",
type=str,
help="specify base url other than the OpenAI's official API address",

View File

@ -11,4 +11,4 @@ If you have any concerns or suggestions about the use of this project, please co
1. 该项目设计目的是为了帮助用户制作多语言版本的epub文件和图书仅适用于进入公共版权领域书籍不适用于有版权的书籍。我们强烈建议用户在使用该项目时仔细阅读其版权信息并遵守相关法律和规定以保护自己和他人的权益。
2. 在任何情况下作者和开发者不对因使用该项目而导致的任何损失或损害承担任何责任。使用该项目的风险由用户自行承担。用户必须在使用该项目之前确认其已获得了原著作权人的许可或使用了公开可用的开源EPUB文件以避免可能存在的版权风险。
如果您对该项目的使用有任何疑虑或建议,请通过 issus 与我们联系。
如果您对该项目的使用有任何疑虑或建议,请通过 issues 与我们联系。

20
docs/book_source.md Normal file

@ -0,0 +1,20 @@
# Translate from Different Sources
## txt/srt
Txt files and srt files are plain text files. This program can translate plain text.
python3 make_book.py --book_name test_books/the_little_prince.txt --test --language zh-hans
## epub
An epub is made of HTML files. By default, only the contents of `<p>` tags are translated. Use `--translate-tags` to specify which tags to translate, separating multiple tags with commas. For example: `--translate-tags h1,h2,h3,p,div`
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --translate-tags div,p
If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. <br>
**Note that it's best to look for e-books that are more standardized if possible.**
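To illustrate what tag-based selection means, here is a minimal sketch using only the Python standard library. It is not the project's implementation; the class `TagTextCollector` and the function `collect_text` are hypothetical names, and the comma-separated argument simply mirrors the `--translate-tags` format described above.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside a chosen set of tags (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0    # greater than 0 while inside a selected tag
        self.chunks = []  # one entry per outermost selected element

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            if self.depth == 0:
                self.chunks.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks[-1] += data


def collect_text(html, translate_tags="p"):
    """Return the text of every element whose tag is in the comma-separated list."""
    collector = TagTextCollector(translate_tags.split(","))
    collector.feed(html)
    collector.close()
    return [chunk.strip() for chunk in collector.chunks if chunk.strip()]


sample = "<h1>Chapter I</h1><p>Mr. Jones locked the hen-houses.</p><div>Note</div>"
print(collect_text(sample))          # only <p>, like the default
print(collect_text(sample, "h1,p"))  # headings and paragraphs
```

With the default, only the paragraph text is collected; passing `h1,p` also picks up the heading, which is the effect `--translate-tags` is described as having.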
## e-reader
Use the `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mount point.
# Translate books downloaded from Rakuten Kobo on a kobo e-reader
bbook_maker --book_from kobo --device_path /tmp/kobo

101
docs/cmd.md Normal file

@ -0,0 +1,101 @@
# Command Line Options
## Test translate
`--test` <br>
Use this option to preview the result if you haven't paid for the service or just want to test. Note that there is a limit and it may take some time.
```sh
bbook_maker --book_name test_books/Lex_Fridman_episode_322.srt --openai_key ${openai_key} --test
```
```sh
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test --language zh-hans
```
`--test_num <TEST_NUM>`<br>
Use this option to set how many paragraphs you want to translate for testing. Default is 10.
## Resume
`--resume` <br>
Use this option to manually resume the process after an interruption.
## Retranslate (epub only)
`--retranslate <translated_filepath, file_name_in_epub, start_str [, end_str]>`<br>
If a file in the epub is not translated well, you can re-translate part of it separately.
This option takes four arguments: `translated_filepath`, `file_name_in_epub`, `start_str`, `end_str`. `end_str` is optional.
- Retranslate from start_str to end_str's tag:
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'
- Retranslate start_str's tag:
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'
- Retranslate start_str's tag, auto find filename:
bbook_maker --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' '' 'in spite of the present book shortage which'
**Warning:**
**It deletes everything from the tag containing start_str in the finished book up to the tag following end_str, and then re-translates that range.**
**Therefore, please make sure that the tag following end_str is translated content. (If end_str is not provided, the tag following start_str is assumed to be the translated content.) Translations may be missing between the two strings, but if end_str itself has not been translated, there will be problems.**
## Customize output style (epub only)
`--translation_style <TRANSLATION_STYLE>`<br>
Use this option to change the output style of epub files.
bbook_maker --book_name test_books/animal_farm.epub --translation_style "color: #4a4a4a; font-style: normal; background-color: #f7f7f7; padding: 5px; margin: 10px 0; border-radius: 5px;"
![output_style](https://user-images.githubusercontent.com/89069008/226104545-7c029bb1-5325-46d4-a1eb-ec4e7bbaee97.png)
## Proxy
`--proxy <PROXY>` <br>
Use this option to specify a proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
## API base
`--api_base <API_BASE_URL>`<br>
If you want to change `api_base`, for example to route requests through Cloudflare Workers, use this option.<br>
bbook_maker --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
**Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**
## Microsoft Azure Endpoints
`--api_base <API_BASE_URL>` `--deployment_id <DEPLOYMENT_ID>`<br>
You can use the API endpoint provided by Microsoft.
bbook_maker --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
**Note: Currently only the `chatgptapi` model is supported with `deployment_id`, and `api_base` must be provided when using `deployment_id`. See [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) for more information about `deployment_id`.**
## Batch size (txt only)
`--batch_size`<br>
Use this parameter to specify the number of lines for batch translation. Default is 10. (Currently only effective for txt files).
```sh
python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20
```
## Accumulated Num
`--accumulated_num <ACCUMULATED_NUM>`<br>
The number of tokens to accumulate before starting a translation request. gpt3.5 limits the total_token to 4090.
For example, if you use `--accumulated_num 1600`, OpenAI might output 2200 tokens, and the system and user messages might take another 200 tokens. 1600+2200+200=4000, so you are close to the limit.
You have to choose your own value; there is no way to tell whether the limit will be reached before sending the request.
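The arithmetic in the example above can be written out as a small check. This is only an illustrative sketch: the function name `remaining_tokens` is hypothetical, the limit of 4090 is the figure quoted in the text, and the output and overhead sizes are guesses you would have to estimate yourself.

```python
# Rough token-budget check for --accumulated_num (illustrative only).
TOTAL_TOKEN_LIMIT = 4090  # limit quoted in the text above


def remaining_tokens(accumulated_num, expected_output, overhead):
    """Return how many tokens are left under the limit for one request."""
    return TOTAL_TOKEN_LIMIT - (accumulated_num + expected_output + overhead)


# Figures from the example: 1600 accumulated, about 2200 output, about 200 for messages.
print(remaining_tokens(1600, 2200, 200))  # 90
```

A result close to zero or below it suggests choosing a smaller `--accumulated_num`.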

14
docs/disclaimer.md Normal file

@ -0,0 +1,14 @@
Disclaimer:
1. The purpose of this project, bilingual_book_maker, is to assist users in creating multilingual versions of epub files and books. It is only applicable to books that have entered the public domain and is not intended for use with copyrighted material. We strongly advise users to read the copyright information carefully before using this project and to comply with relevant laws and regulations in order to protect their own and others' rights.
2. In no event shall the authors or developers be liable for any loss or damage caused by the use of this project. Users assume all risks associated with the use of this project. Users must confirm that they have obtained permission from the original copyright holder or used open source EPUB files before using this project to avoid potential copyright risks.
If you have any concerns or suggestions about the use of this project, please contact us through the issues section.
免责声明:
1. 该项目设计目的是为了帮助用户制作多语言版本的epub文件和图书仅适用于进入公共版权领域书籍不适用于有版权的书籍。我们强烈建议用户在使用该项目时仔细阅读其版权信息并遵守相关法律和规定以保护自己和他人的权益。
2. 在任何情况下作者和开发者不对因使用该项目而导致的任何损失或损害承担任何责任。使用该项目的风险由用户自行承担。用户必须在使用该项目之前确认其已获得了原著作权人的许可或使用了公开可用的开源EPUB文件以避免可能存在的版权风险。
如果您对该项目的使用有任何疑虑或建议,请通过 issues 与我们联系。

11
docs/env_settings.md Normal file

@ -0,0 +1,11 @@
# Environment Settings
You can also set environment variables to skip some options.
## Model keys
```
# Set env BBM_OPENAI_API_KEY to ignore option --openai_key
export BBM_OPENAI_API_KEY=${your_api_key}
# Set env BBM_CAIYUN_API_KEY to ignore option --caiyun_key
export BBM_CAIYUN_API_KEY=${your_api_key}
```
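As a rough sketch of how such a fallback could work, the snippet below prefers a value passed on the command line and otherwise reads `BBM_OPENAI_API_KEY`. The function name `resolve_openai_keys` and the precedence order are assumptions made for illustration, not the project's actual code; splitting on commas follows the multiple-key format described for `--openai_key`.

```python
import os


def resolve_openai_keys(cli_value=None, env=os.environ):
    """Return a list of API keys from the CLI value, else from BBM_OPENAI_API_KEY.

    Hypothetical helper: the CLI-first precedence is an assumption.
    """
    raw = cli_value or env.get("BBM_OPENAI_API_KEY", "")
    # Multiple keys may be given as a comma-separated string.
    return [key.strip() for key in raw.split(",") if key.strip()]


print(resolve_openai_keys("key-a, key-b", env={}))
print(resolve_openai_keys(None, env={"BBM_OPENAI_API_KEY": "key-from-env"}))
```

An empty list means no key was found in either place, which a caller could treat as an error.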

5
docs/index.md Normal file

@ -0,0 +1,5 @@
# bilingual book maker
The `bilingual_book_maker` is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt files and books.
This tool is exclusively designed for translating epub books that have entered the public domain and is not intended for copyrighted works. Before using this tool, please review the project's **[disclaimer](disclaimer.md)**.

13
docs/installation.md Normal file

@ -0,0 +1,13 @@
# Installation
## pip
bilingual_book_maker has been published as a [Python package](https://pypi.org/project/bbook-maker/) and can be installed with `pip`. (A virtual environment is recommended.)
```sh
pip install -U bbook_maker
```
## git
You can also install from GitHub if you want to use the latest version.
```sh
git clone git@github.com:yihong0618/bilingual_book_maker.git
# Enter the cloned repository before installing from the local source
cd bilingual_book_maker
pip install .
```

115
docs/model_lang.md Normal file

@ -0,0 +1,115 @@
# Model and Languages
## Models
`-m, --model <Model>` <br>
Currently `bbook_maker` supports these models: `chatgptapi` , `gpt3` , `google` , `caiyun` , `deepl` , `deeplfree` , `gpt4` , `claude` .
Default model is `chatgptapi` .
### OPENAI models
There are three models you can choose from.
* gpt3
bbook_maker --book_name test_books/animal_farm.epub --model gpt3 --openai_key ${openai_key}
* chatgptapi
`chatgptapi` is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently.
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key}
* gpt4
bbook_maker --book_name test_books/animal_farm.epub --model gpt4 --openai_key ${openai_key}
If using `gpt4` , you can add `--use_context` to add a context paragraph to each passage sent to the model for translation.
bbook_maker --book_name test_books/animal_farm.epub --model gpt4 --openai_key ${openai_key} --use_context
The option `--use_context` prompts the GPT4 model to create a one-paragraph summary.
If it is the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num` ).
For any subsequent passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
**Note 1: Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx, xxx, xxx) to reduce errors caused by API call limits.**
**Note 2: You can just set the environment variable `BBM_OPENAI_API_KEY` instead of passing `--openai_key`. See [Environment setting](env_settings.md).**
### CAIYUN
Use the Caiyun model to translate. The API currently supports only:
1. Simplified Chinese <-> English
2. Simplified Chinese <-> Japanese
Caiyun officially provides a test token (3975l6lr5pcbvidl6jl2). You can apply for your own token by following this [tutorial](https://bobtranslate.com/service/translate/caiyun.html).
bbook_maker --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
### DEEPL
There are two models you can choose from.
* deepl: [DeepL Translator](https://rapidapi.com/splintPRO/api/deepl-translator). <br>
You need to pay to get a token. Use `--model deepl --deepl_key ${deepl_key}`
bbook_maker --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key}
* deeplfree: DeepL free model
bbook_maker --book_name test_books/animal_farm.epub --model deeplfree
### Claude
The [Claude](https://console.anthropic.com/docs) model is supported. Use `--model claude --claude_key ${claude_key}` .
bbook_maker --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key}
### Google
The Google model is supported. Use `--model google`
## Languages
`--language <LANGUAGE>` <br>
Set the target language. All models except `caiyun` support many languages. You can use `bbook_maker --help` to check the available languages. The default target language is `"Simplified Chinese"` .
```sh
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key} --language ja
```
```sh
bbook_maker --book_name test_books/animal_farm.epub --model chatgptapi --openai_key ${openai_key} --language "Simplified Chinese"
```

28
docs/prompt.md Normal file

@ -0,0 +1,28 @@
# Tweak the prompt
To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
- If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt`
# prompt_template_sample.txt
Translate the given text to {language}. Be faithful or accurate in translation. Make the translation readable or intelligible. Be elegant or natural in translation. If the text cannot be translated, return the original text as is. Do not translate person's name. Do not add any additional text in the translation. The text to be translated is:
{text}
- If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json`
# prompt_template_sample.json
{
"system": "You are a professional translator.",
"user": "Translate the given text to {language}. Be faithful or accurate in translation. Make the translation readable or intelligible. Be elegant or natural in translation. If the text cannot be translated, return the original text as is. Do not translate person's name. Do not add any additional text in the translation. The text to be translated is:\n{text}"
}
You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
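The accepted forms (an inline string, a text file, or JSON with `user` and `system` keys) could be handled roughly as follows. This is a sketch under assumptions, not the project's code: `parse_prompt` and `render_user_message` are hypothetical names, and detecting JSON by attempting to parse it is a design choice made here for simplicity.

```python
import json
import os


def parse_prompt(prompt_arg):
    """Turn a --prompt style value into a {"user": ..., "system": ...} dict."""
    content = prompt_arg
    if os.path.isfile(prompt_arg):
        # The argument names a template file (.txt or .json); read its contents.
        with open(prompt_arg, encoding="utf-8") as f:
            content = f.read()
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict) and "user" in parsed:
        return {"user": parsed["user"], "system": parsed.get("system")}
    # Plain text: the whole content is the user template, with no system message.
    return {"user": content, "system": None}


def render_user_message(template, text, language):
    """Fill the {language} and {text} placeholders without touching other braces."""
    return template.replace("{language}", language).replace("{text}", text)


prompt = parse_prompt(
    '{"user": "Translate {text} to {language}", "system": "You are a professional translator."}'
)
print(prompt["system"])
print(render_user_message(prompt["user"], "Hello", "French"))
```

Plain `str.replace` is used rather than `str.format` so that any other braces in a template are left untouched.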
## Examples
```sh
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.json
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"
```

29
docs/quickstart.md Normal file

@ -0,0 +1,29 @@
# QuickStart
After successfully installing the package, `bbook-maker` appears in the output of `pip list`.
## Preparation
1. ChatGPT or OpenAI [token](https://platform.openai.com/account/api-keys)
2. epub/txt books
3. Environment with internet access or proxy
4. Python 3.8+
## Use
You can run the tool with the `bbook_maker` command. A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
```sh
bbook_maker --book_name ${path of a book} --openai_key ${openai_key}
# Example
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key}
```
Or, you can use the [script](https://github.com/yihong0618/bilingual_book_maker/blob/main/make_book.py) provided in the repository.
```sh
python3 make_book.py --book_name ${path of a book} --openai_key ${openai_key}
# Example
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key}
```
Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` is generated.
**Note: If an error occurs or you interrupt the translation by pressing `CTRL+C`, a book named `${book_name}_bilingual_temp.epub` is generated. You can simply rename it to any name you like.**
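The naming pattern can be sketched as below for the epub case. The helper `bilingual_name` is hypothetical and only mirrors the names mentioned above; it does not describe how the tool builds paths internally.

```python
from pathlib import PurePosixPath


def bilingual_name(book_path, temp=False):
    """Return the expected output path for an epub book (hypothetical helper)."""
    path = PurePosixPath(book_path)
    suffix = "_bilingual_temp" if temp else "_bilingual"
    # Keep the directory, append the suffix to the file name without its extension.
    return str(path.with_name(path.stem + suffix + ".epub"))


print(bilingual_name("test_books/animal_farm.epub"))
print(bilingual_name("test_books/animal_farm.epub", temp=True))
```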

21
mkdocs.yml Normal file

@ -0,0 +1,21 @@
site_name: bilingual book maker
theme:
name: material
features:
- navigation.tabs
- navigation.tabs.sticky
- content.code.copy
nav:
- Home : index.md
- Getting started:
- Installation: installation.md
- QuickStart: quickstart.md
- Usage:
- Model and languages: model_lang.md
- Command line options: cmd.md
- Translate from different sources: book_source.md
- Environment settings: env_settings.md
- Tweak the prompt: prompt.md
- Disclaimer: disclaimer.md

View File

@ -1 +1,3 @@
-e .
-e .
mkdocs
mkdocs-material

View File

@ -24,7 +24,7 @@ setup(
author_email="zouzou0208@gmail.com",
packages=find_packages(),
url="https://github.com/yihong0618/bilingual_book_maker",
python_requires=">=3.7",
python_requires=">=3.8",
install_requires=packages,
classifiers=[
"Programming Language :: Python :: 3",