docs: update readme

arkohut 2024-10-24 18:18:07 +08:00
parent 491efd5cbd
commit df3ec9f35b
3 changed files with 330 additions and 2 deletions

README.md

@@ -6,10 +6,20 @@ English | [简体中文](README_ZH.md)
# Memos
Memos is a privacy-focused passive recording project. It can automatically record screen content, build intelligent indices, and provide a convenient web interface to retrieve historical records.
This project draws heavily from two other projects: one called [Rewind](https://www.rewind.ai/) and another called [Windows Recall](https://support.microsoft.com/en-us/windows/retrace-your-steps-with-recall-aa03f8a0-a78b-4b3e-b0a1-2eb8ac48701c). However, unlike both of them, Memos allows you to have complete control over your data, avoiding the transfer of data to untrusted data centers.
## Features
- Simple installation: just install dependencies via pip to get started
- Complete data control: all data is stored locally, allowing for full local operation and self-managed data processing
- Full-text and vector search support
- Integrates with Ollama, using it as the machine learning engine for Memos
- Compatible with any OpenAI-API-compatible model service (e.g., OpenAI, Azure OpenAI, vLLM)
- Supports Mac and Windows (Linux support is in development)
- Extensible functionality through plugins
## Quick Start
### 1. Install Memos
@@ -39,6 +49,7 @@ This command will:
- Begin recording all screens
- Start the Web service
- Set the service to start on boot
### 4. Access the Web Interface
@@ -46,3 +57,156 @@ Open your browser and visit `http://localhost:8839`
- Default username: `admin`
- Default password: `changeme`
## User Guide
### Using Ollama for Visual Search
By default, Memos only enables the OCR plugin to extract text from screenshots and build indices. However, this method significantly limits search effectiveness for images without text.
To achieve more comprehensive visual search capabilities, we need a multimodal image understanding service compatible with the OpenAI API. Ollama perfectly fits this role.
#### Important Notes Before Use
Before deciding to enable the VLM feature, please note the following:
1. **Hardware Requirements**
- Recommended configuration: NVIDIA graphics card with at least 8GB VRAM or Mac with M series chip
- The minicpm-v model will occupy about 5.5GB of storage space
- CPU mode is not recommended as it will cause severe system lag
2. **Performance and Power Consumption Impact**
- Enabling VLM will significantly increase system power consumption
- Consider using other devices to provide OpenAI API compatible model services
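If you want to offload VLM inference to another device, the `endpoint` in `~/.memos/config.yaml` can point at a different machine running Ollama. The snippet below is a sketch only; the host address is a placeholder:

```yaml
# Illustrative only: run VLM inference on another machine.
# 192.168.1.100 is a placeholder LAN address, not a real default.
vlm:
  enabled: true
  endpoint: http://192.168.1.100:11434
  modelname: minicpm-v
```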
#### Enabling Steps
1. **Install Ollama**
Visit the [Ollama official documentation](https://ollama.com) for detailed installation and configuration instructions.
2. **Prepare the Multimodal Model**
Download and run the multimodal model `minicpm-v` using the following command:
```sh
ollama run minicpm-v "Describe what this service is"
```
This command will download and run the minicpm-v model. If inference is too slow on your hardware, this feature is not recommended.
3. **Configure Memos to Use Ollama**
Open the `~/.memos/config.yaml` file with your preferred text editor and modify the `vlm` configuration:
```yaml
vlm:
enabled: true # Enable VLM feature
endpoint: http://localhost:11434 # Ollama service address
modelname: minicpm-v # Model name to use
force_jpeg: true # Convert images to JPEG format to ensure compatibility
prompt: Please describe the content of this image, including the layout and visual elements # Prompt sent to the model
```
Use the above configuration to overwrite the `vlm` configuration in the `~/.memos/config.yaml` file.
Also, modify the `default_plugins` configuration in the `~/.memos/plugins/vlm/config.yaml` file:
```yaml
default_plugins:
- builtin_ocr
- builtin_vlm
```
This adds the `builtin_vlm` plugin to the default plugin list.
4. **Restart Memos Service**
```sh
memos stop
memos start
```
After restarting the Memos service, wait a moment; the data extracted by VLM will then appear alongside the latest screenshots in the Memos web interface:
![image](./docs/images/single-screenshot-view-with-minicpm-result.png)
If you do not see the VLM results, you can:
- Use the command `memos ps` to check if the Memos process is running normally
- Check for error messages in `~/.memos/logs/memos.log`
- Confirm whether the Ollama model is loaded correctly (`ollama ps`)
### Full Indexing
Memos is a compute-intensive application. The indexing process requires the collaboration of OCR, VLM, and embedding models. To minimize the impact on the user's computer, Memos calculates the average processing time for each screenshot and adjusts the indexing frequency accordingly. Therefore, not all screenshots are indexed immediately by default.
If you want to index all screenshots, you can use the following command for full indexing:
```sh
memos scan
```
This command will scan and index all recorded screenshots. Note that depending on the number of screenshots and system configuration, this process may take some time and consume significant system resources. The index construction is idempotent, and running this command multiple times will not re-index already indexed data.
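As a rough illustration of how long a full scan might take, consider the back-of-the-envelope calculation below. Both inputs are hypothetical: a backlog of 7200 screenshots (about one day of recording) and an assumed average of ~3 seconds of processing per screenshot.

```sh
pending=7200    # hypothetical backlog: roughly one day of screenshots
secs_each=3     # assumed average processing time per screenshot
hours=$(( pending * secs_each / 3600 ))
echo "~${hours} hours for a full scan"   # about 6 hours under these assumptions
```

Actual times depend heavily on your hardware and on which plugins (OCR, VLM, embedding) are enabled.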
## Privacy and Security
During the development of Memos, I closely followed the progress of similar products, especially [Rewind](https://www.rewind.ai/) and [Windows Recall](https://support.microsoft.com/en-us/windows/retrace-your-steps-with-recall-aa03f8a0-a78b-4b3e-b0a1-2eb8ac48701c). I greatly appreciate their product philosophy, but they do not do enough in terms of privacy protection, which is a concern for many users (or potential users). Recording the screen of a personal computer may expose extremely sensitive private data, such as bank accounts, passwords, chat records, etc. Therefore, ensuring that data storage and processing are completely controlled by the user to prevent data leakage is particularly important.
The advantages of Memos are:
1. The code is fully open-source and written in easy-to-understand Python, so anyone can review it and verify there are no backdoors.
2. Data is completely localized, all data is stored locally, and data processing is entirely controlled by the user. Data will be stored in the user's `~/.memos` directory.
3. Easy to uninstall. If you no longer use Memos, you can close the program with `memos stop && memos disable`, then uninstall it with `pip uninstall memos`, and finally delete the `~/.memos` directory to clean up all databases and screenshot data.
4. Data processing is entirely controlled by the user. Memos is an independent project, and the machine learning models used (including VLM and embedding models) are chosen by the user. Due to Memos' operating mode, using smaller models can also achieve good results.
Of course, there is still room for improvement in terms of privacy, and contributions are welcome to make Memos better.
## Other Noteworthy Content
### About Storage Space
Memos records the screen every 5 seconds and saves the original screenshots in the `~/.memos/screenshots` directory. Storage space usage mainly depends on the following factors:
1. **Screenshot Data**:
- Single screenshot size: about 40-400KB (depending on screen resolution and display complexity)
- Daily data volume: about 400MB (based on 10 hours of usage, single screen 2560x1440 resolution)
- Multi-screen usage: data volume increases with the number of screens
- Monthly estimate: about 8GB based on 20 working days
Screenshots are deduplicated. If the content of consecutive screenshots does not change much, only one screenshot will be retained. The deduplication mechanism can significantly reduce storage usage in scenarios where content does not change frequently (such as reading, document editing, etc.).
2. **Database Space**:
- SQLite database size depends on the number of indexed screenshots
- Reference value: about 2.2GB of storage space after indexing 100,000 screenshots
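The screenshot figures above can be sanity-checked with quick arithmetic. The ~55 KB average used below is an assumed effective per-screenshot size (consistent with the ~400 MB/day figure), not a measured value:

```sh
# One screenshot every 5 seconds over a 10-hour day:
shots_per_day=$(( 10 * 3600 / 5 ))            # 7200 screenshots
# Assumed effective average of ~55 KB per screenshot:
mb_per_day=$(( shots_per_day * 55 / 1024 ))   # roughly 386 MB/day
gb_per_month=$(( mb_per_day * 20 / 1024 ))    # roughly 7 GB over 20 working days
echo "${shots_per_day} shots/day, ~${mb_per_day} MB/day, ~${gb_per_month} GB/month"
```

This lands in the same ballpark as the ~400MB/day and ~8GB/month estimates above; deduplication typically pushes real usage lower.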
### About Power Consumption
By default, Memos runs two compute-intensive tasks:
- One is the OCR task, used to extract text from screenshots
- The other is the embedding task, used to extract semantic information and build vector indices
#### Resource Usage
- **OCR Task**: Executed using the CPU, and optimized to select the OCR engine based on different operating systems to minimize CPU usage
- **Embedding Task**: Intelligently selects the computing device
- NVIDIA GPU devices prioritize using the GPU
- Mac devices prioritize using Metal GPU
- Other devices use the CPU
#### Performance Optimization Strategy
To avoid affecting users' daily use, Memos has adopted the following optimization measures:
- Dynamically adjust the indexing frequency, adapting to system processing speed
- Automatically reduce processing frequency when on battery power to save power
## Development Guide
to be continued

README_ZH.md

@@ -8,7 +8,17 @@
Memos 是一个专注于隐私的被动记录项目。它可以自动记录屏幕内容,构建智能索引,并提供便捷的 web 界面来检索历史记录。
这个项目主要参考了另外两个项目,一个叫做 [Rewind](https://www.rewind.ai/),另一个叫做 [Windows Recall](https://support.microsoft.com/en-us/windows/retrace-your-steps-with-recall-aa03f8a0-a78b-4b3e-b0a1-2eb8ac48701c)。不过,与它们不同的是 Memos 让你可以完全管控自己的数据,避免将数据传递到不信任的数据中心。
## 功能特性
- 安装简单,只需要通过 pip 安装依赖就可以开始使用了
- 数据全掌控,所有数据都存储在本地,可以完全本地化运行,数据处理完全由自己控制
- 支持全文检索和向量检索
- 支持和 Ollama 一起工作,让 Ollama 作为 Memos 的机器学习引擎
- 支持任何 OpenAI API 兼容的模型(比如 OpenAI, Azure OpenAIvLLM 等)
- 支持 Mac 和 Windows 系统Linux 支持正在开发中)
- 支持通过插件扩展出更多数据处理能力
## 快速开始
@@ -39,6 +49,7 @@ memos start
- 开始对所有屏幕进行记录
- 启动 Web 服务
- 将服务设置为开机启动
### 4. 访问 Web 界面
@@ -46,3 +57,156 @@ memos start
- 默认用户名:`admin`
- 默认密码:`changeme`
## 使用指南
### 使用 Ollama 支持视觉检索
默认情况下Memos 仅启用 OCR 插件来提取截图中的文字并建立索引。然而,对于不包含文字的图像,这种方式会大大限制检索效果。
为了实现更全面的视觉检索功能,我们需要一个兼容 OpenAI API 的多模态图像理解服务。Ollama 正好可以完美胜任这项工作。
#### 使用前的重要说明
在决定是否启用 VLM 功能前,请注意以下几点:
1. **硬件要求**
- 推荐配置:至少 8GB 显存的 NVIDIA 显卡或 M 系列芯片的 Mac
- minicpm-v 模型将占用约 5.5GB 存储空间
- 不建议使用 CPU 模式,会导致系统严重卡顿
2. **性能和功耗影响**
- 启用 VLM 后会显著增加系统功耗
- 可以考虑使用其他设备提供 OpenAI API 兼容的模型服务
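如果希望把 VLM 推理放到其他设备上,可以将 `~/.memos/config.yaml` 中的 `endpoint` 指向局域网内另一台运行 Ollama 的机器。下面只是一个示意片段,其中的主机地址为假设的占位值:

```yaml
# 示意配置:将 VLM 推理交给另一台机器
# 192.168.1.100 仅为占位的局域网地址,并非真实默认值
vlm:
  enabled: true
  endpoint: http://192.168.1.100:11434
  modelname: minicpm-v
```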
#### 启用步骤
1. **安装 Ollama**
请访问 [Ollama 官方文档](https://ollama.com) 获取详细的安装和配置指南。
2. **准备多模态模型**
使用以下命令下载并运行多模态模型 `minicpm-v`
```sh
ollama run minicpm-v "描述一下这是什么服务"
```
这条命令会下载并运行 minicpm-v 模型,如果发现运行速度太慢的话,不推荐使用这部分功能。
3. **配置 Memos 使用 Ollama**
使用你喜欢的文本编辑器打开 `~/.memos/config.yaml` 文件,并修改 `vlm` 配置:
```yaml
vlm:
enabled: true # 启用 VLM 功能
endpoint: http://localhost:11434 # Ollama 服务地址
modelname: minicpm-v # 使用的模型名称
force_jpeg: true # 将图片转换为 JPEG 格式以确保兼容性
prompt: 请帮描述这个图片中的内容,包括画面格局、出现的视觉元素等 # 发送给模型的提示词
```
使用上述配置覆盖 `~/.memos/config.yaml` 文件中的 `vlm` 配置。
同时还要修改 `~/.memos/plugins/vlm/config.yaml` 文件中的 `default_plugins` 配置:
```yaml
default_plugins:
- builtin_ocr
- builtin_vlm
```
这里就是将 `builtin_vlm` 插件添加到默认的插件列表中。
4. **重启 Memos 服务**
```sh
memos stop
memos start
```
重启 Memos 服务之后,稍等片刻,就可以在 Memos 的 Web 界面中最新的截图里看到通过 VLM 所提取的数据了:
![image](./docs/images/single-screenshot-view-with-minicpm-result.png)
如果没有看到 VLM 的结果,可以:
- 使用命令 `memos ps` 查看 Memos 进程是否正常运行
- 检查 `~/.memos/logs/memos.log` 中是否有错误信息
- 确认 Ollama 模型是否正确加载(`ollama ps`
### 全量索引
Memos 是一个计算密集型的应用Memos 的索引过程会需要 OCR、VLM 以及词向量模型协同工作。为了尽量减少对用户电脑的影响Memos 会计算每个截图的平均处理时间,并依据这个时间来调整索引的频率。因此,默认情况下并不是所有的截图都会被立即索引。
如果希望对所有截图进行索引,可以使用以下命令进行全量索引:
```sh
memos scan
```
该命令会扫描并索引所有已记录的截图。请注意,根据截图数量和系统配置的不同,这个过程可能会持续一段时间,并且会占用较多系统资源。索引的构建是幂等的,多次运行该命令不会对已索引的数据进行重复索引。
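下面用一个粗略的估算来说明全量索引大概需要多久。其中两个输入都是假设值:积压 7200 张截图(约一天的记录量),以及平均每张约 3 秒的处理时间。

```sh
pending=7200    # 假设的积压量:约一天的截图
secs_each=3     # 假设的平均单张处理时间(秒)
hours=$(( pending * secs_each / 3600 ))
echo "全量索引约需 ${hours} 小时"   # 在上述假设下约 6 小时
```

实际耗时取决于硬件配置以及启用了哪些插件(OCR、VLM、词向量)。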
## 隐私安全
在开发 Memos 的过程中,我一直密切关注类似产品的进展,特别是 [Rewind](https://www.rewind.ai/) 和 [Windows Recall](https://support.microsoft.com/en-us/windows/retrace-your-steps-with-recall-aa03f8a0-a78b-4b3e-b0a1-2eb8ac48701c)。我非常欣赏它们的产品理念,但它们在隐私保护方面做得不够,这也是许多用户(或潜在用户)所担心的问题。记录个人电脑的屏幕可能会暴露极为敏感的隐私数据,如银行账户、密码、聊天记录等。因此,确保数据的存储和处理完全由用户掌控,防止数据泄露,变得尤为重要。
Memos 的优势在于:
1. 代码完全开源,并且是易于理解的 Python 代码,任何人都可以审查代码,确保没有后门。
2. 数据完全本地化,所有数据都存储在本地,数据处理完全由用户控制,数据将被存储在用户的 `~/.memos` 目录中。
3. 易于卸载,如果不再使用 Memos通过 `memos stop && memos disable` 即可关闭程序,然后通过 `pip uninstall memos` 即可卸载,最后删除 `~/.memos` 目录即可清理所有的数据库和截图数据。
4. 数据处理完全由用户控制Memos 是一个独立项目,所使用的机器学习模型(包括 VLM 以及词向量模型)都由用户自己选择,并且由于 Memos 的运作模式,使用较小的模型也可以达到不错的效果。
当然 Memos 肯定在隐私方面依然有可以改进的地方,欢迎大家贡献代码,一起让 Memos 变得更好。
## 其他值得注意的内容
### 有关存储空间
Memos 每 5 秒会记录一次屏幕,并将原始截图保存到 `~/.memos/screenshots` 目录中。存储空间占用主要取决于以下因素:
1. **截图数据**
- 单张截图大小:约 40-400KB取决于屏幕分辨率以及显示的复杂程度
- 日均数据量:约 400MB基于 10 小时使用时长,单屏幕 2560x1440 分辨率)
- 多屏幕使用:数据量会随屏幕数量增加
- 月度估算:按 20 个工作日计算,约 8GB
截图会进行去重,如果连续截图内容变化不大,那么只会保留一张截图,去重机制可以在内容变化不频繁时(如阅读、文档编辑等场景)显著减少存储占用。
2. **数据库空间**
- SQLite 数据库大小取决于索引的截图数量
- 参考值10 万张截图索引后约占用 2.2GB 存储空间
### 有关功耗
Memos 默认需要两个计算密集型的任务:
- 一个是 OCR 任务,用于提取截图中的文字
- 一个是词向量索引任务,用于提取语义信息构建向量索引
#### 资源使用情况
- **OCR 任务**:使用 CPU 执行,并根据不同操作系统优化选择 OCR 引擎,以最小化 CPU 占用
- **词向量索引**:智能选择计算设备
- NVIDIA GPU 设备优先使用 GPU
- Mac 设备优先使用 Metal GPU
- 其他设备使用 CPU
#### 性能优化策略
为了避免影响用户日常使用Memos 采取了以下优化措施:
- 动态调整索引频率,根据系统处理速度自适应
- 电池供电时自动降低处理频率,最大程度节省电量
## 开发指南
to be continued

Binary file not shown (new image, 433 KiB).