feat(storage): implement NAS content storage with read/write capabilities
Build and Push Reader API Image / docker (push) Successful in 1m3s
feat(docker): configure NAS content and EPUB source directories in docker-compose
feat(migrations): add tables for SourceAsset, ImportJob, ChapterContentRef, and AssetNovelMapping
feat(scripts): create backfill script for populating ChapterContentRef from MongoDB chapters
+6 -1
@@ -11,4 +11,9 @@ build/
 .env
 .env.local
 # Local debug/test scripts
-test_*.py
+test_*.py
+
+# Local NAS mount test data
+data/epub-source/*
+data/content/*
+data/nas-content/*
@@ -90,6 +90,51 @@ Notes:
 - `api-local` listens on port `8001` and automatically points to `postgres` + `mongo` containers.
 - `web` listens on port `3000` and calls API internally through `http://api:8000`.
 
+### NAS mount points (chapter content + EPUB source)
+
+API containers now reserve two mount folders:
+
+- `/data/content`: converted chapter files (`txt` + `raw_html`)
+- `/data/epub-source`: source EPUB library
+
+Default env mapping (already wired in compose):
+
+```env
+NAS_CONTENT_ROOT=/data/content
+EPUB_SOURCE_ROOT=/data/epub-source
+```
+
+If you want to bind to host folders for local testing:
+
+```yaml
+services:
+  api:
+    volumes:
+      - /absolute/local/path/content:/data/content
+      - /absolute/local/path/epub-source:/data/epub-source
+```
+
+If you want to use NFS-backed docker volumes, define them under `volumes:`. Example:
+
+```yaml
+volumes:
+  nas_chapter_content:
+    driver: local
+    driver_opts:
+      type: nfs
+      o: addr=100.93.79.10,nolock,soft,rw
+      device: ":/volume2/apps/reader-content"
+
+  nas_epub_source:
+    driver: local
+    driver_opts:
+      type: nfs
+      o: addr=100.93.79.10,nolock,soft,rw
+      device: ":/volume2/apps/reader-epub"
+```
+
+For your EPUB structure (folder per novel, multiple `.epub` parts inside), mount the parent folder to `/data/epub-source`.
+
 ## Implemented Endpoints
 
 - GET /api/health
@@ -109,6 +154,52 @@ Notes:
 - GET /api/truyen/suggest
 - GET /api/chapters/{chapterId}
 
+## NAS Migration Ops
+
+### 1) Apply SQL migration manually
+
+Run SQL in `migrations/2026_04_nas_content_storage.sql` against PostgreSQL.
+
+### 2) Backfill existing chapter content from Mongo -> NAS + ChapterContentRef
+
+Dry-run first:
+
+```bash
+python scripts/backfill_chapter_content_refs.py --limit 1000 --dry-run
+```
+
+Then execute:
+
+```bash
+python scripts/backfill_chapter_content_refs.py --limit 1000
+```
+
+You can run multiple batches by increasing/changing `--limit`.
+
+Checkpoint/resume mode:
+
+```bash
+python scripts/backfill_chapter_content_refs.py --limit 1000 --state-file .backfill_state.json
+```
+
+Or continue from a known ObjectId:
+
+```bash
+python scripts/backfill_chapter_content_refs.py --limit 1000 --after-id 680f7f3a2f0d53f4f2b7a123
+```
+
+## Chapter Read Cutover Flag
+
+Set in `.env`:
+
+```env
+CHAPTER_CONTENT_MODE=nas_first
+```
+
+Values:
+- `nas_first` (default): read NAS ref first, fallback Mongo.
+- `mongo_first`: keep Mongo-first during cautious rollout.
+
 ## Notes
 
 - Web session auth is supported via NextAuth session cookies (next-auth.session-token and secure variants).
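Review note on the cutover flag documented above: the two modes differ only in which store is tried first. The sketch below illustrates that ordering under stated assumptions. `read_chapter`, `read_nas`, and `read_mongo` are hypothetical names that do not appear in this commit, and the actual read path sits in the suppressed diff, so this is not a description of the real implementation.

```python
from typing import Callable, Optional

# A reader takes a chapter id and returns its text, or None on a miss.
Reader = Callable[[str], Optional[str]]


def read_chapter(chapter_id: str, mode: str, read_nas: Reader, read_mongo: Reader) -> Optional[str]:
    """Try the preferred store first, then fall back to the other one."""
    order = [read_nas, read_mongo] if mode == "nas_first" else [read_mongo, read_nas]
    for reader in order:
        try:
            content = reader(chapter_id)
        except OSError:
            # Assumption: an unreachable mount or missing file counts as a miss.
            content = None
        if content:
            return content
    return None


# Hypothetical in-memory stand-ins for the two stores.
nas_store = {"c1": "from nas"}
mongo_store = {"c1": "from mongo", "c2": "only in mongo"}

first_choice = read_chapter("c1", "nas_first", nas_store.get, mongo_store.get)
fallback = read_chapter("c2", "nas_first", nas_store.get, mongo_store.get)
mongo_pref = read_chapter("c1", "mongo_first", nas_store.get, mongo_store.get)
```

With this shape, switching `CHAPTER_CONTENT_MODE` to `mongo_first` only reverses the lookup order, which is what makes it usable as a quick rollback switch.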
@@ -0,0 +1,30 @@
+# Rollout Checklist - NAS Chapter Storage
+
+## Pre-Deploy
+- [ ] Backup PostgreSQL schema + critical tables
+- [ ] Verify NAS mount/access permissions in API runtime
+- [ ] Enable feature flags (default: Mongo fallback on)
+
+## Deploy Order
+1. Deploy DB migrations
+2. Deploy API with dual-read disabled by default
+3. Enable discover/approve/convert job APIs
+4. Run pilot import set (small curated EPUB batch)
+5. Enable NAS-first for pilot users/env
+6. Gradually ramp NAS-first traffic
+
+## Runtime Verification
+- [ ] `/api/health` stable
+- [ ] Chapter read success rate >= target
+- [ ] NAS read timeout/error rate below threshold
+- [ ] Mongo fallback rate trending down
+
+## Rollback
+- [ ] Switch feature flag to Mongo-first immediately
+- [ ] Stop import jobs
+- [ ] Keep imported refs for investigation (no destructive cleanup)
+
+## Post-Deploy
+- [ ] Compare chapter counts and random content samples
+- [ ] Review failed/review_required import queue
+- [ ] Publish release notes for web/mobile teams
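Review note on the "Verify NAS mount/access permissions in API runtime" item: one way to make that check repeatable is a small probe run inside the API container. The sketch below is a hedged example, not part of this commit. `check_mount` is a hypothetical helper, and it assumes the content root needs write access while the EPUB source only needs read access.

```python
import os
import tempfile
from pathlib import Path


def check_mount(root: str, need_write: bool) -> list[str]:
    """Return the problems found for one mount folder (an empty list means OK)."""
    problems: list[str] = []
    path = Path(root)
    if not path.is_dir():
        return [f"{root}: not a directory"]
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{root}: not readable")
    if need_write:
        try:
            # Create and auto-delete a probe file to test real write access.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-"):
                pass
        except OSError as exc:
            problems.append(f"{root}: not writable ({exc})")
    return problems


if __name__ == "__main__":
    # Defaults mirror the env mapping documented in the README hunk.
    content_root = os.environ.get("NAS_CONTENT_ROOT", "/data/content")
    epub_root = os.environ.get("EPUB_SOURCE_ROOT", "/data/epub-source")
    issues = check_mount(content_root, need_write=True) + check_mount(epub_root, need_write=False)
    print(issues or "mounts look usable")
```

The probe file is used instead of relying on `os.access` alone because permission checks on network filesystems can disagree with what an actual write does.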
+4 -1
@@ -121,7 +121,10 @@ async def resolve_current_user(db: AsyncSession, request: Request) -> dict[str,
     return await _get_user_from_session_cookie(db, request)
 
 
-async def require_current_user(db: AsyncSession, request: Request) -> dict[str, Any]:
+async def require_current_user(
+    request: Request,
+    db: AsyncSession = Depends(get_db_session),
+) -> dict[str, Any]:
     user = await resolve_current_user(db, request)
     if not user:
         raise HTTPException(status_code=401, detail="Unauthorized")
@@ -20,6 +20,9 @@ class Settings(BaseSettings):
     r2_secret_access_key: str = ""
     r2_bucket_name: str = ""
     r2_public_base_url: str = ""
+    nas_content_root: str = "./data/content"
+    epub_source_root: str = "./data/epub-source"
+    chapter_content_mode: str = "nas_first"  # nas_first | mongo_first
 
     deepseek_key: str = ""
     deepseek_model: str = "deepseek-chat"
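Review note: `chapter_content_mode` is a plain `str`, so a typo in `.env` would not be caught by the type alone. A minimal sketch of an explicit check follows. `parse_content_mode` and `ALLOWED_MODES` are hypothetical names, not part of this commit, and whether the suppressed diff already validates the value is unknown.

```python
from typing import Optional

ALLOWED_MODES = ("nas_first", "mongo_first")


def parse_content_mode(raw: Optional[str], default: str = "nas_first") -> str:
    """Normalize the flag and reject anything outside the two documented values."""
    value = (raw or default).strip().lower()
    if value not in ALLOWED_MODES:
        raise ValueError(f"CHAPTER_CONTENT_MODE must be one of {ALLOWED_MODES}, got {raw!r}")
    return value


unset_mode = parse_content_mode(None)
normalized_mode = parse_content_mode(" Mongo_First ")
```

Failing at startup on an unknown value is one possible design choice. Silently falling back to the default would be the other, at the cost of hiding a misconfigured rollout.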
+956 -16
File diff suppressed because it is too large
@@ -0,0 +1,33 @@
+from __future__ import annotations
+
+import hashlib
+from pathlib import Path
+
+from app.config import settings
+
+
+class NasContentStorage:
+    def __init__(self, root_dir: str):
+        self.root = Path(root_dir).resolve()
+        self.root.mkdir(parents=True, exist_ok=True)
+
+    def _resolve(self, href: str) -> Path:
+        rel = href.strip().lstrip("/")
+        target = (self.root / rel).resolve()
+        if self.root not in target.parents and target != self.root:
+            raise ValueError("Invalid storage href")
+        return target
+
+    def read_text(self, href: str) -> str:
+        path = self._resolve(href)
+        return path.read_text(encoding="utf-8")
+
+    def write_text(self, href: str, content: str) -> dict[str, str | int]:
+        path = self._resolve(href)
+        path.parent.mkdir(parents=True, exist_ok=True)
+        path.write_text(content, encoding="utf-8")
+        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
+        return {"href": href, "sha256": digest, "size": len(content.encode("utf-8"))}
+
+
+storage = NasContentStorage(settings.nas_content_root)
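Review note on `_resolve`: the containment check is what keeps an href such as `../outside.txt` from escaping the storage root. The standalone sketch below mirrors that logic so it can be run without `app.config`. `resolve_under` is a hypothetical free-function copy written for illustration and is not part of the commit.

```python
import tempfile
from pathlib import Path


def resolve_under(root: Path, href: str) -> Path:
    """Mirror of the `_resolve` check: the resolved target must stay inside root."""
    target = (root / href.strip().lstrip("/")).resolve()
    if root not in target.parents and target != root:
        raise ValueError("Invalid storage href")
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()

    # A leading slash is stripped, so the href is treated as relative to root.
    inside = resolve_under(root, "/legacy/abc.txt")
    relative = inside.relative_to(root).as_posix()

    # A parent-directory href resolves outside root and is rejected.
    try:
        resolve_under(root, "../outside.txt")
        escaped = True
    except ValueError:
        escaped = False
```

Resolving both the root and the target before comparing matters here: without `.resolve()`, a `..` segment would still appear to sit under the root path.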
@@ -9,6 +9,12 @@ services:
       - .env
     ports:
       - "8000:8000"
+    environment:
+      NAS_CONTENT_ROOT: ${NAS_CONTENT_ROOT:-/data/content}
+      EPUB_SOURCE_ROOT: ${EPUB_SOURCE_ROOT:-/data/epub-source}
+    volumes:
+      - nas_chapter_content:/data/content
+      - nas_epub_source:/data/epub-source
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
@@ -30,8 +36,13 @@ services:
     environment:
       DATABASE_URL: postgresql://reader:reader@postgres:5432/reader
       MONGODB_URI: mongodb://mongo:27017/reader
+      NAS_CONTENT_ROOT: ${NAS_CONTENT_ROOT:-/data/content}
+      EPUB_SOURCE_ROOT: ${EPUB_SOURCE_ROOT:-/data/epub-source}
     ports:
       - "8001:8000"
+    volumes:
+      - nas_chapter_content:/data/content
+      - nas_epub_source:/data/epub-source
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
@@ -104,3 +115,5 @@ volumes:
   web_uploads:
   postgres_data:
   mongo_data:
+  nas_chapter_content:
+  nas_epub_source:
@@ -0,0 +1,43 @@
+CREATE EXTENSION IF NOT EXISTS unaccent;
+
+CREATE TABLE IF NOT EXISTS "SourceAsset" (
+  id TEXT PRIMARY KEY,
+  path TEXT NOT NULL,
+  sha256 TEXT NOT NULL,
+  opf_identifier TEXT,
+  title TEXT,
+  author TEXT,
+  status TEXT NOT NULL DEFAULT 'discovered',
+  "createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  "updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE UNIQUE INDEX IF NOT EXISTS "SourceAsset_sha256_key" ON "SourceAsset"(sha256);
+
+CREATE TABLE IF NOT EXISTS "ImportJob" (
+  id TEXT PRIMARY KEY,
+  "sourceAssetId" TEXT NOT NULL REFERENCES "SourceAsset"(id) ON DELETE CASCADE,
+  status TEXT NOT NULL DEFAULT 'pending',
+  error TEXT,
+  "createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  "updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE TABLE IF NOT EXISTS "ChapterContentRef" (
+  "chapterId" TEXT PRIMARY KEY,
+  "txtHref" TEXT NOT NULL,
+  "rawHtmlHref" TEXT NOT NULL,
+  "contentHash" TEXT NOT NULL,
+  "createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  "updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE TABLE IF NOT EXISTS "AssetNovelMapping" (
+  id TEXT PRIMARY KEY,
+  "sourceAssetId" TEXT NOT NULL REFERENCES "SourceAsset"(id) ON DELETE CASCADE,
+  "novelId" TEXT NOT NULL,
+  status TEXT NOT NULL DEFAULT 'pending',
+  note TEXT,
+  "createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  "updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
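Review note: because `"chapterId"` is the primary key of `"ChapterContentRef"`, the backfill's `ON CONFLICT ("chapterId") DO NOTHING` insert can be re-run safely. The sketch below demonstrates that behavior with an in-memory SQLite database as a stand-in for PostgreSQL. This is an assumption made only to keep the example self-contained: the timestamp columns are omitted, the row values are made up, and SQLite 3.24 or newer is required for the `ON CONFLICT` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "ChapterContentRef" ('
    '"chapterId" TEXT PRIMARY KEY, "txtHref" TEXT NOT NULL, '
    '"rawHtmlHref" TEXT NOT NULL, "contentHash" TEXT NOT NULL)'
)

insert_sql = (
    'INSERT INTO "ChapterContentRef" ("chapterId", "txtHref", "rawHtmlHref", "contentHash") '
    "VALUES (:chapter_id, :txt_href, :raw_href, :hash) "
    'ON CONFLICT ("chapterId") DO NOTHING'
)

# Hypothetical row; the second insert reuses the id with a different hash.
row = {
    "chapter_id": "abc",
    "txt_href": "legacy/abc.txt",
    "raw_href": "legacy/abc.raw.html",
    "hash": "first",
}
conn.execute(insert_sql, row)
conn.execute(insert_sql, {**row, "hash": "second"})

count = conn.execute('SELECT COUNT(*) FROM "ChapterContentRef"').fetchone()[0]
stored_hash = conn.execute('SELECT "contentHash" FROM "ChapterContentRef"').fetchone()[0]
```

Note the consequence of `DO NOTHING`: a second run never refreshes an existing ref, so a chapter whose content changed in Mongo keeps its original `contentHash` unless the row is updated some other way.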
@@ -0,0 +1,118 @@
+from __future__ import annotations
+
+import argparse
+import asyncio
+import hashlib
+import json
+from pathlib import Path
+from bson import ObjectId
+from sqlalchemy import text
+
+from app.config import settings
+from app.database import SessionLocal, mongo_db
+from app.storage import storage
+
+
+async def backfill(limit: int, dry_run: bool, after_id: str | None, state_file: str | None) -> None:
+    query = {
+        "$or": [
+            {"content": {"$exists": True, "$type": "string", "$ne": ""}},
+            {"contentHtml": {"$exists": True, "$type": "string", "$ne": ""}},
+        ]
+    }
+    if after_id:
+        query["_id"] = {"$gt": ObjectId(after_id)}
+
+    docs = (
+        await mongo_db["chapters"]
+        .find(query, {"content": 1, "contentHtml": 1})
+        .sort("_id", 1)
+        .limit(limit)
+        .to_list(limit)
+    )
+
+    mapped = 0
+    skipped = 0
+    async with SessionLocal() as db:
+        for doc in docs:
+            chapter_id = str(doc.get("_id") or "")
+            if not chapter_id:
+                skipped += 1
+                continue
+
+            exists = (
+                await db.execute(
+                    text('SELECT "chapterId" FROM "ChapterContentRef" WHERE "chapterId" = :id LIMIT 1'),
+                    {"id": chapter_id},
+                )
+            ).mappings().first()
+            if exists:
+                skipped += 1
+                continue
+
+            txt = str(doc.get("content") or "").strip()
+            raw_html = str(doc.get("contentHtml") or doc.get("content") or "")
+            if not txt:
+                skipped += 1
+                continue
+
+            txt_href = f"legacy/{chapter_id}.txt"
+            raw_href = f"legacy/{chapter_id}.raw.html"
+            content_hash = hashlib.sha256(txt.encode("utf-8")).hexdigest()
+
+            if not dry_run:
+                storage.write_text(txt_href, txt)
+                storage.write_text(raw_href, raw_html)
+                await db.execute(
+                    text(
+                        'INSERT INTO "ChapterContentRef" ("chapterId", "txtHref", "rawHtmlHref", "contentHash") '
+                        'VALUES (:chapter_id, :txt_href, :raw_href, :hash) '
+                        'ON CONFLICT ("chapterId") DO NOTHING'
+                    ),
+                    {
+                        "chapter_id": chapter_id,
+                        "txt_href": txt_href,
+                        "raw_href": raw_href,
+                        "hash": content_hash,
+                    },
+                )
+            mapped += 1
+
+        if not dry_run:
+            await db.commit()
+
+    last_id = str(docs[-1]["_id"]) if docs else None
+    summary = {
+        "scanned": len(docs),
+        "mapped": mapped,
+        "skipped": skipped,
+        "dryRun": dry_run,
+        "contentRoot": settings.nas_content_root,
+        "nextAfterId": last_id,
+    }
+    if state_file and last_id and not dry_run:
+        Path(state_file).write_text(json.dumps({"afterId": last_id}, ensure_ascii=True), encoding="utf-8")
+    print(summary)
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Backfill ChapterContentRef from Mongo chapters")
+    parser.add_argument("--limit", type=int, default=1000)
+    parser.add_argument("--dry-run", action="store_true")
+    parser.add_argument("--after-id", type=str, default="")
+    parser.add_argument("--state-file", type=str, default="")
+    args = parser.parse_args()
+    after_id = args.after_id.strip() or None
+    state_file = args.state_file.strip() or None
+    if state_file and not after_id:
+        p = Path(state_file)
+        if p.exists():
+            try:
+                after_id = json.loads(p.read_text(encoding="utf-8")).get("afterId")
+            except Exception:
+                after_id = None
+    asyncio.run(backfill(limit=args.limit, dry_run=args.dry_run, after_id=after_id, state_file=state_file))
+
+
+if __name__ == "__main__":
+    main()
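Review note on checkpoint/resume: the script processes one batch per invocation and stores the last `_id` it saw, so a full backfill is a loop of invocations until a batch comes back empty. The sketch below shows that loop with a sorted list of strings standing in for the Mongo collection. `load_after_id`, `fetch_batch`, and `run_all` are hypothetical helpers written for illustration, and the ids are made-up 24-character hex strings rather than real ObjectIds.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_after_id(state_path: Path) -> Optional[str]:
    """Read the checkpoint; a missing or corrupt file means start from the beginning."""
    try:
        return json.loads(state_path.read_text(encoding="utf-8")).get("afterId")
    except (OSError, ValueError, AttributeError):
        return None


def fetch_batch(ids: list[str], after_id: Optional[str], limit: int) -> list[str]:
    """Keyset pagination over sorted ids: take ids strictly greater than the cursor."""
    return [i for i in ids if after_id is None or i > after_id][:limit]


def run_all(ids: list[str], state_path: Path, limit: int) -> int:
    """Run batches until one comes back empty; return the number of batches run."""
    batches = 0
    while True:
        batch = fetch_batch(ids, load_after_id(state_path), limit)
        if not batch:
            return batches
        # Per-document work (write files, insert refs) would happen here.
        state_path.write_text(json.dumps({"afterId": batch[-1]}), encoding="utf-8")
        batches += 1


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / ".backfill_state.json"
    all_ids = sorted(f"{n:024x}" for n in range(1, 8))
    batches_run = run_all(all_ids, state, limit=3)
    final_cursor = load_after_id(state)
```

Paging by "greater than the last id" rather than by offset keeps each batch cheap and stable even if documents are inserted while the backfill is running.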