feat(storage): implement NAS content storage with read/write capabilities
Build and Push Reader API Image / docker (push) Successful in 1m3s
Build and Push Reader API Image / docker (push) Successful in 1m3s
feat(docker): configure NAS content and EPUB source directories in docker-compose feat(migrations): add tables for SourceAsset, ImportJob, ChapterContentRef, and AssetNovelMapping feat(scripts): create backfill script for populating ChapterContentRef from MongoDB chapters
This commit is contained in:
+6
-1
@@ -11,4 +11,9 @@ build/
|
|||||||
.env
|
.env
|
||||||
.env.local
|
.env.local
|
||||||
# Local debug/test scripts
|
# Local debug/test scripts
|
||||||
test_*.py
|
test_*.py
|
||||||
|
|
||||||
|
# Local NAS mount test data
|
||||||
|
data/epub-source/*
|
||||||
|
data/content/*
|
||||||
|
data/nas-content/*
|
||||||
|
|||||||
@@ -90,6 +90,51 @@ Notes:
|
|||||||
- `api-local` listens on port `8001` and automatically points to `postgres` + `mongo` containers.
|
- `api-local` listens on port `8001` and automatically points to `postgres` + `mongo` containers.
|
||||||
- `web` listens on port `3000` and calls API internally through `http://api:8000`.
|
- `web` listens on port `3000` and calls API internally through `http://api:8000`.
|
||||||
|
|
||||||
|
### NAS mount points (chapter content + EPUB source)
|
||||||
|
|
||||||
|
API containers now reserve two mount folders:
|
||||||
|
|
||||||
|
- `/data/content`: converted chapter files (`txt` + `raw_html`)
|
||||||
|
- `/data/epub-source`: source EPUB library
|
||||||
|
|
||||||
|
Default env mapping (already wired in compose):
|
||||||
|
|
||||||
|
```env
|
||||||
|
NAS_CONTENT_ROOT=/data/content
|
||||||
|
EPUB_SOURCE_ROOT=/data/epub-source
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to bind to host folders for local testing:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
services:
|
||||||
|
api:
|
||||||
|
volumes:
|
||||||
|
- /absolute/local/path/content:/data/content
|
||||||
|
- /absolute/local/path/epub-source:/data/epub-source
|
||||||
|
```
|
||||||
|
|
||||||
|
If you want to use NFS-backed docker volumes, define them under `volumes:`. Example:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
volumes:
|
||||||
|
nas_chapter_content:
|
||||||
|
driver: local
|
||||||
|
driver_opts:
|
||||||
|
type: nfs
|
||||||
|
o: addr=100.93.79.10,nolock,soft,rw
|
||||||
|
device: ":/volume2/apps/reader-content"
|
||||||
|
|
||||||
|
nas_epub_source:
|
||||||
|
driver: local
|
||||||
|
driver_opts:
|
||||||
|
type: nfs
|
||||||
|
o: addr=100.93.79.10,nolock,soft,rw
|
||||||
|
device: ":/volume2/apps/reader-epub"
|
||||||
|
```
|
||||||
|
|
||||||
|
For your EPUB structure (folder per novel, multiple `.epub` parts inside), mount the parent folder to `/data/epub-source`.
|
||||||
|
|
||||||
## Implemented Endpoints
|
## Implemented Endpoints
|
||||||
|
|
||||||
- GET /api/health
|
- GET /api/health
|
||||||
@@ -109,6 +154,52 @@ Notes:
|
|||||||
- GET /api/truyen/suggest
|
- GET /api/truyen/suggest
|
||||||
- GET /api/chapters/{chapterId}
|
- GET /api/chapters/{chapterId}
|
||||||
|
|
||||||
|
## NAS Migration Ops
|
||||||
|
|
||||||
|
### 1) Apply SQL migration manually
|
||||||
|
|
||||||
|
Run SQL in `migrations/2026_04_nas_content_storage.sql` against PostgreSQL.
|
||||||
|
|
||||||
|
### 2) Backfill existing chapter content from Mongo -> NAS + ChapterContentRef
|
||||||
|
|
||||||
|
Dry-run first:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python scripts/backfill_chapter_content_refs.py --limit 1000 --dry-run
|
||||||
|
```
|
||||||
|
|
||||||
|
Then execute:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python scripts/backfill_chapter_content_refs.py --limit 1000
|
||||||
|
```
|
||||||
|
|
||||||
|
You can run multiple batches by increasing/changing `--limit`.
|
||||||
|
|
||||||
|
Checkpoint/resume mode:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python scripts/backfill_chapter_content_refs.py --limit 1000 --state-file .backfill_state.json
|
||||||
|
```
|
||||||
|
|
||||||
|
Or continue from a known ObjectId:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python scripts/backfill_chapter_content_refs.py --limit 1000 --after-id 680f7f3a2f0d53f4f2b7a123
|
||||||
|
```
|
||||||
|
|
||||||
|
## Chapter Read Cutover Flag
|
||||||
|
|
||||||
|
Set in `.env`:
|
||||||
|
|
||||||
|
```env
|
||||||
|
CHAPTER_CONTENT_MODE=nas_first
|
||||||
|
```
|
||||||
|
|
||||||
|
Values:
|
||||||
|
- `nas_first` (default): read NAS ref first, fallback Mongo.
|
||||||
|
- `mongo_first`: keep Mongo-first during cautious rollout.
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- Web session auth is supported via NextAuth session cookies (next-auth.session-token and secure variants).
|
- Web session auth is supported via NextAuth session cookies (next-auth.session-token and secure variants).
|
||||||
|
|||||||
@@ -0,0 +1,30 @@
|
|||||||
|
# Rollout Checklist - NAS Chapter Storage
|
||||||
|
|
||||||
|
## Pre-Deploy
|
||||||
|
- [ ] Backup PostgreSQL schema + critical tables
|
||||||
|
- [ ] Verify NAS mount/access permissions in API runtime
|
||||||
|
- [ ] Enable feature flags (default: Mongo fallback on)
|
||||||
|
|
||||||
|
## Deploy Order
|
||||||
|
1. Deploy DB migrations
|
||||||
|
2. Deploy API with dual-read disabled by default
|
||||||
|
3. Enable discover/approve/convert job APIs
|
||||||
|
4. Run pilot import set (small curated EPUB batch)
|
||||||
|
5. Enable NAS-first for pilot users/env
|
||||||
|
6. Gradually ramp NAS-first traffic
|
||||||
|
|
||||||
|
## Runtime Verification
|
||||||
|
- [ ] `/api/health` stable
|
||||||
|
- [ ] Chapter read success rate >= target
|
||||||
|
- [ ] NAS read timeout/error rate below threshold
|
||||||
|
- [ ] Mongo fallback rate trending down
|
||||||
|
|
||||||
|
## Rollback
|
||||||
|
- [ ] Switch feature flag to Mongo-first immediately
|
||||||
|
- [ ] Stop import jobs
|
||||||
|
- [ ] Keep imported refs for investigation (no destructive cleanup)
|
||||||
|
|
||||||
|
## Post-Deploy
|
||||||
|
- [ ] Compare chapter counts and random content samples
|
||||||
|
- [ ] Review failed/review_required import queue
|
||||||
|
- [ ] Publish release notes for web/mobile teams
|
||||||
+4
-1
@@ -121,7 +121,10 @@ async def resolve_current_user(db: AsyncSession, request: Request) -> dict[str,
|
|||||||
return await _get_user_from_session_cookie(db, request)
|
return await _get_user_from_session_cookie(db, request)
|
||||||
|
|
||||||
|
|
||||||
async def require_current_user(db: AsyncSession, request: Request) -> dict[str, Any]:
|
async def require_current_user(
|
||||||
|
request: Request,
|
||||||
|
db: AsyncSession = Depends(get_db_session),
|
||||||
|
) -> dict[str, Any]:
|
||||||
user = await resolve_current_user(db, request)
|
user = await resolve_current_user(db, request)
|
||||||
if not user:
|
if not user:
|
||||||
raise HTTPException(status_code=401, detail="Unauthorized")
|
raise HTTPException(status_code=401, detail="Unauthorized")
|
||||||
|
|||||||
@@ -20,6 +20,9 @@ class Settings(BaseSettings):
|
|||||||
r2_secret_access_key: str = ""
|
r2_secret_access_key: str = ""
|
||||||
r2_bucket_name: str = ""
|
r2_bucket_name: str = ""
|
||||||
r2_public_base_url: str = ""
|
r2_public_base_url: str = ""
|
||||||
|
nas_content_root: str = "./data/content"
|
||||||
|
epub_source_root: str = "./data/epub-source"
|
||||||
|
chapter_content_mode: str = "nas_first" # nas_first | mongo_first
|
||||||
|
|
||||||
deepseek_key: str = ""
|
deepseek_key: str = ""
|
||||||
deepseek_model: str = "deepseek-chat"
|
deepseek_model: str = "deepseek-chat"
|
||||||
|
|||||||
+956
-16
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,33 @@
|
|||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from app.config import settings
|
||||||
|
|
||||||
|
|
||||||
|
class NasContentStorage:
|
||||||
|
def __init__(self, root_dir: str):
|
||||||
|
self.root = Path(root_dir).resolve()
|
||||||
|
self.root.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
def _resolve(self, href: str) -> Path:
|
||||||
|
rel = href.strip().lstrip("/")
|
||||||
|
target = (self.root / rel).resolve()
|
||||||
|
if self.root not in target.parents and target != self.root:
|
||||||
|
raise ValueError("Invalid storage href")
|
||||||
|
return target
|
||||||
|
|
||||||
|
def read_text(self, href: str) -> str:
|
||||||
|
path = self._resolve(href)
|
||||||
|
return path.read_text(encoding="utf-8")
|
||||||
|
|
||||||
|
def write_text(self, href: str, content: str) -> dict[str, str | int]:
|
||||||
|
path = self._resolve(href)
|
||||||
|
path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
path.write_text(content, encoding="utf-8")
|
||||||
|
digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
|
||||||
|
return {"href": href, "sha256": digest, "size": len(content.encode("utf-8"))}
|
||||||
|
|
||||||
|
|
||||||
|
storage = NasContentStorage(settings.nas_content_root)
|
||||||
@@ -9,6 +9,12 @@ services:
|
|||||||
- .env
|
- .env
|
||||||
ports:
|
ports:
|
||||||
- "8000:8000"
|
- "8000:8000"
|
||||||
|
environment:
|
||||||
|
NAS_CONTENT_ROOT: ${NAS_CONTENT_ROOT:-/data/content}
|
||||||
|
EPUB_SOURCE_ROOT: ${EPUB_SOURCE_ROOT:-/data/epub-source}
|
||||||
|
volumes:
|
||||||
|
- nas_chapter_content:/data/content
|
||||||
|
- nas_epub_source:/data/epub-source
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
|
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
|
||||||
@@ -30,8 +36,13 @@ services:
|
|||||||
environment:
|
environment:
|
||||||
DATABASE_URL: postgresql://reader:reader@postgres:5432/reader
|
DATABASE_URL: postgresql://reader:reader@postgres:5432/reader
|
||||||
MONGODB_URI: mongodb://mongo:27017/reader
|
MONGODB_URI: mongodb://mongo:27017/reader
|
||||||
|
NAS_CONTENT_ROOT: ${NAS_CONTENT_ROOT:-/data/content}
|
||||||
|
EPUB_SOURCE_ROOT: ${EPUB_SOURCE_ROOT:-/data/epub-source}
|
||||||
ports:
|
ports:
|
||||||
- "8001:8000"
|
- "8001:8000"
|
||||||
|
volumes:
|
||||||
|
- nas_chapter_content:/data/content
|
||||||
|
- nas_epub_source:/data/epub-source
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
|
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/api/health').read()"]
|
||||||
@@ -104,3 +115,5 @@ volumes:
|
|||||||
web_uploads:
|
web_uploads:
|
||||||
postgres_data:
|
postgres_data:
|
||||||
mongo_data:
|
mongo_data:
|
||||||
|
nas_chapter_content:
|
||||||
|
nas_epub_source:
|
||||||
|
|||||||
@@ -0,0 +1,43 @@
|
|||||||
|
CREATE EXTENSION IF NOT EXISTS unaccent;
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS "SourceAsset" (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
path TEXT NOT NULL,
|
||||||
|
sha256 TEXT NOT NULL,
|
||||||
|
opf_identifier TEXT,
|
||||||
|
title TEXT,
|
||||||
|
author TEXT,
|
||||||
|
status TEXT NOT NULL DEFAULT 'discovered',
|
||||||
|
"createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
"updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE UNIQUE INDEX IF NOT EXISTS "SourceAsset_sha256_key" ON "SourceAsset"(sha256);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS "ImportJob" (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
"sourceAssetId" TEXT NOT NULL REFERENCES "SourceAsset"(id) ON DELETE CASCADE,
|
||||||
|
status TEXT NOT NULL DEFAULT 'pending',
|
||||||
|
error TEXT,
|
||||||
|
"createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
"updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS "ChapterContentRef" (
|
||||||
|
"chapterId" TEXT PRIMARY KEY,
|
||||||
|
"txtHref" TEXT NOT NULL,
|
||||||
|
"rawHtmlHref" TEXT NOT NULL,
|
||||||
|
"contentHash" TEXT NOT NULL,
|
||||||
|
"createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
"updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS "AssetNovelMapping" (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
"sourceAssetId" TEXT NOT NULL REFERENCES "SourceAsset"(id) ON DELETE CASCADE,
|
||||||
|
"novelId" TEXT NOT NULL,
|
||||||
|
status TEXT NOT NULL DEFAULT 'pending',
|
||||||
|
note TEXT,
|
||||||
|
"createdAt" TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
"updatedAt" TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||||
|
);
|
||||||
@@ -0,0 +1,118 @@
|
|||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from bson import ObjectId
|
||||||
|
from sqlalchemy import text
|
||||||
|
|
||||||
|
from app.config import settings
|
||||||
|
from app.database import SessionLocal, mongo_db
|
||||||
|
from app.storage import storage
|
||||||
|
|
||||||
|
|
||||||
|
async def backfill(limit: int, dry_run: bool, after_id: str | None, state_file: str | None) -> None:
|
||||||
|
query = {
|
||||||
|
"$or": [
|
||||||
|
{"content": {"$exists": True, "$type": "string", "$ne": ""}},
|
||||||
|
{"contentHtml": {"$exists": True, "$type": "string", "$ne": ""}},
|
||||||
|
]
|
||||||
|
}
|
||||||
|
if after_id:
|
||||||
|
query["_id"] = {"$gt": ObjectId(after_id)}
|
||||||
|
|
||||||
|
docs = (
|
||||||
|
await mongo_db["chapters"]
|
||||||
|
.find(query, {"content": 1, "contentHtml": 1})
|
||||||
|
.sort("_id", 1)
|
||||||
|
.limit(limit)
|
||||||
|
.to_list(limit)
|
||||||
|
)
|
||||||
|
|
||||||
|
mapped = 0
|
||||||
|
skipped = 0
|
||||||
|
async with SessionLocal() as db:
|
||||||
|
for doc in docs:
|
||||||
|
chapter_id = str(doc.get("_id") or "")
|
||||||
|
if not chapter_id:
|
||||||
|
skipped += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
exists = (
|
||||||
|
await db.execute(
|
||||||
|
text('SELECT "chapterId" FROM "ChapterContentRef" WHERE "chapterId" = :id LIMIT 1'),
|
||||||
|
{"id": chapter_id},
|
||||||
|
)
|
||||||
|
).mappings().first()
|
||||||
|
if exists:
|
||||||
|
skipped += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
txt = str(doc.get("content") or "").strip()
|
||||||
|
raw_html = str(doc.get("contentHtml") or doc.get("content") or "")
|
||||||
|
if not txt:
|
||||||
|
skipped += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
txt_href = f"legacy/{chapter_id}.txt"
|
||||||
|
raw_href = f"legacy/{chapter_id}.raw.html"
|
||||||
|
content_hash = hashlib.sha256(txt.encode("utf-8")).hexdigest()
|
||||||
|
|
||||||
|
if not dry_run:
|
||||||
|
storage.write_text(txt_href, txt)
|
||||||
|
storage.write_text(raw_href, raw_html)
|
||||||
|
await db.execute(
|
||||||
|
text(
|
||||||
|
'INSERT INTO "ChapterContentRef" ("chapterId", "txtHref", "rawHtmlHref", "contentHash") '
|
||||||
|
'VALUES (:chapter_id, :txt_href, :raw_href, :hash) '
|
||||||
|
'ON CONFLICT ("chapterId") DO NOTHING'
|
||||||
|
),
|
||||||
|
{
|
||||||
|
"chapter_id": chapter_id,
|
||||||
|
"txt_href": txt_href,
|
||||||
|
"raw_href": raw_href,
|
||||||
|
"hash": content_hash,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
mapped += 1
|
||||||
|
|
||||||
|
if not dry_run:
|
||||||
|
await db.commit()
|
||||||
|
|
||||||
|
last_id = str(docs[-1]["_id"]) if docs else None
|
||||||
|
summary = {
|
||||||
|
"scanned": len(docs),
|
||||||
|
"mapped": mapped,
|
||||||
|
"skipped": skipped,
|
||||||
|
"dryRun": dry_run,
|
||||||
|
"contentRoot": settings.nas_content_root,
|
||||||
|
"nextAfterId": last_id,
|
||||||
|
}
|
||||||
|
if state_file and last_id and not dry_run:
|
||||||
|
Path(state_file).write_text(json.dumps({"afterId": last_id}, ensure_ascii=True), encoding="utf-8")
|
||||||
|
print(summary)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description="Backfill ChapterContentRef from Mongo chapters")
|
||||||
|
parser.add_argument("--limit", type=int, default=1000)
|
||||||
|
parser.add_argument("--dry-run", action="store_true")
|
||||||
|
parser.add_argument("--after-id", type=str, default="")
|
||||||
|
parser.add_argument("--state-file", type=str, default="")
|
||||||
|
args = parser.parse_args()
|
||||||
|
after_id = args.after_id.strip() or None
|
||||||
|
state_file = args.state_file.strip() or None
|
||||||
|
if state_file and not after_id:
|
||||||
|
p = Path(state_file)
|
||||||
|
if p.exists():
|
||||||
|
try:
|
||||||
|
after_id = json.loads(p.read_text(encoding="utf-8")).get("afterId")
|
||||||
|
except Exception:
|
||||||
|
after_id = None
|
||||||
|
asyncio.run(backfill(limit=args.limit, dry_run=args.dry_run, after_id=after_id, state_file=state_file))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
Reference in New Issue
Block a user