Introduction
Starting from V10, Umo Editor Next has fully refactored its AI features. The related APIs have changed significantly and are not compatible with V9. Please upgrade if you are using an older version.
The goal of the V10 AI refactor is not to add more flashy features but to shift the core approach, upgrading AI from isolated features into a controllable capability system designed for editing workflows:
- Frontend: collect editor context in a unified way (selection / node-level Markdown / full document / cursor marker), stream-render in a unified way, and write results back reliably
- Backend: act as an adapter layer that can connect to any model, with a choice between the `default` and `agui` protocols, making it pluggable, extensible, and evolvable
This refactor primarily addresses common pain points when shipping AI features in real projects: unstable context and write-back (unclear what to change and how much), fragmented protocol/model integrations (high cost to switch models or add tool-calling), and unclear frontend/backend responsibilities leading to higher security and maintenance costs.
The benefits: AI output behaves more like “actions executed inside the editor”, write-back becomes more reliable, models and protocols become easier to extend, and private deployment/compliance becomes easier to guarantee.
The AI capabilities in Umo Editor Next are designed to make the model behave like a reliable execution-oriented editing assistant inside the editor—helping users complete document editing tasks safely, controllably, and efficiently, rather than acting as a generic chatbot. The key traits are “controllable, reproducible, and extensible”, and they heavily rely on editor context (selection, nodes, document structure, cursor position, etc.).
The core integration idea is:
- Frontend: collect editor context, send requests, and render SSE streaming output
- Backend: act as a business-side adapter/proxy layer, connect to any LLM provider (including mainstream OpenAI-compatible models), and output SSE chunks that follow the agreed format
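The backend half of this split can be sketched as a small translation function: take one streaming delta from an OpenAI-compatible provider and emit one SSE frame in whatever shape the frontend has agreed to parse. This is an illustrative sketch only; the upstream chunk shape follows the OpenAI chat-completions streaming format, while the outgoing `{ content: ... }` payload is a hypothetical agreed format, not the exact one Umo Editor Next prescribes.

```typescript
// Shape of one streaming delta from an OpenAI-compatible chat completions API.
interface UpstreamChunk {
  choices: { delta: { content?: string } }[];
}

// Convert an upstream delta into an SSE frame for the frontend.
// Returns null for chunks that carry no renderable text (e.g. role-only deltas).
function toSseFrame(chunk: UpstreamChunk): string | null {
  const text = chunk.choices[0]?.delta?.content;
  if (!text) return null;
  // SSE frames are "data: <payload>\n\n"; the payload shape here is an assumption.
  return `data: ${JSON.stringify({ content: text })}\n\n`;
}
```

Because the adapter only translates frames, swapping providers means changing the upstream call, not the frontend contract.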
Capabilities
- AI Document Assistant (Assistant): a floating entry focused on editing output for the current selection / near the cursor, with one-click replace/insert.
- AI Chat Assistant (Chat): a sidebar chat entry with history and optional attachments, suitable for multi-turn Q&A, task clarification, and information organization.
- AI Suggestions (Suggestion): auto-proposes candidate text while typing; accept with Tab/Enter (autocomplete/continuation), switch suggestions with arrow keys.
Key Features
Streaming output
- Supports SSE streaming responses: render as content is generated for faster interaction
- Supports non-streaming responses: return the full result at once (depending on your backend and model setup)
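On the consuming side, streaming rendering boils down to splitting the accumulated network buffer into complete `data:` events while keeping any partial trailing frame for the next read. A minimal sketch of that step, independent of any particular editor API:

```typescript
// Minimal SSE buffer parser: splits accumulated stream text into "data:" payloads.
// Real clients must keep `rest` (a possibly incomplete trailing frame) and
// prepend it to the next network read.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // possibly incomplete trailing frame
  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (line.startsWith("data: ")) events.push(line.slice(6));
    }
  }
  return { events, rest };
}
```

For non-streaming responses this machinery is unnecessary: the full result arrives in a single response body and can be rendered at once.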
Multiple protocols
- Supports the `default` protocol and the AG-UI standard protocol
- AG-UI has built-in frontend parsing, better suited for agent tool-calling, multi-step tasks, and progress/status events
Multiple models & reasoning mode
- Configure multiple AI models and switch them in the UI
- If the model supports reasoning, the UI provides a “Reasoning” toggle
Attachments
- Upload images/files in the AI Chat Assistant and send attachment metadata to your backend
- Display files and images in messages
Skill modes
Built-in skill modes guide the output format, including:
- `write`: generate or rewrite Markdown
- `code`: output Markdown code blocks
- `mermaid`: output Mermaid diagram code blocks
- `image`: output image-related Markdown (requires backend support)
- `search`: interpret/summarize web content (more for understanding, not direct document editing)
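One common way to implement skill modes is to map each mode to a system-prompt hint that constrains the output format. The mode names below come from the list above; the hint text itself is an illustrative assumption, not the prompts Umo Editor Next actually ships.

```typescript
// Hypothetical mapping from skill mode to a system-prompt hint.
type SkillMode = "write" | "code" | "mermaid" | "image" | "search";

const skillHints: Record<SkillMode, string> = {
  write: "Respond with Markdown prose that can replace or extend the selection.",
  code: "Respond with a single fenced Markdown code block.",
  mermaid: "Respond with a Mermaid diagram code block only.",
  image: "Respond with image Markdown, e.g. ![alt](url).",
  search: "Summarize and interpret web content; do not edit the document.",
};

function buildSystemPrompt(mode: SkillMode): string {
  return `You are an editing assistant. ${skillHints[mode]}`;
}
```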
Write-back actions
After the AI output completes, a full “result handling loop” is provided:
- Replace: replace the current selection with the result
- Insert: insert the result at the cursor position
- Copy: copy the result to the clipboard
- Rewrite: regenerate the result
Pluggable request lifecycle hooks
You can inject custom logic during the request lifecycle:
- `onRequest`: rewrite requests (headers/body/routing/auth, etc.)
- `onMessage`: in the `default` protocol, map SSE chunks to renderable content segments
- `onStart` / `onComplete` / `onAbort` / `onError`: start/finish/abort/error handling
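The hooks above can be pictured as a small pipeline around the request. The interfaces below are assumptions sketched from the hook names, not the exact Umo Editor Next signatures, so check them against the API reference before relying on them.

```typescript
// Illustrative hook shapes; real signatures may differ.
interface AiRequest {
  url: string;
  headers: Record<string, string>;
  body: unknown;
}

interface AiHooks {
  onRequest?: (req: AiRequest) => AiRequest;      // rewrite headers/body/routing/auth
  onMessage?: (chunk: string) => string | null;   // map an SSE chunk to renderable text
  onStart?: () => void;
  onComplete?: () => void;
  onAbort?: () => void;
  onError?: (err: unknown) => void;
}

// Apply onRequest before sending, e.g. to attach a server-issued session token
// (never a raw provider key) to the outgoing request.
function prepareRequest(base: AiRequest, hooks: AiHooks): AiRequest {
  return hooks.onRequest ? hooks.onRequest(base) : base;
}
```

A typical use is injecting per-user auth in `onRequest` while leaving the rest of the lifecycle to the defaults.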
Privacy & security (control-first by default)
- No secrets in the frontend: auth and key management are handled by your backend
- You control the context: decide what context can be sent, how to desensitize it, and whether to log it on the server
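The "no secrets in the frontend" rule can be sketched as a server-side function that builds the upstream provider request: the browser sends only the prompt and whatever context you chose to allow, and the server injects the provider key from its own configuration. The endpoint URL, payload shape, and model name below are illustrative assumptions.

```typescript
// The client payload carries no credentials, only prompt and optional context.
interface ClientPayload {
  prompt: string;
  context?: string;
}

// Runs on your backend: the provider key comes from server config
// (e.g. an environment variable), never from the client.
function buildUpstreamRequest(payload: ClientPayload, apiKey: string) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // example model name
      stream: true,
      messages: [
        ...(payload.context
          ? [{ role: "system", content: payload.context }]
          : []),
        { role: "user", content: payload.prompt },
      ],
    }),
  };
}
```

Desensitization and server-side logging fit naturally into this function, since all outbound context passes through it.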