Frontend/Backend Flow
This chapter explains the full pipeline from “user clicks Send” to “AI response is received and written back into the editor”, and highlights how the frontend params help the model do a better job.
End-to-end sequence diagram
Key points in this flow:
- Build and send `params`: collect context + user input, then let the chat engine send the request
- Stream parsing and rendering: `default` uses `callbacks.onMessage`; `agui` uses built-in parsing
- Write back into the editor: content is written back after the user clicks actions like "Replace/Insert"
params structure (the real backend input)
params fields (roughly ordered by importance):
```js
{
  systemPrompt,   // System prompt with {cursorMarker} already substituted
  prompt,         // User prompt (what the user asked for)
  skill,          // Selected skill: 'write' | 'image' | 'code' | 'mermaid' | 'search'
  from,           // Source: 'assistant' | 'chat'
  reasoning,      // Reasoning enabled (boolean)
  model,          // Current model object (value/label/endpoint/protocol/...)
  messages,       // All messages in this session (array)
  locale,         // 'zh-CN' | 'en-US'
  selectionText,  // Selected plain text (may be empty)
  selectionNodes, // Selection as node-level Markdown (may be empty; may include markers)
  document,       // Full document Markdown (with cursor/selection markers)
  cursorMarker,   // Cursor marker (default '⦙')
  chatID,         // Chat session id (same id implies a continuous session)
  documentID,     // Document id (for retrieval/archiving in your app)
  userID,         // User id (for quotas/audit in your app)
  attachments,    // Optional: attachments array (chat only, when files.enabled)
}
```

The fields most related to editing quality are:
- `systemPrompt`
- `skill`
- `selectionNodes` / `document`
- `cursorMarker` (embedded in `systemPrompt` and in `document`/`selectionNodes`)
How the backend should use params (recommended prompt strategy)
Put systemPrompt in the system role
It is strongly recommended to put `systemPrompt` into the model's system message (or top-level instruction), not mixed into the user prompt.
- `systemPrompt` contains hard constraints such as "output must be directly insertable/replaceable" and "must not break Markdown structure"
- This significantly reduces fluff, structure-breaking output, and out-of-scope edits
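A minimal sketch of this split, assuming an OpenAI-style `role`/`content` message shape (the field names come from `params`; the message shape is an assumption about your model API):

```typescript
// Keep hard constraints in the system role; the user turn carries only the request.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(params: {
  systemPrompt: string;
  prompt: string;
  messages?: ChatMessage[];
}): ChatMessage[] {
  return [
    { role: "system", content: params.systemPrompt }, // hard constraints live here
    ...(params.messages ?? []),                       // prior turns in this chatID
    { role: "user", content: params.prompt },         // the user's actual request
  ];
}
```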
Route and constrain by skill
skill is not just for UI—it should drive your backend strategy:
- `write`: allow generating/rewriting Markdown; output should be directly insertable/replaceable content
- `code`: output should be Markdown code blocks (avoid explanatory text)
- `mermaid`: output Mermaid code blocks
- `image`: return image Markdown
- `search`: return summaries/explanations, but do not rewrite document structure (unless your product definition allows it)
Recommended: define different backend templates (or agent toolchains) for each skill.
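One way to sketch per-skill templates, with illustrative rule strings (the wording of each rule is an assumption; adapt it to your own prompt library):

```typescript
// Map each skill to an output constraint appended to the base system prompt.
type Skill = "write" | "image" | "code" | "mermaid" | "search";

const skillRules: Record<Skill, string> = {
  write: "Return Markdown that can be inserted or replace the selection as-is.",
  code: "Return a single Markdown code block. No explanatory prose.",
  mermaid: "Return one Mermaid code block only.",
  image: "Return image Markdown: ![alt](url).",
  search: "Summarize and explain; do not rewrite the document structure.",
};

function systemPromptFor(skill: Skill, base: string): string {
  return `${base}\n\n${skillRules[skill]}`;
}
```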
Prefer selectionNodes over selectionText
In selection scenarios:
- `selectionText` loses structure (lists/tables/code blocks)
- `selectionNodes` preserves node-level Markdown, is more stable, and may include start/end markers; it can include multiple nodes
Backend strategy:
- If `selectionNodes` is not empty: treat it as the only allowed editable range (strict)
- If there is no selection but `document` contains a cursor marker: treat the nearby context as a locally editable area
- If both are empty: ask the user to select content or confirm an insertion point (or apply a backend fallback)
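The fallback chain above can be sketched as follows. This assumes the marker character appears only as a cursor/selection marker, which is the contract described in this chapter:

```typescript
// Resolve what the backend is allowed to edit, in priority order.
type EditScope =
  | { kind: "selection"; nodes: string }
  | { kind: "cursor" }
  | { kind: "none" };

function resolveEditScope(p: {
  selectionNodes: string;
  document: string;
  cursorMarker: string;
}): EditScope {
  if (p.selectionNodes.trim() !== "") {
    return { kind: "selection", nodes: p.selectionNodes }; // strict editable range
  }
  if (p.document.includes(p.cursorMarker)) {
    return { kind: "cursor" }; // edit locally around the marker
  }
  return { kind: "none" }; // ask the user, or apply a backend fallback
}
```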
Interpret cursorMarker correctly (so the model knows what to change)
`ai.cursorMarker` (default `⦙`) can appear in:
- `document`: one marker marks the cursor; two markers mark a selection (start/end)
- `selectionNodes`: may contain start/end markers; if insertion is not possible, they may be wrapped at the beginning/end
- `systemPrompt`: its rules reference this marker
Treat the cursor marker as a structured instruction, not a normal character:
- Selection: only modify content between the two markers
- Cursor: insert content at the marker; do not change content outside the marker
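A small sketch of interpreting the marker on the backend, splitting the document into "before / target / after" (again assuming the marker character occurs only as a marker):

```typescript
// One marker → cursor position; two markers → selection span.
function splitOnMarkers(document: string, marker: string) {
  const parts = document.split(marker);
  if (parts.length === 3) {
    // Selection: only `target` (between the two markers) may be modified.
    return { before: parts[0], target: parts[1], after: parts[2] };
  }
  if (parts.length === 2) {
    // Cursor: insert between `before` and `after`; nothing may be rewritten.
    return { before: parts[0], target: "", after: parts[1] };
  }
  return null; // no marker present
}
```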
Use from to distinguish interaction semantics
- `assistant`: editing-oriented output; require content that can be written back
- `chat`: conversation-oriented output; allow more explanation/clarification (still keep it concise)
Recommended: maintain two prompt strategies:
- assistant: strict output, minimal explanation, insertable
- chat: more natural, multi-turn, can ask for missing information
How to use attachments
Attachments appear only when:
- `from === 'chat'`
- `ai.chat.files.enabled === true`
- The user uploaded files and the upload completed (status is not `progress`)
`attachments` comes from your `onFileUpload` return value.
Common backend strategies:
- Images: OCR or vision models, then use extracted content for responses
- Documents/PDF: chunk + retrieval (RAG), then add relevant snippets into context
- Tables: parse into structured data, then let the model answer
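The dispatch above can be sketched by MIME type. The attachment shape (`name`/`type`/`url`) is an assumption here; it should mirror whatever your `onFileUpload` returns:

```typescript
// Pick a backend strategy per attachment based on its MIME type.
type Attachment = { name: string; type: string; url: string };

function attachmentStrategy(att: Attachment): "vision" | "rag" | "table" | "skip" {
  if (att.type.startsWith("image/")) return "vision"; // OCR or vision model
  if (att.type === "application/pdf") return "rag";   // chunk + retrieval (RAG)
  if (att.type === "text/csv") return "table";        // parse into structured data
  return "skip";                                      // unsupported type
}
```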
Recommended backend interface shapes
SSE + custom protocol (default)
- You control the JSON output structure
- Frontend maps chunks via `ai.callbacks.onMessage`
Minimum requirement: each SSE event outputs a single line of JSON (do not output Markdown with real newlines directly). See: Umo Editor Server AI SSE demo endpoints.
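A sketch of the single-line requirement: serializing each chunk with `JSON.stringify` escapes real newlines as `\n`, so Markdown content never breaks SSE framing. The `{ content }` payload shape is an assumption and must match what your `onMessage` parser expects:

```typescript
// Emit one SSE event per chunk; the JSON payload stays on a single line.
function writeSseChunk(write: (s: string) => void, content: string): void {
  write(`data: ${JSON.stringify({ content })}\n\n`);
}
```

On the server, `write` would typically be bound to `res.write` on a Node `http.ServerResponse` with `Content-Type: text/event-stream`.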
SSE + AG-UI (agui)
- You output AG-UI event types
- Frontend parses and renders automatically (great for agent tool calls)
If you plan to do tool calling, multi-step tasks, or progress/status, prefer AG-UI. See: Umo Editor Server AI SSE demo endpoints.