Stream LLM responses to the browser with Server-Sent Events: token-by-token output from the API, with proper error handling and abort support.
## Task

Stream LLM responses token-by-token from server to client using Server-Sent Events.

## Requirements

- Server: Node.js/Next.js API route or Python FastAPI
- Client: React with streaming state management
- LLM: any provider with streaming support (OpenAI, Anthropic, etc.)

## Server Implementation

```
POST /api/chat
Request:  { messages: Message[], model: string }
Response: text/event-stream

Event format:
data: {"token": "Hello"}
data: {"token": " world"}
data: {"done": true, "usage": {"input": 50, "output": 12}}
data: [DONE]
```

## Client Implementation

- Use fetch() with ReadableStream, NOT EventSource (EventSource cannot send POST requests)
- Parse SSE lines from the stream
- Update the UI token by token, batching updates with requestAnimationFrame
- Show a typing indicator before the first token arrives
- Support abort via AbortController

## Implementation Notes

1. Handle backpressure: don't buffer unlimited tokens
2. Add a timeout for the first token (10 s) and between tokens (30 s)
3. Client-side: batch DOM updates every 50 ms for performance
4. Include error events so failures can be displayed gracefully
5. Support markdown rendering as tokens arrive (tricky; use an incremental parser)
6. Clean up on component unmount (abort the controller)
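A minimal sketch of the server-side event format, assuming the Web Streams API (available in Next.js route handlers and Node 18+). The `tokens` iterator stands in for whatever streaming call the provider SDK exposes; the pull-based design also addresses the backpressure note, since events are only produced when the consumer reads:

```typescript
// Sketch: wrap a provider's async token iterator in an SSE-formatted stream.
// Using pull() instead of eagerly looping means a slow client naturally
// applies backpressure rather than forcing the server to buffer every token.
function sseStream(
  tokens: AsyncIterator<string>,
  usage: () => { input: number; output: number },
): ReadableStream<Uint8Array> {
  const enc = new TextEncoder();
  let finished = false;
  return new ReadableStream({
    async pull(controller) {
      if (finished) {
        // Final sentinel after the usage event, per the format above.
        controller.enqueue(enc.encode("data: [DONE]\n\n"));
        controller.close();
        return;
      }
      const { value, done } = await tokens.next();
      if (done) {
        finished = true;
        controller.enqueue(
          enc.encode(`data: ${JSON.stringify({ done: true, usage: usage() })}\n\n`),
        );
      } else {
        controller.enqueue(enc.encode(`data: ${JSON.stringify({ token: value })}\n\n`));
      }
    },
  });
}
```

In a (hypothetical) Next.js route handler this would be returned as `new Response(sseStream(...), { headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" } })`.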
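On the client, the SSE lines have to be reassembled from raw byte chunks, which do not respect line boundaries. A sketch of an incremental parser under that assumption (the event shape matches the wire format above):

```typescript
// Sketch: incrementally parse "data:" lines from an SSE byte stream,
// buffering partial lines across chunk boundaries.
async function* parseSSE(
  stream: ReadableStream<Uint8Array>,
): AsyncGenerator<{ token?: string; done?: boolean; error?: string }> {
  const dec = new TextDecoder();
  const reader = stream.getReader();
  let buf = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buf += dec.decode(value, { stream: true });
    let nl: number;
    while ((nl = buf.indexOf("\n")) !== -1) {
      const line = buf.slice(0, nl).trim();
      buf = buf.slice(nl + 1);
      if (!line.startsWith("data:")) continue; // skip blank separator lines
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") return; // end-of-stream sentinel
      yield JSON.parse(payload);
    }
  }
}
```

Usage with `fetch()` and abort would look roughly like: create an `AbortController`, pass its `signal` to `fetch("/api/chat", { method: "POST", ... })`, then `for await (const ev of parseSSE(res.body))` append `ev.token` to state; calling `abort()` on unmount rejects the pending read and ends the loop.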
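The two timeouts in the notes (10 s to first token, 30 s between tokens) can be enforced by racing each iterator step against a deadline. A sketch of one way to do this, wrapping any token iterator:

```typescript
// Sketch: enforce a per-step deadline on a token iterator. The first step
// uses firstMs (time to first token); later steps use betweenMs.
async function* withTokenTimeouts<T>(
  source: AsyncIterable<T>,
  firstMs = 10_000,
  betweenMs = 30_000,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  let limit = firstMs;
  for (;;) {
    let timer!: ReturnType<typeof setTimeout>;
    const deadline = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`no token within ${limit}ms`)), limit);
    });
    try {
      const { value, done } = await Promise.race([it.next(), deadline]);
      if (done) return;
      yield value as T;
    } finally {
      clearTimeout(timer); // avoid leaking a timer per token
    }
    limit = betweenMs; // after the first token, switch to the inter-token deadline
  }
}
```

The rejection surfaces in the consuming `for await` loop, where it can be rendered as an error event in the UI.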
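For note 3, one way to batch DOM updates is to coalesce tokens and flush at most once per scheduler tick. This sketch makes the scheduler injectable (hypothetical helper, not part of any library) so it can be driven by `requestAnimationFrame`, a 50 ms `setTimeout`, or a test harness:

```typescript
// Sketch: coalesce streamed tokens so the UI updates at most once per
// scheduler tick instead of on every token.
class TokenBatcher {
  private pending = "";
  private scheduled = false;
  constructor(
    private flushFn: (text: string) => void,     // e.g. React setState / DOM append
    private schedule: (run: () => void) => void, // e.g. cb => setTimeout(cb, 50)
  ) {}
  push(token: string): void {
    this.pending += token;
    if (!this.scheduled) {
      this.scheduled = true; // one flush scheduled per batch, no matter how many tokens arrive
      this.schedule(() => this.flush());
    }
  }
  flush(): void {
    this.scheduled = false;
    if (this.pending) {
      this.flushFn(this.pending);
      this.pending = "";
    }
  }
}
```

In a React component, `flushFn` would append to the streamed-text state; on unmount, call the `AbortController`'s `abort()` and a final `flush()` so no tokens are lost.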