Embedding-Based Code Search — Semantic IDE

Build a semantic code search engine using embeddings. Search by intent ('function that validates email') rather than exact text matches.

by Promptsy Team

Backend Development Data Science #semantic #vector-db #code-search #embeddings #developer-tools

Prompt

## Task
Semantic code search using embeddings — find code by intent, not just keywords.

## Requirements
- Embeddings: OpenAI text-embedding-3-small or CodeBERT
- Vector store: pgvector, Qdrant, or ChromaDB
- Language: Python or TypeScript

## Pipeline
```
Indexing:
1. Walk codebase, extract functions/classes/modules
2. For each code chunk, create an embedding of:
   - Code itself
   - Auto-generated description (via LLM)
   - Docstring/comments
3. Store in vector DB with metadata (file, line, language)

Querying:
1. User types: "function that retries HTTP requests with backoff"
2. Embed the query
3. Find top 10 nearest code chunks
4. Rerank by: similarity score × recency × file relevance
5. Return with file path, line numbers, and preview
```

## Chunking Strategy
```
- Functions: one chunk per function (with signature + body)
- Classes: one chunk per class (signature + method signatures)
- Modules: one chunk for top-level code per file
- Max chunk: 512 tokens
- Include 2 lines of context above and below
```
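As a rough sketch of the function, class, and module rules above, the following uses Python's standard `ast` module as a stand-in for tree-sitter, so it handles Python source only. The `chunk_python_source` name and the dictionary layout are hypothetical. Treating each method as its own function chunk is an assumption, and the 512-token cap and the two lines of surrounding context are left out because they depend on the tokenizer in use.

```python
import ast


def chunk_python_source(source, path="<memory>"):
    """Split Python source into function, class, and module chunks."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []

    def add(kind, name, start, end, text):
        chunks.append({
            "kind": kind, "name": name, "file": path,
            "start_line": start, "end_line": end, "text": text,
        })

    def segment(node):
        # Full source text of a node, using 1-based inclusive line numbers.
        return "\n".join(lines[node.lineno - 1:node.end_lineno])

    def signature(node):
        # First line of the definition only; multi-line signatures are cut short.
        return lines[node.lineno - 1].strip()

    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    top_level = []
    for node in tree.body:
        if isinstance(node, func_types):
            # Functions: signature plus body.
            add("function", node.name, node.lineno, node.end_lineno, segment(node))
        elif isinstance(node, ast.ClassDef):
            methods = [n for n in node.body if isinstance(n, func_types)]
            for m in methods:
                # Assumption: each method also gets its own function chunk.
                add("function", f"{node.name}.{m.name}",
                    m.lineno, m.end_lineno, segment(m))
            # Classes: class signature plus method signatures, no bodies.
            outline = [signature(node)] + ["    " + signature(m) for m in methods]
            add("class", node.name, node.lineno, node.end_lineno, "\n".join(outline))
        else:
            # Imports, constants, and other top-level statements.
            top_level.append(segment(node))
    if top_level:
        # Modules: one chunk holding the remaining top-level code.
        add("module", path, 1, len(lines), "\n".join(top_level))
    return chunks
```

Each chunk carries the file and line metadata that the indexing step stores alongside the embedding.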

## Implementation Notes
1. Incremental indexing: only re-embed changed files (use git diff)
2. Language-aware parsing: use tree-sitter for AST extraction
3. Hybrid search: combine vector similarity with BM25 keyword scoring
4. Cache embeddings with content hash
5. Support multiple languages (TypeScript, Python, Go, Rust)
6. CLI tool: `codesearch "retry with exponential backoff"`
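Notes 1 and 4 can be illustrated with a small cache keyed by content hash. This is a minimal sketch under assumptions: the `EmbeddingCache` class, the JSON file format, and the `embed_fn` callable are hypothetical, and `embed_fn` stands in for whatever embedding client is actually used.

```python
import hashlib
import json
from pathlib import Path


class EmbeddingCache:
    """Hypothetical on-disk cache mapping content hashes to embeddings."""

    def __init__(self, cache_path):
        self.cache_path = Path(cache_path)
        self.entries = {}
        if self.cache_path.exists():
            self.entries = json.loads(self.cache_path.read_text(encoding="utf-8"))

    @staticmethod
    def content_hash(text):
        # SHA-256 of the chunk text; identical text always maps to the same key.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_embed(self, text, embed_fn):
        """Return the cached vector, calling embed_fn only for unseen text."""
        key = self.content_hash(text)
        if key not in self.entries:
            self.entries[key] = list(embed_fn(text))
        return self.entries[key]

    def save(self):
        # Persist the cache so later indexing runs can reuse it.
        self.cache_path.write_text(json.dumps(self.entries), encoding="utf-8")
```

Because the key depends only on the chunk text, a chunk that is moved or sits in a file touched by an unrelated change is not re-embedded. Using `git diff` to limit which files are re-parsed would sit in front of this cache.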
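Note 3 does not say how the vector and BM25 results should be combined. One common option is reciprocal rank fusion, sketched here as an assumption rather than the intended method. The function takes ranked lists of chunk IDs, assumed to come from the vector store and a BM25 index, and the constant `k=60` is a conventional default, not a value from the notes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1,
    so items ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
```

Rank-based fusion sidesteps the problem that cosine similarities and BM25 scores are on different scales, at the cost of ignoring how large the score gaps are.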

Compatible models

Copilot (GitHub), Claude Code
