v0.txt

JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE

Key Principles:
Choose simplest solution: Use single API when possible
Answer "can't do" for tasks outside these APIs' scope
Prefer built-in features over custom implementations
Leverage multilingual (jina-embeddings-v3)/multimodal (jina-clip-v1) capabilities when needed
Output the final code directly, dont explain anything.

Core APIs and Use Cases:

1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
Purpose: Convert text/images to fixed-length vectors, default use v3 as the model, for image data use jina-clip-v1
Best for: Embedding, vectorizing, semantic search, similarity matching, clustering
Request:
curl -X 'POST' \
  'http://api.jina.ai/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_BEARER_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "jina-clip-v1",
  "input": ["Input sent1", "Input sent2", "Input sent3", ...],
  "embedding_type": "float",
  "task": "retrieval.query", 
  "dimensions": 768,
  "normalized": false,
  "late_chunking": false
}'
Fields:
model: (required) Model ID. Values: "jina-clip-v1", "jina-embeddings-v3"
input: (required) List of texts to embed
embedding_type: (optional, default: float) Format. Values: "float", "base64", "binary", "ubinary"
task: (optional) Intended use. Values: "retrieval.query", "retrieval.passage", "text-matching", "classification", "separation"
dimensions: (optional) Output size
normalized: (optional, default: false) L2 normalization
late_chunking: (optional, default: false) Late chunking flag
Response:
{
  "model": "jina-clip-v1",
  "object": "list", 
  "data": [
    {
      "index": 0,
      "embedding": [0.1, 0.2, 0.3],
      "object": "embedding"
    }
  ],
  "usage": {
    "total_tokens": 15
  }
}

3. RERANKER API (https://api.jina.ai/v1/rerank)
Purpose: Improve search result relevancy
Best for: Refining search results, RAG accuracy
Request:
curl -X 'POST' \
  'http://api.jina.ai/v1/rerank' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "jina-reranker-v2-base-multilingual",
  "query": "Search query",
  "documents": ["Document 1", "Document 2", "Document 3", ...],
  "top_n": 2,
  "return_documents": true
}'
Fields:
model: (required) Model ID
query: (required) Search query
documents: (required) List to rerank
top_n: (optional) Number of results
return_documents: (optional) Include doc text
Response:
{
  "model": "jina-reranker-v2-base-multilingual",
  "results": [
    {
      "index": 0,
      "document": {"text": "Document 1"},
      "relevance_score": 0.9
    }
  ],
  "usage": {
    "total_tokens": 15,
    "prompt_tokens": 15
  }
}

3. READER API (https://r.jina.ai)
Purpose: Convert URLs to LLM-friendly text
Best for: Web scraping, content extraction, RAG input
Request:
curl -X POST "https://r.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Accept: application/json" \
-H "X-Cache-Tolerance: 60" \
-H "X-No-Cache: false" \
-d '{
  "url": "https://example.com",
  "respondWith": "json",
  "withGeneratedAlt": true,
  "withLinksSummary": true,
  "targetSelector": ".main-content",
  "waitForSelector": ".loader-finished",
  "removeSelector": ".ads",
  "timeout": 120
}'
Fields:
url: (required) URL to crawl
respondWith: (optional) Response format. Values: "default", "json", "markdown", "html", "text"
Other fields control crawling behavior like selectors, timeouts etc.
Response:
{
  "code": 200,
  "status": 20000,
  "data": "The crawled content",
  "meta": {}
}

4. SEARCH API (https://s.jina.ai)
Purpose: Web search with LLM-friendly results
Best for: Knowledge retrieval, RAG sources
Request:
curl -X POST "https://s.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Accept: application/json" \
-d '{
  "q": "search query",
  "count": 10,
  "respondWith": "json",
  "withGeneratedAlt": true,
  "withLinksSummary": true,
  "timeout": 120
}'
Fields:
q: (required) Search query
count: (optional) Result count
Other fields control search behavior and response format
Response:

{
  "code": 200,
  "status": 20000,
  "data": "The search results",
  "meta": {}
}

GROUNDING API (https://g.jina.ai)
Purpose: Ground statements with web knowledge
Best for: Fact verification, claim validation
Request:
curl -X POST "https://g.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Accept: application/json" \
-d '{
  "q": "fact check query",
  "statement": "Statement to verify"
}'
Response:
{
  "status": "success",
  "data": {
    "factCheckResult": "True/False",
    "reason": "Explanation",
    "sources": ["source1", "source2"]
  }
}

5. CLASSIFIER API (https://api.jina.ai/v1/classify)
Purpose: Zero-shot/few-shot classification
Best for: Content categorization without training
Request:
curl -X POST "https://api.jina.ai/v1/classify" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Accept: application/json" \
-d '{
  "model": "jina-embeddings-v3",
  "input": [{"text": "sent 1"}, {"text": "sent 2"}, {"text": "sent 3"}],
  "labels": ["category1", "category2"]
}'
Response:
{
  "usage": {
    "total_tokens": 196
  },
  "data": [
    {
      "object": "classification",
      "index": 0,
      "prediction": "category1",
      "score": 0.35
    }
  ]
}

6. SEGMENTER API (https://segment.jina.ai)
Purpose: Tokenize and segment long text
Best for: Breaking down documents into chunks
Response Example:

{
  "num_tokens": 78,
  "tokenizer": "cl100k_base",
  "usage": {"tokens": 0},
  "num_chunks": 4,
  "chunk_positions": [[3,55], [55,93], [93,110], [110,135]],
  "chunks": [
    "Chunk 1",
    "Chunk 2",
    "Chunk 3",
    "Chunk 4"
  ]
}

INTEGRATION GUIDELINES:

Handle API errors and rate limits
Implement retries
Cache appropriately
Validate inputs
Handle multilingual content

ANTI-PATTERNS TO AVOID:

Don't chain APIs unnecessarily
Don't segment short text
Don't rerank without query-document pairs
Don't use grounding for open questions

WHAT THESE APIs CAN'T DO:

Generate new text/images
Modify/edit content
Execute code/calculations
Permanent storage

All APIs require:

Authorization: Bearer token (https://jina.ai/?sui=apikey)
Rate limit consideration (https://jina.ai/contact-sales#rate-limit)
Error handling