JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE
Key Principles:
Choose the simplest solution: use a single API when possible
Answer "can't do" for tasks outside these APIs' scope
Prefer built-in features over custom implementations
Leverage multilingual (jina-embeddings-v3)/multimodal (jina-clip-v1) capabilities when needed
Output the final code directly; don't explain anything.
Core APIs and Use Cases:
1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
Purpose: Convert text/images to fixed-length vectors. Use jina-embeddings-v3 by default; use jina-clip-v1 for image data
Best for: Embedding, vectorizing, semantic search, similarity matching, clustering
Request:
curl -X 'POST' \
'https://api.jina.ai/v1/embeddings' \
-H 'accept: application/json' \
-H 'Authorization: Bearer YOUR_BEARER_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"model": "jina-clip-v1",
"input": ["Input sent1", "Input sent2", "Input sent3", ...],
"embedding_type": "float",
"task": "retrieval.query",
"dimensions": 768,
"normalized": false,
"late_chunking": false
}'
Fields:
model: (required) Model ID. Values: "jina-clip-v1", "jina-embeddings-v3"
input: (required) List of texts to embed
embedding_type: (optional, default: float) Format. Values: "float", "base64", "binary", "ubinary"
task: (optional) Intended use. Values: "retrieval.query", "retrieval.passage", "text-matching", "classification", "separation"
dimensions: (optional) Output size
normalized: (optional, default: false) L2 normalization
late_chunking: (optional, default: false) Late chunking flag
Response:
{
"model": "jina-clip-v1",
"object": "list",
"data": [
{
"index": 0,
"embedding": [0.1, 0.2, 0.3],
"object": "embedding"
}
],
"usage": {
"total_tokens": 15
}
}
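The request and response above can be sketched in Python as follows. This is a minimal sketch using only the standard library: the helper names (build_embeddings_request, extract_embeddings, cosine_similarity) are hypothetical, and the default task value is an illustrative choice from the list above. The request is built but not sent.

```python
import json
import math
import urllib.request

EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"


def build_embeddings_request(texts, token, model="jina-embeddings-v3",
                             task="retrieval.passage"):
    """Build (but do not send) a POST request for the Embeddings API."""
    payload = {
        "model": model,
        "input": list(texts),
        "task": task,
        "normalized": True,  # L2-normalize so dot product equals cosine
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def extract_embeddings(response_body):
    """Return the vectors ordered by each item's `index` field."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Passing the built request to urllib.request.urlopen and decoding the body with json.load would perform the actual call; the parsing helpers work on the decoded dictionary.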
2. RERANKER API (https://api.jina.ai/v1/rerank)
Purpose: Improve search result relevancy
Best for: Refining search results, RAG accuracy
Request:
curl -X 'POST' \
'https://api.jina.ai/v1/rerank' \
-H 'accept: application/json' \
-H 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"model": "jina-reranker-v2-base-multilingual",
"query": "Search query",
"documents": ["Document 1", "Document 2", "Document 3", ...],
"top_n": 2,
"return_documents": true
}'
Fields:
model: (required) Model ID
query: (required) Search query
documents: (required) List to rerank
top_n: (optional) Number of results
return_documents: (optional) Include doc text
Response:
{
"model": "jina-reranker-v2-base-multilingual",
"results": [
{
"index": 0,
"document": {"text": "Document 1"},
"relevance_score": 0.9
}
],
"usage": {
"total_tokens": 15,
"prompt_tokens": 15
}
}
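A rough Python sketch of preparing the rerank payload and reading the results shown above. The function names and the min_score cutoff are hypothetical; the field names come from the request and response examples.

```python
import json


def build_rerank_payload(query, documents, top_n=None,
                         model="jina-reranker-v2-base-multilingual"):
    """Serialize a rerank request body; needs a query and documents."""
    if not query or not documents:
        raise ValueError("rerank needs a query and at least one document")
    payload = {
        "model": model,
        "query": query,
        "documents": list(documents),
        "return_documents": True,
    }
    if top_n is not None:
        # Asking for more results than documents makes no sense; clamp it.
        payload["top_n"] = min(top_n, len(documents))
    return json.dumps(payload)


def ranked_texts(response_body, min_score=0.0):
    """Return (text, score) pairs, best first, dropping low scores."""
    results = sorted(response_body["results"],
                     key=lambda r: r["relevance_score"], reverse=True)
    return [(r["document"]["text"], r["relevance_score"])
            for r in results if r["relevance_score"] >= min_score]
```

ranked_texts assumes return_documents was true in the request; without it the "document" key would be absent and the original documents would have to be looked up by "index".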
3. READER API (https://r.jina.ai)
Purpose: Convert URLs to LLM-friendly text
Best for: Web scraping, content extraction, RAG input
Request:
curl -X POST "https://r.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-Cache-Tolerance: 60" \
-H "X-No-Cache: false" \
-d '{
"url": "https://example.com",
"respondWith": "json",
"withGeneratedAlt": true,
"withLinksSummary": true,
"targetSelector": ".main-content",
"waitForSelector": ".loader-finished",
"removeSelector": ".ads",
"timeout": 120
}'
Fields:
url: (required) URL to crawl
respondWith: (optional) Response format. Values: "default", "json", "markdown", "html", "text"
Other fields control crawling behavior, such as CSS selectors and timeouts.
Response:
{
"code": 200,
"status": 20000,
"data": "The crawled content",
"meta": {}
}
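One way to assemble the Reader request from Python, as a sketch. build_reader_request is a hypothetical helper; optional fields are passed as keyword arguments using the names from the request above, and any that are None are left out of the body. The request is only built, not sent.

```python
import json
import urllib.request

READER_URL = "https://r.jina.ai/"


def build_reader_request(url, token, **options):
    """Build a Reader API request; None-valued options are dropped."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    body = {"url": url, "respondWith": "json"}
    # e.g. targetSelector=".main-content", removeSelector=".ads", timeout=120
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        READER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Validating the URL before building the request keeps an obviously malformed input from costing a round trip.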
4. SEARCH API (https://s.jina.ai)
Purpose: Web search with LLM-friendly results
Best for: Knowledge retrieval, RAG sources
Request:
curl -X POST "https://s.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"q": "search query",
"count": 10,
"respondWith": "json",
"withGeneratedAlt": true,
"withLinksSummary": true,
"timeout": 120
}'
Fields:
q: (required) Search query
count: (optional) Result count
Other fields control search behavior and response format
Response:
{
"code": 200,
"status": 20000,
"data": "The search results",
"meta": {}
}
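The Search response uses the same code/status/data/meta envelope as the Reader response. A small sketch of validating the query and unwrapping that envelope follows; both function names are hypothetical, and treating any code other than 200 as an error is an assumption.

```python
def build_search_body(query, count=10):
    """Return the JSON-serializable body for a search request."""
    query = query.strip()
    if not query:
        raise ValueError("search query must not be empty")
    if count < 1:
        raise ValueError("count must be a positive integer")
    return {"q": query, "count": count, "respondWith": "json"}


def unwrap_envelope(response_body):
    """Return (data, meta) from a decoded response, or raise on failure."""
    code = response_body.get("code")
    if code != 200:  # assumption: 200 is the only success code
        raise RuntimeError(f"Jina API returned code {code!r}")
    return response_body["data"], response_body.get("meta", {})
```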
GROUNDING API (https://g.jina.ai)
Purpose: Ground statements with web knowledge
Best for: Fact verification, claim validation
Request:
curl -X POST "https://g.jina.ai/" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"q": "fact check query",
"statement": "Statement to verify"
}'
Response:
{
"status": "success",
"data": {
"factCheckResult": "True/False",
"reason": "Explanation",
"sources": ["source1", "source2"]
}
}
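A sketch of turning the grounding response above into a simple verdict. summarize_grounding is a hypothetical helper. The example response writes factCheckResult as "True/False", so the code assumes the value arrives either as a boolean or as the string "True" or "False"; that is an assumption, not confirmed by the source.

```python
def summarize_grounding(response_body):
    """Reduce a decoded grounding response to verdict, reason and sources."""
    if response_body.get("status") != "success":
        raise RuntimeError(f"grounding failed: {response_body.get('status')!r}")
    data = response_body["data"]
    # Accept both True/False booleans and "True"/"False" strings (assumption).
    supported = str(data["factCheckResult"]).strip().lower() == "true"
    return {
        "supported": supported,
        "reason": data.get("reason", ""),
        "sources": list(data.get("sources", [])),
    }
```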
5. CLASSIFIER API (https://api.jina.ai/v1/classify)
Purpose: Zero-shot/few-shot classification
Best for: Content categorization without training
Request:
curl -X POST "https://api.jina.ai/v1/classify" \
-H "Authorization: Bearer YOUR_JINA_TOKEN" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"model": "jina-embeddings-v3",
"input": [{"text": "sent 1"}, {"text": "sent 2"}, {"text": "sent 3"}],
"labels": ["category1", "category2"]
}'
Response:
{
"usage": {
"total_tokens": 196
},
"data": [
{
"object": "classification",
"index": 0,
"prediction": "category1",
"score": 0.35
}
]
}
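The classification results above refer back to the inputs by index. A minimal sketch of joining them, with a hypothetical label_inputs helper and a hypothetical min_score cutoff below which a prediction is treated as undecided:

```python
def label_inputs(texts, response_body, min_score=0.0):
    """Pair each input text with its predicted label and score.

    Predictions scoring below `min_score` get the label None.
    """
    labeled = []
    for item in response_body["data"]:
        text = texts[item["index"]]
        label = item["prediction"] if item["score"] >= min_score else None
        labeled.append((text, label, item["score"]))
    return labeled
```

Zero-shot scores can be low even for the winning label (0.35 in the example response), so any cutoff would need tuning on real data.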
6. SEGMENTER API (https://segment.jina.ai)
Purpose: Tokenize and segment long text
Best for: Breaking down documents into chunks
Response Example:
{
"num_tokens": 78,
"tokenizer": "cl100k_base",
"usage": {"tokens": 0},
"num_chunks": 4,
"chunk_positions": [[3,55], [55,93], [93,110], [110,135]],
"chunks": [
"Chunk 1",
"Chunk 2",
"Chunk 3",
"Chunk 4"
]
}
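The source shows only a response for the Segmenter, so the request body in this sketch is an assumption: the field names content, return_chunks and max_chunk_length are not taken from this guide and should be checked against the Segmenter documentation. The second helper assumes chunk_positions holds [start, end) character offsets into the submitted text, which the contiguous pairs in the example suggest.

```python
import json


def build_segment_payload(text, max_chunk_length=1000):
    """Serialize a segmenter request body (field names are assumptions)."""
    return json.dumps({
        "content": text,
        "return_chunks": True,
        "max_chunk_length": max_chunk_length,
    })


def chunks_from_positions(text, chunk_positions):
    """Rebuild chunks locally from [start, end) character offsets."""
    return [text[start:end] for start, end in chunk_positions]
```

Rebuilding chunks from positions is useful when only the offsets are stored alongside the original document.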
INTEGRATION GUIDELINES:
Handle API errors and rate limits
Implement retries
Cache appropriately
Validate inputs
Handle multilingual content
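The error-handling and retry guidelines above could be sketched as a small wrapper with exponential backoff. Everything here is hypothetical: the send callable is assumed to return an (http_status, decoded_body) pair, and retrying only on 429 and 5xx is one common policy, not something the source prescribes.

```python
import time


def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it succeeds or attempts run out.

    `send` must return (http_status, body). Rate limits (429) and server
    errors (5xx) are retried with delays of base_delay * 1, 2, 4, ...
    Any other non-200 status raises immediately.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting.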
ANTI-PATTERNS TO AVOID:
Don't chain APIs unnecessarily
Don't segment short text
Don't rerank without query-document pairs
Don't use grounding for open questions
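The anti-patterns above can be approximated with a few pre-flight checks. These are rough heuristics under stated assumptions: the function names are hypothetical, the 1000-character threshold is arbitrary, and treating a trailing question mark as "open question" is only a crude signal.

```python
def should_segment(text, max_chars=1000):
    """Segment only text longer than a (hypothetical) length threshold."""
    return len(text) > max_chars


def can_rerank(query, documents):
    """Reranking needs a non-empty query and at least one non-empty document."""
    has_query = bool(query and query.strip())
    has_docs = any(doc and doc.strip() for doc in documents)
    return has_query and has_docs


def is_checkable_statement(text):
    """Grounding expects a declarative claim, not an open question."""
    stripped = text.strip()
    return bool(stripped) and not stripped.endswith("?")
```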
WHAT THESE APIs CAN'T DO:
Generate new text/images
Modify/edit content
Execute code/calculations
Store data permanently
All APIs require:
Authorization: Bearer token (https://jina.ai/?sui=apikey)
Rate limit consideration (https://jina.ai/contact-sales#rate-limit)
Error handling
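A minimal sketch of supplying the Bearer token to every call. The auth_headers helper and the JINA_API_KEY environment variable name are assumptions for illustration; the header names match the curl examples above.

```python
import os


def auth_headers(token=None, env_var="JINA_API_KEY"):
    """Return the common request headers, reading the token from the
    environment when it is not passed explicitly."""
    token = token or os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"no API token: pass one or set {env_var}")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
```

Reading the token from the environment keeps it out of source code and version control.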