TinyRetriever: HTTP Requests Made Easy

TinyRetriever is a lightweight synchronous wrapper around AIOHTTP that hides the complexity of making asynchronous HTTP requests. It is designed to be simple and efficient, and it builds on AIOHTTP and AIOFiles, two popular Python libraries for asynchronous HTTP requests and asynchronous file I/O, respectively.

📚 Full documentation is available here.

Features

TinyRetriever provides the following features:

Concurrent Downloads: Efficiently download multiple files simultaneously
Flexible Response Types: Get responses as text, JSON, or binary data
Rate Limiting: Built-in per-host connection limiting to respect server constraints
Streaming Support: Stream large files efficiently with customizable chunk sizes
Unique Filenames: Generate unique filenames based on query parameters
Works in Jupyter Notebooks: Easily use TinyRetriever in Jupyter notebooks without any additional setup or dependencies
Robust Error Handling: Optional status raising and comprehensive error messages
Performance Optimized: Uses orjson when available for up to 14x faster JSON parsing
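
An optional speedup such as the orjson one listed above is commonly implemented with an import fallback. The snippet below is a minimal sketch of that general pattern, not TinyRetriever's actual code; `parse_json` is a hypothetical helper name used only for illustration.

```python
import json

try:
    # orjson is an optional third-party package; use it when it is installed
    import orjson

    def parse_json(payload: bytes):
        """Parse JSON bytes with orjson."""
        return orjson.loads(payload)

except ImportError:

    def parse_json(payload: bytes):
        """Parse JSON bytes with the standard library as a fallback."""
        return json.loads(payload)


data = parse_json(b'{"key": "value"}')
print(data["key"])
```

Either branch returns the same Python objects, so calling code does not need to know which parser is active.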

TinyRetriever does not use nest-asyncio. Instead, it creates and manages a dedicated thread that runs its own event loop. This lets you use TinyRetriever in Jupyter notebooks and other environments where an event loop is already running.
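
To illustrate the dedicated-thread idea, here is a minimal sketch built only on the standard library. It is not TinyRetriever's implementation; `LoopThread` and `double` are hypothetical names, and the real library may manage its loop differently.

```python
import asyncio
import threading


class LoopThread:
    """Run an asyncio event loop in a dedicated daemon thread (hypothetical helper)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        """Submit a coroutine to the background loop and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def close(self) -> None:
        """Stop the loop, wait for the thread to exit, and release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def double(x: int) -> int:
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return 2 * x


runner = LoopThread()
result = runner.run(double(21))
runner.close()
print(result)
```

Because the coroutine runs on a loop owned by another thread, the caller can block on the result even when its own thread already has a running event loop, which is the situation in a Jupyter notebook.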

There are three main functions in TinyRetriever:

download: Download files concurrently;
fetch: Fetch queries concurrently and return responses as text, JSON, or binary;
unique_filename: Generate unique filenames based on query parameters.

Installation

Choose your preferred installation method:

Using `pip`

pip install tiny-retriever

Using `micromamba`

micromamba install -c conda-forge tiny-retriever

Alternatively, you can use conda or mamba.

Quick Start Guide

Please refer to the documentation for detailed usage instructions and more elaborate examples.

Downloading Files

from pathlib import Path
import tiny_retriever as terry

urls = ["https://example.com/file1.pdf", "https://example.com/file2.pdf"]
paths = [Path("downloads/file1.pdf"), Path("downloads/file2.pdf")]
# or generate unique filenames
paths = (terry.unique_filename(u) for u in urls)
paths = [Path("downloads", p) for p in paths]

# Download files concurrently
terry.download(urls, paths)

Fetching Data

urls = ["https://api.example.com/data1", "https://api.example.com/data2"]

# Get JSON responses
json_responses = terry.fetch(urls, "json")

# Get text responses
text_responses = terry.fetch(urls, "text")

# Get binary responses
binary_responses = terry.fetch(urls, "binary")

Generate Unique Filenames

url = "https://api.example.com/data"
params = {"key": "value"}

# Generate unique filename based on URL and parameters
filename = terry.unique_filename(url, params=params, file_extension=".json")
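
One common way to derive a stable filename from a URL and its parameters is to hash a canonical serialization of both. The sketch below shows that idea under stated assumptions: `sketch_unique_filename` is a hypothetical function, and the choice of SHA-256 over sorted JSON is an illustration, not necessarily what `unique_filename` does internally.

```python
import hashlib
import json
from typing import Optional


def sketch_unique_filename(
    url: str, params: Optional[dict] = None, file_extension: str = ""
) -> str:
    """Return a deterministic filename for a URL and its query parameters."""
    # Sort keys so that equal parameter sets always serialize identically
    payload = json.dumps({"url": url, "params": params or {}}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{digest}{file_extension}"


name_a = sketch_unique_filename(
    "https://api.example.com/data", {"a": 1, "b": 2}, ".json"
)
name_b = sketch_unique_filename(
    "https://api.example.com/data", {"b": 2, "a": 1}, ".json"
)
print(name_a == name_b)
```

The same request always maps to the same file, so a cached response can be found again without keeping a separate index.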

Advanced Usage

Custom Request Parameters

You can also pass a single URL, along with a dictionary of request parameters, to the fetch function. The default network-related parameters are conservative and can be adjusted as needed.

urls = "https://api.example.com/data"
kwargs = {"headers": {"Authorization": "Bearer token"}}

responses = terry.fetch(
    urls,
    return_type="json",
    request_method="post",
    request_kwargs=kwargs,
    limit_per_host=2,
    timeout=30,
)
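
To show what a per-host limit means in practice, the following sketch caps the number of in-flight tasks per host with one `asyncio.Semaphore` per host. It uses only the standard library and a sleep in place of a real request; `run_limited` is a hypothetical helper and does not reflect how TinyRetriever enforces `limit_per_host`.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def run_limited(urls, limit_per_host=2):
    """Process URLs concurrently with at most `limit_per_host` in flight per host."""
    semaphores = defaultdict(lambda: asyncio.Semaphore(limit_per_host))
    in_flight = defaultdict(int)
    peak = defaultdict(int)

    async def worker(url):
        host = urlsplit(url).netloc
        async with semaphores[host]:
            in_flight[host] += 1
            peak[host] = max(peak[host], in_flight[host])
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            in_flight[host] -= 1
        return url

    # gather preserves the input order of the results
    results = await asyncio.gather(*(worker(u) for u in urls))
    return results, dict(peak)


demo_urls = [f"https://api.example.com/data{i}" for i in range(6)]
results, peak = asyncio.run(run_limited(demo_urls, limit_per_host=2))
print(peak)
```

The recorded peak never exceeds the limit, even though all six tasks are started at once.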

Error Handling

from tiny_retriever import fetch, ServiceError

try:
    responses = fetch(urls, return_type="json", raise_status=True)
except ServiceError as e:
    print(f"Request failed: {e}")

Configuration

TinyRetriever can be configured through the following environment variable:

MAX_CONCURRENT_CALLS: Maximum number of concurrent requests (default: 10)

The other defaults are:

Chunk size for downloads: 1 MB
Timeout: 5 minutes
Connections per host: 4
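
As a sketch of how an environment-variable setting like `MAX_CONCURRENT_CALLS` is typically read, the snippet below falls back to the documented default of 10. `max_concurrent_calls` is a hypothetical helper, and whether TinyRetriever reads the variable at import time or at call time is not stated here, so setting it before importing the package is the safer assumption.

```python
import os


def max_concurrent_calls(default: int = 10) -> int:
    """Read MAX_CONCURRENT_CALLS from the environment, falling back to the default."""
    raw = os.environ.get("MAX_CONCURRENT_CALLS")
    if raw is None:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError("MAX_CONCURRENT_CALLS must be a positive integer")
    return value


# Set the variable for the current process (assumed to happen before import)
os.environ["MAX_CONCURRENT_CALLS"] = "5"
print(max_concurrent_calls())
```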

Contributing

We welcome contributions! Please see the contributing section for guidelines and instructions.

License

This project is licensed under the terms of the MIT license.
