PaperPal

PaperPal is a tool for sorting and analyzing research papers based on your personal research interests. It's designed to be accessible and customizable, allowing users to adapt it for their specific needs.

Key Features

- Integration with various language models (Llama 3.1, OpenAI, Anthropic, Gemini) for paper summarization and recommendation
- Support for using different models for different tasks (judging, content extraction, newsletter generation) based on configuration
- Automated paper downloads from Papers with Code
- Saving outputs to a SQLite database
- Automated email notifications with research digests in Markdown format
- Customizable research interests
- Embedding-based paper filtering with configurable similarity thresholds
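
To illustrate what embedding-based filtering with a similarity threshold means, here is a minimal sketch in plain Python. It is not PaperPal's implementation: the function names, the toy 3-dimensional vectors, and the choice to keep papers whose score is greater than or equal to the threshold are all assumptions made for illustration. In practice the vectors would come from the configured embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def filter_papers(interest_vec, paper_vecs, threshold=0.5):
    """Return (title, score) pairs that meet the threshold, best first.

    Hypothetical helper: keeping scores >= threshold is an assumption.
    """
    scored = [
        (title, cosine_similarity(interest_vec, vec))
        for title, vec in paper_vecs.items()
    ]
    kept = [(title, score) for title, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy "embeddings"; real ones would have hundreds of dimensions.
interests = [1.0, 0.0, 0.0]
papers = {
    "close match": [0.9, 0.1, 0.0],
    "partial match": [0.5, 0.5, 0.5],
    "unrelated": [0.0, 1.0, 0.0],
}
kept_titles = [title for title, _ in filter_papers(interests, papers, threshold=0.5)]
print(kept_titles)
```

Raising the threshold keeps fewer, more closely related papers; lowering it admits more loosely related ones.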

Requirements

- A machine with enough compute to run a local model if you are not using an LLM through an API. Local inference runs through Ollama, which supports CPU and MPS; see the Ollama website for details.
- PyTorch 2.4+
- CUDA 11.7+ (for GPU support)

Installation

Clone the repository:

git clone https://github.com/yourusername/PaperPal.git
cd PaperPal

Install the requirements:
```
pip install -r requirements.txt
```

Set up your environment variables:

Create a .env file in the project root and add your API keys:

ANTHROPIC_API_KEY=your_anthropic_key
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_api_key
GMAIL_SENDER_ADDRESS=your_gmail_address
GMAIL_APP_PASSWORD=your_gmail_app_password
OLLAMA_URL=http://localhost:11434
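
Before a first run it can help to confirm that the `.env` file contains what you expect. The sketch below parses a file in the `KEY=VALUE` format shown above using only the standard library and reports keys that are missing or empty. How PaperPal itself reads the file is not shown here; `load_env_file` and `missing_keys` are hypothetical helpers, and the demo writes a throwaway file rather than touching your real `.env`.

```python
import os
import tempfile


def load_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")  # split on the first "=" only
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Demo against a temporary file with one filled and one empty entry.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("OLLAMA_URL=http://localhost:11434\nGMAIL_SENDER_ADDRESS=\n")
    env = load_env_file(env_path)

required_for_email = ["GMAIL_SENDER_ADDRESS", "GMAIL_APP_PASSWORD"]
print(missing_keys(env, required_for_email))
```

Only the keys for the providers you actually use need values; for example, a purely local Ollama setup would not need the Anthropic, OpenAI, or Google keys.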

Configure Gmail:
- To send email through Gmail, you need an app password. Create one in your Google Account security settings; app passwords require 2-Step Verification to be enabled.
- Add your Gmail address and app password to the .env file as shown above.

Usage

The main script to run PaperPal is run_paperpal.py. You can run it with default settings or customize various parameters:

# Run with default settings
python run_paperpal.py

Running with Custom Parameters

python run_paperpal.py --n-days 14 --top-n 20 --model-name llama2

Available Arguments

- --research-interests-path: Path to research interests file (default: "config/research_interests.txt")
- --n-days: Number of days to look back for papers (default: 7)
- --top-n: Number of top papers to return (default: 5)
- --use-different-models: Use different models for different tasks (default: True)
- --model-type: Type of model to use (default: "ollama")
- --model-name: Name of the model to use (default: "hermes3")
- --orchestration-config: Path to config for multiple models (default: "config/orchestration.json")
- --embedding-model-name: Name of the embedding model (default: "Alibaba-NLP/gte-base-en-v1.5")
- --trust-remote-code: Whether to trust remote code (default: True)
- --receiver-address: Email address for notifications (default: None)
- --max-new-tokens: Maximum number of new tokens to generate (default: 1024)
- --temperature: Temperature for text generation (default: 0.1)
- --cosine-similarity-threshold: Threshold for cosine similarity (default: 0.5)
- --db-saving: Whether to save results to database (default: True)
- --data-path: Path to the database file (default: "data/papers.db")
- --verbose: Enable verbose output (default: True)
- --start-date: Start date for paper retrieval (default: None)
- --end-date: End date for paper retrieval (default: None)
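
As a rough picture of how a handful of these arguments could be declared and combined, here is a sketch using `argparse`. It is not the parser in run_paperpal.py. The ISO date format (`YYYY-MM-DD`), the `resolve_window` helper, and the rule that explicit dates take precedence over `--n-days` are assumptions for illustration only.

```python
import argparse
from datetime import date, timedelta


def build_parser():
    """Declare a subset of the documented arguments with their defaults."""
    parser = argparse.ArgumentParser(description="Sketch of a PaperPal-style CLI")
    parser.add_argument("--research-interests-path", default="config/research_interests.txt")
    parser.add_argument("--n-days", type=int, default=7)
    parser.add_argument("--top-n", type=int, default=5)
    parser.add_argument("--model-type", default="ollama",
                        choices=["ollama", "anthropic", "openai", "gemini"])
    parser.add_argument("--model-name", default="hermes3")
    parser.add_argument("--temperature", type=float, default=0.1)
    parser.add_argument("--cosine-similarity-threshold", type=float, default=0.5)
    # Assumption: dates are given as YYYY-MM-DD.
    parser.add_argument("--start-date", type=date.fromisoformat, default=None)
    parser.add_argument("--end-date", type=date.fromisoformat, default=None)
    return parser


def resolve_window(args, today):
    """Hypothetical rule: explicit dates win, otherwise look back n_days."""
    end = args.end_date or today
    start = args.start_date or end - timedelta(days=args.n_days)
    return start, end


# Same flags as the custom-parameters command above, parsed from a list.
args = build_parser().parse_args(
    ["--n-days", "14", "--top-n", "20", "--model-name", "llama2"]
)
window = resolve_window(args, today=date(2024, 10, 15))
print(args.model_name, args.top_n, window)
```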

Using Multiple Models

PaperPal can use a different model for each task. The per-task models are defined in the config/orchestration.json file and read by the PaperPal class.

Configure `orchestration.json`

Define different models for specific tasks in the config/orchestration.json file:

{
    "judge_model": {
        "model_name": "hermes3",
        "model_type": "ollama",
        "max_new_tokens": 1024,
        "temperature": 0.1,
        "num_ctx": 4096
    },
    "newsletter_model": {
        "model_name": "hermes3",
        "model_type": "ollama",
        "max_new_tokens": 4096,
        "temperature": 0.1,
        "num_ctx": 131072
    },
    "content_extraction_model": {
        "model_name": "hermes3",
        "model_type": "ollama",
        "max_new_tokens": 4096,
        "temperature": 0.1,
        "num_ctx": 131072
    },
    "newsletter_sections_model": {
        "model_name": "qwen2.5:32b",
        "model_type": "ollama",
        "max_new_tokens": 4096,
        "temperature": 0.1,
        "num_ctx": 131072
    },
    "newsletter_intro_model": {
        "model_name": "qwen2.5:32b",
        "model_type": "ollama",
        "max_new_tokens": 4096,
        "temperature": 0.1,
        "num_ctx": 131072
    }
}

Each task can be configured with:

- model_name: The specific model to use
- model_type: The type of model ("ollama", "anthropic", "openai", or "gemini")
- max_new_tokens: Maximum number of tokens to generate
- temperature: Temperature for text generation
- num_ctx: Context window size (optional, mainly for local models)
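
The fields above map naturally onto a small typed structure. The following sketch parses JSON in the same shape as config/orchestration.json, rejects unknown `model_type` values, and looks up the settings for a task. `TaskModelConfig`, `parse_orchestration`, and `config_for_task` are hypothetical names, and falling back to a default when a task is missing is an assumption, not documented PaperPal behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_MODEL_TYPES = {"ollama", "anthropic", "openai", "gemini"}


@dataclass(frozen=True)
class TaskModelConfig:
    model_name: str
    model_type: str
    max_new_tokens: int
    temperature: float
    num_ctx: Optional[int] = None  # optional, mainly for local models


def parse_orchestration(text):
    """Turn orchestration JSON into {task_name: TaskModelConfig}."""
    configs = {}
    for task, fields in json.loads(text).items():
        # Unknown or missing keys raise TypeError here, surfacing typos early.
        config = TaskModelConfig(**fields)
        if config.model_type not in ALLOWED_MODEL_TYPES:
            raise ValueError(f"{task}: unknown model_type {config.model_type!r}")
        configs[task] = config
    return configs


def config_for_task(configs, task, default):
    """Assumed fallback: use `default` when the task has no entry."""
    return configs.get(task, default)


example = """
{
    "judge_model": {"model_name": "hermes3", "model_type": "ollama",
                    "max_new_tokens": 1024, "temperature": 0.1, "num_ctx": 4096},
    "newsletter_intro_model": {"model_name": "qwen2.5:32b", "model_type": "ollama",
                               "max_new_tokens": 4096, "temperature": 0.1}
}
"""
configs = parse_orchestration(example)
fallback = TaskModelConfig("hermes3", "ollama", 1024, 0.1)
print(config_for_task(configs, "judge_model", fallback).num_ctx)
print(config_for_task(configs, "newsletter_model", fallback).model_name)
```

Validating the file at load time means a misspelled field or model type fails immediately rather than partway through a run.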

Task-Specific Models

PaperPal can assign a different model to each of the following tasks:

- Judge Model: Evaluates papers against research interests
- Newsletter Model: Generates the overall newsletter structure
- Content Extraction Model: Extracts and summarizes content from papers
- Newsletter Sections Model: Generates individual paper sections
- Newsletter Intro Model: Creates the newsletter introduction

Email Formatting

PaperPal supports Markdown formatting in email newsletters:

- Headers and sections are properly formatted
- Links are clickable
- Text styling (bold, italic) is preserved
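
For readers adapting the email step, one common way to deliver a Markdown digest so that headers, links, and styling render is a multipart message carrying the Markdown as plain text plus an HTML rendering of it. The sketch below uses only the standard library and assumes the HTML has already been produced by a Markdown converter of your choice. It is not PaperPal's communication.py: both function names are hypothetical, and sending over SMTP with an app password is an assumption based on the setup described in Installation.

```python
import smtplib
from email.message import EmailMessage


def build_digest_email(sender, receiver, subject, markdown_body, html_body):
    """Build a multipart/alternative message: plain-text Markdown plus HTML."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = receiver
    message["Subject"] = subject
    message.set_content(markdown_body)                  # text/plain fallback
    message.add_alternative(html_body, subtype="html")  # rendered version
    return message


def send_via_gmail(message, sender, app_password):
    """Send through Gmail's SMTP endpoint over SSL (not called in this demo)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(message)


digest_md = "# Weekly digest\n\n**Paper A**: [link](https://example.com/a)\n"
digest_html = (
    "<h1>Weekly digest</h1>"
    '<p><strong>Paper A</strong>: <a href="https://example.com/a">link</a></p>'
)
msg = build_digest_email(
    "me@example.com", "you@example.com", "Research digest", digest_md, digest_html
)
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), part_types)
```

Mail clients that cannot display HTML fall back to the plain-text part, so the raw Markdown should stay readable on its own.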

Configuration

- Model Selection: Set the MODEL_TYPE environment variable to "ollama", "anthropic", "openai", or "gemini" to choose the default model type.
- Orchestration: Configure different models for different tasks in config/orchestration.json.
- Email: Set your Gmail address and app password in the .env file, or modify communication.py to use another email provider.
- Research Interests: Update config/research_interests.txt with your research interests.

Customization

- Prompts: Modify prompts in paperpal/prompts.py to adjust the AI's behavior.
- Inference: Add or modify inference methods in paperpal/inference.py for different models.
- Model Orchestration: Configure different models for different tasks in config/orchestration.json.

Database

PaperPal uses SQLite to store paper information and generated newsletters. The database schema can be found in paperpal/data_handling.py.
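
If you want to experiment with storing and querying results before reading paperpal/data_handling.py, the sketch below shows the general pattern with the standard `sqlite3` module. The table and column names here are invented for illustration and will not match the project's actual schema; the demo uses an in-memory database so it leaves no file behind.

```python
import sqlite3

# Hypothetical schema for illustration only; see paperpal/data_handling.py
# for the real one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    url TEXT UNIQUE,
    score REAL,
    summary TEXT
);
"""


def save_papers(conn, papers):
    """Insert (title, url, score, summary) rows, skipping duplicate URLs."""
    conn.executescript(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO papers (title, url, score, summary) "
            "VALUES (?, ?, ?, ?)",
            papers,
        )


def top_papers(conn, n):
    """Return the n highest-scoring (title, score) rows."""
    rows = conn.execute(
        "SELECT title, score FROM papers ORDER BY score DESC LIMIT ?", (n,)
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
save_papers(conn, [
    ("Paper A", "https://example.com/a", 0.91, "summary a"),
    ("Paper B", "https://example.com/b", 0.62, "summary b"),
    ("Paper A (duplicate URL)", "https://example.com/a", 0.10, "ignored"),
])
best = top_papers(conn, 1)
stored = conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
conn.close()
print(best, stored)
```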

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgements

- Papers with Code for providing the research paper data
- Hugging Face for transformer models and tokenizers
- Ollama for local model support
