Welcome to NounLogic Summariser Lib, a Python library and CLI tool for summarizing long texts. Whether you're dealing with lengthy articles, extensive reports, or other substantial text content, the library breaks the text into manageable chunks and uses a locally running Ollama model to generate concise summaries. It is aimed at developers, researchers, and anyone looking to streamline their text processing workflows.
- Multi-Format Support: Seamlessly handle .txt, .md, .pdf, .xlsx, and .docx files.
- Smart Sanitization: Automatically cleans and prepares text by removing non-understandable characters.
- Dynamic Chunking: Breaks down large texts into customizable token-sized chunks for efficient processing.
- Configurable Summarization: Fully adjustable settings via config.json or CLI commands to tailor the summarization process.
- Ollama Integration: Utilizes the Ollama library to interact with locally running models for high-quality summaries.
- Extensible Output: Save summaries in your preferred format, including .txt and .pdf.
- Robust Logging & Error Handling: Comprehensive logging to monitor processes and handle errors gracefully.
- User-Friendly CLI: Easy-to-use command-line interface for quick operations and configurations.
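To make the sanitization and chunking features above more concrete, here is a minimal sketch of the idea. It is not the library's actual implementation: the function names `sanitize` and `chunk_text` are hypothetical, "non-understandable characters" is approximated as non-printable characters, and a "token" is approximated as a whitespace-separated word.

```python
import re
from typing import List


def sanitize(text: str) -> str:
    """Drop non-printable characters and collapse runs of spaces and tabs.

    Assumption: "non-understandable" is treated as "non-printable";
    the library's real rule may be different.
    """
    kept = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return re.sub(r"[ \t]+", " ", kept).strip()


def chunk_text(text: str, token_limit: int) -> List[str]:
    """Split text into chunks of at most `token_limit` tokens.

    Assumption: a token is a whitespace-separated word. A real tokenizer
    would count tokens differently.
    """
    if token_limit < 1:
        raise ValueError("token_limit must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + token_limit])
        for start in range(0, len(words), token_limit)
    ]
```

For example, `chunk_text(sanitize(raw_text), 1000)` would yield chunks of up to 1000 words each, which could then be summarized one at a time.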
- Python 3.8+
- Pip package manager
- Ollama installed and running locally
- Clone the Repository
  git clone https://github.com/yourusername/nounlogic-summariser-lib.git
  cd nounlogic-summariser-lib
- Install Dependencies
  pip install -r requirements.txt
- Set Up Console Scripts
  Uncomment the console_scripts section in setup.cfg if not already enabled:
  [options.entry_points]
  console_scripts =
      summariser = nounlogic_summariser_lib.skeleton:run
- Install the Package
  pip install .
  # or, for editable mode
  pip install -e .
- Verify Installation
  summariser --help
NounLogic Summariser Lib can be used both as a Python library and a CLI tool.
from nounlogic_summariser_lib import summariser
# Load configuration
config = summariser.load_config('config.json')
# Process and summarize a file
summariser.process_file('path/to/your/file.txt', config)
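If you need to summarize several files at once, the same two calls can be wrapped in a small loop. The helper below is a hypothetical sketch, not part of the library: `process_directory` and `SUPPORTED` are names introduced here for illustration, and the processing function is passed in as a parameter so the helper makes no assumptions about the library beyond the `process_file(path, config)` call shown above.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List

# Input extensions this README lists as supported.
SUPPORTED = (".txt", ".md", ".pdf", ".xlsx", ".docx")


def process_directory(
    directory: str,
    process: Callable[[str, Dict[str, Any]], Any],
    config: Dict[str, Any],
    extensions: Iterable[str] = SUPPORTED,
) -> List[str]:
    """Call process(path, config) for each matching file, in name order.

    Returns the list of paths that were handed to `process`.
    """
    wanted = {ext.lower() for ext in extensions}
    handled: List[str] = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            process(str(path), config)
            handled.append(str(path))
    return handled
```

With the library installed, this could be called as `process_directory('documents', summariser.process_file, config)`.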
summariser summarize path/to/your/file.md
- Specify Configuration File
  summariser summarize path/to/file.pdf --config custom_config.json
- Enable Verbose Logging
  summariser summarize path/to/file.txt -v
- Summarize a PDF File
  summariser summarize documents/report.pdf
  This command converts report.pdf to Markdown, sanitizes the text, breaks it into chunks, summarizes each chunk using the Ollama model, and saves the summary as report_summarised.txt in the summarized_outputs directory.
- Customize Summarization Parameters
  Modify config.json to adjust token limits, prompts, and output settings to fit your specific needs.
  {
    "token_limit": 1500,
    "prompt_template": "Please provide a concise summary of the following text:",
    ...
  }
- Handle Errors Gracefully
  If an error occurs during processing, it will be logged to errors.log, and the tool will continue processing remaining chunks if continue_on_error is set to true.
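The per-chunk behavior described in these examples can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's real code: `summarise_chunks` and `output_path` are hypothetical names, the model call is abstracted as a `summarise` callable, and the default suffix and directory are taken from the sample configuration in this README.

```python
import logging
from pathlib import Path
from typing import Callable, List

logger = logging.getLogger("summariser_sketch")  # hypothetical logger name


def summarise_chunks(
    chunks: List[str],
    summarise: Callable[[str], str],
    continue_on_error: bool = True,
) -> List[str]:
    """Summarise each chunk in order.

    A failing chunk is logged. It is skipped when continue_on_error is
    true; otherwise the exception is re-raised and processing stops.
    """
    summaries: List[str] = []
    for index, chunk in enumerate(chunks):
        try:
            summaries.append(summarise(chunk))
        except Exception:
            logger.exception("Chunk %d failed", index)
            if not continue_on_error:
                raise
    return summaries


def output_path(
    source: str,
    suffix: str = "_summarised.txt",
    directory: str = "summarized_outputs",
) -> Path:
    """Map documents/report.pdf to summarized_outputs/report_summarised.txt."""
    return Path(directory) / (Path(source).stem + suffix)
```

In practice, `summarise` might wrap a request to the local Ollama model; it is left as a parameter here so the control flow can be read and tested without a running model.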
All settings are managed via the config.json file. Avoid hardcoding any setting elsewhere so that the tool stays fully configurable.
{
  "token_limit": 1000,
  "prompt_template": "Generate a summary of the following text:",
  "ollama": {
    "model": "tinyllama",
    "host": "localhost",
    "port": 11434,
    "timeout": 30,
    "retry_attempts": 3
  },
  "output": {
    "suffix": "_summarised.txt",
    "directory": "summarized_outputs"
  },
  ...
}
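As a rough illustration of how such a file might be consumed, the sketch below loads a JSON config and overlays it on a set of defaults, so that a partial config.json still yields a complete settings dictionary. The names `DEFAULTS`, `merge`, `load_config_sketch`, and `ollama_url` are hypothetical, the default values simply mirror the sample above, and the `http://` scheme is an assumption. Note that the literal `...` in the sample is a placeholder and must be removed for the file to be valid JSON.

```python
import json
from typing import Any, Dict

# Illustrative defaults copied from the sample config in this README.
DEFAULTS: Dict[str, Any] = {
    "token_limit": 1000,
    "ollama": {
        "host": "localhost",
        "port": 11434,
        "timeout": 30,
        "retry_attempts": 3,
    },
}


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on `base`, returning a new dict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config_sketch(path: str) -> Dict[str, Any]:
    """Read a JSON config file and fill in any missing keys from DEFAULTS."""
    with open(path, encoding="utf-8") as handle:
        return merge(DEFAULTS, json.load(handle))


def ollama_url(config: Dict[str, Any]) -> str:
    """Build a base URL from the host and port settings (assumes plain HTTP)."""
    ollama = config["ollama"]
    return f"http://{ollama['host']}:{ollama['port']}"
```

A recursive merge is used so that overriding a single nested key, such as the port, does not discard the other `ollama` settings.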
- Input: .txt, .md, .pdf, .xlsx, .docx
- Output: .txt, .pdf
- Logging: Detailed logs are saved to summarizer.log with configurable log levels.
- Error Handling: Errors are recorded in errors.log, and the tool can be set to continue processing despite errors.
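One way to get the two log files described above is to attach two handlers to a single logger using Python's standard logging module. This is a sketch only: `configure_logging` and the logger name are hypothetical, and the library may set up its logging differently.

```python
import logging
from pathlib import Path


def configure_logging(log_dir: str = ".", level: str = "INFO") -> logging.Logger:
    """Send all records at `level` or above to summarizer.log,
    and additionally send ERROR and above to errors.log."""
    logger = logging.getLogger("summariser_sketch")  # hypothetical name
    logger.setLevel(level.upper())

    # Remove handlers left over from an earlier call so records are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    main = logging.FileHandler(Path(log_dir) / "summarizer.log", encoding="utf-8")
    main.setFormatter(formatter)
    logger.addHandler(main)

    errors = logging.FileHandler(Path(log_dir) / "errors.log", encoding="utf-8")
    errors.setLevel(logging.ERROR)  # this handler ignores anything below ERROR
    errors.setFormatter(formatter)
    logger.addHandler(errors)

    return logger
```

With this arrangement, an error appears in both files, while informational messages appear only in summarizer.log.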
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
- Fork the repository
- Create your feature branch (git checkout -b feature/YourFeature)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature/YourFeature)
- Open a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For any inquiries or support, please contact nathfavour.
Happy summarizing! 🚀