klassijs SmartOCR

SmartOCR is a powerful OCR (Optical Character Recognition) module designed for text extraction and interaction. It enables you to extract text from PDFs and images seamlessly and even interact with links embedded in those documents.

Features

Text Extraction: Extract text from images and PDF files with high accuracy.
Interactive Links: Detect and interact with clickable links in PDFs and images.
Multi-Format Support: Compatible with a variety of image formats (e.g., PNG, JPEG) and PDF files.
Efficient Processing: Optimized for speed and scalability in OCR operations.

Installation

Install SmartOCR via pnpm:

pnpm add klassijs-smart-ocr

Usage

Here is an example to get started with SmartOCR:

const SmartOCR = require('klassijs-smart-ocr');

(async () => {
    const ocr = new SmartOCR();

    // Extract text from an image
    const text = await ocr.extractTextFromImage('path/to/image.jpg');
    console.log('Extracted Text:', text);

    // Extract text from a PDF
    const pdfText = await ocr.extractTextFromPDF('path/to/document.pdf');
    console.log('Extracted PDF Text:', pdfText);

    // Detect and click links in an image
    const links = await ocr.getLinksFromImage('path/to/image.jpg');
    console.log('Detected Links:', links);

    if (links.length > 0) {
        const result = await ocr.clickLink(links[0]);
        console.log('Link Click Result:', result);
    }
})();

API Reference

`extractTextFromImage(imagePath)`

Extracts text from a given image.

Parameters:
- imagePath (string): Path to the image file.
Returns:
- (Promise) Extracted text.

`extractTextFromPDF(pdfPath)`

Extracts text from a given PDF file.

Parameters:
- pdfPath (string): Path to the PDF file.
Returns:
- (Promise) Extracted text.

`getLinksFromImage(imagePath)`

Detects clickable links within an image.

Parameters:
- imagePath (string): Path to the image file.
Returns:
- (Promise<string[]>) Array of detected links.

`getLinksFromPDF(pdfPath)`

Detects clickable links within a PDF file.

Parameters:
- pdfPath (string): Path to the PDF file.
Returns:
- (Promise<string[]>) Array of detected links.

`clickLink(link)`

Simulates a click on the given link.

Parameters:
- link (string): URL to interact with.
Returns:
- (Promise) Result of the interaction.
  
  Requirements
  
  Node.js 14 or later
  
  Dependencies:
  
  tesseract.js (for OCR)
  
  pdf-lib (for PDF parsing)
  
  Contributing
  
  Contributions are welcome! If you have ideas for improvements or new features, feel free to open an issue or submit a pull request.
  
  Fork the repository.
  
  Create a new branch.
  
  Make your changes and commit them.
  
  Submit a pull request.
  
  License
  
  SmartOCR is open-source software licensed under the MIT License.
  
  Acknowledgments
  
  Special thanks to the developers of tesseract.js and pdf-lib for making OCR and PDF processing seamless.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
index.js		index.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

klassijs SmartOCR

Features

Installation

Usage

API Reference

`extractTextFromImage(imagePath)`

`extractTextFromPDF(pdfPath)`

`getLinksFromImage(imagePath)`

`getLinksFromPDF(pdfPath)`

`clickLink(link)`

Requirements

Contributing

License

Acknowledgments

About

Releases

Packages

Languages

License

klassijs/klassijs-smart-ocr

Folders and files

Latest commit

History

Repository files navigation

klassijs SmartOCR

Features

Installation

Usage

API Reference

extractTextFromImage(imagePath)

extractTextFromPDF(pdfPath)

getLinksFromImage(imagePath)

getLinksFromPDF(pdfPath)

clickLink(link)

Requirements

Contributing

License

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`extractTextFromImage(imagePath)`

`extractTextFromPDF(pdfPath)`

`getLinksFromImage(imagePath)`

`getLinksFromPDF(pdfPath)`

`clickLink(link)`

Packages