Firecrawl Scheduled Action
This Next.js app provides a modern web interface for crawling documentation and processing it for LLM use. Use the output markdown, XML, or ZIP files to build knowledge files to copy over to a vector database, a ChatGPT GPT, an OpenAI Assistant, Claude Artifacts, Vapi.ai, Aimdoc, or any other LLM tool.
The app generates a .md file, .xml file, or .zip of markdown files ready for LLM consumption, inspired by the devdocs-to-llm Jupyter notebook by Alex Fazio.
- 🌐 Serverless architecture using Firecrawl API v1
- ⚡ Real-time crawl status updates
- 🎨 Modern UI with dark/light mode support
- 📂 Crawl history using Local Storage
- 💥 GitHub Action for manually running the crawl function and committing results to the /knowledge_bases folder
Use the GitHub Action template to define automations. Leverage GitHub Actions cron scheduling to crawl a given site on a recurring basis and commit the markdown file directly to the repo.
Scheduled Crawl (available on the GitHub Marketplace)
Add this to any GitHub repo to start crawling on a schedule. It automatically commits the output of each crawl to a specified folder. The default is to crawl Hacker News every day at midnight and store results in the /knowledge_bases folder.
```yaml
name: Scheduled Crawl Action
# This workflow will automatically crawl the specified URL on a schedule
# and commit the results to your repository.
on:
  schedule:
    - cron: '0 0 * * *' # Replace with your desired schedule (e.g., '0 0 * * *' for daily at midnight UTC)
  workflow_dispatch: # Allow manual triggering

jobs:
  crawl:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      id-token: write
      actions: read
    steps:
      - uses: actions/checkout@v4
      - name: Firecrawl Scheduled Action
        uses: cameronking4/nextjs-firecrawl-starter@v1.0.0
        with:
          url: 'https://news.ycombinator.com' # Replace with the URL you want to crawl regularly
          output_folder: 'knowledge_bases' # Replace with the folder where the output commits will be saved
          api_url: 'https://nextjs-firecrawl-starter.vercel.app' # Replace with your Firecrawl API endpoint; this is the starter app's default URL
```
You can use this project to serve endpoints for your LLM tools. In ChatGPT, you can click Create a GPT and then Create Action to allow your GPT to call the Firecrawl API endpoints and return results in chat. Add the Firecrawl actions to your GPT by copying and pasting this import URL in the Configure tab:
https://nextjs-firecrawl-starter.vercel.app/api/openapi
This URL is defined, and can be edited, in the /api/openapi/route.ts file.
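As a rough illustration of what that route might look like, here is a minimal sketch of an App Router handler serving an OpenAPI document that the GPT Actions importer can consume. The spec contents below (titles, paths, operation IDs) are assumptions for illustration, not the repo's actual file:

```typescript
// Hypothetical sketch of app/api/openapi/route.ts; the repo's real spec
// will list all of its endpoints and request/response schemas.
const openApiSpec = {
  openapi: "3.1.0",
  info: { title: "Firecrawl Starter API", version: "1.0.0" },
  servers: [{ url: "https://nextjs-firecrawl-starter.vercel.app" }],
  paths: {
    "/api/crawl": {
      post: {
        operationId: "startCrawl",
        summary: "Start a crawl job for a documentation site",
        responses: { "200": { description: "Crawl job created" } },
      },
    },
  },
};

// App Router route handlers can return a standard web Response;
// GET /api/openapi then serves the spec as JSON.
export async function GET(): Promise<Response> {
  return Response.json(openApiSpec);
}
```

Because the spec lives in code rather than a static file, you can edit the served endpoints and descriptions without redeploying anything else.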
- Framework: Next.js 15.1.4
- Styling: Tailwind CSS
- UI Components:
- Radix UI primitives
- Shadcn/ui components
- State Management: React Hook Form
- Animations: Framer Motion & Rombo
- Development: TypeScript
- API Routes: Next.js App Router (requires a Firecrawl API key)
The application uses Next.js App Router API routes for serverless functionality:
- /api/crawl/route.ts - Initiates a new crawl job
- /api/crawl/status/[id]/route.ts - Gets the status of an ongoing crawl
- /api/map/route.ts - Generates site maps
- /api/scrape/route.ts - Handles individual page scraping
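The start-then-poll flow behind these routes can be sketched as a small client helper. The endpoint paths come from this README; the request/response field names (`url`, `id`, `status`) and the status values are assumptions, not the repo's exact schema:

```typescript
// Hypothetical client-side sketch of the crawl flow.
const BASE = "https://nextjs-firecrawl-starter.vercel.app";

// Mirrors the dynamic [id] segment of /api/crawl/status/[id]/route.ts
export function crawlStatusUrl(id: string): string {
  return `${BASE}/api/crawl/status/${encodeURIComponent(id)}`;
}

// Kick off a crawl; assumes the route accepts a JSON body with a `url`
// field and returns a job `id`.
export async function startCrawl(url: string): Promise<string> {
  const res = await fetch(`${BASE}/api/crawl`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) throw new Error(`Crawl request failed: ${res.status}`);
  const { id } = (await res.json()) as { id: string };
  return id;
}

// Poll the status route until the job reaches a terminal state.
export async function waitForCrawl(id: string, intervalMs = 5000): Promise<void> {
  for (;;) {
    const res = await fetch(crawlStatusUrl(id));
    const { status } = (await res.json()) as { status: string };
    if (status === "completed" || status === "failed") break;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Polling against a dedicated status route is what lets the UI show real-time progress without holding a serverless function open for the whole crawl.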
- Clone the repository
- Install dependencies:

```bash
npm install
# or
yarn install
# or
pnpm install
```

- Create a .env file with your Firecrawl API key:

```bash
FIRECRAWL_API_KEY=your_api_key_here
```

- Run the development server:

```bash
npm run dev
# or
yarn dev
# or
pnpm dev
```

- Open http://localhost:3000 in your browser
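Server-side code then reads the key from the environment. The helper below is an illustrative sketch, not a function from the repo; the point is that a variable without a NEXT_PUBLIC_ prefix is available only to server code (like the API routes), never shipped to the browser:

```typescript
// Hypothetical helper; the starter may read process.env inline instead.
export function getFirecrawlKey(): string {
  const key = process.env.FIRECRAWL_API_KEY;
  if (!key) {
    // Failing fast gives a clearer error than a 401 from the Firecrawl API.
    throw new Error("FIRECRAWL_API_KEY is not set; add it to your .env file");
  }
  return key;
}
```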
- UI & Components:
- @radix-ui/* - Headless UI components
- class-variance-authority - Component variants
- tailwind-merge - Tailwind class merging
- lucide-react - Icons
- next-themes - Theme management
- framer-motion - Animations
- rombo - Animations
- Forms & Validation:
- react-hook-form - Form handling
- @hookform/resolvers - Form validation
- zod - Schema validation
- Crawling: Uses Firecrawl API to crawl documentation sites and generate sitemaps
- Processing: Extracts content and converts it to markdown format
- Status Tracking: Real-time updates on crawl progress
- Results: Displays processed content ready for LLM consumption
This project is a Next.js implementation inspired by the devdocs-to-llm Jupyter notebook by Alex Fazio. The original project demonstrated how to use Firecrawl API to crawl developer documentation and prepare it for LLM use.
The original Jupyter notebook implementation provides:
- Documentation crawling with Firecrawl API
- Content extraction and markdown conversion
- Export capabilities to Rentry.co and Google Docs
This Next.js version builds upon these capabilities by:
- Adding a modern web interface
- Implementing real-time crawl status tracking
- Providing a serverless architecture for Firecrawl API processing
- Adding dark/light theme support
- Making the tool more accessible through a user-friendly UI deployed on Vercel
- Adding GitHub Actions for manual and scheduled scraping
- Exposing an OpenAPI specification for LLM tool calling
See the LICENSE file for details.
Firecrawl Scheduled Action is not certified by GitHub. It is provided by a third-party and is governed by separate terms of service, privacy policy, and support documentation.