Gemini 2.5 Web Crawler
A powerful web crawler that uses Google's Gemini 2.5 Pro model to intelligently analyze web content, PDFs, and images based on user-defined objectives.
Features
- Intelligent URL mapping and ranking based on relevance to search objective
- PDF content extraction and analysis
- Image content analysis and description
- Smart content filtering based on user objectives
- Support for multiple content types (markdown, PDFs, images)
- Color-coded console output for better readability
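As a rough illustration of how the three content types might be told apart, the sketch below classifies a URL by its file extension. The function name `classify_url` and the extension set are assumptions made for this example; the script itself may decide differently (for instance, from HTTP headers).

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of image extensions; adjust to match what the crawler supports.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def classify_url(url: str) -> str:
    """Return 'pdf', 'image', or 'markdown' based on the URL path's extension."""
    # urlparse separates the path from the query string and fragment,
    # so "report.pdf?download=1" is still recognized as a PDF.
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    # Anything else is treated as a regular page to be read as markdown.
    return "markdown"
```

Extension checks are cheap but imperfect: a URL without an extension can still serve a PDF, so treat this as a first-pass filter only.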
Prerequisites
- Python 3.8+
- Google Cloud API key with Gemini API access
- Firecrawl API key
Installation
- Clone the repository:
git clone <your-repo-url>
cd <your-repo-directory>
- Install the required dependencies:
pip install -r requirements.txt
- Create a .env file based on .env.example:
cp .env.example .env
- Add your API keys to the .env file:
FIRECRAWL_API_KEY=your_firecrawl_api_key
GEMINI_API_KEY=your_gemini_api_key
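Before running the crawler, it can help to confirm that both keys are actually set. The following is a minimal, standard-library-only sketch of such a check; `parse_env_text` and `missing_keys` are hypothetical helpers written for this example and are not part of the script, which may load the file through a different mechanism.

```python
# The two variable names come from the .env template above.
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "GEMINI_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        missing = missing_keys(parse_env_text(handle.read()))
    if missing:
        print("Missing keys:", ", ".join(missing))
```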
Usage
Run the script:
python gemini-2.5-crawler.py
The script will prompt you for:
- The website URL to crawl
- Your search objective
The crawler will then:
- Map the website and find relevant pages
- Analyze the content using Gemini 2.5 Pro
- Extract and analyze any PDFs or images found
- Return structured information related to your objective
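The steps above can be sketched as a small pipeline. This is only an outline of the control flow under stated assumptions: `map_site`, `rank_pages`, and `analyze_page` are hypothetical callables standing in for the Firecrawl and Gemini calls, and none of them are real SDK functions.

```python
from typing import Callable, List, Optional


def run_crawl(
    url: str,
    objective: str,
    map_site: Callable[[str], List[str]],               # hypothetical: lists a site's URLs
    rank_pages: Callable[[List[str], str], List[str]],  # hypothetical: orders URLs by relevance
    analyze_page: Callable[[str, str], Optional[str]],  # hypothetical: returns a finding or None
    top_n: int = 3,
) -> dict:
    """Map a site, keep the top-ranked pages, and collect findings per page."""
    candidates = map_site(url)
    ranked = rank_pages(candidates, objective)[:top_n]
    results = []
    for page in ranked:
        finding = analyze_page(page, objective)
        # Pages with nothing relevant to the objective are skipped.
        if finding is not None:
            results.append({"url": page, "finding": finding})
    return {"objective": objective, "results": results}
```

Passing the three stages in as arguments keeps the flow testable with stubs, without network access or API keys.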
Output
The script provides color-coded console output for:
- Process steps and progress
- Debug information
- Success and error messages
- Final results in JSON format
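Color-coded output of this kind is commonly produced with ANSI escape sequences. The sketch below shows one way to do it; the `Colors` class, the helper names, and the specific color choices are assumptions for illustration, and the script's own palette may differ.

```python
import json


class Colors:
    # Standard ANSI escape codes for bright foreground colors.
    CYAN = "\033[96m"    # e.g. process steps
    YELLOW = "\033[93m"  # e.g. debug information
    GREEN = "\033[92m"   # e.g. success messages and results
    RED = "\033[91m"     # e.g. errors
    RESET = "\033[0m"


def colorize(text: str, color: str) -> str:
    """Wrap text in a color code and reset the terminal afterwards."""
    return f"{color}{text}{Colors.RESET}"


def format_result(result: dict) -> str:
    """Render the final result as indented JSON in the success color."""
    return colorize(json.dumps(result, indent=2), Colors.GREEN)


if __name__ == "__main__":
    print(colorize("Mapping website...", Colors.CYAN))
    print(format_result({"objective": "pricing", "results": []}))
```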
Error Handling
The script handles the following failure cases:
- API failures
- Content extraction issues
- Invalid URLs
- Timeouts
- JSON parsing errors
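Two of these cases, invalid URLs and JSON parsing errors, can be illustrated with the standard library alone. This is a minimal sketch, not the script's actual code: `is_valid_url` and `parse_model_json` are hypothetical helpers, and the fence-stripping step assumes the model sometimes wraps its JSON in a Markdown code fence.

```python
import json
from urllib.parse import urlparse

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def parse_model_json(text: str):
    """Parse JSON from a model reply; return None instead of raising."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None
```

Returning `None` on a parse failure lets the caller decide whether to retry, skip the page, or report the error.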
Note
This script uses the experimental Gemini 2.5 Pro model (gemini-2.5-pro-exp-03-25). Make sure you have appropriate access and quota for using this model.