# Gemini 2.5 Web Crawler A powerful web crawler that uses Google's Gemini 2.5 Pro model to intelligently analyze web content, PDFs, and images based on user-defined objectives. ## Features - Intelligent URL mapping and ranking based on relevance to search objective - PDF content extraction and analysis - Image content analysis and description - Smart content filtering based on user objectives - Support for multiple content types (markdown, PDFs, images) - Color-coded console output for better readability ## Prerequisites - Python 3.8+ - Google Cloud API key with Gemini API access - Firecrawl API key ## Installation 1. Clone the repository: ```bash git clone cd ``` 2. Install the required dependencies: ```bash pip install -r requirements.txt ``` 3. Create a `.env` file based on `.env.example`: ```bash cp .env.example .env ``` 4. Add your API keys to the `.env` file: ``` FIRECRAWL_API_KEY=your_firecrawl_api_key GEMINI_API_KEY=your_gemini_api_key ``` ## Usage Run the script: ```bash python gemini-2.5-crawler.py ``` The script will prompt you for: 1. The website URL to crawl 2. Your search objective The crawler will then: 1. Map the website and find relevant pages 2. Analyze the content using Gemini 2.5 Pro 3. Extract and analyze any PDFs or images found 4. Return structured information related to your objective ## Output The script provides color-coded console output for: - Process steps and progress - Debug information - Success and error messages - Final results in JSON format ## Error Handling The script includes comprehensive error handling for: - API failures - Content extraction issues - Invalid URLs - Timeouts - JSON parsing errors ## Note This script uses the experimental Gemini 2.5 Pro model (`gemini-2.5-pro-exp-03-25`). Make sure you have appropriate access and quota for using this model.