Gemini 2.5 Web Extractor

A command-line tool that combines Google's Gemini 2.5 Pro (Experimental) model with Firecrawl's web extraction to gather structured information about companies from the web.

Features

  • Uses Google Search (via SerpAPI) to find relevant web pages
  • Leverages Gemini 2.5 Pro (Experimental) to intelligently select the most relevant URLs
  • Extracts structured information using Firecrawl's advanced web extraction
  • Real-time progress monitoring and colorized console output

Prerequisites

  • Python 3.8 or higher
  • Google API Key (Gemini)
  • Firecrawl API Key
  • SerpAPI Key

Setup

  1. Clone the repository:
    git clone <repository-url>
    cd gemini-2.5-web-extractor
  2. Install dependencies:
    pip install -r requirements.txt
  3. Set up environment variables:
    • Copy .env.example to .env
    • Fill in your API keys in the .env file:
      • GOOGLE_API_KEY: Your Google API key for Gemini
      • FIRECRAWL_API_KEY: Your Firecrawl API key
      • SERP_API_KEY: Your SerpAPI key
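
A startup check can catch a missing key before any API call is made. The following is a minimal sketch, assuming only the three variable names listed above; the `missing_keys` helper is hypothetical and not taken from the script. Note that values in .env reach `os.environ` only if something loads them (for example python-dotenv's `load_dotenv()`); whether the script does this is not stated here.

```python
import os

# Variable names as listed in the setup steps above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "FIRECRAWL_API_KEY", "SERP_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or blank in env.

    env defaults to os.environ; passing a plain dict makes the check
    easy to try without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not (env.get(name) or "").strip()]
```

Calling `missing_keys()` at startup and exiting with the returned names gives a clearer message than a failed API request later on.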

Usage

Run the script:

python gemini-2.5-web-extractor.py

The script will:

  1. Prompt you for a company name
  2. Ask what information you want to extract about the company
  3. Search for relevant web pages
  4. Use Gemini to select the most relevant URLs
  5. Extract structured information using Firecrawl
  6. Display the results in a formatted JSON output
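
The steps above can be sketched as a small pipeline. This is an illustrative outline, not the script's actual code: `run_pipeline` and its three stage parameters (`search`, `select_urls`, `extract`) are hypothetical names, and the real SerpAPI, Gemini, and Firecrawl calls are passed in as plain functions so that no client API is assumed here.

```python
import json
from typing import Callable, Dict, List


def run_pipeline(
    company: str,
    objective: str,
    search: Callable[[str], List[Dict[str, str]]],
    select_urls: Callable[[List[Dict[str, str]], str], List[str]],
    extract: Callable[[List[str], str], dict],
) -> str:
    """Run search -> URL selection -> extraction and return formatted JSON."""
    # Step 3: search for pages about the company and the requested topic.
    results = search(f"{company} {objective}")
    if not results:
        raise ValueError("search returned no results")

    # Step 4: let the model (or any other selector) choose which URLs to read.
    urls = select_urls(results, objective)
    if not urls:
        raise ValueError("no URLs were selected")

    # Step 5: extract structured data from the chosen pages.
    data = extract(urls, objective)

    # Step 6: format the result for display.
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Keeping each stage behind a function argument makes it possible to exercise the flow with stub functions before spending any API credits.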

Example

Enter the company name: Tesla
Enter what information you want about the company: latest electric vehicle models and their specifications

The script will then:

  1. Search for relevant Tesla information
  2. Select the most informative URLs about Tesla's current EV lineup
  3. Extract and structure the vehicle specifications
  4. Present the data in a clean, organized format
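
For the URL selection step, a language model's reply may wrap the chosen URLs in extra text or include a URL that was never in the search results. The sketch below shows one defensive way to handle that, under the assumption that the reply is free text containing URLs; the `pick_urls` helper and its `limit` parameter are hypothetical, and the script's real parsing may differ.

```python
import re
from typing import Iterable, List

# Matches http(s) URLs up to whitespace, quotes, brackets, parentheses, or commas.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\]\[)(,]+")


def pick_urls(model_reply: str, candidates: Iterable[str], limit: int = 3) -> List[str]:
    """Return up to `limit` URLs from the reply that were among the candidates.

    URLs are compared and returned without a trailing slash, so
    "https://a.com/" in the reply matches the candidate "https://a.com".
    """
    allowed = {url.rstrip("/") for url in candidates}
    picked: List[str] = []
    for match in URL_PATTERN.findall(model_reply):
        url = match.rstrip("/.")  # drop a trailing slash or sentence-ending period
        if url in allowed and url not in picked:
            picked.append(url)
        if len(picked) == limit:
            break
    return picked
```

Filtering against the search results means only pages that were actually found are sent on for extraction.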

Error Handling

The script handles the following failure cases:

  • API failures
  • Network issues
  • Invalid responses
  • Timeout scenarios

All errors are clearly displayed with colored output for better visibility.
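
As an illustration of how transient failures such as timeouts can be reported in color, here is a minimal sketch. It assumes a retry-with-backoff policy, which this README does not confirm the script uses; `call_with_retry` is a hypothetical helper, and raw ANSI escape codes stand in for whatever color library the script relies on.

```python
import time

# ANSI escape codes for colored terminal output.
RED = "\033[31m"
YELLOW = "\033[33m"
RESET = "\033[0m"


def call_with_retry(func, attempts=3, delay=1.0, sleep=time.sleep, log=print):
    """Call func(); on failure log a colored message and retry with backoff.

    The wait doubles after each failed attempt (delay, 2*delay, ...).
    The last failure is logged in red and re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose: API, network, parsing errors
            if attempt == attempts:
                log(f"{RED}Failed after {attempts} attempts: {exc}{RESET}")
                raise
            log(f"{YELLOW}Attempt {attempt} failed ({exc}); retrying...{RESET}")
            sleep(delay * 2 ** (attempt - 1))
```

The `sleep` and `log` parameters exist so the helper can be tried without real delays or console output.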

License

[Add your license information here]