Gemini 2.5 Web Extractor
A powerful web information extraction tool that combines Google's Gemini 2.5 Pro (Experimental) model with Firecrawl's web extraction capabilities to gather structured information about companies from the web.
Features
- Uses Google Search (via SerpAPI) to find relevant web pages
- Leverages Gemini 2.5 Pro (Experimental) to intelligently select the most relevant URLs
- Extracts structured information using Firecrawl's advanced web extraction
- Real-time progress monitoring and colorized console output
Prerequisites
- Python 3.8 or higher
- Google API Key (Gemini)
- Firecrawl API Key
- SerpAPI Key
Setup
- Clone the repository:
git clone <repository-url>
cd gemini-2.5-web-extractor
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
  - Copy .env.example to .env
  - Fill in your API keys in the .env file:
    - GOOGLE_API_KEY: Your Google API key for Gemini
    - FIRECRAWL_API_KEY: Your Firecrawl API key
    - SERP_API_KEY: Your SerpAPI key
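Before running the script, it can help to confirm that all three keys are actually set. The following is a minimal sketch, not part of the original script: the helper name `missing_keys` is hypothetical, and it assumes the values from .env have already been loaded into the process environment (this README does not say how the script loads them).

```python
import os

# Key names as listed in the setup steps above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "FIRECRAWL_API_KEY", "SERP_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or blank in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_keys()` with no arguments checks the current environment; an empty list means every key is present and non-blank.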
Usage
Run the script:
python gemini-2.5-web-extractor.py
The script will:
- Prompt you for a company name
- Ask what information you want to extract about the company
- Search for relevant web pages
- Use Gemini to select the most relevant URLs
- Extract structured information using Firecrawl
- Display the results in a formatted JSON output
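The steps above can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the script's actual code: `run_pipeline` and its three callables (`search`, `select_urls`, `extract`) are hypothetical stand-ins for the SerpAPI, Gemini, and Firecrawl calls, and the `"link"` key on each search result is assumed for the example.

```python
import json
from typing import Callable, Dict, List


def run_pipeline(
    company: str,
    objective: str,
    search: Callable[[str], List[Dict[str, str]]],
    select_urls: Callable[[List[Dict[str, str]], str, str], List[str]],
    extract: Callable[[List[str], str], dict],
) -> str:
    """Search, select URLs, extract, and return formatted JSON.

    All three callables are hypothetical placeholders for the real API clients.
    """
    # Step 1: search for pages about the company and the requested information.
    results = search(f"{company} {objective}")
    if not results:
        raise ValueError("search returned no results")

    # Step 2: let the model pick URLs, but keep only URLs that really came
    # from the search results (assumed to live under the "link" key).
    candidates = {r["link"] for r in results if r.get("link")}
    chosen = [u for u in select_urls(results, company, objective) if u in candidates]
    if not chosen:
        raise ValueError("no usable URLs were selected")

    # Step 3: extract structured data from the chosen pages.
    data = extract(chosen, objective)

    # Step 4: format the result as indented JSON for display.
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Filtering the selected URLs against the search results is a design choice of this sketch: it guards against a model returning a URL that was never among the candidates.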
Example
Enter the company name: Tesla
Enter what information you want about the company: latest electric vehicle models and their specifications
The script will then:
- Search for relevant Tesla information
- Select the most informative URLs about Tesla's current EV lineup
- Extract and structure the vehicle specifications
- Present the data in a clean, organized format
Error Handling
The script handles the following failure cases:
- API failures
- Network issues
- Invalid responses
- Timeout scenarios
All errors are clearly displayed with colored output for better visibility.
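One common way to handle failures like these is to wrap each API call in a retry helper that prints colored messages. The sketch below is an assumption-laden illustration, not the script's implementation: this README does not say whether the script retries, and the names `colorize` and `call_with_retries` are hypothetical. The colors use standard ANSI escape codes.

```python
import sys
import time

# Standard ANSI escape codes for terminal colors.
RED = "\033[31m"
YELLOW = "\033[33m"
RESET = "\033[0m"


def colorize(text, color):
    """Wrap `text` in an ANSI color code and reset the color afterwards."""
    return f"{color}{text}{RESET}"


def call_with_retries(func, attempts=3, delay=1.0, sleep=time.sleep, out=sys.stderr):
    """Call func(); on an exception, warn in yellow and retry after `delay` seconds.

    After the final failed attempt the error is printed in red and re-raised.
    `sleep` and `out` are injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                print(colorize(f"Error: {exc}", RED), file=out)
                raise
            print(colorize(f"Attempt {attempt} failed: {exc}; retrying", YELLOW), file=out)
            sleep(delay)
```

Any call that may time out or fail on the network could be passed in as `func`, for example as a `lambda` wrapping the request.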
License
[Add your license information here]