# GPT-4.1 Web Crawler

A web crawler powered by GPT-4.1 that intelligently searches websites to find specific information based on a user-defined objective.
## Features

- Intelligently maps website content using semantic search
- Ranks website pages by relevance to your objective
- Extracts structured information using GPT-4.1
- Returns results in clean JSON format
## Prerequisites

- Python 3.8+
- A Firecrawl API key
- An OpenAI API key (with access to GPT-4.1 models)
## Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/gpt-4.1-web-crawler.git
   cd gpt-4.1-web-crawler
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   ```

   Then edit the `.env` file and add your API keys.
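After editing, `.env` should hold both keys. A sketch, assuming the variable names `FIRECRAWL_API_KEY` and `OPENAI_API_KEY` (check `.env.example` for the exact names the script reads):

```shell
# Hypothetical variable names: verify against .env.example
FIRECRAWL_API_KEY=fc-xxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxx
```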
## Usage

Run the script:

```bash
python gpt-4.1-web-crawler.py
```

The program will prompt you for:

- The website URL to crawl
- Your specific objective (what information you want to find)

Example:

```
Enter the website to crawl: https://example.com
Enter your objective: Find the company's leadership team with their roles and short bios
```
The crawler will then:
- Map the website
- Identify the most relevant pages
- Scrape and analyze those pages
- Return structured information if the objective is met
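For the leadership-team objective above, the returned JSON might look roughly like the following. The field names here are purely illustrative, not the script's actual output schema:

```json
{
  "objective_met": true,
  "found_data": {
    "leadership": [
      {"name": "Jane Doe", "role": "CEO", "bio": "..."}
    ]
  }
}
```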
## How It Works

1. **Mapping**: The crawler uses Firecrawl to map the website structure and find relevant pages based on search terms derived from your objective.
2. **Ranking**: GPT-4.1 analyzes the URLs to determine which pages are most likely to contain the information you're looking for.
3. **Extraction**: The top pages are scraped and analyzed to extract the specific information requested in your objective.
4. **Results**: If found, the information is returned in a clean, structured JSON format.
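The four steps above can be sketched as a small Python loop. The Firecrawl and OpenAI calls are stubbed out with placeholder logic so the sketch stays self-contained and runnable; all function names here are illustrative, not the script's actual API:

```python
# Minimal sketch of the map -> rank -> extract loop.
# External API calls are replaced with stand-in logic.

def map_website(url, objective):
    # Real script: Firecrawl maps the site and returns candidate URLs.
    return [f"{url}/about", f"{url}/team", f"{url}/blog"]

def rank_pages(urls, objective):
    # Real script: GPT-4.1 scores each URL against the objective.
    # Stand-in: score by how many objective keywords appear in the URL.
    keywords = objective.lower().split()
    ranked = sorted(urls, key=lambda u: -sum(k in u for k in keywords))
    return ranked[:2]  # keep only the top pages

def extract_info(url, objective):
    # Real script: the page is scraped and GPT-4.1 extracts structured JSON.
    # Stand-in: return a placeholder record marking the objective as met.
    return {"source": url, "objective": objective, "objective_met": True}

def crawl(url, objective):
    pages = map_website(url, objective)
    for page in rank_pages(pages, objective):
        result = extract_info(page, objective)
        if result["objective_met"]:
            return result
    return None  # objective not met on any top-ranked page

if __name__ == "__main__":
    print(crawl("https://example.com", "find the leadership team"))
```

The early return mirrors the script's behavior of stopping as soon as a scraped page satisfies the objective, rather than scraping every mapped URL.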
## License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.