DeepSeek V3 Web Crawler
This script uses the DeepSeek V3 large language model (via Hugging Face's Inference API) and FireCrawl to crawl websites based on specific objectives.
Prerequisites
- Python 3.8+
- A FireCrawl API key (get one at FireCrawl's website)
- A Hugging Face API key with access to the Inference API
Installation
- Clone this repository:
git clone <repository-url>
cd <repository-directory>
- Install the required packages:
pip install -r requirements.txt
- Create a .env file in the root directory with your API keys:
FIRECRAWL_API_KEY=your_firecrawl_api_key
HUGGINGFACE_API_KEY=your_huggingface_api_key
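Before running the script, it can help to confirm that both keys are visible to Python. The following is a minimal sketch, not the script's own loading code: it assumes the values from .env have already been loaded into the process environment, and `missing_api_keys` is a hypothetical helper name used only for illustration.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "HUGGINGFACE_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Failing early on a missing key tends to give a clearer message than an authentication error returned later by either service.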
Usage
Run the script:
python deepseek-v3-crawler.py
The script will prompt you to:
- Enter a website URL to crawl
- Enter your objective (what information you're looking for)
The script will then:
- Use DeepSeek V3 to generate optimal search parameters for the website
- Map the website to find relevant pages
- Crawl the most relevant pages to extract information based on your objective
- Output the results in JSON format if successful
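The four steps above can be sketched as a small pipeline. This is an illustration of the control flow only, not the script's actual code: the three callables (`generate_search_term`, `map_site`, `extract_from_page`) are hypothetical stand-ins for the DeepSeek V3 and FireCrawl calls, `max_pages` is an assumed limit, and the sketch assumes the mapping step returns URLs ordered by relevance.

```python
import json
from typing import Callable, List, Optional


def run_crawl(
    url: str,
    objective: str,
    generate_search_term: Callable[[str, str], str],
    map_site: Callable[[str, str], List[str]],
    extract_from_page: Callable[[str, str], Optional[dict]],
    max_pages: int = 3,  # assumed limit, not taken from the script
) -> Optional[str]:
    """Run the four steps; return a JSON string, or None if nothing was found."""
    # Step 1: ask the model for a search term suited to the objective.
    search_term = generate_search_term(url, objective)
    # Step 2: map the site to get candidate page URLs.
    candidates = map_site(url, search_term)
    # Step 3: crawl the top candidates until one yields data.
    for page_url in candidates[:max_pages]:
        data = extract_from_page(page_url, objective)
        if data is not None:
            # Step 4: output the result as JSON.
            return json.dumps(data, indent=2)
    return None


# Stand-in stubs so the flow can be exercised without any API calls.
result = run_crawl(
    "https://www.example.com",
    "Find information about their pricing plans",
    generate_search_term=lambda url, objective: "pricing",
    map_site=lambda url, term: [url + "/" + term],
    extract_from_page=lambda page_url, objective: {"source": page_url},
)
print(result)
```

Passing the service calls in as parameters keeps the flow testable offline; in a real run each stub would be replaced by the corresponding API call.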
Example
Input:
- Website: https://www.example.com
- Objective: Find information about their pricing plans
Output:
- The script will output structured JSON data containing the pricing information found on the website.
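Language models often wrap JSON in explanatory text, so code that consumes such output usually has to locate the JSON first. The helper below is a hedged sketch of one way to do that with the standard library. `extract_json_object` is a hypothetical name, and whether the script post-processes model replies this way is an assumption.

```python
import json
from typing import Optional


def extract_json_object(reply: str) -> Optional[dict]:
    """Return the first JSON object embedded in `reply`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            value, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON from this brace; try the next one.
        start = reply.find("{", start + 1)
    return None
```

Returning None instead of raising lets the caller treat "no usable JSON" as the unsuccessful case and move on to the next page.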
Notes
- The script uses DeepSeek V3, an advanced language model, to analyze web content.
- The model is accessed via Hugging Face's Inference API.
- You may need to adjust the temperature or max_new_tokens parameters in the script to suit your needs.
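As a rough illustration of where these two parameters live, the sketch below builds a text-generation request body in the shape used by Hugging Face's Inference API (`inputs` plus a `parameters` object). The function name and the default values are assumptions chosen for the example, not values taken from the script.

```python
def build_generation_payload(prompt, temperature=0.7, max_new_tokens=512):
    """Build a text-generation request body.

    Defaults are illustrative only. A lower temperature makes output more
    deterministic; max_new_tokens caps the length of the generated reply.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    return {
        "inputs": prompt,
        "parameters": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }


payload = build_generation_payload(
    "Suggest a search term for finding pricing pages.",
    temperature=0.2,
    max_new_tokens=64,
)
print(payload["parameters"])
```

For extraction tasks that should return consistent JSON, a low temperature is usually a reasonable starting point; raise max_new_tokens if replies are being cut off.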