DeepSeek V3 Web Crawler
This script uses the DeepSeek V3 large language model (via Hugging Face's Inference API) and FireCrawl to crawl websites based on specific objectives.
Prerequisites
- Python 3.8+
- A FireCrawl API key (get one at FireCrawl's website)
- A Hugging Face API key with access to the Inference API
Installation
- Clone this repository:
git clone <repository-url>
cd <repository-directory>
- Install the required packages:
pip install -r requirements.txt
- Create a .env file in the root directory with your API keys:
FIRECRAWL_API_KEY=your_firecrawl_api_key
HUGGINGFACE_API_KEY=your_huggingface_api_key
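How the script itself reads these keys is not shown here. As a rough, standard-library-only sketch, the following checks that both keys from the .env file are present before the script runs. The helper names parse_env_text and missing_keys are hypothetical and are not part of the script.

```python
import os

REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "HUGGINGFACE_API_KEY")


def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    file_values = {}
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            file_values = parse_env_text(handle.read())
    # Real environment variables take precedence over the file.
    merged = {**file_values, **os.environ}
    missing = missing_keys(merged)
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Running this from the repository root before the crawler can catch a missing or empty key early, instead of failing partway through a crawl.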
Usage
Run the script:
python deepseek-v3-crawler.py
The script will prompt you to:
- Enter a website URL to crawl
- Enter your objective (what information you're looking for)
The script will then:
- Use DeepSeek V3 to generate optimal search parameters for the website
- Map the website to find relevant pages
- Crawl the most relevant pages to extract information based on your objective
- Output the results in JSON format if successful
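The steps above can be sketched as a small pipeline. This is only an illustration of the control flow, not the script's actual code: the function run_crawl, its parameters, the max_pages limit, the "stop at the first page with data" rule, and the shape of the returned JSON are all assumptions. The model call and the FireCrawl calls are passed in as plain functions so the sketch runs without API keys.

```python
import json


def run_crawl(url, objective, suggest_search, map_site, scrape_page, max_pages=3):
    """Run the steps with caller-supplied functions (all hypothetical).

    suggest_search(objective, url) -> str          step 1, the model call
    map_site(url, search) -> list of page URLs     step 2, the site map
    scrape_page(page_url, objective) -> dict/None  step 3, the extraction
    """
    search = suggest_search(objective, url)
    candidates = map_site(url, search)
    for page_url in candidates[:max_pages]:
        data = scrape_page(page_url, objective)
        if data:  # assumption: stop at the first page that yields data
            return json.dumps({"source": page_url, "data": data}, indent=2)
    return None  # nothing found, so there is no JSON to print


if __name__ == "__main__":
    # Stand-in data and functions; no network access is involved.
    fake_pages = {"https://www.example.com/pricing": {"plans": ["placeholder"]}}
    result = run_crawl(
        "https://www.example.com",
        "Find information about their pricing plans",
        suggest_search=lambda objective, url: "pricing",
        map_site=lambda url, search: [u for u in fake_pages if search in u],
        scrape_page=lambda page_url, objective: fake_pages.get(page_url),
    )
    print(result)
```

Passing the three calls in as functions keeps the flow easy to test; in a real run they would wrap the Hugging Face and FireCrawl clients.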
Example
Input:
- Website: https://www.example.com
- Objective: Find information about their pricing plans
Output:
- The script will output structured JSON data containing the pricing information found on the website.
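Language models often wrap JSON in extra prose, so a helper along the following lines can be useful for pulling the structured part out of a reply. This is a sketch under that assumption; first_json_object is a hypothetical name, and the script may handle its output differently.

```python
import json


def first_json_object(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None
```

If the helper returns None, the reply contained no parseable object, which corresponds to the unsuccessful case where no JSON is printed.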
Notes
- The script uses DeepSeek V3, an advanced language model, to analyze web content.
- The model is accessed via Hugging Face's Inference API.
- You may need to adjust the temperature or max_new_tokens parameters in the script based on your needs.
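As a general guide, a lower temperature makes the model's output more deterministic, and max_new_tokens caps the length of the reply, so a value that is too small can cut a JSON answer off midway. The sketch below shows one way these two settings might be grouped into a text-generation request body. The function build_payload and its default values are placeholders, and the "inputs" plus "parameters" layout is an assumption about the request format, since the script's actual call is not shown here.

```python
def build_payload(prompt, temperature=0.7, max_new_tokens=512):
    """Build a text-generation request body with basic range checks."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    return {
        "inputs": prompt,
        "parameters": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }


if __name__ == "__main__":
    # A lower temperature for more repeatable, extraction-style answers.
    print(build_payload("List the pricing plans.", temperature=0.2, max_new_tokens=256))
```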