Update README.md

Nicolas 2024-08-28 13:03:07 -03:00
parent 4003d37fbc
commit 3039cc264f

README.md

@@ -6,7 +6,7 @@ _This repository is in its early development stages. We are still merging custom
## What is Firecrawl?
[Firecrawl](https://firecrawl.dev?ref=github) is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap required. Check out our [documentation](https://docs.firecrawl.dev).
_Pst. hey, you, join our stargazers :)_
@@ -41,18 +41,26 @@ To use the API, you need to sign up on [Firecrawl](https://firecrawl.dev) and ge
Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.
```bash
curl -X POST https://api.firecrawl.dev/v1/crawl \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer fc-YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "limit": 100,
      "scrapeOptions": {
        "formats": ["markdown", "html"]
      }
    }'
```
Returns a crawl job ID and the URL to check the status of the crawl.
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v1/crawl/123-456-789"
}
```
### Check Crawl Job
@@ -60,7 +68,7 @@ Returns a jobId
Used to check the status of a crawl job and get its result.
```bash
curl -X GET https://api.firecrawl.dev/v1/crawl/123-456-789 \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY'
```
@@ -68,18 +76,20 @@ curl -X GET https://api.firecrawl.dev/v0/crawl/status/1234-5678-9101 \
```json
{
  "status": "completed",
  "totalCount": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...",
      "html": "<!DOCTYPE html><html lang=\"en\" class=\"js-focus-visible lg:[--scroll-mt:9.5rem]\" data-js-focus-visible=\"\">...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    }
  ]
}
```
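Crawls run asynchronously, so a client typically polls this endpoint until the job finishes. Below is a minimal polling sketch in Python, assuming the `requests` library and only the endpoint and fields shown above (the API key and job id are placeholders):
```python
import time

import requests

API_KEY = "fc-YOUR_API_KEY"
CRAWL_ID = "123-456-789"  # the "id" returned when the crawl was submitted

# Poll the status endpoint until the crawl reports completion.
while True:
    resp = requests.get(
        f"https://api.firecrawl.dev/v1/crawl/{CRAWL_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    status = resp.json()

    if status["status"] == "completed":
        # Each entry in "data" carries the requested formats plus metadata.
        for page in status["data"]:
            print(page["metadata"]["sourceURL"])
        break

    # Still running (error states are not handled in this sketch).
    time.sleep(5)
```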
@@ -88,14 +98,15 @@ curl -X GET https://api.firecrawl.dev/v0/crawl/status/1234-5678-9101 \
### Scraping
Used to scrape a URL and get its content in the specified formats.
```bash
curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://docs.firecrawl.dev",
      "formats": ["markdown", "html"]
    }'
```
@@ -105,55 +116,83 @@ Response:
```json
{
  "success": true,
  "data": {
    "markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
    "html": "<!DOCTYPE html><html lang=\"en\" class=\"light\" style=\"color-scheme: light;\"><body class=\"__variable_36bd41 __variable_d7dc5d font-inter ...",
    "metadata": {
      "title": "Home - Firecrawl",
      "description": "Firecrawl crawls and converts any website into clean markdown.",
      "language": "en",
      "keywords": "Firecrawl,Markdown,Data,Mendable,Langchain",
      "robots": "follow, index",
      "ogTitle": "Firecrawl",
      "ogDescription": "Turn any website into LLM-ready data.",
      "ogUrl": "https://www.firecrawl.dev/",
      "ogImage": "https://www.firecrawl.dev/og.png?123",
      "ogLocaleAlternate": [],
      "ogSiteName": "Firecrawl",
      "sourceURL": "https://firecrawl.dev",
      "statusCode": 200
    }
  }
}
```
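If you are scripting against the API directly rather than using the SDKs below, the same request is easy to reproduce. A small sketch in Python, assuming the `requests` library and the request/response shape documented above:
```python
import requests

API_KEY = "fc-YOUR_API_KEY"

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"url": "https://docs.firecrawl.dev", "formats": ["markdown", "html"]},
)
resp.raise_for_status()
data = resp.json()["data"]

# "data" holds one entry per requested format plus the page metadata.
print(data["metadata"]["title"])
print(data["markdown"][:200])  # first 200 characters of the markdown
```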
### Map (Alpha)
Used to map a URL and get the URLs of the website. This returns most links present on the website.
```bash cURL
curl -X POST https://api.firecrawl.dev/v1/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
```
Response:
```json
{
  "status": "success",
  "links": [
    "https://firecrawl.dev",
    "https://www.firecrawl.dev/pricing",
    "https://www.firecrawl.dev/blog",
    "https://www.firecrawl.dev/playground",
    "https://www.firecrawl.dev/smart-crawl",
  ]
}
```
#### Map with search
Map with the `search` param allows you to search for specific URLs inside a website.
```bash cURL
curl -X POST https://api.firecrawl.dev/v1/map \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://firecrawl.dev",
      "search": "docs"
    }'
```
Response will be an ordered list from the most relevant to the least relevant.
```json
{
  "status": "success",
  "links": [
    "https://docs.firecrawl.dev",
    "https://docs.firecrawl.dev/sdks/python",
    "https://docs.firecrawl.dev/learn/rag-llama3",
  ]
}
```
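A common pattern is to use map to discover pages and then scrape only the ones you care about. A rough sketch of that pipeline in Python, assuming the `requests` library and only the endpoints and fields documented above:
```python
import requests

API_KEY = "fc-YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# 1. Map the site, keeping only URLs that match "docs".
map_resp = requests.post(
    "https://api.firecrawl.dev/v1/map",
    headers=HEADERS,
    json={"url": "https://firecrawl.dev", "search": "docs"},
)
links = map_resp.json()["links"]

# 2. Scrape the first few matches as markdown.
for url in links[:3]:
    scrape_resp = requests.post(
        "https://api.firecrawl.dev/v1/scrape",
        headers=HEADERS,
        json={"url": url, "formats": ["markdown"]},
    )
    page = scrape_resp.json()["data"]
    print(url, "->", len(page["markdown"]), "characters of markdown")
```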
### LLM Extraction (v0) (Beta)
Used to extract structured data from scraped pages.
@@ -220,6 +259,42 @@ curl -X POST https://api.firecrawl.dev/v0/scrape \
}
```
### Search (v0) (Beta)
Used to search the web, get the most relevant results, scrape each page and return the markdown.
```bash
curl -X POST https://api.firecrawl.dev/v0/search \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "query": "firecrawl",
      "pageOptions": {
        "fetchPageContent": true // false for a fast serp api
      }
    }'
```
```json
{
  "success": true,
  "data": [
    {
      "url": "https://mendable.ai",
      "markdown": "# Markdown Content",
      "provider": "web-scraper",
      "metadata": {
        "title": "Mendable | AI for CX and Sales",
        "description": "AI for CX and Sales",
        "language": null,
        "sourceURL": "https://www.mendable.ai/"
      }
    }
  ]
}
```
## Using Python SDK
### Installing Python SDK
@@ -231,24 +306,28 @@ pip install firecrawl-py
### Crawl a website
```python
from firecrawl.firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape a website:
scrape_status = app.scrape_url(
  'https://firecrawl.dev',
  params={'formats': ['markdown', 'html']}
)
print(scrape_status)

# Crawl a website:
crawl_status = app.crawl_url(
  'https://firecrawl.dev',
  params={
    'limit': 100,
    'scrapeOptions': {'formats': ['markdown', 'html']}
  },
  wait_until_done=True,
  poll_interval=30
)
print(crawl_status)
```
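Once the crawl completes, the returned object should mirror the crawl status response shown earlier (a `data` list with one entry per page); that shape is an assumption here, so verify it against your installed SDK version. A short sketch that walks the results and writes each page's markdown to disk:
```python
import pathlib

# Assumes crawl_status (from crawl_url above) mirrors the REST status
# response: a dict with a "data" list of pages, each carrying "markdown"
# and "metadata".
out_dir = pathlib.Path("crawl_output")
out_dir.mkdir(exist_ok=True)

for i, page in enumerate(crawl_status.get("data", [])):
    title = page.get("metadata", {}).get("title", f"page {i}")
    target = out_dir / f"{i:03d}.md"
    target.write_text(page.get("markdown", ""), encoding="utf-8")
    print(f"saved {title!r} -> {target}")
```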
### Extracting structured data from a URL
@@ -256,6 +335,11 @@ scraped_data = app.scrape_url(url)
With LLM extraction, you can easily extract structured data from any URL. We support pydantic schemas to make it easier for you too. Here is how to use it:
```python
from firecrawl.firecrawl import FirecrawlApp
from pydantic import BaseModel

app = FirecrawlApp(api_key="fc-YOUR_API_KEY", version="v0")

class ArticleSchema(BaseModel):
    title: str
    points: int
@@ -277,15 +361,6 @@ data = app.scrape_url('https://news.ycombinator.com', {
print(data["llm_extraction"])
```
## Using the Node SDK
### Installation
@@ -301,54 +376,33 @@ npm install @mendable/firecrawl-js
1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set the API key as an environment variable named `FIRECRAWL_API_KEY` or pass it as a parameter to the `FirecrawlApp` class.
```js
import FirecrawlApp, { CrawlParams, CrawlStatusResponse } from '@mendable/firecrawl-js';

const app = new FirecrawlApp({apiKey: "fc-YOUR_API_KEY"});

// Scrape a website
const scrapeResponse = await app.scrapeUrl('https://firecrawl.dev', {
  formats: ['markdown', 'html'],
});

if (scrapeResponse) {
  console.log(scrapeResponse)
}

// Crawl a website
const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown', 'html'],
  }
} as CrawlParams, true, 30) as CrawlStatusResponse;

if (crawlResponse) {
  console.log(crawlResponse)
}
```
### Extracting structured data from a URL
@@ -360,6 +414,7 @@ import { z } from "zod";
const app = new FirecrawlApp({
  apiKey: "fc-YOUR_API_KEY",
  version: "v0"
});
// Define schema to extract contents into
@@ -384,19 +439,6 @@ const scrapeResult = await app.scrapeUrl("https://news.ycombinator.com", {
console.log(scrapeResult.data["llm_extraction"]);
```
## Contributing
We love contributions! Please read our [contributing guide](CONTRIBUTING.md) before submitting a pull request.