3475 Commits

Author SHA1 Message Date
Gergő Móricz
f41af8241e fix(scrapeURL/pdf): better timeout error 2025-05-23 13:59:53 +02:00
Gergő Móricz
bfe731309c fix(scrapeURL/pdf/mu): remove log 2025-05-23 13:47:34 +02:00
Gergő Móricz
b03670a8b7
feat: parse PDFs on fc side and reject if too long for timeout (FIR-2083) (#1592)
* feat: pdf-parser, implementation in scrapeURL

* use pdf-parser for page count instead of mu

* fix(pdf-parser): bindings

* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE

* implement post-runsync polling and fix

* fix(Dockerfile): copy in the pdf-parser source code

* fix(scrapeURL/pdf): better error for timeout below 0
2025-05-23 13:45:53 +02:00
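Note on b03670a8b7: the page-count rejection described above can be sketched roughly as follows. This is an illustrative Python sketch, not the repository's code. Only the constant name MILLISECONDS_PER_PAGE comes from the commit message; its value here, the function name `check_pdf_timeout`, and the exception class `PDFTimeoutError` are assumptions.

```python
# Hypothetical value: the commit names the constant but does not state its value.
MILLISECONDS_PER_PAGE = 150


class PDFTimeoutError(Exception):
    """Hypothetical error raised when a PDF cannot plausibly be parsed in time."""


def check_pdf_timeout(page_count: int, timeout_ms: int) -> int:
    """Return the estimated parse time in ms, or raise if it exceeds the timeout."""
    # Reject a budget that is already exhausted before estimating anything.
    if timeout_ms <= 0:
        raise PDFTimeoutError("timeout must be greater than 0 ms")
    estimated_ms = page_count * MILLISECONDS_PER_PAGE
    # Reject early instead of starting a parse that is expected to time out.
    if estimated_ms > timeout_ms:
        raise PDFTimeoutError(
            f"PDF has {page_count} pages; estimated {estimated_ms} ms "
            f"exceeds the {timeout_ms} ms timeout"
        )
    return estimated_ms
```

Rejecting up front, under these assumptions, avoids spending parser time on a document that would hit the timeout anyway.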
Gergő Móricz
321fff1695 ok what 2025-05-23 11:41:34 +02:00
Gergő Móricz
00cc733972 more logs 2025-05-23 11:29:34 +02:00
Gergő Móricz
bb67b9812b check if enum is being overwritten somehow 2025-05-23 11:27:49 +02:00
Gergő Móricz
d4e7bde03d add stack 2025-05-23 10:18:30 +02:00
Gergő Móricz
6776292cc2 more log 2025-05-23 09:57:15 +02:00
Gergő Móricz
2e863da334 feat(api/v1/authMiddleware): add log to debug extract agent preview mode 2025-05-23 09:35:29 +02:00
Gergő Móricz
3e736f1e0d
feat(concurrency-log): add cclog endpoint (FIR-2067) (#1589)
* feat(concurrency-log): add cclog endpoint

* fix(api/routes/admin): misimport

* more misimports
2025-05-22 18:13:35 -03:00
Gergő Móricz
fd74299134
feat(scrapeURL, logJob): log pdf page count to db (FIR-2068) (#1587)
* feat(scrapeURL, logJob): log pdf page count to db

* devin stop the test littering pls
2025-05-22 17:26:01 -03:00
Gergő Móricz
cc2c968425
fix(robustFetch): selective logging (#1588) 2025-05-22 17:23:49 -03:00
Gergő Móricz
749d89a551
feat(api/v1/extract): ignoreInvalidURLs (#1585) 2025-05-21 21:46:55 -03:00
Gergő Móricz
6478754f1b
feat(api/extract): show extract as origin for scrapes originating from it (#1584) 2025-05-21 21:45:44 -03:00
Gergő Móricz
8522103228 feat(api/v1/extract): log requests 2025-05-21 23:25:32 +02:00
Gergő Móricz
938ef1cfb2 feat(api/v1/map): log requests 2025-05-21 23:00:43 +02:00
Gergő Móricz
a3aee9be13
fix(queue-worker): finish crawl if all addable URLs were already locked (#1582) 2025-05-21 17:40:31 -03:00
Gergő Móricz
9bb97388e2
feat(search): ignoreBlockedURLs (FIR-1954) (#1580)
* feat(search): ignoreBlockedURLs

* change name
2025-05-21 13:02:58 -03:00
Gergő Móricz
3d4692f47c
Update SELF_HOST.md 2025-05-21 17:50:26 +02:00
devin-ai-integration[bot]
c7bb9e77ab
docs: add MAX_RAM and MAX_CPU environment variables documentation (#1581)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: mogery@sideguide.dev <mogery@sideguide.dev>
2025-05-21 17:49:16 +02:00
Gergő Móricz
b8ea4021c1 fix(services/webhook): greatly improved logging 2025-05-21 16:02:01 +02:00
devin-ai-integration[bot]
9949403b59
FIR-2006: Fix maxUrls and timeLimit parameters in Deep Research API (#1569)
* FIR-2006: Fix maxUrls and timeLimit enforcement in Deep Research API

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

* FIR-2006: Add tests for maxUrls and timeLimit enforcement

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

* FIR-2006: Replace mocked tests with end-to-end tests for deep research

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

* Delete apps/api/src/__tests__/snips/deep-research-service.test.ts

* Delete apps/api/src/__tests__/snips/lib.ts

* Revert "Delete apps/api/src/__tests__/snips/lib.ts"

This reverts commit a2af9baff89d64adc1930ea5b37b4f07f0735a67.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
2025-05-20 18:39:56 -03:00
Gergő Móricz
513f469b0f
feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket (FIR-2038) (#1577)
* feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket

* Update __init__.py

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-20 16:59:08 -03:00
Rafael Miller
6d75161302
Fix sdk/undefined response handle error (#1578)
* fix(sdk): update version to 1.25.1 and enhance error handling in FirecrawlApp (#1576)

* package version bump
2025-05-20 16:55:07 -03:00
devin-ai-integration[bot]
a5a915d639
Fix: Concatenate metadata arrays into strings with exceptions (#1574)
* Fix: Concatenate metadata arrays into strings except for ogLocaleAlternate

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

* Fix: Only concatenate description field, preserve other metadata arrays

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

* Fix: Only concatenate description field, keep other metadata fields in original format

Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
2025-05-20 12:40:53 -03:00
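Note on a5a915d639: the final behavior described above (concatenate only the description field, keep other metadata arrays as they are) might look like the sketch below. It is written in Python for illustration and is not the repository's code. The function name `normalize_metadata` and the ", " separator are assumptions; the commit message does not specify either.

```python
from typing import Any, Dict


def normalize_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Join a list-valued `description` into one string; leave other fields as-is."""
    result = dict(metadata)  # shallow copy so the caller's dict is not mutated
    description = result.get("description")
    if isinstance(description, list):
        # Assumed separator; the commit does not say how the parts are joined.
        result["description"] = ", ".join(str(part) for part in description)
    return result
```

For example, `normalize_metadata({"description": ["a", "b"], "ogLocaleAlternate": ["en", "fr"]})` joins the description while the `ogLocaleAlternate` list stays a list.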
Gergő Móricz
f838190ba6
hotfix: kill zombie workers, respect timeouts better (FIR-2034) (#1575)
* feat(scrapeURL): add strict timeouts everywhere

* feat(queue-worker/liveness): add networking check

* fix(queue-worker): typo

* fix(queue-worker/liveness): do not parse

* fix(queue-worker): check local network instead

* fix(queue-worker/liveness): typo
2025-05-20 17:35:32 +02:00
Emircan ERKUL
5152019a05
Update docker-compose.yaml (#1566) 2025-05-20 17:14:02 +02:00
Gergő Móricz
5fcd8bb041
fix(api/search): log page options correctly (#1572) 2025-05-19 19:39:22 -03:00
Gergő Móricz
d8405de89d
fix(auto_charge): fix ACUC clear (#1571) 2025-05-19 19:38:20 -03:00
Gergő Móricz
192d056bef
feat(scrapeURL/pdf/mu): add timeout and created_at (#1570) 2025-05-19 21:36:15 +02:00
Gergő Móricz
fab4f00536
feat(scrapeURL): proxy auto mode (FIR-1853) (#1551)
* feat(scrapeURL): proxy auto mode

* feat(api/tests/snips/proxy/auto): add test for stealth pick
2025-05-19 19:43:03 +02:00
devin-ai-integration[bot]
8eeb3c5cd4
FIR-1951: Add automatic URL encoding in preprocessing for special characters in query parameters (#1547)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
2025-05-19 19:31:47 +02:00
devin-ai-integration[bot]
7ccbbec488
Fix LLMs.txt cache bug with subdomains and add bypass option (#1557)
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Nick:

* Update LLMs.txt test file to use helper functions and concurrent tests

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Remove LLMs.txt test file as requested

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Change parameter name to 'cache' and keep 7-day expiration

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Update generate-llmstxt-supabase.ts

* Update JS and Python SDKs to include cache parameter

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"

This reverts commit d05b9964677b7b2384453329d2ac99d841467053.

* Nick:

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-16 16:29:09 -03:00
devin-ai-integration[bot]
ab30c8e4ac
Fix Supabase client configuration errors when USE_DB_AUTHENTICATION is false (#1534)
* Fix Supabase client configuration errors when USE_DB_AUTHENTICATION is false

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Add USE_DB_AUTHENTICATION checks to map and search controllers

Add test for USE_DB_AUTHENTICATION=false

Add USE_DB_AUTHENTICATION checks to billing services

Add USE_DB_AUTHENTICATION checks to batch_billing.ts

Add USE_DB_AUTHENTICATION checks to cached-docs.ts

Add USE_DB_AUTHENTICATION checks to supabase-jobs.ts

Add USE_DB_AUTHENTICATION checks to team-id-sync.ts

Add USE_DB_AUTHENTICATION checks to test-suite log.ts

Add USE_DB_AUTHENTICATION checks to idempotency services

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Revert "Add USE_DB_AUTHENTICATION checks to map and search controllers"

This reverts commit 834a5d51a68c74ada67800fa3a0aa45bde22d745.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
v1.9.0
2025-05-16 12:56:33 -03:00
devin-ai-integration[bot]
526165e1b9
Add caching for RunPod PDF markdown results in GCS (#1561)
* Add caching for RunPod PDF markdown results in GCS

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Update PDF caching to hash base64 directly and add metadata

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

* Fix PDF caching to directly hash content and fix test expectations

Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev>
2025-05-16 12:04:38 -03:00
Gergő Móricz
bd9673e104
Mog/cachable lookup (#1560)
* feat(scrapeURL): use cacheableLookup

* feat(queue-worker): add cacheablelookup

* fix(cacheable-lookup): make it work with tailscale on local

* add devenv

* try again

* allow querying all

* log

* fixes

* asd

* fix:

* fix(lookup):

* lookup
2025-05-16 15:44:52 +02:00
Gergő Móricz
d46ba95924 Revert "feat: use cacheable lookup everywhere (#1559)"
This reverts commit b8703b2a720765b92f5c4cab94cc90ea624198a8.
2025-05-16 15:31:06 +02:00
Gergő Móricz
b8703b2a72
feat: use cacheable lookup everywhere (#1559)
* feat(scrapeURL): use cacheableLookup

* feat(queue-worker): add cacheablelookup

* fix(cacheable-lookup): make it work with tailscale on local

* add devenv

* try again

* allow querying all

* log

* fixes

* asd

* fix:

* fix(lookup):
2025-05-16 15:27:24 +02:00
Gergő Móricz
f936befcdb feat(queue-worker): liveness check endpoint 2025-05-16 14:15:48 +02:00
Gergő Móricz
b5b612c35b
feat(api/extract/fire-0): error logging (#1556) 2025-05-15 11:32:59 -03:00
Will
b0c203e512
Fix/optional chaining operators missing (#1549)
* fix: missing optional chaining operator in req.acuc.flags

* fix: missing optional chaining operator in req.acuc.flags
2025-05-15 00:04:04 +02:00
Gergő Móricz
cee481a3a9 fix(fire-engine): sslerror passthrough 2025-05-14 23:50:57 +02:00
Gergő Móricz
3db2294b97
feat(scrapeURL): better error for SSL failures (#1552) 2025-05-14 23:34:59 +02:00
Ademílson Tonato
06189b9646
refactor: increase max limit for search request schema from 50 to 100 (#1545) 2025-05-13 17:40:32 -03:00
Yohann Prigent
505924875e
create openAI provider using base url parameter (#1480)
Co-authored-by: Yohann Prigent <yohann@pandascore.co>
2025-05-12 20:43:07 +02:00
Gergő Móricz
0fd05a67a0 Revert "Revert "fix(queue-worker, scrape): match billing logic and add billing for stealth proxies (#1521)""
This reverts commit 017a915ae8f550ceaa01ad607b4e6a684385eadf.
2025-05-12 17:46:09 +02:00
Gergő Móricz
fdeb01847d feat(queue-worker): add more logs around crawl finishing logic 2025-05-09 16:52:38 +02:00
Nicolas
907cf1cf41 Update __init__.py 2025-05-08 20:29:20 -03:00
devin-ai-integration[bot]
21adf047be
[Bug Fix] Make WaitAction milliseconds field optional in firecrawl-py (#1533)
* This fixes issue #1512 by making the milliseconds field optional in WaitAction and adding a validator to ensure exactly one of milliseconds or selector is provided.

Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>

* Update firecrawl.py

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-08 20:27:27 -03:00
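Note on 21adf047be: the rule described above (milliseconds becomes optional, and exactly one of milliseconds or selector must be provided) can be illustrated with a minimal sketch. It uses a standard-library dataclass to stay self-contained; the SDK's actual model class and validator mechanism may differ, and the error message text is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitAction:
    """Illustrative stand-in for the SDK's WaitAction, not its real definition."""

    milliseconds: Optional[int] = None
    selector: Optional[str] = None

    def __post_init__(self) -> None:
        # Count how many of the two mutually exclusive fields were supplied.
        provided = sum(
            value is not None for value in (self.milliseconds, self.selector)
        )
        if provided != 1:
            raise ValueError(
                "Provide exactly one of 'milliseconds' or 'selector'"
            )
```

With this sketch, `WaitAction(milliseconds=500)` and `WaitAction(selector="#ready")` are accepted, while `WaitAction()` and `WaitAction(milliseconds=500, selector="#ready")` raise `ValueError`.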
Nicolas
7b03ab36a7 Update openapi.json 2025-05-08 20:15:49 -03:00