2223 Commits

Author SHA1 Message Date
Gergő Móricz
a1e6c13b67 move crawl to read replica 2025-03-06 21:57:42 +01:00
Nicolas
982b3da27d Update auth.ts 2025-03-06 17:53:41 -03:00
Nicolas
5a149a1c30 Revert "fix(auth): always use replica for acuc"
This reverts commit 57b313608cb6e9f3c123e4754c653fef8dfacbdf.
2025-03-06 17:53:28 -03:00
Nicolas
783fad90dd Nick: more read replicas 2025-03-06 17:45:51 -03:00
Nicolas
949fb68bdd Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-03-06 17:36:57 -03:00
Nicolas
4c4d51e60b Revert "Update runWebScraper.ts"
This reverts commit 40eacfacd4c5a782f722fd85e3935f4a1741b27a.
2025-03-06 17:36:08 -03:00
Gergő Móricz
57b313608c fix(auth): always use replica for acuc 2025-03-06 21:35:30 +01:00
Gergő Móricz
7b05512054 fix(credit_billing): teams check 2025-03-06 21:30:57 +01:00
Nicolas
ae010a76c1 Update blocklist.ts 2025-03-06 17:11:00 -03:00
Gergő Móricz
e6c3f20944 fix(preview): temporarily disable 2025-03-06 21:08:12 +01:00
Nicolas
72d894c2a3 Update rate-limiter.ts 2025-03-06 17:06:42 -03:00
Nicolas
1de5a2c589 Update batch_billing.ts 2025-03-06 16:57:23 -03:00
Nicolas
eba28a4210 Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-03-06 16:41:22 -03:00
Nicolas
40eacfacd4 Update runWebScraper.ts 2025-03-06 16:41:21 -03:00
Gergő Móricz
e1cfe1da48
feat(crawl): includes/excludes fixes (FIR-1300) (#1303)
* feat(crawl): includes/excludes fixes pt. 1

* fix(snips): billing tests

* drop tha logs

* fix(ci): add replica url

* feat(crawl): drop initial scrape if it's not included

* feat(ci): more verbose logging

* fix crawl path in test

* fix(ci): wait for api

* fix(snips/scrape/ad): test for more pixels

* feat(js-sdk/crawl): add regexOnFullURL
2025-03-06 17:05:15 +01:00
Gergő Móricz
f8df18ed6a feat(feng-check): run on chrome-cdp 2025-03-05 18:53:10 +01:00
Gergő Móricz
c22c87ab0a fix(crawler): check for more strings 2025-03-05 18:48:05 +01:00
Eric Ciarla
5a1886936c
Truncate llmstxt cache based on maxurls limit & improve maxurls handling (#1285)
* init

* Update generate-llmstxt-service.ts
2025-03-03 18:37:33 -03:00
Gergő Móricz
1beadf39f0 fix 2025-03-03 21:06:01 +01:00
Gergő Móricz
6dce8e9ddf asd 2025-03-03 21:05:45 +01:00
Gergő Móricz
8620bf3d5e fix(auth): split load evenly between two instances 2025-03-03 20:53:51 +01:00
Gergő Móricz
67ee2662de feat(auth): force acuc on read replica 2025-03-03 20:36:54 +01:00
Nicolas
64af3ba728 Revert "Update auth.ts"
This reverts commit fea249c568c4a8d63bcc2e560fd3655e73350be6.
2025-03-03 16:26:42 -03:00
Nicolas
9eb25786dc Update deep-research-service.ts 2025-03-02 18:16:20 -03:00
Nicolas
30c1e92a2e Update deep-research-service.ts 2025-03-02 17:37:15 -03:00
Nicolas
22d4f0de38
(feat/deep-research) Alpha prep + Improvements (#1284)
* Update index.ts

* Nick:

* Nick: topic -> query

* Update deep-research.ts

* Nick: bump

* Nick: add onSource callback
2025-03-02 17:33:21 -03:00
Gergő Móricz
9ad947884d
feat(tests/snips): add billing tests + misc billing fixes (FIR-1280) (#1283)
* feat(tests/snips): add billing tests + misc billing fixes

* add testing key

* asd
2025-03-02 16:51:42 -03:00
Gergő Móricz
4f25f12a12
fix(ai): handle if AI returns a JSON code block (#1280) 2025-03-02 15:25:24 -03:00
Gergő Móricz
e8c698d613
feat(crawler): handle cross-origin redirects differently than same-origin redirects (#1279) 2025-03-02 13:32:46 +01:00
Nicolas
fea249c568 Update auth.ts 2025-03-02 02:40:34 -03:00
Gergő Móricz
904e69bfbc
feat(supabase): add read replica routing (#1274) 2025-02-28 09:52:26 +01:00
Gergő Móricz
44bf59229a fix(acuc): cache for 1 hour 2025-02-27 21:36:33 +01:00
Nicolas
b72e21a697
Nick: batch billing (#1264) 2025-02-27 20:18:03 +01:00
Nicolas
289e351c14
(feat/deep-research-alpha) Added Max Urls, Sources and Fixes (#1271)
* Nick: fixes

* Nick:

* Update deep-research-status.ts
2025-02-27 13:24:40 -03:00
Gergő Móricz
1d3757b391 bump map to 30k 2025-02-27 12:44:23 +01:00
Grass Huang
7bf04d409a
fix(scraper): improve charset detection regex to accurately parse meta tags (#1265) 2025-02-26 17:31:06 +01:00
Nicolas
31df234127 Update log_job.ts 2025-02-25 21:01:05 -03:00
Nicolas
ec90aaffd6 Update log_job.ts 2025-02-25 21:01:00 -03:00
Nicolas
59d09f5c45 Update log_job.ts 2025-02-25 19:32:16 -03:00
Gergő Móricz
115b6b61c4 add initial codeowners 2025-02-25 14:28:09 +01:00
Gergő Móricz
8c42b08b7e
feat(v1/crawl-status-ws): update behavior to ignore errors like regular crawl-status (#1234) 2025-02-24 21:44:29 +01:00
Gergő Móricz
15489be542
feat(self-host/ai): use any OpenAI-compatible API (#1245) 2025-02-23 09:07:32 +01:00
Nicolas
b24ac0f6b5
Nick: done (#1237) 2025-02-22 20:18:46 -03:00
Nicolas
5ab86b8b43
(fix/token-slicer) Fixes extract token limit issues (#1236)
* Nick: fixes extract token limit errors

* Update llmExtract.ts

* Update llmExtract.ts
2025-02-21 20:44:42 +01:00
Gergő Móricz
76e1f29ae8
Update Dockerfile (#1231) (#1232)
* Update Dockerfile (#1231)

* Dockerfile: re-add prod-deps stage and fix copies

---------

Co-authored-by: Loris <loris.rion@gmail.com>
2025-02-21 17:44:01 +01:00
Nicolas
6c51ef401e Update rate-limiter.ts 2025-02-20 22:31:43 -03:00
Nicolas
25d9bdb1f6
(feat/ai-sdk) Migrate to AI-SDK (#1220)
* Nick: init

* Update llmExtract.ts

* Update llmExtract.ts

* Nick rename

* fix(v1/types): extract json schema validation

* Update url-processor.ts

* feat(ai-sdk): ollama support

* feat(ai-sdk): further ollama support

* Nick: it is broken btw

* feat(ai-sdk): abstract model adapter

* Update pnpm-lock.yaml

* Update analyzeSchemaAndPrompt.ts

* Nick:

* feat(ai-sdk): ollama support

* doc(SELF_HOST): update with embedding param

* Nick:

* Update ranker.ts

* Nick:

* feat(ai-sdk): fixes

* Update llmExtract.ts

* feat: remove zod-to-json-schema

* fix

* Update llmExtract.ts

* use openai

* fixes

---------

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-20 22:48:58 +01:00
Gergő Móricz
16c305775e
fix(crawl-redis): ignore empty includes/excludes (#1223)
* fix(crawl-redis): ignore empty includes/excludes

* fix(snips/scrape): bump timeouts
2025-02-20 19:06:02 +01:00
Gergő Móricz
283a3bfef3
fix(scrapeURL/engines/fetch): discover charset and re-decode (#1221)
* fix(scrapeURL/engines/fetch): discover charset and re-decode

* fix(snips/scrape): allow more time for stealth proxy
2025-02-20 18:56:15 +01:00
Gergő Móricz
e417f83c28
feat(self-host): ollama support (#1219) 2025-02-20 16:59:19 +01:00