3460 Commits

Author SHA1 Message Date
Nicolas
e899ecbe44 Update llmExtract.ts 2024-12-18 16:52:05 -03:00
RutamBhagat
e776847c71 feat(js-sdk): improve API key handling for cloud vs self-hosted services in FirecrawlApp 2024-12-17 13:40:14 -08:00
RutamBhagat
29cea4c51d feat(python-sdk): improve API key handling for cloud vs self-hosted services in FirecrawlApp 2024-12-17 13:31:35 -08:00
Móricz Gergő
bd36c441d3 feat(queue-worker): improve team-based logging 2024-12-17 22:06:36 +01:00
Móricz Gergő
780442d73b feat: improve billing logging 2024-12-17 22:02:31 +01:00
Nicolas
ac187452c3 Nick: better filtering for urls that should be scraped 2024-12-17 17:34:55 -03:00
Nicolas
3b6edef9fa chore: formatting 2024-12-17 16:58:57 -03:00
Nicolas
b9f621bed5 Nick: extract fixes 2024-12-17 16:58:35 -03:00
Nicolas
79e335636a Nick: fixed extract issues 2024-12-17 16:40:45 -03:00
Nicolas
6d77879d68 Update extract.ts 2024-12-17 15:22:25 -03:00
Nicolas
e26a0a65a7 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-12-17 15:19:53 -03:00
Nicolas
0f8b8a717d Update map.ts 2024-12-17 15:19:52 -03:00
Eric Ciarla
4ad6665d39 Merge pull request #987 from mendableai/default-to-pdf-parse 2024-12-17 12:13:46 -05:00
    Revert to pdf parse
Eric Ciarla
a20a003c74 revert to pdf parse 2024-12-17 12:12:22 -05:00
Eric Ciarla
db8e9c36d6 Merge pull request #984 from mendableai/default-to-pdf-parse 2024-12-17 10:06:40 -05:00
    Default to pdf2md, if under 500 chars (indicating failure) use LlamaParse
Eric Ciarla
194353af0d Remove pdf parse 2024-12-17 10:04:20 -05:00
Eric Ciarla
1402831a0a Replace pdf parse with pdf to md 2024-12-17 09:59:52 -05:00
Eric Ciarla
ed7d15d2af Update index.ts 2024-12-17 09:50:29 -05:00
Gergő Móricz
654d6c6e0b fix(scrapeURL): increase timeToRun 2024-12-17 13:21:24 +01:00
Gergő Móricz
47b968fede fix(scrapeURL/fire-engine): timeout calculation issues 2024-12-17 13:17:55 +01:00
Gergő Móricz
7f57c868be Revert "fix(scrapeURL): better timeToRun distribution" 2024-12-16 23:08:20 +01:00
    This reverts commit 284a6ccedd1baede825571ee933eb7e4f773e2de.
Gergő Móricz
284a6ccedd fix(scrapeURL): better timeToRun distribution 2024-12-16 23:01:34 +01:00
Gergő Móricz
0013bdfcb4 feat(v1/scrape): add more context to timeout logs 2024-12-16 22:42:51 +01:00
Gergő Móricz
139e2c9a05 fix(runWebScraper): proper error handling 2024-12-16 22:24:00 +01:00
Rafael Miller
2c233bd321 Update requests.http 2024-12-16 11:48:48 -03:00
rafaelmmiller
d8150c6171 added type to reqs example 2024-12-16 11:46:56 -03:00
rafaelmmiller
b6802bc443 merged with main 2024-12-16 11:41:59 -03:00
Rafael Miller
8192d756e9 Merge branch 'main' into rafa/fix-default-on-schema-llm-extract 2024-12-16 09:33:36 -03:00
rafaelmmiller
eab30c474b added unit tests 2024-12-16 09:30:40 -03:00
Gergő Móricz
2de659d810 fix(queue-jobs): fix concurrency limit 2024-12-15 23:54:52 +01:00
Gergő Móricz
72d6a8179e fix(rate-limiter): raise crawlStatus limits 2024-12-15 23:08:23 +01:00
Gergő Móricz
e97ee4a4be fix(WebScraper/tryGetSitemap): deduplicate sitemap links list 2024-12-15 22:33:36 +01:00
Gergő Móricz
37f58efe45 fix(crawl-redis/lockURL): only add to visited_unique if lock succeeds 2024-12-15 21:01:31 +01:00
Gergő Móricz
30fa78cd9e feat(queue-worker): fix redirect slipping 2024-12-15 20:16:52 +01:00
Nicolas
126b46ee2c Update issue_credits.ts 2024-12-15 15:53:24 -03:00
Nicolas
1214d219e1 Nick: fix actions errors 2024-12-15 15:43:12 -03:00
Nicolas
20f89c3478 Merge pull request #978 from mendableai/nsc/timeout-fixes 2024-12-15 15:02:46 -03:00
    Timeout fixes on user defined timeouts
Gergő Móricz
0f3a27bf27 fix(scrapeURL/engines): better timeouts 2024-12-15 18:58:29 +01:00
Nicolas
a5256827c0 Update index.ts 2024-12-15 14:36:09 -03:00
Gergő Móricz
98f27b0acc fix(crawl-redis/addCrawlJobDone): further ensure that completed doesn't go over total 2024-12-15 16:29:09 +01:00
Gergő Móricz
b4a5e1a6e9 fix(scrapeURL/fire-engine): timeout handling 2024-12-15 16:04:17 +01:00
Gergő Móricz
afbd01299a fix(scrapeURL/fire-engine): timeouts 2024-12-15 15:58:27 +01:00
NBR0KN
5e267f92ff fix: adjust Playwright service response to match API schema expectations 2024-12-15 14:59:53 +01:00
Gergő Móricz
842b522b44 feat: add scrapeOptions.fastMode 2024-12-15 14:28:47 +01:00
Nicolas
588f747ee8 chore: formatting 2024-12-15 02:54:49 -03:00
Nicolas
4987880b32 Nick: random fixes 2024-12-15 02:52:06 -03:00
Nicolas
664ba69f08 Nick: f-eng monitoring test 2024-12-14 21:40:46 -03:00
Nicolas
c325c3aa33 Nick: node sdk patch 2024-12-14 14:55:40 -03:00
Nicolas
ccbae4b155 Update auth.ts 2024-12-14 00:20:14 -03:00
Gergő Móricz
9cc6576571 feat(js-sdk/batch/scrape): add ignoreInvalidURLs option 2024-12-14 01:16:09 +01:00