Nicolas | e899ecbe44 | Update llmExtract.ts | 2024-12-18 16:52:05 -03:00
RutamBhagat | e776847c71 | feat(js-sdk): improve API key handling for cloud vs self-hosted services in FirecrawlApp | 2024-12-17 13:40:14 -08:00
RutamBhagat | 29cea4c51d | feat(python-sdk): improve API key handling for cloud vs self-hosted services in FirecrawlApp | 2024-12-17 13:31:35 -08:00
Móricz Gergő | bd36c441d3 | feat(queue-worker): improve team-based logging | 2024-12-17 22:06:36 +01:00
Móricz Gergő | 780442d73b | feat: improve billing logging | 2024-12-17 22:02:31 +01:00
Nicolas | ac187452c3 | Nick: better filtering for urls that should be scraped | 2024-12-17 17:34:55 -03:00
Nicolas | 3b6edef9fa | chore: formatting | 2024-12-17 16:58:57 -03:00
Nicolas | b9f621bed5 | Nick: extract fixes | 2024-12-17 16:58:35 -03:00
Nicolas | 79e335636a | Nick: fixed extract issues | 2024-12-17 16:40:45 -03:00
Nicolas | 6d77879d68 | Update extract.ts | 2024-12-17 15:22:25 -03:00
Nicolas | e26a0a65a7 | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2024-12-17 15:19:53 -03:00
Nicolas | 0f8b8a717d | Update map.ts | 2024-12-17 15:19:52 -03:00
Eric Ciarla | 4ad6665d39 | Merge pull request #987 from mendableai/default-to-pdf-parse: Revert to pdf parse | 2024-12-17 12:13:46 -05:00
Eric Ciarla | a20a003c74 | revert to pdf parse | 2024-12-17 12:12:22 -05:00
Eric Ciarla | db8e9c36d6 | Merge pull request #984 from mendableai/default-to-pdf-parse: Default to pdf2md, if under 500 chars (indicating failure) use LlamaParse | 2024-12-17 10:06:40 -05:00
Eric Ciarla | 194353af0d | Remove pdf parse | 2024-12-17 10:04:20 -05:00
Eric Ciarla | 1402831a0a | Replace pdf parse with pdf to md | 2024-12-17 09:59:52 -05:00
Eric Ciarla | ed7d15d2af | Update index.ts | 2024-12-17 09:50:29 -05:00
Gergő Móricz | 654d6c6e0b | fix(scrapeURL): increase timeToRun | 2024-12-17 13:21:24 +01:00
Gergő Móricz | 47b968fede | fix(scrapeURL/fire-engine): timeout calculation issues | 2024-12-17 13:17:55 +01:00
Gergő Móricz | 7f57c868be | Revert "fix(scrapeURL): better timeToRun distribution". This reverts commit 284a6ccedd1baede825571ee933eb7e4f773e2de. | 2024-12-16 23:08:20 +01:00
Gergő Móricz | 284a6ccedd | fix(scrapeURL): better timeToRun distribution | 2024-12-16 23:01:34 +01:00
Gergő Móricz | 0013bdfcb4 | feat(v1/scrape): add more context to timeout logs | 2024-12-16 22:42:51 +01:00
Gergő Móricz | 139e2c9a05 | fix(runWebScraper): proper error handling | 2024-12-16 22:24:00 +01:00
Rafael Miller | 2c233bd321 | Update requests.http | 2024-12-16 11:48:48 -03:00
rafaelmmiller | d8150c6171 | added type to reqs example | 2024-12-16 11:46:56 -03:00
rafaelmmiller | b6802bc443 | merged with main | 2024-12-16 11:41:59 -03:00
Rafael Miller | 8192d756e9 | Merge branch 'main' into rafa/fix-default-on-schema-llm-extract | 2024-12-16 09:33:36 -03:00
rafaelmmiller | eab30c474b | added unit tests | 2024-12-16 09:30:40 -03:00
Gergő Móricz | 2de659d810 | fix(queue-jobs): fix concurrency limit | 2024-12-15 23:54:52 +01:00
Gergő Móricz | 72d6a8179e | fix(rate-limiter): raise crawlStatus limits | 2024-12-15 23:08:23 +01:00
Gergő Móricz | e97ee4a4be | fix(WebScraper/tryGetSitemap): deduplicate sitemap links list | 2024-12-15 22:33:36 +01:00
Gergő Móricz | 37f58efe45 | fix(crawl-redis/lockURL): only add to visited_unique if lock succeeds | 2024-12-15 21:01:31 +01:00
Gergő Móricz | 30fa78cd9e | feat(queue-worker): fix redirect slipping | 2024-12-15 20:16:52 +01:00
Nicolas | 126b46ee2c | Update issue_credits.ts | 2024-12-15 15:53:24 -03:00
Nicolas | 1214d219e1 | Nick: fix actions errors | 2024-12-15 15:43:12 -03:00
Nicolas | 20f89c3478 | Merge pull request #978 from mendableai/nsc/timeout-fixes: Timeout fixes on user defined timeouts | 2024-12-15 15:02:46 -03:00
Gergő Móricz | 0f3a27bf27 | fix(scrapeURL/engines): better timeouts | 2024-12-15 18:58:29 +01:00
Nicolas | a5256827c0 | Update index.ts | 2024-12-15 14:36:09 -03:00
Gergő Móricz | 98f27b0acc | fix(crawl-redis/addCrawlJobDone): further ensure that completed doesn't go over total | 2024-12-15 16:29:09 +01:00
Gergő Móricz | b4a5e1a6e9 | fix(scrapeURL/fire-engine): timeout handling | 2024-12-15 16:04:17 +01:00
Gergő Móricz | afbd01299a | fix(scrapeURL/fire-engine): timeouts | 2024-12-15 15:58:27 +01:00
NBR0KN | 5e267f92ff | fix: adjust Playwright service response to match API schema expectations | 2024-12-15 14:59:53 +01:00
Gergő Móricz | 842b522b44 | feat: add scrapeOptions.fastMode | 2024-12-15 14:28:47 +01:00
Nicolas | 588f747ee8 | chore: formatting | 2024-12-15 02:54:49 -03:00
Nicolas | 4987880b32 | Nick: random fixes | 2024-12-15 02:52:06 -03:00
Nicolas | 664ba69f08 | Nick: f-eng monitoring test | 2024-12-14 21:40:46 -03:00
Nicolas | c325c3aa33 | Nick: node sdk patch | 2024-12-14 14:55:40 -03:00
Nicolas | ccbae4b155 | Update auth.ts | 2024-12-14 00:20:14 -03:00
Gergő Móricz | 9cc6576571 | feat(js-sdk/batch/scrape): add ignoreInvalidURLs option | 2024-12-14 01:16:09 +01:00