Author | Commit | Message | Date
Gergő Móricz | 37f58efe45 | fix(crawl-redis/lockURL): only add to visited_unique if lock succeeds | 2024-12-15 21:01:31 +01:00
Gergő Móricz | 30fa78cd9e | feat(queue-worker): fix redirect slipping | 2024-12-15 20:16:52 +01:00
Nicolas | 126b46ee2c | Update issue_credits.ts | 2024-12-15 15:53:24 -03:00
Nicolas | 1214d219e1 | Nick: fix actions errors | 2024-12-15 15:43:12 -03:00
Nicolas | 20f89c3478 | Merge pull request #978 from mendableai/nsc/timeout-fixes: Timeout fixes on user defined timeouts | 2024-12-15 15:02:46 -03:00
Gergő Móricz | 0f3a27bf27 | fix(scrapeURL/engines): better timeouts | 2024-12-15 18:58:29 +01:00
Nicolas | a5256827c0 | Update index.ts | 2024-12-15 14:36:09 -03:00
Gergő Móricz | 98f27b0acc | fix(crawl-redis/addCrawlJobDone): further ensure that completed doesn't go over total | 2024-12-15 16:29:09 +01:00
Gergő Móricz | b4a5e1a6e9 | fix(scrapeURL/fire-engine): timeout handling | 2024-12-15 16:04:17 +01:00
Gergő Móricz | afbd01299a | fix(scrapeURL/fire-engine): timeouts | 2024-12-15 15:58:27 +01:00
NBR0KN | 5e267f92ff | fix: adjust Playwright service response to match API schema expectations | 2024-12-15 14:59:53 +01:00
Gergő Móricz | 842b522b44 | feat: add scrapeOptions.fastMode | 2024-12-15 14:28:47 +01:00
Nicolas | 588f747ee8 | chore: formatting | 2024-12-15 02:54:49 -03:00
Nicolas | 4987880b32 | Nick: random fixes | 2024-12-15 02:52:06 -03:00
Nicolas | 664ba69f08 | Nick: f-eng monitoring test | 2024-12-14 21:40:46 -03:00
Nicolas | c325c3aa33 | Nick: node sdk patch | 2024-12-14 14:55:40 -03:00
Nicolas | ccbae4b155 | Update auth.ts | 2024-12-14 00:20:14 -03:00
Gergő Móricz | 9cc6576571 | feat(js-sdk/batch/scrape): add ignoreInvalidURLs option | 2024-12-14 01:16:09 +01:00
Gergő Móricz | 4b5014d7fe | feat(v1/batch/scrape): add ignoreInvalidURLs option | 2024-12-14 01:11:43 +01:00
Gergő Móricz | e74e4bcefc | feat(runWebScraper): retry a scrape max 3 times in a crawl if the status code is failure | 2024-12-14 00:54:05 +01:00
Nicolas | 6b17a53d4b | Update package.json | 2024-12-12 21:53:15 -03:00
Nicolas | 13afe4c733 | Update index.ts | 2024-12-12 21:52:20 -03:00
Nicolas | 6b41916e1a | Merge pull request #971 from mendableai/Hash-Urls: Remove Block List | 2024-12-12 18:19:51 -03:00
Nicolas | 98caf928f4 | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2024-12-12 18:14:13 -03:00
Nicolas | 3b0d192d1b | Update types.ts | 2024-12-12 18:14:11 -03:00
Eric Ciarla | a2998d4499 | Hash Urls | 2024-12-12 16:10:10 -05:00
Eric Ciarla | e06647b4b0 | Move full app examples to other repo | 2024-12-12 14:41:11 -05:00
Nicolas | 4c9e9836aa | Merge pull request #970 from mendableai/nsc/webhooks-custom-metadata: Metadata for webhooks | 2024-12-12 13:36:18 -03:00
Nicolas | e22a0b596c | Nick: custom metadata | 2024-12-12 13:30:00 -03:00
Nicolas | 1d1a936f2c | Merge pull request #954 from mendableai/rafa/fix-schema-base-model-extract: Fixes schema base model extract | 2024-12-11 20:14:35 -03:00
Nicolas | de57e7f4dd | Nick: from dependencies to dev-dependencies | 2024-12-11 20:07:05 -03:00
Nicolas | 90f3733533 | Merge pull request #965 from mendableai/nsc/fixed-prettier-formatting: Fixed Prettier | 2024-12-11 19:54:54 -03:00
Nicolas | 8a1c404918 | Nick: revert trailing comma | 2024-12-11 19:51:08 -03:00
Nicolas | 52f2e733e2 | Nick: fixes | 2024-12-11 19:48:22 -03:00
Nicolas | 00335e2ba9 | Nick: fixed prettier | 2024-12-11 19:46:11 -03:00
Eric Ciarla | e5fe9e1534 | Create .env.example | 2024-12-11 15:31:08 -05:00
Gergő Móricz | f877fbfb8f | fix(WebCrawler/isFile): add .wav | 2024-12-10 23:24:53 +01:00
Gergő Móricz | d276a23da0 | fix(scrapeURL/pdf): handle if a presumed PDF link returns HTML (e.g. 404) | 2024-12-10 23:24:33 +01:00
Gergő Móricz | d9e017e5e2 | feat(queue-worker/crawl): solidify redirect behaviour | 2024-12-10 22:34:26 +01:00
Gergő Móricz | ce460a3a56 | fix(v1/crawl/status): completed more than total if some scrape jobs fail or are discarded | 2024-12-10 22:33:53 +01:00
Gergő Móricz | ecad76978d | feat(scrapeURL/pdf): extend amount of time we're willing to wait for PDFs in crawl/batch scrape mode | 2024-12-10 21:43:00 +01:00
Gergő Móricz | 85cbfbb5bb | fix(crawl): disable smart wait. This increases the reliability/deterministic-ness of crawls. | 2024-12-10 21:12:31 +01:00
Eric Ciarla | 5d90a6c1cd | Merge pull request #946 from BexTuychiev/price-tracker: Add assets for the Automated Amazon Price Tracking article | 2024-12-10 10:27:42 -05:00
Nicolas | 2d35a52efe | Merge pull request #958 from mendableai/remove-microsoft | 2024-12-10 12:00:49 -03:00
rafaelmmiller | 468b8cdeb9 | removing microsoft from blocklist | 2024-12-10 11:29:36 -03:00
Gergő Móricz | 877f072e3c | feat: crawl log parser (poc) | 2024-12-09 23:40:47 +01:00
Nicolas | 4dbe0e6236 | Update requests.http | 2024-12-09 19:26:33 -03:00
Nicolas | a47e278c97 | Nick: bump node sdk | 2024-12-09 19:25:48 -03:00
Gergő Móricz | 91a1a9a1fc | fix(crawl-redis/lockURL): reduce logging | 2024-12-09 19:29:42 +01:00
Gergő Móricz | 6776aee1c3 | feat(auth): extend rate limiter logging to make it easier to debug | 2024-12-09 19:29:32 +01:00