Author | Commit | Message | Date
Gergő Móricz | ecad76978d | feat(scrapeURL/pdf): extend amount of time we're willing to wait for PDFs in crawl/batch scrape mode | 2024-12-10 21:43:00 +01:00
Gergő Móricz | 85cbfbb5bb | fix(crawl): disable smart wait. This increases the reliability/deterministic-ness of crawls. | 2024-12-10 21:12:31 +01:00
Nicolas | 2d35a52efe | Merge pull request #958 from mendableai/remove-microsoft | 2024-12-10 12:00:49 -03:00
rafaelmmiller | 468b8cdeb9 | removing microsoft from blocklist | 2024-12-10 11:29:36 -03:00
Gergő Móricz | 877f072e3c | feat: crawl log parser (poc) | 2024-12-09 23:40:47 +01:00
Nicolas | 4dbe0e6236 | Update requests.http | 2024-12-09 19:26:33 -03:00
Nicolas | a47e278c97 | Nick: bump node sdk | 2024-12-09 19:25:48 -03:00
Gergő Móricz | 91a1a9a1fc | fix(crawl-redis/lockURL): reduce logging | 2024-12-09 19:29:42 +01:00
Gergő Móricz | 6776aee1c3 | feat(auth): extend rate limiter logging to make it easier to debug | 2024-12-09 19:29:32 +01:00
Nicolas | f007f2439e | Update email_notification.ts | 2024-12-08 22:24:16 -03:00
Nicolas | 4d287bb77f | Nick: moving acuc temp to read replica | 2024-12-06 13:06:26 -03:00
Gergő Móricz | 934363b409 | feat(queue-worker): add better logging for worker | 2024-12-05 22:06:07 +01:00
Gergő Móricz | f82b9c205c | fix(crawl-redis): oops | 2024-12-05 21:42:08 +01:00
Gergő Móricz | 845c2744a9 | feat(app): add extra crawl logging (app-side only for now) | 2024-12-05 20:50:36 +01:00
Gergő Móricz | cce94289ee | fix(v1/batch/scrape): horrid memory usage | 2024-12-05 20:49:28 +01:00
Gergő Móricz | f8e619b5df | fix(crawl-status): returnvalue filtering on active jobs | 2024-12-05 18:20:21 +01:00
Gergő Móricz | 41d859203f | feat(v1/batch/scrape): appendToId | 2024-12-04 23:35:29 +01:00
Gergő Móricz | 7bde034020 | auth: log team id | 2024-12-04 23:12:55 +01:00
Nicolas | 64546f1259 | Update types.ts | 2024-12-04 18:00:51 -03:00
Nicolas | f7207f91b4 | Nick: temp e-s-1 | 2024-12-04 16:25:43 -03:00
Gergő Móricz | 6b1f30e0fb | fix(scrapeURL/removeUnwantedElements): try to fix onlyMainContent for poorly structured sites | 2024-12-04 19:05:12 +01:00
Gergő Móricz | 88a16b18a3 | fix(crawl-status): ts error | 2024-12-04 17:55:51 +01:00
Gergő Móricz | d8613899e3 | fix(crawl-status): handle failed jobs (oops) | 2024-12-04 17:52:47 +01:00
Gergő Móricz | 712a138404 | fix(crawl-status): hard error bug | 2024-12-04 17:47:37 +01:00
Nicolas | 51a6b83f45 | Nick: fixed the crawl + n - not respecting limit | 2024-12-04 12:56:47 -03:00
Nicolas | 39ff49a8f3 | Nick: reverted redirect fix | 2024-12-04 12:42:56 -03:00
Nicolas | 4d2f4aad11 | Update index.ts | 2024-12-03 21:07:45 -03:00
Nicolas | f3aa32863f | Revert "Merge branch 'nsc/crawl-n--1-fixes'". This reverts commit 6d325b7ce7af912b326369eace62f89f897b536b, reversing changes made to 3d5704b73e6c4802f0344dc2e17042af9b6de0f5. | 2024-12-03 20:53:14 -03:00
Nicolas | 64800a1c02 | Nick: rm fe for test | 2024-12-03 20:34:14 -03:00
Nicolas | 43530b3b4d | Nick: fixed n-1 w/ Rafa | 2024-12-03 17:53:17 -03:00
Nicolas | 52806807a1 | Nick: crawl fixes | 2024-12-03 16:25:55 -03:00
Nicolas | 1477ab2359 | Nick: log clear ACUC cache | 2024-12-03 12:15:09 -03:00
Nicolas | 990944188e | Nick: e2e tests for no schema extract | 2024-12-03 12:10:33 -03:00
Nicolas | eb2e51e50b | Nick: fixed /extract without a schema | 2024-12-03 12:08:15 -03:00
Nicolas | 4bb46ed152 | Nick: extract prompt fixes and limit the number of urls | 2024-12-01 20:29:03 -03:00
rafaelmmiller | 5ddb7eb922 | parameter | 2024-11-29 16:44:54 -03:00
Gergő Móricz | 42980c899d | fix(scrapeURL/fire-engine): fast fail on chrome error | 2024-11-28 18:41:48 +01:00
Móricz Gergő | 60ea97c51c | fix(log_job): infinite loop | 2024-11-28 08:49:03 +01:00
rafaelmmiller | 943bbae88d | fixed nested data inside extract | 2024-11-27 18:29:37 -03:00
Nicolas | 53e0cb6b19 | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2024-11-27 12:47:12 -03:00
Nicolas | 02cd5bcfa4 | Nick: bumped the status rl | 2024-11-27 12:47:11 -03:00
rafaelmmiller | b69c6f9f95 | added library.tiktok to allowedKeywords | 2024-11-27 10:10:43 -03:00
Nicolas | 6c33b978f3 | Merge pull request #915 from mendableai/nsc/new-extract: Extract (beta) | 2024-11-26 10:02:09 -08:00
Nicolas | 5522d6af7d | Update extract.ts | 2024-11-26 15:01:42 -03:00
Gergő Móricz | d3a9d29288 | return bug | 2024-11-26 18:04:09 +01:00
Gergő Móricz | e217952434 | fix(crawl): finish crawl even if last one fails | 2024-11-26 16:28:45 +01:00
Gergő Móricz | f395c5b008 | fix(crawl): failed behaviour | 2024-11-26 16:25:48 +01:00
Nicolas | 8a26f08b14 | Update extract.ts | 2024-11-24 20:37:58 -08:00
Nicolas | 2513efc971 | Update extract.ts | 2024-11-24 20:31:38 -08:00
Nicolas | a18614cd00 | Update queue-jobs.ts | 2024-11-24 19:48:57 -08:00