| Móricz Gergő | 977a3e13c5 | fix(scrapeURL): remove short content check | 2025-01-08 11:23:25 +01:00 |
| Nicolas | 0a41fdd35d | Merge branch 'nsc/extract-queue' | 2025-01-07 18:21:57 -03:00 |
| Nicolas | 7918d0e1c9 | Nick: bump 1.12.0 | 2025-01-07 18:20:56 -03:00 |
| Nicolas | f82a742cd1 | Merge pull request #1044 from mendableai/nsc/extract-queue: (feat/extract) Move extract to a queue system | 2025-01-07 18:10:46 -03:00 |
| Nicolas | b98e289f03 | Nick: | 2025-01-07 17:49:21 -03:00 |
| Nicolas | a185c05a5c | Nick: sdk async and get status | 2025-01-07 17:27:40 -03:00 |
| Nicolas | 9ec08d7020 | Nick: fixed the sdks | 2025-01-07 17:20:49 -03:00 |
| Nicolas | dd14744850 | Update types.ts | 2025-01-07 16:55:55 -03:00 |
| Nicolas | 9fdcfb9314 | Update index.ts | 2025-01-07 16:24:46 -03:00 |
| Nicolas | 51636352a6 | Merge branch 'nsc/extract-queue' of https://github.com/mendableai/firecrawl into nsc/extract-queue | 2025-01-07 16:21:58 -03:00 |
| Nicolas | 11af214db1 | Nick: update extract in case there is an error | 2025-01-07 16:21:51 -03:00 |
| Gergő Móricz | 1f2a76fc23 | Update apps/api/src/lib/extract/extraction-service.ts | 2025-01-07 20:18:10 +01:00 |
| Nicolas | eb254547e5 | Nick: | 2025-01-07 16:16:01 -03:00 |
| Gergő Móricz | c6a63793bb | crawl incomplete issues | 2025-01-07 19:38:17 +01:00 |
| Gergő Móricz | ccfada98ca | various queue fixes | 2025-01-07 19:15:23 +01:00 |
| Nicolas | 86e34d7c6c | Nick: wip | 2025-01-07 12:13:12 -03:00 |
| Móricz Gergő | 7a03275575 | add comment | 2025-01-07 13:57:47 +01:00 |
| Móricz Gergő | 7d73ebdbf1 | fix(crawl): never invalidate first crawl scrape if redirects | 2025-01-07 13:57:23 +01:00 |
| Móricz Gergő | b96b97ed72 | fix(crawl): don't push rawhtml to db unless requested | 2025-01-07 10:09:15 +01:00 |
| Móricz Gergő | 35d1d85978 | fix(crawler): also take the hostname of the base url when determining isInternalLink | 2025-01-07 09:29:58 +01:00 |
| Nicolas | bb27594443 | Merge branch 'main' into nsc/extract-queue | 2025-01-06 13:01:15 -03:00 |
| Nicolas | b82cfa8540 | Merge pull request #1038 from 1101-1/add_actual_random_useragent: feat: use new random user agent instead of the old one | 2025-01-06 11:51:15 -03:00 |
| Kirill | 736c3675b6 | use new agent generation instead of expired one | 2025-01-05 17:07:14 +04:00 |
| Nicolas | ceb2104960 | Merge pull request #1034 from mendableai/sdk/fixed-none-undefined-on-response: [SDK] fixed none and undefined on response | 2025-01-04 16:31:41 -03:00 |
| Gergő Móricz | 461842fe8c | fix(v1/crawl-status): handle job's returnvalue being explicitly null (db race) | 2025-01-04 17:24:33 +01:00 |
| Gergő Móricz | b92a4eb79b | fix(queue-worker): only do redirect handling logic on crawls, not batch scrape | 2025-01-04 16:59:35 +01:00 |
| Nicolas | d48ddb8820 | Update canonical-url.test.ts | 2025-01-03 23:55:05 -03:00 |
| Nicolas | f2e0bfbfe3 | Nick: url normalization | 2025-01-03 23:54:03 -03:00 |
| Nicolas | f25c0c6d21 | Nick: added canonical tests | 2025-01-03 23:16:33 -03:00 |
| Nicolas | aef040b41e | Nick: from cache fixes | 2025-01-03 23:07:15 -03:00 |
| Nicolas | e8a9d8ddcd | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2025-01-03 22:55:42 -03:00 |
| Nicolas | 05e845a971 | Update cache.ts | 2025-01-03 22:55:38 -03:00 |
| Nicolas | 6bfdbcdba9 | Merge pull request #1037 from mendableai/nsc/semantic-index-extract: WIP Indexes, Caching for /extract, Improvements | 2025-01-03 22:52:22 -03:00 |
| Nicolas | c655c6859f | Nick: fixed | 2025-01-03 22:50:53 -03:00 |
| Nicolas | a4f7c38834 | Nick: fixed | 2025-01-03 22:15:23 -03:00 |
| Nicolas | 8df1c67961 | Update queue-worker.ts | 2025-01-03 21:48:28 -03:00 |
| Nicolas | 499479c85e | Update url-processor.ts | 2025-01-03 21:28:52 -03:00 |
| Nicolas | 432b410678 | Update queue-worker.ts | 2025-01-03 21:26:05 -03:00 |
| Nicolas | 6b2e1cbb28 | Nick: cache /extract scrapes | 2025-01-03 21:19:40 -03:00 |
| Nicolas | 27457ed5db | Nick: init | 2025-01-03 20:44:27 -03:00 |
| Nicolas | 81cf05885b | Merge branch 'main' into nsc/semantic-index-extract | 2025-01-03 19:57:29 -03:00 |
| Rafael Miller | a54a5dbb45 | removed warnings | 2025-01-03 17:13:34 -03:00 |
| Rafael Miller | 12cd9f083c | removed warnings | 2025-01-03 17:12:30 -03:00 |
| rafaelmmiller | b223f6ae68 | Merge branch 'sdk/fixed-none-undefined-on-response' of https://github.com/mendableai/firecrawl into sdk/fixed-none-undefined-on-response | 2025-01-03 13:56:46 -03:00 |
| rafaelmmiller | 55dad5ea13 | fixed empty data with next causing infinite loop | 2025-01-03 13:56:39 -03:00 |
| Rafael Miller | 2e53eb98ce | Merge branch 'main' into sdk/fixed-none-undefined-on-response | 2025-01-03 13:27:52 -03:00 |
| rafaelmmiller | bafcc008bc | [SDK] fixed none and undefined on response | 2025-01-03 13:27:00 -03:00 |
| Nicolas | 87757d9b8e | Nick: fixed schemas on extract for node | 2025-01-03 13:19:08 -03:00 |
| Nicolas | ad49503f8a | Update search.ts | 2025-01-02 21:15:47 -03:00 |
| Nicolas | cbe0716439 | Update search.ts | 2025-01-02 21:13:24 -03:00 |