3460 Commits

Author SHA1 Message Date
Móricz Gergő
977a3e13c5 fix(scrapeURL): remove short content check 2025-01-08 11:23:25 +01:00
Nicolas
0a41fdd35d Merge branch 'nsc/extract-queue' 2025-01-07 18:21:57 -03:00
Nicolas
7918d0e1c9 Nick: bump 1.12.0 2025-01-07 18:20:56 -03:00
Nicolas
f82a742cd1 Merge pull request #1044 from mendableai/nsc/extract-queue (feat/extract) Move extract to a queue system 2025-01-07 18:10:46 -03:00
Nicolas
b98e289f03 Nick: 2025-01-07 17:49:21 -03:00
Nicolas
a185c05a5c Nick: sdk async and get status 2025-01-07 17:27:40 -03:00
Nicolas
9ec08d7020 Nick: fixed the sdks 2025-01-07 17:20:49 -03:00
Nicolas
dd14744850 Update types.ts 2025-01-07 16:55:55 -03:00
Nicolas
9fdcfb9314 Update index.ts 2025-01-07 16:24:46 -03:00
Nicolas
51636352a6 Merge branch 'nsc/extract-queue' of https://github.com/mendableai/firecrawl into nsc/extract-queue 2025-01-07 16:21:58 -03:00
Nicolas
11af214db1 Nick: update extract in case there is an error 2025-01-07 16:21:51 -03:00
Gergő Móricz
1f2a76fc23 Update apps/api/src/lib/extract/extraction-service.ts 2025-01-07 20:18:10 +01:00
Nicolas
eb254547e5 Nick: 2025-01-07 16:16:01 -03:00
Gergő Móricz
c6a63793bb crawl incomplete issues 2025-01-07 19:38:17 +01:00
Gergő Móricz
ccfada98ca various queue fixes 2025-01-07 19:15:23 +01:00
Nicolas
86e34d7c6c Nick: wip 2025-01-07 12:13:12 -03:00
Móricz Gergő
7a03275575 add comment 2025-01-07 13:57:47 +01:00
Móricz Gergő
7d73ebdbf1 fix(crawl): never invalidate first crawl scrape if redirects 2025-01-07 13:57:23 +01:00
Móricz Gergő
b96b97ed72 fix(crawl): don't push rawhtml to db unless requested 2025-01-07 10:09:15 +01:00
Móricz Gergő
35d1d85978 fix(crawler): also take the hostname of the base url when determining isInternalLink 2025-01-07 09:29:58 +01:00
Nicolas
bb27594443 Merge branch 'main' into nsc/extract-queue 2025-01-06 13:01:15 -03:00
Nicolas
b82cfa8540 Merge pull request #1038 from 1101-1/add_actual_random_useragent feat: use new random user agent instead of the old one 2025-01-06 11:51:15 -03:00
Kirill
736c3675b6 use new agent generation instead of expired one 2025-01-05 17:07:14 +04:00
Nicolas
ceb2104960 Merge pull request #1034 from mendableai/sdk/fixed-none-undefined-on-response [SDK] fixed none and undefined on response 2025-01-04 16:31:41 -03:00
Gergő Móricz
461842fe8c fix(v1/crawl-status): handle job's returnvalue being explicitly null (db race) 2025-01-04 17:24:33 +01:00
Gergő Móricz
b92a4eb79b fix(queue-worker): only do redirect handling logic on crawls, not batch scrape 2025-01-04 16:59:35 +01:00
Nicolas
d48ddb8820 Update canonical-url.test.ts 2025-01-03 23:55:05 -03:00
Nicolas
f2e0bfbfe3 Nick: url normalization 2025-01-03 23:54:03 -03:00
Nicolas
f25c0c6d21 Nick: added canonical tests 2025-01-03 23:16:33 -03:00
Nicolas
aef040b41e Nick: from cache fixes 2025-01-03 23:07:15 -03:00
Nicolas
e8a9d8ddcd Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-01-03 22:55:42 -03:00
Nicolas
05e845a971 Update cache.ts 2025-01-03 22:55:38 -03:00
Nicolas
6bfdbcdba9 Merge pull request #1037 from mendableai/nsc/semantic-index-extract WIP Indexes, Caching for /extract, Improvements 2025-01-03 22:52:22 -03:00
Nicolas
c655c6859f Nick: fixed 2025-01-03 22:50:53 -03:00
Nicolas
a4f7c38834 Nick: fixed 2025-01-03 22:15:23 -03:00
Nicolas
8df1c67961 Update queue-worker.ts 2025-01-03 21:48:28 -03:00
Nicolas
499479c85e Update url-processor.ts 2025-01-03 21:28:52 -03:00
Nicolas
432b410678 Update queue-worker.ts 2025-01-03 21:26:05 -03:00
Nicolas
6b2e1cbb28 Nick: cache /extract scrapes 2025-01-03 21:19:40 -03:00
Nicolas
27457ed5db Nick: init 2025-01-03 20:44:27 -03:00
Nicolas
81cf05885b Merge branch 'main' into nsc/semantic-index-extract 2025-01-03 19:57:29 -03:00
Rafael Miller
a54a5dbb45 removed warnings 2025-01-03 17:13:34 -03:00
Rafael Miller
12cd9f083c removed warnings 2025-01-03 17:12:30 -03:00
rafaelmmiller
b223f6ae68 Merge branch 'sdk/fixed-none-undefined-on-response' of https://github.com/mendableai/firecrawl into sdk/fixed-none-undefined-on-response 2025-01-03 13:56:46 -03:00
rafaelmmiller
55dad5ea13 fixed empty data with next causing infinite loop 2025-01-03 13:56:39 -03:00
Rafael Miller
2e53eb98ce Merge branch 'main' into sdk/fixed-none-undefined-on-response 2025-01-03 13:27:52 -03:00
rafaelmmiller
bafcc008bc [SDK] fixed none and undefined on response 2025-01-03 13:27:00 -03:00
Nicolas
87757d9b8e Nick: fixed schemas on extract for node 2025-01-03 13:19:08 -03:00
Nicolas
ad49503f8a Update search.ts 2025-01-02 21:15:47 -03:00
Nicolas
cbe0716439 Update search.ts 2025-01-02 21:13:24 -03:00