2761 Commits

Author SHA1 Message Date
Nicolas
6b9e65c4f6
(feat/extract) Refactor and Reranker improvements (#1100)
* Reapply "Nick: extract api reference"

This reverts commit 61d7ba76f76ce74e0d230f89a93436f29dc8d9df.

* Nick: refactor analyzer

* Nick: formatting

* Nick:

* Update extraction-service.ts

* Nick: fixes

* NIck:

* Nick: wip

* Nick: reverted to the old re-ranker

* Nick:

* Update extract-status.ts
2025-01-27 20:07:01 -03:00
rafaelmmiller
ad06cde422 Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-01-27 14:31:18 -03:00
rafaelmmiller
c1a2981d59 default onlyMainContent=false for extract 2025-01-27 14:31:16 -03:00
Gergő Móricz
9d448d18d3 feat(v1): support cyrillic URLs 2025-01-27 16:39:40 +01:00
Gergő Móricz
8af4e4b8dd fix(html-transformer): preserve title tag 2025-01-27 16:13:24 +01:00
Nicolas
61d7ba76f7 Revert "Nick: extract api reference"
This reverts commit 522c5b35da7d5cd997aa5ebe2002a38ede7ace93.
2025-01-26 21:06:37 -03:00
Nicolas
522c5b35da Nick: extract api reference 2025-01-26 21:00:40 -03:00
Gergő Móricz
ce3c54d7c7 fix(html-transformer.test): add further images 2025-01-25 19:06:32 +01:00
Nicolas
cf17479626 Merge branch 'main' of https://github.com/mendableai/firecrawl 2025-01-25 15:03:15 -03:00
Nicolas
d8d159b268 Nick: 2025-01-25 15:03:09 -03:00
Gergő Móricz
eb22848eba feat(test/html-transformer): add test for absolute URLs 2025-01-25 19:02:52 +01:00
Gergő Móricz
f3982c0894 fix: adapt preview team checks 2025-01-25 19:02:32 +01:00
Gergő Móricz
4d8f4109b5 fix(rust): further select fixes 2025-01-25 18:48:40 +01:00
Nicolas
02caa72f02 Nick: added html-transformer unit tests 2025-01-25 14:28:09 -03:00
Nicolas
7fdecdc4d3 Nick: fixed include tags bug 2025-01-25 14:12:10 -03:00
Móricz Gergő
dacc5d4f45 fix(rust): improve 2025-01-25 12:59:14 +01:00
Móricz Gergő
4a1ab6f01c fix(rust): handle bad tok_1 2025-01-25 12:53:03 +01:00
Móricz Gergő
e8a6c1bb65 fix(rust): avoid panic always 2025-01-25 10:15:12 +01:00
Móricz Gergő
ce2c51f6c1 fix(rust): bad comp 2025-01-25 10:11:05 +01:00
Móricz Gergő
a2d94b525f feat: rewrite html transformer in rust 2025-01-25 09:41:33 +01:00
Móricz Gergő
9c40e0cc8d fix(v1): test override for team 2025-01-25 08:27:59 +01:00
Móricz Gergő
afea2eeaac feat(v1): add insufficient credits stuff 2025-01-25 08:16:19 +01:00
Nicolas
fa5544add8
Merge pull request #1090 from mendableai/nsc/new-re-rank
Re-ranker changes
2025-01-24 19:20:39 -03:00
Nicolas
4747c6f569 Update build-prompts.ts 2025-01-24 19:19:18 -03:00
Gergő Móricz
ca78739a48 fix(koffi): duplicate type name? 2025-01-24 22:56:43 +01:00
Nicolas
10133adcc6 Update reranker.ts 2025-01-24 18:35:36 -03:00
Nicolas
2c391b0105 Nick: 2025-01-24 18:09:25 -03:00
Gergő Móricz
b005450a34
port most of cheerio stuff to rust (#1089) 2025-01-24 22:04:54 +01:00
Nicolas
d547192f37 Nick: fixed spread schemas 2025-01-24 17:55:16 -03:00
Gergő Móricz
0d9c9f36b8 feat(queue-worker): add verbosity for lock extension 2025-01-24 19:35:25 +01:00
Gergő Móricz
ce1fe6f06a update bullmq 2025-01-24 18:56:03 +01:00
Nicolas
547c09c54c
Merge pull request #1087 from mendableai/docs/update-cancel-crawl-response
docs: update cancel crawl response
2025-01-24 13:34:13 -03:00
Ademílson Tonato
34e3911a97
docs: update cancel crawl response
- add cancel crawl event to requests.http
2025-01-24 16:16:17 +00:00
rafaelmmiller
3184e91f66 layers 2025-01-24 10:25:45 -03:00
rafaelmmiller
64d116540f rerank with lower threshold + back to map if lenght = 0 2025-01-24 09:08:16 -03:00
Móricz Gergő
05d79a875a fix(extract): oops 2025-01-24 11:55:41 +01:00
Móricz Gergő
4db9a4a675 fix(extraction-service): allow no multiEntityKeys if isMultiEntity is false 2025-01-24 11:33:49 +01:00
Móricz Gergő
0dddf4c055 fix(v1/extract): add job with explicit id 2025-01-24 11:03:04 +01:00
rafaelmmiller
f1cd891a70 added today to extract prompts 2025-01-23 17:14:45 -03:00
Gergő Móricz
a1efe33c8a fix(scrapeQueue): change expiry to 1 hour 2025-01-23 20:30:20 +01:00
Gergő Móricz
a7b56ab87c feat(crawl-status): same for v0 2025-01-23 19:39:33 +01:00
Gergő Móricz
95ce3c3b71 feat(crawl-status): allow for jobs to expire out of the redis 2025-01-23 19:33:43 +01:00
Gergő Móricz
6f696d32ae feat(extract): add log on 0 links 2025-01-23 19:25:12 +01:00
Gergő Móricz
5d56627bfa feat(extraction-service): highlight req schema generation 2025-01-23 19:24:24 +01:00
Móricz Gergő
9da51a7514 feat(extract): add original schema to logs 2025-01-23 14:59:54 +01:00
Móricz Gergő
561f0186ef fix build error 2025-01-23 12:07:37 +01:00
Móricz Gergő
6557365149 feat(sitemap): change sitemap logging 2025-01-23 12:06:50 +01:00
Móricz Gergő
d3518e85a8 feat(extract): add logging 2025-01-23 12:05:15 +01:00
Móricz Gergő
434a435a4b fix(sitemap): increase limit to 20 2025-01-23 11:29:49 +01:00
Móricz Gergő
1e28ba291e fix(sitemap): increase limit 2025-01-23 09:21:38 +01:00