Nicolas
|
6b9e65c4f6
|
(feat/extract) Refactor and Reranker improvements (#1100)
* Reapply "Nick: extract api reference"
This reverts commit 61d7ba76f76ce74e0d230f89a93436f29dc8d9df.
* Nick: refactor analyzer
* Nick: formatting
* Nick:
* Update extraction-service.ts
* Nick: fixes
* Nick:
* Nick: wip
* Nick: reverted to the old re-ranker
* Nick:
* Update extract-status.ts
|
2025-01-27 20:07:01 -03:00 |
|
rafaelmmiller
|
ad06cde422
|
Merge branch 'main' of https://github.com/mendableai/firecrawl
|
2025-01-27 14:31:18 -03:00 |
|
rafaelmmiller
|
c1a2981d59
|
default onlyMainContent=false for extract
|
2025-01-27 14:31:16 -03:00 |
|
Gergő Móricz
|
9d448d18d3
|
feat(v1): support cyrillic URLs
|
2025-01-27 16:39:40 +01:00 |
|
Gergő Móricz
|
8af4e4b8dd
|
fix(html-transformer): preserve title tag
|
2025-01-27 16:13:24 +01:00 |
|
Nicolas
|
61d7ba76f7
|
Revert "Nick: extract api reference"
This reverts commit 522c5b35da7d5cd997aa5ebe2002a38ede7ace93.
|
2025-01-26 21:06:37 -03:00 |
|
Nicolas
|
522c5b35da
|
Nick: extract api reference
|
2025-01-26 21:00:40 -03:00 |
|
Gergő Móricz
|
ce3c54d7c7
|
fix(html-transformer.test): add further images
|
2025-01-25 19:06:32 +01:00 |
|
Nicolas
|
cf17479626
|
Merge branch 'main' of https://github.com/mendableai/firecrawl
|
2025-01-25 15:03:15 -03:00 |
|
Nicolas
|
d8d159b268
|
Nick:
|
2025-01-25 15:03:09 -03:00 |
|
Gergő Móricz
|
eb22848eba
|
feat(test/html-transformer): add test for absolute URLs
|
2025-01-25 19:02:52 +01:00 |
|
Gergő Móricz
|
f3982c0894
|
fix: adapt preview team checks
|
2025-01-25 19:02:32 +01:00 |
|
Gergő Móricz
|
4d8f4109b5
|
fix(rust): further select fixes
|
2025-01-25 18:48:40 +01:00 |
|
Nicolas
|
02caa72f02
|
Nick: added html-transformer unit tests
|
2025-01-25 14:28:09 -03:00 |
|
Nicolas
|
7fdecdc4d3
|
Nick: fixed include tags bug
|
2025-01-25 14:12:10 -03:00 |
|
Móricz Gergő
|
dacc5d4f45
|
fix(rust): improve
|
2025-01-25 12:59:14 +01:00 |
|
Móricz Gergő
|
4a1ab6f01c
|
fix(rust): handle bad tok_1
|
2025-01-25 12:53:03 +01:00 |
|
Móricz Gergő
|
e8a6c1bb65
|
fix(rust): avoid panic always
|
2025-01-25 10:15:12 +01:00 |
|
Móricz Gergő
|
ce2c51f6c1
|
fix(rust): bad comp
|
2025-01-25 10:11:05 +01:00 |
|
Móricz Gergő
|
a2d94b525f
|
feat: rewrite html transformer in rust
|
2025-01-25 09:41:33 +01:00 |
|
Móricz Gergő
|
9c40e0cc8d
|
fix(v1): test override for team
|
2025-01-25 08:27:59 +01:00 |
|
Móricz Gergő
|
afea2eeaac
|
feat(v1): add insufficient credits stuff
|
2025-01-25 08:16:19 +01:00 |
|
Nicolas
|
fa5544add8
|
Merge pull request #1090 from mendableai/nsc/new-re-rank
Re-ranker changes
|
2025-01-24 19:20:39 -03:00 |
|
Nicolas
|
4747c6f569
|
Update build-prompts.ts
|
2025-01-24 19:19:18 -03:00 |
|
Gergő Móricz
|
ca78739a48
|
fix(koffi): duplicate type name?
|
2025-01-24 22:56:43 +01:00 |
|
Nicolas
|
10133adcc6
|
Update reranker.ts
|
2025-01-24 18:35:36 -03:00 |
|
Nicolas
|
2c391b0105
|
Nick:
|
2025-01-24 18:09:25 -03:00 |
|
Gergő Móricz
|
b005450a34
|
port most of cheerio stuff to rust (#1089)
|
2025-01-24 22:04:54 +01:00 |
|
Nicolas
|
d547192f37
|
Nick: fixed spread schemas
|
2025-01-24 17:55:16 -03:00 |
|
Gergő Móricz
|
0d9c9f36b8
|
feat(queue-worker): add verbosity for lock extension
|
2025-01-24 19:35:25 +01:00 |
|
Gergő Móricz
|
ce1fe6f06a
|
update bullmq
|
2025-01-24 18:56:03 +01:00 |
|
Nicolas
|
547c09c54c
|
Merge pull request #1087 from mendableai/docs/update-cancel-crawl-response
docs: update cancel crawl response
|
2025-01-24 13:34:13 -03:00 |
|
Ademílson Tonato
|
34e3911a97
|
docs: update cancel crawl response
- add cancel crawl event to requests.http
|
2025-01-24 16:16:17 +00:00 |
|
rafaelmmiller
|
3184e91f66
|
layers
|
2025-01-24 10:25:45 -03:00 |
|
rafaelmmiller
|
64d116540f
|
rerank with lower threshold + back to map if length = 0
|
2025-01-24 09:08:16 -03:00 |
|
Móricz Gergő
|
05d79a875a
|
fix(extract): oops
|
2025-01-24 11:55:41 +01:00 |
|
Móricz Gergő
|
4db9a4a675
|
fix(extraction-service): allow no multiEntityKeys if isMultiEntity is false
|
2025-01-24 11:33:49 +01:00 |
|
Móricz Gergő
|
0dddf4c055
|
fix(v1/extract): add job with explicit id
|
2025-01-24 11:03:04 +01:00 |
|
rafaelmmiller
|
f1cd891a70
|
added today to extract prompts
|
2025-01-23 17:14:45 -03:00 |
|
Gergő Móricz
|
a1efe33c8a
|
fix(scrapeQueue): change expiry to 1 hour
|
2025-01-23 20:30:20 +01:00 |
|
Gergő Móricz
|
a7b56ab87c
|
feat(crawl-status): same for v0
|
2025-01-23 19:39:33 +01:00 |
|
Gergő Móricz
|
95ce3c3b71
|
feat(crawl-status): allow for jobs to expire out of the redis
|
2025-01-23 19:33:43 +01:00 |
|
Gergő Móricz
|
6f696d32ae
|
feat(extract): add log on 0 links
|
2025-01-23 19:25:12 +01:00 |
|
Gergő Móricz
|
5d56627bfa
|
feat(extraction-service): highlight req schema generation
|
2025-01-23 19:24:24 +01:00 |
|
Móricz Gergő
|
9da51a7514
|
feat(extract): add original schema to logs
|
2025-01-23 14:59:54 +01:00 |
|
Móricz Gergő
|
561f0186ef
|
fix build error
|
2025-01-23 12:07:37 +01:00 |
|
Móricz Gergő
|
6557365149
|
feat(sitemap): change sitemap logging
|
2025-01-23 12:06:50 +01:00 |
|
Móricz Gergő
|
d3518e85a8
|
feat(extract): add logging
|
2025-01-23 12:05:15 +01:00 |
|
Móricz Gergő
|
434a435a4b
|
fix(sitemap): increase limit to 20
|
2025-01-23 11:29:49 +01:00 |
|
Móricz Gergő
|
1e28ba291e
|
fix(sitemap): increase limit
|
2025-01-23 09:21:38 +01:00 |
|