Gergő Móricz | 5c62bb1195 | feat: new snips test framework (FIR-414) (#1033) | 2025-01-13 20:50:47 +01:00
  * feat: new snips test framework
  * Update mock.ts
  ---------
  Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Nicolas | f4d10c5031 | Nick: formatting fixes | 2025-01-10 18:35:10 -03:00
Gergő Móricz | d1f3b96388 | feat: add scrapeId in document.metadata | 2025-01-09 20:52:12 +01:00
Gergő Móricz | 97bf54214f | fix(scrapeURL/loop): re-add is long enough check with lt 0 | 2025-01-09 18:43:50 +01:00
Móricz Gergő | 3c614a2e5c | fix(scrapeURL/engines/pdf,docx): support authorization | 2025-01-09 10:03:27 +01:00
Móricz Gergő | 363021ea78 | feat(crawl): ensure url trimming | 2025-01-08 12:35:42 +01:00
Móricz Gergő | 977a3e13c5 | fix(scrapeURL): remove short content check | 2025-01-08 11:23:25 +01:00
Nicolas | aef040b41e | Nick: from cache fixes | 2025-01-03 23:07:15 -03:00
Nicolas | e8a9d8ddcd | Merge branch 'main' of https://github.com/mendableai/firecrawl | 2025-01-03 22:55:42 -03:00
Nicolas | 05e845a971 | Update cache.ts | 2025-01-03 22:55:38 -03:00
Nicolas | a4f7c38834 | Nick: fixed | 2025-01-03 22:15:23 -03:00
Nicolas | 6b2e1cbb28 | Nick: cache /extract scrapes | 2025-01-03 21:19:40 -03:00
Nicolas | c9d91af86f | Merge branch 'main' into nsc/semantic-index-extract | 2025-01-02 15:26:40 -03:00
Nicolas | c3fd13a82b | Nick: fixed re-ranker and enabled url cache of 2hrs | 2024-12-31 18:06:07 -03:00
Nicolas | 07f4b714af | Update removeUnwantedElements.ts | 2024-12-31 15:23:02 -03:00
Nicolas | e6da214aeb | Nick: async background index | 2024-12-30 21:42:01 -03:00
Gergő Móricz | 4d1f92f4c8 | fix(scrapeURL/fetch): block loopback and link-local IPs | 2024-12-29 17:35:14 +01:00
Nicolas | e255301005 | Update index.ts | 2024-12-27 21:31:29 -03:00
Nicolas | 1eca61bffb | Update index.ts | 2024-12-27 20:59:18 -03:00
Nicolas | f9d55efba8 | Update index.ts | 2024-12-27 20:54:26 -03:00
Nicolas | b8d7f9f257 | Nick: we are using runpod | 2024-12-27 19:59:05 -03:00
Nicolas | 5fcf3fa97e | Merge branch 'main' into mog/mineru | 2024-12-27 19:53:09 -03:00
Nicolas | 05d5f84d87 | Merge pull request #1018 from mendableai/feat/add-favicon-metadata | 2024-12-27 17:44:03 -03:00
  [FIR-37] feat: extract and return favicon URL during scraping
Nicolas | eba5fda9a1 | Merge pull request #955 from mendableai/rafa/fix-default-on-schema-llm-extract | 2024-12-27 16:33:04 -03:00
  fixed optional+default bug on llm schema
Ademílson F. Tonato | a4cf814f70 | feat: return favicon url when scraping | 2024-12-27 19:18:53 +00:00
Gergő Móricz | 4772951313 | feat(scrapeURL/fire-engine): explicitly delete job after scrape | 2024-12-27 16:44:41 +01:00
Gergő Móricz | 0b55fb836b | feat(scrapeURL/pdf): switch to MinerU | 2024-12-27 16:37:32 +01:00
Gergő Móricz | c543f4f76c | feat(scrapeURL/pdf): update mock Blob implementation to pass TypeScript | 2024-12-26 20:31:51 +01:00
Gergő Móricz | f15ef0e758 | feat(scrapeURL/fire-engine/chrome-cdp): handle file downloads | 2024-12-26 20:29:09 +01:00
Nicolas | ed24853ca6 | Merge pull request #996 from mendableai/fix/title-extra-info | 2024-12-19 16:05:49 -03:00
  [BUG] fixed title extra info
Gergő Móricz | 071b9a01c3 | fix(scrapeURL/fire-engine): pass geolocation | 2024-12-19 18:23:21 +01:00
rafaelmmiller | cf2ec77131 | fixed title extra info | 2024-12-19 08:32:10 -03:00
Nicolas | 066071cd54 | Update llmExtract.ts | 2024-12-18 23:45:43 -03:00
Nicolas | e899ecbe44 | Update llmExtract.ts | 2024-12-18 16:52:05 -03:00
Nicolas | 3b6edef9fa | chore: formatting | 2024-12-17 16:58:57 -03:00
Eric Ciarla | a20a003c74 | revert to pdf parse | 2024-12-17 12:12:22 -05:00
Eric Ciarla | 1402831a0a | Replace pdf parse with pdf to md | 2024-12-17 09:59:52 -05:00
Eric Ciarla | ed7d15d2af | Update index.ts | 2024-12-17 09:50:29 -05:00
Gergő Móricz | 654d6c6e0b | fix(scrapeURL): increase timeToRun | 2024-12-17 13:21:24 +01:00
Gergő Móricz | 47b968fede | fix(scrapeURL/fire-engine): timeout calculation issues | 2024-12-17 13:17:55 +01:00
Gergő Móricz | 7f57c868be | Revert "fix(scrapeURL): better timeToRun distribution" | 2024-12-16 23:08:20 +01:00
  This reverts commit 284a6ccedd1baede825571ee933eb7e4f773e2de.
Gergő Móricz | 284a6ccedd | fix(scrapeURL): better timeToRun distribution | 2024-12-16 23:01:34 +01:00
rafaelmmiller | b6802bc443 | merged with main | 2024-12-16 11:41:59 -03:00
Rafael Miller | 8192d756e9 | Merge branch 'main' into rafa/fix-default-on-schema-llm-extract | 2024-12-16 09:33:36 -03:00
rafaelmmiller | eab30c474b | added unit tests | 2024-12-16 09:30:40 -03:00
Nicolas | 1214d219e1 | Nick: fix actions errors | 2024-12-15 15:43:12 -03:00
Gergő Móricz | 0f3a27bf27 | fix(scrapeURL/engines): better timeouts | 2024-12-15 18:58:29 +01:00
Nicolas | a5256827c0 | Update index.ts | 2024-12-15 14:36:09 -03:00
Gergő Móricz | b4a5e1a6e9 | fix(scrapeURL/fire-engine): timeout handling | 2024-12-15 16:04:17 +01:00
Gergő Móricz | afbd01299a | fix(scrapeURL/fire-engine): timeouts | 2024-12-15 15:58:27 +01:00