rafaelmmiller
014a99ef91
map benchmarks
2025-06-02 13:38:43 -03:00
Gergő Móricz
8b864345e3
feat(api/test): index envs
2025-06-02 18:07:38 +02:00
Gergő Móricz
b9dc3e738e
feat(index): FIRECRAWL_INDEX_WRITE_ONLY
2025-06-02 18:00:47 +02:00
Gergő Móricz
b3eecdc81b
chore(js-sdk): bump
2025-06-02 17:57:47 +02:00
Gergő Móricz
297d783585
feat(js-sdk): dontStoreInCache
2025-06-02 17:52:46 +02:00
Gergő Móricz
b2aeb99dd4
disable cacheable lookup for self hosting tests
2025-06-02 17:45:24 +02:00
Gergő Móricz
dceca07837
fix(api/tests/scrape): fix index test to work with batching
2025-06-02 17:41:45 +02:00
Gergő Móricz
18a7462fea
feat(index): batch insert
2025-06-02 17:07:25 +02:00
Gergő Móricz
369a8f6050
feat(map): ignoreIndex
2025-06-01 11:51:36 +02:00
rafaelmmiller
22c7685239
feat/added benchmark for scrapes
2025-05-30 18:38:20 -03:00
Gergő Móricz
99d3db743d
feat(scrapeURL/index): behaviour on non-200 index entries
2025-05-30 15:14:16 +02:00
Gergő Móricz
8c250426b3
feat(queue-worker/kickoff): use index links to kickoff crawl
2025-05-30 14:16:49 +02:00
Gergő Móricz
96c753f9a9
feat: use url split columns
2025-05-30 13:56:28 +02:00
Gergő Móricz
2fd31174fb
Merge branch 'main' into mog/index
2025-05-29 15:28:57 +02:00
Gergő Móricz
38c96b524f
feat(scrapeURL): handle contentType JSON better in markdown conversion (#1604)
2025-05-29 15:26:07 +02:00
Gergő Móricz
777a959e02
feat(index): some rudimentary testing
2025-05-29 15:25:42 +02:00
Gergő Móricz
5b0f2da02f
feat: dontStoreInCache
2025-05-29 15:23:24 +02:00
Gergő Móricz
7e73b01599
fix(queue-worker): call webhook after job is in DB
2025-05-29 14:40:47 +02:00
Gergő Móricz
706d378a89
feat(api/v1/scrape-status): log supa lookup errors
2025-05-29 13:02:54 +02:00
Gergő Móricz
853ff1875c
fix language comparison
2025-05-29 12:52:18 +02:00
Gergő Móricz
3557c90210
feat(js-sdk): auto mode proxy (FIR-2145) (#1602)
* feat(js-sdk): auto mode proxy
* Nick: py sdk
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-05-28 14:31:48 -03:00
Gergő Móricz
3e1e069e01
debug
2025-05-28 17:17:17 +02:00
Gergő Móricz
6256d7a0fe
separate index supa logic
2025-05-28 17:06:14 +02:00
Gergő Móricz
9a04a40f66
feat: modifiers
2025-05-28 16:54:41 +02:00
Gergő Móricz
60f2ea7a43
fix on selfhost
2025-05-28 14:48:11 +02:00
Gergő Móricz
d904ec676f
feat(index): integrate into map
2025-05-28 14:47:38 +02:00
Gergő Móricz
550d6bf7bd
url splits and better url normalization
2025-05-28 14:22:08 +02:00
Gergő Móricz
1d733d169a
poc
2025-05-28 13:55:42 +02:00
Gergő Móricz
a5efff07f9
feat(apps/api): add support for a separate, non-eviction Redis (#1600)
* feat(apps/api): add support for a separate, non-eviction Redis
* fix: misimport
2025-05-28 09:58:04 +02:00
Nicolas
756b452a01
Update batch_billing.ts
2025-05-27 19:05:00 -03:00
Nicolas
299e3e29e0
Update batch_billing.ts
2025-05-27 18:44:24 -03:00
Gergő Móricz
a36c6a4f40
feat(scrapeURL): add unnormalizedSourceURL for url matching DX (FIR-2137) (#1601)
* feat(scrapeURL): add unnormalizedSourceURL for url matching DX
* fix(tests): fixc
2025-05-27 21:33:44 +02:00
Gergő Móricz
8fc02f8604
poc progress
2025-05-27 21:03:45 +02:00
Gergő Móricz
474e5a0543
fix(crawler): always set expiry on sitemap links in redis
2025-05-27 15:39:31 +02:00
Gergő Móricz
c3738063cf
less logs even more
2025-05-25 15:50:20 +02:00
Gergő Móricz
492d97e889
reduce logging
2025-05-24 00:09:13 +02:00
Gergő Móricz
a3145ccacc
fix(extract-status): be able to get extract status even after TTL lapses (#1599)
2025-05-23 22:33:09 +02:00
Gergő Móricz
8389a1a78d
fix(html-transformer): bad outName for og:locale:alternate (FIR-2101) (#1597)
* fix(html-transformer): bad outName for og:locale:alternate
* oops
2025-05-23 17:10:09 +02:00
Gergő Móricz
3ec17e2d1a
fix(v1): avoid overwriting rateLimiterMode with FIRE-1 rate limiter (#1593)
2025-05-23 11:50:59 -03:00
Gergő Móricz
3df687e4db
feat(queue-worker/afterJobDone): improved ccq insert logic (#1595)
2025-05-23 11:50:14 -03:00
Gergő Móricz
a7894a2714
fix(scrapeURL/pdf): even better timeout detection
2025-05-23 16:29:28 +02:00
Gergő Móricz
8571b5a99d
Revert "feat(queue-worker/afterJobDone): improved ccq insert logic"
This reverts commit 97c635676d228ed1342cdd1468cb2a1aef4fcfc9.
2025-05-23 15:42:15 +02:00
Gergő Móricz
97c635676d
feat(queue-worker/afterJobDone): improved ccq insert logic
2025-05-23 15:41:57 +02:00
Gergő Móricz
f41af8241e
fix(scrapeURL/pdf): better timeout error
2025-05-23 13:59:53 +02:00
Gergő Móricz
bfe731309c
fix(scrapeURL/pdf/mu): remove log
2025-05-23 13:47:34 +02:00
Gergő Móricz
b03670a8b7
feat: parse PDFs on fc side and reject if too long for timeout (FIR-2083) (#1592)
* feat: pdf-parser, implementation in scrapeURL
* use pdf-parser for page count instead of mu
* fix(pdf-parser): bindings
* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE
* implement post-runsync polling and fix
* fix(Dockerfile): copy in the pdf-parser source code
* fix(scrapeURL/pdf): better error for timeout below 0
2025-05-23 13:45:53 +02:00
Gergő Móricz
321fff1695
ok what
2025-05-23 11:41:34 +02:00
Gergő Móricz
00cc733972
more logs
2025-05-23 11:29:34 +02:00
Gergő Móricz
bb67b9812b
check if enum is being overwritten somehow
2025-05-23 11:27:49 +02:00
Gergő Móricz
d4e7bde03d
add stack
2025-05-23 10:18:30 +02:00