3512 Commits

Author SHA1 Message Date
Gergő Móricz
e1a593d162 Merge branch 'main' into mog/index 2025-06-03 21:17:29 +02:00
Gergő Móricz
6a1d284fcf Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice

* feat: re-enable billing tests

* more timeout

* cache issues with billing test

* weird thing

* fix(api/tests/scrape/status): propagation time

* stupid

* no log

* sws
2025-06-03 21:17:09 +02:00
Gergő Móricz
0d3d18be65 feat(selfhost): deploy a playwright image (#1625) 2025-06-03 21:17:09 +02:00
Ademílson Tonato
e89ecc4e4a feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624) 2025-06-03 21:17:09 +02:00
Gergő Móricz
406d696667 Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice

* feat: re-enable billing tests

* more timeout

* cache issues with billing test

* weird thing

* fix(api/tests/scrape/status): propagation time

* stupid

* no log

* sws
2025-06-03 21:16:36 +02:00
Gergő Móricz
60525220a2 async saving to index 2025-06-03 21:16:13 +02:00
Gergő Móricz
e297cf8a0d feat(selfhost): deploy a playwright image (#1625) 2025-06-03 19:19:08 +02:00
Ademílson Tonato
41897139da feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624) 2025-06-03 18:16:46 +02:00
Gergő Móricz
d1b5e2ef47 revert 2025-06-03 17:02:27 +02:00
Gergő Móricz
1b3f037a26 wth 2025-06-03 16:55:39 +02:00
Gergő Móricz
ede7aec1f9 ok fixed 2025-06-03 16:47:32 +02:00
Gergő Móricz
4e5feca3dd wow i'm an idiot 2025-06-03 16:41:28 +02:00
Gergő Móricz
71271cc4b8 try again 2025-06-03 16:38:40 +02:00
Gergő Móricz
c75fad5e79 improve fns 2025-06-03 16:33:14 +02:00
Gergő Móricz
6ba57306c3 asd 2025-06-03 16:26:43 +02:00
Gergő Móricz
37d1de09f3 workflow test run 2025-06-03 16:25:02 +02:00
Gergő Móricz
2fe35a4e3d remove extraneous log 2025-06-03 16:24:02 +02:00
Gergő Móricz
39dd721781 clean up on map 2025-06-03 16:22:07 +02:00
Gergő Móricz
7426e54e6c further fixes 2025-06-03 16:12:31 +02:00
Gergő Móricz
d7fef33224 Merge branch 'main' into mog/index 2025-06-03 16:09:57 +02:00
Gergő Móricz
da9a9b0d19 cleanup 2025-06-03 16:07:59 +02:00
Nicolas
e108ff3525 Update search.ts 2025-06-02 23:46:55 -03:00
Nicolas
9347de6a41 Update scrape.ts 2025-06-02 23:15:59 -03:00
Nicolas
86a9d3525b Update queue-jobs.ts 2025-06-02 23:09:09 -03:00
Nicolas
cbc47305cc Update search.ts 2025-06-02 23:09:02 -03:00
Nicolas
ce425d966f Merge branch 'nsc/bypass-billing-internal' 2025-06-02 22:37:56 -03:00
Nicolas
8c661f5329 Update scrape.ts 2025-06-02 22:37:49 -03:00
Nicolas
dc8cc99b1d Nick: bypass billing (#1622) 2025-06-02 21:57:28 -03:00
Nicolas
8967b31465 Nick: bypass billing 2025-06-02 21:51:46 -03:00
Nicolas
bf919ceb82 Nick: __searchPreviewToken 2025-06-02 21:16:34 -03:00
Nicolas
ef789ce8d7 Nick: __experimental 2025-06-02 19:58:56 -03:00
Gergő Móricz
72be73473f feat(api/scrape): credits_billed column + handle billing for /scrape calls on worker side with stricter timeout enforcement (FIR-2162) (#1607)
* feat(api/scrape): stricten timeout and handle billing and logging on queue-worker

* fix: abortsignal pre-check

* fix: proper level

* add comment to clarify is_scrape

* reenable billing tests

* Revert "reenable billing tests"

This reverts commit 98236fdfa03dde8cecdd6b763fcf86810e468a28.

* oof

* fix searxng logging

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-06-02 17:56:27 -03:00
Gergő Móricz
4167ec53eb fix(scrapeURL): only allow disabling the adblock on playwright (FIR-2200) (#1616)
* fix(scrapeURL): only allow disabling the adblock on playwright

* feat(api/tests/scrape): re-enable ad blocking tests
2025-06-02 22:48:16 +02:00
Gergő Móricz
7a8be13220 remove indexes that are no longer used 2025-06-02 22:09:55 +02:00
Gergő Móricz
98ceda9bd5 feat(search): ignore concurrency limit for search (FIR-2187) (#1617)
* feat(search): ignore concurrency limit for search (temp)

* feat(search): only for low tier users for good DX
2025-06-02 17:07:44 -03:00
rafaelmmiller
014a99ef91 map benchmarks 2025-06-02 13:38:43 -03:00
Gergő Móricz
1396451d31 bump rust version pt.2 2025-06-02 18:10:14 +02:00
Gergő Móricz
07fb651a91 bump rust version 2025-06-02 18:09:12 +02:00
Supasin Liulak
6a76ccfacb webhook param for crawl (#1609) 2025-06-02 18:08:32 +02:00
Gergő Móricz
8b864345e3 feat(api/test): index envs 2025-06-02 18:07:38 +02:00
Gergő Móricz
b9dc3e738e feat(index): FIRECRAWL_INDEX_WRITE_ONLY 2025-06-02 18:00:47 +02:00
Gergő Móricz
b3eecdc81b chore(js-sdk): bump 2025-06-02 17:57:47 +02:00
Gergő Móricz
297d783585 feat(js-sdk): dontStoreInCache 2025-06-02 17:52:46 +02:00
Gergő Móricz
b2aeb99dd4 disable cacheable lookup for self hosting tests 2025-06-02 17:45:24 +02:00
Gergő Móricz
dceca07837 fix(api/tests/scrape): fix index test to work with batching 2025-06-02 17:41:45 +02:00
Gergő Móricz
18a7462fea feat(index): batch insert 2025-06-02 17:07:25 +02:00
Gergő Móricz
369a8f6050 feat(map): ignoreIndex 2025-06-01 11:51:36 +02:00
rafaelmmiller
22c7685239 feat/added benchmark for scrapes 2025-05-30 18:38:20 -03:00
Gergő Móricz
99d3db743d feat(scrapeURL/index): behaviour on non-200 index entries 2025-05-30 15:14:16 +02:00
Gergő Móricz
8c250426b3 feat(queue-worker/kickoff): use index links to kickoff crawl 2025-05-30 14:16:49 +02:00