Gergő Móricz 95f204aab7
Index (FIR-2177) (#1605)
* poc progress

* poc

* url splits and better url normalization

* feat(index): integrate into map

* fix on selfhost

* feat: modifiers

* separate index supa logic

* debug

* fix language comparison

* feat: dontStoreInCache

* feat(index): some rudimentary testing

* feat: use url split columns

* feat(queue-worker/kickoff): use index links to kickoff crawl

* feat(scrapeURL/index): behaviour on non-200 index entries

* feat/added benchmark for scrapes

* feat(map): ignoreIndex

* feat(index): batch insert

* fix(api/tests/scrape): fix index test to work with batching

* disable cacheable lookup for self hosting tests

* feat(js-sdk): dontStoreInCache

* chore(js-sdk): bump

* feat(index): FIRECRAWL_INDEX_WRITE_ONLY

* feat(api/test): index envs

* map benchmarks

* cleanup

* further fixes

* clean up on map

* remove extraneous log

* workflow test run

* asd

* improve fns

* try again

* wow i'm an idiot

* ok fixed

* wth

* revert

* async saving to index

* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)

* feat(selfhost): deploy a playwright image (#1625)

* Testing improvements (FIR-2209) (#1623)

* yeet ad blocking tests until further notice

* feat: re-enable billing tests

* more timeout

* cache issues with billing test

* weird thing

* fix(api/tests/scrape/status): propagation time

* stupid

* no log

* sws

---------

Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
2025-06-03 21:30:19 +02:00