mirror of
https://git.mirrors.martin98.com/https://github.com/mendableai/firecrawl
synced 2025-06-20 09:02:11 +08:00

* poc progress
* poc
* url splits and better url normalization
* feat(index): integrate into map
* fix on selfhost
* feat: modifiers
* separate index supa logic
* debug
* fix language comparison
* feat: dontStoreInCache
* feat(index): some rudimentary testing
* feat: use url split columns
* feat(queue-worker/kickoff): use index links to kickoff crawl
* feat(scrapeURL/index): behaviour on non-200 index entries
* feat/added benchmark for scrapes
* feat(map): ignoreIndex
* feat(index): batch insert
* fix(api/tests/scrape): fix index test to work with batching
* disable cacheable lookup for self hosting tests
* feat(js-sdk): dontStoreInCache
* chore(js-sdk): bump
* feat(index): FIRECRAWL_INDEX_WRITE_ONLY
* feat(api/test): index envs
* map benchmarks
* cleanup
* further fixes
* clean up on map
* remove extraneous log
* workflow test run
* asd
* improve fns
* try again
* wow i'm an idiot
* ok fixed
* wth
* revert
* async saving to index
* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
* feat(selfhost): deploy a playwright image (#1625)
* Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws

---------

Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>