* poc progress
* poc
* url splits and better url normalization
* feat(index): integrate into map
* fix on selfhost
* feat: modifiers
* separate index supa logic
* debug
* fix language comparison
* feat: dontStoreInCache
* feat(index): some rudimentary testing
* feat: use url split columns
* feat(queue-worker/kickoff): use index links to kickoff crawl
* feat(scrapeURL/index): behaviour on non-200 index entries
* feat/added benchmark for scrapes
* feat(map): ignoreIndex
* feat(index): batch insert
* fix(api/tests/scrape): fix index test to work with batching
* disable cacheable lookup for self hosting tests
* feat(js-sdk): dontStoreInCache
* chore(js-sdk): bump
* feat(index): FIRECRAWL_INDEX_WRITE_ONLY
* feat(api/test): index envs
* map benchmarks
* cleanup
* further fixes
* clean up on map
* remove extraneous log
* workflow test run
* asd
* improve fns
* try again
* wow i'm an idiot
* ok fixed
* wth
* revert
* async saving to index
* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
* feat(selfhost): deploy a playwright image (#1625)
* Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
* feat(scrapeURL/pdf): bill n credits per page
* Update scrape.ts
* Update queue-worker.ts
* separate billing logi
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* feat: pdf-parser, implementation in scrapeURL
* use pdf-parser for page count instead of mu
* fix(pdf-parser): bindings
* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE
* implement post-runsync polling and fix
* fix(Dockerfile): copy in the pdf-parser source code
* fix(scrapeURL/pdf): better error for timeout below 0