* poc progress
* poc
* url splits and better url normalization
* feat(index): integrate into map
* fix on selfhost
* feat: modifiers
* separate index Supabase logic
* debug
* fix language comparison
* feat: dontStoreInCache
* feat(index): some rudimentary testing
* feat: use url split columns
* feat(queue-worker/kickoff): use index links to kickoff crawl
* feat(scrapeURL/index): behaviour on non-200 index entries
* feat: add benchmark for scrapes
* feat(map): ignoreIndex
* feat(index): batch insert
* fix(api/tests/scrape): fix index test to work with batching
* disable cacheable lookup for self-hosting tests
* feat(js-sdk): dontStoreInCache
* chore(js-sdk): bump
* feat(index): FIRECRAWL_INDEX_WRITE_ONLY
* feat(api/test): index envs
* map benchmarks
* cleanup
* further fixes
* clean up on map
* remove extraneous log
* workflow test run
* asd
* improve fns
* try again
* wow i'm an idiot
* ok fixed
* wth
* revert
* async saving to index
* feat: enhance metadata extraction by including 'itemprop' attribute in HTML (#1624)
* feat(selfhost): deploy a playwright image (#1625)
* Testing improvements (FIR-2209) (#1623)
* yeet ad blocking tests until further notice
* feat: re-enable billing tests
* more timeout
* cache issues with billing test
* weird thing
* fix(api/tests/scrape/status): propagation time
* stupid
* no log
* sws
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com>
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Nick:
* Update LLMs.txt test file to use helper functions and concurrent tests
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Remove LLMs.txt test file as requested
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Change parameter name to 'cache' and keep 7-day expiration
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update generate-llmstxt-supabase.ts
* Update JS and Python SDKs to include cache parameter
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"
This reverts commit d05b9964677b7b2384453329d2ac99d841467053.
* Nick:
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add change tracking support to Python and JS SDKs
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Replace test API keys with TEST_API_KEY placeholder
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Replace API keys with dummy values for testing
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Use environment variables for API keys in tests
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Move JS SDK test to correct location and add dependencies
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Remove old test file location
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Update test file to use TEST_API_KEY environment variable
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Update Python SDK test to use TEST_API_KEY environment variable
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Update package.json
* Update __init__.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add git-diff support to change tracking format
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Fix type issues with parse-diff library
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Fix parse-diff type definitions to match actual library structure
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Add structured output/prompt support to change tracking
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* (feat/change-tracking) Change Tracking Modes (#1447)
* Refactor change tracking to use modes array instead of separate formats
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Implement schema-based change tracking with old/new value comparison
Co-Authored-By: Nicolas Camara <nick@sideguide.dev>
* Nick:
* Nick: .json
* Update diff.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Update index.ts
* Update types.ts
* Update diff.ts
* Update scrape.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nick@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* feat(crawl): includes/excludes fixes pt. 1
* fix(snips): billing tests
* drop the logs
* fix(ci): add replica url
* feat(crawl): drop initial scrape if it's not included
* feat(ci): more verbose logging
* fix crawl path in test
* fix(ci): wait for api
* fix(snips/scrape/ad): test for more pixels
* feat(js-sdk/crawl): add regexOnFullURL