* feat: pdf-parser, implementation in scrapeURL
* use pdf-parser for page count instead of mu
* fix(pdf-parser): bindings
* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE
* implement post-runsync polling and fix
* fix(Dockerfile): copy in the pdf-parser source code
* fix(scrapeURL/pdf): better error for timeout below 0
* FIR-2006: Fix maxUrls and timeLimit enforcement in Deep Research API
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* FIR-2006: Add tests for maxUrls and timeLimit enforcement
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* FIR-2006: Replace mocked tests with end-to-end tests for deep research
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* Delete apps/api/src/__tests__/snips/deep-research-service.test.ts
* Delete apps/api/src/__tests__/snips/lib.ts
* Revert "Delete apps/api/src/__tests__/snips/lib.ts"
This reverts commit a2af9baff89d64adc1930ea5b37b4f07f0735a67.
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
* feat(python-sdk/CrawlWatcher): remove max payload size from WebSocket
* Update __init__.py
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Fix: Concatenate metadata arrays into strings except for ogLocaleAlternate
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* Fix: Only concatenate description field, preserve other metadata arrays
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
* Fix: Only concatenate description field, keep other metadata fields in original format
Co-Authored-By: Nicolas Camara <nicolascamara29@gmail.com>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Nicolas Camara <nicolascamara29@gmail.com>
* Fix LLMs.txt cache bug with subdomains and add bypass option (#1519)
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Nick:
* Update LLMs.txt test file to use helper functions and concurrent tests
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Remove LLMs.txt test file as requested
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Change parameter name to 'cache' and keep 7-day expiration
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update generate-llmstxt-supabase.ts
* Update JS and Python SDKs to include cache parameter
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Fix LLMs.txt cache implementation to use normalizeUrl and exact matching
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Revert "Fix LLMs.txt cache implementation to use normalizeUrl and exact matching"
This reverts commit d05b9964677b7b2384453329d2ac99d841467053.
* Nick:
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* Add caching for RunPod PDF markdown results in GCS
Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>
* Update PDF caching to hash base64 directly and add metadata
Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>
* Fix PDF caching to directly hash content and fix test expectations
Co-Authored-By: thomas@sideguide.dev <thomas@sideguide.dev>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: thomas@sideguide.dev <thomas@sideguide.dev>
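The PDF caching commits above mention hashing the base64 content directly. A minimal sketch of that idea follows. It is not the actual implementation: SHA-256, the `pdf_cache_key` and `cache_object_name` helpers, and the `pdf-cache/` prefix are all assumptions made for illustration.

```python
import hashlib


def pdf_cache_key(pdf_base64: str) -> str:
    """Derive a cache key from the base64 text itself, with no decode step.

    Assumption: SHA-256 is used; the commits do not name the hash function.
    """
    return hashlib.sha256(pdf_base64.encode("utf-8")).hexdigest()


def cache_object_name(pdf_base64: str, prefix: str = "pdf-cache/") -> str:
    """Build a hypothetical storage object name from the content hash."""
    return f"{prefix}{pdf_cache_key(pdf_base64)}.json"
```

Keying on the content rather than the URL means the same PDF fetched from two different locations would map to the same cached result.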
* feat(scrapeURL): use cacheableLookup
* feat(queue-worker): add cacheablelookup
* fix(cacheable-lookup): make it work with tailscale on local
* add devenv
* try again
* allow querying all
* log
* fixes
* asd
* fix:
* fix(lookup):
* lookup
* Fix #1512: make the milliseconds field optional in WaitAction and add a validator ensuring exactly one of milliseconds or selector is provided
Co-Authored-By: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
* Update firecrawl.py
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello+firecrawl@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
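
The WaitAction change for issue #1512 can be illustrated with a small sketch of the "exactly one of" check. This is a standard-library stand-in, not the SDK's code: the dataclass, the field types, and the error message are assumptions, and only the field names `milliseconds` and `selector` come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitAction:
    """Hypothetical stand-in for the SDK's WaitAction model."""

    milliseconds: Optional[int] = None  # optional, per the fix
    selector: Optional[str] = None

    def __post_init__(self) -> None:
        # Count how many of the two fields were supplied; reject both and neither.
        provided = sum(v is not None for v in (self.milliseconds, self.selector))
        if provided != 1:
            raise ValueError(
                "Provide exactly one of 'milliseconds' or 'selector'"
            )
```

With this check, `WaitAction(milliseconds=500)` and `WaitAction(selector="#app")` are both accepted, while `WaitAction()` and `WaitAction(milliseconds=500, selector="#app")` raise `ValueError`.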