Gergő Móricz
b03670a8b7
feat: parse PDFs on fc side and reject if too long for timeout (FIR-2083) (#1592)
* feat: pdf-parser, implementation in scrapeURL
* use pdf-parser for page count instead of mu
* fix(pdf-parser): bindings
* feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE
* implement post-runsync polling and fix
* fix(Dockerfile): copy in the pdf-parser source code
* fix(scrapeURL/pdf): better error for timeout below 0
2025-05-23 13:45:53 +02:00
Gergő Móricz
bd9673e104
Mog/cachable lookup (#1560)
* feat(scrapeURL): use cacheableLookup
* feat(queue-worker): add cacheablelookup
* fix(cacheable-lookup): make it work with tailscale on local
* add devenv
* try again
* allow querying all
* log
* fixes
* asd
* fix:
* fix(lookup):
* lookup
2025-05-16 15:44:52 +02:00
Gergő Móricz
d46ba95924
Revert "feat: use cacheable lookup everywhere (#1559)"
This reverts commit b8703b2a720765b92f5c4cab94cc90ea624198a8.
2025-05-16 15:31:06 +02:00
Gergő Móricz
b8703b2a72
feat: use cacheable lookup everywhere (#1559)
* feat(scrapeURL): use cacheableLookup
* feat(queue-worker): add cacheablelookup
* fix(cacheable-lookup): make it work with tailscale on local
* add devenv
* try again
* allow querying all
* log
* fixes
* asd
* fix:
* fix(lookup):
2025-05-16 15:27:24 +02:00
Nicolas
1c421f2d74
Nick: (#1492)
2025-04-22 21:42:37 -04:00
Nicolas
6634d236bf
(feat/fire-1) FIRE-1 (#1462)
* wip
* integrating smart-scrape
* integrate smartscrape into llmExtract
* wip
* smart scrape multiple links
* fixes
* fix
* wip
* it worked!
* wip. there's a bug on the batchExtract TypeError: Converting circular structure to JSON
* wip
* retry model
* retry models
* feat/scrape+json+extract interfaces ready
* vertex -> googleapi
* fix/transformArrayToObject. required params on schema is still a bug
* change model
* o3-mini -> gemini
* Update extractSmartScrape.ts
* sessionId
* sessionId
* Nick: f-0 start
* Update extraction-service-f0.ts
* Update types.ts
* Nick:
* Update queue-worker.ts
* Nick: new interface
* rename analyzeSchemaAndPrompt -> F0
* refactor: rename agent ID to model in types and extract logic
* agent
* id->model
* id->model
* refactor: standardize agent model handling and validation across extraction logic
* livecast agent
* (feat/f1) sdks (#1459)
* feat: add FIRE-1 agent support to Python and JavaScript SDKs
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* feat: add FIRE-1 agent support to scrape methods in both SDKs
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* feat: add prompt and sessionId to AgentOptions interface
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* Update index.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* feat(v1): rate limits
* Update types.ts
* Update llmExtract.ts
* add cost tracking
* remove
* Update requests.http
* fix smart scrape cost calc
* log sm cost
* fix counts
* fix
* expose cost tracking
* models fix
* temp: skipLibcheck
* get rid of it
* fix ts
* dont skip lib check
* Update extractSmartScrape.ts
* Update queue-worker.ts
* Update smartScrape.ts
* Update requests.http
* fix(rate-limiter):
* types: fire-1 refine
* bill 150
* fix credits used on crawl
* ban from crawl
* route cost limit warning
* Update generic-ai.ts
* genres
* Update llmExtract.ts
* test server diff
* cletu
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
Co-authored-by: Ademílson F. Tonato <ademilsonft@outlook.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello@sideguide.dev>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-04-15 00:19:45 -07:00
Gergő Móricz
3a8de846e3
read from GCS (again) (#1433)
* feat(crawl-status): retrieve job data from GCS
* feat(gcs-jobs/save): retrying saving metadata (might conflict)
* feat(gcs-jobs/save): retry save operation
* fix(gcs-jobs/save): respect metadata rules
* feat(crawl-status): log if gcs job is not found
* feat(ci/test/server): add gcs
2025-04-09 12:47:51 +02:00
Gergő Móricz
71b6b83ec2
tally rework api switchover (#1328)
* tally rework api switchover
* fix and send logs
* temp: force main instance while RPCs propagate
* Revert "temp: force main instance while RPCs propagate"
This reverts commit 4c93379cfa64efd60eb4767dd8eced1bdd302531.
2025-03-12 20:10:33 +01:00
Gergő Móricz
e1cfe1da48
feat(crawl): includes/excludes fixes (FIR-1300) (#1303)
* feat(crawl): includes/excludes fixes pt. 1
* fix(snips): billing tests
* drop tha logs
* fix(ci): add replica url
* feat(crawl): drop initial scrape if it's not included
* feat(ci): more verbose logging
* fix crawl path in test
* fix(ci): wait for api
* fix(snips/scrape/ad): test for more pixels
* feat(js-sdk/crawl): add regexOnFullURL
2025-03-06 17:05:15 +01:00
Gergő Móricz
9ad947884d
feat(tests/snips): add billing tests + misc billing fixes (FIR-1280) (#1283)
* feat(tests/snips): add billing tests + misc billing fixes
* add testing key
* asd
2025-03-02 16:51:42 -03:00
Gergő Móricz
387cc60668
fix(ci/test-server): clean up old envs
2025-02-20 15:06:37 +01:00
Gergő Móricz
04218de2b0
Revert "feat(ci): use pull_request_target (+ manual approval)"
This reverts commit 9142030881e0d153396279520e127b74af8417c9.
2025-02-20 10:58:08 +01:00
Gergő Móricz
9142030881
feat(ci): use pull_request_target (+ manual approval)
2025-02-20 10:52:29 +01:00
Gergő Móricz
bc5a16d048
feat(ci/test-server): build go markdown parser
2025-02-20 10:05:39 +01:00
Gergő Móricz
f4f75fe184
fix(ci): path to lock
2025-02-19 22:15:41 +01:00
Gergő Móricz
e9cb8ac956
feat(ci): caching improvements
2025-02-19 22:11:32 +01:00
Gergő Móricz
1a9f6b985a
feat(github/ci): improvements
2025-02-19 20:51:38 +01:00