rafaelmmiller
f92217e3b6
wip
2025-04-16 00:38:35 -07:00
Gergő Móricz
80b507e64e
correlate with eid
2025-04-15 23:06:13 -07:00
Gergő Móricz
512a2b1cd4
feat(extract): run on original links if reranker is weird
2025-04-15 22:57:20 -07:00
Gergő Móricz
0abe60085b
fix
2025-04-15 20:29:01 -07:00
Nicolas
6634d236bf
(feat/fire-1) FIRE-1 (#1462)
* wip
* integrating smart-scrape
* integrate smartscrape into llmExtract
* wip
* smart scrape multiple links
* fixes
* fix
* wip
* it worked!
* wip. there's a bug on the batchExtract TypeError: Converting circular structure to JSON
* wip
* retry model
* retry models
* feat/scrape+json+extract interfaces ready
* vertex -> googleapi
* fix/transformArrayToObject. required params on schema is still a bug
* change model
* o3-mini -> gemini
* Update extractSmartScrape.ts
* sessionId
* sessionId
* Nick: f-0 start
* Update extraction-service-f0.ts
* Update types.ts
* Nick:
* Update queue-worker.ts
* Nick: new interface
* rename analyzeSchemaAndPrompt -> F0
* refactor: rename agent ID to model in types and extract logic
* agent
* id->model
* id->model
* refactor: standardize agent model handling and validation across extraction logic
* livecast agent
* (feat/f1) sdks (#1459)
* feat: add FIRE-1 agent support to Python and JavaScript SDKs
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* feat: add FIRE-1 agent support to scrape methods in both SDKs
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* feat: add prompt and sessionId to AgentOptions interface
Co-Authored-By: hello@sideguide.dev <hello@sideguide.dev>
* Update index.ts
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello@sideguide.dev>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
* feat(v1): rate limits
* Update types.ts
* Update llmExtract.ts
* add cost tracking
* remove
* Update requests.http
* fix smart scrape cost calc
* log sm cost
* fix counts
* fix
* expose cost tracking
* models fix
* temp: skipLibcheck
* get rid of it
* fix ts
* dont skip lib check
* Update extractSmartScrape.ts
* Update queue-worker.ts
* Update smartScrape.ts
* Update requests.http
* fix(rate-limiter):
* types: fire-1 refine
* bill 150
* fix credits used on crawl
* ban from crawl
* route cost limit warning
* Update generic-ai.ts
* genres
* Update llmExtract.ts
* test server diff
* cletu
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
Co-authored-by: Ademílson F. Tonato <ademilsonft@outlook.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: hello@sideguide.dev <hello@sideguide.dev>
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-04-15 00:19:45 -07:00
Gergő Móricz
6a10f0689d
ACUC: Dynamic Limits (FIR-1641) (#1434)
* extend acuc definition
* kill plan
* stuff
* stupid tests
* feat: better acuc
* feat(acuc): mock ACUC when not using db auth
2025-04-10 18:49:23 +02:00
Gergő Móricz
d3da790dc4
feat(extraction-service): teamId logging
2025-04-09 18:48:00 +02:00
Nicolas
20c93db43f
(feat/extract) URLs can now be optional in /extract (#1346)
* Nick: urls optional on extract
* Update index.ts
2025-03-16 22:29:25 -04:00
Nicolas
25d9bdb1f6
(feat/ai-sdk) Migrate to AI-SDK (#1220)
* Nick: init
* Update llmExtract.ts
* Update llmExtract.ts
* Nick rename
* fix(v1/types): extract json schema validation
* Update url-processor.ts
* feat(ai-sdk): ollama support
* feat(ai-sdk): further ollama support
* Nick: it is broken btw
* feat(ai-sdk): abstract model adapter
* Update pnpm-lock.yaml
* Update analyzeSchemaAndPrompt.ts
* Nick:
* feat(ai-sdk): ollama support
* doc(SELF_HOST): update with embedding param
* Nick:
* Update ranker.ts
* Nick:
* feat(ai-sdk): fixes
* Update llmExtract.ts
* feat: remove zod-to-json-schema
* fix
* Update llmExtract.ts
* use openai
* fixes
---------
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2025-02-20 22:48:58 +01:00
Gergő Móricz
2200f084f3
SELFHOST FIXES (#1207)
* fix(extract): construct OpenAI on demand
Fixes hard-crash if api key not specified in a self-hosting environment.
* fix(ci): try sleeping
* fix(ci): override host
* fix(ci): wait for server to start
* Support /extract and /crawl for self-hosted (FIR-1097) (#1137)
* Support /extract for self-hosted
This returns the job response from redis rather than supabase when db auth is disabled (self hosted mode)
* Use getJob for extract and use correct types
* fix(v1/crawl-status): only poll DB for total count if DB is enabled
* feat(snips): TEST_SUITE_SELF_HOSTED
* fix(ci/test-server-self-host): use pr trigger
* fix(scrapeURL): f-e mocking in selfhosted env
* fix(snips): do not try to eval json format on selfhost
* fix(scrapeURL): further f-e mocking
* fix(snips): don't timeout on hard fail polling
* fix(v1/extract-status): fix-up the db-agnostic impl
unfortunately had to separate the functions since the schema was too divergent :(
* fix(snips): boost screenshot delay
* feat(ci): test with openai
* feat(ci): extract, search testing
* fix(ci): matrix
* fix(ci): bleh
* Update: fix default google search (#1174)
* fix log title
* search should always work
* asd
* fix ci
---------
Co-authored-by: Nick Roth <nlr06886@gmail.com>
Co-authored-by: William <sdustusun@gmail.com>
2025-02-20 00:41:22 +01:00
Rafael Miller
ac5c88bffb
added scrapeOptions to extract (#1133)
2025-02-07 13:38:08 -03:00
Rafael Miller
8d7e8c4f50
added cached scrapes to extract (#1107)
* added cached scrapes to extract
* dont save if exists
* no duplicates
* experimental tag
* Update requests.http
---------
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2025-01-31 13:58:52 -03:00
Nicolas
04c6f511b5
(feat/extract) Add sources to the extraction (#1101)
* Nick: good state
* Nick: source tracker class
* Nick: show sources under flag
2025-01-28 13:46:21 -03:00
Nicolas
6b9e65c4f6
(feat/extract) Refactor and Reranker improvements (#1100)
* Reapply "Nick: extract api reference"
This reverts commit 61d7ba76f76ce74e0d230f89a93436f29dc8d9df.
* Nick: refactor analyzer
* Nick: formatting
* Nick:
* Update extraction-service.ts
* Nick: fixes
* Nick:
* Nick: wip
* Nick: reverted to the old re-ranker
* Nick:
* Update extract-status.ts
2025-01-27 20:07:01 -03:00
Nicolas
61d7ba76f7
Revert "Nick: extract api reference"
This reverts commit 522c5b35da7d5cd997aa5ebe2002a38ede7ace93.
2025-01-26 21:06:37 -03:00
Nicolas
522c5b35da
Nick: extract api reference
2025-01-26 21:00:40 -03:00
Móricz Gergő
05d79a875a
fix(extract): oops
2025-01-24 11:55:41 +01:00
Móricz Gergő
4db9a4a675
fix(extraction-service): allow no multiEntityKeys if isMultiEntity is false
2025-01-24 11:33:49 +01:00
rafaelmmiller
f1cd891a70
added today to extract prompts
2025-01-23 17:14:45 -03:00
Gergő Móricz
6f696d32ae
feat(extract): add log on 0 links
2025-01-23 19:25:12 +01:00
Gergő Móricz
5d56627bfa
feat(extraction-service): highlight req schema generation
2025-01-23 19:24:24 +01:00
Móricz Gergő
9da51a7514
feat(extract): add original schema to logs
2025-01-23 14:59:54 +01:00
Móricz Gergő
d3518e85a8
feat(extract): add logging
2025-01-23 12:05:15 +01:00
Nicolas
ccb74a2b43
Nick: increased timeouts on extract + reduced extract redis usage
2025-01-23 01:28:26 -03:00
Nicolas
498558d358
Nick: formatting done
2025-01-22 18:47:44 -03:00
Nicolas
56f048aeff
Reapply "Nick:"
This reverts commit 4b4385c520c7223cf79ebba981dded8ffaefde11.
2025-01-22 17:26:32 -03:00
Nicolas
4b4385c520
Revert "Nick:"
This reverts commit 6718ce89085339eaaceb1e88a0aa45ecff3216ac.
2025-01-22 17:26:09 -03:00
Nicolas
e1ef826ac6
Merge branch 'main' of https://github.com/mendableai/firecrawl
2025-01-22 17:25:49 -03:00
Nicolas
6718ce8908
Nick:
2025-01-22 17:25:48 -03:00
Gergő Móricz
208bd4ca0c
fix(extraction-service): marginally improve logging
2025-01-22 19:38:09 +01:00
Nicolas
d786949639
Reapply "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 8b17af40018688c34f95727ceaec289b02ab2023.
2025-01-19 22:04:12 -03:00
Nicolas
8b17af4001
Revert "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 406f28c04aff2ba3ae65f483627da13f02943cc3, reversing changes made to 34ad9ec25d73f37deb1e3adec2315a121ec52f0e.
2025-01-19 22:00:28 -03:00
Nicolas
64607f3f20
Update extraction-service.ts
2025-01-18 22:40:53 -03:00
Nicolas
9cd48d7f73
Nick:
2025-01-17 23:47:22 -03:00
Nicolas
1f6abf95e8
Nick: extract billing works
2025-01-17 20:59:44 -03:00
Nicolas
4db023280d
Nick: introduce llm-usage cost analysis
2025-01-15 21:01:29 -03:00
Nicolas
957eea4113
Nick: extract without a schema should work as expected
2025-01-14 11:37:00 -03:00
Nicolas
61e6af2b16
Nick: streaming callback experimental
2025-01-14 02:13:42 -03:00
Nicolas
2dc87a2e1c
Update extraction-service.ts
2025-01-14 01:59:52 -03:00
Nicolas
033e9bbf29
Nick: __experimental_streamSteps
2025-01-14 01:45:50 -03:00
Nicolas
5e5b5ee0e2
(feat/extract) New re-ranker + multi entity extraction (#1061)
* agent that decides if splits schema or not
* split and merge properties done
* wip
* wip
* changes
* ch
* array merge working!
* comment
* wip
* dereferentiate schema
* dereference schemas
* Nick: new re-ranker
* Create llm-links.txt
* Nick: format
* Update extraction-service.ts
* wip: cooking schema mix and spread functions
* wip
* wip getting there!!!
* nick:
* moved functions to helpers
* nick:
* cant reproduce the error anymore
* error handling all scrapes failed
* fix
* Nick: added the sitemap index
* Update sitemap-index.ts
* Update map.ts
* deduplicate and merge arrays
* added error handler for object transformations
* Update url-processor.ts
* Nick:
* Nick: fixes
* Nick: big improvements to rerank of multi-entity
* Nick: working
* Update reranker.ts
* fixed transformations for nested objs
* fix merge nulls
* Nick: fixed error piping
* Update queue-worker.ts
* Update extraction-service.ts
* Nick: format
* Update queue-worker.ts
* Update pnpm-lock.yaml
* Update queue-worker.ts
---------
Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
2025-01-13 22:30:15 -03:00
Nicolas
f4d10c5031
Nick: formatting fixes
2025-01-10 18:35:10 -03:00
Nicolas
aa31508ccd
Nick: links-billed update (temp)
2025-01-08 15:13:33 -03:00
Gergő Móricz
1f2a76fc23
Update apps/api/src/lib/extract/extraction-service.ts
2025-01-07 20:18:10 +01:00
Nicolas
eb254547e5
Nick:
2025-01-07 16:16:01 -03:00
Nicolas
27457ed5db
Nick: init
2025-01-03 20:44:27 -03:00
rafaelmmiller
ef0fc8d0d3
broader search if didnt find results
2025-01-02 18:00:18 -03:00
Nicolas
33632d2fe3
Update extraction-service.ts
2024-12-31 15:22:50 -03:00
Nicolas
e6da214aeb
Nick: async background index
2024-12-30 21:42:01 -03:00
Nicolas
4332f18a8f
Nick: making it optional for the user
2024-12-26 12:43:58 -03:00