2007 Commits

Author SHA1 Message Date
Móricz Gergő
2cf7a4f57a fix(batch-scrape): auto finish "kickoff" (no kickoff) 2025-01-20 09:40:59 +01:00
Nicolas
f385b250be Update html-to-markdown.ts 2025-01-20 00:20:20 -03:00
Nicolas
240e4e4702 Update auth.ts 2025-01-19 23:17:12 -03:00
Nicolas
1ca50e6e8f Update llmExtract.ts 2025-01-19 22:18:51 -03:00
Nicolas
d786949639 Reapply "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 8b17af40018688c34f95727ceaec289b02ab2023.
2025-01-19 22:04:12 -03:00
Nicolas
8b17af4001 Revert "Merge pull request #1068 from mendableai/nsc/llm-usage-extract"
This reverts commit 406f28c04aff2ba3ae65f483627da13f02943cc3, reversing
changes made to 34ad9ec25d73f37deb1e3adec2315a121ec52f0e.
2025-01-19 22:00:28 -03:00
Nicolas
406f28c04a Merge pull request #1068 from mendableai/nsc/llm-usage-extract
(feat/extract) - LLMs usage analysis + billing
2025-01-19 21:36:33 -03:00
Nicolas
02dea23892 Update auth.ts 2025-01-19 21:35:32 -03:00
Nicolas
34ad9ec25d Merge pull request #1073 from mendableai/nsc/index-queue
(feat/index) Index/Insertion queue
2025-01-19 17:45:57 -03:00
Gergő Móricz
6637dce626 fix: status 2025-01-19 17:34:09 +01:00
Nicolas
e4b45e9e7c Update auth.ts 2025-01-19 13:23:51 -03:00
Nicolas
baa2f94765 Update crawl-maps-index.ts 2025-01-19 13:15:20 -03:00
Nicolas
92b8d97be3 Nick: 2025-01-19 13:09:29 -03:00
Nicolas
513f61a2d1 Nick: map improvements 2025-01-19 12:33:44 -03:00
Nicolas
c19af6ef42 Update map.ts 2025-01-19 12:27:08 -03:00
Nicolas
2e5785d8d9 Nick: fetch sitemap timeout param 2025-01-19 11:40:13 -03:00
Nicolas
24ddcd4a6d Update check-fire-engine.ts 2025-01-18 23:53:33 -03:00
Nicolas
382476cb36 Nick: auth extract 2025-01-18 23:16:25 -03:00
Nicolas
81c347f538 Update llmExtract.ts 2025-01-18 22:49:03 -03:00
Nicolas
64607f3f20 Update extraction-service.ts 2025-01-18 22:40:53 -03:00
Nicolas
b8a30a50e2 Update llm-cost.ts 2025-01-18 21:25:25 -03:00
Nicolas
0ec52613e2 Nick: 2025-01-18 21:10:11 -03:00
Nicolas
34b40f6a23 Nick: 2025-01-18 17:17:42 -03:00
Nicolas
9cd48d7f73 Nick: 2025-01-17 23:47:22 -03:00
Nicolas
260a726f37 Merge branch 'main' into nsc/llm-usage-extract 2025-01-17 23:02:12 -03:00
Nicolas
6e3ceccb5c Nick: fixed billing and acuc cache 2025-01-17 21:27:56 -03:00
Nicolas
1f6abf95e8 Nick: extract billing works 2025-01-17 20:59:44 -03:00
Gergő Móricz
dbc6d07871 fix(queue-worker): bring done add to earlier 2025-01-17 17:46:29 +01:00
Gergő Móricz
13abb2bc0e fix(crawl-redis/finishCrawl): increase logging to hunt down race condition 2025-01-17 17:23:13 +01:00
Gergő Móricz
078c0679aa fix(crawl-status): improve finished checking 2025-01-17 17:18:36 +01:00
Gergő Móricz
e6531278f6 feat(v1): crawl/batch scrape errors route 2025-01-17 17:12:04 +01:00
Gergő Móricz
dcd3d6d98d fix(kickoff): mark as finished if it errors out 2025-01-17 17:11:19 +01:00
Gergő Móricz
5992c57158 fix(crawler): bad urls from sitemap 2025-01-17 17:07:44 +01:00
Gergő Móricz
237d0dc197 fix(requests.http): map 2025-01-17 16:21:57 +01:00
Gergő Móricz
d5929af010 fix(queue-worker/kickoff): make crawls wait for kickoff to finish (matters on big sitemapped sites) 2025-01-17 16:04:01 +01:00
Gergő Móricz
23bb172592 fix(crawler): recognize sitemaps in robots.txt 2025-01-17 15:45:52 +01:00
Móricz Gergő
faf58dfca7 fix(removeUnwantedElements): post-includeTags excludeTags
Fixes #700
2025-01-17 12:41:00 +01:00
Móricz Gergő
de08b37480 feat: adjust CI testing 2025-01-17 11:51:46 +01:00
Móricz Gergő
4a947e385f fix(queue-worker): fill out time taken on failure too 2025-01-17 11:28:37 +01:00
Gergő Móricz
6c94db7ed0 fix(html,markdown): always get absolute links 2025-01-16 16:56:13 +01:00
Gergő Móricz
e824303d87 feat(html): always pick largest image from srcset 2025-01-16 16:51:33 +01:00
Gergő Móricz
655753cd27 fix(url): allow domains with ports 2025-01-16 16:30:14 +01:00
Nicolas
ca14c651da Update model-prices.ts 2025-01-15 21:07:53 -03:00
Nicolas
4db023280d Nick: introduce llm-usage cost analysis 2025-01-15 21:01:29 -03:00
Gergő Móricz
cbe67d89a5 feat(queue-worker): proactive job cancel 2025-01-15 19:02:20 +01:00
Gergő Móricz
ec039dcb8f fix(blocklist): unblock 2025-01-15 18:54:26 +01:00
Gergő Móricz
dde3aebac4 fix(v1/crawl-status): fix stuck on 0 jobs 2025-01-15 18:51:39 +01:00
Gergő Móricz
ce2f6ff884 fix(queue-worker/billing): fix crawl overbilling 2025-01-15 17:22:52 +01:00
Nicolas
db89e365eb Update check-fire-engine.ts 2025-01-15 01:16:42 -03:00
Nicolas
957eea4113 Nick: extract without a schema should work as expected 2025-01-14 11:37:00 -03:00