2680 Commits

Author SHA1 Message Date
Nicolas
c19af6ef42 Update map.ts 2025-01-19 12:27:08 -03:00
Nicolas
2e5785d8d9 Nick: fetch sitemap timeout param 2025-01-19 11:40:13 -03:00
Nicolas
24ddcd4a6d Update check-fire-engine.ts 2025-01-18 23:53:33 -03:00
Nicolas
4e8e58729a Update index.ts 2025-01-18 21:10:20 -03:00
Nicolas
0ec52613e2 Nick: 2025-01-18 21:10:11 -03:00
Nicolas
88904a5cde Nick: js-sdk 1.15.4 2025-01-18 18:10:23 -03:00
Nicolas
a7178c0c6f Nick: patch js-sdk 2025-01-18 18:08:50 -03:00
Nicolas
56776da28d Nick: json output to new format 2025-01-18 17:54:04 -03:00
Nicolas
092da888c6 Merge branch 'nsc/json-format' 2025-01-18 17:52:06 -03:00
Nicolas
f9d99ac652 Nick: fixed type on js sdk 2025-01-18 17:51:56 -03:00
Nicolas
9109e78e15
Merge pull request #1072 from mendableai/nsc/json-format
(feat/formats) Extract format renamed to json format
2025-01-18 17:38:52 -03:00
Nicolas
5d62e8264b Nick: 2025-01-18 17:37:11 -03:00
Nicolas
b030a1c5da Nick: extract to json in the sdks as well 2025-01-18 17:23:21 -03:00
Nicolas
34b40f6a23 Nick: 2025-01-18 17:17:42 -03:00
Eric Ciarla
6383bf270a Fix python sdk for extract 2025-01-18 17:12:06 -03:00
Eric Ciarla
3363b2d60e Make prompt not required for extract on python sdk 2025-01-18 17:06:28 -03:00
Gergő Móricz
146dc47954 feat(sdk): check crawl/batch scrape errors 2025-01-17 18:06:04 +01:00
Gergő Móricz
dbc6d07871 fix(queue-worker): bring done add to earlier 2025-01-17 17:46:29 +01:00
Gergő Móricz
13abb2bc0e fix(crawl-redis/finishCrawl): increase logging to hunt down race condition 2025-01-17 17:23:13 +01:00
Gergő Móricz
078c0679aa fix(crawl-status): improve finished checking 2025-01-17 17:18:36 +01:00
Gergő Móricz
e6531278f6 feat(v1): crawl/batch scrape errors route 2025-01-17 17:12:04 +01:00
Gergő Móricz
dcd3d6d98d fix(kickoff): mark as finished if it errors out 2025-01-17 17:11:19 +01:00
Gergő Móricz
5992c57158 fix(crawler): bad urls from sitemap 2025-01-17 17:07:44 +01:00
Gergő Móricz
237d0dc197 fix(requests.http): map 2025-01-17 16:21:57 +01:00
Gergő Móricz
805bfa457d fix(python-sdk): add JSON parse error reporting clarity 2025-01-17 16:16:41 +01:00
Gergő Móricz
d5929af010 fix(queue-worker/kickoff): make crawls wait for kickoff to finish (matters on big sitemapped sites) 2025-01-17 16:04:01 +01:00
Gergő Móricz
23bb172592 fix(crawler): recognize sitemaps in robots.txt 2025-01-17 15:45:52 +01:00
Móricz Gergő
faf58dfca7 fix(removeUnwantedElements): post-includeTags excludeTags
Fixes #700
2025-01-17 12:41:00 +01:00
Móricz Gergő
de08b37480 feat: adjust CI testing 2025-01-17 11:51:46 +01:00
Móricz Gergő
4a947e385f fix(queue-worker): fill out time taken on failure too 2025-01-17 11:28:37 +01:00
Nicolas
8e57fdec2c Update package.json 2025-01-16 14:02:25 -03:00
Nicolas
80e5acf68c Nick: error details for extract 2025-01-16 14:02:15 -03:00
Gergő Móricz
6c94db7ed0 fix(html,markdown): always get absolute links 2025-01-16 16:56:13 +01:00
Gergő Móricz
e824303d87 feat(html): always pick largest image from srcset 2025-01-16 16:51:33 +01:00
Gergő Móricz
655753cd27 fix(url): allow domains with ports 2025-01-16 16:30:14 +01:00
Nicolas
de14c0a45d Update package.json 2025-01-15 18:12:34 -03:00
Gergő Móricz
c3937996b1 feat(js-sdk): add further options to checkxstatus 2025-01-15 20:16:39 +01:00
Gergő Móricz
cbe67d89a5 feat(queue-worker): proactive job cancel 2025-01-15 19:02:20 +01:00
Gergő Móricz
ec039dcb8f fix(blocklist): unblock 2025-01-15 18:54:26 +01:00
Gergő Móricz
dde3aebac4 fix(v1/crawl-status): fix stuck on 0 jobs 2025-01-15 18:51:39 +01:00
Gergő Móricz
ce2f6ff884 fix(queue-worker/billing): fix crawl overbilling 2025-01-15 17:22:52 +01:00
Nicolas
db89e365eb Update check-fire-engine.ts 2025-01-15 01:16:42 -03:00
Nicolas
957eea4113 Nick: extract without a schema should work as expected v.1.3.0 2025-01-14 11:37:00 -03:00
Nicolas
61e6af2b16 Nick: streaming callback experimental 2025-01-14 02:13:42 -03:00
Nicolas
23d3257a57 Merge branch 'nsc/__experimental_streamSteps' 2025-01-14 02:00:58 -03:00
Nicolas
c323c64671 Update extract-redis.ts 2025-01-14 02:00:47 -03:00
Nicolas
2dc87a2e1c Update extraction-service.ts 2025-01-14 01:59:52 -03:00
Nicolas
0496b793c4
Merge pull request #1063 from mendableai/nsc/__experimental_streamSteps
__experimental_streamSteps
2025-01-14 01:48:13 -03:00
Nicolas
033e9bbf29 Nick: __experimental_streamSteps 2025-01-14 01:45:50 -03:00
Nicolas
558a7f4c08 Update package.json 2025-01-14 01:35:29 -03:00