Gergo Moricz
60c74357df
feat(ScrapeEvents): log queue events
2024-07-24 18:44:14 +02:00
Jakob Stadlhuber
497aa5d25e
Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 17:55:45 +02:00
rafaelsideguide
4eca6bd301
fix/check-for-auth-on-scrape-log
2024-07-24 12:54:14 -03:00
Nicolas
4ead89f983
Merge pull request #453 from mendableai/nsc/notion-fix
Notion Website Fixes
2024-07-24 11:40:19 -04:00
Nicolas
3a1b8a9797
Update website_params.ts
2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30
Update website_params.ts
2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c
feat(monitoring/scrape): include url, worker, response_size
2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc
fix(monitoring): bad success check on scrape
2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6
fix: add jobId for scrape
2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b
fix(scrape-events): bad logic
2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3
feat: scrape event logging to DB
2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla
1b7a00624d
Delete old comp
2024-07-23 21:51:08 -04:00
Eric Ciarla
565bc09439
Basic react app
2024-07-23 21:48:11 -04:00
rafaelsideguide
6208ecdbc0
added logger
2024-07-23 17:30:46 -03:00
Eric Ciarla
a0d89169ed
init
2024-07-23 15:48:12 -04:00
Nicolas
f0b07b509b
Update index.ts
2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d
added regex for links in sitemap
2024-07-23 09:07:23 -03:00
Nicolas
252bc09ee2
Merge pull request #447 from mendableai/nsc/speed-improvements
/scrape should now be 600ms-900ms faster
2024-07-22 19:18:24 -04:00
Nicolas
ac692ef09c
Update CONTRIBUTING.md
2024-07-22 19:17:53 -04:00
Nicolas
30e706b43f
Update scrape.ts
2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c
Update index.ts
2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e
Update scrape.ts
2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5
Nick: speed improvements
2024-07-22 18:30:58 -04:00
Nicolas
1bc36e1a56
Update fly-direct.yml
2024-07-22 14:12:55 -04:00
Nicolas
b229fbebd8
Update scrape_log.ts
2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c
fix(isFile): added .tiff extension
2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399
fix(WebCrawler): filter out file URLs when taking URLs from sitemap
2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85
fix(fly): raise heap limit to 4G per process
2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788
Update blocklist.ts
2024-07-18 14:20:19 -04:00
Nicolas
6161b83890
Update scrape_log.ts
2024-07-18 14:17:08 -04:00
Nicolas
c402c85346
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-18 14:16:51 -04:00
Nicolas
2dd7398aad
Update scrape_log.ts
2024-07-18 14:16:46 -04:00
Gergo Moricz
791e6b2047
fix action
2024-07-18 19:59:33 +02:00
Nicolas
f10f3f886b
Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797
Update crawl-cancel.ts
2024-07-18 13:49:51 -04:00
Nicolas
11768571ed
Update crawl-cancel.ts
2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20
Update crawl-cancel.ts
2024-07-18 13:40:24 -04:00
Nicolas
d338b05446
Merge pull request #436 from mendableai/mog/fix-infinite-regex
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 13:32:44 -04:00
Nicolas
d2de01d342
Nick: fixes
2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 19:13:43 +02:00
Nicolas
f11137352c
Merge branch 'main' into feat/fire-engine-chrome-cdp
2024-07-18 12:48:42 -04:00
Nicolas
6d1d46a987
Merge pull request #433 from mendableai/mog/js-sdk-tests-fix
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 12:40:59 -04:00
Nicolas
01b5e8fc73
Merge pull request #429 from mendableai/mog/fix-job-stuck-2
Fix queue stuck bug via lock settings changes
2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc
Merge pull request #427 from mendableai/docs/update-docs
[Docs] Updating docs
2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08
Update openapi.json
2024-07-18 10:34:03 -03:00
Gergo Moricz
a23b125471
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 14:20:51 +02:00
Gergo Moricz
361269974e
fix(js-sdk): remove autogenerated index.d.ts from git and add to gitignore
2024-07-18 13:48:39 +02:00
Gergo Moricz
2e62de4f8b
fix(js-sdk): remove built files from repo and add to gitignore
2024-07-18 13:45:51 +02:00
Gergo Moricz
a0b8a6cad3
feat(js-sdk): build both cjs and esm versions
2024-07-18 13:43:36 +02:00