Nicolas | 4d0acc9722 | 2024-08-26 16:22:05 -03:00
Merge branch 'main' into v1-webscraper

Gergo Moricz | d591e0f51c | 2024-08-25 20:06:12 +02:00
block corterix.com for performance issues

Nicolas | 173f4ee1bf | 2024-08-23 20:09:59 -03:00
Nick: chrome cdp main | simple autoscaler

Gergő Móricz | 05c250d3b8 | 2024-08-23 19:38:57 +02:00
Merge branch 'main' into v1-webscraper

Nicolas | 3d53f4e213 | 2024-08-23 13:56:05 -03:00
Nick: unblocking pin

Gergő Móricz | e7f267b6fe | 2024-08-23 17:21:54 +02:00
Merge branch 'main' into v1-webscraper

Gergő Móricz | 8d9ff90bcb | 2024-08-22 23:38:04 +02:00
feat(fire-engine): propagate sentry trace

Gergő Móricz | 8e3c2b2855 | 2024-08-22 23:30:19 +02:00
fix(crawler): verify URL

rafaelsideguide | 7473b74021 | 2024-08-22 15:15:45 -03:00
fix: html and rawlhtmls for pdfs

rafaelsideguide | b1d61d8557 | 2024-08-22 13:39:09 -03:00
Merge remote-tracking branch 'origin/v1-webscraper' into v1/python-sdk

Gergő Móricz | 6d48dbcd38 | 2024-08-22 16:47:38 +02:00
feat(sentry): add trace continuity for queue

Gergő Móricz | fbbc3878f1 | 2024-08-22 13:18:26 +02:00
fix(crawler): make sure includes/excludes is an array

rafaelsideguide | fe2e8c0b7a | 2024-08-21 15:54:00 -03:00
includehtml fix

Gergő Móricz | 55009e51f5 | 2024-08-21 20:49:25 +02:00
fix: filter out invalid URLs from crawl links

rafaelsideguide | 52abec41c2 | 2024-08-21 10:35:50 -03:00
fixing delete

rafaelsideguide | b66553867e | 2024-08-21 09:28:20 -03:00
reverting delete, fixed express bug on checkCredits

rafaelsideguide | 138437d616 | 2024-08-21 08:11:24 -03:00
commenting out delete, crashing on fire-engine

rafaelsideguide | 5e48bec1fd | 2024-08-21 08:10:46 -03:00
commenting out delete, crashing on fire-engine

Nicolas | 90b32f16c8 | 2024-08-20 21:38:11 -03:00
Nick: fixes

Nicolas | 819ad50af3 | 2024-08-20 21:16:33 -03:00
Update fireEngine.ts

rafaelsideguide | e9d6ca197e | 2024-08-20 20:00:41 -03:00
tests passing now

Nicolas | 1b3ad60a2c | 2024-08-20 19:22:09 -03:00
Reapply "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit ffe11a5bf73e3c57657972cd36c3af1d0b9a432c.

Nicolas | 441628998f | 2024-08-20 19:16:48 -03:00
Reapply "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit ffe11a5bf73e3c57657972cd36c3af1d0b9a432c.

Nicolas | ffe11a5bf7 | 2024-08-20 18:16:11 -03:00
Revert "Merge pull request #561 from mendableai/bug/dealing-with-dns-error"
This reverts commit 2030ec603109d6ce8786a011d431bc5c83917f1b, reversing changes made to f494d2b707d40b690ae41611d17f77f683570fc2.

Gergő Móricz | 1368f9a87f | 2024-08-20 22:24:18 +02:00
fix: treat existing screenshot as a scraper success condition

rafaelsideguide | f98be7d94e | 2024-08-20 16:53:01 -03:00
Update fireEngine.ts

rafaelsideguide | 1f27182a13 | 2024-08-20 15:42:39 -03:00
added try catch

rafaelsideguide | e326249a57 | 2024-08-20 14:26:42 -03:00
added check job and cancel to fire-engine requests

rafaelsideguide | e1c9cbf709 | 2024-08-20 09:11:58 -03:00
bug fixed. crawl should not stop if sitemap url is invalid

rafaelsideguide | ecd472356b | 2024-08-19 16:41:54 -03:00
added variables to beta customers

rafaelsideguide | b8170aaa47 | 2024-08-19 08:51:48 -03:00
Update blocklist.ts

Nicolas | 47123be783 | 2024-08-16 22:01:56 -04:00
Nick: weird activity block

rafaelsideguide | 086ba6280b | 2024-08-16 18:39:13 -03:00
fixed markdown format

Gergő Móricz | aabfaf0ac5 | 2024-08-16 23:29:39 +02:00
clean up crawl-status, fix db ddos

rafaelsideguide | 7a61325500 | 2024-08-16 17:57:11 -03:00
map + search + scrape markdown bug

Nicolas | 23a033fe61 | 2024-08-16 16:03:35 -04:00
Nick: fixes and more e2e tests

rafaelsideguide | 3f998b688d | 2024-08-16 15:14:37 -03:00
scrape ready

Nicolas | 81b2479db3 | 2024-08-15 14:19:55 -04:00
Merge pull request #459 from mendableai/feat/queue-scrapes
feat: Move scraper to queue

Nicolas | 86326f34e9 | 2024-08-15 13:48:42 -04:00
Update single_url.test.ts

Gergő Móricz | 29f0d9ec94 | 2024-08-15 19:04:46 +02:00
propagate priority to fire-engine

Nicolas | 6e1074cdd1 | 2024-08-14 17:39:54 -04:00
Update website_params.ts

Thomas Kosmas | 6410e1a81d | 2024-08-15 00:10:14 +03:00
Update params

Gergo Moricz | d7549d4dc5 | 2024-08-13 21:03:24 +02:00
feat: remove webScraperQueue

Gergő Móricz | 4a2c37dcf5 | 2024-08-13 20:53:49 +02:00
Merge branch 'main' into feat/queue-scrapes

Gergo Moricz | 86e136beca | 2024-08-13 20:51:43 +02:00
feat: crawl to scrape conversion

Thomas Kosmas | 98be29c963 | 2024-08-12 22:49:28 +03:00
Update parameters for platform.openai.com

rafaelsideguide | 0591000b64 | 2024-08-09 14:30:41 -03:00
bugfix includes excludes

Nicolas | f1f5605010 | 2024-08-08 12:31:58 -04:00
Update website_params.ts

Gergő Móricz | 5fc7fcb77c | 2024-08-07 16:35:44 +02:00
Merge branch 'main' into feat/queue-scrapes

Gergo Moricz | fe9fdb578b | 2024-08-07 16:34:25 +02:00
revert bad hotfixes