495 Commits

Author SHA1 Message Date
Yanlong Wang
be993c2cb1
fix: there may be invalid root doc 2024-11-13 00:32:48 +08:00
Yanlong Wang
68c4df2df3
fix: deps and bugs 2024-11-13 00:27:39 +08:00
yanlong.wang
7ae2545a30
chore: tweak deployment 2024-11-12 17:33:23 +08:00
yanlong.wang
e2a187d126
fix: crawling IP url 2024-11-11 15:30:48 +08:00
yanlong.wang
67d4a9f45a
fix: expect cookie encoding issue 2024-11-11 14:58:00 +08:00
yanlong.wang
53bc91c31a
feat: compound response 2024-11-11 12:40:40 +08:00
Yanlong Wang
22647a0617
feat: script injecting and tools 2024-11-08 14:19:54 +08:00
Yanlong Wang
bd629a836b
search now requires authentication 2024-11-01 14:15:03 +08:00
Yanlong Wang
5d865651b1
chore: bump deps 2024-11-01 09:20:23 +08:00
yanlong.wang
b10931b8ed
fix: turndown rules 2024-10-31 17:22:51 +08:00
yanlong.wang
340fb517d8
chore: add internal slack report 2024-10-30 17:42:06 +08:00
yanlong.wang
a488bb8921
fix: headers in overridden request 2024-10-29 15:20:58 +08:00
yanlong.wang
3303763345
fix: salvaging with google cache does not work anymore 2024-10-29 15:09:50 +08:00
yanlong.wang
ebc09003d1
fix: walk around locale setting bug 2024-10-29 15:09:20 +08:00
yanlong.wang
9242bb393a
fix: detect poorly transformed contents 2024-10-28 14:52:13 +08:00
yanlong.wang
a8793114bb
fix 2024-10-23 18:50:39 +08:00
yanlong.wang
e38c5514e1
fix 2024-10-23 18:12:43 +08:00
yanlong.wang
fb97410e99
fix: bump deps 2024-10-23 18:03:59 +08:00
yanlong.wang
d538726bdd
revert: domain cannot be un-doomed due to google function wrapper
acdfd93097/src/function_wrappers.ts (L109-L116)
2024-10-23 17:27:23 +08:00
yanlong.wang
fedffe3dd2
fix: force process quit on firebase issue 2024-10-23 16:08:02 +08:00
yanlong.wang
102a1686b0
feat: expand shadow dom 2024-10-23 14:58:46 +08:00
Yanlong Wang
00a1278385
chore: tweak deployment 2024-10-21 21:34:08 +08:00
yanlong.wang
d6ad9e75d6
chore: suspend data crunching 2024-10-21 12:07:14 +08:00
Yanlong Wang
cf32ab4fa7
bump: deps 2024-10-18 12:59:44 +08:00
Yanlong Wang
74eac2fc18
fix: remove link url escaping 2024-10-18 12:59:36 +08:00
yanlong.wang
a54816d12d
fix 2024-10-14 17:33:24 +08:00
yanlong.wang
6a97f0bfa6
fix: uri encoding 2024-10-14 17:27:29 +08:00
Zhaofeng Miao
f82504540b
fix(adaptive-crawler): fix cache problem 2024-10-10 16:37:12 +08:00
Zhaofeng Miao
db432645c3
feat: change deployment machine type to improve cpu utilization 2024-10-10 11:21:42 +08:00
Zhaofeng Miao
b9124a2ec1
chore 2024-10-10 11:20:31 +08:00
Zhaofeng Miao
b3ca557f6e
chore: security 2024-10-10 11:18:38 +08:00
Zhaofeng Miao
86d69eebd1
chore: fix security dependencies 2024-10-10 11:17:18 +08:00
Zhaofeng Miao
14322140ba
docs: readme changelog 2024-10-10 10:34:25 +08:00
yanlong.wang
e9258af742
fix: pdf mode and google web cache 2024-10-09 17:47:53 +08:00
yanlong.wang
f6bbddcb48
fix: pageshot missing in cache 2024-10-09 15:07:30 +08:00
Zhaofeng Miao
a44d9a2d2a
feat(adaptive-crawler): optimize relevance detection 2024-10-08 15:19:03 +08:00
Zhaofeng Miao
af282eec43
fix(adaptive-crawler): useSitemap should be rewritten in certain condition 2024-10-08 14:18:13 +08:00
yanlong.wang
339af19192
fix: request to unknown domain 2024-10-08 12:02:27 +08:00
Zhaofeng Miao
5a4b35e4b9
fix(adaptive-crawler): if no sitemap, use recursive instead 2024-10-08 11:50:50 +08:00
Yanlong Wang
ee29be58f1
fix: gfm strikethrough 2024-10-01 18:57:12 +08:00
Yanlong Wang
f0c3a9b70e
fix 2024-10-01 12:55:06 +08:00
Yanlong Wang
a66791d85f
fix 2024-09-27 13:30:29 +08:00
yanlong.wang
f531056bbd
fix: pageshot not removed from page snapshot 2024-09-26 15:55:16 +08:00
Zhaofeng Miao
8008e53d57
feat(adaptive-crawl): disable invalid link 2024-09-25 14:18:28 +08:00
Zhaofeng Miao
3f88f8d2f7
fix(adaptive): url hash 2024-09-23 16:21:46 +08:00
Yanlong Wang
39e49cac63
fix: 3xx not considered errors 2024-09-18 02:35:36 +08:00
Yanlong Wang
96ce7f5aac
fix: iframe should not actively report snapshot 2024-09-18 02:33:11 +08:00
Yanlong Wang
87a6578970
chore: deployment tweak 2024-09-18 00:12:29 +08:00
Yanlong Wang
c36aa730b4
fix: target selector 2024-09-17 17:47:01 +08:00
Zhaofeng Miao
e27bcaca77
feat: add adaptive crawler (#112) 2024-09-13 14:08:07 +08:00