465 Commits

Author SHA1 Message Date
yanlong.wang
f472f02a50
fix: country code validation 2025-03-26 14:50:08 +08:00
yanlong.wang
2073c40a4f
fix: remove ip based query filling for serper 2025-03-26 14:37:18 +08:00
yanlong.wang
65ed1157c2
fix: charge amount 2025-03-26 14:15:55 +08:00
yanlong.wang
e953c01e58
fix: formatted usage entry 2025-03-26 14:07:36 +08:00
Yanlong Wang
e9d69e6201
searcher: image/news/bing search (#1176)
* wip

* refactor: more serper search features

* fix: usage naming in crawl
2025-03-26 13:47:07 +08:00
yanlong.wang
51779dc2eb
saas: new rate limit policy 2025-03-26 11:52:37 +08:00
yanlong.wang
ab715271a2
deps: bump readability to latest 2025-03-24 17:29:09 +08:00
yanlong.wang
a30a865140
fix: curl failure should return 4xx if specified explicitly 2025-03-24 16:33:43 +08:00
yanlong.wang
bc8dea9139
fix: remove readerlm automatic retry 2025-03-24 16:21:10 +08:00
yanlong.wang
61a1252762
fix: removing core-js pollution 2025-03-24 16:20:51 +08:00
Aaron Ji
b304d5809d
feat: search return with date (#1173)
* feat: search return with date

* cleanup: keep changes to searcher

---------

Co-authored-by: yanlong.wang <yanlong.wang@naiver.org>
2025-03-24 14:46:22 +08:00
yanlong.wang
f7f6a98839
fix: readerlm params 2025-03-24 13:57:51 +08:00
Yanlong Wang
f3654a786b
fix: beware readerlm capacity drain 2025-03-24 10:15:16 +08:00
Yanlong Wang
a471a6137c
fix: do img filtering in node instead of browser 2025-03-22 12:33:06 +08:00
Yanlong Wang
3b0e0207f7
cd: get font 2025-03-21 22:51:23 +08:00
Yanlong Wang
3bb731519b
fix: generated alt 2025-03-21 22:43:19 +08:00
Yanlong Wang
2a30fce1cc
fix: bring back content based respond timing 2025-03-19 19:03:51 +08:00
yanlong.wang
3a40db2590
fix: cache use and edge cases 2025-03-17 15:10:07 +08:00
Yanlong Wang
b4b99f0096
cd: eu 2025-03-14 16:26:22 +08:00
Aaron Ji
02a219a470
fix: restrict auto-fill to first page fetch 2025-03-14 13:07:15 +08:00
yanlong.wang
dd76cf3866
docs: mention about issues caused by upstream 2025-03-13 16:30:22 +08:00
yanlong.wang
b9ad4facf5
fix 2025-03-13 16:25:59 +08:00
yanlong.wang
ff7612414c
chore: use env variable to switch browser headless mode 2025-03-13 16:00:55 +08:00
yanlong.wang
adce02d708
cleanup 2025-03-13 15:57:36 +08:00
yanlong.wang
ff595c2b4c
improve: cache usage and detection of js-required pages 2025-03-13 15:55:12 +08:00
Aaron Ji
a5e3c2d281
fix: return 10 search results by default 2025-03-13 15:00:51 +08:00
Aaron Ji
f1aed656b7
fix: some response could miss meta field (#1165) 2025-03-13 11:51:24 +08:00
Yanlong Wang
9415c6a2be
fix: track if snapshot html modified by js 2025-03-12 22:26:10 +08:00
Yanlong Wang
6027963670
fix: tweak default timing to be more conservative 2025-03-12 18:32:08 +08:00
Yanlong Wang
8121d62324
fix: custom script may listen to mutationIdle 2025-03-12 18:15:06 +08:00
Yanlong Wang
f7dbadffb7
behavior change: ditch content based return timing, adopt mutationIdle as default timing 2025-03-12 18:04:22 +08:00
Yanlong Wang
5141814bc9
docs: mention about discarded 2025-03-12 10:47:01 +08:00
Yanlong Wang
cadb2fd94c
feat: provide option to discard links 2025-03-12 10:46:01 +08:00
Yanlong Wang
7ded2de54c
clean: redundant ip block 2025-03-11 23:47:23 +08:00
Yanlong Wang
f2fcacd4c1
fix: a small lru for invalid tokens 2025-03-11 23:40:52 +08:00
yanlong.wang
db27b38010
fix 2025-03-11 19:24:28 +08:00
yanlong.wang
45d1682db0
feat: pdf upload and ip check 2025-03-11 19:20:11 +08:00
yanlong.wang
7d4102e96e
fix 2025-03-11 17:49:44 +08:00
yanlong.wang
c4be29b1da
fix: request blocker 2025-03-11 17:37:56 +08:00
yanlong.wang
dceb361a7a
fix: url check 2025-03-10 18:23:30 +08:00
yanlong.wang
5bbd75a6d6
fix 2025-03-10 17:52:19 +08:00
yanlong.wang
f0560c6949
fix: caching condition 2025-03-10 17:29:17 +08:00
yanlong.wang
0da71cad34
fix: robots-txt not loaded error conditions 2025-03-10 17:19:25 +08:00
yanlong.wang
4e5abd345e
docs: minor fix 2025-03-10 17:02:16 +08:00
yanlong.wang
ce11f44b92
docs: be neutral on brand 2025-03-10 16:58:20 +08:00
yanlong.wang
d71c89a79c
fix: block suspicious requests before sideload 2025-03-10 16:46:26 +08:00
yanlong.wang
3b3a0265df
feat: control concurrent request per page instead of server bucket 2025-03-10 16:45:56 +08:00
yanlong.wang
c064fcf77e
fix: unhandledRejection log level 2025-03-10 15:31:38 +08:00
yanlong.wang
a9855dcd3b
chore: prefer ctx.URL 2025-03-10 15:25:58 +08:00
yanlong.wang
cf01a2c504
chore: add comments to clarify 2025-03-10 15:17:42 +08:00