414 Commits

Author SHA1 Message Date
yanlong.wang
df127d0207
fix: finalizer and unhandled promise rejection 2025-03-10 15:05:55 +08:00
yanlong.wang
eba1f9c0ec
fix: provide our own robots.txt 2025-03-10 14:18:31 +08:00
yanlong.wang
0d6cf2b1d1
fix: robots-txt location 2025-03-10 14:07:52 +08:00
yanlong.wang
101cb19dde
fix: robots-txt cache location 2025-03-10 14:07:19 +08:00
yanlong.wang
a7a41250d4
fix: curl redirections 2025-03-10 13:46:18 +08:00
Aaron Ji
8ec8123ff4
chore: fix search result amount (#1163) 2025-03-10 13:38:16 +08:00
yanlong.wang
8a8ae10919
fix: curl error category 2025-03-10 12:24:54 +08:00
yanlong.wang
5f6cfdf280
deps: cleanup 2025-03-10 12:21:23 +08:00
Yanlong Wang
19a0bbe924
fix: bad snapshot in sideload should not throw directly 2025-03-10 09:48:22 +08:00
Yanlong Wang
ead906e603
fix: runtime NODE_COMPILE_CACHE dir 2025-03-10 09:32:49 +08:00
Yanlong Wang
6e78e38e95
feat: leveraging NODE_COMPILE_CACHE (#1162)
* wip: try to leverage NODE_COMPILE_CACHE

* fix

* fix

* fix

* fix

* fix: black hole detector

* bhd: also tracking curl requests
2025-03-10 09:23:25 +08:00
Yanlong Wang
d0e20cc086
fix: several crash cases 2025-03-09 12:01:52 +08:00
Yanlong Wang
6b9e14de62
feat: md options pass through to turndown 2025-03-09 10:31:39 +08:00
Yanlong Wang
2720b69e60
deps: bump 2025-03-08 23:59:02 +08:00
Yanlong Wang
3020d589b6
fix: catch jsdom errors 2025-03-08 23:17:53 +08:00
Yanlong Wang
da48d0e4a7
deps: bump 2025-03-08 22:27:56 +08:00
Yanlong Wang
4ca627c0c5
fix: guard invalid domain names 2025-03-08 22:21:25 +08:00
Yanlong Wang
4830ff5fda
fix: potential fix for firestore grpc connection reset 2025-03-08 21:37:55 +08:00
Yanlong Wang
fd328cbcc2
fix 2025-03-08 20:52:35 +08:00
Yanlong Wang
8456fcecbd
fix: somehow side-loading chromewebstore would 100% crash the browser 2025-03-08 20:44:02 +08:00
Yanlong Wang
440ff4d729
fix: expect failure while loading pdf 2025-03-08 20:25:18 +08:00
Yanlong Wang
4bc6394692
fix: potential invalid pdf issue 2025-03-08 20:19:50 +08:00
Yanlong Wang
4ab28fe971
deps: bump 2025-03-08 20:04:24 +08:00
Yanlong Wang
66db31788e
cleanup: use local project code as much as possible 2025-03-08 19:32:30 +08:00
Yanlong Wang
512f225692
fix: sideload redirections 2025-03-08 18:59:36 +08:00
Yanlong Wang
c19ca2147c
fix: bug in pptr injections 2025-03-08 18:43:39 +08:00
Yanlong Wang
e551695d17
fix: fail early on special cookie redirects 2025-03-08 18:43:10 +08:00
Yanlong Wang
26f6202f79
fix: curl cookie mimicking 2025-03-08 18:14:43 +08:00
Yanlong Wang
89e5dbbe9c
fix: curl cookie behavior 2025-03-08 18:01:38 +08:00
Yanlong Wang
3b1978fd1d
fix: implement DNT in alt-gen and pdf-extract 2025-03-08 17:52:49 +08:00
Yanlong Wang
1a2754c674
fix: sideLoad header detection 2025-03-08 17:46:09 +08:00
Yanlong Wang
63a2e15f4d
fix: curl redirection location 2025-03-08 17:20:38 +08:00
Yanlong Wang
fb43578fdd
fix: curl implicit redirect 2025-03-08 17:18:53 +08:00
Yanlong Wang
8597daa96b
fix: side load context bridging 2025-03-08 16:49:14 +08:00
Yanlong Wang
e92ff33ad0
fix 2025-03-08 15:49:22 +08:00
Yanlong Wang
b674d26f76
fix: clean HTML timer 2025-03-08 13:33:10 +08:00
Yanlong Wang
4e299bf8e2
fix: remove tailwind classes instead of the opposite 2025-03-08 13:30:06 +08:00
Yanlong Wang
434c6707fd
deps: bump 2025-03-08 13:06:01 +08:00
Yanlong Wang
4d345f40f9
fix: curl error wrapping 2025-03-08 12:59:22 +08:00
Yanlong Wang
4744b8645f
deps: bump 2025-03-08 11:30:05 +08:00
Yanlong Wang
c326088327
fix: deps 2025-03-08 10:54:08 +08:00
Yanlong Wang
23a3b807c9
restructure: no longer a firebase application (#1160)
* fix: fine allow redefining Function.prototype.toString

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* fix: contentType encoding

* wip

* fix: error throwing

* wip

* fix

* wip

* fix

* fix

* fix: jsdom

* wip

* wip

* fix: links summary uniqueness

* wip

* wip

* robots-txt catch no robots.txt

* deps: remove puppeteer-extra-plugin-stealth

* fix: don't change warning type

* fix: curl

* fix: replace firebase-roundtrip-check with blackhole-detector

* fix: black hole detection

* searcher: black hole detecting

* fix: no h2c for searcher

* fix: bhd

* fix: search and crawl conflict

* fix: bhd

* fix

* fix: server script

* canvas: fixed avif issue

* logging: move some to debug

* fix

* fix: pptr declare ready only when page can be created without issues

* fix: bhd

* cd: cloud run deploy-health-check cannot complete pptr newPage

* cd: fix

* fix: curl body can be null

* fix

* fix

* fix: major fix regarding TC pdfs

* fix

* fix

* deps: fix civkit trie router issue

* fix

* boom: total restructure

* cd: fix docker ctx

* fix

* fix: switch to h2c

* cd: ensure http2
2025-03-08 00:46:52 +08:00
Aaron Ji
ed80c9a4a2
chore: expose 'num' param for search API (#1157)
* chore: expose 'num' param for search API

* fix: wrong count value
2025-03-05 11:20:55 +08:00
Aaron Ji
a7ffb58741
chore: support 'gl', 'hl', 'location' and 'page' when search (#1156) 2025-03-04 17:34:49 +08:00
Aaron Ji
21ae52a55a
feat: support fetching favicon (#1155)
* feat: support fetching favicon

* Update backend/functions/src/dto/scrapping-options.ts

Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>

* Update backend/functions/src/dto/scrapping-options.ts

Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>

* chore: remove 'withFavicon' from CrawlerOptions

---------

Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
2025-03-04 11:32:42 +08:00
yanlong.wang
cd6a4eedf9
cd: add dev branch 2025-03-03 18:52:07 +08:00
Aaron Ji
3d29b3069a
fix: use 'no-content' (#1154)
* fix: use 'no-content' in x-respond-with

* fix: use 'no-content' in x-respond-with
2025-03-03 17:00:18 +08:00
Aaron Ji
51109e01d2
chore: improve type definitions and type safety (#1153)
* chore: improve type definitions and type safety

* chore: format search result

* chore: use 'x-details'

* chore: remove 'domain' from search result

* chore: use 'x-respond-with'

* chore: optimize logic to return content

* chore: cleanup

* chore: optimize logic
2025-02-28 18:44:38 +08:00
Yanlong Wang
4873578c62
Revert "serving: switching to h2c"
This reverts commit 690984d44a960a27556fc46873366f785d60666f.
2025-02-27 11:37:06 +08:00
Sha Zhou
efc405b1a0
remove version params 2025-02-27 11:33:29 +08:00