yanlong.wang
df127d0207
fix: finalizer and unhandled promise rejection
2025-03-10 15:05:55 +08:00
yanlong.wang
eba1f9c0ec
fix: provide our own robots.txt
2025-03-10 14:18:31 +08:00
yanlong.wang
0d6cf2b1d1
fix: robots-txt location
2025-03-10 14:07:52 +08:00
yanlong.wang
101cb19dde
fix: robots-txt cache location
2025-03-10 14:07:19 +08:00
yanlong.wang
a7a41250d4
fix: curl redirections
2025-03-10 13:46:18 +08:00
Aaron Ji
8ec8123ff4
chore: fix search result amount ( #1163 )
2025-03-10 13:38:16 +08:00
yanlong.wang
8a8ae10919
fix: curl error category
2025-03-10 12:24:54 +08:00
yanlong.wang
5f6cfdf280
deps: cleanup
2025-03-10 12:21:23 +08:00
Yanlong Wang
19a0bbe924
fix: bad snapshot in sideload should not throw directly
2025-03-10 09:48:22 +08:00
Yanlong Wang
ead906e603
fix: runtime NODE_COMPILE_CACHE dir
2025-03-10 09:32:49 +08:00
Yanlong Wang
6e78e38e95
feat: leveraging NODE_COMPILE_CACHE ( #1162 )
...
* wip: try to leverage NODE_COMPILE_CACHE
* fix
* fix
* fix
* fix
* fix: black hole detector
* bhd: also tracking curl requests
2025-03-10 09:23:25 +08:00
Yanlong Wang
d0e20cc086
fix: several crash cases
2025-03-09 12:01:52 +08:00
Yanlong Wang
6b9e14de62
feat: md options pass through to turndown
2025-03-09 10:31:39 +08:00
Yanlong Wang
2720b69e60
deps: bump
2025-03-08 23:59:02 +08:00
Yanlong Wang
3020d589b6
fix: catch jsdom errors
2025-03-08 23:17:53 +08:00
Yanlong Wang
da48d0e4a7
deps: bump
2025-03-08 22:27:56 +08:00
Yanlong Wang
4ca627c0c5
fix: guard invalid domain names
2025-03-08 22:21:25 +08:00
Yanlong Wang
4830ff5fda
fix: potential fix for firestore grpc connection reset
2025-03-08 21:37:55 +08:00
Yanlong Wang
fd328cbcc2
fix
2025-03-08 20:52:35 +08:00
Yanlong Wang
8456fcecbd
fix: somehow side-loading chromewebstore would 100% crash the browser
2025-03-08 20:44:02 +08:00
Yanlong Wang
440ff4d729
fix: expect failure while loading pdf
2025-03-08 20:25:18 +08:00
Yanlong Wang
4bc6394692
fix: potential invalid pdf issue
2025-03-08 20:19:50 +08:00
Yanlong Wang
4ab28fe971
deps: bump
2025-03-08 20:04:24 +08:00
Yanlong Wang
66db31788e
cleanup: use local project code as much as possible
2025-03-08 19:32:30 +08:00
Yanlong Wang
512f225692
fix: sideload redirections
2025-03-08 18:59:36 +08:00
Yanlong Wang
c19ca2147c
fix: bug in pptr injections
2025-03-08 18:43:39 +08:00
Yanlong Wang
e551695d17
fix: fail early on special cookie redirects
2025-03-08 18:43:10 +08:00
Yanlong Wang
26f6202f79
fix: curl cookie mimicking
2025-03-08 18:14:43 +08:00
Yanlong Wang
89e5dbbe9c
fix: curl cookie behavior
2025-03-08 18:01:38 +08:00
Yanlong Wang
3b1978fd1d
fix: implement DNT in alt-gen and pdf-extract
2025-03-08 17:52:49 +08:00
Yanlong Wang
1a2754c674
fix: sideLoad header detection
2025-03-08 17:46:09 +08:00
Yanlong Wang
63a2e15f4d
fix: curl redirection location
2025-03-08 17:20:38 +08:00
Yanlong Wang
fb43578fdd
fix: curl implicit redirect
2025-03-08 17:18:53 +08:00
Yanlong Wang
8597daa96b
fix: side load context bridging
2025-03-08 16:49:14 +08:00
Yanlong Wang
e92ff33ad0
fix
2025-03-08 15:49:22 +08:00
Yanlong Wang
b674d26f76
fix: clean HTML timer
2025-03-08 13:33:10 +08:00
Yanlong Wang
4e299bf8e2
fix: remove tailwind classes instead of the opposite
2025-03-08 13:30:06 +08:00
Yanlong Wang
434c6707fd
deps: bump
2025-03-08 13:06:01 +08:00
Yanlong Wang
4d345f40f9
fix: curl error wrapping
2025-03-08 12:59:22 +08:00
Yanlong Wang
4744b8645f
deps: bump
2025-03-08 11:30:05 +08:00
Yanlong Wang
c326088327
fix: deps
2025-03-08 10:54:08 +08:00
Yanlong Wang
23a3b807c9
restructure: no longer a firebase application ( #1160 )
...
* fix: fine allow redefining Function.prototype.toString
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* fix: contentType encoding
* wip
* fix: error throwing
* wip
* fix
* wip
* fix
* fix
* fix: jsdom
* wip
* wip
* fix: links summary uniqueness
* wip
* wip
* robots-txt catch no robots.txt
* deps: remove puppeteer-extra-plugin-stealth
* fix: don't change warning type
* fix: curl
* fix: replace firebase-roundtrip-check with blackhole-detector
* fix: black hole detection
* searcher: black hole detecting
* fix: no h2c for searcher
* fix: bhd
* fix: search and crawl conflict
* fix: bhd
* fix
* fix: server script
* canvas: fixed avif issue
* logging: move some to debug
* fix
* fix: pptr declare ready only when page can be created without issues
* fix: bhd
* cd: cloud run deploy-health-check cannot complete pptr newPage
* cd: fix
* fix: curl body can be null
* fix
* fix
* fix: major fix regarding TC pdfs
* fix
* fix
* deps: fix civkit trie router issue
* fix
* boom: total restructure
* cd: fix docker ctx
* fix
* fix: switch to h2c
* cd: ensure http2
2025-03-08 00:46:52 +08:00
Aaron Ji
ed80c9a4a2
chore: expose 'num' param for search API ( #1157 )
...
* chore: expose 'num' param for search API
* fix: wrong count value
2025-03-05 11:20:55 +08:00
Aaron Ji
a7ffb58741
chore: support 'gl', 'hl', 'location' and 'page' when search ( #1156 )
2025-03-04 17:34:49 +08:00
Aaron Ji
21ae52a55a
feat: support fetching favicon ( #1155 )
...
* feat: support fetching favicon
* Update backend/functions/src/dto/scrapping-options.ts
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
* Update backend/functions/src/dto/scrapping-options.ts
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
* chore: remove 'withFavicon' from CrawlerOptions
---------
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
2025-03-04 11:32:42 +08:00
yanlong.wang
cd6a4eedf9
cd: add dev branch
2025-03-03 18:52:07 +08:00
Aaron Ji
3d29b3069a
fix: use 'no-content' ( #1154 )
...
* fix: use 'no-content' in x-respond-with
* fix: use 'no-content' in x-respond-with
2025-03-03 17:00:18 +08:00
Aaron Ji
51109e01d2
chore: improve type definitions and type safety ( #1153 )
...
* chore: improve type definitions and type safety
* chore: format search result
* chore: use 'x-details'
* chore: remove 'domain' from search result
* chore: use 'x-respond-with'
* chore: optimize logic to return content
* chore: cleanup
* chore: optimize logic
2025-02-28 18:44:38 +08:00
Yanlong Wang
4873578c62
Revert "serving: switching to h2c"
...
This reverts commit 690984d44a960a27556fc46873366f785d60666f.
2025-02-27 11:37:06 +08:00
Sha Zhou
efc405b1a0
remove version params
2025-02-27 11:33:29 +08:00