yanlong.wang
|
3b3a0265df
|
feat: control concurrent request per page instead of server bucket
|
2025-03-10 16:45:56 +08:00 |
|
yanlong.wang
|
c064fcf77e
|
fix: unhandledRejection log level
|
2025-03-10 15:31:38 +08:00 |
|
yanlong.wang
|
a9855dcd3b
|
chore: prefer ctx.URL
|
2025-03-10 15:25:58 +08:00 |
|
yanlong.wang
|
cf01a2c504
|
chore: add comments to clarify
|
2025-03-10 15:17:42 +08:00 |
|
yanlong.wang
|
531c660a5d
|
fix: missing url query param
|
2025-03-10 15:15:59 +08:00 |
|
yanlong.wang
|
df127d0207
|
fix: finalizer and unhandled promise rejection
|
2025-03-10 15:05:55 +08:00 |
|
yanlong.wang
|
eba1f9c0ec
|
fix: provide our own robots.txt
|
2025-03-10 14:18:31 +08:00 |
|
yanlong.wang
|
0d6cf2b1d1
|
fix: robots-txt location
|
2025-03-10 14:07:52 +08:00 |
|
yanlong.wang
|
101cb19dde
|
fix: robots-txt cache location
|
2025-03-10 14:07:19 +08:00 |
|
yanlong.wang
|
a7a41250d4
|
fix: curl redirections
|
2025-03-10 13:46:18 +08:00 |
|
Aaron Ji
|
8ec8123ff4
|
chore: fix search result amount (#1163)
|
2025-03-10 13:38:16 +08:00 |
|
yanlong.wang
|
8a8ae10919
|
fix: curl error category
|
2025-03-10 12:24:54 +08:00 |
|
yanlong.wang
|
5f6cfdf280
|
deps: cleanup
|
2025-03-10 12:21:23 +08:00 |
|
Yanlong Wang
|
19a0bbe924
|
fix: bad snapshot in sideload should not throw directly
|
2025-03-10 09:48:22 +08:00 |
|
Yanlong Wang
|
ead906e603
|
fix: runtime NODE_COMPILE_CACHE dir
|
2025-03-10 09:32:49 +08:00 |
|
Yanlong Wang
|
6e78e38e95
|
feat: leveraging NODE_COMPILE_CACHE (#1162)
* wip: try to leverage NODE_COMPILE_CACHE
* fix
* fix
* fix
* fix
* fix: black hole detector
* bhd: also tracking curl requests
|
2025-03-10 09:23:25 +08:00 |
|
Yanlong Wang
|
d0e20cc086
|
fix: several crash cases
|
2025-03-09 12:01:52 +08:00 |
|
Yanlong Wang
|
6b9e14de62
|
feat: md options pass through to turndown
|
2025-03-09 10:31:39 +08:00 |
|
Yanlong Wang
|
2720b69e60
|
deps: bump
|
2025-03-08 23:59:02 +08:00 |
|
Yanlong Wang
|
3020d589b6
|
fix: catch jsdom errors
|
2025-03-08 23:17:53 +08:00 |
|
Yanlong Wang
|
da48d0e4a7
|
deps: bump
|
2025-03-08 22:27:56 +08:00 |
|
Yanlong Wang
|
4ca627c0c5
|
fix: guard invalid domain names
|
2025-03-08 22:21:25 +08:00 |
|
Yanlong Wang
|
4830ff5fda
|
fix: potential fix for firestore grpc connection reset
|
2025-03-08 21:37:55 +08:00 |
|
Yanlong Wang
|
fd328cbcc2
|
fix
|
2025-03-08 20:52:35 +08:00 |
|
Yanlong Wang
|
8456fcecbd
|
fix: somehow side-loading chromewebstore would 100% crash the browser
|
2025-03-08 20:44:02 +08:00 |
|
Yanlong Wang
|
440ff4d729
|
fix: expect failure while loading pdf
|
2025-03-08 20:25:18 +08:00 |
|
Yanlong Wang
|
4bc6394692
|
fix: potential invalid pdf issue
|
2025-03-08 20:19:50 +08:00 |
|
Yanlong Wang
|
4ab28fe971
|
deps: bump
|
2025-03-08 20:04:24 +08:00 |
|
Yanlong Wang
|
66db31788e
|
cleanup: use local project code as much as possible
|
2025-03-08 19:32:30 +08:00 |
|
Yanlong Wang
|
512f225692
|
fix: sideload redirections
|
2025-03-08 18:59:36 +08:00 |
|
Yanlong Wang
|
c19ca2147c
|
fix: bug in pptr injections
|
2025-03-08 18:43:39 +08:00 |
|
Yanlong Wang
|
e551695d17
|
fix: fail early on special cookie redirects
|
2025-03-08 18:43:10 +08:00 |
|
Yanlong Wang
|
26f6202f79
|
fix: curl cookie mimicking
|
2025-03-08 18:14:43 +08:00 |
|
Yanlong Wang
|
89e5dbbe9c
|
fix: curl cookie behavior
|
2025-03-08 18:01:38 +08:00 |
|
Yanlong Wang
|
3b1978fd1d
|
fix: implement DNT in alt-gen and pdf-extract
|
2025-03-08 17:52:49 +08:00 |
|
Yanlong Wang
|
1a2754c674
|
fix: sideLoad header detection
|
2025-03-08 17:46:09 +08:00 |
|
Yanlong Wang
|
63a2e15f4d
|
fix: curl redirection location
|
2025-03-08 17:20:38 +08:00 |
|
Yanlong Wang
|
fb43578fdd
|
fix: curl implicit redirect
|
2025-03-08 17:18:53 +08:00 |
|
Yanlong Wang
|
8597daa96b
|
fix: side load context bridging
|
2025-03-08 16:49:14 +08:00 |
|
Yanlong Wang
|
e92ff33ad0
|
fix
|
2025-03-08 15:49:22 +08:00 |
|
Yanlong Wang
|
b674d26f76
|
fix: clean HTML timer
|
2025-03-08 13:33:10 +08:00 |
|
Yanlong Wang
|
4e299bf8e2
|
fix: remove tailwind classes instead of the opposite
|
2025-03-08 13:30:06 +08:00 |
|
Yanlong Wang
|
434c6707fd
|
deps: bump
|
2025-03-08 13:06:01 +08:00 |
|
Yanlong Wang
|
4d345f40f9
|
fix: curl error wrapping
|
2025-03-08 12:59:22 +08:00 |
|
Yanlong Wang
|
4744b8645f
|
deps: bump
|
2025-03-08 11:30:05 +08:00 |
|
Yanlong Wang
|
c326088327
|
fix: deps
|
2025-03-08 10:54:08 +08:00 |
|
Yanlong Wang
|
23a3b807c9
|
restructure: no longer a firebase application (#1160)
* fix: fine allow redefining Function.prototype.toString
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* fix: contentType encoding
* wip
* fix: error throwing
* wip
* fix
* wip
* fix
* fix
* fix: jsdom
* wip
* wip
* fix: links summary uniqueness
* wip
* wip
* robots-txt catch no robots.txt
* deps: remove puppeteer-extra-plugin-stealth
* fix: don't change warning type
* fix: curl
* fix: replace firebase-roundtrip-check with blackhole-detector
* fix: black hole detection
* searcher: black hole detecting
* fix: no h2c for searcher
* fix: bhd
* fix: search and crawl conflict
* fix: bhd
* fix
* fix: server script
* canvas: fixed avif issue
* logging: move some to debug
* fix
* fix: pptr declare ready only when page can be created without issues
* fix: bhd
* cd: cloud run deploy-health-check cannot complete pptr newPage
* cd: fix
* fix: curl body can be null
* fix
* fix
* fix: major fix regarding TC pdfs
* fix
* fix
* deps: fix civkit trie router issue
* fix
* boom: total restructure
* cd: fix docker ctx
* fix
* fix: switch to h2c
* cd: ensure http2
|
2025-03-08 00:46:52 +08:00 |
|
Aaron Ji
|
ed80c9a4a2
|
chore: expose 'num' param for search API (#1157)
* chore: expose 'num' param for search API
* fix: wrong count value
|
2025-03-05 11:20:55 +08:00 |
|
Aaron Ji
|
a7ffb58741
|
chore: support 'gl', 'hl', 'location' and 'page' when search (#1156)
|
2025-03-04 17:34:49 +08:00 |
|
Aaron Ji
|
21ae52a55a
|
feat: support fetching favicon (#1155)
* feat: support fetching favicon
* Update backend/functions/src/dto/scrapping-options.ts
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
* Update backend/functions/src/dto/scrapping-options.ts
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
* chore: remove 'withFavicon' from CrawlerOptions
---------
Co-authored-by: Yanlong Wang <yanlong.wang@naiver.org>
|
2025-03-04 11:32:42 +08:00 |
|