firecrawl

mirror of https://git.mirrors.martin98.com/https://github.com/mendableai/firecrawl synced 2025-08-15 18:06:00 +08:00

Author	SHA1	Message	Date
Gergő Móricz	b03670a8b7	feat: parse PDFs on fc side and reject if too long for timeout (FIR-2083) (#1592) * feat: pdf-parser, implementation in scrapeURL * use pdf-parser for page count instead of mu * fix(pdf-parser): bindings * feat(scrapeURL/pdf): adjust MILLISECONDS_PER_PAGE * implement post-runsync polling and fix * fix(Dockerfile): copy in the pdf-parser source code * fix(scrapeURL/pdf): better error for timeout below 0	2025-05-23 13:45:53 +02:00
Gergő Móricz	97dee5de21	temp(ci/self-host): remove no-proxy option	2025-03-12 18:19:24 +01:00
Gergő Móricz	e1cfe1da48	feat(crawl): includes/excludes fixes (FIR-1300) (#1303) * feat(crawl): includes/excludes fixes pt. 1 * fix(snips): billing tests * drop tha logs * fix(ci): add replica url * feat(crawl): drop initial scrape if it's not included * feat(ci): more verbose logging * fix crawl path in test * fix(ci): wait for api * fix(snips/scrape/ad): test for more pixels * feat(js-sdk/crawl): add regexOnFullURL	2025-03-06 17:05:15 +01:00
Gergő Móricz	c38dcd0432	feat(self-host): proxy support (FIR-1111) (#1212) * feat(self-host): proxy support * fix(playwright-service-ts): return untreated text/plain	2025-02-20 14:20:03 +01:00
Loris	100168ddf3	Add searxng for search endpoint (#1193) * add searxng.ts * update to add searxng endpoint * Apply suggestions from code review * feat(ci/self-host): add tests with searxng * feat(ci/self-host): bootstrap searxng for testing * feat(ci): improvements in syntax --------- Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>	2025-02-20 12:36:53 +01:00
Gergő Móricz	04218de2b0	Revert "feat(ci): use pull_request_target (+ manual approval)" This reverts commit 9142030881e0d153396279520e127b74af8417c9.	2025-02-20 10:58:08 +01:00
Gergő Móricz	9142030881	feat(ci): use pull_request_target (+ manual approval)	2025-02-20 10:52:29 +01:00
Gergő Móricz	bc5a16d048	feat(ci/test-server): build go markdown parser	2025-02-20 10:05:39 +01:00
Gergő Móricz	c39cc27866	feat(ci/self-host): add playwright microservice tests (#1210) * feat(ci/self-host): add playwright microservice tests * fix ci * fix ci 2 * fix ci 3 * fix(playwright-service): get raw JSON if response is JSON	2025-02-20 02:06:13 +01:00
Gergő Móricz	2200f084f3	SELFHOST FIXES (#1207) * fix(extract): construct OpenAI on demand Fixes hard-crash if api key not specified in a self-hosting environment. * fix(ci): try sleeping * fix(ci): override host * fix(ci): wait for server to start * Support /extract and /crawl for self-hosted (FIR-1097) (#1137) * Support /extract for self-hosted This returns the job response from redis rather than supabase when db auth is disabled (self hosted mode) * Use getJob for extract and use correct types * fix(v1/crawl-status): only poll DB for total count if DB is enabled * feat(snips): TEST_SUITE_SELF_HOSTED * fix(ci/test-server-self-host): use pr trigger * fix(scrapeURL): f-e mocking in selfhosted env * fix(snips): do not try to eval json format on selfhost * fix(scrapeURL): further f-e mocking * fix(snips): don't timeout on hard fail polling * fix(v1/extract-status): fix-up the db-agnostic impl unfortunately had to separate the functions since the schema was too divergent :( * fix(snips): boost screenshot delay * feat(ci): test with openai * feat(ci): extract, search testing * fix(ci): matrix * fix(ci): bleh * Update: fix default google search (#1174) * fix log title * search should always work * asd * fix ci --------- Co-authored-by: Nick Roth <nlr06886@gmail.com> Co-authored-by: William <sdustusun@gmail.com>	2025-02-20 00:41:22 +01:00
Gergő Móricz	c1ca64fc07	fix(ci): retain fail status	2025-02-19 22:31:16 +01:00
Gergő Móricz	f4f75fe184	fix(ci): path to lock	2025-02-19 22:15:41 +01:00
Gergő Móricz	e9cb8ac956	feat(ci): caching improvements	2025-02-19 22:11:32 +01:00
Gergő Móricz	0eff99007a	feat(ci): upload logs	2025-02-19 22:02:32 +01:00
Gergő Móricz	9976560567	feat(ci): self-hosted server test suite	2025-02-19 21:55:48 +01:00

15 Commits