48 Commits

Author SHA1 Message Date
Caleb Peffer
d39d3be649 Caleb: now extracting and returning a list of all links on the page for a customer 2024-07-16 18:38:03 -07:00
Nicolas
e098e88ea7 Nick: 2024-07-12 22:02:08 -04:00
Rafael Miller
f0f449fe51 Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
2024-07-02 09:45:21 -03:00
Jeff Pereira
a5fb45988c new feature allowExternalContentLinks 2024-06-28 17:23:40 -07:00
Eric Ciarla
87b54488d3 update to includeRawHtml 2024-06-28 17:07:47 -04:00
Eric Ciarla
70fcf2ce03 init 2024-06-28 16:39:09 -04:00
Nicolas
1d4907acc9 Nick: 2024-06-26 21:02:58 -03:00
Rafael Miller
f9c7ca9388 Merge branch 'main' into feat/issue-266 2024-06-14 11:47:58 -03:00
Rafael Miller
3e2e76311c Merge branch 'main' into feat/issue-205 2024-06-14 11:25:20 -03:00
rafaelsideguide
bb859ae9a7 Added metadata.pageStatusCode and metadata.pageError properties to the responses 2024-06-13 17:08:40 -03:00
rafaelsideguide
676d6e8ab5 Added pageOptions.removeTags 2024-06-13 10:51:05 -03:00
rafaelsideguide
e37d151404 added parsePDF option to pageOptions
user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves
2024-06-12 15:06:47 -03:00
rafaelsideguide
dc6acbf1f0 Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option 2024-06-12 11:01:05 -03:00
Nicolas
520739c9f4 Nick: fixed bugs associated with absolute path replacements 2024-06-11 12:43:16 -07:00
rafaelsideguide
ee282c3d55 Added allowBackwardCrawling option 2024-06-11 15:24:39 -03:00
Nicolas
f6b06ac27a Nick: ignoreSitemap, better crawling algo 2024-06-10 18:12:41 -07:00
Nicolas
b4c6819a54 Nick: 2024-06-05 11:11:09 -07:00
Nicolas
6bea803120 Nick: 2024-05-31 15:39:54 -07:00
Nicolas
6c939d534d Nick: small refactor 2024-05-29 19:43:51 -07:00
Eric Ciarla
a0e404f94e init commit 2024-05-29 18:56:57 -04:00
Nicolas
1b3547dcf2 Nick: 2024-05-28 12:56:24 -07:00
Nicolas
77a79b5a79 Nick: max num tokens for llm extract (for now) + slice the max 2024-05-20 17:07:38 -07:00
Nicolas
8a72cf556b Nick: 2024-05-13 21:10:58 -07:00
Nicolas
a96fc5b96d Nick: 4x speed 2024-05-13 20:45:11 -07:00
Nicolas
dcedb8d798 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:49 -07:00
Nicolas
6505bf6bf2 Merge branch 'main' into feat/max-depth 2024-05-07 10:20:44 -07:00
Nicolas
bdbee963f7 Merge branch 'main' into nsc/cancel-job 2024-05-07 10:13:43 -07:00
rafaelsideguide
e1f52c538f nested includeHtml inside pageOptions 2024-05-07 13:40:24 -03:00
rafaelsideguide
83f3408634 Added max depth option 2024-05-07 11:06:26 -03:00
Nicolas
6d5da358cc Nick: cancel job 2024-05-06 17:16:43 -07:00
rafaelsideguide
509250c4ef changed to includeHtml 2024-05-06 19:45:56 -03:00
rafaelsideguide
538355f1af Added toMarkdown option 2024-05-06 11:36:44 -03:00
Nicolas
2aa09a3000 Nick: partial docs working, cleaner 2024-05-04 12:30:12 -07:00
Caleb Peffer
3ca9e5153f Caleb: trying to get logging working 2024-04-30 09:20:15 -07:00
Caleb Peffer
4f7737c922 Caleb: added ajv json schema validation. 2024-04-29 12:12:55 -07:00
Caleb Peffer
06497729e2 Caleb: got it to a testable state I believe 2024-04-28 15:52:09 -07:00
Roger M
f2690f6909 Support for tbs, filter, lang, country and location with Serper search. 2024-04-25 01:35:17 +01:00
Nicolas
307ea6f5ec Nick: improvements to search 2024-04-24 10:11:01 -07:00
Nicolas
8939ca570b Merge branch 'main' into nsc/returnOnlyUrls 2024-04-23 18:05:48 -07:00
Nicolas
fdb2789eaa Nick: added url as return param 2024-04-23 17:14:34 -07:00
Nicolas
41263bb4b6 Nick: serper support 2024-04-23 16:45:06 -07:00
Nicolas
5e3e2ec966 Nick: 2024-04-23 15:44:11 -07:00
Nicolas
0146157876 Nick: mvp 2024-04-23 15:28:32 -07:00
Nicolas
1a3aa2999d Nick: return only the list of urls 2024-04-20 11:59:42 -07:00
rafaelsideguide
72e1dadccd adding option to replace all relative paths with absolute paths 2024-04-19 11:47:20 -03:00
rafaelsideguide
57e5b36014 [Feat] Adding pdf parser 2024-04-18 11:43:57 -03:00
Nicolas
36abe0f7f9 Nick: 2024-04-17 18:24:46 -07:00
Nicolas
a6c2a87811 Initial commit 2024-04-15 17:01:47 -04:00