# crawfishcloud

A Streaming S3 Bucket Glob Crawler

				
## Install

`npm i crawfishcloud -S`
				
## Setup

```ts
// import or require
import {crawler, asVfile} from 'crawfishcloud'
// set up aws-sdk S3 with your credentials
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile: 'default'})
// crawfish uses your configured S3 client to get data from S3
const crawfish = crawler({s3c: new S3({credentials})})
```
				
## Usage

### Async Generator

```ts
for await (const vf of crawler({s3c}).vfileIter('s3://Bucket/path/*.jpg')){
  console.log({vf})
}
```

### Promise<Array<Vfile | Vinyl>>

```ts
const allJpgs = await crawler({s3c}).vinylArray('s3://Bucket/path/*.jpg')
```

### Stream<Vfile | Vinyl>

```ts
crawler({s3c}).vfileStream('/prefix/**/*.jpg').pipe(destination())
```
				
## Why use crawfishcloud?

Ever had a set of files in S3 and thought, "Why can't I use a glob pattern, like I would in a unix command or in gulp, and pull all of those files out together?"

Now you can.
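
To give a feel for how a glob can map onto the S3 API, here is a minimal sketch of splitting a glob URL into the literal listing `Prefix` plus a key matcher. `splitGlob` is a hypothetical name for illustration only; it is not crawfishcloud's internal implementation, and the library may use a different glob engine.

```ts
// Hypothetical sketch (not crawfishcloud internals): split an s3:// glob
// into a literal ListObjectsV2 Prefix and a RegExp for filtering keys.
const splitGlob = (s3url: string) => {
  const m = s3url.match(/^s3:\/\/([^/]+)\/?(.*)$/)
  if (!m) throw new Error(`not an s3:// url: ${s3url}`)
  const [, Bucket, key] = m
  const firstWild = key.search(/[*?[]/)                 // first glob character
  const Prefix = firstWild === -1 ? key : key.slice(0, firstWild)
  const pattern = new RegExp('^' + key
    .replace(/[.+^${}()|\\]/g, '\\$&')                  // escape regex chars
    .replace(/\*\*/g, '\u0000')                         // protect '**'
    .replace(/\*/g, '[^/]*')                            // '*' stops at '/'
    .replace(/\u0000/g, '.*')                           // '**' spans '/'
    .replace(/\?/g, '.') + '$')
  return { Bucket, Prefix, match: (k: string) => pattern.test(k) }
}
```

Under this sketch, `s3://Bucket/path/*.jpg` lists with `Prefix: 'path/'` and filters the returned keys against the glob client-side.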
				
## Features

crawfishcloud supports three processing patterns for handling data from your buckets:

- **Promised Arrays**: Admittedly the most straightforward structure, but it can also blow through your RAM, because collapsing an S3 stream into a single array can easily take more space than is commercially available as RAM. If you are thinking "I know my data, and I just need these 5 files loaded together from this S3 prefix, and I know they will fit", then the array pattern is just the ticket.
- **Node Streams**: Incredible if you are familiar with them. The `.stream()` pattern lets you stream a set of objects out to your downstream processing.
- **AsyncGenerators**: Although async generators are a newer addition to the language, for many people they strike a sweet spot: easy to use, yet still able to process terribly large amounts of data, since items are pulled from the network on demand.

Additionally, crawfishcloud:

- uses modern syntax (async/await throughout)
- is all in about 230 lines of JS code (crawfish + utils + built-in exporters)
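
The trade-off between the three patterns can be seen with a mock item source standing in for S3. All names below (`mockS3Items`, `toArray`, `countKeys`, `asStream`) are illustrative only, not crawfishcloud's API:

```ts
import { Readable } from 'stream'

// Illustrative only: a mock async source standing in for paged S3 results.
async function* mockS3Items(n: number) {
  for (let i = 0; i < n; i++) yield { Key: `img/${i}.jpg` }
}

// Pattern 1 - promised array: every element is held in memory at once.
const toArray = async (n: number) => {
  const out: Array<{ Key: string }> = []
  for await (const item of mockS3Items(n)) out.push(item)
  return out
}

// Pattern 2 - async generator: one element in memory at a time,
// so even very large listings can be processed.
const countKeys = async (n: number) => {
  let count = 0
  for await (const _ of mockS3Items(n)) count++
  return count
}

// Pattern 3 - node stream: the same generator wrapped as a Readable
// in object mode, ready to pipe downstream.
const asStream = (n: number) => Readable.from(mockS3Items(n))
```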
 
				
				
## Inspired By
				
				
				
## License

MIT © Eric D Moore
				
## API Reference
				
				
				 
				
					
				
				
### crawler({s3c, body, maxkeys}, ...filters)

The default export function, aka `crawler`.

#### params

- `s3c`: `S3`
- `body`: `boolean`
- `maxkeys`: `number`
- `...filters`: `string[]`

#### returns

- a crawfishcloud crawler, exposing the `iter`, `stream`, `all`, and `reduce` methods below

#### example

```ts
import {crawler, asVfile} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const arr = await crab.all({body:true, using: asVfile})
```
 
				
				
## Base Returns
				
				
					
				
				
### .iter({body, using, NextContinuationToken}, ...filters)

Get an `AsyncGenerator<T>` ready to use with a `for await (){}` loop, where each element's type is determined by the `using` function.

#### params

- `body`: `boolean`
- `using`: `UsingFunc: (i:S3Item)=><T>`
- `NextContinuationToken?`: `string | undefined`
- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- an `AsyncGenerator` with elements of type `T`, where `T` is the return type of the `using` function

#### example

```ts
import {crawler, asVfile} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
for await (const vf of crab.iter({body:true, using: asVfile})){
  console.log(vf)
}
```
 
				
				
					
				
				
### .stream({body, using}, ...filters)

Get a Readable node stream, ready to pipe to a transform or writable stream.

#### params

- `body`: `boolean`
- `using`: `UsingFunc: (i:S3Item)=><T>`
- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a Readable stream whose chunks are of type `T`

#### example

```ts
import {crawler, asVfile} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
crab.stream({body:true, using: asVfile})
  .pipe(rehypePipe())
  .pipe(destinationFolder())
```
 
				
				
					
				
				
### .all({body, using}, ...filters)

Load everything matching the S3 url into an array; the returned promise resolves once all of the elements have been added to the array.

#### params

- `body`: `boolean`
- `using`: `UsingFunc: (i:S3Item)=><T>`
- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a `Promise<T[]>`

#### example

```ts
import {crawler, asVfile} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const arr = await crab.all({body:true, using: asVfile})
```
 
				
				
					
				
				
### .reduce({init, using, reducer}, ...filters)

Reduce the files represented by the glob into a new type. The process batches sets of 1000 elements into memory and reduces them batch by batch.

#### params

- `init`: `<OutputType>` - starting value for the reducer
- `using`: `UsingFunc: (i:S3Item)=><ElementType>`
- `reducer`: `(prior:OutputType, current:ElementType, i:number)=>OutputType`
- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a `Promise<OutputType>`

#### example

```ts
import {crawler, asS3} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const count = await crab.reduce({init: 0, using: asS3, reducer: (p)=>p+1})
							```
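
The batching behavior described above can be sketched with plain async generators. `pages` and `batchedReduce` are hypothetical names for illustration; this is not crawfishcloud's actual implementation:

```ts
// Illustrative sketch of a batched reduce: only one page of elements
// is materialized in memory per step. Not crawfishcloud's actual code.
async function* pages<T>(items: T[], pageSize = 1000) {
  for (let i = 0; i < items.length; i += pageSize)
    yield items.slice(i, i + pageSize)        // one "page", like an S3 listing
}

const batchedReduce = async <T, Out>(
  items: T[],
  init: Out,
  reducer: (prior: Out, current: T, i: number) => Out
): Promise<Out> => {
  let acc = init
  let i = 0
  for await (const page of pages(items)) {
    // fold the current page into the accumulator, then let it be collected
    for (const item of page) acc = reducer(acc, item, i++)
  }
  return acc
}
```

For example, counting keys this way is `await batchedReduce(keys, 0, p => p + 1)`, which never holds more than one page in memory.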
						 
					
 
				
				
## Streams
				
				
					
				
				
### .vfileStream(...filters)

A stream of VFiles.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a Readable stream of VFiles

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
crab.vfileStream('s3://ericdmoore.com-images/*.jpg')
  .pipe(jpgOptim())
  .pipe(destinationFolder())
```
 
				
				
					
				
				
### .vinylStream(...filters)

A stream of Vinyls.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a Readable stream of Vinyls

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
crab.vinylStream('s3://ericdmoore.com-images/*.jpg')
  .pipe(jpgOptim())
  .pipe(destinationFolder())
```
 
				
				
					
				
				
### .s3Stream(...filters)

A stream of S3Items, where the S3 listObjects attributes are mixed in with the getObject attributes for each key; the merged object is called an S3Item.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- a Readable stream of S3Items

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
crab.s3Stream('s3://ericdmoore.com-images/*.jpg')
  .pipe(S3ImageOptim())
  .pipe(destinationFolder())
```
 
				
				
## AsyncGenerators
				
				
					
				
				
### .vfileIter(...filters)

Get an AsyncGenerator that is ready to run through a set of VFiles.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `AsyncGenerator<VFile, void, undefined>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
for await (const vf of crab.vfileIter('s3://ericdmoore.com-images/*.jpg')){
  console.log(vf)
}
```
 
				
				
					
				
				
### .vinylIter(...filters)

Get an AsyncGenerator that is ready to run through a set of Vinyls.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `AsyncGenerator<Vinyl, void, undefined>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
for await (const v of crab.vinylIter('s3://ericdmoore.com-images/*.jpg')){
  console.log(v)
}
```
 
				
				
					
				
				
### .s3Iter(...filters)

Get an AsyncGenerator that is ready to run through a set of S3Items.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `AsyncGenerator<S3Item, void, undefined>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c})
for await (const s3i of crab.s3Iter('s3://ericdmoore.com-images/*.jpg')){
  console.log(s3i)
}
```
 
				
				
## Promised Arrays
				
				
					
				
				
### .vfileArray(...filters)

Get an array of VFiles, all loaded into a variable.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `Promise<VFile[]>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const vfArr = await crab.vfileArray()
```
 
				
				
					
				
				
### .vinylArray(...filters)

Get an array of Vinyls, all loaded into a variable.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `Promise<Vinyl[]>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const vArr = await crab.vinylArray()
```
 
				
				
					
				
				
### .s3Array(...filters)

Get an array of S3Items, all loaded into a variable.

#### params

- `...filters`: `string[]` - will overwrite any filters already configured on the crawfish; the last filters in win

#### returns

- `Promise<S3Item[]>`

#### example

```ts
import {crawler} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
const arr = await crab.s3Array()
```
 
				
				
## Exporting Functions
				
				
					
				
				
### asVfile

Turn an S3Item into a VFile.

#### params

- `i`: `S3Item`

#### returns

- a `VFile`

#### example

```ts
import {crawler, asVfile} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
for await (const vf of crab.iter({body:true, using: asVfile})){
  console.log(vf)
}
```
 
				
				
					
				
				
### asVinyl

Turn an S3Item into a Vinyl.

#### params

- `i`: `S3Item`

#### returns

- a `Vinyl`

#### example

```ts
import {crawler, asVinyl} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
for await (const v of crab.iter({body:true, using: asVinyl})){
  console.log(v)
}
```
 
				
				
					
				
				
### asS3

Just pass the S3 object structure along, unchanged.

#### params

- `i`: `S3Item`

#### returns

- the `S3Item`, as-is

#### example

```ts
import {crawler, asS3} from 'crawfishcloud'
import {S3, SharedIniFileCredentials} from 'aws-sdk'
const credentials = new SharedIniFileCredentials({profile:'default'})
const s3c = new S3({credentials, region:'us-west-2'})
const crab = crawler({s3c}, 's3://ericdmoore.com-images/*.jpg')
for await (const s3i of crab.iter({body:true, using: asS3})){
  console.log(s3i)
}
```
 
				
				
## Namesake

crawfishcloud, because why not, and because regular crawfish are delightful and they crawl around in a bucket for a time. So clearly, crawfishcloud is a crawler of cloud buckets.

Logo credit: deepart.io