Starts executing a job, either beginning with the `url` in the job definition or picking up the next job from the queue.
Validates the job definition. See the API docs for what constitutes a valid definition.
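
The authoritative schema lives in the API docs and is not reproduced here. Purely as an illustration, the sketch below assumes a definition with a `url` plus optional `depthLimit` and `concurrencyLimit` fields (hypothetical names, not confirmed by the docs) and collects validation errors instead of throwing on the first one.

```typescript
// Hypothetical shape of a job definition; the real schema is in the API docs.
interface JobDefinition {
  url: string;
  depthLimit?: number;
  concurrencyLimit?: number;
}

// Returns a list of problems; an empty list means these checks passed.
function validateDefinition(def: JobDefinition): string[] {
  const errors: string[] = [];
  try {
    const parsed = new URL(def.url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      errors.push("url must use http or https");
    }
  } catch {
    errors.push("url is not a valid absolute URL");
  }
  // Limits are optional, but when present they must be sensible integers.
  if (def.depthLimit !== undefined && (!Number.isInteger(def.depthLimit) || def.depthLimit < 0)) {
    errors.push("depthLimit must be a non-negative integer");
  }
  if (
    def.concurrencyLimit !== undefined &&
    (!Number.isInteger(def.concurrencyLimit) || def.concurrencyLimit < 1)
  ) {
    errors.push("concurrencyLimit must be a positive integer");
  }
  return errors;
}
```

Returning every error at once lets a caller report all problems with a definition in a single pass.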
Registers the job with the Redis queue. The behaviour differs depending on whether this is the first run of the job.
On the first run of a new job, `queue.register` will add a single job to the Redis `processing` list with the `url` defined in the job definition and process that job.
On all subsequent runs, `queue.register` will pop the next job from the Redis `queued` list. If the depth limit has been exceeded, the concurrency limit has been reached, or another stop condition has been triggered, execution aborts.
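
Both branches of registration can be sketched with in-memory arrays standing in for the Redis `queued`, `processing` and `done` lists. This is a hypothetical model, not the library's `queue.register`: the real queue issues Redis commands, the `firstRun` flag would live in shared storage, and dropping a popped job that exceeds the depth limit is an assumption.

```typescript
// A unit of work: the page to visit and how many links deep it is.
interface QueuedJob {
  url: string;
  depth: number;
}

// In-memory stand-in for the Redis lists (hypothetical, for illustration only).
class InMemoryQueue {
  queued: QueuedJob[] = [];
  processing: QueuedJob[] = [];
  done: QueuedJob[] = [];
  firstRun = true;
  depthLimit: number;
  concurrencyLimit: number;

  constructor(depthLimit: number, concurrencyLimit: number) {
    this.depthLimit = depthLimit;
    this.concurrencyLimit = concurrencyLimit;
  }

  // Returns the job this run should process, or null if the run should abort.
  register(startUrl: string): QueuedJob | null {
    if (this.firstRun) {
      // First run: seed `processing` with the url from the job definition.
      this.firstRun = false;
      const job: QueuedJob = { url: startUrl, depth: 0 };
      this.processing.push(job);
      return job;
    }
    // Subsequent runs: abort if the concurrency limit has been reached.
    if (this.processing.length >= this.concurrencyLimit) return null;
    const job = this.queued.shift();
    if (job === undefined) return null; // nothing left to crawl
    if (job.depth > this.depthLimit) return null; // depth limit exceeded; job dropped
    this.processing.push(job);
    return job;
  }
}
```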
Executes a user-defined hook that runs once, before the first request is made on the first run.
This hook can be used to handle authentication flows and obtain credentials for the first and all subsequent job runs.
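
A minimal sketch of the run-once behaviour, assuming a hypothetical synchronous hook signature (real hooks are likely asynchronous) and a `Map` standing in for whatever shared storage lets later runs read the credentials. The `auth:token` key is made up for illustration.

```typescript
// Hypothetical stand-in for shared state (e.g. a Redis key) that outlives a single run.
const sharedStore = new Map<string, string>();

// Hypothetical hook signature: returns credentials obtained from a login flow.
type FirstRunHook = () => { token: string };

// Runs the hook only on the first run and saves the result for later runs.
function runFirstRunHook(isFirstRun: boolean, hook: FirstRunHook): void {
  if (!isFirstRun) return;
  const { token } = hook();
  sharedStore.set("auth:token", token);
}

// Example hook: a real one might drive a login page in the browser.
let logins = 0;
const login: FirstRunHook = () => {
  logins += 1;
  return { token: "example-token" };
};

runFirstRunHook(true, login); // first run: the hook executes
runFirstRunHook(false, login); // later run: the hook is skipped
```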
Executes a user-defined hook that runs before each request is made.
This hook can be used to set user credentials before the request is sent.
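
The per-request hook might look like the following sketch. The `RequestOptions` shape, the hook signature and the bearer-token scheme are all assumptions for illustration; the point is that the hook runs against fresh options for every request.

```typescript
// Hypothetical request options passed to the hook; the real shape may differ.
interface RequestOptions {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical hook signature: may modify the options before the request is sent.
type BeforeRequestHook = (options: RequestOptions, store: Map<string, string>) => void;

// Example hook: attach a token saved by the first-run hook, if there is one.
const setAuthHeader: BeforeRequestHook = (options, store) => {
  const token = store.get("auth:token");
  if (token !== undefined) {
    options.headers["Authorization"] = `Bearer ${token}`;
  }
};

// Builds fresh options for one request and lets the hook adjust them.
function prepareRequest(
  url: string,
  store: Map<string, string>,
  hook: BeforeRequestHook,
): RequestOptions {
  const options: RequestOptions = { url, headers: {} };
  hook(options, store);
  return options;
}
```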
Using the connection to the Chrome browser defined in the job definition, a request is made to the `url` of the current job.
Links are also extracted from the page; these are later added to the queue.
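
Link extraction here happens against the page loaded in Chrome. The sketch below is a simplification rather than that implementation: it scans raw HTML with a regular expression, resolves each `href` against the page URL, keeps only http(s) links, and treats URLs differing only by fragment as the same page (an assumption about desirable behaviour).

```typescript
// Collects unique http(s) links from raw HTML, resolved against the page url.
// Simplification: a regex over HTML misses many cases a real DOM query would handle.
function extractLinks(html: string, pageUrl: string): string[] {
  const links = new Set<string>();
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const resolved = new URL(match[1], pageUrl);
      if (resolved.protocol === "http:" || resolved.protocol === "https:") {
        resolved.hash = ""; // page#a and page#b refer to the same page
        links.add(resolved.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return Array.from(links);
}
```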
Executes a user-defined hook that runs as part of the request to the `url`.
This hook can be used to extract data from the page.
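
As a sketch of what such a hook could return, the example below assumes a hypothetical `PageSnapshot` holding the page's HTML and pulls out the `<title>`. A hook running against a live Chrome page would more likely query the DOM directly; the regex keeps the example self-contained.

```typescript
// Hypothetical view of the loaded page handed to the hook.
interface PageSnapshot {
  url: string;
  html: string;
}

// Hypothetical hook signature: returns whatever data the user wants to keep.
type OnRequestHook<T> = (page: PageSnapshot) => T;

// Example hook: extract the trimmed <title> text, or null if there is none.
const extractTitle: OnRequestHook<{ url: string; title: string | null }> = (page) => {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(page.html);
  return { url: page.url, title: match ? match[1].trim() : null };
};
```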
Moves the current job to the `done` Redis list and adds the links extracted from the page to the queue.
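
The bookkeeping in this step can be modelled with in-memory arrays standing in for the Redis lists. Two details are assumptions rather than documented behaviour: new links are queued one level deeper than the current job, and URLs already queued, in progress or finished are skipped.

```typescript
// A unit of work: the page to visit and how many links deep it is.
interface QueuedJob {
  url: string;
  depth: number;
}

// In-memory stand-ins for the Redis lists (hypothetical, for illustration only).
const lists = {
  queued: [] as QueuedJob[],
  processing: [] as QueuedJob[],
  done: [] as QueuedJob[],
};

// Moves a finished job to `done` and queues unseen links one level deeper.
function complete(job: QueuedJob, links: string[]): void {
  const index = lists.processing.findIndex((j) => j.url === job.url);
  if (index !== -1) lists.processing.splice(index, 1);
  lists.done.push(job);

  // Assumed deduplication: skip URLs that are queued, in flight, or finished.
  const seen = new Set([...lists.queued, ...lists.processing, ...lists.done].map((j) => j.url));
  for (const url of links) {
    if (!seen.has(url)) {
      seen.add(url);
      lists.queued.push({ url, depth: job.depth + 1 });
    }
  }
}
```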
Executes a user-defined hook that runs after the request completes.
This hook can be used to process the data from the request and/or persist it to a datastore.
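
A hypothetical after-request hook might normalise the extracted data and write it somewhere durable. The `Datastore` interface and hook signature below are made up for illustration, and an in-memory store replaces a real database so the sketch runs without external services.

```typescript
// Hypothetical datastore interface; a real one might wrap a database client.
interface Datastore {
  save(key: string, value: Record<string, unknown>): void;
}

// In-memory datastore so the sketch runs on its own.
class MemoryDatastore implements Datastore {
  readonly records = new Map<string, Record<string, unknown>>();
  save(key: string, value: Record<string, unknown>): void {
    this.records.set(key, value);
  }
}

// Hypothetical hook signature: receives the url, the extracted data, and a store.
type AfterRequestHook<T> = (url: string, data: T, store: Datastore) => void;

// Example hook: fill in a default title, then save the record keyed by url.
const persistPage: AfterRequestHook<{ title: string | null }> = (url, data, store) => {
  store.save(url, { title: data.title ?? "(untitled)" });
};
```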
Starts additional job runs, up to the concurrency limit, by calling the user-defined `start` hook, which defines how new jobs are started.
In a serverless context, the `start` function might invoke a new instance of the serverless function.
In a local, non-distributed context, the `start` function might start a new system process in which a job would run.
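
How many runs to start could be computed as in the sketch below. It assumes (not stated above) that no more runs are started than there are queued jobs to pick up, and it uses a hypothetical zero-argument `start` hook that simply counts launches instead of invoking a function or spawning a process.

```typescript
// Hypothetical signature of the user-defined start hook: launches one new run.
type StartHook = () => void;

// Launches new runs until the concurrency limit is reached, without starting
// more runs than there are queued jobs for them to pick up.
function startMore(
  queuedCount: number,
  processingCount: number,
  concurrencyLimit: number,
  start: StartHook,
): number {
  const freeSlots = Math.max(0, concurrencyLimit - processingCount);
  const toStart = Math.min(freeSlots, queuedCount);
  for (let i = 0; i < toStart; i++) {
    start();
  }
  return toStart;
}

// Example start hook that only counts launches. A serverless deployment might
// invoke another instance of the function here; a local one might spawn a process.
let launches = 0;
const countingStart: StartHook = () => {
  launches += 1;
};
```

Capping by the queue length avoids launching runs that would immediately find nothing to do and abort.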