API
function: execute(jobDefinition)
locust.execute(jobDefinition)
- jobDefinition <Object>
- returns: <Promise<Object>> a promise that resolves to a jobResult
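// example.js
const { execute } = require('locust');
// execSync is not the default export of child_process; destructure the
// non-blocking exec so starting a new instance does not stall this one.
const { exec } = require('child_process');

const job = {
  // Start another Locust instance as a separate Node process.
  start: () => exec('node ./example.js'),
  url: 'http://localhost:3001',
  config: {
    name: 'collect-data',
    depthLimit: 1,
  },
  connection: {
    redis: {
      port: 6379,
      host: 'localhost',
    },
    chrome: {
      browserWSEndpoint: 'ws://localhost:3000',
    },
  },
};

execute(job)
  .then((jobResult) => console.log(jobResult))
  .catch((err) => console.error(err));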
Starts a Locust job. On the first run, the job runs against the entrypoint url; on subsequent runs, the first job in the queue is run.
object: jobDefinition
jobDefinition <Object>
- url <string> the entrypoint url for the job
- beforeAll <Function> optional
- before <Function> optional
- after <Function> optional
- start <Function>
- extract <Function> optional
- config <Object> settings that determine the global behavior of Locust
  - name <string> a unique name to identify the job
  - logLevel <Number> optional; RFC 5424 log level. Logging is disabled if omitted
  - concurrencyLimit <Number> the maximum number of concurrent jobs
  - depthLimit <Number> the maximum link depth from the entrypoint url. When it is met, Locust stops processing additional jobs across all instances of this job
  - delay <Number> optional; wait time in milliseconds before starting a job after popping it from the queue
- filter <Function|Object> optional; filters links by hostname or with a function
- connection <Object>
Configuration object that defines how to connect to Chrome and Redis and how the system behaves.
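A fuller jobDefinition may help show how the config options fit together. This is a sketch with illustrative values: the concrete numbers, the `allowList` host, and placing `filter` at the top level (as the `jobDefinition.filter` signature suggests) are assumptions. Exporting it lets an entry script load it with `require('./job')`.

```javascript
// job.js: a sketch of a jobDefinition using every documented config option.
// All values are illustrative; start assumes example.js is the entry script
// that calls execute(job).
const { exec } = require('child_process');

const job = {
  url: 'http://localhost:3001',
  start: () => exec('node ./example.js'),
  config: {
    name: 'collect-data',  // unique name shared by every instance of this job
    logLevel: 6,           // RFC 5424 severity 6 is "Informational"; omit to disable logging
    concurrencyLimit: 4,   // at most 4 jobs processed concurrently
    depthLimit: 2,         // follow links at most two hops from the entrypoint
    delay: 500,            // wait 500 ms after popping a job before starting it
  },
  filter: {
    allowList: ['localhost'], // only enqueue links whose hostname is localhost
  },
  connection: {
    redis: { port: 6379, host: 'localhost' },
    chrome: { browserWSEndpoint: 'ws://localhost:3000' },
  },
};

module.exports = job;
```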
object: jobResult
jobResult <Object>
- cookies <Object>
- data <?Object> return value of the jobDefinition.extract function, if one was defined
- links <Array>
- response <Object>
Object containing the result of the job including the raw response, extracted links, cookies, and extracted data.
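A small consumer of jobResult can make the field layout concrete. `summarizeResult` is a hypothetical helper, not part of Locust, and the result object passed to it is a hand-built stand-in that follows the documented jobResult and response fields.

```javascript
// Hypothetical helper: condenses a jobResult into a log-friendly summary.
function summarizeResult(jobResult) {
  const { response, links, cookies, data } = jobResult;
  return {
    url: response.url,        // final url after redirects or navigation
    status: response.status,
    ok: response.ok,
    linkCount: links.length,
    cookieCount: Object.keys(cookies).length,
    hasData: data !== null && data !== undefined, // no extract hook means no data
  };
}

// Stand-in jobResult shaped like the documented object.
const summary = summarizeResult({
  cookies: {},
  data: null,
  links: ['http://localhost:3001/a', 'http://localhost:3001/b'],
  response: {
    ok: true,
    status: 200,
    statusText: 'OK',
    headers: {},
    url: 'http://localhost:3001/',
    body: '<html></html>',
  },
});
console.log(summary);
```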
object: jobData
jobData <Object>
- url <string> address for the job
- depth <Number> page distance of the job from the entrypoint url in the jobDefinition
Minimal job representation used primarily within the Redis queue
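Because depth is the page distance from the entrypoint, a link found on a page sits one hop further out than that page. The sketch below illustrates this under the assumption that the entrypoint job has depth 0; `childJobData` is a hypothetical helper, not how Locust builds its queue entries.

```javascript
// Hypothetical helper: derives child jobData entries from the links on a page.
// Assumes the entrypoint has depth 0, so each child is parent.depth + 1.
function childJobData(parent, links) {
  return links.map((url) => ({ url, depth: parent.depth + 1 }));
}

const entry = { url: 'http://localhost:3001', depth: 0 };
const children = childJobData(entry, ['http://localhost:3001/about']);
// children: [{ url: 'http://localhost:3001/about', depth: 1 }]
```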
object: snapshot
snapshot <Object>
- state <'ACTIVE'|'INACTIVE'> current state of the Redis queue
- queue <Object> each value contains an array of urls
  - processing <Array<string>>
  - done <Array<string>>
  - queued <Array<string>>
A snapshot of the Redis queue at a given point in time
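Hooks receive a snapshot, so reading progress from it is a common need. `queueProgress` is a hypothetical helper and the snapshot literal is a stand-in; only the documented `state` and `queue` fields are used.

```javascript
// Hypothetical helper: reports queue progress from a snapshot.
function queueProgress(snapshot) {
  const { queued, processing, done } = snapshot.queue;
  const total = queued.length + processing.length + done.length;
  return {
    active: snapshot.state === 'ACTIVE',
    queued: queued.length,
    processing: processing.length,
    done: done.length,
    // Share of known urls already finished, rounded to a whole percent.
    percentDone: total === 0 ? 0 : Math.round((done.length / total) * 100),
  };
}

const progress = queueProgress({
  state: 'ACTIVE',
  queue: {
    queued: ['http://localhost:3001/c'],
    processing: ['http://localhost:3001/b'],
    done: ['http://localhost:3001', 'http://localhost:3001/a'],
  },
});
// progress: { active: true, queued: 1, processing: 1, done: 2, percentDone: 50 }
```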
object: response
response <Object>
- ok <Boolean>
- status <Number> HTTP response code
- statusText <string> HTTP response message
- headers <Object>
- url <string> url after following redirects or any page navigation
- body <string> HTML content of the page
Response from the HTTP request made when navigating to jobData.url, or to jobDefinition.url for the entrypoint job.
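Since `response.url` reflects redirects and page navigation, comparing it with the requested url shows whether the page moved. `wasRedirected` is a hypothetical helper; both urls go through the WHATWG `URL` parser so that a bare origin and the same origin with a trailing slash compare equal.

```javascript
// Hypothetical helper: true when the final url differs from the requested one.
function wasRedirected(requestedUrl, response) {
  // new URL(...).href normalizes 'http://localhost:3001' to 'http://localhost:3001/'.
  return new URL(response.url).href !== new URL(requestedUrl).href;
}
```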
function: beforeAll
jobDefinition.beforeAll(browser, snapshot, jobData)
- browser <Puppeteer.Browser> Puppeteer browser instance
- snapshot <Object> a snapshot of the Redis queue at the time the job was popped from the Redis queue
- jobData <Object> current job's data
User defined hook to run once before the first job is processed
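A sketch of a beforeAll hook that logs the Chrome version and queue state once, before the first job. `browser.version()` is a real Puppeteer Browser method; the log format is illustrative, and the line is returned only so the sketch can be checked, since these docs do not say what Locust does with the hook's return value.

```javascript
// Sketch of a beforeAll hook: one-time logging before the first job runs.
const beforeAll = async (browser, snapshot, jobData) => {
  const version = await browser.version(); // e.g. "HeadlessChrome/..."
  const line = `Chrome ${version}; queue ${snapshot.state}; starting at ${jobData.url}`;
  console.log(line);
  return line; // returned for inspection only
};
```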
function: before
jobDefinition.before(page, snapshot, jobData)
- page <Puppeteer.Page> Puppeteer page instance
- snapshot <Object> a snapshot of the Redis queue at the time the job was popped from the Redis queue
- jobData <Object> current job's data
User defined hook to run before every job is processed
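Because before runs ahead of every job, per-page settings fit here. This sketch assumes Locust navigates the same page after the hook resolves; `page.setUserAgent()` and `page.setDefaultNavigationTimeout()` are real Puppeteer Page methods, while the user-agent string and the 30-second timeout are illustrative values.

```javascript
// Sketch of a before hook: configure the page before each job navigates.
const before = async (page, snapshot, jobData) => {
  await page.setUserAgent('locust-example-bot/1.0'); // illustrative UA string
  page.setDefaultNavigationTimeout(30000);            // 30 s navigation timeout
};
```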
function: after
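jobDefinition.after(jobResult, snapshot, stopQueue)
- jobResult <Object> result of the job that was just processed
- snapshot <Object> a snapshot of the Redis queue
- stopQueue <Function> call to stop the queue (awaited in the example below)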
User defined hook to run after every job is processed
...
after: async (jobResult, snapshot, stop) => {
  // Stop the queue once five jobs have finished.
  if (snapshot.queue.done.length >= 5) {
    await stop();
  }
},
...
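The after hook can be exercised outside Locust by passing stand-in arguments. The snapshot literal and the `stop` function below are hypothetical test doubles that only record whether the hook asked the queue to stop.

```javascript
// The same after hook, run in isolation with stand-in arguments.
const after = async (jobResult, snapshot, stop) => {
  // Stop the queue once five jobs have finished.
  if (snapshot.queue.done.length >= 5) {
    await stop();
  }
};

async function demo() {
  let stopped = false;
  const stop = async () => { stopped = true; }; // test double for stopQueue
  const snapshot = {
    state: 'ACTIVE',
    queue: { queued: [], processing: [], done: ['a', 'b', 'c', 'd', 'e'] },
  };
  await after({}, snapshot, stop);
  return stopped;
}

demo().then((stopped) => console.log(stopped)); // prints true
```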
function: start
jobDefinition.start()
User defined hook to define how to invoke a new instance of Locust within the parent context (e.g. AWS Lambda, system process)
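For the system-process case, a start hook can launch another Node process running the entry script and return immediately. A sketch, assuming the entry script is `example.js` in the current working directory; `makeStart` is a hypothetical factory. `spawn` with `detached: true` plus `unref()` (real Node `child_process` options) keeps the child from holding the parent open.

```javascript
// Sketch of a start hook that launches a new Locust instance as a process.
const { spawn } = require('child_process');
const path = require('path');

// Hypothetical factory: returns a start hook that runs `node <args...>`.
function makeStart(args) {
  return () => {
    // process.execPath is the running node binary.
    const child = spawn(process.execPath, args, { detached: true, stdio: 'ignore' });
    child.unref(); // do not keep this process alive waiting for the child
    return child;  // returned for inspection only
  };
}

// Assumes example.js (the script that calls execute) sits in the working directory.
const start = makeStart([path.resolve('example.js')]);
```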
function: extract
jobDefinition.extract($, browser, jobData)
- $(selector) <Function> convenience function to get the text of an element on the page
  - selector <string> CSS selector, e.g. ul li .description
  - returns: <Promise<string>> the text content of the first element at the selector
  - throws BrowserError: when there is no element found at the selector
- page <Puppeteer.Page> Puppeteer current page instance
- browser <Puppeteer.Browser> Puppeteer browser instance
- jobData <Object> current job's data
User defined hook to extract data from the page
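A sketch of an extract hook that uses only the first argument, `$`, so it does not depend on the order of the remaining parameters. The selectors are illustrative. The returned object becomes `jobResult.data`; the optional element is wrapped in try/catch because `$` throws when nothing matches.

```javascript
// Sketch of an extract hook built on the $ convenience function.
const extract = async ($) => {
  const title = await $('h1'); // required: a missing h1 fails the job
  let description = null;
  try {
    description = await $('ul li .description');
  } catch (e) {
    // Optional element: treat "no match" as null. With Locust installed,
    // narrow this to e instanceof BrowserError (locust.error.BrowserError).
    description = null;
  }
  return { title, description }; // becomes jobResult.data
};
```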
function: filter
jobDefinition.filter(links)
User defined hook to filter which links from the page are added to the queue
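A sketch of a filter function. These docs do not state the shape of `links` or the expected return value, so this assumes `links` is an array of absolute URL strings and that the hook returns the subset to enqueue. The `/docs` path rule is illustrative.

```javascript
// Sketch of a filter function (assumed: string[] in, string[] out).
const filter = (links) =>
  links.filter((link) => {
    const { hostname, pathname } = new URL(link); // throws on relative urls
    // Keep same-host documentation pages only.
    return hostname === 'localhost' && pathname.startsWith('/docs');
  });
```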
object: filter
filter <Object>
- allowList <Array<string>> list of hostnames to allow
- blockList <Array<string>> list of hostnames to block
Filters which links are added to the queue from the page based on the hostname. Both lists can be used in conjunction.
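Before starting a crawl, it can help to check a filter object offline. `passesHostFilter` is a hypothetical helper, not Locust's implementation. It encodes one reading of the description above: exact hostname matching, where a link must appear in allowList (when given) and must not appear in blockList.

```javascript
// Hypothetical offline check of a filter object; NOT Locust's implementation.
function passesHostFilter(link, { allowList, blockList } = {}) {
  const { hostname } = new URL(link);
  if (allowList && !allowList.includes(hostname)) return false;
  if (blockList && blockList.includes(hostname)) return false;
  return true;
}

// Crawl only two first-party hosts.
const allowOnly = { allowList: ['example.com', 'blog.example.com'] };
// Crawl anything except a known ad host.
const blockOnly = { blockList: ['ads.example.com'] };

console.log(passesHostFilter('https://blog.example.com/post', allowOnly));  // true
console.log(passesHostFilter('https://other.org/', allowOnly));             // false
console.log(passesHostFilter('https://ads.example.com/banner', blockOnly)); // false
```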
class: GeneralJobError
locust.error.GeneralJobError(message, url)
- message <string>
- url <string>
Thrown when Locust encounters an error that causes it to abort
// example.js
const { execute, error: { GeneralJobError } } = require('locust');
const job = require('./job');

(async () => {
  try {
    await execute(job);
  } catch (e) {
    // Report jobs that Locust aborted; rethrow anything unexpected.
    if (e instanceof GeneralJobError) {
      return console.log(e.message);
    }
    throw e;
  }
})();
class: QueueEndError
locust.error.QueueEndError(message, url)
- message <string>
- url <string>
Returned when a global queue end condition is met, e.g. no queued jobs remain or the depth limit has been met
class: QueueError
locust.error.QueueError(message, url)
- message <string>
- url <string>
Returned when a transient condition is met under which another job cannot be started, e.g. the concurrency limit has been met
class: BrowserError
locust.error.BrowserError(response)
- message <string>
- url <string>
- response <Object>
Thrown when Chrome encounters an error