API
execute(jobDefinition)
function: locust.execute(jobDefinition)
jobDefinition
<Object>
returns: <Promise<Object>>
Returns a promise that resolves to a jobResult
// example.js
const { execute } = require('locust');
const { execSync } = require('child_process');

const job = {
  // start hook: launches another instance of this script as a system process
  start: () => execSync('./example.js'),
  url: 'http://localhost:3001',
  config: {
    name: 'collect-data',
    depthLimit: 1,
  },
  connection: {
    redis: {
      port: 6379,
      host: 'localhost',
    },
    chrome: {
      browserWSEndpoint: 'ws://localhost:3000',
    },
  },
};

(() => execute(job))();
Starts a Locust job. On first run, the job runs against the entrypoint url and on subsequent runs, the first queued job is run.
jobDefinition
object: jobDefinition
<Object>
url
<string>
the entrypoint url for the job
beforeAll
<Function>
optional
before
<Function>
optional
after
<Function>
optional
start
<Function>
extract
<Function>
optional
config
<Object>
Defines settings that determine global behavior of Locust
  name
  <string>
  a unique name to identify the job
  logLevel
  <Number>
  optional. RFC5424 log level - logging is disabled if omitted
  concurrencyLimit
  <Number>
  the maximum number of concurrent jobs
  depthLimit
  <Number>
  the maximum link depth from the entrypoint url - when met, Locust stops processing additional jobs across all instances of this job
  delay
  <Number>
  optional. wait time in milliseconds before starting a job after popping it from the queue
filter
<Function|Object>
optional. filter links by a hostname or function
connection
<Object>
Configuration object that defines how to connect to Chrome and Redis and how the system behaves.
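To show how the fields above fit together, here is a minimal sketch of a job definition that also sets some of the optional fields. Only the field names come from this reference; the urls, hostnames, limits, and hook bodies are placeholder assumptions, and the start hook is a stub rather than a working launcher.

```javascript
// job.js
// Sketch of a job definition; every value below is an illustrative placeholder.
const job = {
  // Entrypoint url used on the first run.
  url: 'http://localhost:3001',
  // Hypothetical stub: a real start hook would launch another Locust instance.
  start: () => undefined,
  // Optional hook: collect the page title with the `$` text helper.
  extract: async ($) => ({ title: await $('title') }),
  // Optional filter by hostname.
  filter: { allowList: ['localhost'] },
  config: {
    name: 'collect-data',
    concurrencyLimit: 2,
    depthLimit: 1,
    delay: 500, // milliseconds to wait after popping a job from the queue
  },
  connection: {
    redis: { port: 6379, host: 'localhost' },
    chrome: { browserWSEndpoint: 'ws://localhost:3000' },
  },
};

module.exports = job;
```

Keeping the definition in its own module lets the entry script load it with require('./job') and pass it to execute.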
jobResult
object: jobResult
<Object>
cookies
<Object>
data
<?Object>
Return value of the jobDefinition.extract function if one was defined
links
<Array>
response
<Object>
Object containing the result of the job including the raw response, extracted links, cookies, and extracted data.
jobData
object: jobData
<Object>
url
<string>
address for the job
depth
<Number>
page distance of the job from the entrypoint url in the jobDefinition
Minimal job representation used primarily within the Redis queue
snapshot
object: snapshot
<Object>
state
<'ACTIVE'|'INACTIVE'>
current state of the Redis queue
queue
<Object>
each value contains an array of urls
  processing
  <Array<string>>
  done
  <Array<string>>
  queued
  <Array<string>>
A snapshot of the Redis queue at a given point in time
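As an illustration of the shape described above, the following sketch builds a snapshot literal and counts the urls in each list. The urls are made up, and summarize is a hypothetical helper that is not part of Locust.

```javascript
// Illustrative snapshot literal; the urls are made up for the example.
const snapshot = {
  state: 'ACTIVE',
  queue: {
    processing: ['http://localhost:3001/a'],
    done: ['http://localhost:3001/'],
    queued: ['http://localhost:3001/b', 'http://localhost:3001/c'],
  },
};

// Hypothetical helper (not part of Locust): count the urls in each queue list.
const summarize = ({ state, queue }) => ({
  state,
  processing: queue.processing.length,
  done: queue.done.length,
  queued: queue.queued.length,
});

console.log(summarize(snapshot));
```

A hook that receives a snapshot could use counts like these to decide when to stop the queue.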
response
object: response
<Object>
ok
<Boolean>
status
<Number>
HTTP response code
statusText
<string>
HTTP response message
headers
<Object>
url
<string>
url after following redirects or any page navigation
body
<string>
html content of the page
Response from the HTTP request after navigating to the url in jobData or url in the jobDefinition
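A short sketch of reading the fields listed above. The response literal uses made-up values, and describeResponse is a hypothetical helper, not something Locust provides.

```javascript
// Hypothetical helper (not part of Locust): one-line summary of a response object.
const describeResponse = ({ ok, status, statusText, url }) =>
  `${ok ? 'OK' : 'FAILED'} ${status} ${statusText} ${url}`;

// Illustrative response literal; the values are made up for the example.
const response = {
  ok: true,
  status: 200,
  statusText: 'OK',
  headers: { 'content-type': 'text/html' },
  url: 'http://localhost:3001/',
  body: '<html><body>Hello</body></html>',
};

console.log(describeResponse(response)); // prints "OK 200 OK http://localhost:3001/"
```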
beforeAll
function: jobDefinition.beforeAll(browser, snapshot, jobData)
browser
<Puppeteer.Browser>
Puppeteer browser instance
snapshot
<Object>
A snapshot of the Redis queue at the time the job was popped from the Redis queue
jobData
<Object>
Current job's data
User defined hook to run once before the first job is processed
before
function: jobDefinition.before(page, snapshot, jobData)
page
<Puppeteer.Page>
Puppeteer page instance
snapshot
<Object>
A snapshot of the Redis queue at the time the job was popped from the Redis queue
jobData
<Object>
Current job's data
User defined hook to run before every job is processed
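A before hook is a natural place to prepare the page for each job. The sketch below assumes the hook runs before the page navigates, which this reference does not state. The viewport size and header value are placeholders. page.setViewport and page.setExtraHTTPHeaders are standard Puppeteer Page methods.

```javascript
// Sketch of a `before` hook; the viewport size and header value are placeholders.
const before = async (page, snapshot, jobData) => {
  // Standard Puppeteer Page methods for per-page setup.
  await page.setViewport({ width: 1280, height: 800 });
  await page.setExtraHTTPHeaders({ 'Accept-Language': 'en-US' });
  // Log where this job sits in the crawl using the documented fields.
  console.log(
    `depth ${jobData.depth}: ${jobData.url} (${snapshot.queue.queued.length} queued)`
  );
};
```

The function would be assigned to the before field of the job definition.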
after
function: jobDefinition.after(jobResult, snapshot, stopQueue)
User defined hook to run after every job is processed
...
  after: async (jobResult, snapshot, stop) => {
    if (snapshot.queue.done.length >= 5)
      await stop()
  }
...
start
function: jobDefinition.start()
User defined hook to define how to invoke a new instance of Locust within the parent context (e.g. AWS Lambda, system process)
extract
function: jobDefinition.extract($, browser, jobData)
$(selector)
<Function>
convenience function to get the text of an element on the page
  selector
  <string>
  CSS selector e.g. ul li .description
  returns
  <Promise<string>>
  the text content of the first element at the selector
  throws
  BrowserError
  when there is no element found at the selector
page
<Puppeteer.Page>
current Puppeteer page instance
browser
<Puppeteer.Browser>
Puppeteer browser instance
jobData
<Object>
Current job's data
User defined hook to extract data from the page
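A minimal sketch of an extract hook built only on the $ helper described above. The selectors are placeholders, and treating a missing element as an optional field is a design assumption for the example, not documented Locust behavior.

```javascript
// Sketch of an `extract` hook; the selectors are placeholders.
const extract = async ($) => {
  // Text of the first element matching the selector.
  const title = await $('title');
  let description = null;
  try {
    description = await $('ul li .description');
  } catch (e) {
    // `$` throws when no element matches; this sketch treats the field as
    // optional. A production hook might rethrow anything that is not a
    // BrowserError.
  }
  return { title, description };
};
```

Whatever the hook returns becomes the data field of the jobResult.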
filter
function: jobDefinition.filter(links)
Filter which links are added to the queue from the page
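The sketch below shows one possible filter function. It assumes that links is an array of absolute url strings and that the returned array is what gets queued; neither detail is spelled out in this reference. The https and .pdf rules are arbitrary examples.

```javascript
// Sketch of a `filter` function, assuming `links` is an array of url strings
// and that the returned array is what gets queued.
const filter = (links) =>
  links.filter((link) => {
    try {
      const { protocol, pathname } = new URL(link);
      // Keep https pages and skip links to PDF files.
      return protocol === 'https:' && !pathname.endsWith('.pdf');
    } catch (e) {
      return false; // drop anything that is not a parseable absolute url
    }
  });

console.log(filter(['https://example.com/a', 'https://example.com/b.pdf']));
```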
filter
object: filter
<Object>
allowList
<Array<string>>
list of hostnames to allow
blockList
<Array<string>>
list of hostnames to block
Filters which links are added to the queue from the page based on the hostname. Both lists can be used in conjunction.
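For comparison with the function form, a filter object might look like the sketch below. The hostnames are placeholders, and how Locust resolves a hostname that appears in both lists is not covered by this reference.

```javascript
// Sketch of a filter object; the hostnames are placeholders.
const hostnameFilter = {
  allowList: ['example.com', 'docs.example.com'],
  blockList: ['ads.example.com'],
};

// Used as the `filter` field of a job definition.
const jobFragment = { filter: hostnameFilter };
```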
GeneralJobError
class: locust.error.GeneralJobError(message, url)
message
<string>
url
<string>
Thrown when Locust encounters an error that causes it to abort
// example.js
const { execute, error: { GeneralJobError } } = require('locust');
const job = require('./job');

(async () => {
  try {
    await execute(job);
  } catch (e) {
    // Log aborts raised by Locust; rethrow anything else
    if (e instanceof GeneralJobError)
      return console.log(e.message);
    throw e;
  }
})();
QueueEndError
class: locust.error.QueueEndError(message, url)
message
<string>
url
<string>
Returned when a global queue end condition is met, e.g. no queued jobs remain or the depth limit has been reached
QueueError
class: locust.error.QueueError(message, url)
message
<string>
url
<string>
Returned when a transient condition is met where another job cannot be started, e.g. the concurrency limit has been reached
BrowserError
class: locust.error.BrowserError(response)
message
<string>
url
<string>
response
<Object>
Thrown when Chrome encounters an error
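The four error classes above can be told apart with instanceof, whether the value was thrown or returned. The sketch below is a hypothetical helper, not part of Locust. It takes the error classes as an argument so that it does not depend on how they are imported, and it checks GeneralJobError last in case the other classes extend it, which is an assumption since this reference does not state the class hierarchy. The labels it returns are made up.

```javascript
// Hypothetical helper (not part of Locust): label an outcome of a job run.
// `errors` is expected to hold the classes documented above, for example the
// `error` export of locust.
const classify = (value, errors) => {
  // Check the more specific classes first in case they share a base class
  // (an assumption; the hierarchy is not documented here).
  if (value instanceof errors.QueueEndError) return 'queue-ended';
  if (value instanceof errors.QueueError) return 'retry-later';
  if (value instanceof errors.BrowserError) return 'browser-failed';
  if (value instanceof errors.GeneralJobError) return 'aborted';
  return value instanceof Error ? 'unknown-error' : 'result';
};
```

With the real package this would be called as classify(value, require('locust').error), after which the caller can decide whether to stop, retry later, or inspect the response on a BrowserError.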