Extraction
Extraction uses JSON schemas with element counters, CSS selectors, attributes, and saved persist descriptors.
Extraction turns a page into structured JSON. The schema is explicit, replayable, and can be saved under a persist key for reuse later.
Schema fields
Each leaf field can target one of these sources:
{ element: 3 }
{ selector: ".price" }
{ source: "current_url" }
The top level must be a JSON object.
element and selector fields can also read an attribute:
{
  name: { element: 3 },
  href: { element: 4, attribute: "href" },
  canonicalUrl: { source: "current_url" }
}
URL-list attributes such as srcset, imagesrcset, and ping can hold several URLs; extraction normalizes them to a single value.
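The leaf shapes described above can be summarized as a small TypeScript type with a runtime check. This is only a sketch: `LeafField` and `isLeafField` are hypothetical names, and the rules it encodes (exactly one of `element`, `selector`, or `source` per leaf, and `current_url` as the only source value) are assumptions inferred from the examples on this page, not types exported by OpenSteer.

```typescript
// Hypothetical types inferred from the examples above; not part of the OpenSteer API.
type LeafField =
  | { element: number; attribute?: string }
  | { selector: string; attribute?: string }
  | { source: "current_url" };

// Returns true when `value` looks like one of the three leaf shapes.
function isLeafField(value: unknown): value is LeafField {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const v = value as Record<string, unknown>;
  const targets = ["element", "selector", "source"].filter((k) => k in v);
  if (targets.length !== 1) return false; // assumption: exactly one target per leaf
  const attributeOk = v.attribute === undefined || typeof v.attribute === "string";
  switch (targets[0]) {
    case "element":
      return Number.isInteger(v.element) && attributeOk;
    case "selector":
      return typeof v.selector === "string" && attributeOk;
    default:
      // "current_url" is the only source value shown on this page.
      return v.source === "current_url" && v.attribute === undefined;
  }
}
```

A check like this can catch a malformed leaf, such as one that sets both `element` and `selector`, before the schema is sent for extraction.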
Basic extraction
const data = await opensteer.extract({
  schema: {
    title: { element: 2 },
    price: { selector: ".price" },
    url: { source: "current_url" },
  },
});
Save and replay an extraction descriptor
Save the descriptor while you explore:
const data = await opensteer.extract({
  persist: "product summary",
  schema: {
    title: { element: 2 },
    price: { element: 5 },
    url: { source: "current_url" },
  },
});
Replay it later without resending the schema:
const replayed = await opensteer.extract({
  persist: "product summary",
});
Arrays
Array fields use one or more sample row objects:
const data = await opensteer.extract({
  schema: {
    items: [
      {
        title: { element: 10 },
        price: { element: 11 },
        href: { element: 12, attribute: "href" },
      },
    ],
  },
});
If a page has multiple repeating row shapes, include multiple sample row objects in the array. OpenSteer treats each object as a variant when matching rows.
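To illustrate multiple row variants, the schema below lists two sample rows with different shapes. The counters (20 to 22) and the `badge` field are placeholders made up for this example; real counters come from your own snapshot.

```typescript
// Illustrative schema only: counters and field names are placeholders.
const schema = {
  items: [
    // Variant 1: a regular row with a title, price, and link.
    {
      title: { element: 10 },
      price: { element: 11 },
      href: { element: 12, attribute: "href" },
    },
    // Variant 2: a differently shaped row (for example, a sponsored listing).
    {
      title: { element: 20 },
      badge: { element: 21 },
      href: { element: 22, attribute: "href" },
    },
  ],
};

// Pass it the same way as the single-variant example:
// const data = await opensteer.extract({ schema });
```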
CLI usage
The CLI takes the schema as a positional JSON object:
opensteer extract '{"title":{"element":2},"url":{"source":"current_url"}}' \
  --workspace demo
Save the descriptor while extracting:
opensteer extract '{"title":{"element":2},"url":{"source":"current_url"}}' \
  --workspace demo \
  --persist "page summary"
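When a schema already lives in code, hand-writing the JSON argument is easy to get wrong. The sketch below builds the command line from a plain object using `JSON.stringify` and POSIX single-quote escaping. `shellQuote` and `buildExtractCommand` are hypothetical helpers, not part of OpenSteer; the sketch uses only the `extract` subcommand and the `--workspace` and `--persist` flags shown above.

```typescript
// Hypothetical helpers for illustration; not part of the OpenSteer CLI or SDK.

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build an `opensteer extract` command using only the flags shown above.
function buildExtractCommand(
  schema: Record<string, unknown>,
  workspace: string,
  persist?: string,
): string {
  const parts = [
    "opensteer",
    "extract",
    shellQuote(JSON.stringify(schema)),
    "--workspace",
    shellQuote(workspace),
  ];
  if (persist !== undefined) {
    parts.push("--persist", shellQuote(persist));
  }
  return parts.join(" ");
}

const command = buildExtractCommand(
  { title: { element: 2 }, url: { source: "current_url" } },
  "demo",
  "page summary",
);
console.log(command);
```

Quoting every argument keeps selectors that contain spaces or quotes intact when the command is pasted into a shell.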
Good workflow
- Take snapshot extraction.
- Pick counters from the current snapshot.
- Write the smallest schema that proves the page shape.
- Save a persist key if you expect to reuse it.
- Rebuild the schema if the site layout changes substantially.
