Conference Program Extraction

A conference program is the noisiest structured document you will routinely parse. Formats differ between conferences and often between days of the same conference; abstracts are short, frequently missing, and written to sell rather than to classify; and the fields a downstream scheduler needs most — how advanced a talk is, whether it is recorded, whether it has a capacity cap — are almost never stated outright. They have to be inferred, and an inference you cannot tell apart from a fact is a liability.

This skill turns raw program text into normalized event records in which every field carries a calibrated confidence and a one-line basis, and in which the classification signal is split into independent axes rather than crushed into a single blob. The governing principle is honest uncertainty: a record should make it obvious to the next stage which fields are solid, which are guessed, and which are simply absent.

It does three jobs and stops: detect the format, extract into the schema, score the confidence. It does not enrich thin abstracts (that is conf-abstract-enrichment), cluster (that is conf-theme-clustering), or schedule. It only flags what is thin so the next agent knows where to spend effort.
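As a rough illustration of the per-field confidence and basis described above, here is a minimal Python sketch. It is not the canonical schema: the names `ScoredField`, `status`, and `thin_fields`, the example field names, the sample values, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ScoredField:
    """Hypothetical wrapper: a value plus how much to trust it and why."""
    value: Optional[Any]   # None means the program never supplied it
    confidence: float      # 0.0 to 1.0
    basis: str             # one-line reason for the value and the score

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def status(field: ScoredField, solid_at: float = 0.8) -> str:
    """Label a field solid, guessed, or absent (the threshold is an assumption)."""
    if field.value is None:
        return "absent"
    return "solid" if field.confidence >= solid_at else "guessed"


def thin_fields(record: Dict[str, ScoredField]) -> List[str]:
    """Names of fields the next stage should not treat as fact."""
    return [name for name, f in record.items() if status(f) != "solid"]


# Illustrative record; each axis is scored on its own rather than merged.
record = {
    "title": ScoredField("Intro to Rust", 0.98, "stated in the session heading"),
    "level": ScoredField("beginner", 0.55, "inferred from 'Intro' in the title"),
    "recorded": ScoredField(None, 0.0, "not mentioned anywhere in the program"),
}

print(thin_fields(record))
```

Keeping `value`, `confidence`, and `basis` together on each field means a downstream stage can tell a stated fact from an inference from a gap without re-reading the source text.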

The event record (output contract)

The output is a single JSON file with a meta block (so an orchestrator can verify the stage gate without parsing every record) and an events array. This schema is canonical — reproduce it exactly; downstream skills read these field names.
