Coding & Interpretation
Job postings are a rich source of information about the openings they advertise. A typical posting includes some combination of descriptive information (e.g. employer name, job title), job requirements (e.g. skills, education, experience, certifications) and compensation (salary and benefits). Resumes are similarly rich in information about the job seeker they present, and the types of information they contain largely overlap with those found in job postings (e.g. employer names, job titles, skills, education). The key difference is that a job posting describes a single job, which could be filled by any of several job seekers, whereas a resume describes a single job seeker, whose work history may include several prior jobs and many different qualifications.
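That structural difference (one job per posting, potentially many jobs per resume) can be pictured as a small data model. The sketch below is illustrative only: the class and field names (`Job`, `Posting`, `Resume`, `all_skills`) are hypothetical and are not Burning Glass’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical schema for illustration only; not Burning Glass's data model.
@dataclass
class Job:
    employer: str
    title: str
    skills: List[str] = field(default_factory=list)


@dataclass
class Posting:
    # A posting describes exactly one opening, which many seekers might fill.
    job: Job
    education: Optional[str] = None
    salary: Optional[str] = None


@dataclass
class Resume:
    # A resume describes one job seeker, whose history may span many jobs.
    name: str
    jobs: List[Job] = field(default_factory=list)
    education: List[str] = field(default_factory=list)

    def all_skills(self) -> List[str]:
        """Distinct skills across every listed job, in first-seen order."""
        seen: List[str] = []
        for job in self.jobs:
            for skill in job.skills:
                if skill not in seen:
                    seen.append(skill)
        return seen
```

Because both document types share the same `Job` building block, fields extracted from a posting and from a resume can be compared directly, which is what makes matching between the two possible.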
Our parsing engine’s ability to extract all this information from free-text job postings and resumes and convert it into actionable intelligence is the foundation of Burning Glass’s suite of job matching, career exploration, training referral, and labor market intelligence solutions.
The core of the engine is a statistical model that uses word meaning, context, sentence structure, and surrounding text to determine the form, subject, and meaning of the text. Burning Glass has “trained” the model on thousands of properly annotated postings and resumes; through repeated exposure to these examples, the model learns to mimic the behavior of human annotators.
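To make the idea of learning from annotated examples concrete, here is a deliberately tiny statistical tagger. It is not Burning Glass’s model, whose internals are not described here; it simply counts which tag human annotators gave each word in a given context (the preceding word) and reuses the most frequent one. The class name `CountTagger` and the B-/I-/O tag scheme (beginning of a field, inside a field, outside any field) are assumptions for illustration; production systems typically use far richer sequence models.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

# One annotated example: (token, tag) pairs as a human annotator marked them.
Annotated = List[Tuple[str, str]]


class CountTagger:
    """Toy statistical tagger (hypothetical, for illustration).

    Predicts the tag annotators most often gave a word after the same
    preceding word; backs off to the word alone, then to "O" (no field).
    """

    def __init__(self) -> None:
        self.ctx_counts: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def train(self, examples: List[Annotated]) -> None:
        # "Repetition": every annotated occurrence adds one vote for its tag.
        for sentence in examples:
            prev = "<s>"  # sentence-start marker
            for token, tag in sentence:
                word = token.lower()
                self.ctx_counts[(prev, word)][tag] += 1
                self.word_counts[word][tag] += 1
                prev = word

    def tag(self, tokens: List[str]) -> List[str]:
        tags: List[str] = []
        prev = "<s>"
        for token in tokens:
            word = token.lower()
            if (prev, word) in self.ctx_counts:  # seen in this context
                best = self.ctx_counts[(prev, word)].most_common(1)[0][0]
            elif word in self.word_counts:  # seen, but in another context
                best = self.word_counts[word].most_common(1)[0][0]
            else:  # never seen during training
                best = "O"
            tags.append(best)
            prev = word
        return tags
```

After training on a couple of annotated lines such as “Senior Nurse at Mercy Hospital”, the tagger labels “Staff Nurse at Mercy Hospital” as a job title followed by an employer. The more annotated examples it sees, the more contexts it can handle, which is the intuition behind training on thousands of documents.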
Here’s how it works:
- For our parsing engine to extract a given piece of data from a free-text job posting or resume, a corresponding variable (e.g. JobTitle) must first be added to our taxonomy. This enables “coding” of the variable wherever it appears in a posting or resume.
- As the engine parses a posting or resume, it finds data that matches the variables in our taxonomy and inserts “tags” (markers) that describe the adjacent text. These tags can be highly specific, e.g. “the beginning of the title of the second job from the first employer described in the resume’s experience section.”
- Tags also denote inferred information, e.g. the latitude and longitude of a given address, the Standard Industrial Classification (SIC) code for each employer, a standardized job title for each job, the level of each academic degree, and a code for the major area of study.
- When the coding process is complete, any one posting or resume can feature hundreds of tags that define hundreds of data fields suitable for use in a standard database or for job matching and search.
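Once a posting or resume has been tagged, turning tags into database fields is largely bookkeeping. The sketch below assumes a hypothetical B-/I-/O tagging convention (a `B-` tag marks the beginning of a field, `I-` continues it, `O` lies outside any field) and numbers repeated fields so that, for example, a resume listing two jobs yields `JobTitle_1` and `JobTitle_2`. The function names and field-naming scheme are illustrative, not the engine’s actual output format.

```python
from typing import Dict, List, Optional, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I-/O tagged tokens into (field, text) spans."""
    spans: List[Tuple[str, str]] = []
    current: Optional[str] = None  # field of the span being built, if any
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current:
            words.append(token)  # continue the open span
            continue
        if current is not None:  # any other tag closes the open span
            spans.append((current, " ".join(words)))
        if tag.startswith(("B-", "I-")):  # start a new span (a stray I- is treated as B-)
            current, words = tag[2:], [token]
        else:
            current, words = None, []
    if current is not None:
        spans.append((current, " ".join(words)))
    return spans


def to_record(spans: List[Tuple[str, str]]) -> Dict[str, str]:
    """Flatten spans into numbered, database-ready fields."""
    record: Dict[str, str] = {}
    counts: Dict[str, int] = {}
    for name, text in spans:
        counts[name] = counts.get(name, 0) + 1
        record[f"{name}_{counts[name]}"] = text
    return record
```

For the tokens “Staff Nurse at Mercy Hospital ; Charge Nurse” tagged as title, employer, title, `to_record` produces `JobTitle_1`, `Employer_1` and `JobTitle_2`. Inferred values such as a standardized title or an industry code would simply be added to the same record as further fields.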
Coding can be performed using any of these techniques:
- Referenced Values – For certain fields, we look up text strings (e.g. employer name) in reference tables to assign a structured value
- Inferred Values/Text Mining – When a Referenced Value is not expressly stated in a posting but is implied by the surrounding text, our system can infer a specific value (e.g. an inferred NAICS industry code)
- Semantic Analysis – For unstated values, our system “reads” text looking for word patterns that might indicate a value
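The three techniques can be contrasted with a minimal sketch. Everything here is a stand-in: `EMPLOYER_TABLE` and `SECTOR_CUES` are hypothetical toy tables, the keyword vote is a crude substitute for real text mining, and a single regular expression stands in for the much richer pattern reading described as semantic analysis. The codes used are two-digit NAICS sectors (62, Health Care and Social Assistance; 52, Finance and Insurance).

```python
import re
from typing import Dict, Optional, Set

# Hypothetical reference table: normalized employer string -> structured value.
EMPLOYER_TABLE: Dict[str, Dict[str, str]] = {
    "mercy hospital": {"employer": "Mercy Hospital", "naics_sector": "62"},
    "first national bank": {"employer": "First National Bank", "naics_sector": "52"},
}

# Hypothetical cue words per NAICS sector, used when no employer match exists.
SECTOR_CUES: Dict[str, Set[str]] = {
    "62": {"patient", "patients", "clinical", "nursing"},  # Health Care and Social Assistance
    "52": {"loan", "loans", "banking", "underwriting"},  # Finance and Insurance
}

# Matches phrasings such as "3+ years of experience" or "5 years experience".
EXPERIENCE = re.compile(r"(\d+)\+?\s+years?\s+(?:of\s+)?experience", re.IGNORECASE)


def referenced_value(employer_text: str) -> Optional[Dict[str, str]]:
    """Referenced Values: look the extracted string up in a reference table."""
    return EMPLOYER_TABLE.get(employer_text.strip().lower())


def inferred_sector(text: str) -> Optional[str]:
    """Inferred Values: choose the sector whose cue words appear most often."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {sector: sum(w in cues for w in words) for sector, cues in SECTOR_CUES.items()}
    best = max(scores, key=lambda sector: scores[sector])
    return best if scores[best] > 0 else None


def min_years_experience(text: str) -> Optional[int]:
    """Semantic Analysis stand-in: read a word pattern that implies a value."""
    match = EXPERIENCE.search(text)
    return int(match.group(1)) if match else None
```

The ordering reflects the list: a direct reference-table hit is the most reliable, inference from surrounding text is the fallback when nothing is stated outright, and pattern reading recovers values (here, minimum years of experience) that never appear as a discrete field.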
Burning Glass continually expands its taxonomy by adding new variables, but only after rigorous evaluation of the information in question. Not all information warrants conversion to data. Context (i.e. where the information typically appears within a job posting or resume) matters, as do decisions about how best to capture the data and how best to present it.
Moreover, our unique deduplication algorithm further leverages these parsing and coding capabilities: rather than comparing raw text, it considers the actual job functions and skills described by the employer. In other words, we focus on the content of the posting, not simply its words or basic fields. While this methodology currently achieves an accuracy rate of almost 90%, we continue to work toward eliminating duplicates almost entirely.
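Comparing coded content rather than wording can be sketched as follows. This is not the proprietary algorithm described above; it assumes each posting has already been coded into a standardized employer, a standardized title and a set of skills, and treats two postings as duplicates when employer and title match and their skill sets have a Jaccard similarity (shared skills divided by all distinct skills across both postings) at or above a hypothetical threshold of 0.8. The names `CodedPosting`, `is_duplicate` and `deduplicate` are illustrative.

```python
from typing import FrozenSet, List, NamedTuple


class CodedPosting(NamedTuple):
    posting_id: str
    employer: str  # standardized employer (e.g. from a reference table)
    title: str  # standardized job title (an inferred tag)
    skills: FrozenSet[str]  # coded skills, not raw words


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared items divided by all distinct items; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(p: CodedPosting, q: CodedPosting, threshold: float = 0.8) -> bool:
    """Compare coded content rather than wording."""
    return (
        p.employer == q.employer
        and p.title == q.title
        and jaccard(p.skills, q.skills) >= threshold
    )


def deduplicate(postings: List[CodedPosting], threshold: float = 0.8) -> List[CodedPosting]:
    """Keep the first posting of each duplicate group (pairwise, O(n^2))."""
    kept: List[CodedPosting] = []
    for p in postings:
        if not any(is_duplicate(p, k, threshold) for k in kept):
            kept.append(p)
    return kept
```

Two postings for the same nursing job that are worded differently on two job boards, but code to the same employer, title and skills, collapse to one. The pairwise comparison is fine for illustration; at large volumes one would normally group postings by employer first so that only plausible pairs are compared.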
In total, Burning Glass’s technology can extract, derive, and infer more than 70 data elements from any given free-text resume or job posting, far more than any other parsing engine in the industry today.