Parsing & Extraction
Burning Glass’s technology for reading, understanding, and cataloging information directly from free-text resumes and job postings is truly state of the art. These documents present unique challenges for data parsing and extraction because free text is by its very nature unstructured (i.e., not composed of defined fields of information) and is full of ambiguities in the meaning and context of words that cannot readily be resolved in an automated way.
Conventional systems approach such challenges through rudimentary semantic, or lexical, analysis, in which computers infer the meaning of words by checking a thesaurus reference against a true/false decision-tree structure. Such systems have limited ability to process data values that are missing from their internal dictionaries or that appear in unexpected formats. As a result, these solutions restrict the data elements they extract to those most easily retrieved from data lists or with fixed, hard-coded rules (e.g., job titles, locations, employer names).
Unlike other engines, which rely on fixed rules and predefined templates, Burning Glass has developed patented technology for human capital data extraction using a field of artificial intelligence known as Statistical Natural Language Processing (SNLP). By leveraging SNLP, our system learns from the way the millions of resumes and job postings it has already processed were written and formatted. Rather than expecting information to appear in specific sequences or formats, the system uses context to categorize information based on what it describes, evaluating each word individually on both its meaning and the contextual cues around it. For example, it knows the difference between someone who has worked for the University of Iowa and someone who has attended the University of Iowa, and between someone who has prepared budgets and someone who has worked at Budget Rent A Car.
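To make the idea of categorizing by context concrete, the following is a toy sketch only. It is not Burning Glass’s patented method; it swaps in a very small naive Bayes classifier, and the training snippets, labels, and function names are all hypothetical. It shows how the words surrounding an entity such as “University of Iowa” can be used to decide whether the entity is an employer or a school.

```python
import math
from collections import Counter

# Hypothetical hand-labeled snippets: words seen around an entity, plus its field label.
TRAINING = [
    ("research assistant at 2015 to 2018 responsibilities", "employer"),
    ("worked as lab manager for managed staff", "employer"),
    ("bachelor of science degree from graduated gpa", "school"),
    ("education ba in economics attended", "school"),
]

def train(examples):
    """Count how often each context word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def classify(context, model):
    """Return the label with the highest naive Bayes log-score for the context."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for word in context.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

MODEL = train(TRAINING)
```

With this toy model, the context “Graduated with a degree in economics” scores higher as a school and “Worked as research assistant” scores higher as an employer, even though the entity name itself is identical in both cases. A production system would learn from far more data and far richer features than this sketch assumes.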
Other solutions are easily confounded when resumes or job postings are written in non-standard formats or when the sequence of information is jumbled during table conversion. Because Burning Glass relies primarily on contextual cues and observed patterns of writing rather than fixed rules, our technology is particularly adept at discerning information from complex fields such as work history, which can be quite unstructured. As a result, our parsing engine delivers the highest levels of accuracy in the industry.
In addition, our technology goes far beyond other solutions in normalizing the data that it extracts. This includes automated inference of standardized geocodes, industry codes, functional codes, educational levels, and academic major codes, as well as normalization of dates and years of experience. Skills are rolled up to a standardized, hierarchical dictionary of skills, and each skill carries a full summary of its context, duration, and recency of use. All coding schemes and skills dictionaries are fully customizable.
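As an illustration of what normalizing dates and years of experience might involve, here is a minimal sketch. The input format ("Jan 2015", "March 2018"), the function names, and the rounding rule are assumptions made for the example and do not describe the actual engine.

```python
from datetime import date

# Map three-letter month prefixes to month numbers (assumes English month names).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

def parse_month_year(text):
    """Parse strings like 'Jan 2015' or 'March 2018' into the first day of that month."""
    month_word, year = text.strip().split()
    return date(int(year), MONTHS[month_word[:3].lower()], 1)

def years_of_experience(ranges):
    """Sum the lengths of (start, end) ranges, in years rounded to one decimal place."""
    months = 0
    for start_text, end_text in ranges:
        start = parse_month_year(start_text)
        end = parse_month_year(end_text)
        months += (end.year - start.year) * 12 + (end.month - start.month)
    return round(months / 12, 1)
```

This sketch simply adds the ranges together, so overlapping jobs would be counted twice; a real normalizer would also need to handle open-ended ranges such as "present" and many more date formats.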
The implications of this are profound: by coding and normalizing a wide array of information from resumes and job postings, Burning Glass’s parsing engine opens up powerful structured search options for job seekers, employers, and anyone else who is researching the job market. Examples of structured search include range-bound queries (“find jobs with salary between $30,000 and $40,000 per annum”) and contextual queries (“find job seekers who have worked for Microsoft, not those who have used Microsoft Office”). Additional structured search criteria include experience level, degree level, specific degrees or certifications, salary, industry code, functional code, and distance.
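The two example queries above can be sketched against parsed records as follows. The record types and field names (JobPosting, Resume, employers, skills) are hypothetical and chosen only for illustration; they are not the engine’s actual output schema.

```python
from dataclasses import dataclass, field

@dataclass
class JobPosting:
    title: str
    salary: int  # annual salary in USD, already normalized by the parser

@dataclass
class Resume:
    name: str
    employers: list = field(default_factory=list)  # extracted employer names
    skills: list = field(default_factory=list)     # extracted skill names

def jobs_in_salary_range(jobs, low, high):
    """Range-bound search: titles of jobs whose salary lies in [low, high]."""
    return [job.title for job in jobs if low <= job.salary <= high]

def worked_for(resumes, employer):
    """Contextual search: match the employer field only, never the skills field."""
    return [resume.name for resume in resumes if employer in resume.employers]
```

Because the employer and skill values live in separate fields, a search for the employer "Microsoft" does not match a candidate who merely lists "Microsoft Office" as a skill, which a plain keyword search could not guarantee.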
Structured search adds significant value because it largely removes the ambiguity and hit-or-miss nature of keyword searching. For example, in a keyword search for individuals with a college degree, a user who specifies “college”, “bachelor”, “BA”, “BS”, “B.S.”, and “B.A.” would still miss candidates with degrees such as “LLB”, “BN”, or “B.Sc.”, among many others. By contrast, Burning Glass’s parsing engine normalizes every applicable degree to a cataloged degree name and a standardized degree level as part of parsing and extraction, which makes retrieval and reporting through structured search straightforward.
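The degree example can be sketched as a lookup from the many written forms of a degree to one cataloged name and one standardized level. The catalog entries, level labels, and function names below are hypothetical and cover only a handful of degrees; they are meant to show the shape of the approach, not the real degree catalog.

```python
import re

# Hypothetical catalog: normalized key -> (cataloged degree name, standardized level).
DEGREE_CATALOG = {
    "ba": ("Bachelor of Arts", "Bachelor's"),
    "bs": ("Bachelor of Science", "Bachelor's"),
    "bsc": ("Bachelor of Science", "Bachelor's"),
    "llb": ("Bachelor of Laws", "Bachelor's"),
    "bn": ("Bachelor of Nursing", "Bachelor's"),
    "mba": ("Master of Business Administration", "Master's"),
}

def normalize_degree(raw):
    """Strip punctuation and case so 'B.Sc.' and 'bsc' map to the same entry."""
    key = re.sub(r"[^a-z]", "", raw.lower())
    return DEGREE_CATALOG.get(key)  # None when the degree is not in the catalog

def find_by_level(candidates, level):
    """Return names of candidates whose degree normalizes to the given level."""
    matches = []
    for name, raw_degree in candidates:
        entry = normalize_degree(raw_degree)
        if entry is not None and entry[1] == level:
            matches.append(name)
    return matches
```

A single query for the level "Bachelor's" then retrieves candidates who wrote "LLB", "BN", or "B.Sc.", without the searcher having to enumerate every abbreviation.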