From unstructured data to product properties

Motivation

The DLR Institute of Data Sciences develops solutions for the new challenges of the digital era, with a focus on data management, IT security, smart systems and citizen science. This work also covers planning processes in spaceflight. Satellite planning depends on technical products from suppliers, and the products available on the market are often documented only as technical descriptions in PDF files. Such unstructured sources cannot be searched by product property, and the product type cannot be derived from them for use in a search.

Objective

  • Developing a method to convert text documents (technical component descriptions) into structured data
  • Deriving a search tool that works with specific product properties instead of text fragments
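
As an illustration of the two objectives above, the following is a minimal sketch and not the project's actual method. It assumes the text has already been extracted from the PDF and that properties appear on lines of the form "name: value unit". The pattern and the function names `extract_properties` and `find_components` are hypothetical and exist only for this example.

```python
import re

# Assumed line format: "<property name>: <number> <unit>", unit optional.
PROPERTY_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s*:\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[^\s\d]\S*)?\s*$"
)


def extract_properties(text):
    """Map each property name found in the text to a (value, unit) pair."""
    properties = {}
    for line in text.splitlines():
        match = PROPERTY_LINE.match(line)
        if match:
            name = match.group("name").strip().lower()
            properties[name] = (float(match.group("value")), match.group("unit"))
    return properties


def find_components(catalog, name, max_value):
    """Return the IDs of components whose property `name` is at most `max_value`."""
    return [
        component_id
        for component_id, props in catalog.items()
        if name in props and props[name][0] <= max_value
    ]
```

With a description such as "Mass: 1.2 kg" and "Supply voltage: 28 V", the first function yields a small property record, and the second can then select components by property value rather than by matching text. Real component descriptions vary far more in layout, so a rule of this kind would only be a starting point.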

Results

  • Semantic search over component descriptions in PDF collections
  • Added value: simpler integration of documents into planning processes

DLR Institute of Data Sciences

Location: Jena

Size: 65 employees

Industry: data management, IT security

Contact

Dr. Andreas Niekler

University of Leipzig

aniekler@informatik.uni-leipzig.de