Automatic classification of technical documents using artificial intelligence

The Classifier

Friday, 04/26/2019

At a glance

Today’s industrial companies store vast amounts of data and technical documents in their product lifecycle and data management systems. At this scale, the data can only be managed in practice with automated pre-sorting and classification.
PITERION has now implemented a solution for this. Built on the open-source framework TensorFlow, it is trained on labeled data and can then assign labels to similar text or PDF files.
The solution can be installed as a web service, either on a company’s own server or in the cloud, and can be trained with new data as required.
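To illustrate what a classification web service of this kind might look like, here is a minimal sketch using only the Python standard library. It is not PITERION's implementation: the keyword table stands in for the trained TensorFlow model, and the names `classify_text`, `build_response`, `ClassifyHandler` and `serve`, the JSON request format and the port are all assumptions made for this example. PDF text extraction is left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the trained model: a keyword set per label.
KEYWORDS = {
    "mechanical": {"gearbox", "bearing", "bolt"},
    "electrical": {"wiring", "connector", "cable"},
}


def classify_text(text):
    """Return the label whose keywords occur most often in the text.

    If no keyword matches, the first label in KEYWORDS is returned.
    """
    tokens = text.lower().split()
    scores = {label: sum(t in words for t in tokens) for label, words in KEYWORDS.items()}
    return max(scores, key=scores.get)


def build_response(body):
    """Turn a JSON request body into an (HTTP status, JSON payload) pair."""
    try:
        text = json.loads(body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, {"label": classify_text(text)}


class ClassifyHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the predicted label as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = build_response(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Calling `serve()` starts the service locally. Keeping the request logic in `build_response` means it can be checked without opening a network socket.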

Lots of data, numerous tasks

Digitalization within the product lifecycle, combined with the integration of more and more areas into product lifecycle management (PLM) systems, is generating all kinds of data. This data is needed for a wide range of tasks, which means it has to be in a form that people can work with. While most PLM systems already give data a certain structure, data is nevertheless often stored in a non-standardized form, poorly maintained, or simply missing. Systems using artificial intelligence can help, for example by adding metadata based on existing data in a product structure, thus ensuring that entries are complete. What’s more, new data loaded into the system can be given the necessary attributes based on the properties of similar data already available.
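One simple way to derive a missing attribute from similar existing entries is a nearest-neighbor lookup. The following sketch assumes that items are compared by the words in their names using Jaccard similarity. The item names, the `department` attribute, the function names and the similarity threshold are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_attribute(new_name, existing_items, attribute, min_similarity=0.2):
    """Suggest an attribute value for a new item from its most similar neighbor.

    Only existing items that already carry the attribute are considered.
    Returns None if no item is similar enough.
    """
    new_tokens = set(new_name.lower().split())
    best_value, best_score = None, 0.0
    for item in existing_items:
        value = item.get(attribute)
        if not value:
            continue
        score = jaccard(new_tokens, set(item["name"].lower().split()))
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= min_similarity else None


# Hypothetical entries from a product structure.
existing_items = [
    {"name": "hex bolt m8 steel", "department": "Mechanical"},
    {"name": "control cable 3 core", "department": "Electrical"},
    {"name": "hex nut m8", "department": None},  # incomplete entry, ignored
]

suggestion = suggest_attribute("hex bolt m10", existing_items, "department")
no_match = suggest_attribute("gasket", existing_items, "department")
```

A trained classifier replaces the hand-written similarity measure with one learned from the data, but the principle of inferring attributes from similar entries is the same.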

Developing machine learning models

As part of a student paper written in cooperation with Prof. Dr. Klaus Brinker from the University of Applied Sciences in Hamm-Lippstadt, Germany, several text classification models were examined and compared (see Natural Language Processing (NLP) for more information). For the project, freely accessible academic articles on medical topics were downloaded from the Europe PubMed Central platform (https://europepmc.org/). The articles had already been categorized according to the symptoms discussed or treated in each case. This data set, consisting of some 3,500 training and 1,500 test articles, was then used to train and validate the various machine learning models.

The resulting models are, of course, not restricted to medical data. They can also be trained and used with other data sets, for instance technical documentation that has to be assigned to certain specialist departments. During training, the model’s parameters are fitted to the data so that the underlying mathematical method (e.g. a neural network) arrives at a configuration that classifies well.
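The train-then-validate procedure can be sketched with a very small example. Note the assumptions: this uses a multinomial Naive Bayes classifier written in plain Python, not the TensorFlow models examined in the project, and the class name, documents and department labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # "Training" here means counting documents per label and words per label.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data.
train_texts = [
    "torque specification for the gearbox housing bolts",
    "gearbox shaft tolerance and bearing fit drawing",
    "wiring diagram for the control cabinet power supply",
    "cable harness routing and connector pin assignment",
]
train_labels = ["mechanical", "mechanical", "electrical", "electrical"]

# Held-out test data the model has not seen during training.
test_texts = [
    "bearing tolerance for the gearbox shaft",
    "connector pin wiring for the power supply",
]
test_labels = ["mechanical", "electrical"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
predictions = [model.predict(text) for text in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Swapping the training data for documents from another domain requires no change to the classifier code, only new labeled examples.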

The best-performing model was identified using several evaluation metrics and then exported so that it can be used in other software.
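The article does not name the metrics used. As an illustration, the following sketch compares two candidate models by accuracy and macro-averaged F1 score, two common choices for classification tasks. The model names, labels and predictions are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical true labels of a test set and the predictions of two models.
y_true = ["A", "A", "A", "B", "B", "B"]
candidate_predictions = {
    "model_a": ["A", "A", "B", "B", "B", "B"],
    "model_b": ["A", "B", "B", "B", "A", "B"],
}

results = {
    name: {"accuracy": accuracy(y_true, preds), "macro_f1": macro_f1(y_true, preds)}
    for name, preds in candidate_predictions.items()
}

# Pick the model with the highest macro F1, using accuracy as a tie-breaker.
best_name = max(results, key=lambda n: (results[n]["macro_f1"], results[n]["accuracy"]))
```

Macro averaging gives every category the same weight, which matters when some categories have far fewer documents than others.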

Using artificial intelligence in software solutions

The best model has been deployed as a “pickle object” (a serialized Python object) on a web service that we host for demonstration purposes. There, it can classify medical articles uploaded as PDF files.
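The following sketch shows the basic pickle round trip that such a deployment relies on: the trained model is serialized once and restored later inside the service. To keep the example self-contained, the model is represented here as a plain dictionary of keyword weights. This dictionary, the `classify` function and the labels are assumptions for illustration, not the actual model.

```python
import pickle

# Hypothetical trained state: keyword weights per label.
trained_state = {
    "labels": ["mechanical", "electrical"],
    "keyword_weights": {
        "mechanical": {"gearbox": 2.0, "bearing": 1.5},
        "electrical": {"wiring": 2.0, "connector": 1.5},
    },
}


def classify(state, text):
    """Score each label by the summed weights of its keywords found in the text."""
    tokens = text.lower().split()
    scores = {
        label: sum(state["keyword_weights"][label].get(t, 0.0) for t in tokens)
        for label in state["labels"]
    }
    return max(scores, key=scores.get)


# Serialize after training (these bytes could also be written to a file) ...
serialized = pickle.dumps(trained_state, protocol=pickle.HIGHEST_PROTOCOL)

# ... and restore inside the service before classifying new documents.
restored_state = pickle.loads(serialized)
label = classify(restored_state, "gearbox bearing drawing")
```

Unpickling can execute arbitrary code, so a service should only load pickle files that come from a trusted source.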
These machine learning models can, of course, be used in a variety of scenarios. This kind of classification model would be ideal, for example, in a workflow that assigns files to a specialist department for inspection.
There are countless other applications, too, particularly in the field of PLM, with each one depending on the data available and the requirements of the customer.
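The department workflow mentioned above could be sketched as follows. The example assumes that the classifier returns a label together with a confidence value between 0 and 1. The department names, the threshold and the manual review fallback are hypothetical.

```python
# Hypothetical mapping from predicted label to responsible department.
DEPARTMENT_BY_LABEL = {
    "mechanical": "Mechanical Engineering",
    "electrical": "Electrical Engineering",
}


def route_document(label, confidence, threshold=0.8):
    """Return the department that should inspect a document.

    Documents with an unknown label or a low-confidence prediction
    are sent to manual review instead of being assigned automatically.
    """
    department = DEPARTMENT_BY_LABEL.get(label)
    if department is None or confidence < threshold:
        return "Manual Review"
    return department


assigned = route_document("electrical", 0.93)
uncertain = route_document("mechanical", 0.55)
unknown = route_document("software", 0.99)
```

The threshold lets a workflow trade automation against the risk of misrouted files. Where it should sit depends on the data and on the cost of a wrong assignment.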

Customer benefits

With PITERION, customers benefit not only from the company’s long-standing experience as a PLM specialist in the seamless integration of various applications, but also from innovations originating at universities and colleges, brought in by colleagues with fresh ideas.

Further topics in cooperation with universities and research institutions are already in the planning stage, and are set to become exciting new additions to PITERION's portfolio.


About PITERION

PITERION is an international PLM service provider headquartered in Germany with branches in the USA, India, Tunisia, Sweden, Switzerland and Poland. Our highly qualified staff offers solutions and services independent of system manufacturers. We always follow the provisions of the EN 9100 (aerospace) and ISO 9001 quality management standards and the ISO 14001 environmental management standard. We have set up internal standards and processes that enable us to maintain a consistently high quality of service across our sites throughout the world.

We are a competent and independent partner to our customers, providing tailor-made solutions and services. Together with our technology providers, we can therefore support our customers’ PLM strategy in the best way possible.