Michael Sharp
Published: Wednesday, November 9, 2022 - 12:00

(NIST: Gaithersburg, MD) -- Technical language processing, or TLP, involves using computers to capture, understand, and translate jargon for other users.

Industries and businesses have long had their own specialized “languages”: words and phrases that mostly make sense only to someone in that business. This technical jargon, slang, or industry lingo has largely developed as shorthand for conveying complex or very specific ideas and directives with minimal effort.

“Peter, please get me that TSP printout for my retirement ASAP.”

“Don’t overdo it with the salt, the tsp should be enough.”

“I need to finish my white paper for the ivory tower by COB.”

“Engine one is due for a lube inspection and a rewinding. Let’s push it until next week’s line-wide PM.”

Sentences like these may mean a very specific thing to you, or may mean nothing at all. Maybe you think you understand parts of them, but those same parts may mean something else to another person. Even if the letters and words are familiar to you, their context and meaning can be lost without specific insight into where they came from. Sometimes that context can be found in the sentence itself; other times it is more elusive.

Consider the term “TSP.” Most English speakers would recognize it as an abbreviation for something. But depending on who is reading it, where, and when, the answer to what it means could be very different. Perhaps it stands for “teaspoon,” or “Thrift Savings Plan,” or “trisodium phosphate,” or any number of other possibilities. It is the context around it that must be interpreted to understand its intent.

People are generally very good at learning and translating context and intent with comparatively little additional information. Computers, however, are not. In the example above, adding words like “salt,” “retirement,” or “chemical” could quickly allow a computer to figure out the context.
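To make the idea of context words concrete, here is a minimal Python sketch of picking a meaning for “TSP” by counting cue words in the surrounding sentence. It is only an illustration under stated assumptions, not a description of how NIST's tools work: the cue lists in `SENSES` and the function names `tokenize` and `disambiguate` are hypothetical, and a real system would need far richer context than keyword overlap.

```python
import re

# Hypothetical cue words for each possible meaning of "TSP".
# A real system would curate or learn these for its own domain.
SENSES = {
    "teaspoon": {"salt", "sugar", "recipe", "stir", "cup"},
    "Thrift Savings Plan": {"retirement", "printout", "fund", "contribution"},
    "trisodium phosphate": {"chemical", "cleaner", "solution", "degreaser"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap the sentence most.

    Returns None when no cue word appears or when two senses tie,
    so the caller can hand the question to a person instead of guessing.
    """
    words = tokenize(sentence)
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

With these cue lists, the retirement sentence resolves to “Thrift Savings Plan” and the salt sentence to “teaspoon.” A sentence with no cue words, or with cues for two different senses, returns `None` rather than a guess.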
But even then, there might be confusion depending on whether the word is used in a technical setting vs. a casual one. Trisodium phosphate is chemically a salt, leading to correct but confusing phrases like “ONE TSP: TSP.”

I lead a group at NIST that is very interested in these kinds of highly contextual coded languages. After reading “ONE TSP: TSP,” we want a computer to be able to translate that phrase to another user as “Add one teaspoon of trisodium phosphate to the mix.”

My colleagues and I study and work in the area of technical language processing (TLP): the act of using computers for capturing, understanding, and translating jargon for other users. The result can be a direct action, like controlling a robot, but often, and more importantly, we want computers to be able to communicate the ideas they capture back to another person.

For our purposes, technical languages can be anything written or spoken in an industrial or scientific setting, where context is especially important. In many cases, this includes words or phrases that might not even appear outside of a very small group.

But clearly not all language is technical. So let’s briefly talk about the better-known counterpart to TLP.

Natural language processing (NLP) is a formal area of study that takes communications by humans and transforms that information into something more suitable for computer use and analysis. In broad terms, this is done by restructuring the communication into a form that allows it to be compared to “concepts” or ideas that the computer has previously learned.

But where NLP focuses on the most common uses for words, TLP focuses on the less common uses, or meanings that can change based on context. For example, “running” and “jogging” are similar concepts, but may or may not work interchangeably depending on the context.
An NLP tool might recognize both as means of locomotion, but a TLP tool could also know that jogging a memory has little relation to running a store and that neither is a means of locomotion. There are, of course, ways to get NLP to recognize these differences. But this type of problem is where TLP lives.

Some of the most common applications of NLP that you encounter in your daily life are translation tools. These can be language translations, such as English to Spanish, but they can also be voice-to-text translations. Interactive chatbots and some search engines use forms of NLP.

While NLP has started to provide real societal benefit, TLP has yet to show its full potential and remains a much harder task. Industry leaders have begun to recognize the need to both process high volumes of text and translate information between individuals in areas where NLP struggles to perform. So, they are starting to lean more and more toward TLP to help them.

One reason is that specialized industry lingo and technical jargon are significantly different from the way people normally communicate. NLP tools trained for “normal” speech just don’t work in technical settings. NLP defaults to the most common way to use a word, which is often incorrect in a technical context.

Also, for most factories and businesses, the volume of examples needed to teach a computer technical communication just doesn’t exist. Most NLP tools need hundreds of thousands to millions of examples to train them.

TLP is targeted at solving these types of problems. Part of my job is to help people teach computers contextually specialized language with the fewest possible examples. Often, the only successful way to do that is with direct human oversight and input, so teaching people is also a very real part of TLP.

Some areas, such as the medical field, have a head start on TLP because of a yearslong effort to create rigorous consistency in how terms are used.
But other fields are just now realizing its potential. Misspellings, inconsistent shorthand, formatting differences, and slang are all common occurrences in industrial documents.

My goal is to help people teach artificial intelligence that when someone inputs “Fixxed leek,” “Leak Repaired,” or “John applied sealant to drip site,” they all mean the same thing despite having zero words in common. Many cases like this exist, where something obvious to a human is nearly impossible for a computer to learn on its own.

Another goal and challenge of TLP is to help researchers and workers in vastly different fields collaborate and search through one another’s work, despite having very different ways of talking about things. A common practice to one person might be the innovative solution needed by another, but the difference in how they speak about things keeps them apart.

A sound editor in Hollywood might have found the solution to a gene sequencing problem but would never know it because she calls her method “dynamic time warping” instead of a “Levenshtein distance measure.”

In another case, John may be looking for a way to replant forests quickly after a wildfire. To ensure maximum coverage and sprouting, he needs to project a high volume of nutrient-encased seed pods a long distance without rupturing them on launch.

Jim is a master paintball player and is widely known to have the longest-shooting guns with the biggest bullets. Jim may be able to help solve John’s problem . . . but John’s not interested in paintball and Jim couldn’t care less about ecology. So, despite both having very detailed webpages about their respective work, they never find each other. TLP could help connect them.

Beyond our own research, NIST is linking academic and industrial communities to help advance the development and use of TLP technologies.
We helped found and continue to support an active TLP Community of Interest where everyone from researchers to users to even just the curious can come and actively participate in research and conversations on the subject.

We have projects evaluating how operators assess and communicate problems with equipment, a project developing methods for analyzing technical documents, one to make diagnostic models from manuals, and more.

TLP is largely about helping workers do what they already do, but making it easier, more productive, and hopefully a little less tedious. We hope the tools of TLP will soon be able to:
• Help reliability engineers discover which machines are costing the most time and money for parts and repairs
• Give researchers the ability to track trends in technical research, predict next steps, and identify new areas of study
• Let workers tell their machines if they are doing a good or bad job, and have them respond with better behavior
• Help anthropologists learn tribal or ancient languages much faster
• Make connecting and tracking information from all areas of your facilities as simple as a few clicks on the screen
• Give computers the ability to better understand intent and context

And so much more.

Whether with computers or humans, language and communication operate on ideas, intent, and context. Every day that I work on TLP with my colleagues, we are pushing the boundaries of how people and computers interact through language. What could be more exciting than that?

As we like to say, Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

Michael Sharp is a leader in metrology research at NIST. He is an engineer, data scientist, and author with a Ph.D. in nuclear engineering.

Teaching Computers to Read ‘Industry Lingo’
Technical vs. natural language processing
Credit: N. Hanacek/NIST
© 2023 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute, Inc.