Word embeddings in NLP

Malaya Rout
Share the Reality


We will discuss word embeddings this week. Word embeddings represent a fundamental shift in natural language processing (NLP), transforming words into dense vector representations that capture semantic and syntactic meaning. Moving beyond sparse, context-agnostic methods such as bag-of-words and one-hot encoding, modern embedding techniques (from Word2Vec to transformers) enable machines to capture linguistic relationships and nuances essential for advanced NLP applications.

Fundamentals of word embedding

Word embeddings are vector representations of words. They capture semantic and syntactic meaning and, when composed, help represent phrases and sentences. Word embeddings play a significant role in natural language processing (NLP) tasks.

Bag-of-words and one-hot encoding have been the traditional methods of representation. The bag-of-words (BoW) model converts text documents into numerical representations by treating a document as an unordered collection of words and encoding their frequencies. One-hot encoding converts categorical variables into a numerical format by creating a binary vector in which exactly one position is "hot" (set to 1) and all others are "cold" (set to 0). These conventional methods suffer from two major limitations: they cannot capture semantic similarity between words, and they produce very sparse, high-dimensional vectors. Modern embedding techniques emerged largely as solutions to these limitations; the short sketch below illustrates both problems.
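Here is a minimal, self-contained sketch (plain Python and NumPy, with a made-up toy vocabulary, not any standard dataset) of how one-hot and bag-of-words vectors behave, and why they cannot express semantic similarity:

```python
import numpy as np

# Toy vocabulary and documents, purely for illustration.
vocab = ["car", "automobile", "banana", "drives", "fast"]

def one_hot(word, vocab):
    """Return a sparse binary vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

def bag_of_words(tokens, vocab):
    """Return a word-count vector for a tokenised document."""
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            vec[vocab.index(tok)] += 1.0
    return vec

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# "car" and "automobile" are synonyms, yet their one-hot vectors are orthogonal:
print(cosine(one_hot("car", vocab), one_hot("automobile", vocab)))  # 0.0

# Two BoW documents with no overlapping words also score 0,
# no matter how close their meanings are.
print(cosine(bag_of_words(["car", "drives", "fast"], vocab),
             bag_of_words(["automobile"], vocab)))  # 0.0
```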

Word embeddings rely on three mathematical foundations: vector space models, dimensionality-reduction principles, and distance or similarity metrics such as Euclidean distance and cosine similarity.
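These metrics are straightforward to compute. The sketch below uses small hand-made vectors (not the output of any real model) to show that Euclidean distance measures straight-line separation while cosine similarity measures the angle between vectors:

```python
import numpy as np

# Hand-made dense vectors for illustration only (not from a trained model).
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.88, 0.82, 0.12])
apple = np.array([0.10, 0.20, 0.95])

def euclidean(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return float(np.linalg.norm(u - v))

def cosine_similarity(u, v):
    """Angle-based similarity in [-1, 1]; ignores magnitude."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(euclidean(king, queen), euclidean(king, apple))                  # small vs. large
print(cosine_similarity(king, queen), cosine_similarity(king, apple))  # ~1.0 vs. ~0.29
```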

Techniques and models for word embedding

Three widely used static embedding models are Word2Vec, GloVe (Global Vectors for Word Representation), and FastText. Word2Vec can use either a Continuous Bag-of-Words (CBOW) or a Skip-gram architecture. CBOW is a shallow, three-layer neural network that learns word embeddings by predicting a centre target word from its surrounding context words; Skip-gram inverts this and predicts the surrounding context words given the target centre word. GloVe is an unsupervised learning algorithm that derives word vectors from global word co-occurrence statistics. FastText extends Word2Vec by representing words as sums of character n-gram vectors. This subword approach captures morphological structure, enabling the model to generate meaningful embeddings for rare and out-of-vocabulary words.
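As a rough sketch of how such a model is trained in practice, the snippet below uses the gensim library (assuming gensim 4.x is installed) on a tiny made-up corpus; real embeddings need millions of sentences, so the numbers here are only illustrative:

```python
# Requires: pip install gensim   (sketch assumes gensim 4.x)
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=1 selects the Skip-gram architecture; sg=0 would select CBOW.
model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the dense embeddings
    window=3,         # context window size
    min_count=1,      # keep every word, even singletons
    sg=1,
    epochs=100,
    seed=42,
)

print(model.wv["cat"].shape)                 # (50,) dense vector
print(model.wv.similarity("cat", "dog"))     # cosine similarity of two embeddings
print(model.wv.most_similar("cat", topn=3))  # nearest neighbours in vector space
```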

There are two prominent families of contextualised word embeddings: ELMo (Embeddings from Language Models) and transformer-based models (BERT, GPT). The static models above assign a fixed vector to each word, irrespective of context; contextualised models address this by computing a word's vector representation from its surrounding context, so the same word receives different vectors in different sentences. I guess BERT (from Google) and GPT (from OpenAI) don't need an introduction.
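A minimal sketch of extracting contextual vectors with the Hugging Face transformers library (assuming transformers and PyTorch are installed; the model name and sentences are illustrative, and the first run downloads the BERT weights) shows that the word "bank" gets different vectors in different sentences:

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return BERT's contextual embedding for the token 'bank' in a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (seq_len, 768)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs["input_ids"][0].tolist().index(bank_id)
    return hidden[position]

v_river = bank_vector("She sat on the river bank.")
v_money = bank_vector("He opened a bank account.")

# The same surface word gets different vectors depending on context.
cos = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(cos.item())   # noticeably below 1.0, unlike a static embedding
```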

Challenges and limitations

First, there can be bias in the generated embeddings, stemming from the data used to train the embedding models; the impact on downstream applications is negative and often amplified. Techniques for bias detection and mitigation are therefore critical (a toy probe is sketched after this list). Second, word embeddings lack interpretability and explainability: individual embedding dimensions are hard to assign meaning to, which is why research on visualisation and interpretability tools for word embeddings is active.

Third, training and inference for embedding models carry heavy computational and resource costs, which makes deploying them on edge devices a challenge. Fourth, embeddings adapt poorly to specialised domains and out-of-vocabulary words, so innovative methods for domain-specific fine-tuning and extension of embedding models are necessary. Fifth, static predictive models lack context, as discussed.
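To make the bias point concrete, here is a toy sketch of one common diagnostic: projecting words onto a "gender direction" built by vector arithmetic. The vectors below are hand-made for illustration; in practice they would come from a real pretrained model.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embedding lookup; real diagnostics would load pretrained vectors.
emb = {
    "he":       np.array([ 0.9, 0.1, 0.2]),
    "she":      np.array([-0.9, 0.1, 0.2]),
    "engineer": np.array([ 0.6, 0.7, 0.1]),
    "nurse":    np.array([-0.6, 0.7, 0.1]),
}

# Project occupation words onto the "gender direction" (he - she).
gender_direction = emb["he"] - emb["she"]

for word in ["engineer", "nurse"]:
    score = cosine(emb[word], gender_direction)
    # Scores far from zero suggest the word has absorbed a gendered association
    # that downstream applications could inherit and amplify.
    print(word, round(score, 3))
```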

Word embeddings have revolutionised natural language processing by enabling machines to capture semantic relationships and contextual meaning in ways traditional methods could not. However, realising their full potential requires addressing significant challenges: mitigating bias in training data, improving interpretability through visualisation tools, optimising computational efficiency for edge deployment, and developing domain-specific adaptation strategies. Contextualised models like BERT and GPT represent substantial progress, yet limitations persist in handling out-of-vocabulary words and specialised domains. Who knew large language models (LLMs) could be a possibility one day! Why not give the credit to embeddings for making LLMs a reality?





Disclaimer

Views expressed above are the author’s own.


