Would you like to do a natural language processing project but feel overwhelmed by all the talk of attention-based models and text embeddings? Would you like to understand how to take a set of raw texts and put them into a form that a machine-learning model can understand?
In this talk, you'll learn some of the theory behind two of the most widely used techniques in natural language processing today: word embeddings and BERT. This will give you an understanding of what's going on under the hood. You'll then follow a practical demonstration of how to use these techniques yourself, building a clickbait headline classifier in Python with user-friendly packages like gensim and transformers.
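To give a flavour of the word-embedding half of that demo, here is a minimal sketch (not the talk's actual code): each headline is represented as the average of its pretrained word vectors, and a simple classifier is trained on top. The toy headlines and labels are invented for illustration, and the sketch assumes scikit-learn is installed alongside gensim.

```python
import numpy as np
import gensim.downloader
from sklearn.linear_model import LogisticRegression

# Pretrained GloVe vectors (roughly a 66 MB download on first use).
vectors = gensim.downloader.load("glove-wiki-gigaword-50")

def embed(headline: str) -> np.ndarray:
    """Average the vectors of the words the model knows; zeros if none match."""
    words = [w for w in headline.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0) if words else np.zeros(50)

# Toy training data, made up for this sketch: 1 = clickbait, 0 = not clickbait.
headlines = [
    "10 things you won't believe about cats",
    "This one weird trick will change your life",
    "Government announces new budget for 2025",
    "Scientists publish study on ocean temperatures",
]
labels = [1, 1, 0, 0]

X = np.stack([embed(h) for h in headlines])
clf = LogisticRegression().fit(X, labels)

print(clf.predict([embed("You won't believe what happened next")]))  # likely [1]
```

The BERT-based approach discussed in the talk follows the same overall pattern, with contextual sentence embeddings replacing the averaged word vectors.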
By the end of this talk, you'll understand why each technique works, what meaning it extracts from the text, and the advantages and disadvantages of each. Even if you haven't done any machine learning before, you'll gain enough knowledge to go home and start experimenting with your own natural language processing project.
Presented at:
- WeAreDevelopers Python Day 2024
- PyCon PT 2022
