
Computational linguistics master's student trying to keep up with my reading list, and an intermediate hipster.

Azure services to bring your machine learning project to the cloud

Photo by Dallas Reedy on Unsplash

It is becoming increasingly difficult to do data science on your own computer. Sure, an ordinary computer has no problem handling data exploration, analysis and visualization. But when it comes to model training, unless you own a GPU or are using a classical machine learning model rather than a neural network, your computer will likely struggle, in terms of both RAM and computation time.

Some people purchase more RAM, a stronger processor or a GPU. Others use pre-trained models, for example from Hugging Face, which is a very user-friendly way to interact with large…


In this blog post we introduce a new and very exciting European project in which Beck et al. is participating: AI-SPRINT, short for Artificial Intelligence in Secure PRIvacy-preserving computing coNTinuum. Beck et al. will apply extensive cloud computing knowledge in a consortium consisting of several industry partners, along with several European universities.

For most of its history, artificial intelligence has been used in academia. In recent years it has grown in popularity and slowly made its way into the mainstream public. However, the journey from academic specialization and complexity to accessible tools for businesses is not yet complete.

Academics and companies…

Create a system in order to be up to date with deep learning research

Photo by Caspar Camille Rubin on Unsplash

Deep learning is moving so fast that the only way to keep up is by reading directly from the people who publish these new findings. If you’re a technical person and want to learn about deep learning in 2021, you need to read papers.

Formal education will only get you so far. Unfortunately, universities in general are slow to incorporate new material into their curricula, and they only started teaching deep learning a few years ago. This is the case in Europe; I acknowledge it might be different in the US.

Deep learning in university

Under the fancy names of AI and…

Given similar sentences, how similar are their contextual embeddings? Let’s find out

Photo by Myriam Jessier on Unsplash

Embeddings are a key part of modern NLP: they encode the meaning of words or other linguistic units into vectors of numbers. The embedding of a specific word might seem random, but the idea is that similar words have similar embeddings, and opposite words have opposite embeddings.

For example, imagine this is king’s embedding: [2, 3, 1, 0, 5]. Prince’s embedding might be [1, 3, 1, 0, 4]. Summed over all components, the difference between the two vectors is just 2, which means these 2 words are linguistically very close. Queen’s embedding might be [2, 3, -1, -10, 0] where some similarity is maintained (king…
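The toy vectors above can be compared in a few lines. This is a minimal sketch, assuming that "difference" means the sum of absolute component differences (the Manhattan distance); the function name is hypothetical, and real systems more commonly compare embeddings with cosine similarity.

```python
def manhattan_distance(a, b):
    """Sum of absolute component-wise differences between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy embeddings taken from the example in the text.
king = [2, 3, 1, 0, 5]
prince = [1, 3, 1, 0, 4]
queen = [2, 3, -1, -10, 0]

king_prince = manhattan_distance(king, prince)
king_queen = manhattan_distance(king, queen)

print(king_prince)  # 2
print(king_queen)   # 17
```

Under this assumed measure, prince sits much closer to king than queen does, although queen still matches king exactly in the first two components.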

Create systematically good writing assignments no matter the topic

Photo by NeONBRAND on Unsplash

Why am I writing this?

I have had to write many long essays and writing assignments during my 6 years at university. When I started at 18, I had little idea how to do this properly. I started building the house from the roof, added windows and rooms, and then I would realize the foundation was all wrong. I wasted many hours this way.

However, as the productivity nerd that I am, I have been developing a technique over these past years that works better and better. …

Introduction to tokenization methods, including subword, BPE, WordPiece and SentencePiece

Photo by Hannah Wright on Unsplash

This article is an overview of tokenization algorithms, covering word-level, character-level and subword-level tokenization, with emphasis on BPE, Unigram LM, WordPiece and SentencePiece. It is meant to be readable by experts and beginners alike. If any concept or explanation is unclear, please contact me and I will be happy to clarify whatever is needed.

What is tokenization?

Tokenization is one of the first steps in NLP, and it’s the task of splitting a sequence of text into units with semantic meaning. …
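As a rough illustration of the two simplest levels mentioned above, here is a minimal sketch using only built-in string operations. The sample sentence and variable names are made up for this example, and whitespace splitting is a deliberate simplification: it leaves punctuation attached to words, which real tokenizers handle separately.

```python
text = "Tokenization splits text into units."

# Word-level: split on whitespace (punctuation stays attached to the word).
word_tokens = text.split()

# Character-level: every character, including spaces, becomes a token.
char_tokens = list(text)

print(word_tokens)       # ['Tokenization', 'splits', 'text', 'into', 'units.']
print(len(char_tokens))  # 36
```

Subword methods such as BPE or WordPiece fall between these two extremes, so they need a learned vocabulary and are not shown here.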

Key lessons from a degree in a fast-paced research field

Photo by Mikael Kristenson on Unsplash

Follow me on Twitter for more stories ✨

I got the idea for this post from the GitHub graduation initiative, and the post was originally published in

I started my Master's degree in NLP at LMU Munich, Germany in 2018. I had been interested in machine learning for about a year and had completed Coursera’s Deep Learning specialization. I wanted to learn NLP more deeply and become a kind of expert in the topic. That’s what a Master's degree is supposed to make you, right? After spending 2 years attending lectures on it and doing programming assignments, I…

How? Do as little as you’re comfortable with to make minimal progress

Photo by Daria Tumanova on Unsplash

If you’re like me, you have a long list of things you’d like to do, but you never find time to start with them. These can be anything from coding projects to household ideas, playing an instrument, writing, reading, working out.

These are habits you wish you kept up regularly but never start, because they would take up many hours per week and you don’t have that kind of time.

If you’re already efficient with your time and don’t have any time left in your week, you don’t need this post. …

An overview of the research in a highly complex phenomenon

Photo by Melanie Dretvic on Unsplash

You’ve probably made this face when trying to figure out whether someone is being sarcastic. Sarcasm is not easy for humans to detect, so how about machines?

Note: for the sake of brevity, this post will only consider sarcasm detection with tweets and using deep learning models.

Sarcasm detection is a very narrow research field in NLP, a specific case of sentiment analysis where, instead of detecting a sentiment across the whole spectrum, the focus is on sarcasm. The task is therefore to detect whether a given text is sarcastic or not.

The first problem we come across…

They are supposed to reduce road casualties, but what else do they entail?

Cars are the most ubiquitous transport method nowadays. They offer comfort, freedom, availability, various expenses and a slim-but-not-insignificant probability of being involved in an accident.

Casualties are diminishing, yet cars remain one of the deadliest transport systems, surpassed mainly by motorcycles. They are also an untenable economic burden, costing between 1% and 3% of each country’s GDP according to the World Health Organization [1].

Governments and car manufacturers are making joint efforts to reduce this rate significantly: the European Commission, for instance, has developed several strategies, Cooperative Intelligent Transport Systems among them, which focuses on reducing…

Ane Berasategi
