# Algorithm
The algorithm of prototypical networks is as follows:
- Let's say we have the dataset, D, comprising {(x1, y1), (x2, y2), ..., (xn, yn)}, where x is the feature and y is the class label.
- Since we perform episodic training, we randomly sample n data points per class from our dataset, D, and prepare our support set, S.
- Similarly, we sample n data points per class and prepare our query set, Q.
- We learn the embeddings of the data points in our support set using our embedding function, fφ(). The embedding function can be any feature extractor; for example, a convolutional network for images or an LSTM network for text.
- Once we have the embeddings for each data point, we compute the prototype of each class by taking the mean embedding of the data points under that class (a short code sketch follows the equation):
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_{\phi}(x_i)$$
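As a concrete illustration, here is a minimal NumPy sketch of the prototype step. The class count, support-set size, and embedding dimension are illustrative, and the random arrays stand in for embeddings already produced by fφ:

```python
import numpy as np

# Illustrative episode: 3 classes, 5 support points per class, and
# 64-dimensional embeddings standing in for the output of f_phi.
num_classes, num_support, embed_dim = 3, 5, 64
support_embeddings = np.random.randn(num_classes, num_support, embed_dim)

# The prototype of each class is the mean of its support embeddings.
prototypes = support_embeddings.mean(axis=1)
print(prototypes.shape)  # (3, 64)
```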
- Similarly, we learn the query set embeddings.
- We calculate the Euclidean distance, d, between the query set embeddings and the class prototypes.
- We predict the probability, pφ(y = k|x), that a query point belongs to class k by applying softmax over the negative distance, d (again, a code sketch follows the equation):
$$p_{\phi}(y = k \mid x) = \frac{\exp(-d(f_{\phi}(x), c_k))}{\sum_{k'} \exp(-d(f_{\phi}(x), c_{k'}))}$$
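The following sketch turns this equation into code for a single query point. The function name and the random stand-ins for the embeddings are hypothetical, chosen only for illustration:

```python
import numpy as np

def class_probabilities(query_embedding, prototypes):
    """Softmax over negative Euclidean distances to each class prototype."""
    distances = np.linalg.norm(prototypes - query_embedding, axis=1)
    logits = -distances        # closer prototype -> larger logit
    logits -= logits.max()     # subtract the max for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

# Illustrative usage with random stand-ins for real embeddings.
prototypes = np.random.randn(3, 64)    # one prototype per class
query_embedding = np.random.randn(64)  # f_phi applied to a query point
probs = class_probabilities(query_embedding, prototypes)
print(probs, probs.sum())              # probabilities summing to 1
```

Subtracting the maximum logit before exponentiating does not change the resulting probabilities; it is a standard trick to avoid numerical overflow.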
- We compute the loss function, J(φ), as the negative log probability, J(φ) = -log pφ(y = k|x), and we try to minimize the loss using stochastic gradient descent.
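Putting the last two steps together, a minimal sketch of the loss for a single query point might look like this; the probability vector and true label are made up for illustration:

```python
import numpy as np

# Hypothetical output of class_probabilities for one query point,
# with class 0 as the (assumed) true label.
probs = np.array([0.7, 0.2, 0.1])
true_class = 0

# J(phi) = -log p_phi(y = k | x): small when the model assigns
# high probability to the true class.
loss = -np.log(probs[true_class])
print(round(loss, 4))  # 0.3567
```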