We invite the community to contribute to LEPISZCZE by submitting model results. You can either manually fill in your submissions or use the embeddings library for automatic generation.
- 1.A Manually Filled Submissions
- 1.B Example Submissions
- 1.C Automatically Generated Submissions
- 2. Submitting submission as PR
- 3. Adding new dataset to LEPISZCZE
Submissions must include the following information:
| Required Submission Keys | Description |
|---|---|
| submission_name | Name of the submission file, following the convention: {dataset_name}_{model_name}. For the model allegro/herbert-large-cased, replace / with __. Example: for the abusive_clauses dataset and allegro/herbert-large-cased model, the name will be abusive_clauses_allegro__herbert-large-cased. |
| dataset_name | HuggingFace repository name of the dataset. Example: laugustyniak/abusive-clauses. |
| dataset_version | Version of the dataset, e.g., 0.0.1. |
| embedding_name | HuggingFace repository name of the used embedding or model. Example: allegro/herbert-large-cased. |
| leaderboard_task_name | Choose a task name from LEPISZCZE’s leaderboard. Options: Abusive Clauses Detection, Aspect-based Sentiment Analysis, etc. |
| metrics | List of metrics as dictionaries. If no retraining was done, provide a single-element list. |
| metrics_avg | A dictionary of averaged metrics. If only one run was conducted, this matches the first element of the metrics sequence. |
| metrics_std | Mean standard deviation. For a single run, fill zeros for all provided metrics. |
| averaged_over | Number of runs performed. If only one evaluation was done, set to 1. |
There are also optional submission keys, but we strongly recommend including all information to improve reproducibility:
| Optional Submission Keys | Description |
|---|---|
| hparams | Mapping of hyperparameters with their values. |
| packages | Mapping of packages used for model training and evaluation, along with their versions. |
Submissions should be in .json format.
Question Answering sample submission file with packages provided.
{
"submission_name": "qa_all_Aleksandra__herbert-base-cased-finetuned-squad",
"dataset_name": "qa_all",
"dataset_version": "0.0.0",
"embedding_name": "Aleksandra/herbert-base-cased-finetuned-squad",
"hparams": {
"finetune_last_n_layers": 3,
"task_model_kwargs": {
"adam_epsilon": 1e-08,
"eval_batch_size": 32,
"learning_rate": 5e-06,
"optimizer": "AdamW",
"train_batch_size": 32,
"use_scheduler": false,
"warmup_steps": 100,
"weight_decay": 0.0001
},
"... (more packages)"
}
"packages": [
"absl-py==1.4.0",
"aiobotocore==2.5.0",
"aiohttp-retry==2.8.3",
"aiohttp==3.8.4",
"aioitertools==0.11.0",
"... (more packages)"
]
"leaderboard_task_name": "Question Answering",
"metrics": [
{
"HasAns_exact": 53.78787878787879,
"HasAns_f1": 69.3131673937266,
"NoAns_f1": 91.44496609285342,
"exact": 70.10169491525424,
"f1": 78.9011127284667
}
],
"metrics_avg": {
"HasAns_exact": 53.78787878787879,
"HasAns_f1": 69.3131673937266,
"NoAns_f1": 91.44496609285342,
"exact": 70.10169491525424,
"f1": 78.9011127284667
},
"metrics_std": {
"f1": 0,
"exact": 0,
"HasAns_f1": 0,
"HasAns_exact": 0,
"NoAns_f1": 0
},
"averaged_over": 1
}
- Install
embeddingspackage
pip install clarinpl-embeddings
- Put your data in accordance with comments
import datasets
import numpy as np
from embeddings.evaluator.evaluation_results import Predictions
from embeddings.evaluator.leaderboard import get_dataset_task
from embeddings.evaluator.submission import AveragedSubmission
from embeddings.utils.utils import get_installed_packages
DATASET_NAME = "clarin-pl/polemo2-official"
TARGET_COLUMN_NAME = "target"
hparams = {"hparam_name_1": 0.2, "hparam_name_2": 0.1} # put your hyperparameters here!
dataset = datasets.load_dataset(DATASET_NAME)
y_true = np.array(dataset["test"][TARGET_COLUMN_NAME])
# put your predictions from multiple runs below!
predictions = [
Predictions(
y_true=y_true, y_pred=np.random.randint(low=0, high=4, size=len(y_true))
)
for _ in range(5)
]
# make sure you are running on a training env or put exported packages below!
packages = get_installed_packages()
submission = AveragedSubmission.from_predictions(
submission_name="your_submission_name", # put your submission here!
dataset_name=DATASET_NAME,
dataset_version=dataset["train"].info.version.version_str,
embedding_name="your_embedding_model", # put your embedding name here!
predictions=predictions,
hparams=hparams,
packages=packages,
task=get_dataset_task(DATASET_NAME),
)
submission.save_json()
-
clone repository
git clone https://github.com/CLARIN-PL/embeddings.git cd embeddings -
checkout to new branch
git checkout -b submission/[your_submission_name] -
move or copy submissions in json format to directory
webpage/data/results -
commit and push
git add . git commit -m "submit results" git push -
create pull request on https://github.com/CLARIN-PL/embeddings/pulls
-
stay in touch with us in case of any problems
Example how to add dataset that is not currently in Leaderboard. To do that three files will be needed:
- webpage/content/tasks/task{taksk_abrevation}.md
e.g.: webpage/content/tasks/taskQA.md for Question Answering
| Keys to be filled | Example |
|---|---|
| [Task Name with CamelCase] | QuestionAnswering |
| [Task Name] | Question Answering |
| [Task Description] | Question Answering (QA) is the task of automatically returning an answer to a natural-language question, given a passage or knowledge source. |
Content to be filled replace squared [] braces :
---
url: "/tasks/[Task Name with CamelCase]}"
type: docs
geekdocNav: false
geekdocBreadcrumb: false
---
{{< pageHeader >}}
{{< info taskname="[Task Name]" taskdesc="[Task Description]" >}}
{{< averageResults tasktype="[Task Name]" >}}
{{< results type="[Task Name] >}}
Add new task name to the taskTypes collection of webpage/layouts/shortcodes/homepage.html
...
const taskTypes = [
"Punctuation Restoration",
"Paraphrase Classification",
"Political Advertising Detection",
"Sentiment Analysis",
"Part-of-speech Tagging",
"Named Entity Recognition",
"Q&A Classification",
"Entailment Classification",
"Aspect-based Sentiment Analysis",
"Abusive Clauses Detection",
"Dialogue Acts Classification",
"Question Answering"
];