Eval JavaScript - Search News

Developer-targeting campaign using malicious Next.js repositories

A developer-targeting campaign leveraged malicious Next.js repositories to trigger a covert RCE-to-C2 chain through standard ...

GitHub

Python Library for Evaluation

Evaluation allows us to assess how a given model is performing against a set of specific tasks. This is done by running a set of standardized benchmark tests against the model. Running evaluation ...

IEEE

CAST-Eval: A Domain-Specific Benchmark for Large Language Models in Civil Aviation Safety

Abstract: In this paper, we present CAST-Eval, a novel, comprehensive and domain-specific benchmark designed to assess the knowledge and reasoning capabilities of large language models (LLMs) in the ...

IEEE

DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

Abstract: Recently, DALL-E [45], a multimodal transformer language model, and its variants including diffusion models have shown high-quality text-to-image generation capabilities. However, despite ...

GitHub

isShayulajiao/CCL25-Eval-ZhengMing

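*Note: All task prompts were rigorously evaluated by hand to ensure that they suit different models. The prompt evaluation panel consisted of 8 graduate students and 2 ...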