Starcoder is a performant open-source model for copyright-compliant code


BigCode, a joint initiative of Hugging Face and ServiceNow, introduces Starcoder and StarcoderBase, two large open-source code language models. The researchers place special emphasis on transparent and copyright-compliant data selection.

The Starcoder models, each with 15.5 billion parameters, can generate code in 86 programming languages. The researchers used a technique called “multi-query attention,” in which all attention heads share a single set of keys and values instead of each head keeping its own. This cuts the memory needed during generation, so both models can handle larger amounts of code (8K-token context windows) and generate output faster and more efficiently.
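To make the memory argument concrete, here is a rough back-of-the-envelope sketch of the key/value cache a model has to hold while generating. The layer count, head count, head dimension, and 16-bit precision below are illustrative assumptions and are not taken from the article; only the 8K context length comes from the text above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative (assumed) model shape; not stated in the article.
N_LAYERS = 40
N_QUERY_HEADS = 48
HEAD_DIM = 128
SEQ_LEN = 8192  # the 8K context window

# Multi-head attention: every query head has its own key/value head.
multi_head = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Multi-query attention: all query heads share a single key/value head.
multi_query = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)

print(f"multi-head : {multi_head / 2**20:.0f} MiB")   # 7680 MiB
print(f"multi-query: {multi_query / 2**20:.0f} MiB")  # 160 MiB
print(f"reduction  : {multi_head // multi_query}x")   # 48x
```

Under these assumed numbers, the cache shrinks by a factor equal to the number of query heads, which is what makes long contexts cheaper to serve.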

According to participating researcher Loubna Ben Allal, the Starcoder models were trained on heavily curated data, which required considerable manual effort: “We manually inspected 50–100 files for all the extensions in the selected programming languages and choose adequate filters,” Ben Allal said.

The work seems to have paid off: in benchmarks, both models outperform every other open model that supports multiple programming languages, and even match or surpass OpenAI’s “code-cushman-001” model.


This adds Starcoder to the growing list of open-source AI models that can compete with proprietary commercial models, although Starcoder’s code performance may still lag behind GPT-4.

Starcoder team respects privacy and copyrights

Both models also aim to set a new standard in data governance. The team says it trained only on permissively licensed data from which personal information had been removed. It has also implemented an opt-out mechanism, along with a code snippet search engine that lets you check whether your code is included in the data used from The Stack dataset.

The team is releasing the Starcoder models under the Open Responsible AI Model license (OpenRAIL-M), which permits commercial use. The models are not instruction-tuned out of the box, but can be prompted to act as a technical assistant with some additional instructions. Further information and links can be found on the Starcoder page at Hugging Face.
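One way to read “additional instructions” is a dialogue-style prefix placed in front of the user’s question, so that a base model continues the text in the assistant’s voice. The sketch below only builds such a prompt string. The prefix wording, the speaker labels, and the helper name `build_assistant_prompt` are hypothetical and are not the prompt published by the Starcoder team.

```python
# Hypothetical prefix; the wording and speaker labels are assumptions.
SYSTEM_PREFIX = (
    "Below is a conversation between a human and a helpful technical assistant. "
    "The assistant answers programming questions clearly and concisely."
)


def build_assistant_prompt(question, history=()):
    """Wrap a question in a dialogue prefix for a model that is not instruction-tuned."""
    lines = [SYSTEM_PREFIX, ""]
    # Replay earlier turns so the model sees the conversation so far.
    for human_turn, assistant_turn in history:
        lines.append(f"Human: {human_turn}")
        lines.append(f"Assistant: {assistant_turn}")
    lines.append(f"Human: {question}")
    # End on the assistant label so the model's continuation is the answer.
    lines.append("Assistant:")
    return "\n".join(lines)


prompt = build_assistant_prompt("How do I reverse a list in Python?")
print(prompt)
```

The resulting string would be passed to the model as ordinary input text, and generation would typically be stopped once the model starts a new “Human:” turn.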

