Bloom, the open-source artificial intelligence: an alternative to OpenAI's ChatGPT and big tech

News | 18 January 2023

Pexels - Copyright: Abdul Kayul

Resulting from the shared work of hundreds of artificial intelligence researchers and experts, Bloom is the latest openly released AI model in the field of natural language processing (NLP). Through open science, it aims to counter the monopoly of multinational corporations in artificial intelligence.

Several language models already exist - such as OpenAI's ChatGPT, Meta's OPT or Google's Switch Transformers - but with Bloom we are witnessing a paradigm shift in the development of artificial intelligence algorithms for text production, with open science as its 'North Star'.

OpenAI's GPT, which has received considerable attention in the world's media in recent weeks, is a proprietary system developed primarily with funding from Microsoft. Bloom is comparable to GPT in the scale of its training, but it has been released as open-source software.

Knowledge sharing - which is part of Bloom's DNA - opens the door to a collaborative and participatory way of working in the field of AI, putting transparency of models, data, information and values first, with the goal of addressing even the most critical aspects of artificial intelligence systems, such as bias (outputs that reflect prejudice or unfair treatment) and ethics.

Open-source artificial intelligence: what is Bloom?

Acronym for 'BigScience Large Open-science Open-access Multilingual Language Model,' Bloom is a 176-billion-parameter large language model developed by BigScience - a project born out of a collaboration between Hugging Face, the Grand équipement national de calcul intensif (GENCI) and the Institut du développement et des ressources en informatique scientifique (IDRIS) - to create an open-source algorithm trained on a large multilingual text database.

The training of the model - carried out on the Jean Zay supercomputer just outside Paris - lasted more than three months. It was the culmination of work that more than 1,000 researchers from around the world, in collaboration with more than 250 institutions, had been carrying out for more than a year, starting in January 2021, as part of the BigScience Research Workshop.

During the work, both the model and the dataset were studied by considering various aspects - including bias, environmental and social impact, operational limitations, performance, ethics, etc. - with the aim of overcoming some of the critical issues that characterize Large Language Models (LLMs), i.e., artificial intelligence systems that produce text (text generation).

Bloom, in fact, is able to produce coherent text in 46 languages and 13 programming languages, but - unlike other LLMs - it does so transparently, following an approach devoted to open source and open science, far from the logic typical of big tech.

Artificial intelligence and open science: why does Bloom challenge big tech?

First of all, Bloom's code and model are publicly available, meaning that anyone can download them from the Hugging Face website, subject to the terms of Bloom's license. Moreover, all information about how the model works and the data used to train it is public, with details on critical aspects and recorded performance.
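As an illustration of what downloading the model from Hugging Face can look like in practice, here is a minimal sketch. It rests on assumptions that are not part of the article: it uses the Hugging Face `transformers` library with PyTorch installed, and it loads `bigscience/bloom-560m`, a much smaller checkpoint from the same family, because the full 176-billion-parameter model does not fit on ordinary hardware. The helper name `generate` is hypothetical.

```python
# Minimal sketch, not an official recipe. Assumes `transformers` and `torch`
# are installed and that the machine has network access on first use.
MODEL_ID = "bigscience/bloom-560m"  # assumption: small BLOOM-family checkpoint


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Download the model (on first call) and continue the given prompt."""
    # Imported lazily so that merely defining this function does not
    # require the library to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Convert the prompt to token IDs, generate a continuation, decode it.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Open science means")` would download the weights on first use and return the prompt followed by the model's continuation. The exact text depends on the model version and generation settings.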

Bloom's developers then published the BigScience Ethical Charter, which collects the values underlying the BigScience project, divided into two categories:

  • Intrinsic values: inclusiveness, diversity, reproducibility, openness, accountability
  • Extrinsic values: accessibility, transparency, interdisciplinarity, multilingualism

In addition to this code of ethics, Bloom's license includes a list of restrictions on use, which sets out a number of prohibited applications of the LLM: for example, it cannot be used to provide medical advice or administer justice.

As Alberto Romero writes in Towards Data Science, "Bloom is the most important artificial intelligence model of the decade" because it changes the rules of the game in a field that is on the brink of radical change, putting transparency, knowledge sharing and ethics first, rather than profit and control of technologies.

In doing so, Bloom and BigScience chart a course toward a more democratic model of technology development rooted in open-source and open-science.

Finally, there are other open-source language models that can be used for application development without having to resort to OpenAI's paid services. Among them, those of EleutherAI are worth mentioning; they can also be found on the Hugging Face platform.

While Bloom requires very expensive hardware to run, these models are more affordable and, with additional training on the specific functions the application requires, often give results very close to those of the more expensive models.
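A rough calculation shows why a model of Bloom's size needs expensive hardware. The sketch below estimates the memory required just to hold the weights of a 176-billion-parameter model at different numeric precisions. The bytes-per-parameter figures are standard sizes for those formats, but the estimate is an assumption-laden lower bound: it ignores activations, caches and any other runtime overhead. The 6-billion-parameter model used for comparison is a hypothetical example of a smaller model, not one named in the article.

```python
# Back-of-the-envelope estimate: memory needed for model weights only.
BLOOM_PARAMS = 176_000_000_000  # parameter count given in the article
SMALL_PARAMS = 6_000_000_000    # hypothetical smaller model, for comparison

# Standard storage size of one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weights_gb(n_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    big = weights_gb(BLOOM_PARAMS, nbytes)
    small = weights_gb(SMALL_PARAMS, nbytes)
    print(f"{dtype:>8}: 176B model = {big:.0f} GB, 6B model = {small:.0f} GB")
```

At 16-bit precision, the weights alone come to about 352 GB for the large model, against about 12 GB for the hypothetical smaller one, which is within reach of a single high-end graphics card.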

The true democratization of artificial intelligence will be achieved when even individuals and the smallest organizations have access to hardware resources and AI models that can achieve results similar to those of systems that "run" on supercomputers.