What does the term ‘open source’ mean? Why should you care about it? And why does it matter with respect to artificial intelligence?
The term ‘open source software’ typically refers to computer software released under a license that allows anyone to use, examine, modify and distribute the program and its source code for any purpose. Open source software plays a significant role in our daily lives, often without us even realizing it. For instance, the Android operating system, which powers millions of smartphones worldwide, is built on an open source foundation. Similarly, the Firefox web browser, used by many for browsing the internet, is an open source project. When browsing websites, you might encounter WordPress, an open source content management system that powers a large portion of the internet’s sites. Even in the realm of entertainment, open source software like VLC media player is widely used for playing various video and audio formats. These are just a few examples of how open source software has become an integral part of our everyday technology experiences.
Open source principles for artificial intelligence follow a similar definition to that of open source software, with a few differences, as people hold differing views on what exactly ‘open source AI’ is. For context, the output of an AI system can typically be attributed to two main parts: (1) the (untrained) model architecture (implemented through code) and (2) the training data. Some argue that following the typical open source software guidelines is sufficient, meaning that only the code needs to be made publicly available. Others argue that both the code and the training data should be made available, as the same code trained on different data can produce radically different results. And some further argue that the model weights, the learnable parameters of a machine learning model (think of them as the product of training the code on the data, in other words, the ‘trained’ model), should also be made publicly available. While there are still disputes about what constitutes ‘open source AI,’ all sides of this debate generally share the same intent: to ensure the development of AI is transparent and accountable.
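To make these three pieces concrete, here is a minimal, purely illustrative sketch in Python using the open source PyTorch library (the toy architecture and the random stand-in data are my own assumptions, not any real system): publishing the code below corresponds to releasing the architecture, the `inputs` and `targets` tensors play the role of training data, and the saved `state_dict` file is what people mean by releasing ‘the weights.’

```python
import torch
import torch.nn as nn

# (1) The (untrained) model architecture, expressed as code.
# Releasing "the code" means publishing something like this.
model = nn.Sequential(
    nn.Linear(4, 8),   # parameters start out as random numbers
    nn.ReLU(),
    nn.Linear(8, 1),
)

# (2) The training data. Random stand-in data here, for illustration only;
# releasing (or withholding) real data is a separate decision from releasing the code.
inputs = torch.randn(100, 4)
targets = torch.randn(100, 1)

# Training turns the random parameters into learned ones.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

# (3) The model weights: the learned parameters produced by training.
# Publishing this file is what "releasing the weights" refers to.
torch.save(model.state_dict(), "model_weights.pt")
```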
However, although the premise of open and collaborative AI development may seem enticing at first, some argue against it, citing the significant risks it poses that are currently difficult to mitigate. Namely, those against open source assert that unsecured AI systems can be easily manipulated by bad actors who can remove safety features and use these powerful tools for malicious purposes. This includes generating and spreading large volumes of misleading or harmful content, creating non-consensual deep-fake pornography and potentially even facilitating the production of dangerous materials like chemical and biological weapons. The uncontrolled distribution of AI-generated content through channels like social media could have severe consequences for the integrity of information ecosystems and democratic processes.
It is important to note that these concerns are not completely misguided; open source AI could pose real risks. But these concerns should not diminish the importance of transparency in developing such technologies. Responsible development practices, like strong safety measures and ethical guidelines, can help manage these risks. Importantly, misuse can occur with both open and closed-source AI (see UPenn’s paper Jailbreaking Black Box Large Language Models in Twenty Queries, published Oct. 2023, for reference). In fact, the opacity of closed-source systems makes it harder to prevent malicious activities. Ultimately, the benefits of open source AI (accelerated innovation, reduced bias and decentralization of power) outweigh the risks, which can be managed through proactive measures and ongoing collaboration between developers, researchers, policymakers and society at large.
Open source as an enabler and driver of progress
The history of artificial intelligence is the history of a research community that freely shares and builds upon each other’s discoveries. OpenAI’s ChatGPT would not be possible without Google Research’s 2017 paper, “Attention is All You Need”, which detailed the Transformer architecture (the ‘T’ in GPT). HuggingFace, an open source AI repository, allows users to download pre-trained AI models for free and has influenced an array of disciplines. For instance, HuggingFace has helped researchers study linguistic phenomena in different languages, mine medical literature, predict protein structures, and even analyze the impact of climate change. Even more fundamentally, code libraries like Google’s TensorFlow and Meta’s PyTorch allow researchers to focus on implementing the AI models themselves rather than the low-level infrastructure needed to interface with hardware. The benefits of open source extend past the research sector: an analysis from Harvard Business School found that without open source, companies would likely have to spend 3.5 times more on software, thus limiting the amount of money available for creating new software.
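As a small illustration of how low this barrier to entry is, here is a minimal sketch of reusing a freely shared pre-trained model from HuggingFace, assuming the open source `transformers` Python library is installed (the sentiment-analysis task is simply an example of my choosing, not one drawn from the studies above):

```python
from transformers import pipeline

# Downloads a publicly available pre-trained model the first time it runs,
# then performs inference locally; no proprietary access or fees are required.
classifier = pipeline("sentiment-analysis")
print(classifier("Open source lets researchers build on each other's work."))
# Prints a list like [{'label': 'POSITIVE', 'score': 0.99}] (exact values will vary).
```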
History has shown and continues to show that open source is a fundamental enabler of progress in artificial intelligence and beyond. By allowing researchers and developers to freely share, collaborate, and build upon each other’s work, open source accelerates innovation, reduces barriers to entry and democratizes access to cutting-edge technologies. Without the culture of openness and collaboration fostered by open source, the rapid advancements we have witnessed in AI and other fields would simply not be possible, as each individual or organization would be forced to start from scratch and work in isolation, greatly limiting the pace and scope of progress.
Open source as a reduction in bias and an increase in diversity
Completely eliminating AI bias is an extremely difficult task. If an AI system’s output is a product of its training data, does data that is truly unbiased even exist when humans themselves carry inherent biases that permeate the information we produce? Despite the immense challenge, open source AI provides several mechanisms to help address and reduce bias.
- Transparency: Making AI models open source so that the public can inspect the underlying code makes it easier to identify sources of bias in the training data or the model architecture itself. By contrast, we’ve seen closed-source AI built by certain corporations make racist inferences and even display historically inaccurate information, such as generating images of an Asian woman in a German World War II-era military uniform or a female Pope. Transparency would have helped alleviate all of these problems.
- Auditing: When AI is made open source, third parties can audit the system without requiring special access, authorization, or nondisclosure agreements (NDAs), unlike with closed-source AI. This makes the designers of these systems more accountable and helps ensure that biases are identified and fixed.
- Diverse Community Involvement: More generally, contributions to open source code frequently come from people with a diverse range of backgrounds. A more homogeneous group can fall into tunnel vision and fail to notice prejudices that a diverse set of participants, each offering a unique viewpoint, is more likely to catch and correct.
One might assert that open source is not necessary to achieve all these qualities. Proponents of closed-source AI could argue that private companies can still maintain transparency, allow for auditing, and foster diverse perspectives within their own teams. Even granting these points, however, a single closed-source AI model cannot meet the diverse needs and perspectives of our global society.
Yann LeCun, a Turing award winner, NYU Professor and Meta AI’s Chief Scientist, regards open source AI systems as a moral necessity. He argues that “in the future, our entire information diet is going to be mediated by [AI] systems. They will constitute basically the repository of all human knowledge. And you cannot have this kind of dependency on a proprietary, closed system.”
“People will only do this if they can contribute to a widely available open platform. They’re not going to do this for a proprietary system. So the future has to be open source, if nothing else, for reasons of cultural diversity, democracy, diversity. We need a diverse AI assistant for the same reason we need a diverse press.”
Open source as a decentralization of power
1984, Blade Runner, The Matrix. What do these dystopian movies have in common? The presence of a centralized power structure that exerts control over society through advanced technology. This could be a mega-corporation, authoritarian government, or other dominant entity that wields technology as a tool of oppression and manipulation. The centralized power often uses technology to surveil citizens, limit freedoms, and maintain a hierarchy that benefits the elite at the expense of the masses. There is usually a stark divide between those who control the technology and those who are controlled by it.
Open source AI has the potential to disrupt this dystopian paradigm by decentralizing power and putting advanced technology into the hands of the people. With open source, anyone can access, inspect, and modify cutting-edge AI systems. This transparency and accessibility make it much harder for dictatorial governments or large corporations to use AI as a tool of control and oppression. If an open source AI system has backdoors for surveillance or biases that discriminate, the community can identify and fix these issues. Furthermore, communities can create AI systems that preserve their wants, needs, and culture, resisting the homogenizing forces of big tech. In an open source world, the power of AI is not concentrated in the hands of a few but distributed among the many. A multitude of AI systems reflecting different values and interests can act as a check against any single entity wielding AI for dystopian control. Open source is a powerful tool to keep AI accountable to the people and aligned with an open, free, and pluralistic society.
The internet itself is a great example of the power of open and decentralized technologies. No single government or corporation controls the internet. It is a globally distributed network that has empowered billions of people to connect, create, and access information freely. While not perfect, the internet’s decentralized architecture has resisted attempts at centralized control and censorship.
Whether you agree or disagree with my arguments, I think it is imperative that we have this discussion now. Even if you do not consider yourself to be someone ‘interested’ in or ‘knowledgeable’ about computer science, your opinion matters. There is likely no one correct approach to AI development: as with many things, the answer is probably nuanced, lying somewhere on the spectrum between fully open and fully closed source AI. And we will not arrive at that answer if the discourse is dominated by one group of people. Politicians have begun discussing AI regulation, and large corporations and their CEOs are already lobbying for policies that would benefit themselves and harm the startup ecosystem. As members of the younger generation whom the trajectory of AI development will most impact, we must educate ourselves on these issues and make our voices heard. By participating in the conversation and advocating for AI development grounded in ethics and morality, we can help shape a future where this transformative technology benefits all of humanity.