Legal Alert

Don’t Be Lazy: Lessons in Licensing Large Language Models

by Jonathan P. Hummel and Jonathon A. Talcott

July 27, 2023

Summary

Large language models (LLMs) are the brains behind many generative artificial intelligence systems, like ChatGPT. Not all models are created equal, however, and not all models are available under the same terms.

The Upshot

Many LLMs are restricted by license to non-commercial uses, and users may be liable for impermissible use thereof.
Before incorporating an LLM into a product or service, users must understand the scope of permissible uses of the model.
Users should weigh technical specifications and legal risks before deciding on an LLM.

The Bottom Line

In selecting LLMs for various purposes, users must weigh the technical advantages and drawbacks of the different models (e.g., network architecture, weights and biases of algorithms, performance parameters, computing budget, and training model inputs) against the legal liabilities that may arise from using these LLMs. Critically, before investing too much time or resources into a product or service that makes use of an LLM, business leaders must review any licensing terms associated with the model in order to fully understand the scope of permissible use and take actions to ensure compliance and avoid liabilities.

Llama? Vicuña? Alpaca? You might be asking yourself, “what do these camelids have to do with licensing LLM artificial intelligence?” The answer is, “a lot.”

LLaMa, Vicuña, and Alpaca are the names of three recently developed large language models (LLMs). LLMs are a type of artificial intelligence (AI) that uses deep learning techniques and large data sets to understand, summarize, generate, and predict content (e.g., text). These and other LLMs are the brains behind the generative chatbots showing up in our daily lives, grabbing headlines, and sparking debate about generative artificial intelligence. The LLaMa model was developed by Meta (the parent company of Facebook). Vicuña is the result of a collaboration between UC Berkeley, Stanford University, UC San Diego, and Carnegie Mellon University. And Alpaca was developed by a team at Stanford. LLaMa was released in February 2023; Alpaca was released on March 13, 2023; and Vicuña was released about two weeks later, on March 30, 2023.

LLMs like these are powerful tools and present attractive opportunities for businesses and researchers alike. Potential applications of LLMs are virtually limitless, but typical examples are customer service interfaces, content generation (both literary and visual), content editing, and text summarization.

While powerful, these tools present risks. Different models have different technical strengths and weaknesses. For example, the team that developed Vicuña recognizes “it is not good at tasks involving reasoning or mathematics, and it may have limitations in accurately identifying itself or ensuring the factual accuracy of its outputs.” Thus, Vicuña might not be the best choice for a virtual math tutor. Moreover, in a general sense, the recurrent neural network (RNN), an architecture long used for language modeling, is well-suited for modeling sequential data but suffers from something called the “vanishing gradient problem” (i.e., as gradients are propagated back through many layers or time steps that use certain activation functions, the gradients of the loss function approach zero, making the network hard to train). Meanwhile, transformers (the “T” in GPT), the architecture behind most current LLMs, are great with long-range dependencies, which helps with translation-style tasks, but are limited in their ability to perform complex compositional reasoning.
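The vanishing gradient problem mentioned above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not a description of how any particular model is trained: it assumes a simplified chain of sigmoid units with a single weight per step, and the function names (gradient_scale and the helpers) are hypothetical, chosen only for this illustration. Each step multiplies the backpropagated gradient by the sigmoid's derivative, which is never larger than 0.25, so the product shrinks quickly as steps are added.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(num_steps: int, pre_activation: float = 0.0, weight: float = 1.0) -> float:
    """Factor by which a gradient is scaled after passing back through
    num_steps sigmoid units (layers, or time steps in a recurrent network).

    Simplifying assumption: every step sees the same pre-activation value
    and the same scalar weight.
    """
    scale = 1.0
    for _ in range(num_steps):
        scale *= abs(weight) * sigmoid_derivative(pre_activation)
    return scale


if __name__ == "__main__":
    # Even in the best case for the sigmoid (pre-activation of 0),
    # the gradient factor falls off geometrically with depth.
    for steps in (1, 5, 10, 20):
        print(f"{steps:>2} steps: gradient scaled by {gradient_scale(steps):.3e}")
```

Under these assumptions the scaling factor after ten steps is already below one in a million, which is why networks built this way can be hard to train on long sequences.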

Beyond understanding such technical differences, businesses must understand that using these tools may create legal liabilities. Decision makers must understand the differences in the terms of use (including licensing terms) under which various LLMs (and/or associated chatbots) are made available. For example, the terms of use of GPT-3 (by OpenAI), LaMDA (by Google), and LLaMa are all different. Some terms may overlap or be similar, but the organizations developing the models may have different objectives or motives and therefore may place different restrictions on the use of the models.

For example, Meta believes that “[b]y sharing the code for LLaMA, other researchers can more easily test new approaches to limiting or eliminating [] problems in large language models,” and thus Meta released LLaMa “under a noncommercial license focused on research use cases,” where “[a]ccess to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world.” Thus, generally speaking, LLaMa is available for non-commercial purposes (e.g., research). Similarly, Vicuña, which is a fine-tuned LLaMa model that was trained on approximately 70,000 user-shared conversations from ChatGPT, is also available for non-commercial uses. On the other hand, OpenAI’s GPT terms of service tell users “you can use Content (e.g., the inputs of users and outputs generated by the system) for any purpose, including commercial purposes such as sale or publication…” Meanwhile, the terms of use of Google’s Bard (which relies on the LaMDA model developed by Google), as laid out in the “Generative AI Additional Terms of Service,” make it plain that users “may not use the Services to develop machine learning models or related technology.” As is standard in the industry, any misuse of the service gives the LLM’s owner and operator the right to terminate the user’s use and likely exposes the user to civil liability under contract law and other related liabilities.

The waters are muddied further when these large corporations start lending and sharing availability of LLMs with each other. There are further indications that Meta is opening up access to its LLaMa model beyond the world of academia as reports surface about partnerships with Amazon and Microsoft. For example, Meta’s LLaMa large language model is now available to Microsoft Azure users.

Thus, in selecting LLMs for various purposes, users must weigh the technical advantages and drawbacks of the different models (e.g., network architecture, weights and biases of algorithms, performance parameters, computing budget, and the actual data on which the model was trained) against the legal liabilities that may arise from using these LLMs. Critically, before investing too much time or resources into a product or service that makes use of an LLM, business leaders must review the terms associated with the model in order to fully understand the scope of legally permissible use and take actions to ensure legal compliance with those terms so as to avoid liabilities.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, including electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author and publisher.

This alert is a periodic publication of Ballard Spahr LLP and is intended to notify recipients of new developments in the law. It should not be construed as legal advice or legal opinion on any specific facts or circumstances. The contents are intended for general informational purposes only, and you are urged to consult your own attorney concerning your situation and specific legal questions you have.

Jonathan P. Hummel

Associate

hummelj@ballardspahr.com
- Tel 678.420.9434
