Vicuna is an LLM. It is LLaMA fine-tuned on ShareGPT chats, so it is trained on OpenAI-generated data. The LLaMA license prohibits commercial use, and OpenAI's terms of use prohibit using OpenAI-generated output to create competing models. The ShareGPT data is self-disclosed.
Now, I want to use Vicuna to generate text, and then use that text to train a language model. Does this new model also have to be non-commercial?
In other words, the Vicuna weights are "tainted" by Meta's non-commercial license and OpenAI's terms of use. However, if I train my language model from scratch on open-source datasets and some Vicuna-generated text, will it also be "tainted"? Or will it not be, because its training data doesn't contain any original OpenAI data and its weights are not fine-tuned from LLaMA?
"2. (c) You may not (iii) use output from the Services to develop models that compete with OpenAI." So this clearly states that Vicuna itself must be non-commercial, right? I just want to know if they have the right to say this: is it a "law", or is it like saying something like... non-Americans can't look at the moon 'cause there's a flag there? – janekb04 Apr 26 '23 at 09:02