In LLM, in order to avoid discrimination and abuse, the author will add filter and watermark functionalities. Could you tell me in which process these functionalities take place? Is it in weights(pertaining), output layer(transformer), or fine-tuning?
Can I remove the filter by myself if I have an open-source AI like alpaca?