  #4  
Old 01-27-2025, 03:44 AM
Ekco
Planar Protector


Join Date: Jan 2023
Location: Felwithe
Posts: 5,220

no, you can do exactly that, and people sorta are, to a degree, but like

Quote:
Training: LLMs are trained on huge datasets from various sources (books, articles, code, websites).

Learning: They learn to predict the next word or code by identifying patterns.

Fine-tuning: They are further trained on specific tasks (like answering questions) to improve performance.

Alignment: Techniques like Reinforcement Learning from Human Feedback (RLHF) align the LLM's behavior with human values and preferences, making its responses more helpful and harmless. It involves:

Reward Model: training a model that predicts how humans would rate the LLM's responses.

Feedback Loop: the LLM generates responses, the reward model scores them, and the LLM is updated to prefer higher-scoring outputs.

Beyond Fine-tuning: Researchers are exploring other techniques like prompt engineering to influence the behavior of LLMs without directly modifying the model itself.
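the "learn to predict the next word" part quoted above is the whole trick. here's a toy sketch of it using plain bigram counts instead of a neural net (function names and the little corpus are just made up for illustration, real models learn way richer patterns than this):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word transitions: the crudest possible form of
    'learning to predict the next word by identifying patterns'."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for cur, nxt in zip(words, words[1:]):
            model[cur][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = model.get(word)
    return counts.most_common(1)[0][0] if counts else None

corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

an actual LLM does the same "what usually comes next" job, just with a transformer over billions of examples instead of a lookup table.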
most people are hacking on the models by doing the fine-tuning part, from my understanding, because the "training" step 1 costs like millions of dollars in GPUs and power bills, so people aren't fuckin around as much on that step. like that's the step where you have to explain to someone how your model is going to make money n shit lol.

but some of those training datasets are open-source and free, and you can fuck around with them
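and for the reward model / feedback loop step quoted above, here's the shape of it in a toy. real RLHF updates the model's weights with a policy-gradient method (e.g. PPO); this only sketches the scoring loop, with a dumb hand-written stand-in for both the LLM and the learned reward model:

```python
def reward(response):
    """Stand-in for a learned reward model: pretend human raters
    prefer short, definite answers."""
    score = -len(response)          # shorter is better
    if "certainly" in response:     # definite answers get a big bonus
        score += 100
    return score

# stand-in for the LLM generating candidate responses to one prompt
candidates = [
    "is water wet? hard to say, it depends on definitions, really",
    "is water wet? certainly",
    "is water wet? ask a philosopher",
]

# feedback loop: score every candidate, keep the one the reward model likes
best = max(candidates, key=reward)
print(best)  # "is water wet? certainly"
```

in real RLHF the chosen-vs-rejected signal then flows back into the model's weights instead of just picking the winner, but the score-and-compare loop is the same idea.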
Last edited by Ekco; 01-27-2025 at 03:49 AM..