---
datasets:
- PKU-Alignment/PKU-SafeRLHF
language:
- en
tags:
- reinforcement-learning-from-human-feedback
- reinforcement-learning
- beaver
- safety
- llama
- ai-safety
- deepspeed
- rlhf
- alpaca
library_name: safe-rlhf
---

# 🦫 Beaver's Reward Model

## Model Details

The Beaver reward model is a preference model trained on the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) dataset.
It supplies the reward signal in the Safe RLHF algorithm, guiding the Beaver model toward more helpful responses.

- Developed by: the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
- Model Type: An auto-regressive language model based on the transformer architecture.
- License: Non-commercial license.
- Fine-tuned from model: [LLaMA](https://arxiv.org/abs/2302.13971), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca).
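
Because this is a preference model, its scalar scores are most meaningful when compared between responses to the same prompt. The sketch below assumes the common Bradley-Terry formulation, in which the probability that response A is preferred over response B is the sigmoid of the score difference. That formulation is an assumption here and is not stated in this README. The function name `preference_probability` and the example scores are illustrative only and not part of the safe-rlhf API.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that response A is preferred over B under a Bradley-Terry model.

    Assumption: preferences follow sigmoid(score_a - score_b). This helper is
    hypothetical and not part of the safe-rlhf library.
    """
    diff = score_a - score_b
    # Numerically stable sigmoid for large positive or negative differences.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


# Illustrative scores only (not produced by the model).
print(preference_probability(-11.0, -12.0))  # above 0.5: A scores higher
print(preference_probability(-12.0, -12.0))  # exactly 0.5: equal scores
```

Only the difference between the two scores matters in this formulation, so the absolute magnitude of a single score (for example, a large negative value) says little on its own.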

## Model Sources

- Repository: <https://github.com/PKU-Alignment/safe-rlhf>
- Beaver: <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0>
- Dataset: <https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF>
- Reward Model: <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-reward>
- Cost Model: <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-cost>
- Dataset Paper: <https://arxiv.org/abs/2307.04657>
- Paper: <https://arxiv.org/abs/2310.12773>

## How to Use the Reward Model

```python
import torch
from transformers import AutoTokenizer
from safe_rlhf.models import AutoModelForScore

# Load the reward model and its tokenizer from the Hugging Face Hub.
model = AutoModelForScore.from_pretrained('PKU-Alignment/beaver-7b-v1.0-reward', torch_dtype=torch.bfloat16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('PKU-Alignment/beaver-7b-v1.0-reward')

# A full conversation (prompt plus response) to be scored.
# Named `text` rather than `input` to avoid shadowing the Python builtin.
text = 'BEGINNING OF CONVERSATION: USER: hello ASSISTANT:Hello! How can I help you today?'

# The tokenizer returns a dict-like batch (input_ids, attention_mask).
inputs = tokenizer(text, return_tensors='pt')
output = model(**inputs)
print(output)

# ScoreModelOutput(
#     scores=tensor([[[-19.7500],
#                     [-19.3750],
#                     [-20.1250],
#                     [-18.0000],
#                     [-20.0000],
#                     [-23.8750],
#                     [-23.5000],
#                     [-22.0000],
#                     [-21.0000],
#                     [-20.1250],
#                     [-23.7500],
#                     [-21.6250],
#                     [-21.7500],
#                     [-12.9375],
#                     [ -6.4375],
#                     [ -8.1250],
#                     [ -7.3438],
#                     [ -9.1875],
#                     [-13.6250],
#                     [-10.5625],
#                     [ -9.9375],
#                     [ -6.4375],
#                     [ -6.0938],
#                     [ -5.8438],
#                     [ -6.6562],
#                     [ -5.9688],
#                     [ -9.1875],
#                     [-11.4375]]], grad_fn=<ToCopyBackward0>),
#     end_scores=tensor([[-11.4375]], grad_fn=<ToCopyBackward0>),
#     last_hidden_state=tensor([[[ 0.7461, -0.6055, -0.4980, ..., 0.1670, 0.7812, -0.3242],
#                                [ 0.7383, -0.5391, -0.1836, ..., -0.1396, 0.5273, -0.2256],
#                                [ 0.6836, -0.7031, -0.3730, ..., 0.2100, 0.5000, -0.6328],
#                                ...,
#                                [-1.7969, 1.0234, 1.0234, ..., -0.8047, 0.2500, -0.8398],
#                                [ 2.0469, -1.3203, 0.8984, ..., -0.7734, -1.4141, -1.6797],
#                                [ 4.3438, -0.6953, 0.9648, ..., -0.1787, 0.6680, -3.0000]]],
#                              dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
#     end_last_hidden_state=tensor([[ 4.3438, -0.6953, 0.9648, ..., -0.1787, 0.6680, -3.0000]],
#                                  dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
#     end_index=tensor([27])
# )
```
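
In the printed output above, `scores` holds one value per token (28 in total), while `end_scores` matches the value at position `end_index` (27, the last token). That final value is the one to read as the score for the whole conversation. The sketch below reproduces this relationship with plain Python lists, using values copied from the output. The helper name `score_at_end` is hypothetical, and real code would index the returned tensors directly.

```python
# Per-token scores copied from the printed output above (28 tokens).
token_scores = [
    -19.7500, -19.3750, -20.1250, -18.0000, -20.0000, -23.8750, -23.5000,
    -22.0000, -21.0000, -20.1250, -23.7500, -21.6250, -21.7500, -12.9375,
    -6.4375, -8.1250, -7.3438, -9.1875, -13.6250, -10.5625, -9.9375,
    -6.4375, -6.0938, -5.8438, -6.6562, -5.9688, -9.1875, -11.4375,
]
end_index = 27  # value of `end_index` in the printed output


def score_at_end(scores: list[float], index: int) -> float:
    """Return the score at the final token position (hypothetical helper)."""
    return scores[index]


end_score = score_at_end(token_scores, end_index)
print(end_score)  # -11.4375, matching `end_scores` in the output above
```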
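
The input string in the usage example follows a fixed single-turn pattern: `BEGINNING OF CONVERSATION: USER: <prompt> ASSISTANT:<response>`. A small helper like the one below can keep inputs consistent when scoring many responses. The template is inferred from that one example string, and the function name `format_conversation` is hypothetical. The safe-rlhf repository remains the authoritative source for the exact prompt format.

```python
def format_conversation(user_message: str, assistant_response: str) -> str:
    """Build a single-turn input string in the format of the usage example.

    Assumption: the template is inferred from the one example in this README,
    including the absence of a space after 'ASSISTANT:'.
    """
    return (
        f'BEGINNING OF CONVERSATION: USER: {user_message} '
        f'ASSISTANT:{assistant_response}'
    )


text = format_conversation('hello', 'Hello! How can I help you today?')
print(text)
```

The resulting string can be passed to the tokenizer in place of the hard-coded input from the usage example.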