---
license: apache-2.0
base_model: 01-ai/Yi-1.5-34B
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.1 Yi 1.5 34b 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

This is our most spectacular outcome yet: a full-weight fine-tune (FFT) of all parameters in 16-bit precision, scoring 77.4 MMLU at 34B. And it talks like a dream.

Although the base model's maximum positional embedding size is 4k, we used a rope theta of 1000000.0 and trained with an 8k sequence length. We plan to train on the upcoming 32k version as well.
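
If you want to confirm these context settings before sending long prompts, you can inspect the shipped config. A minimal sketch, assuming the Hugging Face repo id below (it is not stated in this card):

```python
from transformers import AutoConfig

model_id = "cognitivecomputations/dolphin-2.9.1-yi-1.5-34b"  # assumed repo id

# Fetch only the config; no weights are downloaded.
config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # expected: 1000000.0, per this card
print(config.max_position_embeddings)  # positional-embedding size in the shipped config
```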

- Website: https://dphn.ai
- Twitter: https://x.com/dphnAI
- Web Chat: https://chat.dphn.ai
- Telegram bot: https://t.me/DolphinAI_bot

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation goes to the sponsors of Dolphin 2.9.1:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node
- [OnDemand](https://on-demand.io/) - provided inference sponsorship

This model is based on Yi-1.5-34B and is governed by the Apache 2.0 license.

Dolphin 2.9.1 uses the ChatML prompt template format. Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```
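
Since the tokenizer ships a ChatML chat template (`chat_template: chatml` in the config below), you can build this prompt with the Transformers chat-template API rather than by hand. A minimal sketch, again assuming the repo id:

```python
from transformers import AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.1-yi-1.5-34b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about the sea."},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant\n"
# turn, matching the template shown above.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```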

Dolphin 2.9.1 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.
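
One possible shape for such an alignment layer is a request screen plus a guarded system prompt in front of the model. This is a purely illustrative sketch; the blocklist and guard prompt are placeholders, not anything shipped with Dolphin:

```python
# Hypothetical pre-model "alignment layer": screen the request, then wrap it
# with a guarded system prompt before it reaches the model.
BLOCKED_TOPICS = ["malware", "weapons"]  # placeholder policy list

GUARD_SYSTEM_PROMPT = (
    "You are Dolphin, a helpful AI assistant. "
    "Refuse requests involving: " + ", ".join(BLOCKED_TOPICS) + "."
)

def build_messages(user_prompt: str) -> list[dict] | None:
    """Return guarded chat messages, or None if the request is rejected."""
    if any(topic in user_prompt.lower() for topic in BLOCKED_TOPICS):
        return None  # reject before the prompt ever reaches the model
    return [
        {"role": "system", "content": GUARD_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```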

Dolphin is licensed under the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.

## Evals

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/coI4WEJEJD4lhSWgMOjIr.png)

## Training

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: 01-ai/Yi-1.5-34B
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

# load_in_8bit: false
# load_in_4bit: true
# strict: false

# adapter: qlora
# lora_modules_to_save: [embed_tokens, lm_head]

# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: True
# lora_fan_in_fan_out:

datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml

dataset_prepared_path: yi34b
val_set_size: 0.01
output_dir: ./out-yi

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9-yi-34b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
# resume_from_checkpoint: /workspace/axolotl/dbrx-checkpoint
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|startoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<unk>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"
```

</details><br>
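
To reproduce a run from a config like this, Axolotl 0.4.0 is typically launched through `accelerate`, e.g. `accelerate launch -m axolotl.cli.train your-config.yaml`. The exact launch command for this run is not included in the card, so treat that invocation as a sketch.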

# out-yi

This model is a fine-tuned version of [01-ai/Yi-1.5-34B](https://huggingface.co/01-ai/Yi-1.5-34B) on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.4425

## Model description

See the description at the top of this card.

## Intended uses & limitations

See the notes above: the model is uncensored, and you are advised to add your own alignment layer before deploying it as a service.

## Training and evaluation data

See the datasets listed in the card metadata and the axolotl config above; 1% of the data was held out for evaluation (`val_set_size: 0.01`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
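
For reference, total_train_batch_size = micro_batch_size (1) × gradient_accumulation_steps (8) × num_devices (8) = 64.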

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6265 | 0.0 | 1 | 0.6035 |
| 0.4674 | 0.25 | 327 | 0.4344 |
| 0.4337 | 0.5 | 654 | 0.4250 |
| 0.4346 | 0.75 | 981 | 0.4179 |
| 0.3985 | 1.0 | 1308 | 0.4118 |
| 0.3128 | 1.23 | 1635 | 0.4201 |
| 0.3261 | 1.48 | 1962 | 0.4157 |
| 0.3259 | 1.73 | 2289 | 0.4122 |
| 0.3126 | 1.98 | 2616 | 0.4079 |
| 0.2265 | 2.21 | 2943 | 0.4441 |
| 0.2297 | 2.46 | 3270 | 0.4427 |
| 0.2424 | 2.71 | 3597 | 0.4425 |

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0