The LLM Foundry code repository includes a comprehensive collection of tools for LLM training, finetuning, evaluation, and deployment using Composer and the MosaicML platform. The codebase is intended to be user-friendly, efficient, and adaptable, allowing for easy experimentation with cutting-edge techniques.

In this article, we will walk through the LLM training code for MosaicML models.

MPT

MosaicML, an organisation focused on producing fast and scalable machine learning tools, has created the “MosaicML Pretrained Transformer” (MPT) language model. MPT models are similar to OpenAI’s GPT models, but with some major architectural and training differences.

MPT-7B, a model in the MosaicML Foundation Series, is a GPT-style language model trained on 1 trillion tokens from a MosaicML-curated dataset. It is open-source and available for commercial use, with evaluation metrics comparable to LLaMA-7B.

The MPT design uses recent LLM modelling techniques, such as FlashAttention for increased efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes.
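To make the ALiBi idea more concrete, here is a minimal pure-Python sketch of the bias it adds to attention scores. It follows the original ALiBi formulation for a power-of-two number of heads; the function names are illustrative, and this is not MPT's actual implementation, which may differ in detail.

```python
from typing import List


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8)."""
    # Assumption: n_heads is a power of two, the simple case in the original formulation.
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias_row(slope: float, query_pos: int) -> List[float]:
    """Bias added to the scores of one query attending to keys 0..query_pos."""
    # The penalty grows linearly with the distance between query and key.
    return [-slope * (query_pos - key_pos) for key_pos in range(query_pos + 1)]


slopes = alibi_slopes(8)
row = alibi_bias_row(slopes[0], query_pos=3)
```

Because the bias depends only on the distance between tokens, not on learned position embeddings, the same rule can be applied to sequences longer than those seen in training, which is what makes context length extrapolation possible.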

There are multiple variants of the MPT model, including a 64K context length fine-tuned model, that are available for use.

Model | Context Length | Commercial use
MPT-7B | 2048 | Yes
MPT-7B-Instruct | 2048 | Yes
MPT-7B-Chat | 2048 | No
MPT-7B-StoryWriter | 65536 | Yes

MPT model

MPT-7B is a general-purpose language model that was trained on a large text corpus. It is intended to produce high-quality content in a number of scenarios, including text completion, summarization, and translation.

MPT-7B-Instruct is a version of MPT-7B fine-tuned for short-form instruction following. It can generate step-by-step instructions for a variety of tasks, including food recipes, DIY projects, and technical guides.

MPT-7B-Chat is a conversational version of MPT-7B that produces realistic and engaging replies to user input. It has a wide range of applications, including chatbots, virtual assistants, and customer support.

MPT-7B-StoryWriter is a version of MPT-7B that has been optimised for producing creative writing such as short stories, poems, and scripts. It can be used as a tool for inspiration and idea generation by authors and other creatives.
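If you want to choose a variant programmatically, the variant table above can be encoded as plain data. This is only a sketch: `MPT_VARIANTS` and `pick_variants` are hypothetical names, and the values are copied from the table.

```python
from typing import List

# Values copied from the variant table; the structure itself is illustrative.
MPT_VARIANTS = {
    "MPT-7B": {"context_length": 2048, "commercial_use": True},
    "MPT-7B-Instruct": {"context_length": 2048, "commercial_use": True},
    "MPT-7B-Chat": {"context_length": 2048, "commercial_use": False},
    "MPT-7B-StoryWriter": {"context_length": 65536, "commercial_use": True},
}


def pick_variants(min_context: int, need_commercial: bool) -> List[str]:
    """Return the variants that meet a context-length and licensing requirement."""
    return [
        name
        for name, info in MPT_VARIANTS.items()
        if info["context_length"] >= min_context
        and (info["commercial_use"] or not need_commercial)
    ]


long_context = pick_variants(min_context=4096, need_commercial=True)
```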

Prerequisites

Here’s what you need to get started with our LLM stack:

Use a Docker image with PyTorch 1.13+, e.g. MosaicML’s PyTorch base image
Recommended tag: mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04
This image comes pre-configured with the following dependencies:
- PyTorch Version: 1.13.1
- CUDA Version: 11.7
- Python Version: 3.10
- Ubuntu Version: 20.04
- FlashAttention kernels from HazyResearch

Use a system with NVIDIA GPUs
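Before going further, it can help to check what the current machine offers. The sketch below uses only the Python standard library and simply looks for the `docker` and `nvidia-smi` executables on the PATH; the `check_prerequisites` helper is a hypothetical name, and finding `nvidia-smi` is only a rough hint that an NVIDIA driver is installed.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report basic facts about this machine; nothing is installed or changed."""
    return {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Presence of the executable is a rough proxy, not proof of a working setup.
        "docker_on_path": shutil.which("docker") is not None,
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


report = check_prerequisites()
for key, value in report.items():
    print(f"{key}: {value}")
```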

Installation

1. Open your terminal or command prompt and navigate to the directory where you want to clone the repository.

2. Run the following command to clone the repository:

git clone https://github.com/mosaicml/llm-foundry.git

3. Change your current working directory to the cloned repository:

cd llm-foundry

4. (Optional) It’s highly recommended to create and use a virtual environment to manage dependencies. Run the following command to create and activate a new virtual environment:

python -m venv llmfoundry-venv
source llmfoundry-venv/bin/activate

5. Install the required packages by running:

pip install -e ".[gpu]"

If you don’t have an NVIDIA GPU, you can instead run:

pip install -e .
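After installing, a quick way to confirm that the packages are visible to your interpreter is to ask Python whether it can locate them without importing them. This is a small sketch; the module names `llmfoundry`, `composer`, and `torch` are the import names I would expect from this installation, so treat them as assumptions and adjust if your environment differs.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if Python can locate the top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def installation_report(modules=("llmfoundry", "composer", "torch")) -> dict:
    """Map each expected module name to whether it can be found."""
    return {name: is_importable(name) for name in modules}


for name, found in installation_report().items():
    print(f"{name}: {'found' if found else 'missing'}")
```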

Start Running LLM

Here’s how to prepare a portion of the C4 dataset, train an MPT-125M model for 10 batches, convert the model to HuggingFace format, evaluate the model on the Winograd challenge, and generate replies to prompts.

You can upload your model to the Hub if you have a write-enabled HuggingFace auth token. Export your token as follows, and uncomment the line containing --hf_repo_for_upload … in the conversion command:

export HUGGING_FACE_HUB_TOKEN=your-auth-token

It’s important to remember that the code below is meant to be a quickstart to demonstrate the tools. To acquire high-quality results, the LLM must be trained for more than ten batches.
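A rough calculation shows why ten batches is only a smoke test. The numbers below are assumptions, not values stated in this article: a global batch size of 256 sequences, about 125 million parameters, and a commonly cited rule of thumb of roughly 20 training tokens per parameter. The 2048-token sequence length matches the value used in the data preparation step.

```python
# Assumed values for illustration; check your own training YAML for the real ones.
GLOBAL_BATCH_SIZE = 256      # sequences per batch (assumption)
SEQ_LEN = 2048               # tokens per sequence
N_PARAMS = 125_000_000       # approximate parameter count (assumption)
TOKENS_PER_PARAM = 20        # rule-of-thumb token budget per parameter (assumption)


def tokens_seen(n_batches: int, batch_size: int = GLOBAL_BATCH_SIZE, seq_len: int = SEQ_LEN) -> int:
    """Total tokens processed after n_batches optimisation steps."""
    return n_batches * batch_size * seq_len


quickstart_tokens = tokens_seen(10)
budget_tokens = TOKENS_PER_PARAM * N_PARAMS
fraction = quickstart_tokens / budget_tokens
print(f"{quickstart_tokens:,} tokens, about {fraction:.2%} of the rule-of-thumb budget")
```

Under these assumptions the quickstart sees only a small fraction of one percent of the data a full training run would use.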

Change the directory to scripts:

cd scripts

Convert the C4 dataset to StreamingDataset format by running the following command:

python data_prep/convert_dataset_hf.py \
  --dataset c4 --data_subset en \
  --out_root my-copy-c4 --splits train_small val_small \
  --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'
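The --concat_tokens and --eos_text options describe how documents are packed into fixed-length training samples: tokenized documents are joined, separated by an end-of-sequence token, and cut into chunks of the requested length. The following is a simplified sketch of that idea using toy integer token IDs; it is not the conversion script's actual code, which uses a real tokenizer and writes StreamingDataset shards.

```python
from typing import Iterable, Iterator, List


def concat_tokens(docs: Iterable[List[int]], eos_id: int, max_len: int) -> Iterator[List[int]]:
    """Yield fixed-length samples from documents joined by an EOS token."""
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)            # mark the document boundary
        while len(buffer) >= max_len:    # emit every full sample currently available
            yield buffer[:max_len]
            buffer = buffer[max_len:]
    # Leftover tokens that do not fill a whole sample are dropped in this sketch.


# Toy example: three short "documents", EOS id 0, samples of 4 tokens.
samples = list(concat_tokens([[1, 2, 3], [4, 5], [6, 7, 8, 9]], eos_id=0, max_len=4))
```

Note that a sample can span two documents, which is why the EOS marker matters: it tells the model where one document ends and the next begins.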

Train an MPT-125M model for 10 batches by running the following command:

composer train/train.py \
  train/yamls/mpt/125m.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  save_folder=mpt-125m
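The key=value arguments after the YAML path override entries in that file, with dots selecting nested keys, and `10ba` is a duration string meaning 10 batches. The sketch below imitates that merging with plain dictionaries; the real training script relies on a configuration library for this, and the starting values in `config` are placeholders, not the contents of the actual YAML.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply `a.b.c=value` strings to a nested dict, creating levels as needed."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay strings here; a real config library also parses types
    return config


# Placeholder base config standing in for the YAML file.
config = {"max_duration": "1ep", "train_loader": {"dataset": {"split": "train"}}}
apply_overrides(
    config,
    ["train_loader.dataset.split=train_small", "max_duration=10ba", "save_folder=mpt-125m"],
)
```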

Convert the model to HuggingFace format by running the following command:

python inference/convert_composer_to_hf.py \
  --composer_path mpt-125m/ep0-ba10-rank0.pt \
  --hf_output_path mpt-125m-hf \
  --output_precision bf16 \
  # --hf_repo_for_upload user-org/repo-name
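The checkpoint file name in the command encodes where training stopped: epoch 0, batch 10, written by rank 0. The helpers below are a sketch that assumes this `ep{epoch}-ba{batch}-rank{rank}.pt` pattern is in effect; if you configure a different save file name, the path passed to --composer_path changes accordingly.

```python
import re
from pathlib import PurePosixPath


def checkpoint_name(epoch: int, batch: int, rank: int = 0) -> str:
    """Build a checkpoint file name, assuming the ep-ba-rank naming pattern."""
    return f"ep{epoch}-ba{batch}-rank{rank}.pt"


def parse_checkpoint_name(name: str) -> dict:
    """Recover epoch, batch, and rank from a file name in that pattern."""
    match = re.fullmatch(r"ep(\d+)-ba(\d+)-rank(\d+)\.pt", name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    epoch, batch, rank = (int(group) for group in match.groups())
    return {"epoch": epoch, "batch": batch, "rank": rank}


# save_folder=mpt-125m and max_duration=10ba lead to this path.
composer_path = PurePosixPath("mpt-125m") / checkpoint_name(epoch=0, batch=10)
```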

Evaluate the model on Winograd by running the following command:

python eval/eval.py \
  eval/yamls/hf_eval.yaml \
  icl_tasks=eval/yamls/winograd.yaml \
  model_name_or_path=mpt-125m-hf
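One common way to score a Winograd-style item is to compare the log-likelihood the model assigns under each candidate resolution and pick the highest. The sketch below illustrates that idea with made-up log-probabilities; it is not the evaluation harness's actual code, and the helper names are hypothetical.

```python
from typing import Sequence


def pick_option(option_logprobs: Sequence[Sequence[float]]) -> int:
    """Return the index of the option whose tokens have the highest total log-probability."""
    totals = [sum(tokens) for tokens in option_logprobs]
    return max(range(len(totals)), key=totals.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of items where the predicted option matches the label."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)


# Made-up per-token log-probabilities for two options per item, plus the correct index.
items = [
    ([[-0.2, -0.5], [-1.3, -0.9]], 0),
    ([[-2.0, -1.1], [-0.4, -0.3]], 1),
    ([[-0.8, -0.7], [-0.6, -1.2]], 1),
]
predictions = [pick_option(logprobs) for logprobs, _ in items]
labels = [label for _, label in items]
score = accuracy(predictions, labels)
```

With a model trained for only ten batches, expect accuracy close to chance on a two-option task like this.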

Generate responses to prompts by running the following command:

python inference/hf_generate.py \
  --name_or_path mpt-125m-hf \
  --max_new_tokens 256 \
  --prompts \
    "The answer to life, the universe, and happiness is" \
    "Here's a quick recipe for baking chocolate chip cookies: Start by"

