dataset/README_MINIGPTv2_FINETUNE.md
## Download the dataset
| Image source | Download path |
|---|---|
| COCO 2014 images | <a href="http://images.cocodataset.org/zips/train2014.zip">images</a> <a href="https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json"> captions</a> |
| COCO VQA | <a href="https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_train.json">vqa train</a> <a href="https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_val.json"> vqa val</a> |
| Visual Genome | <a href="https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip">images part1</a> <a href="https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip">images part2</a> <a href="https://homes.cs.washington.edu/~ranjay/visualgenome/data/dataset/image_data.json.zip"> image meta data </a> |
| TextCaps | <a href="https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip">images</a> <a href="https://dl.fbaipublicfiles.com/textvqa/data/textcaps/TextCaps_0.1_train.json">annotations</a> |
| RefCOCO | <a href="https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip"> annotations </a> |
| RefCOCO+ | <a href="https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip"> annotations </a> |
| RefCOCOg | <a href="https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip"> annotations </a> |
| OKVQA | <a href="https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/okvqa/okvqa_train.json"> annotations </a> |
| AOK-VQA | <a href="https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/aokvqa/aokvqa_v1p0_train.json"> annotations </a> |
| OCR-VQA | <a href="https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing"> annotations </a> |
| GQA | <a href="https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip">images</a> <a href="https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/gqa/train_balanced_questions.json"> annotations </a> |
| Filtered flickr-30k | <a href="https://drive.google.com/drive/folders/19c_ggBI77AvdtYlPbuI0ZpnPz73T5teX?usp=sharing"> annotations </a> |
| Multi-task conversation | <a href="https://drive.google.com/file/d/11HHqB2c29hbSk-WLxdta-nG8UCUrcCN1/view?usp=sharing"> annotations </a> |
| Filtered unnatural instruction | <a href="https://drive.google.com/file/d/1lXNnBcb5WU-sc8Fe2T2N8J0NRw4sBLev/view?usp=sharing"> annotations </a> |
| LLaVA | <a href="https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/complex_reasoning_77k.json">Complex reasoning</a> <a href="https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/detail_23k.json">Detailed description</a> <a href="https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/conversation_58k.json">Conversation</a> |
### Download the COCO 2014 images and captions
COCO 2014 images path:

```
${MINIGPTv2_DATASET}
├── coco
│   ├── images
...
```
COCO caption annotation path:

```
${MINIGPTv2_DATASET}
├── coco_captions
│   └── annotations
│       ├── coco_karpathy_train.json
...
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` to the coco_karpathy_train.json path.
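For reference, a minimal shell sketch (our own, assuming `MINIGPTv2_DATASET` points at your dataset root) that produces the layout above using the URLs from the table:

```bash
# Sketch: download COCO 2014 train images and the Karpathy caption split.
# Assumes ${MINIGPTv2_DATASET} is set to your dataset root.
mkdir -p ${MINIGPTv2_DATASET}/coco/images ${MINIGPTv2_DATASET}/coco_captions/annotations
wget http://images.cocodataset.org/zips/train2014.zip
unzip train2014.zip -d ${MINIGPTv2_DATASET}/coco/images
wget -P ${MINIGPTv2_DATASET}/coco_captions/annotations \
    https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json
```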
### Download the VQAv2 train and validation json files
```
${MINIGPTv2_DATASET}
├── vqav2
│   ├── vqa_train.json
│   └── vqa_val.json
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` to the vqa_train.json and vqa_val.json paths.
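Both json files can be fetched directly, e.g. (a sketch, same URLs as the table):

```bash
# Sketch: download the VQAv2 train/val annotation files.
mkdir -p ${MINIGPTv2_DATASET}/vqav2
wget -P ${MINIGPTv2_DATASET}/vqav2 \
    https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_train.json \
    https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/vqav2/vqa_val.json
```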
### Download the Visual Genome images and annotation files
```
${MINIGPTv2_DATASET}
├── visual_genome
│   ├── VG_100K
│   ├── VG_100K_2
│   ├── region_descriptions.json
│   └── image_data.json
...
```
Set both `image_path` and `ann_path` to the visual_genome folder.
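A download sketch follows. Note that region_descriptions.json is not linked in the table above, so that step is an assumption; obtain it from the Visual Genome download page alongside image_data.json.

```bash
# Sketch: download both Visual Genome image parts and the image metadata.
mkdir -p ${MINIGPTv2_DATASET}/visual_genome
cd ${MINIGPTv2_DATASET}/visual_genome
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip      # -> VG_100K
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip     # -> VG_100K_2
wget https://homes.cs.washington.edu/~ranjay/visualgenome/data/dataset/image_data.json.zip
unzip images.zip && unzip images2.zip && unzip image_data.json.zip
# region_descriptions.json: download separately from the Visual Genome site.
```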
### Download the TextCaps images and annotation files
```
${MINIGPTv2_DATASET}
├── textcaps
│   ├── train_images
│   └── TextCaps_0.1_train.json
```
Set `image_path` to the TextCaps train_images folder, and set `ann_path` to the TextCaps_0.1_train.json path.
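A sketch, assuming the TextVQA image archive linked in the table unpacks into a train_images folder:

```bash
# Sketch: download the TextCaps (TextVQA) train images and annotations.
mkdir -p ${MINIGPTv2_DATASET}/textcaps
cd ${MINIGPTv2_DATASET}/textcaps
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip train_val_images.zip   # assumed to contain train_images/
wget https://dl.fbaipublicfiles.com/textvqa/data/textcaps/TextCaps_0.1_train.json
```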
### Download the RefCOCO, RefCOCO+, and RefCOCOg annotation files
```
${MINIGPTv2_DATASET}
├── refcoco_annotations
│   ├── refcoco
│   │   ├── instances.json
│   │   ├── refs(google).p
│   │   └── refs(unc).p
│   ├── refcoco+
│   │   ├── instances.json
│   │   └── refs(unc).p
│   └── refcocog
│       ├── instances.json
│       ├── refs(google).p
│       └── refs(umd).p
...
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` in all three configs to the refcoco_annotations folder above, which contains refcoco, refcoco+, and refcocog.
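A sketch that fetches and unpacks all three archives (assuming each zip extracts into a folder of the same name):

```bash
# Sketch: download and unpack the three referring-expression annotation archives.
mkdir -p ${MINIGPTv2_DATASET}/refcoco_annotations
cd ${MINIGPTv2_DATASET}/refcoco_annotations
for name in refcoco refcoco+ refcocog; do
    wget "https://bvisionweb1.cs.unc.edu/licheng/referit/data/${name}.zip"
    unzip "${name}.zip"   # assumed to extract into ${name}/
done
```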
### Download the OKVQA annotation files

```
${MINIGPTv2_DATASET}
├── okvqa
│   └── okvqa_train.json
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` to the okvqa_train.json location.
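A one-line fetch (same URL as the table):

```bash
# Sketch: download the OKVQA train annotations.
mkdir -p ${MINIGPTv2_DATASET}/okvqa
wget -P ${MINIGPTv2_DATASET}/okvqa \
    https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/okvqa/okvqa_train.json
```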
### Download the AOK-VQA annotation dataset
```bash
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```
```
${MINIGPTv2_DATASET}
├── aokvqa
│   └── aokvqa_v1p0_train.json
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` to the aokvqa_v1p0_train.json location.
### Download the OCR-VQA annotation files, then download the images with the loadDataset.py script
```
${MINIGPTv2_DATASET}
├── ocrvqa
│   ├── images
│   └── dataset.json
```
Set `image_path` to the ocrvqa/images folder, and set `ann_path` to the dataset.json path.
### Download the GQA annotation files and images
```
${MINIGPTv2_DATASET}
├── gqa
│   ├── images
│   └── train_balanced_questions.json
```
Set `image_path` to the gqa/images folder, and set `ann_path` to the train_balanced_questions.json path.
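A download sketch (URLs from the table; adjust the unzip target if the archive layout differs):

```bash
# Sketch: download GQA images and the balanced train questions.
mkdir -p ${MINIGPTv2_DATASET}/gqa
cd ${MINIGPTv2_DATASET}/gqa
wget https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
unzip images.zip   # assumed to extract into images/
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/gqa/train_balanced_questions.json
```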
### Download the filtered Flickr-30k images (fill out the form on the official website, or download from Kaggle) and annotation files
```
${MINIGPTv2_DATASET}
├── filtered_flickr
│   ├── images
│   ├── captiontobbox.json
│   ├── groundedcaption.json
│   └── phrasetobbox.json
...
```
Set `image_path` to the Flickr-30k images folder, and set `ann_path` to groundedcaption.json, captiontobbox.json, and phrasetobbox.json for the grounded image caption, caption-to-bbox, and phrase-to-bbox datasets, respectively.
### Download the multi-task conversation dataset
```
${MINIGPTv2_DATASET}
├── multitask_conversation
│   └── multitask_conversation.json
...
```
Set `image_path` to the COCO 2014 images folder, and set `ann_path` to the multitask_conversation.json file path.
### Download the filtered unnatural instruction annotation files (we removed the very long sentences from the original Unnatural Instructions dataset)
```
${MINIGPTv2_DATASET}
├── unnatural_instructions
│   └── filtered_unnatural_instruction.json
```
There is no image path for this dataset. Set `ann_path` to the filtered_unnatural_instruction.json file path.
### Download the LLaVA annotation files

```
${MINIGPTv2_DATASET}
├── llava
│   ├── conversation_58k.json
│   ├── detail_23k.json
│   └── complex_reasoning_77k.json
```
Set `image_path` to the COCO 2014 image folder, and set `ann_path` to the downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
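Once everything is downloaded, a quick sanity check such as the sketch below (our own addition, not part of the training pipeline) can confirm that the annotation files sit where the configs expect them:

```bash
# Verify that every annotation file referenced above exists under ${MINIGPTv2_DATASET}.
for f in \
    coco_captions/annotations/coco_karpathy_train.json \
    vqav2/vqa_train.json vqav2/vqa_val.json \
    visual_genome/image_data.json visual_genome/region_descriptions.json \
    textcaps/TextCaps_0.1_train.json \
    refcoco_annotations/refcoco/instances.json \
    refcoco_annotations/refcoco+/instances.json \
    refcoco_annotations/refcocog/instances.json \
    okvqa/okvqa_train.json \
    aokvqa/aokvqa_v1p0_train.json \
    ocrvqa/dataset.json \
    gqa/train_balanced_questions.json \
    filtered_flickr/captiontobbox.json filtered_flickr/groundedcaption.json filtered_flickr/phrasetobbox.json \
    multitask_conversation/multitask_conversation.json \
    unnatural_instructions/filtered_unnatural_instruction.json \
    llava/conversation_58k.json llava/detail_23k.json llava/complex_reasoning_77k.json
do
    [ -f "${MINIGPTv2_DATASET}/${f}" ] || echo "missing: ${f}"
done
```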