Dataset Card for "tv_dialogue"

This dataset contains transcripts for famous movies and TV shows from multiple sources.

An example dialogue would be:

[PERSON 1] Hello
[PERSON 2] Hello Person 2!
How's it going?

(they are both talking)

[PERSON 1] I like being an example
on Huggingface!

They are examples on Huggingface.
CUT OUT TO ANOTHER SCENE

We are somewhere else
[PERSON 1 (v.o)] I wonder where we are?

All dialogues were processed to follow this format. Each row is a single episode or movie (2781 rows total), following the OpenAssistant format. The METADATA column contains additional information as a JSON string.
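
As a rough illustration of working with this format, the sketch below splits a transcript into speaker turns and scene descriptions and decodes a METADATA string. It is not part of the dataset tooling: `parse_dialogue`, `decode_metadata`, and `Turn` are hypothetical helper names. It also assumes that an untagged line directly after a tagged line continues that speaker's turn and that a blank line ends the current turn, which matches the example above but may not hold for every script.

```python
import json
import re
from typing import NamedTuple, Optional

# Matches a speaker tag such as "[PERSON 1]" or "[PERSON 1 (v.o)]" at line start.
SPEAKER_TAG = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")


class Turn(NamedTuple):
    speaker: Optional[str]  # None marks a scene description
    text: str


def parse_dialogue(raw: str) -> list[Turn]:
    """Split a transcript into turns (assumed rules, see the note above)."""
    turns: list[Turn] = []
    current = None  # [speaker, list of text lines] for the open turn

    def close() -> None:
        nonlocal current
        if current is not None:
            text = " ".join(part for part in current[1] if part)
            turns.append(Turn(current[0], text))
            current = None

    for line in raw.splitlines():
        line = line.strip()
        match = SPEAKER_TAG.match(line)
        if match:
            close()  # a new speaker tag always starts a new turn
            current = [match["speaker"], [match["text"]]]
        elif not line:
            close()  # a blank line ends the open turn
        elif current is not None:
            current[1].append(line)  # continuation of the open turn
        else:
            current = [None, [line]]  # untagged text opens a scene description
    close()
    return turns


def decode_metadata(raw: str) -> dict:
    """Decode the METADATA JSON string; the keys it holds are not assumed."""
    return json.loads(raw)


SAMPLE = """\
[PERSON 1] Hello
[PERSON 2] Hello Person 2!
How's it going?

(they are both talking)

We are somewhere else
[PERSON 1 (v.o)] I wonder where we are?
"""

if __name__ == "__main__":
    for turn in parse_dialogue(SAMPLE):
        print(turn.speaker or "SCENE", "|", turn.text)
    # The key "show" is made up for illustration only.
    print(decode_metadata('{"show": "example"}'))
```

Scene descriptions come back with `speaker` set to `None`, so spoken lines can be selected with a simple filter such as `[t for t in turns if t.speaker]`.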

Dialogue only, with some information on the scene

| Show | Number of scripts | Via | Source |
| --- | --- | --- | --- |
| Friends | 236 episodes | https://github.com/emorynlp/character-mining | friends/emorynlp |
| The Office | 186 episodes | https://www.kaggle.com/datasets/nasirkhalid24/the-office-us-complete-dialoguetranscript | office/nasirkhalid24 |
| Marvel Cinematic Universe | 18 movies | https://www.kaggle.com/datasets/pdunton/marvel-cinematic-universe-dialogue | marvel/pdunton |
| Doctor Who | 306 episodes | https://www.kaggle.com/datasets/jeanmidev/doctor-who | drwho/jeanmidev |
| Star Trek | 708 episodes | http://www.chakoteya.net/StarTrek/index.html based on https://github.com/GJBroughton/Star_Trek_Scripts/ | statrek/chakoteya |

Actual transcripts with detailed information on the scenes

| Show | Number of scripts | Via | Source |
| --- | --- | --- | --- |
| Top Movies | 919 movies | https://imsdb.com/ | imsdb |
| Top Movies | 171 movies | https://www.dailyscript.com/ | dailyscript |
| Stargate SG-1 | 18 episodes | https://imsdb.com/ | imsdb |
| South Park | 129 episodes | https://imsdb.com/ | imsdb |
| Knight Rider | 80 episodes | http://www.knightriderarchives.com/ | knightriderarchives |