Simultaneous subtitles in two languages

[Simplified instructions appear at the end of the page]

From time to time I do something completely unrelated to astronomy, and this time I have created subtitles in srt format for a lot of movies. I wanted to show the original English subtitles and the Spanish ones at the same time, with the English text on the first line and the Spanish translation just below it. I have a home cinema (a real cinema screen 1.5 m wide and a projector for 2D/3D movies) and I wanted to practice and learn some English.

There are some options out there for this purpose. For instance, there seems to be some commercial software for Windows that can show two subtitle tracks simultaneously. A plugin for VLC also allows this, but it isn't very friendly or practical. In addition, these solutions show one language on one side of the screen and the other language on the opposite side, so that both tracks are impossible to read at the same time, especially when they are spread over several lines of text. I wanted to always have only two lines of text, so that most of the time the word you don't recognize is translated just below it into your preferred language.

So one weekend I decided to write a program that is comfortable to use (at least for me). It locates the English and Spanish subtitle tracks with mkvmerge, extracts them with mkvextract (they must be in srt format, which is the most common one), and combines them into a new output srt file. The program iterates over all movies in a given set of directories and can optionally skip all files already processed in a previous run, working incrementally. The output srt files are stored in a subdirectory named 'srt' inside each movie directory.
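The incremental behaviour comes down to computing where the output file would go and skipping the movie if that file already exists. A minimal sketch of that idea in plain Java follows (it needs Java 7 or later for java.nio.file). The class and method names (IncrementalSketch, outputFor, needsProcessing) and the example path are invented for this illustration and are not part of the program.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class IncrementalSketch {

    /** Maps dir/movie.mkv to dir/srt/movie.srt (hypothetical helper). */
    static Path outputFor(Path movie) {
        String name = movie.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Strip the extension, if any, and append ".srt"
        String base = dot > 0 ? name.substring(0, dot) : name;
        return movie.resolveSibling("srt").resolve(base + ".srt");
    }

    /** A movie is processed when overwriting is requested or no output exists yet. */
    static boolean needsProcessing(Path movie, boolean overwrite) {
        return overwrite || !Files.exists(outputFor(movie));
    }

    public static void main(String[] args) {
        Path movie = Paths.get("films", "Movie.2013.mkv"); // example path, not a real file
        System.out.println(outputFor(movie));
        System.out.println(needsProcessing(movie, false));
    }
}
```

Because only the existence of the output file is checked, deleting a single srt file is enough to have that movie processed again on the next run.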

Well, here is the code.

package research.other;
 
import java.io.File;
 
import javax.swing.JOptionPane;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
 
import jparsec.ephem.Functions;
import jparsec.graph.DataSet;
import jparsec.io.ApplicationLauncher;
import jparsec.io.FileIO;
import jparsec.io.ReadFile;
import jparsec.io.WriteFile;
import jparsec.time.DateTimeOps;
import jparsec.util.JPARSECException;
 
/**
 * To create subtitles files (srt format) with the fusion of 2 subtitle files with 2 languages,
 * so that they can be shown in any video player. The program uses mkvmerge to get automatically
 * the spanish/english track IDs for all files in a given directory, all of them processed
 * automatically.
 * @author T. Alonso Albi - OAN (Spain)
 */
public class SubtitleFusion {
 
	private static final boolean USE_GUI = false;
	private static String LANG_UP = "English", LANG_DOWN = "Spanish";
	private static String SUBTITLE_LANG_UP = LANG_UP.substring(0, 3).toLowerCase();
	private static String SUBTITLE_LANG_DOWN = LANG_DOWN.substring(0, 3).toLowerCase();
	private static String sep = FileIO.getLineSeparator();
	private static StringBuffer log = new StringBuffer("");
 
	  /**
	   * Test program.
	   * @param args Not used.
	   */
	  public static void main(String args[]) {
		  System.out.println("SubtitleFusion");
 
		  boolean overwriteOldSRT = false; // false => incremental work, ignores previously processed files
 
		  try {
			  String pathMovies[] = new String[] {
					  "/mnt/sdc3/",
					  "/mnt/sdc2/Pelis/",
					  "/mnt/sdc2/Pelis3D/",
					  "/mnt/sdc3/Juego de Tronos - Temporada 1/",
					  "/mnt/sdc3/Juego de Tronos - Temporada 2/",
					  "/mnt/sdc3/Juego de Tronos - Temporada 3/",
			  };
 
			  if (USE_GUI) {
				  String info = "There are two main steps to use this program (install mkvtoolnix first!):"+sep+"1. Select basic options (languages to extract)."+sep+"2. Select the directory with the movies."+sep;
				  info += "Output will be .srt files located in the subdirectory 'srt'"+sep+"inside the provided directory with the movies."+sep+"At the end a log will be shown with all problems found."+sep+sep;
				  info += "Hay 2 pasos principales para usar el programa (instala mkvtoolnix primero!):"+sep+"1. Elegir las opciones básicas (lenguajes a extraer)."+sep+"2. Elegir el directorio con las películas."+sep;
				  info += "La salida del programa serán ficheros de extension .srt en el"+sep+"subdirectorio 'srt' localizado en el directorio de las peliculas."+sep+"Al final se dará un registro con los problemas encontrados.";
				  JOptionPane.showMessageDialog(null, info, "Welcome/Bienvenido", JOptionPane.INFORMATION_MESSAGE);
				  String value = SUBTITLE_LANG_UP+","+SUBTITLE_LANG_DOWN+",no";
				  info = "Select the language identifier of the track to show up,"+sep+"the one to show below, and select if all movies"+sep+"should be processed (yes), or only the new ones (no)."+sep+sep;
				  info += "Selecciona el identificador de la pista de lenguaje a mostrar arriba,"+sep+"el otro a mostrar debajo, y elige si todas las pelis"+sep+"deben ser procesadas (sí), o sólo deben procesarse las nuevas (no)"+sep;
				  String out = JOptionPane.showInputDialog(null, info, value);
				  if (out == null) System.exit(0);
				  int n = FileIO.getNumberOfFields(out, ",", false);
				  if (n != 3) System.exit(0);
				  SUBTITLE_LANG_UP = FileIO.getField(1, out, ",", false);
				  SUBTITLE_LANG_DOWN = FileIO.getField(2, out, ",", false);
				  if (!SUBTITLE_LANG_UP.equals("eng")) LANG_UP = SUBTITLE_LANG_UP;
				  if (!SUBTITLE_LANG_DOWN.equals("spa")) LANG_DOWN = SUBTITLE_LANG_DOWN;
				  overwriteOldSRT = false;
				  if (!FileIO.getField(3, out, ",", false).equals("no")) overwriteOldSRT = true;
				  pathMovies = new String[] {FileIO.directoryChooser(null, true)};
			  }
 
			  for (int j=0; j<pathMovies.length; j++) {
				  String files[] = FileIO.getFiles(pathMovies[j]);
 
				  FileIO.createDirectory(pathMovies[j]+"srt"+FileIO.getFileSeparator());
				  for (int file=0;file<files.length;file++) {
					  if (files[file].endsWith(".srt") || files[file].endsWith(".zip") || 
							  files[file].endsWith(".txt") || FileIO.getFileNameFromPath(files[file]).startsWith(".")) continue;
 
					  String srt = pathMovies[j]+"srt"+FileIO.getFileSeparator()+FileIO.getFileNameFromPath(files[file]);
					  srt = srt.substring(0, srt.lastIndexOf("."));
					  srt += ".srt";
					  if (!overwriteOldSRT) {
						  File f = new File(srt);
						  if (f.exists()) continue;
					  }
 
					  report("Processing "+files[file]+" ("+(file+1)+"/"+files.length+")");
					  Process p = ApplicationLauncher.executeCommand(new String[] {"mkvmerge", "-I", files[file]});
					  String out = ApplicationLauncher.getConsoleOutputFromProcess(p);
					  if (out.indexOf("Error") >= 0) {
						  report(" Error: cannot obtain tracks info. Maybe a bad/strange file name");
						  continue;
					  }
 
					  String s[] = DataSet.toStringArray(out, sep);
					  int tracks[] = new int[] {-1, -1};
					  int nid = -1;
					  for (int i=0; i<s.length; i++) {
						  s[i] = s[i].toLowerCase();
						  if (s[i].indexOf("subtitle") >= 0) {
							  int isSRT = s[i].indexOf("s_text/utf8");
							  if (isSRT < 0) continue;
							  int n1 = -1, n2 = -1, n = s[i].indexOf("track_name");
							  if (n > 0) {
								  String tn = s[i].substring(n);
								  tn = tn.split(" ")[0]; // keep text up to the first space; also safe when no space follows
								  n1 = tn.indexOf("forced");
								  n2 = tn.indexOf("forzado");
							  }
							  if (n1 < 0 && n2 < 0) {
								  n1 = s[i].indexOf(SUBTITLE_LANG_UP);
								  n2 = s[i].indexOf(SUBTITLE_LANG_DOWN);		
								  nid ++;
								  int n3 = s[i].indexOf(":");
								  String id = s[i].substring(0, n3);
								  id = id.substring(id.lastIndexOf(" ")).trim();
								  int track = Integer.parseInt(id);
								  if (n1 < 0 && n2 < 0) {
									  if (nid < 2 && tracks[nid] < 0) tracks[nid] = track;
								  } else {
									  if (n1 > 0 && n2 < 0) {
										  tracks[1] = track;
									  } else {
										  if (n2 > 0 && n1 < 0) {
											  tracks[0] = track;
										  } else {
											  if (nid < 2 && tracks[nid] < 0) tracks[nid] = track;										  
										  }									  
									  }
								  }
							  }
						  }
					  }
 
					  if (tracks[0] >= 0 && tracks[1] >= 0) {
						  if (nid > 1) report(" Warning: found "+(nid+1)+" non forced subtitle tracks. Processing will continue");
						  report(" Extracting tracks: "+tracks[1]+" as "+LANG_UP+", "+tracks[0]+" as "+LANG_DOWN);
						  String c[] = new String[] {
								  "mkvextract", "tracks", files[file], tracks[0]+":"+pathMovies[j]+LANG_DOWN.toLowerCase()+".srt", tracks[1]+":"+pathMovies[j]+LANG_UP.toLowerCase()+".srt" 
						  };
						  Process p2 = ApplicationLauncher.executeCommand(c);
						  out = ApplicationLauncher.getConsoleOutputFromProcess(p2);
						  if (out.indexOf("Error") >= 0) {
							  report(" Error: cannot launch mkvextract");
							  continue;
						  }
 
						  String sub = createFusionedSubtitles(files[file]);
						  FileIO.deleteFile(pathMovies[j]+LANG_DOWN.toLowerCase()+".srt");
						  FileIO.deleteFile(pathMovies[j]+LANG_UP.toLowerCase()+".srt");
						  if (sub != null && !sub.trim().equals("")) {
							  WriteFile.writeAnyExternalFile(srt, sub, ReadFile.ENCODING_UTF_8);
						  } else {
							  report(" Warning: empty subtitles data, probably not SRT format. No file generated");					  						  
						  }
					  } else {
						  report(" Error: could not find both subtitles tracks");					  
					  }
				  }
			  }
 
			  if (USE_GUI)
				  JOptionPane.showMessageDialog(null, new JScrollPane(new JTextArea(log.toString(), 20, 40)), "Log/Registro", JOptionPane.INFORMATION_MESSAGE);
		  } catch (Exception e) {
			  e.printStackTrace();
			  JOptionPane.showMessageDialog(null, DataSet.toString(JPARSECException.toStringArray(e.getStackTrace()), sep), "Error", JOptionPane.ERROR_MESSAGE);
		  }
	  }
 
	  private static void report(String message) {
		  System.out.println(message);
		  if (USE_GUI) log.append(message + sep);
	  }
 
	  private static String createFusionedSubtitles(String file) throws JPARSECException {
		  String p = FileIO.getDirectoryFromPath(file);
		  String pathEng = p+LANG_UP.toLowerCase()+".srt";
		  String pathSpa = p+LANG_DOWN.toLowerCase()+".srt";
		  String charsetEng = ReadFile.ENCODING_UTF_8;
		  String charsetSpa = ReadFile.ENCODING_UTF_8;
		  //String path = FileIO.getDirectoryFromPath(pathEng) + "salida.srt";
		  //String charset = ReadFile.ENCODING_UTF_8;
		  String separator = "  |||  ";
		  int eliminateAllBefore = 7; // Skip all before x seconds
		  String eliminateAllContaining1 = "subtitles downloaded"; // lowercase, since it is compared against lowercased text
		  String eliminateAllContaining2 = "traducción de";
		  boolean debug = false;
		  double offset = 0; // seconds
 
		  String dataEng[] = DataSet.arrayListToStringArray(ReadFile.readAnyExternalFile(pathEng, charsetEng));
		  String dataSpa[] = DataSet.arrayListToStringArray(ReadFile.readAnyExternalFile(pathSpa, charsetSpa));
		  StringBuffer out = new StringBuffer("");
		  boolean start = false;
		  double times[] = null;
		  String data[] = null;
		  int index = 0;
		  for (int i=0; i<dataEng.length; i++) {
			  if (debug) report(dataEng[i]);
			  if (dataEng[i].indexOf("-->") > 0) {
				  start = true;
				  times = getTimes(dataEng[i]);
				  data = new String[0];
				  continue;
			  }
			  if (start) {
				  if (dataEng[i].trim().equals("")) {
					  start = false;
					  if (data.length == 0) continue;
					  if (eliminateAllBefore <= 0 || (times[0] > eliminateAllBefore && times[1] > eliminateAllBefore)) {
						  if (eliminateAllContaining1 == null || data[0].toLowerCase().indexOf(eliminateAllContaining1) < 0) {
							  if (eliminateAllContaining2 == null || data[0].toLowerCase().indexOf(eliminateAllContaining2) < 0) {
								  index ++;
								  out.append(""+index+sep+formatTime(times, offset)+sep+
										  DataSet.toString(getFusionedSubtitle(data, times, dataSpa, separator, eliminateAllBefore, eliminateAllContaining1, eliminateAllContaining2, index), sep)+
										  sep+sep);
							  }								  
						  }
					  }
				  } else {
					  data = DataSet.addStringArray(data, new String[] {dataEng[i]});
				  }
			  }
		  }
		  String s = out.toString();
		  s = DataSet.replaceAll(s, "<i>", "", true);
		  s = DataSet.replaceAll(s, "</i>", "", true);
		  //WriteFile.writeAnyExternalFile(path, s, charset);
		  return s;
	  }
 
	  private static String[] getFusionedSubtitle(String out[], double timesEng[], String dataSpa[], 
			  String separator, int eliminateAllBefore, String eliminateAllContaining1, String eliminateAllContaining2, int index) {
		  boolean start = false;
		  double times[] = null;
		  String data[] = null;
		  double mean = (timesEng[0] + timesEng[1]) / 2.0, duration = timesEng[1] - timesEng[0];
		  double minDif = -1;
		  String sub[] = null;
		  for (int i=0; i<dataSpa.length; i++) {
			  if (dataSpa[i].indexOf("-->") > 0) {
				  start = true;
				  times = getTimes(dataSpa[i]);
				  data = new String[0];
				  continue;
			  }
			  if (start) {
				  if (dataSpa[i].trim().equals("")) {
					  start = false;
					  if (data.length == 0) continue;
					  if (eliminateAllBefore <= 0 || (times[0] > eliminateAllBefore && times[1] > eliminateAllBefore)) {
						  if (eliminateAllContaining1 == null || data[0].toLowerCase().indexOf(eliminateAllContaining1) < 0) {
							  if (eliminateAllContaining2 == null || data[0].toLowerCase().indexOf(eliminateAllContaining2) < 0) {
								  double m = (times[0] + times[1]) / 2.0; //, d = times[1] - times[0];
								  double dif = Math.abs(m-mean);
								  if (dif < minDif || minDif == -1) {
									  minDif = dif;
									  sub = data;
								  }
							  }								  
						  }
					  }
				  } else {
					  data = DataSet.addStringArray(data, new String[] {dataSpa[i]});
				  }
			  }
		  }
 
		  String line1 = DataSet.toString(out, " ");
		  out = new String[] {line1, "-"};
		  if (sub == null) { // || minDif > Math.max(duration, 5)) {
			  report("*** Cannot match subtitles for index "+index);
			  return out;
		  }
 
		  // Method 1: English on first line, Spanish below
		  String line2 = DataSet.toString(sub, " ");
		  out[1] = line2;
 
		  /*
		  // Method 2: English on the left, Spanish on the right (seems a bad idea, and is more difficult)
		  if (sub.length == 2 && out.length == 1) {
			  String f[] = DataSet.toStringArray(out[0], " ");
			  out = new String[2];
			  int n = f.length/2-1;
			  out[0] = DataSet.toString(DataSet.getSubArray(f, 0, n), " ");
			  out[1] = DataSet.toString(DataSet.getSubArray(f, n+1, f.length-1), " ");
		  }
 
		  if (sub.length <= out.length) {
			  for (int i=0; i<sub.length; i++) {
				  out[i] = out[i] + separator + sub[i];
			  }
		  } else {
			  String nout[] = new String[sub.length];
			  for (int i=0; i<sub.length; i++) {
				  if (i < out.length) {
					  nout[i] = out[i] + separator;
				  } else {
					  nout[i] = DataSet.repeatString(".", out[0].length()) + separator;
				  }
				  nout[i] += sub[i];
			  }
			  out = nout;
		  }
		  */
		  return out;
	  }
 
	  private static double[] getTimes(String line) {
		  line = DataSet.replaceAll(line, "-->", "", true).trim();
		  String f1 = FileIO.getField(1, line, " ", true);
		  String f2 = FileIO.getField(2, line, " ", true);
		  return new double[] {getTime(f1), getTime(f2)};
	  }
 
	  private static double getTime(String f) {
		  String f1 = FileIO.getField(1, f, ":", true);
		  String f2 = FileIO.getField(2, f, ":", true);
		  String f3 = FileIO.getField(3, f, ":", true);
		  f3 = DataSet.replaceAll(f3, ",", ".", true);
		  return Double.parseDouble(f1) * 3600 + Double.parseDouble(f2) * 60 + Double.parseDouble(f3);
	  }
 
	  private static String formatTime(double times[], double offset) {
		  double val1[] = getValues(times[0]+offset);
		  double val2[] = getValues(times[1]+offset);
		  String v1 = ""+DateTimeOps.twoDigits((int)val1[0])+":"+DateTimeOps.twoDigits((int)val1[1])+":"+Functions.formatValue(val1[2], 3, 2, true);
		  String v2 = ""+DateTimeOps.twoDigits((int)val2[0])+":"+DateTimeOps.twoDigits((int)val2[1])+":"+Functions.formatValue(val2[2], 3, 2, true);
		  v1 = DataSet.replaceAll(v1, ".", ",", true);
		  v2 = DataSet.replaceAll(v2, ".", ",", true);
		  return v1 + " --> "+v2;
	  }
 
	  private static double[] getValues(double v) {
		  int v1 = (int) (v / 3600.0);
		  double r = v - v1 * 3600.0;
		  int v2 = (int) (r / 60.0);
		  double v3 = r - v2 * 60.0;
		  return new double[] {v1, v2, v3};
	  }
}
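The core of the listing is the pairing step in getFusionedSubtitle: for each cue of the upper language it picks the cue of the lower language whose midpoint in time is closest. The sketch below isolates that step in plain Java without JPARSEC. The names CueMatchSketch, Cue, closest and fuse are invented for this illustration, and the sample cues are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class CueMatchSketch {

    /** One subtitle cue: start/end in seconds plus its text joined into one line. */
    static final class Cue {
        final double start, end;
        final String text;

        Cue(double start, double end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }

        double midpoint() {
            return (start + end) / 2.0;
        }
    }

    /** Returns the candidate whose midpoint is closest to the target's, or null if there is none. */
    static Cue closest(Cue target, List<Cue> candidates) {
        Cue best = null;
        double bestDiff = Double.MAX_VALUE;
        for (Cue c : candidates) {
            double diff = Math.abs(c.midpoint() - target.midpoint());
            if (diff < bestDiff) { // strict comparison: the first cue wins a tie
                bestDiff = diff;
                best = c;
            }
        }
        return best;
    }

    /** Upper language on the first line, matched lower language below, "-" when nothing matches. */
    static String fuse(Cue upper, List<Cue> lowerTrack) {
        Cue match = closest(upper, lowerTrack);
        return upper.text + "\n" + (match == null ? "-" : match.text);
    }

    public static void main(String[] args) {
        List<Cue> spanish = new ArrayList<>();
        spanish.add(new Cue(10.0, 12.0, "Hola"));
        spanish.add(new Cue(20.0, 23.0, "Hasta luego"));
        System.out.println(fuse(new Cue(20.2, 22.8, "Goodbye"), spanish));
    }
}
```

As in the listing, every upper cue scans the whole lower track, and a match is always returned however far away it is in time; the listing has a commented-out threshold (minDif > Math.max(duration, 5)) that would reject distant matches.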

As you can see, there is an option to use a simple graphical interface, although it offers no additional options. You can change some variables at the beginning to extract languages other than Spanish and English, but the main input is just the set of directories to process. Of course, you should install mkvtoolnix before using this program. The process is completely automatic, but it has some limitations to take into account:

  • mkvmerge cannot be called from the command line for movies whose file names contain problematic or strange characters. This could probably be fixed with some kind of encoding, but I haven't taken the time to do that. I simply rename those files and run the program again.
  • You could find a movie with several non-forced Spanish or English tracks. In that case the last one will be used. This could be wrong in some cases, although you should not normally have several non-forced tracks for a given language. By the way, the program ignores all forced tracks.
  • The subtitle tracks must be included within the movie file in srt format. This is usual in modern mkv files with high definition movies. Otherwise this program will be almost useless.
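The track rules in the list above (srt only, forced tracks ignored, last matching track wins) can be sketched in isolation as follows. This is an illustration, not the program's code: the sample lines are an assumption modelled on the substrings the listing searches for ("subtitle", "s_text/utf8", "track_name"), not captured mkvmerge output, and matching on "language:" plus the code is stricter than the listing, which looks for the three-letter code anywhere in the line. The names TrackLineSketch, subtitleTrackId and lastTrackFor are invented.

```java
import java.util.Locale;

final class TrackLineSketch {

    /** Returns the track ID if the line describes a non-forced srt subtitle track in lang, else -1. */
    static int subtitleTrackId(String rawLine, String lang) {
        String line = rawLine.toLowerCase(Locale.ROOT);
        if (!line.contains("subtitle") || !line.contains("s_text/utf8")) return -1;

        // Forced tracks are recognised by their track name, as in the listing
        int n = line.indexOf("track_name");
        if (n >= 0) {
            String name = line.substring(n).split(" ")[0];
            if (name.contains("forced") || name.contains("forzado")) return -1;
        }

        // Assumption: the language appears as "language:xxx"
        if (!line.contains("language:" + lang)) return -1;

        // The ID is the last word before the first colon, e.g. "track id 2"
        String head = line.substring(0, line.indexOf(':')).trim();
        return Integer.parseInt(head.substring(head.lastIndexOf(' ') + 1));
    }

    /** The last matching track wins, which is the behaviour described above. */
    static int lastTrackFor(String[] lines, String lang) {
        int found = -1;
        for (String l : lines) {
            int id = subtitleTrackId(l, lang);
            if (id >= 0) found = id;
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = { // hypothetical identification lines
            "Track ID 0: video (V_MPEG4/ISO/AVC) [language:eng]",
            "Track ID 2: subtitles (S_TEXT/UTF8) [language:eng track_name:English]",
            "Track ID 3: subtitles (S_TEXT/UTF8) [language:spa track_name:Forzados]",
            "Track ID 4: subtitles (S_TEXT/UTF8) [language:spa track_name:Completos]",
            "Track ID 5: subtitles (S_TEXT/UTF8) [language:spa]"
        };
        System.out.println(lastTrackFor(lines, "eng") + " " + lastTrackFor(lines, "spa"));
    }
}
```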

The program is written to take some 'intelligent' decisions by itself. For instance, the settings eliminateAllBefore, eliminateAllContaining1 and eliminateAllContaining2 (lines 188 to 190 of the listing) remove some spam from the srt source files, as well as all subtitles shown before a given number of seconds from the beginning of the movie. The program writes a log of the process to the console, or to a window if the graphical interface is enabled.
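Both filters depend on turning an srt time line into seconds and on inspecting the first text line of each cue. A minimal sketch in plain Java, without JPARSEC, could look like this. The names SrtTimeSketch, toSeconds, parseCue, format and keep are invented for the illustration; only the filtering rule itself is taken from the listing.

```java
import java.util.Locale;

final class SrtTimeSketch {

    /** Converts "HH:MM:SS,mmm" into seconds. */
    static double toSeconds(String stamp) {
        String[] f = stamp.trim().split(":");
        double seconds = Double.parseDouble(f[2].replace(',', '.'));
        return Integer.parseInt(f[0]) * 3600 + Integer.parseInt(f[1]) * 60 + seconds;
    }

    /** Parses "start --> end" into {start, end}, both in seconds. */
    static double[] parseCue(String line) {
        String[] parts = line.split("-->");
        return new double[] { toSeconds(parts[0]), toSeconds(parts[1]) };
    }

    /** Formats seconds back into "HH:MM:SS,mmm". */
    static String format(double seconds) {
        long ms = Math.round(seconds * 1000.0);
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d",
                ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
    }

    /** Keeps a cue only if it lies after the cutoff and its first line holds no spam marker. */
    static boolean keep(double[] times, String firstLine, int cutoffSeconds, String... spamMarkers) {
        if (cutoffSeconds > 0 && (times[0] <= cutoffSeconds || times[1] <= cutoffSeconds)) {
            return false;
        }
        String lower = firstLine.toLowerCase(Locale.ROOT);
        for (String marker : spamMarkers) {
            // Both sides are lowercased so the marker's case does not matter
            if (lower.contains(marker.toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] t = parseCue("00:00:05,000 --> 00:00:08,250");
        System.out.println(keep(t, "Hello", 7, "subtitles downloaded"));
        System.out.println(format(3723.042));
    }
}
```

Lowercasing both the text and the marker avoids a mismatch when a marker is written with capital letters.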

Using it I have automatically created the combined subtitles of more than 250 movies so far, and also the first three seasons of Game of Thrones.

Using the program without any knowledge of programming

If you would like to try this program and you know nothing about programming, here are the steps to follow:

  • Install mkvtoolnix and Java (at least Java 1.6).
  • Download this file, which is a .jar file for the Java program 'subtitle fusion'.
  • In Windows double-click on the file; in Linux/Mac execute the command 'java -jar subtitleFusion.jar'.

The provided program has no requirements apart from those mentioned in the first point, and it has the graphical input interface enabled (basic instructions are provided in the program). I cannot guarantee it will work for you, although it should run fine on Linux and Mac. It will probably run fine on Windows too, although I haven't tested it there.
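Since everything depends on mkvtoolnix being installed, it can help to check that first. The sketch below tries to start a command and reports whether it ran successfully. It assumes that mkvmerge and mkvextract accept --version and exit with status 0; the names ToolCheckSketch and runs are invented for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;

final class ToolCheckSketch {

    /** Returns true if the command can be started and exits with status 0. */
    static boolean runs(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the process cannot block on a full pipe
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // output is discarded
                }
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // typically: command not found
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("mkvmerge available: " + runs("mkvmerge", "--version"));
        System.out.println("mkvextract available: " + runs("mkvextract", "--version"));
    }
}
```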

Using the program without programming knowledge

If you want to use this program and do not know how to compile or run programs written in Java, these are the steps to follow:

  • Install mkvtoolnix and Java (at least Java 1.6).
  • Download this file, which is the .jar file of the Java program that fuses the subtitles.
  • In Windows double-click on the file; in Linux/Mac execute the command 'java -jar subtitleFusion.jar'.

The provided program has no requirements other than those mentioned in the first point, and the graphical interface is enabled to make it easier to use (the program provides the basic instructions needed). I cannot guarantee that it will work on your PC, although it should run without problems on any PC with Linux or Mac. It will probably work correctly on Windows too, although I have not tested it.

Possible problems you may run into:

  • File names must not contain strange characters such as accents. If they do, the program will probably be unable to process the file and will not generate the srt file. In that case, just rename the file and run the program again.
  • If a movie contains several non-forced subtitle tracks for the same language, the last one found will be used. This may produce a wrong file, although you should not have movies with several subtitle tracks for the same language. The program ignores all forced subtitle tracks.
  • If the subtitles are not in srt format, or are not found for both languages, the file is ignored. mkv files usually include subtitle tracks, especially when the movie is in high definition, but this is not always the case. If you do not have this kind of file you will not be able to take advantage of the program.

When the program finishes, a window shows a log of the process. Any problem that came up appears as a warning (Warning) or error (Error) message. If you have to rename some files, the log shows you which ones, and you can then generate the missing srt files without trouble, since by default the program runs in incremental mode, skipping (not overwriting) everything done previously.

blog/subtitles_eng_spa.txt · created: 2013/12/10 15:40 (Last modified 2018/11/21 11:20) by Tomás Alonso Albi
 