How to extract text from tex file

Taavi
Posts: 146
Joined: Fri 10 Mar 2006, 19:23
Location: Suomi, Finland

#1 Post by Taavi »

I'm trying to extract the text between the \begin{document} and \end{document} commands, without printing the commands themselves. For example, from this file:

Code:

\documentclass[a4paper,finnish]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\begin{document}
\subsection*{Nice things}
Sometihing very wise and beautiful things in here.
\end{document}
The part I want to extract is:

Code:

\subsection*{Nice things}
Sometihing very wise and beautiful things in here.
I managed to do it with:

Code:

sed -n '/\\subsection/,$p' "$file" | sed '$d'  >>  other.tex
For some reason it doesn't work with \begin{document} or \begin instead of \subsection. I've been googling and experimenting, and this was my best result. I know there are better ways to do this, so please tell me what they are.
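For reference, a sed range address can take both a start and an end pattern, which is one way to cut out the body. This is only a minimal sketch, assuming each marker starts its own line; `extract_body` is a made-up helper name, not something from this thread.

```shell
#!/bin/sh
# extract_body is a hypothetical helper name used only for this sketch.
# Assumption: \begin{document} and \end{document} each start their own line.
extract_body() {
    # First sed: print only the range from the \begin line to the \end line.
    # Second sed: delete the first and last line of that range (the markers).
    sed -n '/^\\begin{document}/,/^\\end{document}/p' "$1" | sed '1d;$d'
}
```

It could be called as `extract_body "$file" >> other.tex`. Empty lines inside the range are passed through unchanged, so paragraph breaks are kept.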
Bruce B

#2 Post by Bruce B »

Pack and upload the file.
Taavi

#3 Post by Taavi »

Thanks for your interest.

My problem is simply how to extract everything between the "\begin{document}" and "\end{document}" markers from a tex file. My tex files look like this:

\documentclass[a4paper,finnish]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{babel}
\begin{document}
\section*{Some header here}
Text here and more text and more.

With the empty line starts a new paragraph.
\subsection*{Subheader}
And text and text and text.

\end{document}

I want to extract the body (everything between \begin{document} and \end{document}) to another file. I did it with grep like this:

grep -v -e "[\]begin{document}" -e "[\]documentclass" -e "[\]usepackage" -e "[\]end{document}"

But it loses the paragraph breaks, which are marked with empty lines in tex files. So I tried sed, and the best result I got is the one in my first post. Sorry, this is hard for me to explain even in Finnish.
Taavi

#4 Post by Taavi »

OK, I got it working with grep. There was a typo in my script. The working solution is this:

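for file in *.tex
do
    # Drop the preamble lines and the two document markers; keep everything else, blank lines included.
    TEXT=$(grep -v -e "[\]begin{document}" -e "[\]documentclass" -e "[\]usepackage" -e "[\]end{document}" "$file")
    # The quotes around $TEXT keep the line breaks intact.
    echo "$TEXT" >> other.tex
done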

I had put doubled quotation marks by mistake (echo ""$TEXT""), and that messed up the paragraphs.
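The effect of the doubled quotes can be reproduced in isolation. In `""$TEXT""` each `""` is just an empty quoted string, so `$TEXT` itself ends up outside any quotes and the shell splits it on whitespace, newlines included, before echo sees it. A small sketch with invented sample text:

```shell
#!/bin/sh
# Sample text is invented for illustration: two paragraphs separated by a blank line.
TEXT=$(printf 'First paragraph.\n\nSecond paragraph.\n')

# ""$TEXT"" is: empty string + unquoted $TEXT + empty string.
# The unquoted expansion is word-split, so echo receives four separate words.
unquoted=$(echo ""$TEXT"")

# "$TEXT" is a single quoted word, so the newlines survive.
quoted=$(echo "$TEXT")

echo "$unquoted"   # prints: First paragraph. Second paragraph.
echo "$quoted"     # prints both paragraphs with the blank line between them
```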
Taavi

#5 Post by Taavi »

But if anybody knows a more elegant solution, please tell me.
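One possibly tidier variant, offered as a sketch rather than a definitive answer: let sed cut by position instead of listing every preamble command for grep. It assumes \begin{document} is not on the very first line of a file and that both markers start their own line; `collect_bodies` is a hypothetical name. It also skips the output file, because other.tex itself matches *.tex once it exists.

```shell
#!/bin/sh
# collect_bodies is a hypothetical helper: collect_bodies OUTFILE FILE...
collect_bodies() {
    out=$1
    shift
    for file in "$@"; do
        # Skip the output file in case the glob picked it up.
        [ "$file" = "$out" ] && continue
        # Delete line 1 through the \begin line, then the \end line through
        # end of file; whatever is left (blank lines included) is printed.
        sed -e '1,/^\\begin{document}/d' -e '/^\\end{document}/,$d' "$file"
    done >> "$out"
}
```

A call such as `collect_bodies other.tex *.tex` would append the body of every tex file in the current directory to other.tex.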