o00000000000000000000000000000000000000000o
........:klaatu.howto.info.phylez:.........
o00000000000000000000000000000000000000000o


 :----------------:
 :  The Preamble  :
 ;________________;
         |
         |
        \|/
         `

Hi there, neighbour! Perhaps you've heard of docbook? It's an xml schema
(whatever that means) that lets you write documents and then publish them
as really nice html ebooks, pdfs, rtf, plain text, and all kinds of other
formats. I think it's a great thing to learn because it enables you to
take desktop publishing to a whole new level and create really nice pdfs
with working tables of contents that are auto-generated, a pleasing
layout, and all that.

Anyway, this is a tutorial on how to start with some plain text (like this
file!) and end up with really beautiful html or pdf output.

Of course, I recommend Slackermedia; the do-it-yourself distro-from-text
that is Slackware plus a bunch of select multimedia tools that will make
content creation a breeze. If you haven't tried this, go to
slackermedia.info and kiss a year of your life good-bye. But it'll be a
really geeky and fun year and after that, you'll be able to do all kinds
of cool multimedia stuff and not give one red cent to The Man.


 ,----------------,
 :  The Secrets!  :
 ;________________;
         |
         |
        \|/
         `

Here's a general idea of how it'll go. We call this "the workflow" in The
Biz:

1. Write yer book. I suggest keeping each chapter in a separate file;
   01.txt 02.txt 03.txt and so on.

2. Use a script like txt2docbook to convert your plain text to docbook
   markup. Have no idea how to use a script? Don't worry, we're gonna go
   through all of this step-by-ever-loving-step in just a few moments.
   Read on, reader.

3. Proofread the markup to make sure it's really formatted the way you
   want it to be.
   Docbook is a complex system of markup a lot like HTML, only more so.
   Don't try to get too fancy on your first time out, tiger.

p--------------------------------------------------------------------------q
| Tip! and a Link! - if you're not familiar with docbook at all, you       |
| will want to download the free docbook crash course, available from      |
| http://www.slackermedia.info/downloads.html                              |
| If you are familiar with docbook but need a solid reference, be sure to  |
| check out http://www.docbook.org/tdg51/en/html/docbook.html              |
| (also available as documentation in some repos and port systems)         |
b--------------------------------------------------------------------------d

4. Write a Makefile to compile all your text together and generate html
   and pdf and whatever else. Hey! Calm down, we're gonna go over how to
   do that too.

5. ???

6. Profit!


       ,----------------------------,
 .-----:  Step-by-Ever-Lovin'-Step  :
 |     ;____________________________;
 |
 |
\|/
 `

OK, so the concept is above, now let's actually do a quick and frightfully
easy example. I am going to assume you know nothing about docbook, and
little about html, but are at least vaguely familiar with GNU or Unix.

We should start from the very beginning, meaning that happily, you'll only
have to do these first few steps once:


--> Setting Up Your Environment <--

First you need to install docbook. Now, saying "docbook" is basically just
like saying "html" or "css" in that it's not really an application you're
ever going to launch and use, but you still need the files that tell your
computer what to do with a docbook file.

What your *nix distro calls this will differ depending on what you run,
but you're looking specifically for the docbook schemas, written by Norm
Walsh I think.

__Slackware________

Slackware comes with docbook and lots of doctools included if you've done
a full install. The schemas are located in

   /usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd

That's the important thing you need to know.
That is your docbook path. I like to call it $DOCLOC as in "docbook
location" but you can think of it however you want to. Anyway, it's a good
thing to know.

__freeBSD______

On freeBSD the 4.5 docbook schemas are installable via

   /usr/ports/textproc/docbook-450

and your resultant path will be

   /usr/local/share/xml/docbook/4.5/docbookx.dtd

__deb and rpm_____

On other Linux distributions, like Fedora or Ubuntu, you can install a
couple of packages easily found by searching your distro's repository:

   sudo aptitude search docbook
   su -c 'yum search docbook'

You can scan through the list that returns and install what you like, but
be sure to install the docbook xml schemas; on *buntu for example:

   sudo aptitude install docbook-xml docbook-dsssl docbook-xsl

and your resultant path to docbook will be

   /usr/share/xml/docbook/schema/dtd/4.4/docbookx.dtd

Don't worry that it's docbook 4.4 and not 4.5; if you're good enough at
docbook to notice the difference then you're not reading this document :^)

__Source Code______

Of course you don't have to go to repos or ports for this; you could
always just download the schemas straight from the source: docbook.org.
You can do a dummy check to see what version they're on, but it will be
something like

   --> http://www.docbook.org/xml/4.5/docbook-xml-4.5.zip

That zip is a dreaded zipbomb (it unpacks straight into whatever directory
you're sitting in), so be careful or you'll get docbook files all over the
place. But you can make a directory such as
/usr/local/share/xml/docbook/4.5 and unzip the files there.


--> Installing the Right Tools <--

So now the computer knows, more or less, what "docbook" is, but now you
need some command line tools so you can do cool things with your docbook
files.

The applications you'll want to install are:

   txt2docbook   (tool to convert .txt to docbook)
   xmlto         (tool to convert xml to html and .fo)
   fop           (apache's tool to convert .fo to a dynamic .pdf)

For xmlto and fop, your repository or ports will have them for
installation; otherwise, install them from source.
To check if they're already installed, simply type either xmlto or fop at
a command line. You'll know if they're installed or not, trust me.

As for txt2docbook:

p--------------------------------------------------------------------------q
| Link -> txt2docbook can be found at http://txt2docbook.sourceforge.net   |
b--------------------------------------------------------------------------d

Here is the safest way to download this and unzip it and get it
configured. If you don't already have a bin directory in your home folder,
you probably should make one, because everyone else does:

   bash$ mkdir ~/bin

Now make a home for txt2docbook:

   bash$ mkdir ~/bin/txt2docbook
   bash$ cd ~/bin/txt2docbook
   bash$ wget http://cdnetworks-us-2.dl.sourceforge.net/project/txt2docbook/txt2docbook/0.9/txt2docbook-0.91.zip -O ./txt2docbook.zip
   bash$ unzip txt2docbook.zip

Now you have the script and it's unzipped in its own directory. To set it
up, you must edit output.pl in a text editor and comment out the
SYSTEMIDENTIFIER and IDENTIFIER lines. If you know what you're doing and
would prefer to USE those lines, you may, but I find it bothersome to have
that information in every file txt2docbook converts, so I comment those
lines out so it will look something like this:

   # !!!! IMPORTANT !!!!
   # You have to set a valid path to your docbook-DTD here.
   # If you omit XML system identifier, comment out both $SYSTEMIDENTIFIER and $IDENTIFIER.
   # $SYSTEMIDENTIFIER="docbook/docbookx.dtd";
   # $IDENTIFIER='';
   # !!!!!!!!!!!!!!!!!!!

And finally, of course, you must chmod the script so it is executable:

p-----------------------------------q
| bash$ chmod +x txt2docbook.pl     |
|                                   |
b-----------------------------------d

The script is now successfully configured.

Unfortunately I'm too much a perl noob to know how to get the script to
run outside of its own directory so in order to use it, you have to, I
think, be IN the folder where the script and all of its little mates are.
Basically, it's a path error (can't find output-module output.pl in
current path) so there's a variable I need to set somewhere, I just don't
know what it is. So, pushd over to the txt2docbook folder before you run
the command:

p--------------------------------------------------------------q
| bash$ pwd                                                    |
| /home/klaatu/myBook                                          |
| bash$ pushd ~/bin/txt2docbook                                |
| bash$ ./txt2docbook.pl ~/myBook/foo.txt > ~/myBook/foo.xml   |
| bash$ popd                                                   |
| bash$ pwd                                                    |
| /home/klaatu/myBook                                          |
b--------------------------------/fig. 2: perl magick in action!

But hey, it works. And now your toolkit is complete.

** So just in case you're new to pushd and popd, they're just quick ways
   to cd from one directory to another and then back again. They have
   nothing to do with perl or txt2docbook. I just want to show you that
   you can get over to the txt2docbook directory quickly and easily (by
   using pushd instead of cd), run the command or commands that you need
   to run, and then popd back to your book directory.


--> Writing Stuff <--

Now it's time to do what we're really here to do: write! A docbook file
looks something a little like this:

p----------------------------------------------------------------------q
|                                                                      |
| <chapter>                                                            |
| <title>The GNU General Public License</title>                        |
| <para>                                                               |
| The licenses for most software and other practical works are         |
| designed to take away your freedom to share and change the works.    |
| By contrast, the GNU General Public License is intended to           |
| guarantee your freedom to share and change all versions of a         |
| program--to make sure it remains free software for all its users.    |
| </para>                                                              |
| </chapter>                                                           |
|                                                                      |
b----------------------------------------/fig 3. an actual docbook file!

It's as simple as that, really. Of course, writing creatively with all
that markup is difficult for me, but if you can do that, go for it. The
best and quickest way to learn the basics is to start with the free ebook,
Docbook Crash Course by David Rugge, Mark Galassi, and Eric Bischoff.
You can download this fine work from my private stash at
http://www.slackermedia.info/downloads.html

Working your way through even just the first one or two chapters of that
book will be enough to get you started.

Now, if you're like me and can't write with all that markup, then just
write what you want in plain text.

p--------------------------------------------------------------------------q
| Tip! - There are lots of Free Software applications with which to write  |
| your book! For efficiency and geek cred, use vim or emacs.               |
| For simplicity and familiarity, use Kwrite or Kate.                      |
b--------------------------------------------------------------------------d

I find it best to keep all my chapters in separate files, in one
consolidated directory:

p--------------------------------q
| bash$ pwd                      |
| /home/klaatu/myBook            |
| bash$ ls                       |
| 01.txt                         |
| 02.txt                         |
| 03.txt                         |
b-------------/fig. 4: a glimpse into klaatu's writing folder


--> Converting Stuff <--

Spell check, proofread, make your changes. All the usual stuff when you're
writing. THEN you need to get this into DOCBOOK format. You could do this
by hand, maybe while proofreading it for that fourth time, but I find that
tedious (you can only insert so many tags before your hand cramps up).

So the easiest way to do this is to use that perl script we downloaded;
txt2docbook. The syntax is simple:

p---------------------------------------------------q
| bash$ ~/bin/txt2docbook/txt2docbook.pl foo.txt    |
|                                                   |
b------------------------/fig. 5: recycling figure 2!

This creates a new file called foo.xml, which contains all the docbook
tags it can figure out. (If the script complains that it can't find
output.pl, run it from its own directory and redirect the output, as in
fig. 2.)

Now if you have 30 chapters or so then you may want to write a quick loop
so this task is repeated automatically until done.
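
A quick loop like that might look something like the sketch below. It's
only a sketch, under a few assumptions: txt2docbook lives in
~/bin/txt2docbook, it has to be run from inside its own directory, and it
prints its docbook output to stdout (all as in fig. 2). The T2D_DIR
variable and the function names xml_name_for and convert_chapters are made
up for this example; txt2docbook itself knows nothing about them.

```shell
#!/bin/sh
# Assumption: this is where txt2docbook was unzipped (see fig. 2).
T2D_DIR="${T2D_DIR:-$HOME/bin/txt2docbook}"

# Hypothetical helper: turn "01.txt" into "01.xml".
xml_name_for() {
    printf '%s\n' "${1%.txt}.xml"
}

# Hypothetical helper: convert every .txt file in a book directory
# (default: the current directory) to a .xml file sitting next to it.
convert_chapters() {
    local book_dir chapter
    # Resolve to an absolute path, because we cd away before converting.
    book_dir="$(cd "${1:-.}" && pwd)" || return 1
    for chapter in "$book_dir"/*.txt; do
        [ -e "$chapter" ] || continue   # no .txt files at all
        # Run the script from inside its own directory so it can find
        # output.pl, and send its stdout to the matching .xml file.
        ( cd "$T2D_DIR" && ./txt2docbook.pl "$chapter" > "$(xml_name_for "$chapter")" ) || return 1
    done
}
```

To use it, paste the functions into a file, source that file, and run
something like convert_chapters ~/myBook. The loop stops at the first
chapter that fails to convert, so you know which file to go and look at.
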
I have separate tutorials on doing that kind of thing in BASH at
http://uberleethackerforce.deepgeek.us

You will probably have to edit these xml files; txt2docbook does great
with paragraph breaks, but knowing what you want to be a chapter versus a
section versus an article, etc, is really up to you. So, edit the tags by
hand now; this is fairly easy and is really about page layout more than
anything. Of course if you have no idea what this is going to look like
yet, you can skip the editing, convert it to html or pdf first, look at
the result, and then go back and edit the tags later.


--> Adding the Docbook Header <--

One thing txt2docbook did not add for you (because we commented it out) is
the docbook header. This is the function of the config.sh I have with my
sources; it permits the user to enter their docbook path (DOCLOC) and then
creates a simple docbook.header file with the docbook header in it along
with the appropriate path to the docbookx.dtd file.

If you are literally going to be working on just one system such that the
location of docbookx.dtd will never change, you could hardcode this into
the first xml file, or you can keep it modular and create a docbook.header
file. Point is, this declaration needs to happen at the very beginning of
your work:

p--------------------------------------------------------------q
| <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"   |
| "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">           |
b------------------------------------/fig 5. the docbook header!

No, I don't know what it means so don't ask. Go read the manual if you
really care.
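
For the curious, here is a rough idea of what a config.sh-style helper
could look like. This is not the actual config.sh that ships with my
sources, just a minimal sketch: the function name write_docbook_header is
invented for this example, the root element "book" is an assumption about
how your markup is organised, and the public identifier shown is the one
for docbook 4.5 (change V4.5 to V4.4 if that's the version your distro
installed).

```shell
#!/bin/sh
# Hypothetical helper: write a docbook.header file that points at your
# docbookx.dtd.
#   $1 = DOCLOC, the full path to docbookx.dtd
#   $2 = output file (defaults to docbook.header)
write_docbook_header() {
    local docloc out
    docloc="$1"
    out="${2:-docbook.header}"
    # Refuse to write a header that points at a file that isn't there.
    if [ ! -f "$docloc" ]; then
        echo "can't find docbookx.dtd at: $docloc" >&2
        return 1
    fi
    # Assumption: the whole book lives inside one <book> root element.
    cat > "$out" <<EOF
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"$docloc">
EOF
}
```

On Slackware you would call it as
write_docbook_header /usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd
and end up with a docbook.header in the current directory. Keep in mind
that the element named in the DOCTYPE has to match the outermost element
of your document, and that an xml file can only have one outermost
element, so if you name "book" here then everything you put after the
header needs to end up inside a single book element.
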


--> xmlto html and xmlto pdf <--

Now we're ready to take all that confusing xml and make it into HTML so we
can look at it in a web browser, and into a pdf so we can send it to all
of our friends and carry it around on our mobiles and tablets.

The application we use first is xmlto. First, get all the xml files into
one big document:

p---------------------------------------------q
| bash$ cat docbook.header *.xml > tmp.xml    |
|                                             |
b----------------------/fig 6. creating tmp.xml

And now create a directory for the html files, and process tmp.xml with
xmlto:

p---------------------------------------------q
| bash$ mkdir ./html                          |
| bash$ xmlto html tmp.xml -o ./html          |
|                                             |
b----------------------/fig 7. xmlto in action!

Obviously the syntax of xmlto is...

   xmlto    - the command
   html     - the type of output we want
   tmp.xml  - the file we want converted
   -o       - the flag to tell xmlto where to dump the output files

And now if you navigate into the html folder you'll find a BUNCH of html
files, and if you launch konqueror or some other web browser to that
folder, then you'll see it lookin' all pretty and really nicely laid out
and stuff.

For a pdf, the first and second steps are essentially the same; if you
already have a concatenated tmp.xml then you can skip that step, and the
second is similar:

p---------------------------------------------q
| bash$ mkdir ./pdf                           |
| bash$ xmlto fo tmp.xml -o ./pdf             |
|                                             |
b----------------/fig 8. xmlto in action again!

WTF is an fo file? It's XSL Formatting Objects, whatever that means, and
it's the intermediate step between raw unadulterated XML and a fancy
hot-link-clickable PDF. xmlto dumps out a tmp.fo in your ./pdf directory.

To get the tmp.fo into pdf, we use Apache's fop:

p-----------------------------------------------------q
| bash$ fop ./pdf/tmp.fo ./pdf/myBook_by_myName.pdf   |
|                                                     |
b-------------------------------------------/fig 9. fop

And now in your ./pdf directory you have a really really cool pdf with a
table of contents that is clickable, and text that can be copied and
pasted, and all that good stuff, just like the pros. Except, in our case,
we didn't have to sell our souls to the evil that is Ad0be :^)

So, that's it, you're done.

Oh, well, unless you want to take it to the next level. I mean, if you
think you can handle it. Well, take a moment, think it over, and if you
want this to be a really lean-and-mean docbook-wielding machine, gather
your party and venture forth:


--> The Makefile <--

This section is going to assume that you have compiled code from source
before. If you have never done this, you should go learn how to do that
and then return to this section. I think I have an episode of my podcast
"The Bad Applez" aka "GNU World Order" on the subject. So, do what you
have to do; learn it, install GCC or bin-utils or whatever it's called on
your distro. If you're using Slackware or freeBSD, you already have that
stuff installed.

So, Makefiles are basically little scripts for GNU Make. They have a
specific syntax, and are infinitely flexible, but we're going to keep it
simple here because, well, that's all we need, plus I'm a Makefile noob.

The Makefile syntax is:

   foobar -> colon -> prerequisites -> instruction set

...which then becomes executable by typing make foobar. Here foobar is the
target, the prerequisites are the files the target depends on, and each
line of the instruction set has to start with a tab character, not spaces.

So create a text file in your editor and call it Makefile (capitalization
counts) and try this:

p----------------------------------------------------q
| # Makefile by myName                               |
|                                                    |
| html: docbook.header *.xml                         |
|         cat docbook.header *.xml > tmp.xml         |
|         xmlto html tmp.xml -o ./html               |
|                                                    |
b---------------------/fig. 10 your very own Makefile!

So the line that starts with html is the target line, meaning that when
you type make html, GNU Make looks at the files listed after the colon; if
they are not present, it returns an error (that is, it gets borked).
Assuming everything's good, GNU Make continues and processes the next
line, which is the cat line that generates tmp.xml, and then the xmlto
command. Try it:

p-----------------------q
| bash$ make html       |
|                       |
b----------------/fig. 11

And watch in amazement as your html files are generated with that one
simple step. This is helpful largely because in real life you'll be making
your html files a lot, as you find little layout errors here, or you
update your book there, and so on.

You can do the same for your pdf generation:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| pdf: docbook.header *.xml                                   |
|         cat docbook.header *.xml > tmp.xml                  |
|         xmlto fo tmp.xml -o ./pdf                           |
|         fop ./pdf/tmp.fo ./pdf/myBook_by_myName.pdf         |
|                                                             |
b--------------------------------/fig. 12 More of the Makefile!

Same deal.

So, since you'll probably be running make a lot, the chances are great
that you'll generate lots of little tmp.xml files and html files and stuff
like that. It's quite helpful, and a GNU Make tradition, to be able to
clean all that cruft out. This way you can always get back to your base
state and feel confident that your make isn't failing because of some old
file lying around.

This is usually done with "make clean" but I also like to implement a
"make tidy", where "tidy" will remove the little intermediary files like
the tmp.fo and tmp.xml, and "clean" removes those PLUS the big main files
like the html and pdf files and my custom .header file. I don't know that
there is a canonical way to do this but here's what I do:

p-------------------------------------------------------------q
| # pretend like the rest of the Makefile is right here       |
|                                                             |
| tidy:                                                       |
|         -rm -f ./pdf/*.fo tmp.xml *.out                     |
|                                                             |
| clean:                                                      |
|         -rm -f ./pdf/*.fo ./pdf/*.pdf tmp.xml *.out         |
|         -rm -f *.header                                     |
|         -rm -f html/*.html                                  |
|                                                             |
b----------------------------------/fig. 14 Makefile additions!

To test this out, type make tidy and then do an ls on the directory;
you'll find that the intermediary tmp.xml is now gone, and if you have
done a make pdf then you'll notice that pdf/tmp.fo is also gone but that
your fancy lookin' pdf is still there.

Do a make clean and kiss that fancy pdf and all those beaoootiful html
files good-bye. But this is a good thing; because now you can go and make
changes to your source code (you know, change "teh" to "the" and "pwn" to
"own" and all those other typos you let slip by) and do a ./config.sh and
a make foobar again and voila, the workflow is reimplemented in a matter
of two commands.

Truly, truly brilliant.


,======!!! SPECIAL BONUS SECTION !!!=====,
\;:::................................:::;/
/:..The following section requires:....:;\
\;.....+.imagemagick..................::;/
<;.....+.pdftk.......................:::;>
/;:::;............................::::::;\
`========================================'


--> Front and Back Covers <--

There are lots of ways to do this. One guy I know uses, I think, xsltproc,
and there is probably even a way in docbook markup to make it happen -- so
my way is a bona fide hack. But it works, it works well, and so I use it.

So, you want a fancy cover on your book, right? Maybe you want a back
cover too. First of all, go into Inkscape and GIMP and Krita and all your
other fancy graphic design tools and make the covers. Save them,
preferably, as high quality .tif files.

Now use ImageMagick (a command line graphics app) to convert those into
pdf files:

p-----------------------------------------------q
| bash$ convert frontCover.tif frontCover.pdf   |
| bash$ convert backCover.tif backCover.pdf     |
|                                               |
b--------------------------/fig. 15 image magick!

Now use pdftk to concatenate the covers onto the main pdf:

p-------------------------------------------------------------q
| bash$ pdftk frontCover.pdf myBook.pdf cat output tmp.pdf    |
| bash$ mv myBook.pdf delete_me.pdf                           |
| bash$ pdftk tmp.pdf backCover.pdf cat output myBook.pdf     |
|                                                             |
b-----------------------------------------------/fig. 16 pdftk!

And now look at myBook.pdf. It will most certainly BLOW YOUR MIND. You
have nice cover art on it, PLUS you got to keep all the fancy hotlinks and
hyperlinks and textual goodness that was your pdf.


--> Salutation <--

Thanks for reading this text file, and I hope you enjoy all the new tricks
you may have learned! Check out slackermedia.info for lots more multimedia
tips and tricks and...stuff.

-klaatu


--> Obligatory Link Section <--

http://klaatu.hackerpublicradio.org
   my little oggcast about GNU Linux
   the first "podcast" to release only in ogg

http://www.hackerpublicradio.org
   a daily podcast for hackers, by hackers

http://www.kernelpanicoggcast.net
   a bunch of guys talking about GNU Linux

http://www.linuxcranks.info
   geeks talking about...GNU Linux

http://www.slackermedia.info
   DIY: turn slackware into a multimedia distro

http://www.slackware.com
   get slack

http://www.slackbuilds.org
   get slackbuilds

http://www.sbopkg.org
   slackbuilds.org front end