Bash may not be the best way to handle every kind of data, but there often comes a time when you are given a pure Bash environment, such as the one on common Linux-based supercomputers, and you just want an early result or a quick view of the data before diving into the real programming in Python, R, SQL, SPSS, and so on. Expertise in data-intensive languages comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn, and well suited to mining textual data. Learning the Bash shell should therefore be your first step if you want to say hello to “Big Data”!
This book demonstrates four practical flat-file data mining projects, each with a different objective function.
If you haven’t used Bash before, feel free to skip the projects and go straight to the tutorials. Complete the tutorials first, then come back to the projects. The tutorial section introduces Bash scripting, regular expressions, AWK, sed, grep, and so on. The course finishes with a near-complete list of references to all the relevant command-line and Big Data tools.
Target audience and prerequisites: Almost everyone can benefit from learning to use Bash, particularly for data mining. This includes students who want to learn Bash and the command line to improve their career prospects, researchers who want to add Bash and other command-line tools to their bag of tricks, and scientists who want to explore and analyze the data that their lab generates.