Python is a powerful open-source programming language that supports a wide range of add-in libraries. One library that is particularly well suited to data analysis and ETL (extract, transform, load) is Pandas. Pandas can be used for data preprocessing (cleaning data, fixing formatting issues, transforming the shape, adding new columns or calculations, etc.).
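As a rough illustration of the preprocessing steps listed above, here is a minimal sketch. The table, its column names (name, dept, salary), and its values are made up for this example, and it assumes Pandas is already installed.

```python
import pandas as pd

# Hypothetical raw data with typical problems: stray whitespace,
# inconsistent capitalization, numbers stored as text, a missing value.
raw = pd.DataFrame({
    "name": ["  alice ", "BOB", "Carol"],
    "dept": ["eng", "Eng", "ops"],
    "salary": ["50000", "62000", None],
})

clean = raw.copy()

# Cleaning data: trim whitespace and normalize capitalization.
clean["name"] = clean["name"].str.strip().str.title()
clean["dept"] = clean["dept"].str.lower()

# Fixing formatting issues: convert text to numbers.
# Values that cannot be parsed (including missing ones) become NaN.
clean["salary"] = pd.to_numeric(clean["salary"], errors="coerce")

# Adding a new calculated column.
clean["monthly_salary"] = clean["salary"] / 12

# Transforming the shape: wide table to long table, one row per
# (name, measure) pair.
long_format = clean.melt(
    id_vars="name",
    value_vars=["salary", "monthly_salary"],
    var_name="measure",
    value_name="amount",
)

print(clean)
print(long_format)
```

In a Jupyter Notebook, each step could go in its own cell so the intermediate table can be inspected before moving on.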
Python Defined by Wikipedia:
Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than possible in languages such as C++ or Java. The language provides constructs intended to enable writing clear programs on both a small and large scale.
Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles. It features a dynamic type system and automatic memory management and has a large and comprehensive standard library.
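To make the paradigms mentioned above concrete, the sketch below computes the same result (the sum of squares of a short list) in three styles. The names used here (numbers, SquareSummer, and so on) are invented for illustration only.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Imperative / procedural style: spell out each step and update a variable.
total_imperative = 0
for n in numbers:
    total_imperative += n * n

# Functional style: combine values with a function, no variable reassignment.
total_functional = reduce(lambda acc, n: acc + n * n, numbers, 0)


# Object-oriented style: bundle the data and the operation in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)


total_oo = SquareSummer(numbers).total()

print(total_imperative, total_functional, total_oo)
```

All three approaches give the same answer; which one reads best usually depends on the size of the program and personal preference.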
Pandas Defined by Wikipedia:
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. pandas is free software released under the three-clause BSD license.
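The two kinds of data mentioned above, numerical tables and time series, correspond roughly to the DataFrame and to a Series with a date index. The following is a small sketch with made-up readings, assuming Pandas is installed.

```python
import pandas as pd

# A time series: six hypothetical daily readings indexed by date.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([10, 12, 11, 15, 14, 13], index=dates)

# A numerical table: a DataFrame built from the series plus a derived column.
table = pd.DataFrame({"reading": readings, "doubled": readings * 2})

# A time-series operation: average the readings over consecutive 3-day blocks.
three_day_mean = readings.resample("3D").mean()

print(table)
print(three_day_mean)
```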
Setting Up Python
Download the Anaconda distribution of Python. It comes with many handy third-party packages preinstalled (Pandas, NumPy, scikit-learn, etc.), which saves you from having to install them later. It also comes with Jupyter Notebook (where you do the actual coding), which is a good interface for beginners because it lets you run and debug code one cell at a time.
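Once the installation finishes, one way to confirm that the bundled packages are available is to try importing them from a notebook cell. This is only a sketch: the helper name installed_version is made up for this example, and note that scikit-learn is imported under the module name sklearn.

```python
import importlib


def installed_version(package_name):
    """Return the package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Most packages expose __version__; fall back to "unknown" if not.
    return getattr(module, "__version__", "unknown")


# scikit-learn's import name is "sklearn".
for name in ("pandas", "numpy", "sklearn"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If any package is reported as not installed, the Python being run is probably not the one that came with Anaconda.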
Learning Basic Syntax and Structure
Handy Resources for Python
Training for Python
Automate the Boring Stuff with Python (YouTube video series): https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=11&cad=rja...
See STARS for courses here at Stanford
See Lynda.com via Stanford (free for Stanford employees)