Unlock Analytics

Python for HR: Python Libraries


Hello, and welcome to the Python for HR series.

Just started with Python in HR? Confused about which libraries to use? Don't worry, we've got you covered. You just focus on your learning.

Let's start with some useful libraries you will use day to day for data analysis in HR.

There are plenty of libraries and functions out there, but which ones will you actually use?

Every analysis starts with basic mathematics and statistics. The most popular libraries for this work are listed below.

Core Libraries for Basic Mathematics and Statistics

  • NumPy

Traditionally, we start our list with libraries for scientific applications, and NumPy is one of the principal packages in this area. It is designed for processing large multidimensional arrays and matrices, and its extensive collection of high-level mathematical functions makes it possible to perform a wide range of operations on these objects.
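As a minimal sketch of what working with NumPy arrays looks like, the example below computes summary statistics on a small array of salaries and averages a 2-D array of ratings row by row. All figures are hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical monthly salaries (in thousands) for five employees
salaries = np.array([42.0, 55.5, 61.0, 48.25, 73.0])

# Vectorized operations apply to every element at once, no loop needed
raised = salaries * 1.05            # a hypothetical 5% raise for everyone

print(salaries.mean())              # average salary
print(np.median(salaries))          # median salary
print(salaries.std())               # population standard deviation (ddof=0)

# A 2-D array: rows are employees, columns are hypothetical ratings for 3 quarters
ratings = np.array([[3, 4, 4],
                    [5, 4, 5],
                    [2, 3, 3]])
print(ratings.mean(axis=1))         # average rating per employee (across columns)
```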

  • Pandas

Pandas is a Python library that provides high-level data structures and a wide variety of analysis tools. Its great strength is the ability to express rather complex data operations in one or two commands. Pandas includes many built-in methods for grouping, filtering, and combining data, as well as time-series functionality, and all of this comes with good performance.
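The grouping, filtering, and combining operations mentioned above can be sketched as follows. The two DataFrames (`employees` and `exits`) and their column names are hypothetical examples, not a standard HR schema.

```python
import pandas as pd

# Hypothetical employee records
employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5],
    "department": ["Sales", "Sales", "IT", "IT", "HR"],
    "salary": [50000, 62000, 71000, 68000, 55000],
})

# Hypothetical exit data to combine with the records
exits = pd.DataFrame({"emp_id": [2, 4], "exit_reason": ["Relocation", "Better offer"]})

# Filtering: employees earning more than 60,000
high_earners = employees[employees["salary"] > 60000]

# Grouping: average salary and headcount per department
by_dept = employees.groupby("department")["salary"].agg(["mean", "count"])

# Combining: a left join keeps every employee; exit_reason is NaN for current staff
combined = employees.merge(exits, on="emp_id", how="left")

print(high_earners)
print(by_dept)
print(combined)
```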

  • statsmodels

Statsmodels is a Python module for statistical data analysis: estimating statistical models, running statistical tests, and applying multivariate methods such as factor analysis, MANOVA, and repeated-measures ANOVA. It also covers regression models such as linear and logistic regression, and offers a range of plotting functions.
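For example, a simple linear regression of salary on years of experience takes only a few lines with the statsmodels formula interface. This is a sketch on a small hypothetical dataset; the column names `experience` and `salary` are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: years of experience vs. annual salary (in thousands)
df = pd.DataFrame({
    "experience": [1, 2, 3, 5, 7, 8, 10, 12],
    "salary": [40, 43, 47, 52, 60, 61, 70, 75],
})

# Ordinary least squares regression: salary explained by experience
model = smf.ols("salary ~ experience", data=df).fit()

print(model.params)      # intercept and slope estimates
print(model.pvalues)     # p-value for each coefficient
print(model.rsquared)    # share of salary variance explained by experience
```

Calling `model.summary()` prints the full regression table, including standard errors and confidence intervals.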

  • SciPy

Another core library for scientific computing is SciPy. It is built on NumPy and extends its capabilities; SciPy's main data structure is, again, the multidimensional array implemented by NumPy. The package contains tools for linear algebra, probability and statistics, numerical integration, and many other tasks.
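As a small illustration of `scipy.stats`, the sketch below runs a two-sample t-test on hypothetical engagement scores from two teams and evaluates a normal cumulative distribution function. The team names and scores are made up.

```python
from scipy import stats

# Hypothetical engagement scores (1-10) from two teams
team_a = [7.1, 6.8, 7.5, 8.0, 6.9, 7.3]
team_b = [6.2, 6.5, 5.9, 6.8, 6.1, 6.4]

# Two-sample t-test: is the difference in mean engagement likely to be chance?
result = stats.ttest_ind(team_a, team_b)
print(result.statistic, result.pvalue)

# Probability theory: share of a normal distribution (mean 50, sd 10) below 65
share_below = stats.norm.cdf(65, loc=50, scale=10)
print(share_below)
```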

Visualization 

  • Matplotlib

Matplotlib is a low-level library for creating two-dimensional diagrams and graphs. With it, you can build anything from bar charts, histograms, and scatter plots to graphs in non-Cartesian coordinates, such as polar plots. Moreover, many popular plotting libraries are designed to work in conjunction with Matplotlib.
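A minimal Matplotlib sketch: a histogram of hypothetical tenure data, saved to a PNG file. The `Agg` backend is selected so the script also runs on a machine without a display; the file name `tenure_hist.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")                 # render to files without needing a display
import matplotlib.pyplot as plt

# Hypothetical tenure (in years) for a small team
tenure = [0.5, 1, 1.5, 2, 2, 3, 3.5, 4, 5, 6, 8, 10]

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the count in each bin, the bin edges, and the drawn bars
counts, edges, patches = ax.hist(tenure, bins=5, edgecolor="black")
ax.set_xlabel("Tenure (years)")
ax.set_ylabel("Number of employees")
ax.set_title("Employee tenure distribution")
fig.savefig("tenure_hist.png")        # hypothetical output file name
plt.close(fig)
```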

  • Seaborn

Seaborn is essentially a higher-level API built on the Matplotlib library. It comes with more attractive default styles for charts. There is also a rich gallery of visualizations, including more complex types such as time series, joint plots, and violin plots.
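Because Seaborn sits on top of Matplotlib, a violin plot that would take considerable manual work in Matplotlib is a single call. The sketch below assumes a hypothetical DataFrame with `department` and `salary` columns; the output file name is also hypothetical.

```python
import matplotlib
matplotlib.use("Agg")                  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical salaries (in thousands) by department
df = pd.DataFrame({
    "department": ["Sales"] * 5 + ["IT"] * 5,
    "salary": [48, 52, 55, 50, 61, 65, 70, 72, 68, 80],
})

sns.set_theme()                        # apply Seaborn's default styling
fig, ax = plt.subplots(figsize=(6, 4))
# One violin per department: shows the whole salary distribution, not just the mean
sns.violinplot(data=df, x="department", y="salary", ax=ax)
ax.set_title("Salary distribution by department")
fig.savefig("salary_violin.png")       # hypothetical output file name
plt.close(fig)
```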

  • Plotly

Plotly is a popular library that lets you build sophisticated graphics easily. The package is designed to work in interactive web applications. Among its notable visualizations are contour plots, ternary plots, and 3D charts.

  • Bokeh

The Bokeh library creates interactive, scalable visualizations in the browser using JavaScript. It offers a versatile collection of graphs, styling options, and interactive features such as linked plots, widgets, and callbacks, among many other useful features.

Its interactive tools include zoom, pan, and customizable tooltip fields, and its styling options cover details such as rotating categorical tick labels.
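A minimal Bokeh sketch, assuming a recent (3.x) release: a scatter plot with pan and zoom tools and custom hover tooltip fields, saved as an HTML file for the browser. The data, column names, and file name are hypothetical.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file, save

# Hypothetical employee data
source = ColumnDataSource(data={
    "name": ["Ana", "Ben", "Chen", "Dia"],
    "tenure": [1, 4, 7, 2],
    "engagement": [6.5, 7.2, 8.0, 5.9],
})

p = figure(title="Engagement vs. tenure", x_axis_label="Tenure (years)",
           y_axis_label="Engagement score", width=500, height=350,
           tools="pan,wheel_zoom,box_zoom,reset")
p.scatter(x="tenure", y="engagement", source=source, size=10)

# Custom tooltip fields: "@column" pulls values from the data source
p.add_tools(HoverTool(tooltips=[("Name", "@name"), ("Engagement", "@engagement")]))

output_file("engagement_bokeh.html")   # hypothetical output file
save(p)
```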

Data Scraping

  • Scrapy 

Scrapy is a library for building spiders (bots) that crawl website pages and collect structured data. Scrapy can also extract data from APIs. The library is very handy thanks to its extensibility and portability.

Machine Learning

  • scikit-learn

Built on NumPy and SciPy, this Python library is one of the best for working with data. It provides algorithms for many standard machine learning and data mining tasks, such as clustering, regression, classification, dimensionality reduction, and model selection.
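As a sketch of a typical classification workflow, the example below trains a logistic regression model to flag likely attrition from two features and checks it with cross-validation. The features (satisfaction score and weekly overtime hours) and the labels are hypothetical; a real attrition model needs far more data and care.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features per employee: [satisfaction score (1-10), overtime hours per week]
X = np.array([
    [8, 2], [7, 3], [9, 1], [8, 4], [7, 2], [6, 3],        # stayed
    [2, 15], [3, 12], [2, 18], [4, 14], [3, 16], [1, 20],  # left
])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])        # 1 = employee left

# Scale the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression())

# Model selection: 3-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=3)
print(scores.mean())

# Fit on all data and score two new (hypothetical) employees
model.fit(X, y)
print(model.predict([[9, 1], [1, 19]]))
print(model.predict_proba([[9, 1], [1, 19]])[:, 1])   # probability of leaving
```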

  • XGBoost/CatBoost/LightGBM

Gradient boosting is one of the most popular machine learning algorithms. It builds an ensemble of simple models, usually decision trees, one after another, with each new tree correcting the errors of the ones before it. Special libraries are designed for fast and convenient implementation of this method, and we think XGBoost, LightGBM, and CatBoost deserve special attention. They are competitors that solve the same problem and are used in almost the same way, and all of them provide highly optimized, scalable, and fast implementations of gradient boosting.
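To make the idea of successively refined trees concrete, here is a minimal from-scratch sketch of gradient boosting for squared-error regression, using scikit-learn's `DecisionTreeRegressor` as the elementary model. It illustrates the technique only; it is not how XGBoost, LightGBM, or CatBoost are implemented internally, and the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for squared-error regression (illustration only)."""
    base = y.mean()                            # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                   # negative gradient of squared error (up to a constant)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                 # the new tree models what is still wrong
        pred += learning_rate * tree.predict(X)  # take a small step toward the target
        trees.append(tree)
    return base, learning_rate, trees


def predict_gradient_boosting(model, X):
    base, learning_rate, trees = model
    pred = np.full(len(X), base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Synthetic data: salary (in thousands) that grows with experience, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(200, 1))
y = 30 + 2.5 * X[:, 0] + rng.normal(0, 3, size=200)

model = fit_gradient_boosting(X, y)
pred = predict_gradient_boosting(model, X)
mse_baseline = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - pred) ** 2)
print("baseline MSE:", mse_baseline)
print("boosted MSE: ", mse_boosted)
```

In practice you would reach for one of the optimized libraries instead; `xgboost.XGBRegressor`, `lightgbm.LGBMRegressor`, and `catboost.CatBoostRegressor` all follow the scikit-learn style `fit`/`predict` interface.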

The next article in the Python for HR series covers Functions.