visit
By using analytical techniques to identify patterns and uncover insights, companies are able to gain actionable knowledge from their data that they can use to inform decisions, identify new opportunities, and develop strategies.
How can you begin to extract insights from your data in order to create data-driven decisions? Exploring your data to discover and comprehend the underlying patterns, information, and facts that can help you make a better decision is the first step.
In this article, you will mainly learn the following:
HarperB has the ability to run on various devices, from the edge to the cloud without duplicating data. What I love about HarperDB is its compatibility with different programming languages such as Python, which is most commonly used for Data analysis. Other programming languages are Javascript, Java, Go, C and NodeJS.
We need to put some sample data into the HarperDB database before we can automate the process of exploring the data. For this tutorial, I will use the loan dataset that is available here.
Step 1: Create a HarperDB Account
The first step is to create an account if you don't have one. You need to visit this link, and then click the navigation bar to see a link called “Start Free”.
Step 2: Create a HarperDB Cloud Instance
The second step is to create a cloud instance to store and fetch your data that will be automatically explored. You just need to click the "Create New HarperDB Cloud Instance" link to add a new instance to your account.
Step 3: Configure the HarperDB Schema and Table
Before you can add the data you want to explore from the database, you must first set up a schema and a table. You only need to load the HarperDB cloud instance you previously built from the dashboard and name the schema (such as “loans”) to complete the process.
Then add a table, such as “customers” that will contain the loan data. HarperDB will further request the hash attribute, which is like to an ID number.
Step 4: Import Loan data to the table
Import the loan data you have downloaded from this to the table you have created (for example- Customers table). Click the file icon for bulk upload from the table interface.
How to access the data from the HarperDB Database
The harperDB database can be integrated with the exploratory data analysis tool to explore and visualize the data. With a feature called Custom Function, HarperDB makes it easy to retrieve the data via API in a very straightforward manner.
1. Enable Custom Functions
Enable Custom functions by clicking “functions” in your HarperDB Studio. This feature is not enabled by default
2. Create your Project
The following step is to create a project by naming it. For example, loan-api-v1. It will also generate project configuration files, including:
Note: You will focus on the routes folder.
3. Define a Route
Create the first route in order to retrieve loan data from the customers' table in the HarperDB Datastore. Also, you should be aware that route URLs are resolved as follows:
In the route file (example.js) from the function page, you can see some template code as an example. You need to replace that code with the following code:
'use strict';
const customValidation = require('../helpers/example');
// eslint-disable-next-line no-unused-vars,require-await
module.exports = async (server, { hdbCore, logger }) => {
// GET, WITH NO preValidation AND USING hdbCore.requestWithoutAuthentication
// BYPASSES ALL CHECKS: DO NOT USE RAW USER-SUBMITTED VALUES IN SQL STATEMENTS
server.route({
url: '/',
method: 'GET',
handler: (request) => {
request.body= {
operation: 'sql',
sql: 'SELECT Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status FROM loan.customers'
};
return hdbCore.requestWithoutAuthentication(request);
}
});
In the code above, the GET method is used to define the route /loan-api-v1. The handler function will then send an SQL query to the database to get all of the data from the customers' table.
The SQL query will fetch information from the following columns in the customers’ table:-4. Use the API URL to access the data
You can now easily access your data using the API URL. For example, my API URL is Open your web browser and paste your URL to view the data from your database.
Sweetviz is an open-source Python library that generates stunning visualizations that are packed with insightful information to get exploratory data analysis off the ground with just two lines of code.
The library can be utilized to create a visual comparison of the dataset as well as the variables. The output application is a completely self-contained HTML file that can be loaded in a web browser.Install Sweetiz
This library can be installed using the following command:pip install sweetviz
Collect data from the API
We use the API endpoint to get to the data, and Pandas library can load and integrate the data with the Sweetviz library.import request
# api-endpoint
URL = "//functions-1-mlproject.harperdbcloud.com/loan-api-v1"
# sending get request and saving the response as response object
r = requests.get(url = URL)
# extracting data in json format
data = r.json()
The code above sends a request to the API endpoint using a python package called request and finally, the data will be extracted in JSON format.
Load data into pandas Dataframe
The extracted data from the API endpoint is then loaded into a DataFrame using the Pandas package.import pandas as pd
#load data using pandas
df = pd.DataFrame(data)
# show top 5 rows
df.head()
# importing sweetviz
import sweetviz as sv
#analyzing the dataset
loan_report = sv.analyze(df)
loan_report.show_html('loan_analysis.html',open_browser=False)
Report loan_analysis.html was generated and we can open it in the web browser.
Let’s open the loan_analysis.html to view the EDA report generated.From the above screenshot, you can see that the largest loan amount was 700 requested by one customer and the smallest loan amount is 9 also requested by only one customer. This insight can be used to help loan officers to make data-driven decisions.
Complete code
Below is the complete code with only 9 lines of code that, when executed, will automatically:-
# import packages to use
import request
import pandas as pd
import sweetviz as sv
# api-endpoint
URL = "//functions-1-mlproject.harperdbcloud.com/loan-api-v1"
# sending get request and saving the response as response object
r = requests.get(url = URL)
# extracting data in json format
data = r.json()
#load data using pandas
df = pd.DataFrame(data)
#analyzing the dataset and render a report
loan_report = sv.analyze(df)
loan_report.show_html('loan_analysis.html',open_browser=False)
As more new data are added to your HarperDB database, you can now just execute the above code to generate a new beautiful EDA report to collect more insight from the data.