How to use Python to monitor and measure website performance

Over the past month, Google has announced a number of enhancements to the way it measures user experience, using KPIs for speed and performance.

Coincidentally, I had been working on a Python script that uses the Google PageSpeed Insights (PSI) API to collect metrics for a number of pages at once, without having to run a test on every single URL.

After Google’s announcements, I thought now would be a good time to share it, and also to explain how you can create this Python script even as a beginner.

The best thing about scripting is that once you get the hang of the basics, you can extract a number of different metrics that can be found in both the page speed test and the Lighthouse analysis.

An introduction to Web Vitals

In early May, Google launched Core Web Vitals, which are a subset of its Web Vitals metrics.

These metrics are used to provide guidance on the quality of the user experience on a website.

Google described them as a way to “help quantify your site’s performance and identify opportunities for improvement,” further highlighting their shift towards user experience.

Core Web Vitals are user-centric metrics that measure key aspects of the user experience: loading time, interactivity, and visual stability.

I won’t go into too much detail about them in this post (you can find more information here), but the new metrics are:

  • Largest Contentful Paint (LCP).
  • First Input Delay (FID).
  • Cumulative Layout Shift (CLS).

LCP - FID - CLS

In addition to this, Google announced last week that they will be introducing a new search ranking signal that combines these metrics with existing page experience signals, such as mobile-friendliness and HTTPS security, to ensure they continue to serve high-quality websites to users.

Performance monitoring

This update is expected to roll out in 2021, and Google has confirmed that no immediate action is needed.

However, to help us prepare for these changes, they have updated the tools used to measure page speed, including PSI, Google Lighthouse, and the Google Search Console Speed Report.

Where does the PageSpeed Insights API come in?

Google PageSpeed Insights is a useful tool for viewing a summary of a web page’s performance, and it uses both field and lab data to generate results.

It’s a great way to get an overview of a handful of URLs, as it is used on a page-by-page basis.

However, if you work on a large site and want insights at scale, the API can be useful for analyzing a number of pages at a time, without having to plug in the URLs individually.

Python script to measure performance

I created the following Python script to measure key performance metrics at scale, in order to save the time spent manually testing each URL.

This script uses Python to send requests to the Google PSI API in order to collect and extract the metrics that are displayed within both PSI and Lighthouse.

I decided to write this script in Google Colab, as it is a great way to get started writing Python and it makes sharing easy, so this post will walk through the setup using Google Colab.

However, it can also be run locally, with a few tweaks to the uploading and downloading of data.

It is important to note that some of the steps can take a while to complete, particularly while each URL is run through the API, so as not to overload it with requests.

This way, you can run the script in the background and come back to it once the steps are complete.

Let’s walk through the steps required to get this script up and running.

Step 1: Install the required packages

Before we start writing any code, we need to install some Python packages that are required before the script can be used. These are easy to add using the import function.

We need the following packages:

  • urllib: for working with, opening, reading, and parsing URLs.
  • json: allows you to convert a JSON file to Python, or a Python object to JSON.
  • requests: an HTTP library used to send all kinds of HTTP requests.
  • pandas: mainly used for data analysis and manipulation; we are using it to create DataFrames.
  • time: a module for working with time; we are using it to provide a time interval between requests.
  • files: from Google Colab, this will allow you to upload and download files.
  • io: the default interface used to access files.
# Import required packages 
import json
import requests
import pandas as pd
import urllib
import time
from google.colab import files
import io

Step 2: Set up your API request

The next step is to set up the API request. Full instructions can be found here, but essentially the command will look like this:
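To give a rough idea of the format, a request to version 5 of the PSI API passes the page you want to test as a query parameter, something like this (using example.com as a placeholder):

https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.example.com

You can also append extra parameters, such as &strategy=mobile, if you want to specify the device type.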

We will open that request with urllib.request.urlopen, store the result in a variable, and then convert it into a JSON object (note that this method is for converting and loading JSON files into Google Colab).
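As a rough sketch of what that looks like in practice (the URL and variable names here are just placeholders):

# Open the request, read the raw response and decode it
# (the URL and variable names below are illustrative placeholders)
url = 'https://www.example.com'
result = urllib.request.urlopen(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}'.format(url)
).read().decode('UTF-8')

# Convert the response into a JSON object
result_json = json.loads(result)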

Step 4: Read the JSON file

The JSON file will usually look something like this when opened in the code editor of your choice.

Read the JSON file

This is a bit tricky to understand, but using an interactive JSON viewer will enable you to turn it into a readable tree view.

readable tree view

The JSON file displays the field data, which is stored under loadingExperience, as well as the lab data, which you can find under lighthouseResult.

To extract the metrics we want, we can make use of the structure of the JSON file, as we can see which section each metric sits under.

For example, First Input Delay sits under loadingExperience.

First Input Delay under loadingExperience

In contrast, First Contentful Paint sits under lighthouseResult.

First Contentful Paint under lighthouseResult

Lots of other metrics are stored under the lighthouseResult audits, such as the following (a short example of accessing one is shown after this list):

  • Speed Index.
  • First Contentful Paint.
  • Cumulative Layout Shift.
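Assuming the response has been parsed into a JSON object, as in the earlier sketch, an individual audit value can be read with a chain of keys like this (result_json is the placeholder name used above):

# Read a single audit value from the parsed response
speed_index = result_json['lighthouseResult']['audits']['speed-index']['displayValue']
print(speed_index)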

Step 5: Upload the CSV and store it as a Pandas DataFrame

The next step is to upload the CSV file of URLs that we want to run through the PSI API. You can generate a list of your site’s URLs with a crawling tool such as DeepCrawl.

As we are using the API, I would recommend using a smaller sample set of URLs here, especially if you have a large site.

For example, you could use the pages with the highest levels of traffic, or the pages that generate the most revenue. Alternatively, if your site has templates, they would be great for testing.

You can upload the CSV file here as well (note that this method is for uploading CSV files to Google Colab).
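The upload itself is a single call, and the uploaded variable it returns is what the next code block reads from:

# Upload the CSV of URLs from your machine (Google Colab only)
uploaded = files.upload()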

Once this has been uploaded, we will use the Pandas library to convert the CSV into a DataFrame, which we can iterate through in the following steps.

# Get the filename from the upload so we can read it into a CSV.
for key in uploaded.keys():
  filename = key
# Read the uploaded file into a Pandas DataFrame
df = pd.read_csv(io.BytesIO(uploaded[filename]))

df.head()

The DataFrame will look like this, starting with zero indexing.

data frame

Step 6: Store the results in a response object

The next step involves using x within a range to represent the URLs we loop through, together with a response object that prevents the URLs from overwriting each other as we go and allows us to save the data for future use.

Here we will also use the column header variable to define the URL request parameter, before converting each response to a JSON file.

I have also set the time delay here to 30 seconds, to reduce the number of consecutive API calls.

Alternatively, you can append an API key to the end of the URL command if you want to make the requests faster.

Indentation is also important here because, as each step is part of the for loop, it must be indented within the command.
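Putting that together, the request loop looks roughly like this (the 'url' column name is an assumption, so adjust it to match the header in your own CSV):

# Loop through the DataFrame of URLs and store each API response,
# keyed by URL so the results don't overwrite each other
response_object = {}

for x in range(0, len(df)):

    # Define the request parameter from the URL column of the CSV
    # (assumes the column is named 'url')
    url = df.iloc[x]['url']

    # Make the request and convert the response to JSON
    # (an API key can be appended as '&key=yourAPIKey' to speed up requests)
    req = urllib.request.urlopen(
        'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}'.format(url)
    )
    response_object[url] = json.loads(req.read())

    # Wait 30 seconds between requests so we don't overload the API
    time.sleep(30)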

Step 7: Create a DataFrame to store the responses

We also need to create a DataFrame that will store the metrics we want to extract from the response object.

A DataFrame is a table-like data structure, with columns and rows that store data. We simply need to add a column for each metric and name it appropriately, like so:

# Create a DataFrame to store the responses
df_pagespeed_results = pd.DataFrame(columns=
          ['url',
          'Overall_Category',
          'Largest_Contentful_Paint',
          'First_Input_Delay',
          'Cumulative_Layout_Shift',
          'First_Contentful_Paint',
          'Time_to_Interactive',
          'Total_Blocking_Time',
          'Speed_Index'])  

print(df_pagespeed_results)

For this scenario, I have used the Core Web Vitals metrics, as well as the additional loading and interactivity metrics used in the current version of Lighthouse.

These metrics each carry a different weight, which is then used to calculate the overall performance score.

You can learn more about each metric, as well as how to interpret the scores, on their individual landing pages, which are linked above.

I also chose to include Speed Index, as well as the overall category, which will return either a slow, average, or fast result.

Step 8: Extract the metrics from the response object

Once we have saved the response object, we can filter through it and extract just the metrics we need.

Here we will once again use a for loop to iterate through the response object, and set up a sequence of list indexes to return only the specific metrics we want.

To do this, we will define the DataFrame column name, as well as the specific category of the response object that we will extract each metric from, for every URL.

for (url, x) in zip(
    response_object.keys(),
    range(0, len(response_object))
):

    # URLs
    df_pagespeed_results.loc[x, 'url'] = \
        response_object[url]['lighthouseResult']['finalUrl']

    # Overall Category
    df_pagespeed_results.loc[x, 'Overall_Category'] = \
        response_object[url]['loadingExperience']['overall_category']

    # Core Web Vitals

    # Largest Contentful Paint
    df_pagespeed_results.loc[x, 'Largest_Contentful_Paint'] = \
        response_object[url]['lighthouseResult']['audits']['largest-contentful-paint']['displayValue']

    # First Input Delay
    fid = response_object[url]['loadingExperience']['metrics']['FIRST_INPUT_DELAY_MS']
    df_pagespeed_results.loc[x, 'First_Input_Delay'] = fid['percentile']

    # Cumulative Layout Shift
    df_pagespeed_results.loc[x, 'Cumulative_Layout_Shift'] = \
        response_object[url]['lighthouseResult']['audits']['cumulative-layout-shift']['displayValue']

    # Additional Loading Metrics

    # First Contentful Paint
    df_pagespeed_results.loc[x, 'First_Contentful_Paint'] = \
        response_object[url]['lighthouseResult']['audits']['first-contentful-paint']['displayValue']

    # Additional Interactivity Metrics

    # Time to Interactive
    df_pagespeed_results.loc[x, 'Time_to_Interactive'] = \
        response_object[url]['lighthouseResult']['audits']['interactive']['displayValue']

    # Total Blocking Time
    df_pagespeed_results.loc[x, 'Total_Blocking_Time'] = \
        response_object[url]['lighthouseResult']['audits']['total-blocking-time']['displayValue']

    # Speed Index
    df_pagespeed_results.loc[x, 'Speed_Index'] = \
        response_object[url]['lighthouseResult']['audits']['speed-index']['displayValue']

I have set this script up to extract the key metrics I mentioned above, so you can use it straight away to collect this data.

However, there are a number of other useful metrics that can be extracted, which can be found within both the PSI tests and the Lighthouse analysis.

This is where the JSON file is useful for checking where each metric sits within the list.

For example, when extracting metrics from the Lighthouse audits, such as the display value of Time to Interactive, you would use the following:

df_pagespeed_results.loc[x, 'Time_to_Interactive'] = \
    response_object[url]['lighthouseResult']['audits']['interactive']['displayValue']

Once again, it is important to make sure that each of these sits within the for loop, otherwise they will not be included in the iteration and only one result will be generated for one URL.

Our final DataFrame will look like this:

final data frame

Step 9: Convert the DataFrame to a CSV file

The final step is to create a summary file to collect all of the results, so we can turn them into a format that we can easily analyze, such as a CSV file.

summary = df_pagespeed_results

df_pagespeed_results.head()

# Download the CSV file
summary.to_csv('pagespeed_results.csv')
files.download('pagespeed_results.csv')

Some of the display values contain unit characters such as 's' and 'ms', so to make the results easier to analyze we can strip these out using the str.replace method for each column:

# Replace the 's' with a blank space so we can turn the values into numbers
df_pagespeed_results['Largest_Contentful_Paint'] = df_pagespeed_results.Largest_Contentful_Paint.str.replace('s', '')
df_pagespeed_results['First_Contentful_Paint'] = df_pagespeed_results.First_Contentful_Paint.str.replace('s', '')
df_pagespeed_results['Time_to_Interactive'] = df_pagespeed_results.Time_to_Interactive.str.replace('s', '')
df_pagespeed_results['Total_Blocking_Time'] = df_pagespeed_results.Total_Blocking_Time.str.replace('ms', '')
df_pagespeed_results['Speed_Index'] = df_pagespeed_results.Speed_Index.str.replace('s', '')

Then we can convert the cleaned columns into numeric values, ready for further analysis.
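One way to do that, as a minimal sketch, is with pandas' to_numeric:

# Convert the cleaned display values from strings into numbers
# (errors='coerce' turns any unexpected value into NaN instead of raising an error)
numeric_columns = ['Largest_Contentful_Paint', 'First_Contentful_Paint',
                   'Time_to_Interactive', 'Total_Blocking_Time', 'Speed_Index']

for column in numeric_columns:
    df_pagespeed_results[column] = pd.to_numeric(df_pagespeed_results[column], errors='coerce')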

Image credits: All screenshots taken by the author, June 2020