The purpose of this blog is to explain how websites work and how browsers transform the HyperText Markup Language (HTML) used to describe web pages into the interactive visual representations you see on screen. You will also learn how web servers create these HTML pages and how they execute code to generate web pages dynamically.
How Do Browsers Work?
As we all know, a browser is software that loads files from a remote server and displays them so that the web page looks interactive and visually appealing. To do that, the browser is responsible for:
- Parsing the page’s HTML
- Understanding the structure and content of the document
- Converting it to a series of drawing operations that the operating system can understand
Now, to understand how the web works, let’s walk through the rendering pipeline:
Process of Rendering Pipeline
When the browser receives an HTTP response, it parses the HTML in the body of the response into a Document Object Model (DOM). The DOM is a data structure that describes the HTML document as a tree of nested elements called DOM nodes, such as input boxes and paragraphs of text.
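As a rough illustration of that parsing step, here is a minimal sketch that turns an HTML string into a tree of nested nodes using Python's standard html.parser module. The Node and TreeBuilder classes are made up for this example, the list of void tags is partial, and the sketch assumes well-formed HTML. A real browser parser also recovers from broken markup, which this one does not.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One DOM-like node: an element with a tag name, or a text node."""

    def __init__(self, tag, text=""):
        self.tag = tag
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a nested Node tree; assumes tags are properly nested."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]  # the last item is the currently open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)  # later content nests inside this node

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # void elements were never pushed
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self._stack[-1].children.append(Node("#text", data.strip()))


def parse(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = repr(node.text) if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


dom = parse("<body><p>Hello</p><input><p>World</p></body>")
print("\n".join(outline(dom)))
```

The stack is the key design choice here: whichever element is on top is the parent of the next node the parser sees.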
Once the browser has generated the DOM, and before anything can be drawn on screen, styling rules must be applied to each DOM element. These rules declare how each page element is to be drawn: the foreground and background colors, the font style and size, the position and alignment, and so on.
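To make the idea of applying styling rules concrete, here is a deliberately simplified sketch. It assumes rules match on tag name only and that later rules override earlier ones. Real CSS also takes selector specificity and inheritance into account, which this sketch ignores. The default values and the stylesheet contents are invented for illustration.

```python
# Assumed browser defaults for this sketch (real defaults are far more extensive).
DEFAULTS = {"color": "black", "background-color": "white", "font-size": "16px"}

# A hypothetical stylesheet: (tag selector, declarations), in source order.
STYLESHEET = [
    ("p", {"color": "gray"}),
    ("h1", {"font-size": "32px"}),
    ("p", {"font-size": "14px", "color": "navy"}),
]


def computed_style(tag, stylesheet=STYLESHEET):
    """Start from the defaults, then apply every matching rule in order."""
    style = dict(DEFAULTS)
    for selector, declarations in stylesheet:
        if selector == tag:
            style.update(declarations)  # later rules override earlier ones
    return style


print(computed_style("p"))
```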
To make a web page more presentable, developers create one or more stylesheets that declare how elements on the page should be rendered. Styles can be embedded in the HTML document with a <style> tag, or kept in an external stylesheet that the document imports with a <link> tag referencing the URL that hosts it.
For security, the browser runs JavaScript delivered with the page inside a sandbox. Code running in the sandbox is not permitted to:
- Start processes or access other existing processes
- Call operating system functions
- Access the system’s local disks, where your pictures, games, and other files live. The browser does allow websites to store small amounts of data locally, but this storage is abstracted from the file system itself.
- Access the operating system’s network layer
To learn more about cross-site scripting (XSS), check out: https://payatu.com/blog/dom-based-xss/
Web pages are built from two kinds of components: static and dynamic. But before we dive into them, let’s look at how web pages are hosted.
What is a web server?
In simple words, a web server is a computer that stores a website’s component files, together with a web server program that sends back an HTML page in response to HTTP requests. Modern web servers can also generate a page’s HTML dynamically. A web server therefore sends back two types of resources: static and dynamic. Let’s discuss them one by one.
Initially, websites consisted mostly of static resources. Developers wrote HTML files by hand and deployed a copy of each file to the server. Whenever a user visits the website’s URL, the browser makes an HTTP request, and the web server uses the URL in that request to find the file to return.
For static resources, the web server resolves URLs using a configuration file that maps each URL to a particular resource. Static resources include HTML files, image files, and any other files that the web server returns unchanged in HTTP responses.
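The mapping from URL to file can be sketched as follows. The ROUTES table stands in for a web server's configuration file, and its prefixes and directories are hypothetical. The sketch only computes the path and never touches the disk, and it rejects ".." segments so that a URL cannot climb out of the configured directories.

```python
from pathlib import PurePosixPath

# Hypothetical configuration: URL prefix -> directory on the server's disk.
ROUTES = {
    "/image/": "/var/www/static/images",
    "/": "/var/www/html",
}


def resolve_static(url_path):
    """Map a URL path to a file path, or return None if it cannot be served."""
    if ".." in PurePosixPath(url_path).parts:
        return None  # refuse attempts to escape the configured directories
    # Try the longest prefix first so "/image/" wins over "/".
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if url_path.startswith(prefix):
            relative = url_path[len(prefix):] or "index.html"
            return str(PurePosixPath(ROUTES[prefix]) / relative)
    return None


print(resolve_static("/image/black.png"))
print(resolve_static("/about.html"))
```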
How does URL resolution work?
Whenever the browser wants to access a resource, it includes the name of the resource in the URL, such as “/image/black.png”, and the web server returns that file. Developers can also serve different resources from the same URL by decoupling the URL from the file path. That is why a website can use a single path for all users, like “www.abc.in/profile.png”, yet show each user a different profile picture.
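Decoupling the URL from the file path might look something like the sketch below. The user names, stored file names, and the idea that the server already knows the logged-in user (for example from a session) are all assumptions made for illustration.

```python
# Hypothetical data: which stored file belongs to which logged-in user.
PROFILE_PICTURES = {
    "alice": "uploads/3f9a.png",
    "bob": "uploads/77c1.png",
}
DEFAULT_PICTURE = "static/default-avatar.png"


def resolve(url_path, current_user):
    """Return the file to serve for this URL and this user."""
    if url_path == "/profile.png":
        # Same URL for everyone; the file depends on who is asking.
        return PROFILE_PICTURES.get(current_user, DEFAULT_PICTURE)
    # Every other URL maps straight onto a file, as a static resource would.
    return "static" + url_path


print(resolve("/profile.png", "alice"))
print(resolve("/profile.png", "bob"))
```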
Content Delivery Networks
A content delivery network (CDN) is designed to improve the delivery speed of static files. With a CDN, developers store duplicate copies of static resources in data centers around the world, so that each browser receives those resources quickly from the nearest physical location. Common CDN providers include Cloudflare, Akamai, and Amazon CloudFront. They mostly serve files and images on behalf of third-party websites.
Learn more about them here: https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
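The "nearest physical location" idea can be sketched as picking the data center with the smallest great-circle distance to the client. This is only an illustration: the edge names and coordinates are hypothetical, and real CDNs typically route requests using DNS, network topology, and load rather than raw distance.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical edge locations: name -> (latitude, longitude), approximate.
EDGES = {
    "mumbai": (19.08, 72.88),
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))  # 6371 km is the mean Earth radius


def nearest_edge(client_location, edges=EDGES):
    """Return the name of the edge closest to the client."""
    return min(edges, key=lambda name: distance_km(client_location, edges[name]))


print(nearest_edge((28.61, 77.21)))  # a client near Delhi
```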
Content Management Systems
Content management systems (CMSs) provide authoring tools requiring little to no technical knowledge to write the content. CMSs impose a uniform style on the pages and allow administrators to update content directly in the browser.
It is not practical to rely on static resources alone. Imagine that a retail website’s developer had to code a new web page every time a new user logged in or a new product was added. That is where dynamic resources come in.
A dynamic resource is code that loads data from a database to populate the HTTP response. Typically, a dynamic resource outputs HTML, though other content types can be returned depending on what the browser expects. This lets a retail website implement a single product page that displays many different products.
Every time a user views a particular product on a site, the code behind the page extracts the product ID from the URL, loads the price, image, and description from the database, and inserts this data into an HTML page dynamically. Adding new products to the retailer’s inventory then becomes a matter of simply entering new rows in the database. A similar thing happens on bank and stock market websites.
Because updating static content over and over was tedious, developers wrote code that generates the required content automatically.
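The product-page flow described above can be sketched with Python's built-in sqlite3 module. The table layout, the sample rows, and the "/product/<id>" URL scheme are assumptions made for this example. The query uses a placeholder, and the output is HTML-escaped, so that data from the URL or the database is never treated as code.

```python
import html
import sqlite3


def make_db():
    """Create an in-memory database with a couple of sample products."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, description TEXT)"
    )
    db.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(1, "Kettle", 24.5, "Boils water <fast>"), (2, "Mug", 6.0, "Holds tea")],
    )
    return db


def product_page(db, url_path):
    """Render the page for a URL such as /product/2 (a hypothetical URL scheme)."""
    try:
        product_id = int(url_path.rsplit("/", 1)[-1])
    except ValueError:
        return "<h1>Not found</h1>"
    row = db.execute(
        "SELECT name, price, description FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return "<h1>Not found</h1>"
    name, price, description = row
    return (
        f"<h1>{html.escape(name)}</h1>"
        f"<p>{html.escape(description)}</p>"
        f"<p>Price: {price:.2f}</p>"
    )


db = make_db()
print(product_page(db, "/product/2"))
```

Adding a third product only needs another row in the table; product_page itself does not change.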
Server-Side Languages for Web
Developers need a way to execute code when evaluating dynamic resources, and that is where web programming languages come into the picture. A web developer has many choices and should pick one based on the project’s requirements, for example:
- Ruby: It is good for large-scale web applications and makes them easy to implement with minimal configuration. The most widely used Ruby framework is Ruby on Rails; Sinatra is a popular lightweight alternative.
- Python: It is also very popular for data science and scientific computing projects. Web developers have a wide choice of actively maintained web servers and frameworks, such as Django and Flask. This diversity can also work as a modest security benefit, because attackers are less likely to concentrate on any single platform.
- Java: Java is haunted by its past popularity: legacy applications contain a lot of Java code that runs on older versions of the language and its frameworks. That is one reason many developers still use Java. The most popular Java frameworks are Spring and Struts.
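To give a feel for what "executing code to evaluate a dynamic resource" means in one of these languages, here is a minimal sketch of a Python WSGI application, the standard interface that frameworks such as Django and Flask build on. The "/hello" path and the call_app helper are invented for this example. The helper invokes the application directly instead of starting a real server.

```python
def app(environ, start_response):
    """A tiny WSGI application: the server calls this once per HTTP request."""
    path = environ.get("PATH_INFO", "/")
    if path == "/hello":
        status, body = "200 OK", b"<p>Hello from server-side code</p>"
    else:
        status, body = "404 Not Found", b"<p>Not found</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(application, path):
    """Hypothetical helper: invoke the app the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    body = b"".join(application(environ, start_response))
    return captured["status"], body


print(call_app(app, "/hello"))
```

To try it in a browser, the same app object could be served with the standard library's wsgiref.simple_server.make_server.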
To understand attacks against frameworks, check out:
We also have frameworks for client-side code, like Angular. With Angular, the website makes various changes in the browser as the page loads: it parses the template HTML supplied by the server and writes directly to the DOM.
Templates are often used to build dynamic web pages. A template is mostly HTML, with embedded programmatic logic that tells the web server what to do. Typically, that logic instructs the web server to pull data from the database or from the HTTP request and insert it into the HTML template.
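A very small version of this idea can be sketched with Python's built-in string.Template. Real template engines offer loops, conditionals, and automatic escaping. Here the placeholders are simple $names and the escaping is done by hand. The template text and the render function are made up for illustration.

```python
import html
from string import Template

# Mostly HTML, with $placeholders marking where data gets inserted.
PAGE = Template("<h1>Welcome, $username</h1><p>You have $count new messages.</p>")


def render(username, count):
    """Fill the template, escaping user-supplied text so it cannot inject HTML."""
    return PAGE.substitute(username=html.escape(username), count=int(count))


print(render("Ana", 3))
```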
Caching refers to storing a copy of data that is kept elsewhere in an easily retrievable form, in order to speed up retrieval of that data. A cache can be placed between the application server and the database, so that the application reads frequently accessed data from the in-memory cache instead of the main datastore. This cuts down latency and removes unnecessary load from the database, which makes it far less likely to become a bottleneck.
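The pattern of checking the cache first and falling back to the database is often called cache-aside. Below is a minimal in-process sketch of it. The class name, the time-to-live parameter, and slow_database_lookup are assumptions for illustration. A production setup would more likely use a shared cache such as Redis or Memcached.

```python
import time


class CacheAside:
    """Serve values from memory; go to the database only on a miss or expiry."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # function that performs the real lookup
        self._ttl = ttl_seconds     # how long a cached value stays fresh
        self._clock = clock         # injectable so expiry can be tested
        self._entries = {}          # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss: ask the database
        self._entries[key] = (value, now + self._ttl)
        return value


def slow_database_lookup(product_id):
    """Hypothetical stand-in for a real database query."""
    print(f"querying database for {product_id}")
    return {"id": product_id, "name": f"Product {product_id}"}


cache = CacheAside(slow_database_lookup, ttl_seconds=30)
cache.get(7)  # goes to the database
cache.get(7)  # served from the cache
```

The time-to-live bounds how stale a cached value can get, at the cost of an occasional extra database read.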
Distributed caches are vulnerable to attacks in the same way that databases are. Thankfully, Redis and Memcached were developed at a time when these kinds of threats were well known, so best practices are baked into the software development kits (SDKs), the code libraries you use to connect to the caches.
That’s all, folks! This blog was aimed at giving a basic introduction to how the browser works and how the web server handles both static and dynamic data.