A Guide to Building a Comprehensive Business Directory
An online business directory is a website component that lists businesses or entities within market segments or niche-based categories. The online Yellow Pages is a good example of a directory. Directory entries can be compiled manually or through an automated process. Individual entries are often linked to the listing company’s website. There are thousands of business directories on which local businesses can create listings.
DATA FIELDS
The details in one business directory versus another may vary. However, common attributes or fields include business name, address, telephone numbers, and website URL. Additional fields could include a business description, email address, operating hours, and video. Some directories may include a contact form as a channel of communication between the viewer and listing provider.
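As an illustration, the fields above could be modeled as a single record type. This is a minimal sketch, not the schema of any particular directory: the class name `Listing`, the field names, and the `missing_fields` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Listing:
    """One directory entry. Only the business name is required."""
    name: str
    address: str = ""
    phones: List[str] = field(default_factory=list)
    website: str = ""
    # Optional extras that some directories carry.
    description: str = ""
    email: str = ""
    hours: Dict[str, str] = field(default_factory=dict)  # e.g. {"Mon": "10:00-22:00"}
    video_url: str = ""

    def missing_fields(self) -> List[str]:
        """Return the common fields that are still empty for this entry."""
        common = ("address", "phones", "website")
        return [name for name in common if not getattr(self, name)]


entry = Listing(name="Example Escape Room", website="https://example.com")
print(entry.missing_fields())
```

Tracking which common fields are empty makes it easier to decide which entries need a second data-gathering pass.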
Directories can supplement search engines to provide a more integrated and relevant response to search queries. Keeping business directories current can also improve the information held by large search engines, since they crawl those directories.
Comprehensive and complete directories provide more relevant query results. However, creating a comprehensive directory can be extremely challenging and resource intensive. Small businesses must examine feasible ways to optimize the process without cutting corners. The remainder of this article focuses on an example to programmatically create a comprehensive directory.
DETERMINE SEARCH TERMS
First, we need to determine the proper search terms used to gather the directory data. Experimentation with one or more search engines can help determine when relevant results are being returned. Doing this up front can save a lot of time in the later stages of this process.
DETERMINE LOCATIONS
Next, we need to determine the locations of interest. Locations should be organized in a hierarchy that goes from broad to specific locations. For example, a hierarchy could be country, territory, region, sub-region, metro area, city, district, neighborhood, and address. Hierarchies and organization can vary across the globe. Cities are a common reference that tends to be an appropriate level for organization of information.
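One way to hold such a hierarchy in code is a record with one optional field per level. The sketch below uses a shortened, assumed set of levels (country, region, city, neighborhood); the names `Location`, `LEVELS`, and `path` are hypothetical and would need adapting to the hierarchy actually in use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed levels, ordered from broad to specific.
LEVELS = ("country", "region", "city", "neighborhood")


@dataclass(frozen=True)
class Location:
    country: str
    region: Optional[str] = None
    city: Optional[str] = None
    neighborhood: Optional[str] = None

    def path(self) -> str:
        """Join the levels that are filled in, broad to specific."""
        parts = [getattr(self, level) for level in LEVELS]
        return " > ".join(p for p in parts if p)


austin = Location(country="United States", region="Texas", city="Austin")
print(austin.path())  # United States > Texas > Austin
```

Leaving the finer levels optional lets the same type describe a whole country or a single neighborhood.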
Compiling the list of locations from which to search may be challenging. Ideally, locations should be evenly spaced geographically across the area of interest. However, that may not be achievable in a real-life application. Existing data sources (e.g., spreadsheets) contain location information that may be used. The location data used for the creation of the ETR directory contains over 23,000 locations across the world.
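With search terms and locations in hand, the two lists can be combined into the full set of queries to run. The following is a rough sketch; the function name `build_queries` and the "term in city, country" phrasing are assumptions, and the best phrasing depends on the search engine being used.

```python
from itertools import product


def build_queries(terms, locations):
    """Pair every search term with every (city, country) location."""
    return [
        f"{term} in {city}, {country}"
        for term, (city, country) in product(terms, locations)
    ]


terms = ["escape rooms", "escape games"]
locations = [("Austin", "United States"), ("Lyon", "France")]
for query in build_queries(terms, locations):
    print(query)
```

The number of queries is the number of terms multiplied by the number of locations, so a long location list grows the workload quickly.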
COMPILE THE DATA
After determining the search terms and locations, we need to compile the raw information. Custom code that drives a search engine and extracts the returned results is an excellent means to do this. Search engines can be very effective because the information returned is consistently displayed in a common location (i.e., same HTML tag). Not all of the desired fields may be available from a single results page, so a multi-step process to get the desired information may be necessary.
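Because each result sits in the same HTML tag, extraction can be a small parser that watches for that tag. The sketch below uses only Python's standard `html.parser` module. The tag `div` and the class `result-name` are made-up placeholders, not the markup of any real search engine, and the parser assumes the target tag does not contain nested tags of the same name.

```python
from html.parser import HTMLParser


class NameExtractor(HTMLParser):
    """Collect the text inside every <tag class="css_class"> element."""

    def __init__(self, tag="div", css_class="result-name"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self._capturing = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._capturing = True
            self.names.append("")

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if self._capturing and tag == self.tag:
            self._capturing = False
            self.names[-1] = self.names[-1].strip()


def extract_names(html, tag="div", css_class="result-name"):
    parser = NameExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.names


sample = '<div class="result-name"> Escape Hub </div><div class="ad">Sponsored</div>'
print(extract_names(sample))
```

The same pattern can be repeated with different tag and class arguments to pull other fields, such as the address or rating.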
For example, if one navigates to Google Maps and searches for “escape rooms”, one should see results returned relative to the default search location in the web browser. The business name, rating, number of reviews, business category, and address are shown for every business in the results, in a window referred to as the Google Search window.
If the individual clicks on one of the businesses listed, the view changes to the Google Place window, which shows additional information such as website URL, phone number, and business hours.
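The two-step process yields two partial records per business: a summary from the search results and a set of details from the individual business view. A minimal sketch of combining them follows; the function name `merge_records` and the dictionary keys are assumptions for illustration.

```python
def merge_records(summary, detail):
    """Fill gaps in the summary record with non-empty values from the detail record."""
    merged = dict(summary)
    for key, value in detail.items():
        if value and not merged.get(key):
            merged[key] = value
    return merged


summary = {"name": "Escape Hub", "address": "1 Main St", "website": ""}
detail = {"website": "https://example.com", "phone": "555-0100", "address": ""}
print(merge_records(summary, detail))
```

Values already present in the summary are kept, so an empty field in the second pass never overwrites data from the first.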
Other methods can collect the remaining information. Code can be developed to gather the necessary information (e.g., email addresses) from the business website. This can get tricky because both the consistency of the data and its location vary from website to website. Scanning across the website hierarchy may be useful. This code must have a more generic search routine to explore and pull the information.
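A generic routine of this kind might try a few likely pages on each site and scan the page text for anything shaped like an email address. The sketch below covers only those two pieces and leaves the downloading of pages to the reader. The page paths in `CANDIDATE_PATHS` are guesses, and the regular expression is a deliberately simple pattern that will miss obfuscated addresses.

```python
import re
from urllib.parse import urljoin

# Simple pattern for common address shapes; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical pages where contact details often appear.
CANDIDATE_PATHS = ("", "contact", "contact-us", "about")


def candidate_urls(base_url):
    """Build the list of pages to scan for a given business website."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]


def find_emails(page_text):
    """Return unique, lower-cased email addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(page_text):
        seen.setdefault(match.lower(), None)
    return list(seen)


page = 'Write to <a href="mailto:Info@Example.com">info@example.com</a>'
print(candidate_urls("https://example.com"))
print(find_emails(page))
```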
Application Programming Interfaces (APIs) are also available to collect data. An API query returns the data directly, which alleviates the effort of manually gathering it from a site. However, APIs with useful information are rarely free. Google recently updated its APIs and moved to a pay-as-you-go pricing structure, but beware that a large number of data calls can get quite expensive. We went through $500 in half a day, but fortunately this only involved the consumption of free credits! Needless to say, we abandoned Google's APIs at that point.
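A quick estimate before the first API call can prevent this kind of surprise. The sketch below multiplies out the number of calls and applies a per-thousand price. Only the 23,000 locations figure comes from this article. The calls per query and the price are placeholder values chosen for the example, not Google's actual rates, and should be replaced with figures from the provider's current price list.

```python
def estimate_cost(n_terms, n_locations, calls_per_query, price_per_1000_calls):
    """Return (total API calls, estimated cost) for a full data-gathering run."""
    total_calls = n_terms * n_locations * calls_per_query
    cost = total_calls / 1000 * price_per_1000_calls
    return total_calls, cost


calls, cost = estimate_cost(
    n_terms=1,
    n_locations=23_000,
    calls_per_query=21,         # placeholder: 1 search call + 20 detail calls
    price_per_1000_calls=10.0,  # placeholder price, not a real rate
)
print(f"{calls:,} calls, about ${cost:,.2f}")  # 483,000 calls, about $4,830.00
```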
DATA CLEANUP
Finally, the resulting data set is cleaned. Some results may not be relevant content. Duplicate information may also exist. Code can perform some of the cleaning, but some data may need to be examined contextually. A balanced approach of “good enough” can be applied where one has approached a point of diminishing returns. Any remaining non-relevant data can be removed later.
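Duplicate removal is one of the cleaning steps that code handles well. A minimal sketch, assuming each record is a dictionary with `name` and `phone` keys: two records are treated as the same business when their names match after ignoring case and punctuation and their phone numbers match after ignoring formatting. The matching rule is an assumption; other data sets may call for comparing addresses or websites instead.

```python
import re


def normalize_key(record):
    """Build a comparison key that ignores case, punctuation, and phone formatting."""
    name = re.sub(r"[^a-z0-9]+", " ", record.get("name", "").lower()).strip()
    phone = re.sub(r"\D", "", record.get("phone", ""))
    return (name, phone)


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    unique = {}
    for record in records:
        unique.setdefault(normalize_key(record), record)
    return list(unique.values())


records = [
    {"name": "Escape Hub", "phone": "(512) 555-0100"},
    {"name": "ESCAPE HUB!", "phone": "512-555-0100"},
    {"name": "Lock & Key", "phone": ""},
]
print(deduplicate(records))
```

Near-duplicates that differ in more than formatting, such as a renamed business, still need the contextual review described above.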
Note that global or multi-regional data may contain multiple languages. If translation into a single language is desirable, Google Translate can be handy.
Keep in mind that data from any data source, including the best search sources, may not be completely clean or up to date. The data reaches its best state of completeness and relevancy once businesses claim their listings and keep them current.
See the results of this effort in our directory eBook:
Have you built a directory? Did we miss anything? Please comment below (must be logged into your account) and let us know your thoughts!