Hi folks,

We’re interested in using the Gainsight Federated Search to extend the search tool to our other websites and resources. 

As I understand it, you supply information about other links using this JSON structure:

    {
      "title": "string",
      "content": "string",
      "url": "string",
      "source": "string"
    }
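For context, a minimal sketch of what sending one of these records might look like; the endpoint URL, token, and field values below are placeholders, not real Gainsight details:

```python
# Hypothetical sketch of indexing one item. The endpoint URL and token
# are placeholders; substitute the real values from your Gainsight
# Federated Search API credentials.
import json
import urllib.request

API_URL = "https://example.com/federated-search/index"  # placeholder
API_TOKEN = "YOUR_TOKEN"  # placeholder

payload = {
    "title": "Getting started",
    "content": "Full page text goes here...",
    "url": "https://docs.example.com/getting-started",
    "source": "Documentation",
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_TOKEN}",
    },
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually send
```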

My question is, how do you retrieve this information automatically to start with? 

For example, if I want to incorporate our documentation, there are more pages than I want to catalogue manually, plus the content changes on a daily basis.

So… how do I retrieve the information in an automated process? Using a web crawler?

If anyone else uses the Federated Search API I’d love to hear how you’re going about this or if you have a recommended crawler/bot.
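One way to build that URL list without cataloguing pages by hand, assuming the documentation site publishes a standard sitemap.xml (many do), is to parse the `<loc>` entries out of it. A sketch with a made-up sitemap:

```python
# Sketch: derive the list of pages to index from a sitemap instead of a
# manual catalogue. The sample sitemap and URLs here are invented.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/page-1</loc></url>
  <url><loc>https://docs.example.com/page-2</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
# → ['https://docs.example.com/page-1', 'https://docs.example.com/page-2']
```

Re-fetching the sitemap on a schedule would also pick up new and removed pages, which helps with content that changes daily.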

Cheers

Mark

We’ve been using it since it was in beta 4 years ago and it’s one of my favorite things about Gainsight CC, and it’s even better with the new search improvements. 

However, with the latest enhancements (relevance-based rankings), you very much want to prioritize getting the full content into the content payload, so it’s great you’re asking this question. (Previously we only cared about the ~400-character meta description, but the search algorithm evaluates all of the content, and you can send many thousands of characters via the API. Without the full content, your federated content is at a major disadvantage in the algorithm’s relevancy scoring.)

I recently updated my automation (Zapier) to pull all the content from our documentation site. I’ll share it below, but note that to do this you need a list of everything that you want to crawl (I’m not sure how to generate that from scratch; I had the advantage of using the list of what we had previously indexed). Google Gemini was my thought partner in figuring out how to automate this with Zapier.

  1. GET API call to pull in the content from the URL
    • Gemini helped me figure out how to do this, which is possible for publicly available web pages
  2. Formatter step to convert to plain text (strip out html)
  3. A few steps to split out the header and footer content on each page, leaving only the relevant content
  4. An AI step to write a short summary (what shows in search) 
  5. A step to combine the AI summary with all of the content (with some spaces in between to ensure the full content doesn’t show as part of the summary)
    • I also have a step to repeat the title 10x in the content to try to juice those keywords in the relevancy score for searches related to it.
  6. The actual API call to index the content for federated search
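Outside of Zapier, steps 1, 2, 4, and 5 above could be sketched roughly like this (every name here is illustrative; step 3 is site-specific and skipped, and the AI summary is stubbed with a simple truncation):

```python
# Rough sketch of steps 1, 2, 4 and 5. The HTML stripping is deliberately
# crude; a real HTML parser would do a better job.
import re
import urllib.request

def fetch_page(url: str) -> str:
    """Step 1: GET the raw HTML of a publicly available page."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def strip_html(html: str) -> str:
    """Step 2: crude plain-text conversion (drop scripts, styles, tags)."""
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def summarize(text: str, limit: int = 400) -> str:
    """Step 4 stand-in: first ~400 characters as the search summary."""
    return text[:limit]

def build_content(text: str) -> str:
    """Step 5: summary first, blank lines, then the full text for relevancy."""
    return summarize(text) + "\n\n\n\n" + text

html = "<html><body><h1>Widgets</h1><p>How to configure widgets.</p></body></html>"
print(build_content(strip_html(html)))
```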

Sending the API call is quite tricky because the content is part of a key:value pair in the JSON payload of the API operation, so things like line breaks and certain characters can cause errors.
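One way to sidestep those escaping errors, if whatever tool you use can serialize rather than concatenate strings, is to build the payload as an object and let a JSON library do the escaping. In Python, for instance:

```python
# Line breaks, quotes and backslashes inside the content must be escaped
# in the JSON payload; json.dumps handles all of this automatically.
import json

content = 'Line one\nLine two with "quotes" and a backslash \\'
payload = json.dumps({"content": content})
print(payload)
# the newline becomes \n and the quotes become \" in the serialized string
```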



Here are some additional considerations:

Hey @FMEEvangelist what are those other sources running on in terms of tools/tech?

We do it for our SharePoint site. The first step is to pull the data via the SharePoint API and then structure it to send to Gainsight via the Gainsight API. All of this is run through loops in Power Automate.


Hey @FMEEvangelist what are those other sources running on in terms of tools/tech?

I wish it were all one platform, but it’s a bit of a mix: knowledge base articles on Zendesk, training materials on Skilljar, and our documentation team uses MadCap, I believe.

I use our own automation products (FME, like Power Automate, but a bit more visual) but it still gives me the heebie-jeebies to think about connecting up all these systems for what should be a small-scale project!

But their content is all delivered through various websites, which is why I figured a web crawler might help. I just don’t have any experience with that 🤷


We’ve been using it since it was in beta 4 years ago and it’s one of my favorite things about Gainsight CC, and it’s even better with the new search improvements. 

Yes, it’s the new search improvements that make us want to integrate the full federated search. Currently we’re integrating Coveo, and it’s caused a few problems.

That’s very useful to know about the content. I wasn’t entirely sure how it’s all meant to work under the covers, so knowing I need to get the full content in there is priceless.

I saw your previous post, and it was very insightful too. I figure I can scrape the content and do the automation/API part without much problem (we don’t use Zapier, but still), but it’s generating the list of items to include that has me stumped. 

I can only imagine what our documentation team would say if I asked them to fill in a Google sheet for each new or updated item! I might be able to get notifications from Github, but that isn’t a direction I want to take.

I don’t know why I didn’t ask an AI for suggestions! I will see what it can suggest.


@FMEEvangelist I think all of these have their own API that you could use to pull content. The Google Sheet can be a first step to see how it works, but it can all be automated through your FME tool. You just need to ask for the API credentials :-)


Hi @FMEEvangelist 👋,

You probably already know, but we do have out-of-the-box integrations for federated search with Zendesk and Skilljar:

 

So you’d only need to use our Federated Search API for your documentation in MadCap. Does MadCap have an API where you can retrieve all of the documentation?