
Chapter 1 Quickstart

This section describes how content is crawled and indexed by Krugle Enterprise. Once your data is crawled and indexed, it can then be searched by your organization's users using the client features in Krugle Enterprise. Krugle "Projects" define the collections of content that are searchable with Krugle Enterprise. Krugle Projects can be defined by an Administrator using the Krugle Administration Console.

Projects can refer to a single file or any/all files in one or more data repositories. A data repository can be a file system, a source code management system, an issue tracking system, a database or any other information system that can be accessed via a Krugle SCM Connector.

Factors to consider when setting Krugle Projects

The first step in setting up your Krugle Projects is deciding which content you want to define in Krugle Enterprise. Some of the factors that should be considered in defining individual Krugle Projects include:

Access control requirements.
Update (change) frequency of the content.
The need to easily distinguish one body of content from a similar (branched/forked) set of files when searching with Krugle.
Granularity of file modification and user access reports.
Whether or not files from certain sources should be listed ahead of (or below) similar files in Krugle search results.
The need to easily filter search results by specific versions of version managed content.
The need to identify persons or groups responsible for the creation, maintenance or protection of content.
Descriptions that will best result in a match between user queries and the content.

Importance of defining Krugle Projects

The following scenarios illustrate how Projects in Krugle can be defined to ensure good search and reporting results:

When Krugle users search on content, they can limit their search to include or exclude specified Projects.
When Projects are defined, the Project creator can specify access control settings for each project. This ensures that only those users with proper credentials will be able to access content in the project.
The Krugle overview page is organized by Projects. This allows you to see project activity trends related to the Projects you define.
You can define a Project to control the scope of one or more of the reports in Krugle Enterprise.
When users search on code they will see files from "boosted" Projects near the top of the list. Administrators can set the Project Rank priority to increase (or decrease) the visibility of selected Projects. This allows companies to ensure that code libraries, reuse components and other valued content collections have appropriate visibility in search results.
If organizations have multiple releases or versions of a particular body of code or content, they can define each version as a separate Project. When you do this, it is best to include the version identifier in the Krugle Project Name so that users who later find this information can clearly tell the projects apart.
Each content record or file has an easily accessible hypertext link to project information. This link gives users quick access to descriptions and references that help them fully understand the context of the content found in Krugle. As a result, users can quickly locate related source content, relevant documentation, and contacts for the content in question.
User activity (views, downloads, etc.) can be aggregated by Project to monitor activity for reporting.
Each Project contains a description, which is used to help match search queries to content. The more distinctive terms a project description contains, the more likely it is that user queries for those terms will return relevant matches.

Projects are most commonly defined as collections of standalone content files, code or data records, or libraries of content. The files in such a group are usually interrelated, and are maintained and accessed as a set. Most importantly, these file groupings fit the logical context and expectations of the users who will be using Krugle.

Once you've decided how to organize your code into projects, collect the information needed to define each Project. At a minimum, the following information is required for each Project:

A Project name that will allow you and your users to identify your content group in Krugle.
Information (network domains, transfer protocols, access credentials, etc.) that Krugle Enterprise needs to access the systems that manage the files that will be included in the Project.
A Key Name description of the Project.

Note

If you lack the time or information needed to group content collections into individual projects, the simplest way to make your information searchable in Krugle is to associate a single Krugle Project with all files in each file system, repository or version control system.

The downside of having only one Project per content repository is that less information (in the form of project metadata) is available to create effective queries and refine code search results.

Note

You can add or remove Projects at any time after the initial configuration. This will allow for progressive refinement of Projects managed by Krugle Enterprise.

Setting up SCMI Connectors

It is first necessary to configure and install the appropriate SCM Connector on a host system within your organization's network. Consult the Krugle SCM Connector Deployment Guide for more information.

Krugle Enterprise Data Access Mechanisms

For a given data repository, Krugle Enterprise accesses files using a Krugle SCM Connector. The SCM Connector approach is not limited to specific SCM systems, as it can provide access to file systems, issue tracking systems, and other non-SCM sources of data. SCM Connectors run outside of the Krugle Administration Console, and are installed and configured separately. The only information required by Krugle Enterprise for a Data Repository is the information needed to access the SCM Connector.

Krugle Basic Note - To download and install SCMI connectors, please contact Krugle Basic Support.

Defining Projects in Krugle Enterprise

This section explains how Krugle Projects are defined, using either the Krugle Enterprise UI (interactive) or via a mass import file.

Interactive Entry of Project Information

The easiest way to define Krugle Projects is to manually enter their specifications, one project at a time. This is the interactive approach, versus the mass import approach which is described below.

Creating a New Krugle Project

To define a Krugle Project interactively, first sign in to the Krugle Enterprise Console and navigate to the Projects section:

Sign in to the Krugle Administration Console from an internet browser.
Enter the host name URL assigned to Krugle Enterprise, with ":8080" or ":admin" appended to the end of the URL. The Sign in dialog of the administration console will appear.
Enter valid administration credentials (e.g. those specified during initial installation of Krugle Enterprise) to Sign In to Krugle.
Click the Projects tab.
Click the "Add New Project" link located in the upper right corner of the Projects Summary page.

Specify Project Metadata

From the Add/Edit Project page enter a name for your Project.
OPTIONAL: Click the Advanced Settings link to access optional metadata fields for your Project. Complete the information in these fields to control how project results are ranked in Krugle search results and to help users access key online information related to this project. This section also allows you to specify access control rules for information in the Data Repository and set the frequency of automatic updating. See the Project Fields table below for a description of each field.
Click the Next button.

Add a Data Set to your Krugle Project

A Krugle Project consists of one or more Data Sets. A Krugle Data Set is defined as a reference to (i) a single Data Repository and (ii) a Data Set Location within that Data Repository.
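The relationship between Projects, Data Sets and Data Repositories can be pictured with a small data model. This is only an illustrative sketch: the class and attribute names are hypothetical and are not part of any Krugle API, and the example locations are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataRepository:
    """Hypothetical model of one Data Repository (reached via an SCM Connector)."""
    name: str
    host: str


@dataclass(frozen=True)
class DataSet:
    """A reference to one Data Repository plus a location inside it."""
    alias: str
    repository: DataRepository
    location: str = ""  # some Data Repositories may not need a location


@dataclass
class Project:
    """A named, searchable collection made of one or more Data Sets."""
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

    def add_data_set(self, data_set: DataSet) -> None:
        # Data Set names are described as unique per Project, so reject repeats.
        if any(ds.alias == data_set.alias for ds in self.data_sets):
            raise ValueError(f"duplicate Data Set alias: {data_set.alias}")
        self.data_sets.append(data_set)


# Two Data Sets in one Project, both drawn from the same Data Repository.
repo = DataRepository(name="main-scm", host="localhost")
project = Project(name="Payments 2.1")
project.add_data_set(DataSet("payments-trunk", repo, "/payments/trunk"))
project.add_data_set(DataSet("payments-docs", repo, "/payments/docs"))
print([ds.alias for ds in project.data_sets])
```

The point of the sketch is that a Data Repository is defined once and can be reused by many Data Sets, while each Data Set belongs to one Project.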

After specifying the Project Name and optional metadata, you must define the Data Sets for the Project. The first step in adding a Data Set is to specify a Data Repository.

If you haven't created a Data Repository for the data you want searched in Krugle, select "Create New Data Repository" from the Data Repository dropdown list. If you wish to use a Data Repository that you've already created, select it from the Data Repository dropdown list.
Then, follow the appropriate instructions below for either (i) a new Data Repository or (ii) an existing Data Repository.

Creating a Data Set with A New Data Repository

When creating a new Data Repository, set the fields as described by the Data Repository Fields table.

Enter the "Data Repository Host Location".
Click the "Next" button.
Enter the Data Repository name, the optional Login/Password information, the Path and the Port.
Select the Connection type.
Click the "Save" button.

Once the Data Repository has been created, follow the steps described below, by first selecting this Data Repository from the dropdown list.

Creating A Data Set with An Existing Data Repository

If the Data Repository that you want to use for your Data Set has already been defined in Krugle:

Select the Data Repository name from the dropdown list in the "Add a Data Set to this project" area.
Enter the Data Set specification in the field(s) beneath the name just selected, as described by the Data Set Fields table.
Click the "Add Data Set" button. This will submit and add this Data Set to the Project - as indicated by a list entry in the upper "Data Sets" section of the Add/Edit project page. Note that more than one Data Set can be added to each named Project in Krugle.

OPTIONAL Project Information Mass Import

The Mass Import feature allows an Administrator to upload the definitions for multiple Projects with a single action. It is recommended that you verify proper operation of Krugle Enterprise and familiarize yourself with the interactive Project definition (previous section) before using the Mass Import feature. When importing a large number of projects, it is also recommended that you divide the mass import project collection into smaller groups, organized by repository. Start by importing several projects in a single file and increase the number of projects per mass import file as you progress.

To use Mass Import:

Create a list of the Projects that you want Krugle Enterprise to manage.
For each Project on the list, collect the Project Information described in the next section, "Project Information".
Assemble all Project information in a mass import CSV file. An Excel template is available. To access the template, go to the Projects section, click the Mass Import button and then click the "Download Sample Import Demo" link. Open the template in Excel, enter the project information and then choose "Save As...". Select CSV and respond "Yes" when asked if you want to keep the workbook in the CSV format (and leave out incompatible features).
Convert/save the table to .csv format (if needed).
Click the Projects tab.
Click the Mass Import button.
Click the Browse button and specify the .csv file that contains the Project information.
Click the Import button.

Sample mass import file

We have provided a sample mass import file to use for testing, and as a template for your own projects. To use this Sample Mass Import file:

Download the Sample Mass Import file.
Optionally edit the file.
Click the Projects tab.
Click the Mass Import button in the upper right portion of the page.
Click the Browse button and select this file.
Click the Import button.

Note

If a Mass Import file contains an exact duplicate of a Project that is already defined in Krugle, the instance of the duplicated Project in the Mass Import file will be ignored during the Mass Import process. If a Mass Import file contains an instance of a Project that is already defined in Krugle, and the file's information differs from what already exists, the information from the Mass Import file will be used to update the Project.
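The duplicate-handling rule in this note can be previewed before uploading a file. The following is a minimal sketch of that rule, not Krugle's implementation: the function names are hypothetical, rows are plain dictionaries keyed by mass import column name, and it assumes (based on Project names not being case sensitive) that two names differing only in case refer to the same Project.

```python
def normalize(row):
    """Copy a row with the Project name lower-cased for comparison."""
    out = dict(row)
    out["Project name"] = out["Project name"].lower()
    return out


def plan_mass_import(existing_rows, incoming_rows):
    """Return (action, project_name) pairs: create, ignore or update."""
    existing = {}
    for row in existing_rows:
        norm = normalize(row)
        existing[norm["Project name"]] = norm

    plan = []
    for row in incoming_rows:
        candidate = normalize(row)
        current = existing.get(candidate["Project name"])
        if current is None:
            action = "create"   # not yet defined in Krugle
        elif current == candidate:
            action = "ignore"   # exact duplicate of an existing Project
        else:
            action = "update"   # same Project, different information
        plan.append((action, row["Project name"]))
    return plan


existing = [{"Project name": "Payments", "Rank": "NORMAL"}]
incoming = [
    {"Project name": "payments", "Rank": "NORMAL"},
    {"Project name": "Payments", "Rank": "HIGH"},
    {"Project name": "Billing", "Rank": "LOW"},
]
for action, name in plan_mass_import(existing, incoming):
    print(action, name)
```

A dry run like this can help when splitting a large import into smaller files, since it shows which rows would change existing Projects.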

Mass Import Project Information

Information about each field can be found in the Project Fields, Data Repository Fields, and Data Set Fields tables. These are all of the required fields:

Project Name
Data Repository Name
Alias for Data Set

If the Data Repository Name has not already been created by the Krugle Enterprise Administrator, then the following additional fields are required in the first row of the Mass Import file that uses the (new) Data Repository Name (subsequent rows can leave these fields blank):

Host
Root path
Port
Connection Type
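As a rough illustration of the rule above, the following sketch writes a two-row mass import file with Python's standard csv module. The column names are taken from the "Mass Import Column Name" entries in the tables below; the exact header spelling and column order expected by Krugle should be checked against the downloadable sample file, and the project names, repository name and locations used here are invented for the example.

```python
import csv
import io

# Column names as listed in the field tables; the order is an assumption.
COLUMNS = [
    "Project name", "Data Repository name", "Data Set name",
    "SCMI Data Set Location", "Host", "Root path", "Port", "Connection type",
]

rows = [
    # First row that uses a new Data Repository carries its connection details.
    {"Project name": "Payments 2.1", "Data Repository name": "main-scm",
     "Data Set name": "payments-trunk",
     "SCMI Data Set Location": "/payments/trunk",
     "Host": "localhost", "Root path": "/repository", "Port": "80",
     "Connection type": "HTTP"},
    # Later rows that reuse the same Data Repository can leave them blank.
    {"Project name": "Billing 1.0", "Data Repository name": "main-scm",
     "Data Set name": "billing-trunk",
     "SCMI Data Set Location": "/billing/trunk"},
]


def write_mass_import(rows, out):
    """Write rows as CSV, filling any missing column with an empty string."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_mass_import(rows, buffer)
print(buffer.getvalue())
```

Generating the file from a script rather than by hand makes it easier to start with a few projects and grow the file gradually, as recommended earlier.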

Project Fields

This table is a complete list of all fields that can be defined for a Project. These fields are in addition to the one or more Data Sets that specify what content will be indexed as part of the Project.

Field	Mass Import Column Name	Description
Project name	Project name	A name that uniquely identifies a collection of content in Krugle Enterprise. This Project name can be used as a query filter by the end user and will be used in Project based reports and analysis. Whenever possible, use a descriptive name that will be familiar to users. A unique project name is required for each Project. Note: Krugle Project names are NOT case sensitive.
Rank priority	Rank	(Optional, defaults to NORMAL) When users search on code they will see files from "boosted" Projects near the top of the list. Administrators can set the Project Rank priority to increase (or decrease) the visibility of selected Projects. This allows companies to ensure that code libraries, reuse components and other valued content collections have appropriate visibility in search results. Setting the Rank to IGNORE will prevent the project from showing up in search results. Possible values are IGNORE, LOW, NORMAL, and HIGH.
Crawl frequency	Crawl frequency	(Optional, defaults to DAILY) How often the SCM Connector for each Data Set is queried to get updates. Possible values are TWICE_AN_HOUR, HOURLY, TWICE_A_DAY, DAILY, TWICE_A_WEEK, WEEKLY, TWICE_A_MONTH, MONTHLY, and ONCE.
First Update	First Update	(Optional, defaults to the Project's creation time) Sets the earliest time for the first update after the initial sync. This is useful for delaying the first update until after a very large project has finished its initial crawl, as otherwise updates can "stack up" as they wait for the initial crawl to complete.
Description	Description	(Optional, defaults to empty) This is a human readable description of the content in this Project. A one to two paragraph summary of the Project's capabilities, technologies, related Projects, Project dependencies, etc. will help future users of the Project better understand and use the information contained in this Project. The use of unique terms in the description will improve search matching for those unique terms.
Homepage URL	Homepage URL	(Optional, defaults to empty) The homepage or project page for this Project. Use this optional URL reference to provide users with one-click access to non-code related information, the Project wiki, etc.
Documentation URL	Documentation URL	(Optional, defaults to empty) This reference URL can be used to direct users to specifications, reference documentation and similar Project documents.
Knowledgebase URL	Knowledge Base URL	(Optional, defaults to empty) This reference URL can provide users with a shortcut to an appropriate knowledge base from the Project description page. The knowledge base can reference information such as development notes, hints, tips or discussions.
Bug database URL	Bug database URL	(Optional, defaults to empty) This reference URL can provide users with a shortcut to the Project bug database from the Project description page.
Owner	Owner	(Optional, defaults to no owner) A person who can be contacted with questions or issues about this Project. It is usually recommended that you enter the email address of the person responsible for the Project.
License	License	(Optional, defaults to no license) A code license type to be associated with all files in the Project. Typically this is used for open source code that your organization is using, so that users will know what restrictions are placed on the code that they find.
Access Control	Access control	(Optional, defaults to --Unrestricted--) This setting specifies the groups (typically LDAP-based) that have access to this Project. In order to see or access files in a particular Project, a user must belong to one or more groups listed in this setting. The default setting (--Unrestricted--) will allow all users to access the Project.
<not supported during manual definition>	Key name	(Optional, defaults to a hash code based on the Project's required fields) The unique identifier for the project. This can be up to 128 characters - lower case letters, numbers, underscore and hyphen are allowed.
<not supported during manual definition>	Disable	(Optional, defaults to false) Disabling a project prevents it from being updated, and removes it from search results. To disable a project, first create it, then select it in the Projects Summary list, and click the Disable button at the top of the list.
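Several of the fields in this table accept only a fixed set of values, so a mass import row can be checked before upload. The sketch below is an assumption-laden illustration: the function name is hypothetical, it treats the listed values as case sensitive (which the table does not state), and it applies the documented defaults when a field is left empty.

```python
import re

RANK_VALUES = {"IGNORE", "LOW", "NORMAL", "HIGH"}
CRAWL_VALUES = {
    "TWICE_AN_HOUR", "HOURLY", "TWICE_A_DAY", "DAILY", "TWICE_A_WEEK",
    "WEEKLY", "TWICE_A_MONTH", "MONTHLY", "ONCE",
}
# Up to 128 characters: lower case letters, numbers, underscore and hyphen.
KEY_NAME_RE = re.compile(r"[a-z0-9_-]{1,128}")


def check_project_row(row):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    if not row.get("Project name", "").strip():
        problems.append("Project name is required")

    rank = row.get("Rank") or "NORMAL"            # documented default
    if rank not in RANK_VALUES:
        problems.append(f"unknown Rank: {rank}")

    freq = row.get("Crawl frequency") or "DAILY"  # documented default
    if freq not in CRAWL_VALUES:
        problems.append(f"unknown Crawl frequency: {freq}")

    key = row.get("Key name", "")
    if key and not KEY_NAME_RE.fullmatch(key):
        problems.append(f"invalid Key name: {key}")
    return problems


print(check_project_row({"Project name": "Payments 2.1", "Rank": "HIGH"}))
print(check_project_row({"Project name": "Billing", "Rank": "URGENT"}))
```

An empty Key name is accepted here because the table says it defaults to a generated hash code.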

Data Repository Fields

This table is a complete list of all fields that can be defined for a Data Repository.

Field	Mass Import Column Name	Description
Data Repository Host Location	Host	The network address for the server that hosts the SCM Connector (for example, localhost or 192.169.25.231).
Data Repository name	Data Repository name	Unique name to identify the data repository.
Login	Login	(Optional) If the SCM Connector is configured to require authentication, this is the user name required when logging into the SCM Connector.
Password	Password	(Optional) If the SCM Connector is configured to require authentication, this is the password required when logging into the SCM Connector.
Path	Root path	The path portion of the URL used to access the SCM Connector. By default this will be "/repository", unless the SCM Connector has a custom configuration.
Port	Port	The port used to access the SCM Connector. By default this will be 80 for HTTP, 443 for HTTPS, and 22 for SSH, unless the SCM Connector has a custom configuration.
Connection type	Connection type	The protocol used to talk to the SCM Connector. Options are HTTP, HTTPS, and SSH.

The "Data Repository type" field is only used by older versions of Krugle (V5 and earlier) during mass import, and if it exists in a mass import file being processed by Krugle V6 or later, it must be empty or set to "SCMI".
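The Port and Path defaults described in the Data Repository Fields table can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and the scheme://host:port/path form of the result is an assumption used for illustration rather than a documented Krugle format.

```python
DEFAULT_PORTS = {"HTTP": 80, "HTTPS": 443, "SSH": 22}
DEFAULT_ROOT_PATH = "/repository"


def connector_endpoint(host, connection_type="HTTP", port=None, root_path=None):
    """Combine Data Repository fields into one address string.

    Missing Port and Path values fall back to the defaults listed in the
    Data Repository Fields table.
    """
    ctype = connection_type.upper()
    if ctype not in DEFAULT_PORTS:
        raise ValueError(f"unsupported connection type: {connection_type}")
    resolved_port = DEFAULT_PORTS[ctype] if port is None else int(port)
    path = root_path or DEFAULT_ROOT_PATH
    if not path.startswith("/"):
        path = "/" + path
    return f"{ctype.lower()}://{host}:{resolved_port}{path}"


print(connector_endpoint("localhost"))
print(connector_endpoint("scm.example.com", "HTTPS", root_path="custom"))
```

If the SCM Connector has a custom configuration, pass its port and path explicitly instead of relying on the defaults.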

Data Set Fields

This table is a complete list of all fields that can be defined for a Data Set. A Data Set is based on a Data Repository, with additional fields to specify which project or sub-set of data available via the Data Repository is part of the Data Set.

Field	Mass Import Column Name	Description
Location	SCMI Data Set Location	This field defines the location of the Data Set's content within the Data Repository. For some Data Repositories it might not be required - see SCM Connectors for per-SCM Connector details.
Location Alias	Alias for SCMI Data Set Location	This is currently not supported by Krugle V5 or V6.
Parameter	Params for SCMI Data Set Location	This field defines additional parameters that are used when querying the Data Repository for the Data Set's content. For some Data Repositories it might not be required - see SCM Connectors for per-SCM Connector details.
Alias	Data Set name	The name of this Data Set (unique per-project). Up to 64 characters - lower case letters, numbers, underscore and hyphen are allowed. This is used as part of the URL path to Data Set files.

The "Data Set Location" field is only used by older versions of Krugle (V5 and earlier) during mass import, and if it exists in a mass import file being processed by Krugle V6 or later, it will be ignored.

Krugle Administration Guide V6
