11 November 2010

MOSS 2007 - Windows Authentication to FBA using LDAP Authentication Provider

I am managing a web application based on SharePoint 2007. The web application works in an extranet model where users access it over the internet; no anonymous access is allowed, and only authenticated users are granted access. We are using the default Windows authentication, with Active Directory as the user store. When users access the site, they are prompted with the standard Windows authentication dialog asking for username and password.



Since we have a password expiry policy on the domain, among other reasons, as administrators we used to receive a lot of password-reset requests. We thought of providing users with a Forgot Password feature, but the challenge was how to provide this in the current setup, as Windows authentication does not offer much flexibility for customizing the authentication mechanism or adding links such as Forgot Password.

Therefore we decided to switch from Windows authentication to forms-based authentication, so we could have a login page to host the Forgot Password functionality. At the same time we had to make sure we did not introduce new usernames and passwords; simply put, we had to keep using the existing user store, which was Active Directory.

The solution for the Forgot Password functionality is not part of this article; I will try to cover it in a separate post.

Since there is only one zone for our web application, we didn't find the need to extend the application. Below are the steps we took to achieve this.

1. We put the following configuration in the web.config (system.web section) of our web application.



<membership defaultProvider="LdapMembership">
  <providers>
    <add name="LdapMembership"
         type="Microsoft.Office.Server.Security.LDAPMembershipProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C"
         server="DomainControllerIP" port="389" useSSL="false"
         userDNAttribute="distinguishedName" userNameAttribute="sAMAccountName"
         userContainer="DC=XYZ,DC=COM" userObjectClass="person"
         userFilter="(ObjectClass=person)" scope="Subtree"
         otherRequiredUserAttributes="sn,givenname,cn" />
  </providers>
</membership>
<roleManager enabled="true" defaultProvider="LdapRoleProvider" cacheRolesInCookie="false" cookieName=".PeopleDCRole">
  <providers>
    <add name="LdapRoleProvider"
         type="Microsoft.Office.Server.Security.LDAPRoleProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C"
         server="DomainControllerIP" port="389" useSSL="false"
         groupContainer="DC=XYZ,DC=COM" groupNameAttribute="cn"
         groupMemberAttribute="member" userNameAttribute="sAMAccountName"
         dnAttribute="distinguishedName" groupFilter="(ObjectClass=group)"
         scope="Subtree" />
  </providers>
</roleManager>




* Please change the Server, Port, UserContainer and GroupContainer according to your environment.

2. We then put the following configuration in the web.config (system.web section) of the Central Administration web application.



<membership defaultProvider="LdapMembership">
  <providers>
    <add name="LdapMembership"
         type="Microsoft.Office.Server.Security.LDAPMembershipProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C"
         server="DomainControllerIP" port="389" useSSL="false"
         userDNAttribute="distinguishedName" userNameAttribute="sAMAccountName"
         userContainer="DC=XYZ,DC=COM" userObjectClass="person"
         userFilter="(|(ObjectCategory=group)(ObjectClass=person))" scope="Subtree"
         otherRequiredUserAttributes="sn,givenname,cn" />
  </providers>
</membership>
<roleManager enabled="true" defaultProvider="AspNetWindowsTokenRoleProvider">
  <providers>
    <add name="LdapRoleProvider"
         type="Microsoft.Office.Server.Security.LDAPRoleProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C"
         server="DomainControllerIP" port="389" useSSL="false"
         groupContainer="DC=XYZ,DC=COM" groupNameAttribute="cn"
         groupMemberAttribute="member" userNameAttribute="sAMAccountName"
         dnAttribute="distinguishedName" groupFilter="(ObjectClass=group)"
         scope="Subtree" />
  </providers>
</roleManager>



* Please change the Server, Port, UserContainer and GroupContainer according to your environment.

3. Next, we went to Central Administration -> Application Management -> Application Security -> Authentication Providers.

4. Using the web application selector on the right-hand side, select your web application.
5. Click the existing zone name, Default in this case.
6. Configure the following settings on the next screen:
  • Authentication Type = Forms
  • Membership Provider Name = LdapMembership
  • Role Manager Name = LdapRoleProvider
7. Next you need to reconfigure the site collection administrators for your sites based on the new membership provider you just configured. For this, go to Central Administration -> Application Management -> SharePoint Site Management -> Site Collection Administrators.

8. The last step is to reconfigure the SharePoint security settings you have applied: wherever you have assigned permissions to an AD user or group, replace them with the same users and groups resolved through the new membership and role providers.

9. You are all good to go now; simply access the web application and you will be welcomed by a login page.


* The only thing that changes for end users is that they no longer need to include the domain name when signing in; it is simply username instead of domain\username.

26 May 2010

Been busy :S

The past few weeks have been extremely busy and I didn't have time to research any new posts. I also ordered some books from Amazon for work, but it's been over a month past their supposed delivery date and I am still waiting. Anyway, I have promised myself to give my blog a bit more time, so I will post something useful soon; in the meantime, enjoy my new office's view.

28 March 2010

Database Architecture with SQL Server 2008 - Part 3, File Groups, Files & Partitions

In our last post we discussed the different storage options we have and which is best in which situation. I also said in that article that most of the material discussed there lies outside the area of databases itself, as storage is mainly handled by the operating system; the database literally doesn't care whether it is a physical drive or a network drive, as long as it can read and write to it. That said, a database's main job is to store and retrieve data, and in this article we will discuss how SQL Server does that.

SQL Server stores its data in files, of which there are two types:
  1. Data File -- Where the actual data is stored. (.mdf, .ndf file extension)
  2. Log File -- Where the transaction log is maintained. (.ldf file extension)
SQL Server organizes its data into a data structure called "data pages"; each data page is 8 kilobytes. A data file consists of a series of such data pages, and the actual data is spread across them. If the data in a table row is more than 8 KB, it is spread across multiple data pages; otherwise the same data page can be used for multiple rows of data. For large columns such as varchar(max) or BLOBs, the data is not stored in the same data page as the rest of the row's data; instead, an address pointer is stored, which points to the data page where the actual data for that column of the row is stored.

SQL Server gives you the flexibility to configure the storage location of these files, which is pretty obvious, and their behavior, which is not so obvious. To understand this, first let me explain what a file group is.

File Group

A file group is a collection of data files that serves as a logical storage unit for the database. When you define database objects that are physically stored, such as tables and indexes, you specify the file group to use for the object. There can be multiple file groups associated with a database, and each file group can have multiple files in it. So the big question is: what do we achieve by doing so?

In my previous posts I have already explained that storage is finite, that small files perform better than large files, and that distributed information is faster to access than information consolidated in one single file. Making these presumptions the basis of our argument, it is fair to say that a database with multiple data files located on different physical drives will perform better, and allow better capacity planning, than a database with a single data file.

One thing to note here is that the log file does not belong to a file group, and normally there is one log file per database. The log file, as the name implies, stores a log of all the changes (inserts, updates, deletes) made to the database; it is used by DBAs to track changes and roll back if required. Log files are also utilized in certain types of backups.

How File Groups & Files Work

Now that we understand what file groups and files are, let us focus on how they work. When you define a file group there are two important properties you configure (a T-SQL sketch of both follows the list below).

  1. Read-Only -- Specifies that the data in objects built on this file group cannot change. Normally this is set later on, not at database-creation time. It is used for lookup tables or historical-data tables where you want to ensure there are no modifications.
  2. Default -- There can be only one default file group. If you do not specify a file group at object-creation time, the default one is used. The default file group also contains some metadata about the database that the other file groups do not have.
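Both properties can also be set with T-SQL. A minimal sketch, assuming a database named SampleDB with file groups named ARCHIVE and SECONDARY (all names are illustrative, and the two statements are independent examples, since the default file group cannot itself be read-only):

-- Prevent modifications to objects stored on the ARCHIVE file group
ALTER DATABASE SampleDB MODIFY FILEGROUP [ARCHIVE] READ_ONLY;

-- Make SECONDARY the file group used when no file group is specified
ALTER DATABASE SampleDB MODIFY FILEGROUP [SECONDARY] DEFAULT;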




In the example above I have created two file groups, PRIMARY and SECONDARY. Once the file groups are defined, we create files under them.



As you can see, there are four files here: three of them are data files with file type "Rows Data", while one is a log file. I have associated two files with the PRIMARY file group and the third with the SECONDARY file group. The four important configurations for files are:
  1. File Group -- Which file group this file belongs to.
  2. Initial Size -- The size the file starts with when the database is created.
  3. Auto Growth -- How the file grows when the data in the file reaches the file size.
  4. Path -- Where the file will be stored.
In the example we specified the initial size to be 2 MB for the data files and 1 MB for the log file; in the image below you can see the files created with these sizes.
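The same setup can also be scripted rather than clicked through the Management Studio dialogs. A minimal T-SQL sketch using the 2 MB / 1 MB sizes from the example (the database name SampleDB and the file paths are illustrative):

-- Two files on PRIMARY, one on SECONDARY, plus the log file
CREATE DATABASE SampleDB
ON PRIMARY
    (NAME = SampleData1, FILENAME = 'C:\SQLData\SampleData1.mdf',
     SIZE = 2MB, FILEGROWTH = 1MB),
    (NAME = SampleData2, FILENAME = 'D:\SQLData\SampleData2.ndf',
     SIZE = 2MB, FILEGROWTH = 1MB),
FILEGROUP [SECONDARY]
    (NAME = SampleData3, FILENAME = 'E:\SQLData\SampleData3.ndf',
     SIZE = 2MB, FILEGROWTH = 1MB)
LOG ON
    (NAME = SampleLog, FILENAME = 'C:\SQLData\SampleLog.ldf',
     SIZE = 1MB, FILEGROWTH = 1MB);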


Once this is defined, we are ready to create our tables and other objects. As you can see below, when creating an object we can select which file group the object will be created on.
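In script form the file group is picked with the ON clause of CREATE TABLE; a small sketch (the table itself is illustrative):

-- Store this table's data on the SECONDARY file group
CREATE TABLE dbo.Lookup_Country
(
    CountryId   INT           NOT NULL PRIMARY KEY,
    CountryName NVARCHAR(100) NOT NULL
) ON [SECONDARY];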

SQL Server handles how the data is distributed across the files in a file group, following an intelligent round-robin method that takes into account the file size and auto-growth parameters.

Partitions

Now everything is good here, but what if one of your tables has, or could have, a massive amount of data, resulting in performance issues or storage capacity problems? We have learned that objects are created on file groups; so how can we divide the storage of a single object across multiple file groups to resolve such performance issues? This is where partitions are used. A partition divides a table across multiple file groups using a systematic scheme.

Partitioning is achieved in three steps.

  1. Partition Function -- A function that defines how the data will be distributed into partitions.
  2. Partition Scheme -- Specifies which file group will be used for each partition created by the partition function.
  3. Table Partition -- Once the above two are defined, we create the table on the partition scheme, and the rest is taken care of by SQL Server itself.
So now let's take an example. We have a Product table which we want to partition into two partitions based on its primary key, ProductId: we want all records with a key less than 50000 to go in the PRIMARY file group, while the rest should go in the SECONDARY file group. Here is how we do it.

First we will define our Partition function.
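A T-SQL sketch of such a function (the name pfProductRange is illustrative). RANGE RIGHT with boundary 50000 places every ProductId below 50000 in the first partition:

-- Partition 1: ProductId < 50000; Partition 2: ProductId >= 50000
CREATE PARTITION FUNCTION pfProductRange (INT)
AS RANGE RIGHT FOR VALUES (50000);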

Now, based on this function, we create the partition scheme.
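A matching sketch that maps the function's two partitions to our two file groups (the scheme name is again illustrative):

-- First partition goes to PRIMARY, second to SECONDARY
CREATE PARTITION SCHEME psProductRange
AS PARTITION pfProductRange
TO ([PRIMARY], [SECONDARY]);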

And then we assign this partition scheme to our table, finally telling SQL Server to use the ProductId column for partitioning through the partition column list.
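In script form these last two steps collapse into the table's ON clause, which names both the scheme and the partitioning column (the table definition is illustrative):

-- Rows are routed to PRIMARY or SECONDARY based on ProductId
CREATE TABLE dbo.Product
(
    ProductId   INT           NOT NULL PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL
) ON psProductRange (ProductId);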

21 March 2010

Database Architecture with SQL Server 2008 - Part 2, Storage

Storage means a device or set of devices that digitally stores data so it can be retrieved later. Storage can be permanent, such as hard disks, CDs/DVDs and USB flash drives, or temporary, like RAM. In this article we will refer to storage as permanent storage devices.
Databases simply cannot exist without storage; in fact, a database is a layer between the end user (application) and the physically stored data. So how you set up your storage defines how good your database design is. When database storage, or for that matter any storage, is designed, the following points must be considered:
  • Performance -- How fast is my storage
  • Reliability -- Will I lose data if my storage fails
  • Scalability -- What if I need more storage space
  • Ease of Management -- Can I manage my storage
  • And everyone's favorite "Cost" -- Will the cost of my storage fit my wallet.
Obviously you cannot have everything in one solution, so you pick the things that matter most to you and compromise on the others. Normally storage is designed before you set up your database; the storage setup is independent of the database, storage management is done by the operating system or storage management software, and to the database it is pretty much hidden.

Before going into enterprise storage solutions, let us first look at the difference between a single storage device and RAID.

Single Device vs RAID

What we use in our daily lives is a single device, such as a hard disk. It can be internal or external, but it is one single device; you can attach another hard disk to your computer, but that will be another single device. Both of these hard disks work independently of each other and, for good reasons, don't even know whether the other exists; in your operating system they show up as separate devices. You can of course create partitions, but that's a different story. So what's wrong with these single devices, and why do we need anything else? The answer you have already read; if not, go back to the beginning of this article where I listed the things you should look for in storage: performance, reliability, scalability, ease of management and cost. We will rule out ease of management and cost here, as single devices offer the best of both, and focus on the other important factors.

Performance - A single device has a single controller, so it can write only one piece of information at a time; in other words, multiple write operations cannot run in parallel, and one write operation starts only when the previous one finishes. This means single devices are not high performing.

Reliability - A single device is a definite single point of failure: if the device crashes, burns or is destroyed, you lose all your data with it, which is quite frankly very scary.

Scalability - Single devices are not scalable, and your data is ever growing. If you need more storage space, you either replace the disk with a bigger one and go through the pain of transferring all your data, or attach another one, which will exist as a separate device; and you can only have a few devices connected to your machine. So sooner or later you will be replacing your disk.

So here comes RAID (Redundant Array of Independent Disks) to the rescue. RAID is a collection of two or more disks that are presented to the machine as one logical disk. RAID has different schemes which offer performance, reliability, scalability, or all of them; we will talk about the most common schemes here, and for the rest you can refer to our favorite site, Wikipedia. All RAID schemes offer scalability, so if you need more space just add another disk to the RAID and eureka. In fact, many RAID devices have a hot-swap feature which makes it possible to add or remove disks without disconnecting or switching off the RAID.

RAID 0: RAID 0 provides data striping, which means the data is divided across the disks in the RAID. This scheme provides performance, as read/write operations are performed simultaneously on all the disks; the drawback is that it provides no redundancy, and a single disk failure results in full RAID failure.

RAID 1: This scheme provides data mirroring, which means that the same set of information is written across all the disks, providing data redundancy. This model does not offer any considerable performance gain, but it provides fault tolerance: in case of a disk failure, the other disk provides the same data and there is no data loss.
RAID 1+0 or RAID 1E: This scheme combines the advantages of both models by providing data striping and mirroring. It works with at least 3 disks in the RAID, distributing the data across multiple disks to gain performance while maintaining a copy of each block on another disk to achieve redundancy, so in case of a disk failure the remaining disks can still provide all the data.
RAID 5: RAID 1E looks like a perfect solution, but as you know there is always room for improvement. The problem with RAID 1E or RAID 1 is that we dedicate a large portion of the disks to redundant data which we might never use, and disks are not that cheap. If we could keep the fault tolerance but reduce the space the redundant blocks of data take, it would be much better, and this is where RAID 5 is the savior. RAID 5 provides performance by striping data across multiple disks, but instead of storing a redundant block it stores parity information; in case of a failure, it reconstructs the lost information by combining the parity with the data on the remaining disks. For example, if a parity block stores D1 XOR D2 XOR D3 and the disk holding D2 fails, D2 can be recomputed as Parity XOR D1 XOR D3. RAID 3 and 4 also work on this parity model, but there the parity block lives on a dedicated disk, while in RAID 5 the parity is also distributed.
Now that we understand the difference between single storage devices and RAID, let us look into how these storage devices are used in an enterprise.

DAS (Direct attached storage)

Direct attached storage is a storage device (single device) or a collection of storage devices (RAID) connected to the server directly. It can be internal or external, connected to the server/machine using interfaces such as SATA, SCSI, USB or FireWire. These storage devices do not work independently and are managed by the machine they are connected to. They can be shared over the network, but requests pass through the server machine they are connected to.

NAS (Network Attached Storage)

NAS is a file server. NAS is similar to DAS; the difference is that instead of being connected to a server machine, it is directly connected to the network. It has its own IP address/host name, and the NAS device runs its own OS, which provides only file-system-level operations. Like DAS, NAS can be a single device or a RAID. NAS is used in centralized storage models, such as backups.


SAN (Storage Area Network)

While NAS works fine in a network, it has its own shortcomings. First, it uses the Ethernet protocol for communication, which does not provide very high data transfer speeds, and in most cases this Ethernet is not dedicated to storage, so other network traffic on it further reduces performance. Secondly, NAS does not distribute its storage among servers; it is one device with lots of storage space, but all the machines accessing it have access to the same storage, unless you apply security permissions.

Now here is the beauty of SAN. SAN is a network of storage devices, so every storage device/server you add to the SAN adds to the SAN's total storage space. A SAN has its own network, which runs over Fibre Channel instead of Ethernet and offers much faster data transfer speeds, and only the servers that require this storage are connected to this network, using a special NIC called a Host Bus Adapter (HBA). SAN storage shows up as physical drives, rather than network shares, on the servers.

But the most prominent feature of SAN is its storage allocation. Say you have three servers connected to the SAN, and the total storage capacity of your SAN is 5 terabytes. If you want to give servers 1 and 2 around 1 terabyte each and the rest of the space to the third, you configure this in the SAN manager; each server gets drives with the required storage, and at any point in time this division can be changed without affecting the servers. This gives companies much more flexibility in using their storage intelligently. SAN is almost the perfect technology, but because of its obvious benefits and advanced technology, the cost of implementing and maintaining a SAN is much higher than the other schemes.



13 March 2010

Database Architecture with SQL Server 2008 - Part 1, Connectivity

In this series of articles I will focus on different aspects of a SQL Server 2008 database. A normal tendency in IT systems is to put too much emphasis on the data model and not enough on the other important aspects of a database server. The objective of this series is to take you through the different components of a database server and show how you can design a database architecture with performance, scalability, reliability and security in mind.

Connectivity

In every client/server model, the first and foremost thing is the communication protocol through which the server and client talk to each other. Microsoft SQL Server offers different protocols for connectivity; in this article I will try to explain each protocol, to make it easier for you to choose the one that is right for you. I will also focus on how the connectivity mechanism works in various protocol scenarios. What we will not look into in this article is the security mechanism during connectivity, which I will cover in later posts.

The following network protocols are commonly used with SQL Server. Other protocols include VIA, HTTP and DAC, which are used in special circumstances.

1. Shared Memory

Shared memory is the fastest connectivity protocol available; as its name suggests, the client and server communicate through memory. It works in a single-server scenario only, which means the client and the server must exist on the same machine to use this protocol. Unfortunately this option is not applicable in most production scenarios, as the database server and application server reside on different machines.

2. Named Pipes

SQL Server also offers connectivity through named pipes, which is a queuing mechanism. Named pipes are system-persistent pipes that work in FIFO mode, and they work in single-server or local area network scenarios.

3. TCP/IP

TCP/IP is an industry-standard network communication protocol used in LAN, WAN and internet scenarios. TCP/IP is the most widely used protocol for connectivity between database servers and clients, as it works across all the different infrastructure topologies.

These protocols are installed as part of the operating system installation, not during SQL Server installation; if a protocol needs to be added, it first has to be installed in the operating system. By default, the above-mentioned protocols come with the Windows operating system.

How the connectivity works.

As in all client/server models, there is a server application and multiple client application instances; in our case the server application is the SQL Server Database Engine, while the client library is the SQL Server Native Client (SQLNCLI10). SQL Server Native Client is a data access technology that is new to Microsoft SQL Server: a stand-alone data access Application Programming Interface (API) that is used for both OLE DB and ODBC. All other applications that connect to the SQL Server Database Engine internally use the SQL Server Native Client library; previously MDAC or DB-Library filled this role. For successful connectivity it is mandatory that both server and client have a common protocol enabled.

From MSDN: “When the SQL Server Database Engine communicates with an application, it formats the communication in a Microsoft communication format called a tabular data stream (TDS) packet. The network SQL Server Network Interface (SNI) protocol layer encapsulates the TDS packet inside a standard communication protocol, such as TCP/IP or named pipes. The SNI protocol layer is common to both the Database Engine and SQL Server Native Client. The SNI protocol layer is not directly configured. Instead, the server and SQL Server Native Client are configured to use a network protocol. Then, the Database Engine and SQL Server Native Client automatically use the appropriate protocol settings. The server creates a SQL Server object called a TDS endpoint for each network protocol. On the server, the TDS endpoints are installed by SQL Server during SQL Server installation. For the named pipes and shared memory protocols, there can only be one endpoint per instance. There are no configurable endpoints for these protocol types. For TCP/IP and VIA, there is a default endpoint, but additional endpoints can be created.”
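If you want to look at these TDS endpoints yourself, they are exposed through a catalog view; a quick sketch (sys.endpoints is available on SQL Server 2005 and later):

-- One row per endpoint the instance has created, including the TDS endpoints
SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints;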

When a connection to the Database Engine is initiated by SQL Server Native Client, the network protocol is selected based on the order defined in the SQL Server client configuration, and an attempt is made to establish a connection to that protocol's TDS endpoint; if the connection fails, the next network protocol in the list is attempted.



SQL Server Browser Service

When an instance of SQL Server starts, if the TCP/IP or VIA protocols are enabled for SQL Server, the server is assigned a TCP/IP port. If the named pipes protocol is enabled, SQL Server listens on a specific named pipe. This port, or "pipe," is used by that specific instance to exchange data with client applications. Because only one instance of SQL Server can use a port or pipe, different port numbers and pipe names are assigned for named instances. By default, when enabled, both named instances and SQL Server Express are configured to use dynamic ports, that is, an available port is assigned when SQL Server starts. If you want, a specific port can be assigned to an instance of SQL Server. When connecting, clients can specify a specific port; but if the port is dynamically assigned, the port number can change anytime SQL Server is restarted, so the correct port number is unknown to the client.

The SQL Server Browser service addresses this problem by assisting the client in locating the full address of the database engine. Upon startup, SQL Server Browser starts and claims UDP port 1434. SQL Server Browser reads the registry, identifies all instances of SQL Server on the computer, and notes the ports and named pipes that they use. When a server has two or more network cards, SQL Server Browser returns the first enabled port it encounters for SQL Server.

When SQL Server clients request SQL Server resources, the client network library sends a UDP message to the server using port 1434. SQL Server Browser responds with the TCP/IP port or named pipe of the requested instance. The network library on the client application then completes the connection by sending a request to the server using the port or named pipe of the desired instance.

If the SQL Server Browser service is disabled, the client must provide the complete address of the database engine, including the port number or pipe name, in order to establish a connection. This only applies to remote connections.
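For example, with the Browser service disabled, a client can carry the protocol and the port or pipe directly in its connection string. A sketch (the server name, instance, database and port below are all illustrative):

Server=tcp:DBSERVER01,50500;Database=Sales;Integrated Security=SSPI;
Server=np:\\DBSERVER01\pipe\MSSQL$SALESINST\sql\query;Database=Sales;Integrated Security=SSPI;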

Alias

Another alternative to the full-address problem is to create an alias. You can define a friendly-named alias for a database instance in the client configuration by specifying the protocol and the required details, such as the server name or IP and the port or pipe address, and then use this alias in your client applications to address the database instance. An alias also gives you the flexibility to change the database instance's address details without changing anything in the client applications; the only change needed is to update the new address in the alias settings.