WebScarab architecture

WebScarab is designed as a framework that contains a number of plugins that perform various tasks. The plugins perform one or both of the following key functions:

* Generating conversations
* Analysing conversations

For example, the Proxy plugin generates conversations by reading Requests from a browser. It does not perform any analysis of the conversations. The Spider plugin, on the other hand, parses HTML responses to identify links to resources that have not yet been seen (and can be extended to extract links from other content types, if desired). It then builds Requests based on those links. The Fragments plugin only does analysis, looking for interesting text in (currently) HTML responses. "Interesting" is defined as "scripts and comments", at the moment! ;-)

The basic framework performs the following major functions:

* Keeps a record of the conversations and URLs identified
* Calls each plugin when a new conversation is added

It also does session management, to support creating new sessions and loading existing ones.

The most important classes of WebScarab are:

Request - represents an HTTP request.

Response - represents the HTTP response that corresponds to an HTTP request.

Message - represents a byte stream or byte array, with a number of associated Name-Value pairs. Extended by Request and Response, and also used to represent individual parts of multi-part MIME messages.

HttpUrl - an HTTP or HTTPS URL. This includes some useful functionality that a standard java.net.URL does not offer.

ConversationID - provides a reference to a particular Request/Response pair that has been added to the SiteModel.

SiteModel - used to group all the conversations together, as well as providing a "view" of all the URLs that have been identified. It provides a simple means of storing some information about an HttpUrl, or about a conversation, and retrieving that information. It also provides notifications to registered listeners whenever something changes. Finally, it acts as a shared CookieJar, which plugins can use to synchronise cookies between themselves. For example, the Proxy plugin extracts cookies that it sees in the Responses, and the Spider plugin uses those cookies when generating Requests.

Framework - maintains a list of the loaded plugins, and acts as an intermediary between the Plugins and the SiteModel, receiving conversations from the Plugins, adding them to the SiteModel, and notifying each plugin that a new conversation can be analysed.

Then there are the classes that actually do the work of fetching a Response from an HTTP/S server.

HTTPClientFactory - should be used to create properly parameterised HTTPClients. Parameterised means already configured with proxies, client certificates, etc.

HTTPClient - defines the interface for an HTTPClient.

URLFetcher - does the "heavy lifting". This is an implementation of HTTPClient which connects to the HTTP server, submits the Request, retrieves the Response, and manages the socket for possible later reuse (Connection: keep-alive).

AsyncFetcher - allows the caller to submit a Request that can be executed by one of a number of simultaneous threads. This provides a non-blocking interface that allows e.g. Spider to fetch a number of Responses at the same time. This simply wraps a number of URLFetchers.

On top of this all is the user interface. I have tried to make WebScarab UI-neutral. In other words, it should be fairly easy to develop an SWT or browser-based user interface, without having to change too much existing code. I think I have done this reasonably well with the Framework, and the Swing UIFramework follows the MVC model, with the SiteModel as the "M", the UIFramework as the "V", and the UIFramework and Framework cooperating as the "C". However, I have not been as diligent in separating the model and the controller classes in the various plugins. I'm sure that there is a lot that could be improved in this area.
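To make the Framework's role as an intermediary (described in the class list above) a little more concrete, here is a minimal sketch of the "receive a conversation, add it to the model, notify every plugin" flow. All class and method names below (SketchConversation, SketchPlugin, SketchSiteModel, SketchFramework) are hypothetical stand-ins invented for illustration; they are not the real WebScarab classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Request/Response pair. */
final class SketchConversation {
    final String request;
    final String response;

    SketchConversation(String request, String response) {
        this.request = request;
        this.response = response;
    }
}

/** Hypothetical stand-in for the plugin contract: only the analysis hook. */
interface SketchPlugin {
    void analyse(int conversationId, SketchConversation conversation);
}

/** Hypothetical stand-in for the SiteModel: it only records conversations. */
final class SketchSiteModel {
    private final List<SketchConversation> conversations = new ArrayList<>();

    /** Stores the conversation and returns its id (its index, in this sketch). */
    synchronized int addConversation(SketchConversation c) {
        conversations.add(c);
        return conversations.size() - 1;
    }

    synchronized int size() {
        return conversations.size();
    }
}

/** Hypothetical stand-in for the Framework. */
final class SketchFramework {
    private final SketchSiteModel model = new SketchSiteModel();
    private final List<SketchPlugin> plugins = new ArrayList<>();

    void addPlugin(SketchPlugin plugin) {
        plugins.add(plugin);
    }

    SketchSiteModel getModel() {
        return model;
    }

    /** Archives the conversation, then hands it to every plugin for analysis. */
    int addConversation(SketchConversation c) {
        int id = model.addConversation(c);
        for (SketchPlugin plugin : plugins) {
            plugin.analyse(id, c);
        }
        return id;
    }
}

class FrameworkSketchDemo {
    public static void main(String[] args) {
        SketchFramework framework = new SketchFramework();
        List<String> seen = new ArrayList<>();
        framework.addPlugin((id, c) -> seen.add("fragments:" + id));
        framework.addPlugin((id, c) -> seen.add("spider:" + id));
        framework.addConversation(
                new SketchConversation("GET / HTTP/1.0", "HTTP/1.0 200 OK"));
        System.out.println(seen); // prints [fragments:0, spider:0]
    }
}
```

The point of the sketch is only that plugins never talk to each other directly: everything goes through one place, which is also what lets a plugin analyse conversations it did not generate itself.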
The Swing UIFramework provides a few basic facilities:

* It allows the user to create new sessions, and open old sessions.
* It allows the user to parameterise the HTTPClientFactory, setting upstream proxies and client certificates.
* It provides . . . Well, basically, it provides all the various menu options that you can see. ;-)

It also provides a view of the URLs and the conversations that have been seen, in the "Summary". This panel shows all URLs that have a corresponding conversation (i.e. a URL that has been "seen"). It also shows the conversations.

Implementation Note: The SummaryPanel itself only creates a few of the columns that exist when WebScarab is run. By default, the SummaryPanel shows which methods have been used for a particular URL, and what status responses have been returned. For conversations, it shows the ID, Method, Url, Parameters and Response status. These are considered to be the basic minimum information. All other columns are provided by the various UI plugins, using a ColumnDataModel, where information for the column is retrieved using the URL, or the ConversationID, depending on which table or treetable the column is in. Similarly, the only "right-click" action provided by the SummaryPanel is showing the conversation details. Other actions are provided by the Plugins.

There are a number of useful classes provided in the ui.swing package, which can be used by the Swing UI plugins. Most likely, you will use the RequestPanel and the ResponsePanel to display and edit Requests and Responses.

The RequestPanel and ResponsePanel offer two views of the data. One is a "parsed" view, where the message is broken down into individual pieces, making it easier for a human to comprehend, or access a specific part. The other is a "raw" view, which is a direct representation of the characters and bytes.
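As a rough illustration of the difference between the "raw" and "parsed" views, the sketch below takes the raw text of an HTTP message and breaks it into a start line, headers and content. ParsedMessageSketch is a hypothetical class written for this example; it is not how RequestPanel or Message are actually implemented, and it assumes CRLF line endings and text content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a "parsed" view over a "raw" HTTP message. */
final class ParsedMessageSketch {
    final String startLine;
    // Simplification: a repeated header name overwrites the earlier value.
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    ParsedMessageSketch(String raw) {
        // The headers end at the first blank line; the rest is the content.
        int split = raw.indexOf("\r\n\r\n");
        String head = split >= 0 ? raw.substring(0, split) : raw;
        body = split >= 0 ? raw.substring(split + 4) : "";

        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }
}

class ParsedViewDemo {
    public static void main(String[] args) {
        String raw = "POST /login HTTP/1.0\r\n"
                + "Host: example.test\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "\r\n"
                + "user=a&pass=b";
        ParsedMessageSketch parsed = new ParsedMessageSketch(raw);
        System.out.println("raw view:\n" + raw);
        System.out.println("start line: " + parsed.startLine);
        System.out.println("headers:    " + parsed.headers);
        System.out.println("content:    " + parsed.body);
    }
}
```

A parsed view makes it easy to pick out or edit one header; the raw view remains the only faithful representation when the message is malformed or the content is binary.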
The "parsed" views of the RequestPanel and ResponsePanel each contain a MessagePanel, which has a table for message headers, and a ContentPanel, which, depending on the Message's Content-Type, creates and populates various editors to display that content. It may make more sense, or be "better" in some way, to distinguish between Renderers and Editors, as Sun has done for tables, etc. I just couldn't wrap my mind around how to implement this, so I did it this way. Contributions are welcome! ;-)

At the moment, there are editors for plain text, HTML, GIF and JPG images, multi-part content (which actually just wraps another MessagePanel), and arbitrary byte data (the Hex editor). Editors are registered with the EditorFactory, by specifying the Content-Types that they can handle. The ContentPanel then requests all editors for a particular Content-Type, and displays them in the tabs.

Other useful classes are the ConversationListModel, ConversationTableModel, SiteTreeModelAdapter and SiteTreeTableModelAdapter classes. These basically wrap the SiteModel, and provide a Swing model interface, complete with dynamic listeners, etc. They only provide a read-only view, though.

The above classes should make it pretty easy to build a usable GUI for a plugin in a short time.

So, how do I write a Plugin?

All plugins must implement the Plugin interface. (At the moment, they actually extend the Plugin class, but that should change. There is actually no meaningful shared code in Plugin, so I plan to make it an interface instead.)

Each plugin is instantiated once, at startup. When a session is loaded, a new Thread is created for each plugin, and the Thread is started. At this point, the plugin can start generating Requests, using an HTTPClient to fetch the Responses, possibly perform some local analysis of the Response, and (optionally) submit it to the Framework for archiving, and distribution to all the plugins (including itself) for analysis.
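The lifecycle just described (run on your own thread, build Requests, fetch Responses, optionally submit them) might look roughly like the sketch below. FetcherSketch, ArchiveSketch and LifecyclePluginSketch are hypothetical names standing in for HTTPClient, the Framework and a plugin; the real interfaces have different and richer signatures, and requests and responses are reduced to plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for HTTPClient: turns a request into a response. */
interface FetcherSketch {
    String fetch(String request);
}

/** Hypothetical stand-in for the part of the Framework a plugin submits to. */
final class ArchiveSketch {
    private final List<String> archived = new ArrayList<>();

    synchronized void addConversation(String request, String response) {
        archived.add(request + " -> " + response);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(archived);
    }
}

/** Hypothetical plugin that generates conversations on its own thread. */
final class LifecyclePluginSketch implements Runnable {
    private final ArchiveSketch framework;
    private final FetcherSketch client;
    private final List<String> paths;

    LifecyclePluginSketch(ArchiveSketch framework, FetcherSketch client,
                          List<String> paths) {
        this.framework = framework;
        this.client = client;
        this.paths = paths;
    }

    @Override
    public void run() {
        for (String path : paths) {
            String request = "GET " + path + " HTTP/1.0";
            String response = client.fetch(request);
            // Local analysis decides whether the conversation is worth
            // archiving; in this sketch, 404 responses are simply dropped.
            if (!response.startsWith("404")) {
                framework.addConversation(request, response);
            }
        }
    }
}

class PluginLifecycleDemo {
    public static void main(String[] args) throws InterruptedException {
        ArchiveSketch framework = new ArchiveSketch();
        // A canned fetcher, so the sketch runs without a network.
        FetcherSketch canned =
                request -> request.contains("/missing") ? "404 Not Found" : "200 OK";
        Thread thread = new Thread(
                new LifecyclePluginSketch(framework, canned,
                        Arrays.asList("/", "/missing")),
                "plugin-sketch");
        thread.start();
        thread.join();
        System.out.println(framework.snapshot()); // prints [GET / HTTP/1.0 -> 200 OK]
    }
}
```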
I say "optionally" above, because, for example, in the case of the SessionIDAnalysis plugin, it simply receives thousands of near-identical responses, and it takes care of extracting and recording the interesting information from those responses itself.

The analysis() method of each plugin is called for EVERY conversation that is submitted to the Framework. This allows, e.g. the Spider plugin to extract links from all responses, not just the ones that it generates.

So how do I store my data in the session?

Plugins are created with a reference to the Framework. From the Framework, one can get a reference to the SiteModel. Depending on the complexity of your requirements, you may simply choose to put your data into the SiteModel as a Conversation property, or a URL property. The SiteModel supports lists of Strings, referenced by a property name.

If this is not adequate, you should define a PluginStore interface for your plugin, which defines its requirements. Then implement a FileSystemStore that provides those requirements. When a new session is opened, or created, your plugin's setSession() method will be called with three arguments:

* a String stating the store type (currently only "FileSystem")
* an Object representing the overall store (a java.io.File representing the session directory)
* a session identifier, which might be used by a SQLStore to differentiate the rows for a number of different sessions. This is not currently used by the FileSystemStore.

And how do I put a UI on top of my plugin?

Swing plugin interfaces must implement the SwingPluginUI interface. This defines methods for returning a JPanel which is added to the main JTabbedPane (after the Summary), actions for the SummaryPanel's right-click menus, and ColumnDataModels for the URL and Conversation tables.
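A plugin UI along those lines could be shaped roughly as follows. This is only a sketch: SwingPluginUISketch and ColumnSketch are hypothetical interfaces, and their method names are guesses at the kind of contract described above, not the real SwingPluginUI or ColumnDataModel signatures. The "Comments" plugin itself is also invented for the example.

```java
import java.awt.event.ActionEvent;
import java.util.HashMap;
import java.util.Map;
import javax.swing.AbstractAction;
import javax.swing.Action;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTabbedPane;

/** Assumed shape of an extra Summary column, keyed by URL or conversation id. */
interface ColumnSketch {
    String getColumnName();
    Object getValue(Object key);
}

/** Assumed shape of the Swing plugin UI contract; names are hypothetical. */
interface SwingPluginUISketch {
    String getPluginName();
    JPanel getPanel();
    Action[] getConversationActions();
    ColumnSketch[] getConversationColumns();
}

/** Hypothetical UI for a plugin that counts comments per conversation. */
final class CommentsUISketch implements SwingPluginUISketch {
    private final Map<Object, Integer> commentCounts = new HashMap<>();
    private final JPanel panel = new JPanel();

    CommentsUISketch() {
        panel.add(new JLabel("Comments found so far appear here"));
    }

    /** Called by the (hypothetical) plugin when it has analysed a conversation. */
    void record(Object conversationId, int count) {
        commentCounts.put(conversationId, count);
    }

    public String getPluginName() { return "Comments"; }

    public JPanel getPanel() { return panel; }

    public Action[] getConversationActions() {
        return new Action[] { new AbstractAction("Show comments") {
            @Override
            public void actionPerformed(ActionEvent e) {
                // A real action would show details for the selected conversation.
            }
        } };
    }

    public ColumnSketch[] getConversationColumns() {
        return new ColumnSketch[] { new ColumnSketch() {
            public String getColumnName() { return "Comments"; }
            public Object getValue(Object key) {
                return commentCounts.getOrDefault(key, 0);
            }
        } };
    }
}

class PluginUIDemo {
    public static void main(String[] args) {
        CommentsUISketch ui = new CommentsUISketch();
        ui.record("7", 3);

        // The main window would add one tab per plugin, after the Summary.
        JTabbedPane tabs = new JTabbedPane();
        tabs.addTab(ui.getPluginName(), ui.getPanel());

        ColumnSketch column = ui.getConversationColumns()[0];
        System.out.println(column.getColumnName() + " for conversation 7: "
                + column.getValue("7"));
    }
}
```

Looking column values up by key, rather than pushing them into the table, is what lets the Summary stay ignorant of which plugins are loaded.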
At the moment, I have also defined a "PluginUI" interface for each plugin, allowing the plugin to interact directly with its user interface, for example, to pop up an error dialog, a Proxy intercept dialog, etc. This works, I think, but I should also try to separate the "model" aspects out of each plugin, and provide "ModelListener" interfaces that the UI can implement to get notifications of changes to the model.

For example, the SessionIDAnalysis plugin calls SessionIDAnalysisUI methods for each session ID that it extracts. This means that the SessionID tablemodel and the XY datamodel almost HAVE to be implemented as inner classes in the overall UI, rather than registering as independent listeners of a hypothetical SessionIDModel. This will be fixed in due course, where it makes sense.
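The kind of separation hinted at above could be sketched like this: a model that owns the extracted session IDs and notifies any number of independent listeners. Since the SessionIDModel is explicitly hypothetical, everything here (SessionIDModelSketch, SessionIDListenerSketch and their methods) is an assumption made for illustration, not existing WebScarab code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener that a table model or chart model could implement. */
interface SessionIDListenerSketch {
    void sessionIDAdded(String name, String value);
}

/** Hypothetical model: owns the data and notifies independent listeners. */
final class SessionIDModelSketch {
    private final List<String> values = new ArrayList<>();
    // Copy-on-write lets listeners be added or removed during notification.
    private final List<SessionIDListenerSketch> listeners =
            new CopyOnWriteArrayList<>();

    void addListener(SessionIDListenerSketch listener) {
        listeners.add(listener);
    }

    void removeListener(SessionIDListenerSketch listener) {
        listeners.remove(listener);
    }

    /** Called by the plugin for each session ID that it extracts. */
    void addSessionID(String name, String value) {
        synchronized (values) {
            values.add(name + "=" + value);
        }
        for (SessionIDListenerSketch listener : listeners) {
            listener.sessionIDAdded(name, value);
        }
    }

    int size() {
        synchronized (values) {
            return values.size();
        }
    }
}

class ModelListenerDemo {
    public static void main(String[] args) {
        SessionIDModelSketch model = new SessionIDModelSketch();

        // Two independent "views": neither needs to be an inner class of the UI.
        List<String> tableRows = new ArrayList<>();
        int[] plottedPoints = {0};
        model.addListener((name, value) -> tableRows.add(name + "=" + value));
        model.addListener((name, value) -> plottedPoints[0]++);

        model.addSessionID("JSESSIONID", "abc123");
        model.addSessionID("JSESSIONID", "abc124");
        System.out.println(tableRows + " / " + plottedPoints[0]);
    }
}
```

In a real Swing UI the listeners would also need to hand their updates over to the event dispatch thread, which this sketch leaves out.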