Alo Sarv
Preamble
The Bittorrent protocol includes several features that go beyond the
way other networks handle downloading. The most obvious, and the most
complex to handle in a multi-network application, is the way Bittorrent
handles files. At the protocol level there are no files; there is only
one big “virtual” file called a torrent. The torrent data is the data
of all files, laid end to end. Each piece of size X has its own SHA-1
hash, but since the data of all files forms a single stream, a piece
may span file boundaries. This introduces some difficulties in handling
the files in a generic, multi-network client such as Hydranode.
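To make the boundary problem concrete, the following is a minimal sketch of how one piece maps onto slices of the underlying files. The names FileSpan and pieceToFileSpans are hypothetical and not part of Hydranode; the sketch assumes every piece has the same size except possibly the last one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous slice of a single file that belongs to a piece.
// (Hypothetical helper type, not a Hydranode class.)
struct FileSpan {
    std::size_t   fileIndex; // index into the torrent's ordered file list
    std::uint64_t offset;    // start offset inside that file
    std::uint64_t length;    // number of bytes taken from that file
};

// Returns the file slices covered by piece `pieceIndex`, in torrent order.
// `fileSizes` lists the file sizes in the order the files appear in the
// torrent.
std::vector<FileSpan> pieceToFileSpans(
    const std::vector<std::uint64_t> &fileSizes,
    std::uint64_t pieceSize, std::uint64_t pieceIndex
) {
    assert(pieceSize > 0);
    const std::uint64_t begin = pieceIndex * pieceSize; // global start
    const std::uint64_t end   = begin + pieceSize;      // global end, exclusive
    std::vector<FileSpan> spans;
    std::uint64_t fileStart = 0; // global offset of the current file
    for (std::size_t i = 0; i < fileSizes.size(); ++i) {
        const std::uint64_t fileEnd = fileStart + fileSizes[i];
        // Intersect the piece range with this file's range.
        const std::uint64_t from = std::max(begin, fileStart);
        const std::uint64_t to   = std::min(end, fileEnd);
        if (from < to) {
            spans.push_back({i, from - fileStart, to - from});
        }
        fileStart = fileEnd;
    }
    return spans;
}
```

With files of 100, 50 and 200 bytes and a piece size of 64, piece 1 covers the last 36 bytes of the first file and the first 28 bytes of the second, so hashing that piece requires reading from two files.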
The Hydranode hashing engine is designed to handle single files.
PartData submits a request to hash a range in a file, or a full file;
the hasher does so and sends back the results. However, this is not
sufficient for the Bittorrent module.
Thanks to the extensible interfaces of both the PartData and Hasher
APIs, the Bittorrent module can override the default hashing system and
implement the required hashing methods within the module.
Since a torrent may contain more than one file, possibly in a number of
subdirectories, the implementation should make it possible for User
Interfaces to display the torrent contents hierarchically. For that, we
have the Object Hierarchy system (sometimes also called the “Hydranode
Virtual Filesystem”). The Bittorrent module can create the required
number of PartData objects, one for each file, but override the default
parent (FilesList) with a custom Object-derived class, which then
represents “a folder”. This has no effect on other modules that may be
downloading the same files, since the FilesList internal structures do
not depend on the Object Hierarchy system.
Folders in the torrent hierarchy can implement additional features,
such as displaying the overall speed and completeness of the PartData
objects under them.
The top-level “folder” in the hierarchy needs to be a PartData-derived
object as well, since that is where Bittorrent will handle its pieces,
hashes and so on. As mentioned earlier, PartData internals rely on the
fact that one download is one file, so the top-level “virtual” file
will have a size equal to the sum of the sizes of all files in the
torrent, and will override the default flushBuffer() method, forwarding
the data to the actual, non-virtual files.
Additional care must be taken to keep ChunkMaps in sync in virtual
files (the top-level file, as well as sub-folders), to avoid multiple
modules downloading the same data chunk. To that end, an additional
signaling system should be implemented in PartData to indicate that new
data has arrived, and where that data is located. The existing
solution, where EVT_PD_DATAADDED is emitted, is insufficient, since the
event cannot carry reference data with it.
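As an illustration of a notification that carries its own reference data, here is a minimal sketch of a signal that delivers the offset and length of the newly written range to every connected handler. The Signal template and FakePartData are hypothetical stand-ins written with the standard library only; they are not the Hydranode event system, and the real PartData interface may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Minimal signal: stores handlers and calls each one with the payload.
// (Hypothetical helper, standing in for whatever signal library is used.)
template<typename... Args>
class Signal {
public:
    void connect(std::function<void(Args...)> handler) {
        m_handlers.push_back(std::move(handler));
    }
    void operator()(Args... args) const {
        for (const auto &h : m_handlers) {
            h(args...);
        }
    }
private:
    std::vector<std::function<void(Args...)>> m_handlers;
};

// Hypothetical stand-in for PartData; only the signalling part is shown.
struct FakePartData {
    // Emitted after a write with (offset, length) of the new data.
    Signal<std::uint64_t, std::uint64_t> dataAdded;

    void write(std::uint64_t offset, std::uint64_t length) {
        assert(length > 0); // an empty write is a caller error in this sketch
        // ... buffering and disk I/O would happen here ...
        dataAdded(offset, length);
    }
};
```

A virtual parent file would connect a handler to dataAdded on each child and use the two arguments to update its own chunk map, which a payload-free event cannot support.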
While a torrent is started by reading a .torrent file, the format of
that file is insufficient for saving the overall state of the torrent.
PartData has its own format for saving and loading its state, which the
top-level “virtual” file can use. Since each PartData object is
accompanied by a reference MetaData object, additional custom fields
should be implemented in the MetaData object to store information about
the torrent: most importantly, the reference IDs of the PartData
objects belonging to this torrent, as well as the directory structure.
Since no ReferenceID is used in the PartData API (there was one in the
early design phases, but it was removed), something else that is unique
must be used. The most obvious solution is to use the randomly
generated PartData file name, which is unique (it is also used for the
.tmp / .tmp.dat file names). For the directory structure, additional
custom fields in the MetaData structure shall be used.
Part II - Design
1. Torrent structure and layout in memory
Torrent
 +--- File1
 +--- File2
 +--- SubFolder1
 |     +------ File3
 |     +------ File4
 +--- File5
As is known, for each PartData object there must be a corresponding
SharedFile object. This is required for partial downloads to be
uploaded back to the network (if appropriate hashes are available).
This does, however, introduce an additional, somewhat complex variable
into the design.
Following that design principle, two objects need to be created for
a torrent - TorrentFile and PartialTorrent.
TorrentFile represents a torrent, which has one or more files in it in
a directory structure. Whether the torrent is complete (i.e. in seeding
state) or not is irrelevant, just as SharedFile doesn't care whether it
is partial or not. The TorrentFile object is derived from SharedFile,
has its parent object set to FilesList, and overrides the virtual
function read(). TorrentFile keeps a reference to the corresponding
PartialTorrent object, as well as a list of the SharedFile objects
contained within this torrent.
PartialTorrent keeps track of the information related to torrents in
the downloading state. It is derived from PartData, and implements the
virtual functions write(), doWrite() and flushBuffer(). PartialTorrent,
on its own, is a "virtual" file, as it does not represent a physical
file itself, but rather the sum of all files in the torrent. It
maintains a list of child PartData objects, redirecting the implemented
virtual functions to the corresponding PartData objects, and scheduling
chunk hashing as necessary.
The files contained in the torrent can be implemented using normal
SharedFile and PartData objects, without overriding any virtual
functions (thus, derivation is no longer needed there), but it is
necessary to override the default parent Object for those, pointing to
the parent object in the torrent structure. This is required to allow
User Interfaces to display the torrent in a properly structured
hierarchy.
Since at the Bittorrent protocol level the actual location of a file in
the hierarchy is no longer relevant (only the order of files is
relevant), TorrentFile and PartialTorrent can simply keep a vector of
files, without needing to implement any additional hierarchy handling;
this is all taken care of by the Object hierarchy already.
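The relationship between the displayed hierarchy and the flat, ordered file vector can be sketched as a depth-first walk. The Node type and collectFiles function below are hypothetical illustrations and not the Hydranode Object class; the sketch assumes files are visited in the order the folders list them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node: a folder has children, a file has none.
struct Node {
    std::string name;
    bool isFolder;
    std::vector<Node> children;
};

// Appends every file under `node` to `out`, depth-first, preserving the
// order in which folders list their entries. The resulting vector is the
// flat view that the protocol-level code would work with.
void collectFiles(const Node &node, std::vector<const Node*> &out) {
    assert(node.isFolder || node.children.empty()); // files have no children
    if (!node.isFolder) {
        out.push_back(&node);
        return;
    }
    for (const Node &child : node.children) {
        collectFiles(child, out);
    }
}
```

For the example layout shown earlier in this section, the walk yields File1, File2, File3, File4, File5; SubFolder1 itself does not appear in the flat vector.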
TorrentFile
 |
 +--- SharedFile "File1"
 |     +---- PartData "File1"
 +--- SharedFile "File2"
 |     +---- PartData "File2"
 +--- TorrentFolder
 |     +---- SharedFile "File3"
 |     |      +----- PartData "File3"
 |     +---- SharedFile "File4"
 |            +----- PartData "File4"
 +--- SharedFile "File5"
       +---- PartData "File5"
Message passing:
» Reading
TorrentFile::read()
|
V
Look up correct SharedFile (based on read start offset)
|
V
SharedFile::read() (reads actual data from disk, returning it)
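The read path above can be sketched as a lookup over the files' global start offsets followed by per-file reads. VirtualReader is a hypothetical stand-in that keeps the "files" in memory as strings; it is not the TorrentFile class. The sketch assumes a half-open range [begin, end), which may differ from the convention the real read() uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a virtual file made of in-memory "files".
class VirtualReader {
public:
    explicit VirtualReader(std::vector<std::string> files)
    : m_files(std::move(files)) {
        for (const auto &f : m_files) {
            m_starts.push_back(m_size); // global offset where this file begins
            m_size += f.size();
        }
    }

    // Reads the half-open global range [begin, end), crossing file
    // boundaries where needed.
    std::string read(std::uint64_t begin, std::uint64_t end) const {
        assert(begin <= end && end <= m_size);
        std::string out;
        if (begin == end) {
            return out;
        }
        // Binary search for the last file whose start offset is <= begin.
        std::size_t i = static_cast<std::size_t>(
            std::upper_bound(m_starts.begin(), m_starts.end(), begin)
            - m_starts.begin()
        ) - 1;
        while (begin < end) {
            const std::uint64_t local = begin - m_starts[i];
            const std::uint64_t avail = m_files[i].size() - local;
            const std::uint64_t n = std::min(avail, end - begin);
            out.append(m_files[i], local, n); // forward to the "real" file
            begin += n;
            ++i; // continue in the next file, if any data remains
        }
        return out;
    }

private:
    std::vector<std::string>   m_files;
    std::vector<std::uint64_t> m_starts;
    std::uint64_t              m_size = 0;
};
```

Keeping the start offsets in a sorted vector makes the lookup logarithmic in the number of files, which matters for torrents with many small files.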
» Writing
PartialTorrent::write() (called by driver code)
|
V
Look up correct PartData (based on write start offset)
|
V
PartData::write() (writes actual data, possibly calling flushBuffer())
|
V
signal(dataAdded) (signals back up that data has been added)
|
V
PartialTorrent::onDataAdded() (updates internal chunkmaps)
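The write path can be sketched end to end: the virtual file splits an incoming write across its children, each child reports what it wrote, and the parent translates the file-local offsets back into global ones to track piece completeness. ChildFile, VirtualWriter and pieceComplete are hypothetical names, the children are in-memory buffers, and the sketch assumes writes never overlap, so a simple byte count per piece is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a child PartData: an in-memory buffer that
// reports every write to whoever connected a handler.
struct ChildFile {
    std::string data; // pre-sized to the file length
    std::function<void(std::uint64_t, std::uint64_t)> dataAdded;

    void write(std::uint64_t offset, const std::string &chunk) {
        assert(offset + chunk.size() <= data.size());
        data.replace(offset, chunk.size(), chunk);
        if (dataAdded) dataAdded(offset, chunk.size());
    }
};

// Hypothetical stand-in for PartialTorrent.
class VirtualWriter {
public:
    VirtualWriter(
        const std::vector<std::uint64_t> &sizes, std::uint64_t pieceSize
    ) : m_children(sizes.size()), m_pieceSize(pieceSize) {
        assert(pieceSize > 0);
        for (std::size_t i = 0; i < sizes.size(); ++i) {
            m_starts.push_back(m_size);
            m_children[i].data.assign(sizes[i], '\0');
            // Children report file-local offsets; bind the child index.
            m_children[i].dataAdded =
                [this, i](std::uint64_t off, std::uint64_t len) {
                    onDataAdded(i, off, len);
                };
            m_size += sizes[i];
        }
        m_received.assign((m_size + pieceSize - 1) / pieceSize, 0);
    }
    VirtualWriter(const VirtualWriter&) = delete; // handlers capture `this`
    VirtualWriter &operator=(const VirtualWriter&) = delete;

    // Splits a write at a global offset across the affected children.
    void write(std::uint64_t offset, const std::string &data) {
        assert(offset + data.size() <= m_size);
        const std::uint64_t end = offset + data.size();
        for (std::size_t i = 0; i < m_children.size(); ++i) {
            const std::uint64_t fBegin = m_starts[i];
            const std::uint64_t fEnd = fBegin + m_children[i].data.size();
            const std::uint64_t from = std::max(offset, fBegin);
            const std::uint64_t to = std::min(end, fEnd);
            if (from < to) {
                m_children[i].write(
                    from - fBegin, data.substr(from - offset, to - from)
                );
            }
        }
    }

    bool pieceComplete(std::uint64_t piece) const {
        assert(piece < m_received.size());
        const std::uint64_t begin = piece * m_pieceSize;
        // The last piece may be shorter than the others.
        return m_received[piece] == std::min(m_pieceSize, m_size - begin);
    }

    const std::string &childData(std::size_t i) const {
        return m_children[i].data;
    }

private:
    // Translates a child's local range to global offsets and credits
    // the bytes to every piece the range touches.
    void onDataAdded(std::size_t child, std::uint64_t off, std::uint64_t len) {
        std::uint64_t pos = m_starts[child] + off;
        const std::uint64_t end = pos + len;
        while (pos < end) {
            const std::uint64_t piece = pos / m_pieceSize;
            const std::uint64_t stop = std::min((piece + 1) * m_pieceSize, end);
            m_received[piece] += stop - pos;
            pos = stop;
        }
    }

    std::vector<ChildFile>     m_children;
    std::vector<std::uint64_t> m_starts;
    std::vector<std::uint64_t> m_received; // bytes received per piece
    std::uint64_t              m_pieceSize;
    std::uint64_t              m_size = 0;
};
```

A real implementation would track ranges rather than byte counts so that overlapping or repeated writes are not counted twice, and would schedule a hash check once a piece is fully received.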
// All Bittorrent module classes reside in this namespace
namespace BT {

/**
 * Base class implements module initialization/destruction,
 * configuration storage, as well as additional features required
 * by implementation.
 */
class BTBase : public ModuleBase {
public:
    /**
     * Called on module initialization
     *
     * @returns True on successful startup, false otherwise
     */
    virtual bool onInit();

    /**
     * Called when module is unloaded and/or app is exiting
     *
     * @returns 0 on successful exit, nonzero exit code otherwise
     */
    virtual int onExit();

    /**
     * Creates a new torrent out of the files
     *
     * @param files Files to create the torrent of
     * @returns The newly-created torrent object
     *
     * \remarks The files are hashed asynchronously, after this function
     *          returns, thus the returned object may not be fully usable
     *          before hashing is finished.
     */
    TorrentFile* createTorrent(
        // A vector cannot hold references, so the vector itself is
        // passed by const reference.
        const std::vector<boost::filesystem::path> &files
    );

    /**
     * Starts new torrent download
     *
     * @param refData The contents of .torrent file, containing reference
     *                data for this torrent
     * @returns The newly-created torrent download
     *
     * \note This method also takes care of registering the created objects
     *       at FilesList and/or additional locations.
     */
    TorrentFile* downloadTorrent(const std::string &refData);
};

/**
 * TorrentFile represents one single torrent, which may be either "seeded" or
 * "leeched", i.e. complete or partial in Hydranode terms.
 *
 * Upon construction, this object should be registered with the FilesList
 * class: although it acts as a "virtual" file (there is no single underlying
 * file, there may be more than one), various listings (in User Interfaces)
 * may still want to see this object.
 *
 * Since this is a "virtual" file, all I/O calls are forwarded to the actual
 * implementation object of type SharedFile.
 */
class TorrentFile : public SharedFile {
public:
    /**
     * Creates a "seeded" torrent, out of the files listed
     *
     * @param files Files to create the torrent of
     *
     * \throws std::runtime_error if any of the files isn't "readable"
     * \note All files will be hashed (asynchronously) before this torrent
     *       will be available for publishing
     */
    TorrentFile(const std::vector<boost::filesystem::path> &files);

    /**
     * Overrides default "read" method, since this is a "virtual" file.
     * Forwards the call to the corresponding SharedFile object (listed in
     * m_children member), based on the begin offset
     *
     * @param begin Offset at which to begin reading
     * @param end   Offset at which to stop reading
     * @returns The data read
     *
     * \throws Utils::ReadError if reading fails for any reason
     */
    virtual std::string read(uint64_t begin, uint64_t end);

private:
    //! Shall contain pointers to all child objects
    implementation_defined<SharedFile*> m_children;
};

/**
 * PartialTorrent object handles everything related to downloading a torrent.
 * PartialTorrent may never exist without a corresponding TorrentFile parent
 * object. Since this is a "virtual" file, all I/O calls are forwarded to the
 * actual implementation objects of type PartData.
 */
class PartialTorrent : public PartData {
public:
    /**
     * Create a "leeched" torrent to be downloaded. The torrent information
     * is read from the passed parent pointer.
     *
     * @param parent Parent torrent object
     */
    PartialTorrent(TorrentFile *parent);

protected:
    /**
     * Since this is a "virtual" file, and the physical files are found in
     * m_children member, this method forwards the call to the correct child
     * object(s), based on the start offset.
     *
     * @param offset Start offset for data
     * @param data   Data to be written
     *
     * \throws std::runtime_error if something goes wrong
     */
    virtual void write(uint64_t offset, const std::string &data);

private:
    //! Shall contain pointers to all child objects
    implementation_defined<PartData*> m_children;

    /**
     * Handles dataAdded() signals from child objects, and updates internal
     * maps accordingly.
     *
     * @param obj    PartData that data was added to
     * @param offset Begin offset where data was written
     * @param len    Length of data that was written
     */
    void onDataAdded(PartData *obj, uint64_t offset, uint64_t len);

    /**
     * Handler for addSourceMask() signals from child objects; updates
     * m_chunks member in this object accordingly.
     *
     * @param obj       PartData that emitted the signal
     * @param chunkSize Size of a chunk
     * @param chunks    Boolean vector containing "true" value for each
     *                  chunk the client has, and "false" otherwise.
     */
    void onSourceMaskAdded(
        PartData *obj, uint32_t chunkSize,
        const std::vector<bool> &chunks
    );
};

} // end namespace BT
Appendix A - Required changes to the Engine