PHPCR Guide

This is an introduction to the PHP content repository (PHPCR). You will mostly see code examples, which should work with any PHPCR implementation. We suggest using Jackalope Jackrabbit to get started, as it supports all features described here.

Installing Jackalope

Just follow the README of the jackalope-jackrabbit repository.

There are currently two options for browsing and modifying the contents of the PHPCR repository.

In a nutshell

The shortest self-contained example should output a line with ‘value’:

    <?php
    require("/path/to/jackalope-jackrabbit/vendor/autoload.php");

    $factoryclass = '\Jackalope\RepositoryFactoryJackrabbit';
    $parameters = array('jackalope.jackrabbit_uri' => 'http://localhost:8080/server');
    // end of implementation specific configuration

    $factory = new $factoryclass();
    $repository = $factory->getRepository($parameters);
    $credentials = new \PHPCR\SimpleCredentials('admin','admin');
    $session = $repository->login($credentials, 'default');
    $root = $session->getRootNode();
    $node = $root->addNode('test', 'nt:unstructured');
    $node->setProperty('prop', 'value');
    $session->save();

    // data is stored now. in a follow-up request you can do
    $node = $session->getNode('/test');
    echo $node->getPropertyValue('prop'); // outputs "value"

Still with us? Good, let’s go a bit deeper…

Introduction

In the following chapters, we will show how to use the API. But first, you need a very brief overview of the core elements of PHPCR. After reading this tutorial, you should browse through the API documentation to get an idea what operations you can do on each of those elements. See the conclusions for links if you want to have more background.

Not every implementation has to support all chapters of the specification. PHPCR is a modular standard and has a built-in way to discover the capabilities of your implementation. TODO: Add a section about capability testing to show how to write portable code.

Bootstrapping

You will need to make sure your PHP classes are available. Usually this means activating an autoloader. For a standalone project, just use the file generated by composer at vendor/autoload.php. PHPCR and Jackalope follow the PSR-0 standard. If you want your own autoloading, use a PSR-0 compatible autoloader and configure it to find the code folder.

Once you have autoloading set up, bootstrap jackalope-jackrabbit like this:

    <?php
    require("vendor/autoload.php");

    // factory (the *only* implementation specific part)
    $factoryclass = '\Jackalope\RepositoryFactoryJackrabbit';
    // the parameters would typically live in a configuration file
    // see your implementation for required and optional parameters
    $parameters = array('jackalope.jackrabbit_uri' => 'http://localhost:8080/server');

    // end of implementation specific configuration
    // from here on, the whole code does not need to be changed when using different implementations

    $factory = new $factoryclass();
    $repository = $factory->getRepository($parameters);
    if (null === $repository) {
        var_dump($parameters);
        die('There were missing parameters, the factory could not create a repository');
    }

    // the login parameters would typically live in a configuration file

    $workspacename = 'default';
    $user = 'admin';
    $pass = 'admin';

    // create credentials and log in to get a session
    $credentials = new \PHPCR\SimpleCredentials($user, $pass);
    try {
        $session = $repository->login($credentials, $workspacename);
    } catch(\PHPCR\LoginException $e) {
        die('Invalid credentials: '.$e->getMessage());
    } catch(\PHPCR\NoSuchWorkspaceException $e) {
        die("No workspace $workspacename: ".$e->getMessage());
    }

    // if we get here, we have a session object that can be used to read and write the repository
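
As noted in the introduction, implementations do not have to support every chapter of the specification. The following is a minimal sketch of capability testing, assuming the $repository object created above and an implementation that reports the standard repository descriptors. The type of the returned values (boolean or string) may differ between implementations, so the sketch only prints them instead of branching on them.

    <?php
    // map a human readable label to a standard descriptor key
    $options = array(
        'versioning'   => \PHPCR\RepositoryInterface::OPTION_VERSIONING_SUPPORTED,
        'locking'      => \PHPCR\RepositoryInterface::OPTION_LOCKING_SUPPORTED,
        'transactions' => \PHPCR\RepositoryInterface::OPTION_TRANSACTIONS_SUPPORTED,
        'observation'  => \PHPCR\RepositoryInterface::OPTION_OBSERVATION_SUPPORTED,
    );

    foreach ($options as $label => $key) {
        // the value may be empty if the implementation does not report this key
        $value = $repository->getDescriptor($key);
        echo $label . ': ' . var_export($value, true) . "\n";
    }

Checking a descriptor before using an optional feature lets the same code run against implementations with different feature sets.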

Get some data into the repository

We will discuss the import feature in more detail later, but to have some data, we just import something here. Create an XML file test.xml like this:

    <data xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
        <node title="Test" content="This is some test content" />
        <sibling title="Test" content="This is another test content">
            <child1 title="Child1 title" />
            <child2 title="Child2 title" />
            <otherchild title="Otherchild title"/>
            <yetanother title="Yetanother title">
                <child title="Child title" />
            </yetanother>
        </sibling>
    </data>

Now import this into the repository:

    <?php
    $session->importXML('/', 'test.xml', \PHPCR\ImportUUIDBehaviorInterface::IMPORT_UUID_CREATE_NEW);
    $session->save();

Reading data and traversal

You can wrap any code in try/catch blocks. See the API doc for which exceptions to expect on which calls. With PHPCR being ported from Java, there are a lot of exceptions defined. But as this is PHP, you don’t have to catch them. As long as your content is as the code expects, it won’t matter.

    <?php
    $node = $session->getNode('/data/node');
    echo $node->getName(); // will be 'node'
    echo $node->getPath(); // will be '/data/node'
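
If you cannot be sure that the content exists, catching the exception is the safer route. A minimal sketch, assuming the $session from the bootstrapping code; the path used here is hypothetical and is not part of the imported test data:

    <?php
    try {
        // hypothetical path, not present in test.xml
        $node = $session->getNode('/data/does-not-exist');
        echo $node->getPath();
    } catch (\PHPCR\PathNotFoundException $e) {
        // thrown when there is no node at the requested path
        echo 'No node at that path: ' . $e->getMessage();
    }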

Reading properties

    <?php
    //get the node from the session
    $node = $session->getNode('/data/node');

    // get the php value of a property (type automatically determined from stored information)
    echo $node->getPropertyValue('title');

    // get the Property object to operate on
    $property = $node->getProperty('content');
    echo 'Size of '.$property->getPath().' is '.$property->getLength();

    // read a property that could be very long
    $property = $node->getProperty('content');

    // if it is binary convert into string
    $data = $property->getString();
    echo $data;

    // get binary stream. could be more performant with binary property
    $stream = $property->getBinary();
    fpassthru($stream);
    fclose($stream);

    // the above in short if you just want to dump a file that is in a binary property:
    // fpassthru($node->getPropertyValue('binary-prop'));

Note: the backend stores the property types. When getting property values, they are returned with that type, unless you use one of the explicit PropertyInterface::getXX methods. In that case, type conversion is attempted and an exception is thrown if this is not possible.

See the API doc for a list of all supported types.

    <?php
    // get all properties of this node
    foreach ($node->getPropertiesValues() as $name => $value) {
        echo "$name: $value\n";
    }
    // get the properties of this node with a name starting with 't'
    foreach ($node->getPropertiesValues("t*") as $name => $value) {
        echo "$name: $value\n";
    }

Traversing the hierarchy

    <?php
    //get the node from the session
    $node = $session->getNode('/data/node');

    // getting a node by path relative to the node
    $othernode = $node->getNode('../sibling'); // /data/sibling

    // get all child nodes. the $node is Iterable, the iterator being all children
    $node = $session->getNode('/data/sibling');
    foreach ($node as $name => $child) {
        if ($child->hasProperties()) {
            echo "$name has properties\n";
        } else {
            echo "$name does not have properties\n";
        }
    }

    // get child nodes with the name starting with 'c'
    foreach ($node->getNodes('c*') as $name => $child) {
        echo "$name\n";
    }

    // get child nodes with the name starting with 'o' or ending with '2' or named 'yetanother'
    foreach ($node->getNodes(array('o*', '*2', 'yetanother')) as $name => $child) {
        echo "$name\n";
    }

    // get the parent node
    $parent = $node->getParent(); // /data

    // build a breadcrumb of the node ancestry
    $node = $session->getNode('/data/sibling/yetanother');
    $i = 0;
    $breadcrumb = array();
    do {
        $i++;
        $parent = $node->getAncestor($i);
        $breadcrumb[$parent->getPath()] = $parent->getName();
    } while ($parent !== $node); // identity check: the same node instance is returned
    var_dump($breadcrumb);

Node and property references

Nodes can be referenced by unique id (if they are mix:referenceable) or by path. getValue returns the referenced node instance. Properties can only be referenced by path because they cannot have a unique id.

The test document we imported above does not contain the type information we need to show this example. Let’s create a special one and load it into the repository with Session::importXML:

    <sv:node
         xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
         xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
         xmlns:xs="http://www.w3.org/2001/XMLSchema"
         xmlns:jcr="http://www.jcp.org/jcr/1.0"
         xmlns:sv="http://www.jcp.org/jcr/sv/1.0"
         xmlns:rep="internal"

        sv:name="idExample"
    >
        <sv:property sv:name="jcr:primaryType" sv:type="Name">
            <sv:value>nt:unstructured</sv:value>
        </sv:property>

        <sv:node sv:name="target">
            <sv:property sv:name="jcr:primaryType" sv:type="Name">
                <sv:value>nt:unstructured</sv:value>
            </sv:property>
            <sv:property sv:name="jcr:mixinTypes" sv:type="Name">
                <sv:value>mix:referenceable</sv:value>
            </sv:property>
            <sv:property sv:name="jcr:uuid" sv:type="String">
                <sv:value>13543fc6-1abf-4708-bfcc-e49511754b40</sv:value>
            </sv:property>
            <sv:property sv:name="someproperty" sv:type="String">
                <sv:value>Some value</sv:value>
            </sv:property>
        </sv:node>

        <sv:node sv:name="source">
            <sv:property sv:name="jcr:primaryType" sv:type="Name">
                <sv:value>nt:unstructured</sv:value>
            </sv:property>
            <sv:property sv:name="reference" sv:type="WeakReference">
                <sv:value>13543fc6-1abf-4708-bfcc-e49511754b40</sv:value>
            </sv:property>
            <sv:property sv:name="path" sv:type="Path">
                <sv:value>../target/someproperty</sv:value>
            </sv:property>
        </sv:node>

    </sv:node>

Now import the contents of that file instead of the other one. With this data, you can do this:

    <?php
    $node = $session->getNode('/idExample/source');
    // will return you a node if the property is of type REFERENCE or WEAKREFERENCE
    $othernode = $node->getPropertyValue('reference');

    // force a node
    $property = $node->getProperty('reference');
    // will additionally try to resolve a PATH or NAME property and even work
    // if the property is a STRING that happens to be a valid UUID or to
    // denote an existing path
    $othernode = $property->getNode();

    // get a referenced property
    $property = $node->getProperty('path');
    $otherproperty = $property->getProperty();
    echo $otherproperty->getName(); // someproperty
    echo $otherproperty->getValue(); // Some value

Shareable nodes

Optional feature, not yet implemented in Jackalope.

Graph structure instead of a tree, nodes can have more than one parent.

Same name siblings

Optional feature, not fully tested in Jackalope.

Nodes with the same parent can have the same name. They are distinguished by an index, as in XPath.

Query: Search the database

    <?php
    // get the query interface from the workspace
    $workspace = $session->getWorkspace();
    $queryManager = $workspace->getQueryManager();

    $sql = "SELECT * FROM [nt:unstructured]
        WHERE [nt:unstructured].[title] = 'Test'
        ORDER BY [nt:unstructured].content";
    $query = $queryManager->createQuery($sql, 'JCR-SQL2');
    $query->setLimit(10); // limit number of results to be returned
    $query->setOffset(1); // set an offset to skip first n results
    $queryResult = $query->execute();

    foreach ($queryResult->getNodes() as $path => $node) {
        echo $node->getName();
    }

Without building nodes

There can be a little performance boost if you do not need to fetch the nodes but just want to access one value of each node.

    <?php
    foreach ($queryResult as $path => $row) {
        echo $path . ' scored ' . $row->getScore();

        $row->getValue('a-value-you-know-exists');
    }

Large search results can be dangerous for performance. See below for some performance tips.

Using Query Object Model (QOM) for building complex queries

PHPCR provides two languages to build complex queries: SQL2 and the Query Object Model (QOM). While SQL2 expresses a query in a syntax similar to SQL, QOM expresses the query as a tree of PHPCR objects.

In this section we will cover QOM. See the JCR docs for an exposition of both languages.

You can access the QueryObjectModelFactory from the session:

    <?php
    $qomFactory = $session->getWorkspace()->getQueryManager()->getQOMFactory();

The QOM factory has a method to build a QOM query given four parameters, and provides methods to build these four parameters:

$queryObjectModel = $qomFactory->createQuery(SourceInterface source, ConstraintInterface constraint, array orderings, array columns);

source is made out of one or more selectors. Each selector selects a subset of nodes. Queries with more than one selector have joins. A query with two selectors will have a join, a query with three selectors will have two joins, and so on.

constraint filters the set of node-tuples to be retrieved. Constraints may be combined in a tree of constraints to perform more complex filtering. An example of a constraint is the comparison of a property value with a literal.

orderings determine the order in which the filtered node-tuples will appear in the query results. The relative order of two node-tuples is determined by evaluating the specified orderings, in list order, until encountering an ordering for which one node-tuple precedes the other.

columns are the columns to be included in the tabular view of query results. If no columns are specified, the columns available in the tabular view are implementation determined. Jackalope includes, for each selector, a column for each single-valued non-residual property of the selector’s node type.

The simplest case is to select all [nt:unstructured] nodes:

    <?php
    $source = $qomFactory->selector('a', '[nt:unstructured]');
    $query = $qomFactory->createQuery($source, null, array(), array());
    $queryResult = $query->execute();
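
To illustrate the other parameters, here is a sketch of a QOM query with a constraint and an ordering, assuming the $qomFactory obtained above and the test data imported earlier. The argument order of propertyValue (selector name first) follows the QueryBuilder example in this guide and is an assumption that may differ in older PHPCR versions.

    <?php
    // source: all nt:unstructured nodes, available under the selector name 'a'
    $source = $qomFactory->selector('a', '[nt:unstructured]');

    // constraint: the title property must equal 'Test'
    $constraint = $qomFactory->comparison(
        $qomFactory->propertyValue('a', 'title'),
        \PHPCR\Query\QOM\QueryObjectModelConstantsInterface::JCR_OPERATOR_EQUAL_TO,
        $qomFactory->literal('Test')
    );

    // orderings: sort ascending by the content property
    $orderings = array(
        $qomFactory->ascending($qomFactory->propertyValue('a', 'content')),
    );

    // no columns specified, so the implementation decides which columns to expose
    $query = $qomFactory->createQuery($source, $constraint, $orderings, array());
    foreach ($query->execute()->getNodes() as $path => $node) {
        echo $path . "\n";
    }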

The Query Builder: a fluent interface for QOM

Sometimes you may prefer to build a query in several steps. For that reason, the phpcr-utils library provides a fluent wrapper for QOM: the QueryBuilder. It works with any PHPCR implementation.

An example of query built with QueryBuilder:

    <?php
    use PHPCR\Query\QOM\QueryObjectModelConstantsInterface;
    use PHPCR\Util\QOM\QueryBuilder;

    $qf = $qomFactory;
    $qb = new QueryBuilder($qomFactory);
    //add the source
    $qb->from($qomFactory->selector('a', 'nt:unstructured'))
        //some composed constraint
        ->andWhere($qf->comparison($qf->propertyValue('a', 'title'),
        QueryObjectModelConstantsInterface::JCR_OPERATOR_EQUAL_TO,
        $qf->literal('Test')))
        //orderings (descending by default)
        ->orderBy($qf->propertyValue('a', 'content'))
        //set an offset
        ->setFirstResult(0)
        //and the maximum number of node-tuples to retrieve
        ->setMaxResults(25);
    $result = $qb->execute();

    foreach ($result->getNodes() as $node) {
        echo $node->getName() . " has content: " . $node->getPropertyValue('content') . "\n";
    }
    //node has content: This is some test content
    //sibling has content: This is another test content

Writing data

With PHPCR, you never use ‘new’. The node works as a factory to create new nodes and properties. This has the nice side effect that you can not add a node where there is no parent.

Everything you do on the Session, Node and Property objects is only visible locally in this session until you save the session.

    <?php
    //get the node from the session
    $node = $session->getNode('/data/node');

    // add a new node as child of $node
    $newnode = $node->addNode('new node', 'nt:unstructured'); // until we have shown node types, just use nt:unstructured as type

    // set a property on the new node
    $newproperty = $newnode->setProperty('my property', 'my value');

    // persist the changes permanently. now they also become visible in other sessions
    $session->save();


    // have a reference
    $targetnode = $session->getNode('/data/sibling/yetanother');

    // make sure the target node is referenceable.
    $targetnode->addMixin('mix:referenceable');
    // depending on the implementation, you might need to save the session at
    // this point to have the identifier generated

    // add a reference property to the node. because the property value is a
    // Node, PHPCR will automatically detect that you want a reference
    $node->setProperty('my reference', $targetnode);

    $session->save();

Moving and deleting nodes

    <?php
    // move the node yetanother and all its children from its parent /data/sibling to
    // the new parent /data/sibling/child1
    // the target parent must already exist, it is not automatically created
    // as the move includes the target name, it can also be used to rename nodes
    $session->move('/data/sibling/yetanother', '/data/sibling/child1/yetanother');

    // for this session, everything that was at /data/sibling/yetanother is now under /data/sibling/child1/yetanother
    // i.e. /data/sibling/child1/yetanother/child
    // once the session is saved, the move is persisted and visible in other sessions
    // alternatively, you can immediately move the node in the persistent storage

    // rename node child2 to child2_new
    $workspace = $session->getWorkspace();
    $workspace->move('/data/sibling/child2', '/data/sibling/child2_new');

    // copy a node and its children (only available on workspace, not inside session)
    $workspace->copy('/data/sibling/yetanother', '/data/sibling/child1/yetanother');

    // delete a node
    $session->removeItem('/data/sibling/child1/yetanother');

Orderable child nodes

While moving is about changing the parent of a node, ordering is used to set the position inside the child list. Preserving and altering order is an optional feature of PHPCR.

The only method needed is Node::orderBefore

    <?php
    //get the node from the session
    $node = $session->getNode('/data/node');

    $node->addNode('first');
    $node->addNode('second'); // new nodes are added to the end of the list
    // order is: first, second

    // ordering is done on the parent node. the first argument is the name of
    // the child node to be reordered, the second is the name of the child node
    // that the reordered node is placed before
    $node->orderBefore('second', 'first');
    // now the order is: second, first

Versioning

Versioning is used to track changes in nodes with the possibility to get back to older versions.

A node with the mixin type mix:versionable or mix:simpleVersionable can be versioned. Versioned nodes have a version history, containing the root version and all versions created. Each version contains the meta data (previous versions, next versions and creation date) and provides a snapshot of the node at that point, called “frozen node”.

    <?php
    //get the node from the session
    $node = $session->getNode('/data/node');

    $node->setProperty('foo', 'fafa');
    // mark the node as versionable
    $node->addMixin('mix:versionable');
    $session->save();

    // version operations are done through the VersionManager
    $versionManager = $session->getWorkspace()->getVersionManager();

    // put the versionable node into edit mode
    $versionManager->checkout($node->getPath());
    $node->setProperty('foo', 'bar'); // need a change to see something
    $session->save(); // you can only create versions of saved nodes
    // create a new version of the node with our changes
    $version = $versionManager->checkin($node->getPath());
    // Version extends the Node interface. The version is the node with additional functionality

    // walk back the versions
    $oldversion = $version->getLinearPredecessor();
    // the version objects are just the meta data. call getFrozenNode on them
    // to get a snapshot of the data when the version was created
    echo $version->getName() . ': ' . $version->getFrozenNode()->getPropertyValue('foo') . "\n"; // 1.1: bar
    echo $oldversion->getName() . ': ' . $oldversion->getFrozenNode()->getPropertyValue('foo'); // 1.0: fafa

    // get the full version history
    $history = $versionManager->getVersionHistory($node->getPath());
    // use a separate variable so that $node keeps pointing to the versioned node
    foreach ($history->getAllFrozenNodes() as $frozenNode) {
        if ($frozenNode->hasProperty('foo')) {
            // the root version does not have the property
            echo $frozenNode->getPropertyValue('foo') . "\n";
        }
    }

    // restore an old version
    // a checked-in node is read-only, so check it out before changing it
    $versionManager->checkout($node->getPath());
    $node->setProperty('foo', 'different');
    $session->save(); // restoring is only possible if the session is clean
    $current = $versionManager->getBaseVersion($node->getPath());
    $versionManager->restore(true, $current);
    echo $node->getPropertyValue('foo'); // bar, the value stored in the base version checked in above

Locking

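In PHPCR, you can lock nodes to prevent concurrency issues. There are two basic types of locks: session based locks, which are released automatically when the session that created them ends, and locks that are not session based, which stay in place until they are explicitly unlocked or time out.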

Note that jackalope currently only implements session based locks.

    <?php
    //get the node from the session
    $node = $session->getNode('/data/sibling');
    //the node has to be lockable
    $node->addMixin('mix:lockable');
    $session->save(); //node needs to be clean before locking

    // get the lock manager
    $workspace = $session->getWorkspace();
    $lockManager = $workspace->getLockManager();
    var_dump($lockManager->isLocked('/data/sibling')); // should be false
    $lockManager->lock('/data/sibling', true, true); // lock child nodes as well, release when session closed
    // now only this session may change the node /data/sibling and its descendants
    var_dump($lockManager->isLocked('/data/sibling')); // should be true
    var_dump($lockManager->isLocked('/data/sibling/child1')); // should be true because we locked deep

    // getting the lock from LockManager is not yet implemented with jackalope-jackrabbit
    $lock = $lockManager->getLock('/data/sibling');
    var_dump($lock->isLockOwningSession()); // true, this is our lock, not somebody else's
    var_dump($lock->getSecondsRemaining()); // PHP_INT_MAX because this lock has no timeout
    var_dump($lock->isLive()); // true

    $node = $lock->getNode(); // this gets us the node for /data/sibling
    $node === $lockManager->getLock('/data/sibling')->getNode(); // getNode always returns the lock owning node

    // now unlock the node again
    $lockManager->unlock('/data/sibling'); // we could also let $session->logout() unlock when using session based lock
    var_dump($lockManager->isLocked('/data/sibling')); // false
    var_dump($lock->isLive()); // false

Transactions

The PHPCR API in itself uses some sort of ‘transaction’ model by only persisting changes on session save. If you need transactions that span more than one save operation or that include workspace operations, which are dispatched immediately, you can use transactions.

Note that Jackalope does not fully support transactions.

    <?php
    // get the transaction manager.
    $workspace = $session->getWorkspace();
    $transactionManager = $workspace->getTransactionManager();
    // start a transaction
    $transactionManager->begin();
    $session->removeItem('/data/sibling');
    $session->getRootNode()->addNode('insideTransaction');
    $session->save(); // wrote to the backend but not yet visible to other sessions
    $workspace->move('/data/node', '/new'); // will only move the new node if session has been saved. still not visible to other sessions
    $transactionManager->commit(); // now everything becomes persistent and visible to others

    // you can abort a transaction
    try {
        // ...
    } catch(\Exception $e) {
        if ($transactionManager->inTransaction()) {
            $transactionManager->rollback();
        }
        // ...
    }

Import and export data

As promised, here are some more details on importing and exporting data. There are two formats: the document view and the system view.

As an analogy, think about an SQL dump file with SQL statements and the dump of an SQL table into a CSV file. You can restore the data from both, but the SQL dump (like the system view) knows every detail about your field types and so on, while the CSV (like the document view) just knows the data.

When exporting, you explicitly choose the format to export to.

    <?php
    $file = fopen('/tmp/document.xml', 'w+');

    // dump the tree at /data/sibling into a document view file
    $session->exportDocumentView(
        '/data/sibling',
        $file,
        true, // skip binary properties to not have large files in the dump
        false // false means: recursively output the child nodes as well
    );

    fclose($file);

    $file = fopen('/tmp/system.xml', 'w+');
    // export the tree at /data/sibling into a system view xml file
    $session->exportSystemView(
        '/data/sibling',
        $file,
        false, // do not skip binary properties
        false
    );

    fclose($file);

Importing detects the format automatically. If the document is a valid JCR system view, it is interpreted according to that format. Otherwise, if it is a valid XML document, it is imported as document view.

    <?php
    $filename = 'dump.xml';
    $session->getRootNode()->addNode('imported_data', 'nt:unstructured');
    $session->importXML(
        '/imported_data', // attach the imported data at this node
        $filename,
        \PHPCR\ImportUUIDBehaviorInterface::IMPORT_UUID_CREATE_NEW
    );

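When importing nodes with a uuid, a couple of different behaviors can be used. They are selected with the constants of \PHPCR\ImportUUIDBehaviorInterface:

* IMPORT_UUID_CREATE_NEW: incoming nodes get newly created identifiers, so no collision can occur.
* IMPORT_UUID_COLLISION_REMOVE_EXISTING: an existing node with the same identifier is removed before the incoming node is added.
* IMPORT_UUID_COLLISION_REPLACE_EXISTING: the incoming node replaces the existing node with the same identifier at the position of the existing node.
* IMPORT_UUID_COLLISION_THROW: an exception is thrown if a node with the same identifier already exists.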
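
As a sketch of how the chosen behavior shows up in code, the following import asks for an exception on identifier collisions. It assumes the $session from the bootstrapping code. The file name references.xml is hypothetical and stands for a system view file that contains jcr:uuid properties, like the idExample document above. That a collision surfaces as \PHPCR\ItemExistsException is taken from the JCR specification, so check how your implementation reports it.

    <?php
    try {
        $session->importXML(
            '/',
            'references.xml', // hypothetical file containing nodes with jcr:uuid
            \PHPCR\ImportUUIDBehaviorInterface::IMPORT_UUID_COLLISION_THROW
        );
        $session->save();
    } catch (\PHPCR\ItemExistsException $e) {
        // discard the pending changes of the failed import
        $session->refresh(false);
        echo 'Import aborted: ' . $e->getMessage();
    }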

Observation

Observation enables an application to receive notifications of persistent changes to a workspace. JCR defines a general event model and specific APIs for asynchronous and journaled observation. A repository may support asynchronous observation, journaled observation or both.

Note that Jackrabbit supports the full observation API but Jackalope currently only implements event journal reading.

Write operations in Jackalope will generate journal entries as expected.

    <?php
    use PHPCR\Observation\EventInterface; // Contains the constants for event types

    // Get the observation manager
    $workspace = $session->getWorkspace();
    $observationManager = $workspace->getObservationManager();

    // Get the unfiltered event journal and go through its content
    $journal = $observationManager->getEventJournal();
    $journal->skipTo(strtotime('-1 day')); // Skip all the events prior to yesterday
    foreach ($journal as $event) {
        // Do something with $event (it's a Jackalope\Observation\Event instance)
        echo $event->getType() . ' - ' . $event->getPath();
    }

    // Filtering and using the journal as an iterator
    // You can filter the event journal on several criteria, here we keep events for node and properties added
    $journal = $observationManager->getEventJournal(EventInterface::NODE_ADDED | EventInterface::PROPERTY_ADDED);

    while ($journal->valid()) {
        $event = $journal->current();
        // Do something with $event
        $journal->next();
    }

Node Types

PHPCR supports node types. Node types define what properties and children a node can or must have. The JCR specification explains exhaustively what node types exist and what they are required to have or not: JCR 2.0: 3.7.11 Standard Application Node Types

In a nutshell:

* nt:unstructured does not define any required properties but allows any property or child.
* nt:file and nt:folder are built-in node types useful to map a file structure in the repository. (With jackalope-jackrabbit, files and folders are exposed over WebDAV.)
* For your own things, use nt:unstructured and PHPCR will behave like a NoSQL database.
* If you need to store additional properties or children on existing node types like files, note that while a node can have only one primary type, every node can have any mixin types. Define a mixin type declaring your additional properties, register it with PHPCR and addMixin it to the nodes that need it.

You can define your own node types if you want the equivalent of a strictly defined database structure. See JCR 2.0: 3.7 Node Types and JCR 2.0: 19 Node Type Management / PHPCR Node Type Namespace.
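
The mixin approach could look roughly like the sketch below. It assumes the $session from the bootstrapping code and an implementation that supports registering node types from a CND string. The namespace myapp, its URI, the mixin myapp:tagged and the property myapp:tags are all hypothetical names made up for this illustration.

    <?php
    // hypothetical mixin definition in CND notation, built line by line
    $cnd = "<myapp='http://example.com/myapp/1.0'>\n"
         . "[myapp:tagged] mixin\n"
         . "- myapp:tags (string) multiple\n";

    // register the definition; true allows updating an already registered type
    $nodeTypeManager = $session->getWorkspace()->getNodeTypeManager();
    $nodeTypeManager->registerNodeTypesCnd($cnd, true);

    // add the mixin to a node and set the property it declares
    $node = $session->getNode('/data/node');
    $node->addMixin('myapp:tagged');
    $node->setProperty('myapp:tags', array('php', 'phpcr'));
    $session->save();

Because the node in the test data is nt:unstructured, it would accept the property even without the mixin; the mixin becomes important on stricter primary types such as nt:file.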

Performance considerations

While PHPCR can perform reasonably well, you should be careful. You are working with an object model mapping interlinked data. Implementations are supposed to lazy load data only when necessary. But you should take care to only request what you actually need.

The implementations will also use some sort of storage backend (Jackrabbit, (no)SQL database, …). There might be a huge performance impact in configuring that storage backend optimally. Look into your implementation documentation if there are recommendations how to optimize storage.

One thing not to worry about is requesting the same node with Session::getNode or Node::getNode/s several times. You always get the same object instance back without overhead.

Only request what you need

Remember that you can filter nodes on Node::getNodes if you only need a list of specific nodes or all nodes in some namespace.

The values of binary properties can potentially have a huge size and should only be loaded when really needed. If you just need the size, you can get the property instance and call $property->getLength() instead of loading the value with $node->getPropertyValue and measuring it. Any decent implementation will not preload the binary stream when you access the property object.
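
A short sketch of reading the size without loading the content, assuming a node in $node that has a binary property. The property name binary-prop is hypothetical, as in the earlier example.

    <?php
    // getting the Property object should not load the binary data yet
    $property = $node->getProperty('binary-prop');
    echo 'Size in bytes: ' . $property->getLength() . "\n";

    // only open the stream once the content is really needed
    $stream = $property->getBinary();
    fpassthru($stream);
    fclose($stream);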

When getting the properties from a node, you can use Node::getPropertiesValues(filter, false). This allows the implementation to avoid instantiating Property objects for the property values (and saves you coding). The second boolean parameter tells whether to dereference reference properties. If you do not need the referenced objects, pass false and you will get the UUID or path strings instead of node objects. (If you need one of them, you can still get it with Session::getNodeByIdentifier. But then the implementation will certainly not be able to optimize if you get several referenced nodes.)
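
For example, with the idExample data from the section on references, a sketch of reading all values without dereferencing could look like this (assuming that document has been imported):

    <?php
    $node = $session->getNode('/idExample/source');

    // null: no name filter. false: do not dereference, so the 'reference'
    // and 'path' properties come back as strings instead of objects
    foreach ($node->getPropertiesValues(null, false) as $name => $value) {
        // multivalued properties are returned as arrays
        $printable = is_array($value) ? implode(', ', $value) : $value;
        echo "$name: $printable\n";
    }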

But request in one call as much as possible of what you need

If you need to get several nodes where you know the paths, use Session::getNodes with an array of those paths to get all of them in one batch, saving round trip time to the storage backend.

Also use Node::getNodes with a list of node names rather than repeatedly calling Node::getNode.
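
Both batch calls can be sketched as follows, assuming the test data imported at the beginning of this guide:

    <?php
    // fetch several nodes by absolute path in one call
    $paths = array('/data/node', '/data/sibling');
    foreach ($session->getNodes($paths) as $path => $node) {
        echo $path . ': ' . $node->getPropertyValue('title') . "\n";
    }

    // fetch several children of one node by name in one call
    $parent = $session->getNode('/data/sibling');
    foreach ($parent->getNodes(array('child1', 'child2')) as $name => $child) {
        echo $name . ': ' . $child->getPropertyValue('title') . "\n";
    }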

TODO: intelligent filtering criteria to do as little in-memory operations to apply criteria.

If you do not need the node objects but just some value, query for that value and use the result Row to avoid instantiating Node objects altogether. If you need the Node objects, help PHPCR to optimize by using QueryResult::getNodes and iterating over the nodes instead of getting the rows, iterating over them and calling getNode on each row. (Actually, if you first do the getNodes(), you can then iterate over the rows and get the individual nodes and still use the special row methods, as the implementation should have prefetched data on the getNodes.)

Conclusions

We hope this tutorial helps to get you started. If you miss anything, have suggestions or questions, please contact us on jackalope-dev@googlegroups.com or #jackalope on irc.freenode.net

Further reading

Browse through the API documentation to see what each of the core elements mentioned in the introduction can do.

To fully understand the concepts behind the content repository API, we suggest reading the Java content repository specification and then the simplifications we did for PHP.

Not yet implemented

A couple of other advanced functionalities are defined by the API. They are not yet implemented in any PHPCR implementation. This document will be updated once there is an implementation for them.