
Module / Test folder organization in the file system

I'll start off by linking to the manual and then go into what I've seen and heard in the field.

My recommended approach is combining the file system with an xml config.

tests/
  unit/
    module1/
    module2/
  integration/
  functional/

with a phpunit.xml containing a simple:

<testsuites>
  <testsuite name="My whole project">
    <directory>tests</directory>
  </testsuite>
</testsuites>

You can split the test suites if you want to, but that's a project-to-project choice.
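If you do split them, it could look something like this (a sketch; the suite names are arbitrary, and newer PHPUnit versions also let you run a single suite with phpunit --testsuite unit):

<testsuites>
  <testsuite name="unit">
    <directory>tests/unit</directory>
  </testsuite>
  <testsuite name="integration">
    <directory>tests/integration</directory>
  </testsuite>
  <testsuite name="functional">
    <directory>tests/functional</directory>
  </testsuite>
</testsuites>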

Running phpunit will then execute ALL tests and running phpunit tests/unit/module1 will run all tests of module1.

Mirror your source/ directory structure in tests/unit/. You have one TestClass per ProductionClass anyway, so it's a good approach in my book.

It's not going to work anyway if you have more than one test class in one file, so avoid that pitfall.

  • Don't have a test namespace

It just makes writing the test more verbose, as you need an additional use statement, so I'd say the test class should go in the same namespace as the production class. That is nothing PHPUnit forces you to do; I've just found it to be easier, with no drawbacks.

To run a subset of tests, you can filter by name or point at a folder:

phpunit --filter Factory
phpunit tests/unit/logger/

You can use @group tags for things like issue numbers or stories, but for "modules" I'd use the folder layout.
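If you do use groups, it's just a docblock annotation plus a command line switch (the group name here is only an example):

/**
 * @group issue-123
 */
public function testRegressionForIssue123()
{
    // ...
}

phpunit --group issue-123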

It can be useful to create multiple xml files if you want to have:

  • one just for the unit tests (but not for the functional, integration or long-running tests)

Since this is related to starting a new project with tests:

  • My suggestion is to use @covers tags as described in my blog (only for unit tests; always cover all non-public functions; always use @covers tags); see the sketch after this list.
  • Don't generate coverage for your integration tests. It gives you a false sense of security.
  • Always use whitelisting to include all of your production code so the numbers don't lie to you!
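A sketch of both pieces, assuming the pre-PHPUnit-9 configuration schema and a source/ folder for the production code. The whitelist goes in phpunit.xml:

<filter>
  <whitelist addUncoveredFilesFromWhitelist="true">
    <directory suffix=".php">source</directory>
  </whitelist>
</filter>

and the @covers tag goes on the test class (MyModule\Logger is a made-up class name):

/**
 * @covers MyModule\Logger
 */
class LoggerTest extends PHPUnit_Framework_TestCase
{
    // ...
}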

You don't need any sort of autoloading for your tests. PHPUnit will take care of that.

Use the <phpunit bootstrap="file"> attribute to specify your test bootstrap. tests/bootstrap.php is a nice place to put it. There you can set up your application's autoloader and so on (or call your application's bootstrap for that matter).
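A minimal sketch, assuming a Composer autoloader (adjust the path to whatever your project uses):

<phpunit bootstrap="tests/bootstrap.php">

and in tests/bootstrap.php:

<?php
// tests/bootstrap.php
require __DIR__ . '/../vendor/autoload.php';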

  • Use the xml configuration for pretty much everything
  • Your unit test folders should mirror your application's folder structure
  • Run subsets of your tests with phpunit --filter or by pointing at a folder, e.g. phpunit tests/unit/module1
  • Use the strict mode from the get-go and never turn it off (see the sketch after this list).
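Putting those points together, a starting phpunit.xml could look roughly like this (a sketch; strict="true" is the old all-in-one switch, newer PHPUnit versions split it into several beStrictAbout* attributes):

<phpunit bootstrap="tests/bootstrap.php" strict="true" colors="true">
  <testsuites>
    <testsuite name="My whole project">
      <directory>tests</directory>
    </testsuite>
  </testsuites>
  <filter>
    <whitelist addUncoveredFilesFromWhitelist="true">
      <directory suffix=".php">source</directory>
    </whitelist>
  </filter>
</phpunit>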

@TheCandyMan My pleasure; Feel free to join the php chat on SO or freenode #phpunit if you run into issues with that ;)

@cmt Thanks for the update. I've updated the link in the post to my new blog location: edorian.github.io/ - Cheers!

@edorian Where should I place the phpunit.xml file, and where should I reference that xml file?

php - PHPUnit best practices to organize tests - Stack Overflow

php phpunit

I think your best option is to use the new File API in JavaScript. It has a lot of functions to read files from the file system.

<input type="file" id="fileinput" multiple />
<script type="text/javascript">
  function readMultipleFiles(evt) {
    // Retrieve all the files from the FileList object
    var files = evt.target.files;

    if (files) {
      for (var i = 0, f; f = files[i]; i++) {
        var r = new FileReader();
        r.onload = (function(f) {
          return function(e) {
            var contents = e.target.result;
            alert("Got the file.\n"
                  + "name: " + f.name + "\n"
                  + "type: " + f.type + "\n"
                  + "size: " + f.size + " bytes\n"
                  + "starts with: " + contents.substr(0, contents.indexOf("\n")));
          };
        })(f);

        r.readAsText(f);
      }
    } else {
      alert("Failed to load files");
    }
  }

  document.getElementById('fileinput').addEventListener('change', readMultipleFiles, false);
</script>

You can find a good explanation and helpful code here.

This helped me, I just posted the modified code in the post :) Thanks.

@Alin: Very good. If you'd like to, please post the code you are using in a separate answer. Don't post 'answer' code in the question.

javascript - Get all images from local folder - Stack Overflow

javascript jquery ajax

FileReader reads from files on the file system.

Perhaps you intended to use something like this to load a file from the class path

// This will look in src/main/resources before building, and inside myjar.jar after building.
InputStream is = MyClass.class.getClassLoader()
                     .getResourceAsStream("config.txt");

Or you could extract the file from the jar before reading it.
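A sketch of the extraction approach (config.txt and the target location under user.home are placeholders):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ResourceExtractor {
    public static Path extractConfig() throws Exception {
        Path target = Paths.get(System.getProperty("user.home"), ".myapp", "config.txt");
        Files.createDirectories(target.getParent());
        // Copy the bundled resource out of the jar so it can be read (or edited) as a normal file.
        try (InputStream in = ResourceExtractor.class.getClassLoader()
                .getResourceAsStream("config.txt")) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}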

Reading from src/main/resources directly does not work, because those files are copied to the target/classes folder and packaged into the jar.

I have corrected that, thank you. It works if you have . in your class path, but for the wrong reasons, i.e. it reads the original rather than the copy.


java - Resource from src/main/resources not found after building with ...

java maven maven-2

As a rule of thumb, don't save files in the database.

With Web servers, store images and other binary assets as files, with the path name stored in the database rather than the file itself. Most Web servers are better at caching files than database contents, so using files is generally faster. (Although you must handle backups and storage issues yourself in this case.)

It works fine, but takes much more time than I expected. Also, the images are 33% bigger, and the whole thing looks bloated.

As you discovered, there is unwanted overhead in encoding/decoding, plus extra space used up, which means extra data transferred back and forth as well.

As @mike-m has mentioned, Base64 encoding is not a compression method. Why Base64 encoding is used is also answered by a link that @mike-m posted: What is base 64 encoding used for?

In short, there is nothing to gain and much to lose by base64 encoding images before storing them on the file system, be it S3 or otherwise.

What about gzip or other forms of compression without involving base64? Again, the answer is that there is nothing to gain and much to lose. For example, I just gzipped a 1,941,980-byte JPEG image and saved 4,000 bytes; that's a 0.2% saving.

The reason is that images are already in compressed formats. They cannot be compressed any further.

When you store images without compression they can be delivered directly to browsers and other clients and they can be cached. If they are compressed (or base64 encoded) they need to be decompressed by your app.

Modern browsers are able to display base64 images embedded in the HTML, but then they cannot be cached, and the data is about 30% larger than it needs to be.

Users can post their data and images and everything is secure.

I presume that you mean a user can download images that belong to him or are shared with him. This can easily be achieved by saving the files outside the web space in the file system and saving only the path in the database. Then the file is sent to the client (after doing the required checks) with fpassthru.
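A rough PHP sketch of that idea (the table, columns and the authorization check are hypothetical, and $pdo is assumed to be a configured PDO connection):

<?php
// Look up the file's path (stored outside the web root) and stream it after a permission check.
$stmt = $pdo->prepare('SELECT path, mime FROM images WHERE id = ?');
$stmt->execute([$imageId]);
$row = $stmt->fetch();

if ($row && user_may_view_image($row)) {   // your own authorization check
    header('Content-Type: ' . $row['mime']);
    header('Content-Length: ' . filesize($row['path']));
    $fp = fopen($row['path'], 'rb');
    fpassthru($fp);
    fclose($fp);
} else {
    http_response_code(403);
}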

How do they take care of image files? Regarding performance, when many users are involved, it seems to me that I would need 100,000 folders for 100,000 users, plus their subfolders. When a large number of users browse the same root folder, how does the file system handle each unique folder?

Use a CDN, or use a file system that's specially suited for this, like BTRFS.

Yes, indeed. Use it to the fullest by saving all the information about the file and its file path in the database. Then save the file itself in the file system. You get the best of both worlds.

php - Slowness found when base 64 image select and encode from databas...

php mysql angularjs ionic-framework base64

Some browsers implement strong security measures to prevent downloaded webpages from accessing arbitrary files on the file system.

Switch to a browser with weaker security (I think Firefox permits access to local files via XHR) or stop trying to run a website without HTTP.

Alternatively, Chrome can be launched with a flag that relaxes this restriction (use it only for local development): chrome --allow-file-access-from-files

html - Ajax in Jquery does not work from local file - Stack Overflow

jquery html ajax local

Instead, store the 'default' file inside the Jar. If it is changed, store the altered file in another place. One common place is a sub-directory of user.home. When checking for the file, first check for the existence of an altered file on the file system, and if it does not exist, load the default file.
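A minimal sketch of that pattern (the .myapp/settings.properties location is just an example):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Settings {
    public static InputStream open() throws IOException {
        // Prefer an altered copy under user.home ...
        Path edited = Paths.get(System.getProperty("user.home"), ".myapp", "settings.properties");
        if (Files.exists(edited)) {
            return Files.newInputStream(edited);
        }
        // ... otherwise fall back to the default bundled inside the jar.
        return Settings.class.getResourceAsStream("/settings.properties");
    }
}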

Note that it is generally better to describe the goal, rather than the strategy. 'Store changed file in Jar' is a strategy, whereas 'Save preferences between runs' might be the goal.

I just wish enough people understood that note so that I did not have to be constantly suggesting it (often to no good effect, BTW ;).

Whatever is called strategy can also be a goal, as programs can be data.

java - How can an app use files inside the JAR for read and write? - S...

java jar inputstream outputstream

Node.js does not run in a browser, therefore you will not have a document object available. Actually, you will not even have a DOM tree at all. If you are a bit confused at this point, I encourage you to read more about it before going further.

There are a few methods you can choose from to do what you want.

Method 1: Serving the file directly via HTTP

Because you wrote about opening the file in the browser, why don't you use a framework that will serve the file directly as an HTTP service, instead of having a two-step process? This way, your code will be more dynamic and easily maintainable (not to mention your HTML will always be up to date).

There are plenty of frameworks out there for that.

The most basic way you could do what you want is this :

var http = require('http');

http.createServer(function (req, res) {
  var html = buildHtml(req);

  res.writeHead(200, {
    'Content-Type': 'text/html',
    'Content-Length': html.length,
    'Expires': new Date().toUTCString()
  });
  res.end(html);
}).listen(8080);

function buildHtml(req) {
  var header = '';
  var body = '';

  // concatenate header string
  // concatenate body string

  return '<!DOCTYPE html>'
       + '<html><header>' + header + '</header><body>' + body + '</body></html>';
};

And access this HTML with http://localhost:8080 from your browser.

(Edit: you could also serve them with a small HTTP server.)

Method 2: Writing the HTML to the file system

If what you are trying to do is simply generating some HTML files, then go simple. To perform IO access on the file system, Node has an API for that, documented here.

var fs = require('fs');

var fileName = 'path/to/file';
var stream = fs.createWriteStream(fileName);

stream.once('open', function(fd) {
  var html = buildHtml();

  stream.end(html);
});

Method 3: Writing to stdout

This is the most basic Node.js implementation and requires the invoking application to handle the output itself. To output something in Node (i.e. to stdout), the best way is to use console.log(message), where message is any string, object, etc.

var html = buildHtml();

console.log(html);
(buildHtml here is the same function shown above.)

If your script is called html-generator.js (for example), on a Linux/Unix-based system, simply do

$ node html-generator.js > path/to/file

Because Node is a modular system, you can even put the buildHtml function inside its own module and simply write adapters to handle the HTML however you like. Something like:

var htmlBuilder = require('path/to/html-builder-module');

var html = htmlBuilder(options);
...
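Such a module might look roughly like this (a sketch; the file name and option names are made up):

// path/to/html-builder-module.js
module.exports = function htmlBuilder(options) {
  options = options || {};
  var title = options.title || 'Untitled';
  var body  = options.body  || '';

  return '<!DOCTYPE html>'
       + '<html><head><title>' + title + '</title></head>'
       + '<body>' + body + '</body></html>';
};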

You have to think "server-side" and not "client-side" when writing JavaScript for Node.js; you are not in a browser, nor limited to a sandbox other than the V8 engine.

For extra reading, learn about npm. Hope this helps.

I find no reason for a Document object not to exist on the server. Let's not forget what window.document is on the client: an XML Document instance with methods that allow DOM manipulation. We should (and can) do DOM manipulation on the server using any language.

@Loupax, the question here is not whether you can or can't, but whether there is one by default. Server-side JavaScript does not have a global window object. It does not have a Document either. It can have either, but doesn't need them. Therefore, unless you create or add an implementation of either, they won't exist. Try not to confuse novice programmers with Node.js. There must be a distinction between server-side programming and client-side; both use the same language and syntax, but they do not share the same privileges and container.

Maybe, but I find it a bit wrong when the question was about "how can I do DOM manipulation on the server" and all answers are "you cannot" when you clearly can. You might need to install a library but you can. And this is worth mentioning at least IMO

@Loupax I believe you have misread the question. It is not about DOM manipulation, but about generating HTML. The only mention of the Document object is what the OP tried to do, which is why I said "you cannot do that", and I am 100% correct about that; he was confusing JavaScript for the browser, and JavaScript in a Node.js application. I repeat, this has nothing to do with whether you can or can't do DOM manipulation. Now, please re-read the question and my answer and stop arguing about it.

javascript - Node.js Generate html - Stack Overflow

javascript html node.js

The git repository distinguishes files by checksum, not by name or location. If you commit, and then move a file to a different location and commit, the file in the before location and the file in the after location have the same checksum (because they have the same content). Therefore the repository does not store a new "copy" of the file; it merely records the fact that a file with this checksum now has the second location.
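As a quick sketch of that (the actual hash will differ on your machine, but it will be identical before and after the move):

$ echo 'same content' > a.txt
$ git add a.txt && git commit -m 'add a.txt'
$ git hash-object a.txt        # prints the blob's object-ID
$ mkdir sub
$ git mv a.txt sub/b.txt && git commit -m 'move it'
$ git hash-object sub/b.txt    # prints the exact same object-ID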

But if you move the file in the same commit, it will detect it as a move and keep the history appropriately.

@siride That's not the point. The O.P. said "If [git] treats moved files as copies, then you could have a repo easily get very large even though you didn't actually add any new files." Well, I'm telling him that it doesn't so it won't. Not for that reason, anyway. Under what circumstances git may or may not correctly trace the history of what a human being would think of as "the same" file thru the renaming / moving process is a whole different (and very messy) ball of wax. Sometimes the magic works, sometimes it doesn't.

How does git handle moving files in the file system? - Stack Overflow

git

Applications cannot write to the file system in any of the runtime environments. An application can read files, but only files uploaded with the application code. The app must use the App Engine datastore, memcache or other services for all data that persists between requests. The Python 2.7 environment allows bytecode to be read, written, and modified.
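For example, reading a file that was deployed with the app is fine (a minimal sketch; config.txt is a placeholder name):

import os

# Files uploaded with the application code are readable (but read-only).
path = os.path.join(os.path.dirname(__file__), 'config.txt')
with open(path) as f:
    data = f.read()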

You'll need to return to using the blobstore, or try the Google Cloud Storage API, depending on the needs of your application.

Want to upload a file to directory in google appengine using python - ...

google-app-engine

Surely it is possible to write to the file system of Azure Websites. However, your write permissions are limited to the root folder of your app. So, if you use ASP.NET, you will be able to write anywhere within Server.MapPath("~/from_here_on"), meaning you can perform read/write/delete operations on files located in the root folder of your app and below. If you use PHP, the root folder can be obtained from the $_SERVER['DOCUMENT_ROOT'] environment variable.
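As a PHP sketch (the data/ subfolder and file name are just examples):

<?php
// Write inside the app's root folder, which is the writable area on an Azure Website.
$dir = $_SERVER['DOCUMENT_ROOT'] . '/data';
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);
}
file_put_contents($dir . '/app.log', date('c') . " something happened\n", FILE_APPEND);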

And a web application shall not need more privileges than that. It will certainly not be able to write to the operating system folders.

"Shall not need" is not quite accurate. Not all web apps store everything in the public directory. Many frameworks store cached templates outside of the document root.

That's the broken assumption of the PHP world - it assumes it can write everywhere. Anyway, I am not 100% sure, and it can be tested, but I think that the user the site runs under can also write one directory up from the document root. On the other hand, you can fairly easily make a sub-folder of DOCUMENT_ROOT non-public - not served by the web server at all. You can do this with Apache's .htaccess, and you can also do it with IIS's web.config. And if you cannot instruct your framework where to write its temp files, then there is something terribly broken in its design.

It's not a design issue, PHP should be allowed to do exactly what any other server language does. Assuming that it is document-root centric is the broken assumption.

Well, in my web server setup, I would be really concerned if any web worker had write permissions to anything but the system's TEMP and its document root. But that's just me. Others may decide to run the web worker with a root account ...

You've incorrectly assumed that I'm advocating running under a root account. Massive difference. Permissions still apply here - as they should. The issue at hand is limiting filesystem access to be only under the document root.

filesystems - Can I write to file system on azure web site? - Stack Ov...

azure filesystems azure-web-app-service

In general, there's no way to make arbitrary edits in the middle of a file. It's not a deficiency of Ruby. It's a limitation of the file system: Most file systems make it easy and efficient to grow or shrink the file at the end, but not at the beginning or in the middle. So you won't be able to rewrite a line in place unless its size stays the same.

There are two general models for modifying a bunch of lines. If the file is not too large, just read it all into memory, modify it, and write it back out. For example, adding "Kilroy was here" to the beginning of every line of a file:

path = '/tmp/foo'
lines = IO.readlines(path).map do |line|
  'Kilroy was here ' + line
end
File.open(path, 'w') do |file|
  file.puts lines
end

Although simple, this technique has a danger: If the program is interrupted while writing the file, you'll lose part or all of it. It also needs to use memory to hold the entire file. If either of these is a concern, then you may prefer the next technique.

You can, as you note, write to a temporary file. When done, rename the temporary file so that it replaces the input file:

require 'tempfile'
require 'fileutils'

path = '/tmp/foo'
temp_file = Tempfile.new('foo')
begin
  File.open(path, 'r') do |file|
    file.each_line do |line|
      temp_file.puts 'Kilroy was here ' + line
    end
  end
  temp_file.close
  FileUtils.mv(temp_file.path, path)
ensure
  temp_file.close
  temp_file.unlink
end

Since the rename (FileUtils.mv) is atomic, the rewritten input file will pop into existence all at once. If the program is interrupted, either the file will have been rewritten, or it will not. There's no possibility of it being partially rewritten.

The ensure clause is not strictly necessary: The file will be deleted when the Tempfile instance is garbage collected. However, that could take a while. The ensure block makes sure that the tempfile gets cleaned up right away, without having to wait for it to be garbage collected.

you are about to close the temp_file, why rewind it?

@hihell, BookOfGreg's edit added the rewind; his remark was: "FileUtils.mv will write a blank file unless the temporary file is rewound. Also best practice is to make sure temp file is closed and unlinked after usage."

What happens in the second scenario to the file's created date? Will FileUtils.mv cause us to end up with a file that looks as if it had been created just now? If so, that's a very big difference between the two scenarios (as the first one leaves the file created date alone).

@Matt I've never thought about this technique's effect upon the creation date, but it seems obvious that you are correct.

Read, edit, and write a text file line-wise using Ruby - Stack Overflo...

ruby file io

Obviously, when you're tarring a file, it must be read by the process running tar. This is exactly what happens on my system. I created a 512-byte file from /dev/urandom and ran tar -cf file.tar file.xyz. After filtering out all the noise related to loading libraries into the process image, you can see the actual relevant lines that strace reports:

creat("file.tar", 0666)                 = 3

We can see that the output file from the tar command is being created with read/write permissions for the owner, group, and world (which is probably influenced by the umask reported by your shell), and the new file's descriptor inside this process is 3.

openat(AT_FDCWD, "file.xyz", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 4

Here, the file to be archived is opened and assigned the file descriptor 4.

fstat(4, {st_mode=S_IFREG|0644, st_size=512, ...}) = 0

tar calls fstat on an open file descriptor in order to find out if the file is readable and its size (probably).

read(4, "\225\243\263uG\320-\354!%\337\3376\311\210&\377T=aiO\10\203\375|y\304\231\203x."..., 512) = 512

We can see the file being actually read.

close(4)                                = 0
write(3, "file.xyz\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240

The file referenced by descriptor 3 - our output file - is being written to. We can't really see the contents of file.xyz in the write call, but this is probably because of the structure of the tar file.

close(3)                                = 0

Now, the output file is closed, as well as the whole process (not shown here).

Interestingly, at first I created an empty file with touch, and tried to tar it. However, it seems like tar checks if the file is empty and, if it is, does not insert the data inside the tar archive. newfstatat returns the information about the size, which tar probably uses to make this decision.

However, you should really read the source to see how the actual execution looks. It is possible that, for example, files which are much larger are mmapped into the process and read that way, while smaller files are simply read with read.
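If you want to reproduce a similar trace yourself, something along these lines should do (a sketch; the exact syscalls will differ between tar versions and platforms):

$ dd if=/dev/urandom of=file.xyz bs=512 count=1
$ strace -o tar.trace tar -cf file.tar file.xyz
$ grep -E 'creat|openat|read|write|close' tar.trace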

Great answer. Can you confirm that it is actually being mmapped? Wouldn't strace show the file being opened with mmap as well?

Of course it would, since mmap is a system call just like all the others. This is just a speculation - I didn't try to strace the program with larger input files.

linux - How does tar read files to create an archive? - Stack Overflow

linux tar

There's really nothing wrong with this, it's just a question of whether you call it a unit test or an integration test. You just have to make sure that if you do interact with the file system, there are no unintended side effects. Specifically, make sure that you clean up after yourself -- delete any temporary files you created -- and that you don't accidentally overwrite an existing file that happened to have the same filename as a temporary file you were using. Always use relative paths and not absolute paths.

It would also be a good idea to chdir() into a temporary directory before running your test, and chdir() back afterwards.
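As an illustration, here is a Python sketch of that idea (the same pattern carries over to any xUnit framework):

import os
import tempfile
import unittest

class FileWritingTest(unittest.TestCase):
    def setUp(self):
        # Run each test inside a throwaway directory.
        self._old_cwd = os.getcwd()
        self._tmp = tempfile.TemporaryDirectory()
        os.chdir(self._tmp.name)

    def tearDown(self):
        # chdir back first, then remove everything the test created.
        os.chdir(self._old_cwd)
        self._tmp.cleanup()

    def test_writes_report(self):
        with open('report.txt', 'w') as f:   # relative path, inside the temp dir
            f.write('hello')
        self.assertTrue(os.path.exists('report.txt'))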

+1, however note that chdir() is process-wide so you might break the ability to run your tests in parallel, if your test framework or a future version of it supports that.

Unit testing code with a file system dependency - Stack Overflow

unit-testing dependency-injection dependencies

You should read up on the File System support in node.js.

The following method is probably the simplest way to do what you want, but it is not necessarily the most efficient, since it creates/opens, updates, and then closes the file every time.

var fs = require('fs');

function myWrite(data) {
    fs.appendFile('output.txt', data, function (err) {
      if (err) { /* Do whatever is appropriate if the append fails */ }
    });
}

Is it a must that I have to be prepared for when I encounter an error? I have no idea what to do.

You could just output an error to the console, if you don't know what else to do. For example: console.error("Failed to output data. Data: %s, Error: %s", data, err);

Same problem here. I don't have any error, but I couldn't find my file, so I'm not sure if the code is working.

You can explicitly specify the full path for the file, so that you know where it goes.
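For example, something like this (a sketch) writes next to the script itself, so the location is unambiguous:

var fs = require('fs');
var path = require('path');

var file = path.join(__dirname, 'output.txt');   // absolute path next to this script
fs.appendFile(file, 'some data\n', function (err) {
  if (err) console.error('Failed to write %s: %s', file, err);
});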

Writing data to text file in Node.js - Stack Overflow

node.js file-io

Profiling with XDebug

An extension to PHP called Xdebug is available to assist in profiling PHP applications, as well as runtime debugging. When running the profiler, the output is written to a file in a binary format called "cachegrind". Applications are available on each platform to analyze these files. No application code changes are necessary to perform this profiling.

To enable profiling, install the extension and adjust php.ini settings. Some Linux distributions come with standard packages (e.g. Ubuntu's php-xdebug package). In our example we will run the profiler optionally, based on a request parameter. This allows us to keep settings static and turn on the profiler only as needed.

; Set to 1 to turn it on for every request
xdebug.profiler_enable = 0
; Let's use a GET/POST parameter to turn on the profiler
xdebug.profiler_enable_trigger = 1
; The GET/POST value we will pass; empty for any value
xdebug.profiler_enable_trigger_value = ""
; Output cachegrind files to /tmp so our system cleans them up later
xdebug.profiler_output_dir = "/tmp"
xdebug.profiler_output_name = "cachegrind.out.%p"

Next use a web client to make a request to your application's URL you wish to profile, e.g.

http://example.com/article/1?XDEBUG_PROFILE=1

As the page processes, it will write to a file with a name similar to

/tmp/cachegrind.out.12345

By default the number in the filename is the process id which wrote it. This is configurable with the xdebug.profiler_output_name setting.

Note that it will write one file for each PHP request / process that is executed. So, for example, if you wish to analyze a form post, one profile will be written for the GET request to display the HTML form. The XDEBUG_PROFILE parameter will need to be passed into the subsequent POST request to analyze the second request which processes the form. Therefore when profiling it is sometimes easier to run curl to POST a form directly.
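For example (a sketch; the URL and form fields are placeholders):

curl -d "title=Hello&body=World" "http://example.com/article/save?XDEBUG_PROFILE=1"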

Once written, the cachegrind profile can be read by an application such as KCachegrind, which will display information including:

  • Call time, both for the function itself and inclusive of subsequent function calls
  • Number of times each function is called
  • Slow-running functions. Where is the application spending most of its time? The best payoff in performance tuning is focusing on those parts of the application which consume the most time.

Note: Xdebug, and in particular its profiling features, are very resource intensive and slow down PHP execution. It is recommended to not run these in a production server environment.

profiling - Simplest way to profile a PHP script - Stack Overflow

php profiling

  • But it's stored by its object-ID, which is unique for whatever data is in the file.

Let's say you have a new repo with one huge file in it:

$ mkdir temp; cd temp; git init
$ echo contents > bigfile; git add bigfile; git commit -m initial
[master (root-commit) d26649e] initial
 1 file changed, 1 insertion(+)
 create mode 100644 bigfile

The repo now has one commit, which has one tree (the top level directory), which has one file, which has some unique object-ID. (The "big" file is a lie, it's quite small, but it would work the same if it were many megabytes.)

Now if you copy the file to a second version and commit that:

$ cp bigfile bigcopy; git add bigcopy; git commit -m 'make a copy'
[master 971847d] make copy
 1 file changed, 1 insertion(+)
 create mode 100644 bigcopy

the repository now has two commits (obviously), with two trees (one for each version of the top level directory), and one file. The unique object-ID is the same for both copies. To see this, let's view the latest tree:

$ git cat-file -p HEAD:
100644 blob 12f00e90b6ef79117ce6e650416b8cf517099b78    bigcopy
100644 blob 12f00e90b6ef79117ce6e650416b8cf517099b78    bigfile

That big SHA-1 12f00e9... is the unique ID for the file contents. If the file really were enormous, git would now be using about half as much repo space as the working directory, because the repo has only one copy of the file (under the name 12f00e9...), while the working directory has two.

If you change the file contents, though (even one single bit, like making a lowercase letter uppercase or something), then the new contents will have a new SHA-1 object-ID, and need a new copy in the repo. We'll get to that in a bit.

Now, suppose you have a more complicated directory structure (a repo with more "tree" objects). If you shuffle files around, but the contents of the "new" file(s), under whatever name(s), in new directories are the same as the contents that used to be in old ones, here's what happens internally:

$ mkdir A B; mv bigfile A; mv bigcopy B; git add -A .
$ git commit -m 'move stuff'
[master 82a64fe] move stuff
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename bigfile => A/bigfile (100%)
 rename bigcopy => B/bigcopy (100%)

Git has detected the (effective) rename. Let's look at one of the new trees:

$ git cat-file -p HEAD:A
100644 blob 12f00e90b6ef79117ce6e650416b8cf517099b78    bigfile

The file is still under the same old object-ID, so it's still only in the repo once. It's easy for git to detect the rename, because the object-ID matches, even though the path name (as stored in these "tree" objects) might not. Let's do one last thing:

$ mv B/bigcopy B/two; git add -A .; git commit -m 'rename again'
[master 78d92d0] rename again
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename B/{bigcopy => two} (100%)

Now let's ask for a diff between HEAD~2 (before any renamings) and HEAD (after renaming):

$ git diff HEAD~2 HEAD
diff --git a/bigfile b/A/bigfile
similarity index 100%
rename from bigfile
rename to A/bigfile
diff --git a/bigcopy b/B/two
similarity index 100%
rename from bigcopy
rename to B/two

Even though it was done in two steps, git can tell that to go from what was in HEAD~2 to what is now in HEAD, you can do it in one step by renaming bigcopy to B/two.

Git always does dynamic rename detection. Suppose that instead of doing renames, we'd removed the files entirely at some point, and committed that. Later, suppose we put the same data back (so that we got the same underlying object IDs), and then diffed a sufficiently old version against the new one. Here git would say that to go directly from the old version to the newest, you could just rename the files, even if that's not how we got there along the way.

In other words, the diff is always done commit-pair-wise: "At some time in the past, we had A. Now we have Z. How do I go directly from A to Z?" At that time, git checks for the possibility of renames, and produces them in the diff output as needed.

Git will still (sometimes) show renames even if there's some small change to a file's contents. In this case, you get a "similarity index". Basically, you can tell git that given "some file deleted in rev A, some differently-named file added in rev Z" (when diffing revs A and Z), it should try diffing the two files to see if they're "close enough". If they are, you'll get a "file renamed and then changed" diff. The control for this is the -M or --find-renames argument to git diff: git diff -M80 says to show the change as rename-and-edit if the files are at least "80% similar".

Git will also look for "copied then changed", with the -C or --find-copies flag. (You can add --find-copies-harder to do a more computationally-expensive search against all files; see the documentation.)

This relates (indirectly) to how git keeps repositories from blowing up in size over time, as well.

If you have a large file (or even a small file) and make a small change in it, git will store two complete copies of the file, using those object-IDs. You find these things in .git/objects; for instance, that file whose ID is 12f00e90b6ef79117ce6e650416b8cf517099b78 is in .git/objects/12/f00e90b6ef79117ce6e650416b8cf517099b78. They're compressed to save space, but even compressed, a big file can still be pretty big. So, if the underlying object is not very active and appears in a lot of commits with only a few small changes every now and then, git has a way to compress the modifications even further. It puts them into "pack" files.

In a pack file, the object gets further compressed by comparing it to other objects in the repository.[1] For text files it's simple to explain how this works (although the delta compression algorithm is different): if you had a long file and removed line 75, you could just say "use that other copy we have over there, but remove line 75." If you added a new line, you could say "use that other copy, but add this new line." You can express large files as sequences of instructions, using other large files as the basis.

Git does this sort of compression for all objects (not just files), so it can compress a commit against another commit, or trees against each other, too. It's really quite efficient, but with one problem.

Some (not all) binary files delta-compress very badly against each other. In particular, with a file that is compressed with something like bzip2, gzip, or zip, making one small change anywhere tends to change the rest of the file as well. Images (jpg's, etc) are often compressed and suffer from this sort of effect. (I don't know of many uncompressed image formats. PBM files are completely uncompressed, but that's the only one I know of off-hand that is still in use.)

If you make no changes at all to binary files, git compresses them super-efficiently because of the unchanging low-level object-IDs. If you make small changes, git's compression algorithms can (not necessarily "will") fail on them, so that you get multiple copies of the binaries. I know that large gzip'ed cpio and tar archives do very badly: one small change to such a file and a 2 GB repo becomes a 4 GB repo.

Whether your particular binaries compress well or not is something you'd have to experiment with. If you're just renaming the files, you should have no problem at all. If you're changing large JPG images often, I would expect this to perform poorly (but it's worth experimenting).

[1] In "normal" pack files, an object can only be delta-compressed against other objects in the same pack file. This keeps the pack files stand-alone, as it were. A "thin" pack can use objects not in the pack-file itself; these are meant for incremental updates over networks, for instance, as with git fetch.

+1. I guess that is why bup (stackoverflow.com/a/19494211/6309) redid the xdelta for packfile...

While the process of committing shows the "renaming" I am trying to see how I can look at a past commit and see what was considered a "rename" in that commit. Git show just seems to treat them as creates from what I can tell.

@OpenLearner: you can't look at a past commit in isolation, in order to ask whether there was a "rename": you have to compare it to something. You'll need to compare the commit to the one before or after it, probably in this case, "before". For instance, to compare what's three revs back from master with what was there before it: git diff master~4 master~3 (4-back being "one before" 3-back). Incidentally there's a special form of git log for that: git whatchanged.


Couldn't remember off-hand (because I have been using git whatchanged for years) but git log --raw shows the same data, now that git whatchanged is "discouraged". Using --raw does a git diff-tree on each rev's top tree, against the previous rev's top tree (or the empty tree for the first commit). Note that you can add --diff-filter=... arguments as well.

How does git handle moving files in the file system? - Stack Overflow

git

node.js as a web server: express

Node.js is a JavaScript engine for the server side. In addition to all the js capabilities, it includes networking capabilities (like HTTP) and access to the file system. This is different from client-side js, where the networking tasks are monopolized by the browser and access to the file system is forbidden for security reasons.

Something that runs in the server, understands HTTP and can access files sounds like a web server. But it isn't one. To make node.js behave like a web server one has to program it: handle the incoming HTTP requests and provide the appropriate responses. This is what Express does: it's the implementation of a web server in js. Thus, implementing a web site is like configuring Express routes, and programming the site's specific features.

Serving pages involves a number of tasks. Many of those tasks are well known and very common, so node's Connect module (one of the many modules available to run under node) implements those tasks. See the current impressive offering:

  • logger request logger with custom format support
  • session session management support with bundled MemoryStore
  • staticCache memory cache layer for the static() middleware
  • limit limit the bytesize of request bodies

Connect is the framework and through it you can pick the (sub)modules you need. The Contrib Middleware page enumerates a long list of additional middlewares. Express itself comes with the most common Connect middlewares.

Install node.js. Node comes with npm, the node package manager. The command npm install -g express will download and install express globally (check the express guide). Running express foo on a command line (not in node) will create a ready-to-run application named foo. Change to its (newly created) directory and run it with the command node <appname>, then open http://localhost:3000 and see. Now you are in.
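For reference, a hand-written Express app can be as small as this sketch (assuming express was installed locally with npm install express):

var express = require('express');
var app = express();

app.get('/', function (req, res) {
  res.send('Hello from Express');
});

app.listen(3000, function () {
  console.log('Listening on http://localhost:3000');
});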

great reply thanks. This is the kind of simple crap every blog post misses, the simple setup which can be ??? if you've never done it before. Yea it's simple when you have already done it but you have no clue how to start for the FIRST time! I hate it when devs overlook that in blog posts, it's essential. I don't want to have to FIND another blog post just to find setup. Just provide a link to another blog post in your other posts, that's extremely helpful so I don't have to hunt around for one. Save me the hunting trip!

What is Node.js' Connect, Express and "middleware"? - Stack Overflow

node.js middleware